On the 8th and 9th of December, the New Directions 2019 conference was held at the PACIFICO conference center in Yokohama, Japan. The British Council, as the organizer of the event, also fielded a considerable number of speakers from its Assessment Research Group (ARG), Global Assessments and the Assessment Solutions Team (EAAST), whose contributions included a plenary presentation, two workshops and several breakout sessions over the course of three days.
In this article, I will focus on the experiences of Johnathan Cruise and Trevor Breakspear as they presented their joint session: “Investigating the effect of technological capability on operationalizing the speaking construct in auto-rated assessment”.
Johnathan’s View:
In the sprawling and plush environs of the Pacifico venue in the high-tech, yet serene, bay area of Yokohama, Japan, Trevor Breakspear and I presented our findings on how effective an AI engine was at rating candidates’ performances in a new speaking test when compared to human raters.
The talk was given in the roughly 40-seater “Technology and Consequences” breakout room, which was crammed with a sympathetic group of well-wishers who were clearly intrigued by the potential of AI in assessment. Their welcome sympathy, however, did nothing to allay my own feeling of “imposter syndrome” as I raced through a raft of statistics showing how consistently two sets of raters had marked the large set of samples used to “feed the machine” (i.e. the AI engine). Luckily, the statistics and the method we used for the reliability study were tried, tested and pretty robust; so, thankfully, I managed to get through my part with no one looking outraged (apart, perhaps, at my garbled delivery).
Trevor’s View:
Up against the potentially soporific 1pm breakout slot, it was refreshing to see standing room only as delegates filed in from lunch to listen to Johnathan Cruise and me share our initial research on training an automatic rating engine for speaking. After I ploughed through an introduction to the context and features of the assessment solution, Johnathan moved on to the meat of the presentation, demonstrating how we measured the rating quality of a few select examiners to ensure it was sufficient to start training an automated engine that would subsequently grade thousands of learners. I then continued with a summary of the strong human-machine agreement found in an initial internal validation, and finished by highlighting the need to start training engines that measure specific features of speaking, thus pushing the technology to provide more meaningful feedback to learners. High-quality questions from the floor allowed for a brief discussion of how new black-box technologies were driving more ambitious auto-rating before the axe dropped and time was called.

Often the best part of presenting is the informal conversation that ensues afterwards, and ND 2019 was no exception: several delegates approached us to express interest in the work, whilst suggesting perhaps a longer timeslot next time!
Hopefully, New Directions 2020 in Singapore will see a similarly strong lineup of presenters!
by Johnathan Cruise, Trevor Breakspear and Jan Langeslag