The ELPA21 Governing Board voted in May 2024 to adopt artificial intelligence (AI) scoring—also known as “autoscoring”—for the traditionally handscored Step Three speaking and writing items on the ELPA21 Dynamic Screener. We see this as a great benefit to the ELPA21 Assessment System and the screening of students in our partner states. 

The assessment enhancement for autoscoring Step Three of the ELPA21 Dynamic Screener was developed in partnership with our test delivery provider, Cambium Assessment, and ELPA21 and Cambium have researched and deliberated on it since 2019. ELPA21 and Cambium staff carefully reviewed the research and study outcomes and presented the information to the ELPA21 Governing Board and the Technical Advisory Committee multiple times in recent years as AI has continued to develop in the educational space. 

Studies have shown autoscoring is just as accurate as, if not more accurate than, traditional scoring, and it helps prevent human error or bias from affecting results in the writing and speaking domains. To further ensure the accuracy of the autoscoring feature, the testing system will flag “low confidence” results, and those scores will be verified with additional vendor-provided handscoring. 

This was a procedural change for most participating states and resulted in updates to the ELPA21 Dynamic Screener documentation, such as the Test Administration Manual. Test administration remains the same, and, as before, the test will automatically end after Step Two for a majority of students, with results provided for those students within a few hours. 

For students who continue on to Step Three, the speaking and writing constructed response questions will now be scored by the trained test engine instead of being sent to the handscoring center. Autoscoring speeds up the availability of screener scores: results for most students who continue to Step Three should be available within one day, if not hours, after the test is submitted. 

How does it work? 

Using operational data, Cambium, ELPA21’s testing vendor, trained the autoscoring engine to score speaking and writing constructed response items in a way that closely mirrors traditional human handscoring results. To achieve this, Cambium used three academic years of anonymized ELPA21 data from six participating states. Drawing on responses from every grade band and from students across demographics and proficiency levels, Cambium was able to train the engine to score like a human handscorer. 
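ELPA21 and Cambium have not published the engine’s internal design, so the sketch below is only a rough, hypothetical illustration of the general pattern described above: fit a supervised model to past responses paired with their human-assigned rubric scores, then use it to score new responses. The library, features, rubric scale, and example data are all assumptions, not Cambium’s actual implementation.

```python
# Hypothetical sketch of training an autoscoring engine on past handscored
# responses. This is NOT Cambium's engine; it only illustrates the general
# supervised-learning pattern (learn to reproduce human rubric scores).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Made-up training data: prior writing responses with human scores (0-4 rubric).
responses = [
    "My favorite class is science because we do experiments.",
    "i like dog",
    "Last summer my family traveled to visit my grandmother in another city.",
    "The book was good and I read it with my friend at school.",
]
human_scores = [3, 1, 4, 2]

# Simple text features plus a probabilistic classifier over rubric points.
engine = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
engine.fit(responses, human_scores)

# Score a new response; predict_proba also yields the kind of confidence
# value used for the low-confidence flagging described later in this article.
new_response = ["We readed a book about volcanoes in class."]
print(engine.predict(new_response))
print(engine.predict_proba(new_response).max())
```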

From there, Cambium tested the autoscoring feature by comparing the autoscored results to the human-scored results. Overall, the autoscoring engine scored as accurately as, if not more accurately than, the human scorers, indicating that the autoscoring feature is a reliable way to score Step Three speaking and writing constructed response items. 
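The announcement does not say which agreement statistics Cambium used, but a comparison of this kind is typically summarized with measures such as exact agreement and quadratic weighted kappa. A minimal sketch with made-up scores:

```python
# Hypothetical sketch of validating an autoscoring engine against human
# raters on the same responses. The scores are invented; the two metrics
# shown are standard choices for this kind of comparison.
from sklearn.metrics import accuracy_score, cohen_kappa_score

human_scores  = [3, 2, 4, 1, 3, 2, 0, 4, 3, 2]
engine_scores = [3, 2, 4, 1, 2, 2, 0, 4, 3, 3]

exact = accuracy_score(human_scores, engine_scores)  # share of identical scores
qwk = cohen_kappa_score(human_scores, engine_scores,
                        weights="quadratic")         # chance-corrected agreement
print(f"exact agreement: {exact:.2f}, quadratic weighted kappa: {qwk:.2f}")
```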

How reliable is the test engine? 

Overall, the test engine’s autoscoring feature is able to mimic a human reader’s accuracy in scoring. However, to ensure quality, 20% of all low confidence results, meaning responses the engine cannot score with high certainty, will be sent to the handscoring vendor for additional review. Moving forward, Cambium plans to improve the test engine’s scoring using current test data and will continue to collaborate with the ELPA21 Governing Board, Assessment Design Team, and Technical Advisory Committee (TAC) on the engine’s development. 
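ELPA21 has not published the production routing rule, so the following is only an illustrative sketch under assumed names and an assumed confidence cutoff: flag any score the engine reports with low certainty, then send a 20% sample of the flagged set to the handscoring vendor, as described above.

```python
# Hypothetical sketch of the low-confidence review routing described above.
# The 0.70 cutoff, data, and function names are assumptions; only the 20%
# review rate comes from the article.
import random

CONFIDENCE_THRESHOLD = 0.70  # assumed cutoff for "low confidence"
REVIEW_RATE = 0.20           # 20% of low-confidence results, per the article

def route_for_review(scored, seed=0):
    """scored: list of (response_id, engine_score, confidence) tuples."""
    flagged = [r for r in scored if r[2] < CONFIDENCE_THRESHOLD]
    k = round(len(flagged) * REVIEW_RATE)
    return random.Random(seed).sample(flagged, k)

scored = [("r1", 3, 0.95), ("r2", 2, 0.55), ("r3", 4, 0.62),
          ("r4", 1, 0.88), ("r5", 0, 0.40), ("r6", 2, 0.66)]
for response_id, score, confidence in route_for_review(scored):
    print(f"{response_id}: score {score} (confidence {confidence:.2f}) -> handscoring")
```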

What’s changing? 

Nothing in terms of test administration is changing. As before, the test will automatically end after Step Two for most students, and results will be provided within a few hours. For students who move on to Step Three, the speaking and writing constructed response questions will now be scored by the trained test engine instead of the handscoring center, and test results for these students will also be available within one day. With autoscoring, test administrators will no longer have to wait up to seven days for Step Three test results.

Thank you! 

ELPA21 would like to thank each and every person who has contributed to the development of this feature and everyone who has contacted us with questions about its use in our assessments. We recognize that the use of artificial intelligence, including technologies such as large language models, can be a source of concern for some, but we want to stress that autoscoring reduces the possibility of human error or bias affecting scoring results, which will lead to fairer outcomes for students. For more information about the use of autoscoring in your state, contact your state’s ELPA21 representative.