Certification, scoring, and the AI-powered future?

07 Apr 2025

In this interview, we ask Learnosity EVP and Questionmark Founder John Kleeman for his thoughts on the future of AI scoring in certification and other higher-stakes exams.

What is the most promising innovation that AI is likely to bring to the world of tests and exams for certification?

We are at a really exciting time in the world of assessment. For the most part, the way that we have assessed people hasn’t changed much in several decades. While we’ve moved exams from paper to computer, allowing for far more flexibility and efficiency, the questions themselves haven’t changed much from what our parents and grandparents might have answered.

This isn’t necessarily a bad thing. Testing is a driver for educational achievement. It gives teachers and stakeholders in education and training the data that allows them to improve learning. And with advances in computerized assessment, we have also made testing more accessible and widespread than ever before. It has paved the way for more interactive testing methods, including video and observational assessments, that better engage learners and verify skills.

Still, despite all the technological developments, the basic way we measure and assess people is largely unchanged. What has changed, however, is that we now have AI at our fingertips and inextricably involved in all aspects of our lives, including education. 

For example, there is widespread discussion about using AI to generate questions. From building out item banks faster to generating tailored questions based on your specific material, AI-powered authoring is a fantastic innovation. It’s one we have dreamed of for ages but couldn’t achieve until the advent of LLMs like ChatGPT. However, impressive as it is, it simply improves the way we did things before.

What really excites me about AI in assessment, however, is the use of AI to score assessments. I think this is going to dramatically change the way we design assessments and is going to be the future of many kinds of assessments, including those in certification and credentialing.

Can you explain more? What kinds of questions will AI score?

Historically, we’ve had two choices for how we test people.

We can deliver exams with simple, objective questions that computers can score. These can be multiple-choice or other question types like hotspot or fill-in-the-blank. They are efficient and relatively easy to design and score, and psychometricians have developed reasonably good ways of ensuring that the results of these tests are broadly valid for measuring competence or selecting people for certain job roles.

The second approach is that we can deliver exams that humans have to score. These are very costly. For example, according to the UK regulator Ofqual, nearly 70,000 manual graders are needed to score school exams every year. There are also significant accuracy concerns with human scoring; graders make mistakes when scoring questions or, at the very least, don’t always agree with one another.

And there are bias concerns too, including for those from marginalized communities and those with disabilities. For example, Ofqual produced research on teacher scoring and found that “bias against pupils with special educational needs (or in favour of those without) was a common finding.”

AI-augmented scoring can potentially make the scoring of unstructured questions more efficient, more accurate, and even less biased. Currently, AI can supplement human scoring, but in time it may take more of a lead (with some human involvement), potentially leading to a huge improvement in how tests and exams are delivered.

In some professions, it’s common that high-stakes exams include a practical element. For example, if you are a pilot or a doctor in training, you will be observed doing practical tasks by other humans before you are allowed to fly a plane or operate on a patient. But if you are taking an IT or vocational exam, very often you will just answer objective questions.

Although there is some performance testing in the IT world and some use of observational tests in vocational training and compliance, it’s much easier to ask multiple-choice or other objective questions than to score unstructured or observational responses fairly, which is expensive.

AI is exciting for scoring because it can effectively score complex tests like observational assessments, practical tasks, and performance-based testing at scale. And importantly, it might well end up being more accurate than human scoring given enough time.

Why do you think this is so important for certification programs?

I think there are three important trends in certification that make this so important.

The first is validity. Many real-world certification exams ask for a lot of knowledge recall. At junior levels, there may be some validity here: people starting a career might be able to Google facts but still need a basic understanding of the field. At more advanced levels, however, there is a growing view that you need more sophisticated questions to assess capability. Certification exams, and the qualifications that come from them, need to be built on questions that genuinely assess job skills.

The second trend, related to the first, is that job skills themselves are changing. With the advent of AI and other technology, skills like practical ability, judgment, and critical thinking are becoming increasingly important. You can test these with well-written objective questions, but it’s much easier to test them with a practical exam. For example, if you are assessing salespeople, have them give a sales presentation and score that, rather than asking knowledge-recall questions on sales principles.

And the third trend is AI itself, which is increasingly being used by test takers to help answer questions. Statistics in the certification market are hard to get, but a 2025 university survey suggests that 88% of university students use generative AI in some way within assessments, up from 53% the year before. There is obviously a real risk that certification exam takers are also using AI. This can be mitigated to some extent by good proctoring and good technology, but it remains a threat worth considering.

Unstructured questions, where you have to do a task broadly similar to a job task, are much more difficult to cheat on, and so mitigate the potential misuse of AI better than standard objective questions do. If well designed, unstructured questions can measure current job skills more realistically than pure knowledge-recall questions. Moreover, they may demand skills that AI can’t replicate yet.

Essentially, there is pressure for credentialing and certification exams to be more valid by having candidates perform job-related tasks rather than recall knowledge or answer simple questions. There is also pressure for such exams to be resistant to candidates using generative AI to answer questions, which likewise encourages more free-form, unstructured tasks.

But the challenge here is how to score tests fairly at scale. And that’s where AI comes in.

What does the future of AI-powered scoring look like in practice?

AI has been used to score essays for decades; ETS were pioneers with their e-rater program over 20 years ago. Traditionally, though, using AI for scoring has been quite ponderous: you have to train the model on hundreds of examples of human-scored essays, which makes it expensive and only practical for large-scale exams.

However, generative AI (programs like ChatGPT or Claude) doesn’t require this training. With the right rubrics and prompts in place, it can score essays and other unstructured questions reasonably well. While it’s still not as accurate as a model trained on specific questions, it is a promising area that’s improving quickly.
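
As a concrete illustration, here is a minimal sketch of prompt-based rubric scoring, assuming the OpenAI Python client; any chat-capable LLM API would work similarly. The model name, rubric, and prompt wording are all illustrative, not a description of any particular product.

```python
# Minimal sketch of rubric-based essay scoring with a generative LLM.
# Assumes the OpenAI Python client; rubric and prompts are illustrative.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

RUBRIC = """Score the essay from 0-4 on each criterion:
1. Thesis and argument structure
2. Use of supporting evidence
3. Clarity and organization
4. Grammar and mechanics"""

def score_essay(essay_text: str) -> str:
    """Ask the model to apply the rubric and justify each score."""
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative model name
        temperature=0,   # reduce run-to-run variation in scores
        messages=[
            {"role": "system",
             "content": "You are an exam grader. Apply the rubric strictly "
                        "and justify every score you give."},
            {"role": "user", "content": f"{RUBRIC}\n\nEssay:\n{essay_text}"},
        ],
    )
    return response.choices[0].message.content

print(score_essay("Renewable energy adoption is accelerating because..."))
```

Setting the temperature to zero keeps scores more consistent across runs, which matters when the same rubric is applied to thousands of candidates.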

The promise of the future is:

  • You ask someone to perform a job-related task.
  • AI records and evaluates what is done and scores it against a rubric.
  • Human oversight is in place to spot check and review unusual events.
  • The result is a reliable and valid score that certifies ability.

We are not there yet. Currently, the latest technology is often used only as a support to human graders, proposing a grade for human review or acting as a secondary grader to double-check a human-given score. Sometimes it’s also used to give feedback in low-stakes or practice tests.
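
In that support role, one pattern is easy to sketch: use AI as a secondary grader and escalate any response where the two scores diverge. The data, field names, and tolerance below are illustrative.

```python
# Sketch of AI as a secondary grader: flag items where the AI score
# diverges from the human score by more than a tolerance, for re-review.
TOLERANCE = 1  # max acceptable score gap before escalation

graded = [
    {"id": "cand-001", "human": 3, "ai": 3},
    {"id": "cand-002", "human": 4, "ai": 2},  # disagreement -> review
    {"id": "cand-003", "human": 1, "ai": 2},
]

needs_review = [g for g in graded if abs(g["human"] - g["ai"]) > TOLERANCE]
for item in needs_review:
    print(f"{item['id']}: human={item['human']} ai={item['ai']} -> re-review")
```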

A commonly raised concern about using AI to score is that AI is not explainable or transparent. But if you define scoring by rubrics or categories, AI can both score against each rubric criterion and give feedback or a rationale for why it chose each score.
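
For instance, if the model is required to return a score and a rationale for every rubric criterion in a structured format, each grade becomes auditable. Here is a sketch of validating such output; the schema and criteria are illustrative assumptions, not any product’s actual format.

```python
# Sketch of how rubric-based scoring keeps AI grading transparent:
# the model must return a score *and* a rationale per criterion, which
# a human reviewer can audit. Schema and criteria are illustrative.
import json

CRITERIA = ["argument", "evidence", "organization", "mechanics"]

def parse_rubric_scores(model_output: str) -> dict:
    """Validate a JSON response shaped like:
    {"argument": {"score": 3, "rationale": "..."}, ...}
    Raises ValueError if any criterion, score, or rationale is missing."""
    result = json.loads(model_output)
    for criterion in CRITERIA:
        entry = result.get(criterion)
        if entry is None:
            raise ValueError(f"missing criterion: {criterion}")
        if not 0 <= entry.get("score", -1) <= 4:
            raise ValueError(f"score out of range for: {criterion}")
        if not entry.get("rationale"):
            raise ValueError(f"no rationale given for: {criterion}")
    return result

# Example: a well-formed model response passes validation.
sample = json.dumps({
    c: {"score": 3, "rationale": "Meets most of the rubric descriptor."}
    for c in CRITERIA
})
scores = parse_rubric_scores(sample)
print(sum(v["score"] for v in scores.values()), "/", 4 * len(CRITERIA))
```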

There are risks. AI can make errors, including confident fabrications sometimes called hallucinations. AI can be biased; it might score some categories of answers less well than it should, even if on the whole it scores reasonably accurately. There are also regulations to follow. But in my view, this is a hugely exciting area and the risks are not insurmountable.

As leaders in AI-powered learning, we at Learnosity have developed a tool called Feedback Aide, which provides AI-powered grading that augments human marking. It allows the scoring of unstructured questions and can also give useful feedback. The currently released version of Feedback Aide is aimed at essays, but we are looking at short-answer questions, accountancy questions, review of video interviews (e.g., for sales or customer service certification), and much more.

AI is improving so fast that this is an area for everyone to watch.

Discover more about the role of AI in certification, the challenges of building effective certification programs, and the ingredients needed to create programs that stand the test of time in our latest eBook: Certifying in an uncertain world.
