100-year-old assessment tools?
Here’s why neuropsychology is surprisingly AI-resistant — and where meaningful innovation might finally come from.
There’s a surprising amount of inertia in cognitive assessment. We’ve been using essentially the same types of tests for almost a century. The Wechsler tools — built on aptitude measures from the 1930s and ’40s — still anchor modern practice: pen-and-paper tasks, verbal reasoning, visual puzzles and block design.
Why so little change?
Because developing new tests isn’t the hard part — norming them is.
Collecting large, stratified datasets across age, education, language and culture is an enormous undertaking. Courts and regulators rely on recognised norms, and even major tech companies don’t have normative samples of this depth.
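The norming problem can be made concrete. A standardised score only means something relative to a demographically matched reference group. A minimal Python sketch of that arithmetic (the normative means and SDs below are invented for illustration, not real test norms):

```python
# Illustrative only: converting a raw test score to a z-score and percentile
# against age-stratified norms. Normative values are invented, not real.
from statistics import NormalDist

# Hypothetical normative table: age band -> (mean, standard deviation)
NORMS = {
    (16, 29): (50.0, 10.0),
    (30, 49): (48.0, 10.5),
    (50, 69): (44.0, 11.0),
}

def standardise(raw_score: float, age: int) -> tuple[float, float]:
    """Return (z-score, percentile) for a raw score, given the examinee's age."""
    for (lo, hi), (mean, sd) in NORMS.items():
        if lo <= age <= hi:
            z = (raw_score - mean) / sd
            percentile = NormalDist().cdf(z) * 100
            return z, percentile
    raise ValueError(f"No normative band covers age {age}")

z, pct = standardise(raw_score=55, age=35)  # scored above the 30-49 band mean
```

The arithmetic is trivial; the hard part is that every (mean, SD) cell must be estimated from a large, representative sample stratified by age, education, language and culture — which is exactly the undertaking described above.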
This makes neuropsychology unusually AI-resistant.
Any tool even slightly less sensitive than current gold standards wouldn’t survive medicolegal scrutiny. And performance validity testing — central to good assessment — is extremely difficult to automate.
And many of our “simple” tasks are simple by design.
Their simplicity isolates core cognitive processes cleanly and avoids the confounds introduced by more complex digital tasks (device variability, motor demands, cultural load). Sometimes the simplest tools are the most robust.
Neuropsychology is also not just measurement — it’s interpretation.
Two people with identical scores can have very different explanations for their difficulties. Distinguishing these relies on behavioural observation, contextual judgement, effort evaluation and pattern integration. This interpretive layer is profoundly human.
There are ethical issues too.
Cognitive assessment affects treatment, compensation and autonomy. AI systems risk embedding bias, lack transparency (“black-box” reasoning), and raise unresolved questions about accountability and data governance. Errors here can cause real harm.
Change is coming at the edges: AI-assisted note-taking, automated scoring, and new digital paradigms such as VR tasks, gamified memory measures and ecological multitasking. These can capture fine-grained data like reaction times and micro-errors.
But norms remain the bottleneck.
True disruption will only occur if someone builds:
• large, global normative datasets
• validated and secure digital tasks
• built-in effort testing
• medicolegal acceptance
• integration with imaging or biomarkers
And a final barrier: cost.
Creating global norms is less a scientific challenge than a financial one: it requires enormous investment and infrastructure, and it’s not yet clear who would fund it.
If that day comes, the field may shift rapidly — but it still feels a long way off.