The Future of Intelligence: Exploring assessments of AI capabilities
On Day 4 of the 2021 International Conference on Artificial Intelligence in Work, Innovation, Productivity and Skills, four AI experts discussed the approach of the OECD’s Future of Skills project to assessing AI capabilities and gave examples of alternative tests available from education, occupational certification, cognitive psychology, and animal cognition.
Why is AI evaluation important?
Jose Hernandez-Orallo, professor at the Universitat Politècnica de València, kicked off the discussion by affirming that AI cannot simply be regarded as another emerging technology. Rather, it must be regarded as intelligence. He further suggested that AI evaluation will increasingly become a question of evaluating intelligence, for it is intelligence that allows us to transform the world around us, and it will therefore radically transform our future, regardless of whether that intelligence is our own or a machine's. It is merely a question of time-scale.
AI will transform every sphere of society, yet to oversee this transformation in a responsible, sustainable way, we must be able to identify what AI can do, how well it can do it, and what will change in which areas. Moreover, we can begin to think about what skills AI systems can acquire by themselves, and how that may be used for our benefit. The “substitution narrative” often adopted in conversations around AI is too narrow: AI will not simply replace humans in areas of our lives that we already handle well by ourselves. It is not about performing tasks; it is about capabilities that exceed mere questions of efficiency.
How do we assess intelligence and human abilities?
Patrick Kyllonen, Distinguished Presidential Appointee at Educational Testing Service, introduced examples of cognitive ability items from intelligence testing and educational measurement. He outlined the different types of intelligence that can be identified, from fluid intelligence (including induction, sequential and deductive reasoning, and quantitative reasoning) and crystallized intelligence (including reading comprehension, vocabulary and cloze tasks) to creativity, or idea generation. Through a variety of tasks, questions and challenges, human abilities can be assessed and categorised according to relative strengths and weaknesses.
Similarly, Britta Rüschoff, Professor of competence assessment in Vocational Education and Training (VET) at the University of Applied Sciences Düsseldorf in Germany, spoke on the nature of performance-based tasks, which are likewise used to assess human abilities, but in a specific context. She pointed out that these have a strong practical focus and assess competence with immediate relevance to a profession. Such assessments are thus excellent for investigating suitability for a specific job or task, as they test concrete job-related behaviour and capabilities.
Lucy Cheke, of the Leverhulme Centre for the Future of Intelligence at Cambridge University, introduced “low-level” cognitive skills, speaking about animal cognition and the cognitive abilities of young children. She gave examples of basic “common sense” skills, such as recognising object permanence, and basic navigation skills. As such skills are primarily intuitive in nature, testing these “low-level” skills would give us a better understanding of the complexity of AI's intelligence beyond the ability to complete tasks in a predictable, structured environment.
If AI could pass these different tests, what could we learn about the capabilities of AI?
The panel agreed that while such assessments can test intelligence, abilities and performance in humans, these conclusions cannot be transferred quite so easily to AI. It is not possible to make the same inferences one could make about a human passing these different tests. In the work context, AI systems are generally specialised for a specific task. Using a test to assess an AI's capability to perform this task would only prove that the AI can solve that test; no inference can be made about its performance in different contexts, even ones that differ only slightly. For general AI, there could be such transferability. Yet how well performance on one test predicts performance elsewhere remains completely different for AI than for humans, and identifying this difference will be crucial for understanding how AI differs from the human mind.
The discussion ended on the question of whether we should use the human mind to model AI, and what the alternative could be. The panel emphasised that our own intelligence is our only reference point for intelligence. As it is the only model available to us, our best proxy, we ultimately have to fall back on it to measure intelligence. It is our starting point for understanding intelligence, for successfully integrating AI into the workplace, for designing it for ethical decision-making, and for protecting humans from a potentially unpredictable, unsafe future. Hence we must continue to explore different ways of assessing AI capabilities to make sense of what our future with AI may hold.
by Stina J. Nölken