This research explores a collection of seemingly simple problems that consistently challenge Large Language Models (LLMs). Despite their impressive capabilities on complex tasks, these models struggle with certain basic reasoning challenges that humans find intuitive.
Pass rates for each question, ordered from hardest to easiest for LLMs to solve.
Benchmark results comparing various LLMs against human-level performance.
Heatmap showing how strongly different LLMs' answers correlate with one another.
Visualization of which questions each model answered correctly (green) or incorrectly (red).
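As a rough illustration of how figures like the ones above could be derived, the sketch below starts from a per-model correctness grid and computes per-question pass rates (sorted hardest first) and a pairwise correlation between models. The model names, question IDs, and scores are made-up toy values, not results from this research, and Pearson correlation is an assumption; the original analysis may use a different measure.

```python
from math import sqrt

# Hypothetical toy data (NOT from the study): 1 = correct, 0 = incorrect.
questions = ["q1", "q2", "q3", "q4"]
results = {
    "model_a": [1, 0, 1, 1],
    "model_b": [1, 0, 0, 1],
    "model_c": [0, 0, 1, 1],
}


def pass_rates(results, questions):
    """Return (question, pass rate) pairs sorted from hardest to easiest."""
    n_models = len(results)
    rates = {
        q: sum(answers[i] for answers in results.values()) / n_models
        for i, q in enumerate(questions)
    }
    # Lowest pass rate first, i.e. the hardest question leads the list.
    return sorted(rates.items(), key=lambda item: item[1])


def pearson(xs, ys):
    """Pearson correlation of two equal-length 0/1 answer vectors."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A model that got everything right (or wrong) has no variance.
        return float("nan")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    for question, rate in pass_rates(results, questions):
        print(f"{question}: {rate:.0%}")
    for a in results:
        for b in results:
            print(a, b, round(pearson(results[a], results[b]), 2))
```

The same grid of 0/1 values is what a correct/incorrect (green/red) visualization would render directly, so one table of results can feed all four figures.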
© 2025 | Research by Sean Williams and James Huckle