Easy Problems That LLMs Get Wrong

This research explores a collection of seemingly simple problems that consistently challenge Large Language Models (LLMs). Despite their impressive capabilities on complex tasks, these models struggle with certain basic reasoning challenges that humans find intuitive.

Key Findings

  • Identified 30 simple problems that challenge LLMs
  • Analyzed patterns in model failures
  • Compared performance across different models
  • Explored implications for AI reasoning capabilities

Research Impact

  • Reveals blind spots in current AI systems
  • Provides benchmark for reasoning evaluation
  • Suggests directions for model improvement
  • Highlights gaps between human and AI reasoning

Research Visualizations

Pass Rate of Each Question (Hardest to Easiest)

Question Difficulty Analysis

Pass rates for each question, ordered from hardest to easiest for LLMs to solve.
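A per-question pass rate like the one plotted here can be computed from a matrix of pass/fail results. The sketch below is illustrative only, assuming a hypothetical results table (1 = model answered correctly, 0 = incorrect); the question labels and values are placeholders, not the study's actual data.

```python
# Illustrative sketch: per-question pass rates, sorted hardest-first.
# Rows = questions, values = one pass/fail flag per model tested.
# All data below is made up for demonstration.
results = {
    "Q1": [1, 0, 1, 1],
    "Q2": [0, 0, 0, 1],
    "Q3": [1, 1, 1, 1],
}

def pass_rates(results):
    """Return (question, pass_rate) pairs ordered hardest to easiest."""
    rates = {q: sum(flags) / len(flags) for q, flags in results.items()}
    return sorted(rates.items(), key=lambda kv: kv[1])

for question, rate in pass_rates(results):
    print(f"{question}: {rate:.0%}")
```

Sorting ascending by pass rate puts the hardest questions first, matching the ordering described in the chart.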

LLM Linguistic Benchmark Performance

Model Performance Comparison

Benchmark results comparing various LLMs against human-level performance.

Model Answer Correlation Heatmap

Model Answer Correlation

Heatmap showing how different AI models' answers correlate with each other.
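One common way to build such a heatmap is to treat each model's per-question correctness as a binary vector and compute pairwise Pearson correlations. The sketch below assumes hypothetical model names and made-up correctness vectors; it is not the paper's actual computation or data.

```python
# Illustrative sketch: pairwise correlation of models' correctness vectors.
# 1 = answered correctly, 0 = incorrect. Names and data are placeholders.
from itertools import combinations
from math import sqrt

answers = {
    "model_a": [1, 0, 1, 1, 0],
    "model_b": [1, 0, 0, 1, 0],
    "model_c": [0, 1, 1, 0, 1],
}

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

# Each (model, model) pair yields one cell of the heatmap.
for m1, m2 in combinations(answers, 2):
    print(f"{m1} vs {m2}: r = {pearson(answers[m1], answers[m2]):+.2f}")
```

High correlation between two models suggests they fail on the same questions, which is what a heatmap like this is meant to surface.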

Question Correctness Map

Visualization of which questions each model answered correctly (green) or incorrectly (red).

© 2025 | Research by Sean Williams and James Huckle