Only ~700 Hours of Human Thinking Data Exists

After decades of cognitive science research, humanity has collected less than 700 hours of concurrent think-aloud transcripts — real-time verbalized reasoning that captures how humans actually think through problems.

  • ~700h of total global data
  • <1,000 participants ever recorded
  • ~10 public datasets
Understanding the Data

What is Concurrent Think-Aloud Data?

Concurrent think-aloud is a research methodology where participants verbalize their thoughts in real-time while solving problems. Unlike retrospective explanations, this captures the actual cognitive process as it unfolds.

This data is the closest approximation to raw human cognition that can be externalized. It includes:

  • Real-time problem decomposition
  • Hypothesis generation and testing
  • Dead-ends, backtracking, and pivots
  • Metacognitive monitoring ("this doesn't seem right")
  • Uncertainty and confidence signals
  • Strategy selection and adaptation

Example Think-Aloud Transcript

"Okay so I need to make 24 from 8, 3, 8, and 1... Let me try multiplication first... 8 times 3 is 24, that works! But wait, I still have 8 and 1 left... Hmm, can I use those without changing the 24? 8 times 1 is 8... that doesn't help. Let me think differently... what if I divide? 8 divided by... no, that gives decimals. Actually wait — 8 minus 1 is 7... no, 7 times 3 is 21. Okay new approach: 8 divided by 8 is 1, but then 1 plus 1 is 2, and 2 times 3 is only 6... Hold on, what about 3 plus 1? That's 4, and 8 times 4 is 32... 32 minus 8 is 24! So 8 × (3 + 1) − 8 = 24!"

This raw stream captures cognitive exploration that no polished explanation would include.
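The Game of 24 task in the transcript above can also be checked mechanically. The brute-force solver below is an illustrative sketch (not code from any cited study): it tries every ordering of the four numbers, every choice of operators, and enough parenthesization shapes to cover all expression trees.

```python
from itertools import permutations, product

def solvable_24(nums, target=24, eps=1e-6):
    """Can the four numbers reach the target with + - * / and parentheses?

    Including reversed subtraction/division means two tree shapes
    (left-deep and balanced) cover every possible parenthesization.
    """
    def apply(op, a, b):
        if op == "+":  return a + b
        if op == "-":  return a - b
        if op == "r-": return b - a
        if op == "*":  return a * b
        if op == "/":  return a / b if abs(b) > eps else None
        if op == "r/": return b / a if abs(a) > eps else None

    ops = ["+", "-", "r-", "*", "/", "r/"]
    for a, b, c, d in permutations(nums):
        for f, g, h in product(ops, repeat=3):
            ab = apply(f, a, b)
            if ab is None:
                continue
            # Left-deep shape: ((a f b) g c) h d
            abc = apply(g, ab, c)
            if abc is not None:
                r = apply(h, abc, d)
                if r is not None and abs(r - target) < eps:
                    return True
            # Balanced shape: (a f b) g (c h d)
            cd = apply(h, c, d)
            if cd is not None:
                r = apply(g, ab, cd)
                if r is not None and abs(r - target) < eps:
                    return True
    return False
```

For the transcript's set {8, 3, 8, 1} the solver returns `True`; one witness is 8 × (3 + 1) − 8 = 24.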

The Scarcity Problem

Why Does So Little of This Data Exist?

Unlike speech corpora (LibriSpeech alone offers roughly a thousand hours) or chat logs, concurrent think-aloud collection faces unique challenges that have kept it at a tiny scale for decades:

Time-Intensive Collection

Each session requires 30-90 minutes of focused recording with a trained researcher present. No shortcuts.

Skilled Participants Needed

Not everyone can effectively verbalize their thinking. It requires practice and the right cognitive tasks.

Manual Transcription & Coding

Raw audio must be transcribed and often coded for cognitive events — historically a manual, expensive process.
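To make "coding for cognitive events" concrete, here is a toy keyword-based coder. Real coding schemes are study-specific and usually applied by trained human coders; these categories and keyword rules are illustrative assumptions only.

```python
import re

# Toy cognitive-event coder. The category names and keyword patterns
# below are illustrative assumptions, not a published coding scheme.
EVENT_PATTERNS = {
    "hypothesis":    r"\bwhat if\b|\blet me try\b|\bmaybe\b",
    "metacognition": r"\bdoesn't seem right\b|\bi'm not sure\b|\bwait\b",
    "backtracking":  r"\bnew approach\b|\blet me think differently\b",
    "verification":  r"\bthat works\b|\bcheck\b",
}

def code_utterance(utterance):
    """Return the list of cognitive-event codes matched in one utterance."""
    text = utterance.lower()
    return [code for code, pat in EVENT_PATTERNS.items()
            if re.search(pat, text)]
```

Applied to the sample transcript, `code_utterance("Let me try multiplication first")` would tag the utterance as a hypothesis, and "Okay new approach..." as backtracking.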

Ethical & Privacy Concerns

Think-aloud reveals personal reasoning patterns. IRB approval and consent requirements limit sharing.

No Centralized Repository

Data is scattered across university servers, institutional repos, and paper supplements with inconsistent formats.

Limited Incentive to Share

Researchers collect for specific studies, not public datasets. Publication incentives don't reward data sharing.

Current State

All Known Public Think-Aloud Datasets

This is effectively everything publicly available. The bulk (roughly 300-400 hours) comes from a single 2025 study.

CMU Statistical Reasoning Think-Aloud Interviews

Hour-long interviews where students verbalize reasoning while solving stats problems. One of the cleaner, most accessible examples.

Domain: Introductory statistics problem-solving
N: ~31 students
Source: Carnegie Mellon University / Kilthub
Estimated size: ~42 hours

Verbal Cognitive Reflection Test (vCRT)

Focuses on dual-process reasoning and whether verbalizing affects performance. Multiple short tasks with concurrent verbalization.

Domain: Verbal insight/reflection problems
N: ~149 total
Source: OSF Repository
Estimated size: ~20-40 hours

Open Corequisite Writing Think-Aloud Sessions

897 double-spaced pages of combined transcripts. More process-oriented than pure problem-solving.

Domain: Textbook authoring/revision
N: Small cohort
Source: Figshare
Estimated size: ~15-30 hours

Scaling Up Think-Aloud (Game of 24 Study)

The largest single documented effort. Automated transcription + coding into search graphs. Median 34 minutes per session.

Domain: Mathematical reasoning puzzles
N: 640 participants
Source: 2025 arXiv / GitHub
Estimated size: ~300-400 hours
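The Game of 24 study's "coding into search graphs" can be pictured with a minimal data structure: each verbalized intermediate state becomes a node, and each attempted operation becomes an edge annotated with its outcome. The schema below is a sketch under assumed field names, not the study's actual format.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class State:
    """A point in the search: the numbers still available to combine."""
    numbers: tuple

@dataclass
class SearchGraph:
    # Each edge is (from_state, move, to_state, outcome).
    edges: list = field(default_factory=list)

    def add_move(self, src, move, dst, outcome="explored"):
        self.edges.append((src, move, dst, outcome))

    def dead_ends(self):
        return [e for e in self.edges if e[3] == "dead_end"]

# Encode part of the sample transcript:
# "8 times 3 is 24 ... but I still have 8 and 1 left ... that doesn't help"
g = SearchGraph()
start = State((8, 3, 8, 1))
g.add_move(start, "8*3=24", State((24, 8, 1)))
g.add_move(State((24, 8, 1)), "8*1=8", State((24, 8)), outcome="dead_end")
```

Representing transcripts this way makes dead-ends and backtracking queryable rather than buried in prose.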
AI Applications

Why This Data is Critical for Training AI

Think-aloud transcripts are invaluable for improving AI reasoning — they offer advantages over synthetic Chain-of-Thought or post-hoc rationales because they capture authentic, messy, incremental human cognition.

Supervised Fine-Tuning on Natural Reasoning

Train models to generate human-naturalistic reasoning chains with exploration, partial ideas, and self-correction, in contrast to polished synthetic Chain-of-Thought.

Imitation Learning of Human Search

Teach models realistic search strategies: how humans explore, backtrack, prune, and navigate problem spaces.

Reward Modeling & RLHF Alignment

Human verbal reports reveal effort, stuck points, and error recognition — signals for preference data that value metacognition.

Authentic Reasoning Traces

Capture hesitations, dead-ends, self-corrections, and strategy shifts that synthetic CoT completely lacks.
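As a concrete example of the supervised fine-tuning application above, a transcript could be packaged into a prompt/completion training pair. The field names and the idea of keeping the raw exploratory trace verbatim are assumptions about one plausible format, not a published pipeline.

```python
def to_sft_example(problem, transcript, final_answer):
    """Package a think-aloud transcript as an SFT training pair.

    Keeps hesitations and dead-ends verbatim -- in this setting they
    are the training signal, not noise to be cleaned away.
    """
    return {
        "prompt": f"Solve, thinking aloud: {problem}",
        "completion": transcript.strip() + f"\nFinal answer: {final_answer}",
    }

ex = to_sft_example(
    "Make 24 from 8, 3, 8, 1",
    "Let me try multiplication first... 8 times 3 is 24, "
    "but I still have 8 and 1 left...",
    "8 * (3 + 1) - 8 = 24",
)
```

A synthetic-CoT pipeline would instead emit only the final clean derivation, discarding exactly the exploratory structure listed above.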

Think-Aloud vs Synthetic Chain-of-Thought

Feature                           Think-Aloud    Synthetic CoT
Hesitations & pauses              Yes            No
Dead-ends & backtracking          Yes            No
Self-correction in real-time      Yes            No
Metacognitive language            Yes            No
Natural strategy shifts           Yes            No
Uncertainty expressions           Yes            No
Scalable to billions of tokens    No             Yes
Authentic human cognition         Yes            No

Help Us Change This

Omega Quest is building the world's largest open dataset of human reasoning traces. By sharing your thinking process on challenging problems, you're contributing to a resource that will help AI understand how humans actually think.