AGI Explained: What Artificial General Intelligence Actually Means in 2026
Ask ten AI researchers what AGI is and you will get eleven answers, two of them from Yann LeCun. The acronym stands for artificial general intelligence, but there is no agreed-upon technical threshold for it, no benchmark that anyone treats as definitive, and no shared definition between the people building toward it and the people warning about it. What there is, instead, is a steadily growing set of systems that do more of the things humans used to do alone, and a debate about whether the line between those systems and "general" intelligence is a meaningful one or a marketing one.
This guide unpacks the term: where it came from, what current models can and cannot do that bears on it, what the timelines actually look like once you read past the headlines, and why the whole framing might be the wrong question to ask in 2026.
Table of contents
- The definition problem
- Narrow AI vs AGI vs ASI
- Where current models actually are on the spectrum
- The benchmarks that matter (and the ones that do not)
- Timelines: what serious researchers actually say
- What changes if AGI arrives
- The honest case for and against the framing
- Frequently asked questions
- The bottom line
The definition problem
The phrase "artificial general intelligence" was popularised in the early 2000s as a way to distinguish ambitious cognitive-architecture research from the narrow, task-specific machine learning of the time. The original instinct behind it was clear: a system is "general" when its competence does not collapse the moment you take it out of the training distribution.
That instinct is harder to formalise than it looks. Humans are general in the sense that we transfer skills across domains, but we are catastrophically narrow within most of them: most areas of mathematics, most physical sports we have not personally trained for, most languages we do not speak. If "general" means "competent at everything a human can do", almost no human qualifies.
OpenAI's working definition, set out in its 2018 charter and reaffirmed in its superalignment-era writing, is "highly autonomous systems that outperform humans at most economically valuable work". DeepMind's 2023 framework breaks the question into levels (Emerging, Competent, Expert, Virtuoso, Superhuman) crossed with a generality axis (Narrow vs General). Anthropic has been more cautious, treating AGI as a useful direction but not a meaningful destination.
The honest summary is that AGI is a research aspiration with at least four serious technical definitions and a thousand journalistic ones, and the differences between those definitions matter enormously when you read claims about progress.
Narrow AI vs AGI vs ASI
The standard three-rung ladder runs Narrow -> General -> Super. Narrow AI does one thing well; until very recently, everything shipped in production fit here. AGI is the system that can pick up a new task at human level without being retrained from scratch. ASI, artificial superintelligence, is the speculative tier where a system outperforms the best humans at essentially all cognitive work, including the work of building better systems.
Two things often confused with each other:
- Multi-task is not general. A model trained on text, images, audio and video is multi-modal, not general. It still fails predictably outside its training distribution.
- Human-level on a benchmark is not human-level in the world. A model that matches a doctor on a written exam may still be unable to take a patient history in a noisy clinic. The benchmark is a sample of what the doctor does, not the job.
For a deeper map of the standard taxonomy, see our guide to the types of AI, which covers the older "reactive / limited-memory / theory-of-mind / self-aware" framing alongside the modern Narrow/General/Super version.
Where current models actually are on the spectrum
The frontier models of 2025-2026 (GPT-5, Claude 4 Opus, Gemini 2.5 Pro, and the open-weights leaders that trail by twelve to eighteen months) sit somewhere strange. A single model handles legal drafting, code generation, image analysis, scientific summarisation, customer support and conversational tutoring without specialised retraining. By the 2015 definition of "narrow", they are not narrow.
They are not general in the strong sense either. They still fail in characteristic ways: long-horizon planning across many steps, self-correction when an early step is wrong, learning permanently from a single example, and, most consistently, knowing what they do not know. Hallucination rates have fallen, but the models still confidently produce wrong answers in domains where their coverage is shallow.
The most useful framing is DeepMind's 2023 levels paper: in 2026, frontier models are "Competent" (at least 50th percentile of skilled adults) at general tasks, with "Expert" (at least 90th percentile) performance in specific subdomains where the training data is dense, like Python coding, common-law contract review, or undergraduate physics problems. They are not yet "Virtuoso" generally.
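If it helps to see that taxonomy as a lookup table rather than prose, here is a minimal Python sketch of the levels axis. The percentile cutoffs follow the DeepMind paper; the example placements at the end are this article's reading of 2026 systems, not DeepMind's.

```python
# Minimal sketch of the levels axis from DeepMind's 2023 "Levels of AGI"
# paper. Cutoffs follow the paper; the placements below are illustrative.

LEVELS = [
    ("Emerging",   0),    # on par with or somewhat above an unskilled human
    ("Competent",  50),   # at least 50th percentile of skilled adults
    ("Expert",     90),   # at least 90th percentile
    ("Virtuoso",   99),   # at least 99th percentile
    ("Superhuman", 100),  # outperforms all humans
]

def level_for(percentile: float) -> str:
    """Return the highest level whose cutoff this percentile clears."""
    name = LEVELS[0][0]
    for label, cutoff in LEVELS:
        if percentile >= cutoff:
            name = label
    return name

# Hypothetical placements for a 2026 frontier model:
print(level_for(55))  # general tasks          -> Competent
print(level_for(92))  # data-dense subdomains  -> Expert
```

The paper crosses this axis with a Narrow/General one, which is the part marketing copy usually drops: Superhuman-Narrow systems (chess engines since the late 1990s) are old news, while Competent-General is the contested frontier.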
The benchmarks that matter (and the ones that do not)
If you have to evaluate a single claim about AGI progress, evaluate the benchmark behind it. Most popular benchmarks have been saturated and no longer mean what they once did.
| Benchmark | What it tests | 2026 status |
|---|---|---|
| MMLU | Multi-domain knowledge questions | Saturated above 90% by frontier models; no longer informative |
| HumanEval | Python coding from docstring | Saturated; superseded by SWE-bench Verified |
| SWE-bench Verified | Real GitHub issues, end-to-end fix | Frontier ~70% in 2025, the meaningful coding benchmark |
| ARC-AGI-2 | Visual reasoning puzzles unseen in training | Frontier still well below human; the most cited "no AGI yet" benchmark |
| GPQA Diamond | Graduate-level science questions | Frontier reaches PhD median; informative for expert-level claims |
| FrontierMath | Research-grade mathematics | Frontier solves a small fraction; the current ceiling test |
Two patterns are worth taking from this. First, every benchmark gets saturated faster than its authors expected. ARC-AGI was meant to be a long-standing wall and has already been chipped. Second, the benchmarks that are hardest to game (open-ended scientific reasoning, novel puzzle structures) remain the ones models do worst on. Whether those are pointing at "general intelligence" or just "the things we have not yet figured out how to train for" is the open question.
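To make "saturated faster than expected" concrete, here is a toy extrapolation in Python. The score points are hypothetical placeholders, not real leaderboard data; the method is the point: fit a straight line to recent frontier scores and solve for when it crosses a saturation ceiling.

```python
# Toy benchmark-saturation estimate. The (year, score) points are
# HYPOTHETICAL placeholders, not real leaderboard numbers.
points = [(2023.0, 0.20), (2024.0, 0.42), (2025.0, 0.68)]

# Ordinary least-squares fit by hand, so there are no dependencies.
n = len(points)
mean_x = sum(x for x, _ in points) / n
mean_y = sum(y for _, y in points) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in points)
         / sum((x - mean_x) ** 2 for x, _ in points))
intercept = mean_y - slope * mean_x

CEILING = 0.90  # treat 90% as "saturated; no longer informative"
year_saturated = (CEILING - intercept) / slope
print(f"Naive linear estimate: saturates around {year_saturated:.1f}")
```

Real capability curves are rarely linear, which is part of why benchmark authors keep underestimating saturation dates; treat this kind of extrapolation as a sanity check on a claim, never as a forecast.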
Timelines: what serious researchers actually say
Three sources cover most of the credible timeline distribution. Metaculus, the forecasting platform, has a long-running AGI question whose median dropped from "around 2050" in 2020 to roughly "early 2030s" by mid-2025. Surveys of AI researchers conducted by AI Impacts in 2022 and 2023 saw the aggregate "50% probability of high-level machine intelligence" date pull forward from 2061 to 2047 in a single year. The big lab leaders have their own range: Anthropic's CEO has publicly speculated about "powerful AI" in the 2026-2027 window, Demis Hassabis has spoken about a five-to-ten-year horizon, and Yann LeCun has consistently argued the framing is incoherent.
Three things to keep in mind when reading any timeline:
- The question being forecast varies wildly. "AGI", "transformative AI", "human-level machine intelligence", and "AI that can do most economically valuable work remotely" are not the same target.
- Researchers building the systems consistently shorten their timelines as compute scales, and consistently lengthen them after a hard problem (like reliable agentic behaviour) takes longer than expected to crack.
- Forecasts have been wrong in both directions. Self-driving cars in 2018 were forecast to be widespread by 2022; conversational AI of GPT-4 quality was forecast for 2030 as late as 2020.
For the policy-side reading on what serious institutions are doing about these timelines, see our guide to AI ethics, governance and best practices, which covers the EU AI Act and NIST framework in depth.
What changes if AGI arrives
The interesting question is not "when AGI" but "which threshold matters for what". Different definitions of AGI imply very different consequences.
Economic threshold. If a system can do most knowledge work remotely at human cost, the labour market changes shape, even if the system never has a "general" cognitive profile. Most of the 2024-2025 worry about AI and jobs sits at this threshold and does not require AGI in any deep sense.
Research threshold. If a system can do original research at the level of a competent PhD, the pace of science changes, including the pace of AI research itself. This is the threshold most AI labs are explicitly aiming at, because a system that improves the next system compresses every subsequent timeline.
Strategic threshold. If a system can plan and act in long-horizon goal-directed ways more capably than any human, alignment becomes a forcing problem rather than an academic one. The Anthropic and OpenAI safety teams orient their work around this threshold, not the others.
The three thresholds are not the same and need not arrive in the same order; which comes first depends on which pieces of the problem fall first. The 2024-2026 evidence suggests the economic threshold is being approached one knowledge-work category at a time, the research threshold is creeping closer in narrow scientific subdomains, and the strategic threshold remains the furthest off.
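The research threshold's "compresses every subsequent timeline" claim can be made precise with a toy model. Assume, purely for illustration, that each generation speeds up the research that builds the next generation by a constant factor r. The time between generations then shrinks geometrically, and the total calendar time to any number of generations converges to a finite limit:

```python
# Toy model of recursive research speed-up. All numbers are illustrative
# assumptions, not measurements or forecasts.
t0 = 2.0  # assumed years to build the first improved generation
r = 1.5   # assumed speed-up per generation (the model needs r > 1)

total = sum(t0 / r**k for k in range(10))  # first ten generations
limit = t0 * r / (r - 1)                   # geometric series limit
print(f"Ten generations: {total:.2f} years; limit: {limit:.1f} years")
```

Whether anything like a constant r > 1 holds in practice is exactly the long-horizon-capability question current models fail at; the sketch shows why the strategic threshold worries safety teams, not that the recursion will happen.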
The honest case for and against the framing
The case for taking AGI seriously as a frame. Capability scaling has been remarkably smooth since 2018. Each generation of large language models, trained with several times the compute and parameters of the last, has produced predictable performance gains across heterogeneous tasks. If that scaling continues, and if the kinds of failures models still make turn out to be fixable by post-training techniques (RLHF, agentic scaffolding, retrieval), then "general" capability arrives by extrapolation rather than by any new science. Several lab leaders take this view explicitly.
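"Arrives by extrapolation" has a specific mathematical shape. The canonical form from the 2020 scaling-law literature is a power law in parameters, data or compute; the constants below are in the spirit of Kaplan et al.'s reported parameter-scaling fit, but treat them as a sketch of the shape, not a fit to any current model.

```python
# Power-law scaling sketch: loss falls by a constant factor per doubling.
# Constants are illustrative, roughly in the spirit of Kaplan et al. (2020).
ALPHA = 0.076   # parameter-scaling exponent of that rough magnitude
N_C = 8.8e13    # normalising constant; illustrative

def loss(n_params: float) -> float:
    return (N_C / n_params) ** ALPHA

for n in [1e9, 2e9, 4e9, 8e9]:
    print(f"{n:.0e} params -> loss {loss(n):.3f}")

# Each doubling multiplies loss by 2**-ALPHA, about 0.949: smooth and
# predictable, but a statement about training loss, not about robustness.
```

The last comment is the hinge of the whole debate: the power law describes loss on the training distribution, which is exactly where the case against picks up.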
The case against. Scaling laws describe loss on training distributions; they do not describe robustness, reliability, novelty, or the harder cognitive tasks. The remaining gaps are real and may not be the kind of gaps that scale closes. Worse, "AGI" works as a moving target: every benchmark frontier models pass becomes "well, that wasn't really general intelligence", which is exactly the failure mode the term was supposed to avoid. Yann LeCun, Melanie Mitchell, and Gary Marcus have made versions of this argument in book-length form.
There is a third position, increasingly common, that the AGI framing matters less than the practical question of whether systems are getting reliable, controllable and economically integrated. From this view, a model that handles half of all knowledge work reliably is more transformative than a system labelled "AGI" that cannot be trusted in production. The 2026 reality is closer to the first scenario than the second.
Frequently asked questions
Has AGI already been achieved?
No, by any of the four major working definitions. Frontier models in 2026 are competent generalists in a way nothing was in 2020, but they fail at long-horizon planning, novel reasoning outside their training distribution, and reliable self-correction. By DeepMind's 2023 levels framework they are around "Competent" general performance, well below "Virtuoso". By OpenAI's "outperforms humans at most economically valuable work" definition, they outperform on a growing minority of tasks, not most.
What is the difference between AGI and superintelligence (ASI)?
AGI is human-level competence across most tasks; ASI is superhuman competence across most tasks. The interesting claim from researchers who take ASI seriously is the recursive part: if an AGI can improve itself, the gap between AGI and ASI may be small in calendar time. Whether that recursion actually works, or runs into the same long-horizon planning limits current models hit, is one of the biggest unknowns in the field.
How close are GPT-5 and Claude 4 to AGI?
Closer than anything before them on most benchmarks, but still failing the diagnostic tests that the term was originally meant to flag. They pass MMLU, GPQA Diamond and SWE-bench Verified at expert level. They struggle on ARC-AGI-2 and FrontierMath, the benchmarks specifically designed to require generality and novelty. Treat any "GPT-5 is AGI" claim as marketing; treat any "they will never be AGI" claim as overconfident in the other direction.
Is AGI a useful concept or just hype?
It is a real research target with at least four serious technical definitions. It is also the most heavily marketed term in AI, used both to raise capital and to amplify warnings of varying credibility. The way to keep yourself honest is to refuse to discuss "AGI" without nailing down which definition you mean, and to focus on benchmark trends and economic deployment rather than headline claims.
Will AGI take my job?
The economic threshold of AGI matters more for jobs than the cognitive one. By 2026 the visible pattern is task-level automation rather than role-level: knowledge workers spend less time on the parts of their job a model handles and more time on the parts it does not. Roles that are bundles of tasks where every task is automatable shrink first; roles that are bundles where many tasks involve physical presence, regulated trust, or accountability shrink last. Our AI careers hub covers this in more depth.
Who decides when AGI has arrived?
Nobody. There is no governing body, no benchmark with the authority of an Olympic record, and no agreed test. The most likely pattern is gradual, with different sub-communities (researchers, economists, policymakers) declaring AGI at different points and for different reasons. Expect a long period where the question "is this AGI?" is contested in good faith.
The bottom line
Treat AGI as a direction, not a destination. The useful questions in 2026 are narrower and better posed: which tasks do current systems do reliably, where do they fail predictably, how fast is the failure boundary moving, and what does it cost to integrate them into your particular workflow. Those questions have testable answers; "is it AGI yet" mostly does not. Spend your attention on benchmark trends like SWE-bench Verified and ARC-AGI-2, on real economic deployment patterns, and on the three concrete thresholds (economic, research, strategic) that change different things on different timelines. The AGI label will get applied eventually, by someone, in a press release. By the time it does, the underlying systems will already have been changing your work, your industry, and the economic landscape for years.
Last updated: May 2026
