
Claude vs GPT for summarising sources

Two capable summarisers; very different document strengths.

Both Claude and GPT summarise sources well in 2026 — Anthropic and OpenAI document the major capabilities, including context-window sizes and retrieval-mode attribution patterns. This page compares documented behaviour and observable patterns. Model output shifts between releases, so treat operational signals as best-effort as of the page's update date, not guarantees.

By Martin Yarnold · Updated
Both engines, weekly
Sentinel measures observable summarisation and citation behaviour across Claude and GPT (plus Perplexity and Gemini) on a fixed query set every Monday.
See how Sentinel measures it →

Six dimensions

Context windows are vendor-documented; attribution patterns describe observable behaviour and may shift between model versions. Verify against current vendor docs before quoting in commercial materials.

Dimension | Claude | GPT
Base context window | 200K tokens (documented) | 128K tokens (GPT-4-class, documented)
Attribution in retrieval mode | Inline citations when invoked with document context or web access | Per-answer source attribution in Search mode
Attribution outside retrieval | Inconsistent; not documented as a contract | Inconsistent; not documented as a contract
Best for long documents | Strong: extended context retains detail | Capable; may chunk very long sources
Best for short web summaries | Capable, especially in Claude.ai web mode | Strong: Search mode aggregates cleanly
Vendor docs | docs.claude.com, support.claude.com | platform.openai.com/docs

Frequently asked questions

Which is better at summarising sources accurately?

Both are capable; the right answer depends on the source. Claude's published behaviour emphasises faithfulness to provided context, particularly for long documents — Anthropic documents Claude's extended context window (200K tokens at base, with longer windows available) as a deliberate design choice for document-grounded use. GPT (OpenAI) handles shorter sources well and integrates retrieval cleanly in Search mode. For a 50-page PDF, Claude tends to retain detail better; for a quick web-search summary with multiple short sources, GPT in Search mode produces tighter aggregations. Both models change with each release — test against your specific source mix.

Which model attributes sources more consistently?

Both attribute when invoked with retrieval or document context; neither attributes consistently outside those modes. Claude in retrieval mode (when given documents or web access) typically inserts inline citations. GPT in Search mode shows source attribution per retrieval-augmented answer. Neither vendor publishes "attribution rate" as a documented contract, so treat any specific percentage as observed behaviour, not guaranteed. The reliable signal: if you want consistent attribution, invoke the model with explicit retrieval context.
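The "explicit retrieval context" move can be sketched as a prompt-assembly step: give each source a stable ID the model can cite, and ask for inline citations up front. This is an illustrative pattern, not a vendor API; the function name, tag format, and citation instruction are our assumptions.

```python
# Minimal sketch: wrap source documents in a prompt that asks for
# inline citations. Names and tag format are illustrative only,
# not an Anthropic or OpenAI API.

def build_grounded_prompt(question: str, sources: dict[str, str]) -> str:
    """Assemble a retrieval-style prompt: each source gets a stable ID
    the model can cite, followed by an explicit citation instruction."""
    blocks = []
    for source_id, text in sources.items():
        blocks.append(f'<source id="{source_id}">\n{text}\n</source>')
    return (
        "Summarise the sources below to answer the question.\n"
        "Cite every claim inline as [source_id].\n\n"
        + "\n\n".join(blocks)
        + f"\n\nQuestion: {question}"
    )

prompt = build_grounded_prompt(
    "What did Q3 revenue do?",
    {"report-q3": "Q3 revenue rose 12% year on year."},
)
```

Sending this assembled string as the user message puts the model in the documented retrieval-context regime, where both vendors' attribution behaviour is most consistent.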

How does the context-window difference affect summarisation?

Claude's 200K-token base context (with longer windows on specific tiers) and GPT-4-class models' 128K-token context affect how much source material the model can hold at once. For large documents — long reports, multi-document research, large codebases — Claude can ingest more in a single pass without chunking. For typical web-search summaries (3–5 short articles), the difference disappears because both models comfortably fit the sources. Pick by source size: large = Claude advantage; small = parity.
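The fit-or-chunk decision above can be sketched with a rough token estimate. The 4-characters-per-token ratio is a coarse English-text heuristic, not a vendor figure; real counts need the vendor's own tokeniser.

```python
# Rough sketch: decide whether a source fits a model's context window
# in one pass. CHARS_PER_TOKEN is a coarse heuristic, not a tokeniser.

CONTEXT_WINDOWS = {"claude": 200_000, "gpt4-class": 128_000}  # documented base sizes
CHARS_PER_TOKEN = 4  # common approximation for English text

def fits_in_one_pass(text: str, model: str, reserve_tokens: int = 4_000) -> bool:
    """True if the estimated token count leaves room for the reply."""
    estimated_tokens = len(text) / CHARS_PER_TOKEN
    return estimated_tokens + reserve_tokens <= CONTEXT_WINDOWS[model]

# A ~50-page report (~150K chars, ~37.5K tokens) fits either window;
# a ~600K-char corpus (~150K tokens) fits the 200K window but not 128K.
```

This is why the difference only bites on large sources: typical web-summary inputs sit far below either threshold.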

Which model is safer against paraphrase-without-attribution?

Paraphrase without attribution is a known failure mode for both. Neither vendor guarantees attribution in free-form output. The structural mitigation that works for both: optimise your source page so the engine can distinguish the content from everywhere else (entity clarity in schema, distinctive phrasing, author byline, provenance hash). Strong source identity reduces the chance of attribution-less use because the engine has a stable entity it can name. This is a content-side defence; vendor-side guarantees are not on offer.
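One of the identity signals listed above, the provenance hash, can be sketched as a short digest of the page's canonical text, in the spirit of the "provenance: sha256:…" footer on this page. The canonicalisation step (collapse whitespace, strip edges) is our assumption, not a published standard.

```python
# Sketch: derive a short provenance hash for a page's canonical text.
# The whitespace canonicalisation is an assumption, not a spec.
import hashlib
import re

def provenance_hash(text: str, digest_chars: int = 16) -> str:
    """Collapse whitespace, then return a truncated sha256 digest."""
    canonical = re.sub(r"\s+", " ", text).strip()
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"sha256:{digest[:digest_chars]}"
```

Because the hash survives whitespace-only reflows, the same page serves the same identifier whether it is fetched as HTML, text, or a feed.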

Which model is more likely to cite specific claims rather than whole sources?

Claude in retrieval mode tends to inline-cite specific claims when invoked with the right system prompt or in the Claude.ai web app with web access. GPT in Search mode produces source-level attribution more often than claim-level. Neither is documented as a contract — both behaviours change with model version. The practical move: structure your content with claim-level anchors (stable FAQ IDs, headings with predictable slugs) so that whichever model cites you, the citation can deep-link to the specific claim.
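The "predictable slugs" idea above can be sketched as a deterministic heading-to-anchor function, so a claim-level citation can deep-link to the exact question. The slug rules here (lowercase, strip punctuation, hyphenate) are one common web convention, not something either vendor requires.

```python
# Sketch: generate stable, predictable anchor slugs for FAQ headings.
# The lowercase/strip/hyphenate rules are one common convention only.
import re

def heading_slug(heading: str) -> str:
    slug = heading.lower()
    slug = re.sub(r"[^a-z0-9\s-]", "", slug)        # drop punctuation
    slug = re.sub(r"[\s-]+", "-", slug).strip("-")  # hyphenate word gaps
    return slug

# heading_slug("Which model attributes sources more consistently?")
# → "which-model-attributes-sources-more-consistently"
```

Emitting these as `id` attributes on FAQ headings gives any citing engine a stable fragment URL per claim.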

Where can I read the vendors' own docs to verify this?

Anthropic publishes Claude's capabilities and crawler behaviour at docs.claude.com and support.claude.com. OpenAI publishes bot information and model capabilities at platform.openai.com/docs. Both vendors' release notes describe per-version changes — model behaviour shifts between releases, so a comparison written today is stale by the next major version. Always verify against the current docs before quoting in commercial materials.

Get a free Sentinel snapshot →

Context-window sizes reference vendor-published model documentation. Summarisation, attribution, and citation patterns describe observable model behaviour as of 10 May 2026; model output shifts between versions and is not vendor-guaranteed. Claude is a trademark of Anthropic; GPT is a trademark of OpenAI. Promagen Ltd is independent of these companies.

provenance: sha256:d63fa9492aca49c3