How Perplexity ranks sources
The documented surface and the structural levers, without overclaim.
Perplexity publicly documents two agents: PerplexityBot, its autonomous crawler, which honours robots.txt, and Perplexity-User, its user-triggered fetcher, which Perplexity states generally does not treat robots.txt as binding because the fetch is user-initiated. Perplexity also publishes general source-quality guidance, but not a complete ranking algorithm. This page describes what is documented, separates it from inference, and lists the structural signals operators can verifiably move.
By Martin Yarnold

What Perplexity publishes vs what is inferred
Documented: the two agents (PerplexityBot and Perplexity-User), their robots.txt behaviour, and general source-quality guidance in the developer docs.
Inferred: everything about how the displayed sources are ranked per query, including freshness weight and backlink influence; treat any specific claim here as inference, not vendor contract.
Frequently asked questions
Does Perplexity publish its source-ranking algorithm?
No. Perplexity publicly documents PerplexityBot, its autonomous crawler, which honours robots.txt, and Perplexity-User, its user-triggered fetcher, which Perplexity states generally does not treat robots.txt as binding because the fetch is user-initiated. Perplexity also provides general guidance about source quality in its developer docs. It does not publish a complete ranking algorithm. Specific claims that "Perplexity prefers X over Y" beyond the documented bot-level behaviour are inference, not vendor contract.
How does Perplexity find sources for each answer?
Operationally, Perplexity performs per-query retrieval at answer time. The retrieval surface combines crawled content (indexed by PerplexityBot, which honours robots.txt) with live web fetches (made by Perplexity-User, which Perplexity states may not honour robots.txt because the request is user-initiated). The actual ranking that selects the 3–8 sources displayed in an answer is not publicly documented; treat it as a closed retrieval system that operators influence through reachability and content quality, not direct tuning. A quick reachability check is sketched below.
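Reachability is the one retrieval precondition operators can test directly. A minimal sketch, assuming the requests package; the User-Agent tokens here are illustrative placeholders, not the exact strings Perplexity sends, which are listed in its bot documentation.

```python
import requests

# Illustrative User-Agent tokens; copy the exact strings from
# Perplexity's published bot documentation before relying on this.
AGENTS = {
    "PerplexityBot": "PerplexityBot/1.0",
    "Perplexity-User": "Perplexity-User/1.0",
}

def check_reachability(url: str) -> None:
    """Fetch the page once per agent and report status and latency."""
    for name, user_agent in AGENTS.items():
        resp = requests.get(url, headers={"User-Agent": user_agent}, timeout=10)
        ms = resp.elapsed.total_seconds() * 1000
        print(f"{name}: HTTP {resp.status_code} in {ms:.0f} ms")

check_reachability("https://example.com/your-page")  # placeholder URL
```

A 200 response well under 300 ms for both agents rules out the most common blockers: firewall rules that drop bot user agents, and slow origins.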
Why does Perplexity show so many sources per answer?
It is a product design decision. Perplexity's answer UI is built around citation visibility — every answer carries numbered source links, typically 3–8 per response. This contrasts with engines where citation visibility is conditional on retrieval mode. The high density makes Perplexity an easier engine to verify against ("did I appear in the source list?") but does not necessarily mean each citation drives proportional click-through.
Which structural signals can operators verifiably influence?
The same signals that move every retrieval-based engine: PerplexityBot allowed in robots.txt (the autonomous indexer obeys allow/disallow rules); Perplexity-User allowed too, as harmless hygiene (Perplexity states it generally treats those fetches as user-initiated and may not honour robots.txt for them); fast server responses (under 300 ms); valid JSON-LD on each page; complete metadata; substantive content; and clean internal linking. None of these guarantees citation; collectively they remove the common blockers. The robots.txt and JSON-LD sketches below show the first of these signals in file form. Specific ranking signals beyond these, such as content freshness weight or backlink influence, are not publicly documented by Perplexity.
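For concreteness, a minimal robots.txt sketch that allows both documented agents. The user-agent tokens match Perplexity's published bot names, but check the current documentation before deploying, since token strings can change.

```
User-agent: PerplexityBot
Allow: /

User-agent: Perplexity-User
Allow: /
```

And a minimal JSON-LD sketch for the "valid JSON-LD on each page" signal, using schema.org's Article type. Every value is an illustrative placeholder; this is not a claim about which fields Perplexity reads.

```json
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "How Perplexity ranks sources",
  "author": { "@type": "Person", "name": "Martin Yarnold" },
  "description": "Placeholder description for illustration."
}
```

Run the markup through a JSON-LD validator (for example, validator.schema.org) before shipping; malformed JSON-LD is worse than none.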
Does Perplexity Pro cite differently than free?
Citation visibility appears similar — both modes display numbered sources. Pro accesses different underlying models and "Focus" modes that change retrieval depth, but the citation-display contract is consistent. Perplexity does not publish a documented difference in ranking by tier. For citation tracking, treat free-tier observations as representative of the basic ranking; Pro-tier behaviour may differ on edge cases but not in the citation-display contract.
How do I measure Perplexity citations against my queries?
Perplexity is the easiest engine to measure against because citations are always visible. Pick a fixed query set (10–30 questions), run them in Perplexity weekly, and record whether your domain appears in the numbered source list for each query. Promagen Sentinel automates this; the manual version is a structured spreadsheet plus a weekly habit, sketched below. The output is a citation-rate-per-query time series.
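A minimal sketch of that manual workflow, assuming you transcribe the numbered source domains by hand after each weekly run. The file name, tracked domain, and query set are placeholders, and this is not Promagen Sentinel's implementation.

```python
import csv
from datetime import date
from pathlib import Path

LOG = Path("perplexity_citations.csv")  # placeholder log file
MY_DOMAIN = "example.com"               # the domain you are tracking

# Domains observed in the numbered source list for each fixed query,
# transcribed from the Perplexity UI after this week's run.
observations = {
    "how does X work?": ["example.com", "docs.vendor.com"],
    "best tool for Y?": ["competitor.io"],
}

def log_week(observations: dict[str, list[str]]) -> float:
    """Append one row per query and return this week's citation rate."""
    new_file = not LOG.exists()
    with LOG.open("a", newline="") as f:
        writer = csv.writer(f)
        if new_file:
            writer.writerow(["date", "query", "cited"])
        hits = 0
        for query, domains in observations.items():
            cited = MY_DOMAIN in domains
            hits += cited
            writer.writerow([date.today().isoformat(), query, int(cited)])
    return hits / len(observations)

print(f"citation rate this week: {log_week(observations):.0%}")
```

Group the `cited` column by week and you have the citation-rate-per-query time series the answer describes.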