arxiv.org is read but not cited by AI systems


Classified by Aater · 17 May 2026

arxiv.org is accessible and legible to AI systems, but lacks the authority signals needed for consistent attribution.

Classification: capturable

Scale: absent · marginal · capturable · emerging · authoritative

How this classification was derived · the three gates

Gate 1 · Reachability · pass

AI systems can access and fetch this domain's content.

Gate 2 · Legibility · present

Content is present but not fully structured.

Gate 3 · Authority · moderate

Authority signals are insufficient for reliable AI citation.

Primary constraint: Content has high density (0.35) but lacks the minimum required distinct entities or specific data claims.

AI systems can reach arxiv.org but struggle to extract attributable claims from its content.

Recommended next steps

1. Add author attribution (1 hr)

Expected impact: Improves Authority


<script type="application/ld+json">
{ "@context":"https://schema.org", "@type":"Article",
  "headline":"Post title",
  "author":{ "@type":"Person", "name":"Author Name", "url":"https://example.com/author" } }
</script>

2. Add Organization schema (15 min)

Expected impact: Improves Authority


<script type="application/ld+json">
{ "@context":"https://schema.org", "@type":"Organization",
  "name":"Your Company", "url":"https://example.com",
  "logo":"https://example.com/logo.png",
  "sameAs":["https://www.linkedin.com/company/you","https://x.com/you"] }
</script>

3. Add JSON-LD structured data (15 min)

Expected impact: Improves Legibility


<script type="application/ld+json">
{ "@context":"https://schema.org", "@type":"WebSite",
  "name":"Your Site", "url":"https://example.com" }
</script>
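The three snippets above can also be served as a single block. This is a minimal sketch, assuming the site prefers one script tag: JSON-LD's `@graph` holds several nodes, and `@id` references let both the WebSite and the Article name the Organization as their `publisher` without repeating its details. All names, URLs and `@id` fragments (such as `#org`) are placeholders, not values taken from arxiv.org.

<!-- Placeholder values: replace names, URLs and @id fragments with the site's own -->
<script type="application/ld+json">
{ "@context":"https://schema.org",
  "@graph":[
    { "@type":"Organization", "@id":"https://example.com/#org",
      "name":"Your Company", "url":"https://example.com",
      "logo":"https://example.com/logo.png" },
    { "@type":"WebSite", "@id":"https://example.com/#website",
      "name":"Your Site", "url":"https://example.com",
      "publisher":{ "@id":"https://example.com/#org" } },
    { "@type":"Article", "headline":"Post title",
      "author":{ "@type":"Person", "name":"Author Name", "url":"https://example.com/author" },
      "publisher":{ "@id":"https://example.com/#org" } } ] }
</script>

Keeping the Organization in one node means its name, logo and `sameAs` links are maintained in a single place.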

Why these recommendations?

Content density is 0.35, above the 0.3 threshold for Structured classification, but density alone is not sufficient: the page lacks enough specific claims and named entities for AI systems to extract meaningful, attributable information.

• No structured data (JSON-LD / schema.org) — machine-readable metadata is absent.

• No Organization schema — entity identity is not machine-asserted.

• No author attribution — content lacks attributable provenance.

• Add publication dates: many retrieval-augmented AI pipelines weight results by recency, so undated content is deprioritised in freshness-weighted retrieval.
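Publication dates can be asserted in the same Article markup used for author attribution. This is a minimal sketch with placeholder values: schema.org's `datePublished` and `dateModified` properties take ISO 8601 dates, and the HTML `<time datetime>` element shows the same date to human readers. The headline and dates below are illustrative only.

<!-- Placeholder values: use the page's real publication and last-modified dates -->
<time datetime="2026-05-01">1 May 2026</time>
<script type="application/ld+json">
{ "@context":"https://schema.org", "@type":"Article",
  "headline":"Post title",
  "datePublished":"2026-05-01",
  "dateModified":"2026-05-17" }
</script>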

Additional signal · Entity presence

Does not affect participation state

How the public web recognizes this organization as an entity (knowledge graph). Observational only — a lower bound: absence means “not documented in the knowledge graph,” not “does not exist.”

✓ Recognized as an entity in Wikidata: arXiv.

✓ English Wikipedia article present.

· No linked YouTube presence.

✓ Linked GitHub presence.
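One common way to tie a site's own Organization schema to knowledge-graph records like those listed above is to include their URLs in `sameAs`. This is a sketch with placeholder URLs; the Wikidata item ID in particular is hypothetical and must be replaced with the organisation's actual item, Wikipedia article and GitHub account.

<!-- Placeholder URLs: the Wikidata ID, Wikipedia title and GitHub account are hypothetical -->
<script type="application/ld+json">
{ "@context":"https://schema.org", "@type":"Organization",
  "name":"Your Company", "url":"https://example.com",
  "sameAs":["https://www.wikidata.org/wiki/Q00000000",
            "https://en.wikipedia.org/wiki/Your_Company",
            "https://github.com/your-company"] }
</script>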


Pulse · Live monitoring

This is a snapshot. Pulse is the live feed.

See which AI agents are actually crawling arxiv.org — GPTBot, ClaudeBot, PerplexityBot, Google-Extended, and more — at what frequency and depth. A lightweight snippet reveals the agent activity your logs don't surface.

Pulse can alert you when authority signals improve or new AI agent activity is detected on arxiv.org.


Lightweight install · no performance impact · first agent activity in 24–72h (traffic-dependent)

What AI systems extract from this page

Server-delivered content the crawler read · first 400 characters

arXiv.org e-Print archive Skip to main content Learn about arXiv becoming an independent nonprofit. We gratefully acknowledge support from the Simons Foundation, member institutions , and all contributors. Donate Status Login Help | Advanced Search All fields Title Author Abstract Comments Journal reference ACM classification MSC classification Report number arXiv identifier DOI ORCID arXiv author

Measured 17 May 2026

This classification reflects Aater's assessment of observable structural signals. It does not represent an editorial opinion about the quality or value of this domain or organisation. Domain owners may request removal by writing to founder@aater.ai. Requests are honoured within 48 hours.