The Topaz et al. 2026 Lancet correspondence reports exactly these four facts, and each one is confirmed by verbatim text in the paper itself.
What Was Claimed?
The claim summarizes the headline findings of a recent study about citation integrity in biomedical research. A team led by Maxim Topaz at Columbia University built an automated pipeline to scan a very large slice of the published medical literature, looking for references that pointed to studies which — once you actually checked — did not appear to exist anywhere. The claim names the corpus size, the time period, the number of fabricated references found, and the number of distinct papers that contained them.
Why this matters: scientific papers cite each other to build chains of evidence. If a paper cites a study that does not exist, then any reader, peer reviewer, or clinical guideline writer relying on that citation is leaning on something fictional. The numbers in this claim describe the scale of that problem in 2023–2026 — a period that roughly overlaps with the rapid adoption of large language models, which are known to invent plausible-looking but fictitious references.
What Did We Find?
The source paper does state all four figures, with very close textual matches.
The methods section says the team scanned "PubMed Central's Open Access subset from Jan 1, 2023, to Feb 18, 2026: 2 471 758 papers and 125 615 773 structured references." That is the corpus and the time window. The paper's own title rounds the corpus to "2·5 million biomedical papers," matching the claim's "2.5 million." The time window is summarized in the claim as "January 2023 through February 2026," which is a faithful month-level description of the exact endpoints.
The results section then says: "Among 97·1 million verified references, we identified 4046 fabricated references across 2810 papers." That is both of the remaining figures in the claim, in a single sentence. The paper's limitations section also independently restates the affected-paper count: "Of the 2810 affected papers, 98·4% had received no publisher action at the time of our audit."
A relevant nuance from the supplementary appendix is worth surfacing. The main paper uses the word "fabricated" as if these references definitely point to nothing. The supplement is more careful: it calls them "suspected fabricated references" and reports that the pipeline had 91% precision on a masked validation of 500 entries by three independent reviewers. So while the paper headlines 4,046, the more rigorous reading is that roughly 91% of those (4,046 × 0.91 ≈ 3,682) are very likely genuine fabrications, and the remaining 364 or so are likely false positives. That does not change what the paper reports, which is what the claim is about. But it is worth knowing if you are quoting the figure yourself.
The paper was published only eleven days before this proof was run, so there has been essentially no time for retractions or corrections to appear. No such notices are attached to the version on file.
What Should You Keep In Mind?
This proof verifies what the paper says, not whether the world is exactly as the paper says. The pipeline's 91% precision means the 4,046 headline figure is the pipeline's count, not a ground-truth count of confirmed nonexistent studies — the rigorous count is somewhat lower. The pipeline also cannot estimate recall, so the true total of fabricated references in this corpus may be higher than 4,046 if any slipped past the filters. The paper excluded 23% of references that lacked a PubMed identifier or DOI; fabricated references could be more or less common in that excluded slice. And PubMed Central's Open Access subset is not the full biomedical literature — the paper acknowledges this. None of these caveats affects the claim, but they matter for anyone planning to use the 4,046 figure.
The paper itself is a single Lancet correspondence, not an independently replicated finding. As replication and follow-up work appear over time, the figures may need updating.
How Was This Verified?
The proof located each numeric figure in a verbatim quote from the source PDF, then used the proof-engine's citation verifier to confirm that each quote actually appears in the paper. Four sub-claims, all confirmed. See the structured proof report for the evidence summary, the full verification audit for the raw computation traces and source credibility assessment, or re-run the proof yourself.
What could challenge this verdict?
Four adversarial questions were investigated; none break the proof.
The paper's headline term is "fabricated references," but the supplementary appendix uses the more cautious "suspected fabricated references" and reports a pipeline precision of 91% (Fleiss' κ = 0.71, from a 500-entry masked validation by three independent reviewers). Strictly, this means the true number of references-to-nothing is approximately 4046 × 0.91 ≈ 3,682 — not exactly 4,046. The claim under verification, however, is about what the paper reports, and the paper reports 4,046 as the headline figure without precision adjustment.
The paper was published as Lancet correspondence on May 9, 2026; today's date is May 20, 2026. Eleven days post-publication leaves essentially no window for a formal retraction or correction to appear, and none is attached to the uploaded PDF.
The claim phrases the date range as "January 2023 through February 2026," while the paper specifies "Jan 1, 2023, to Feb 18, 2026." The month-level phrasing is a common and accurate way to summarize an interval ending Feb 18 — it does not imply analysis through Feb 28.
The "2.5 million" figure matches the paper's own headline rounding of 2,471,758 (which is 2.47M, rounding to 2.5M at two significant figures).
Source: proof.py JSON summary — adversarial_checks
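The two arithmetic points raised in the adversarial checks above (the precision adjustment and the "2.5 million" rounding) can be reproduced in a few lines. This is only an illustrative sketch: the function and variable names are made up for this example and are not part of proof.py, and the inputs are the figures quoted from the paper above.

```python
# Figures quoted from the paper and its supplementary appendix (see above).
HEADLINE_FABRICATED = 4046   # "4046 fabricated references"
PIPELINE_PRECISION = 0.91    # precision from the masked validation
PAPERS_SCANNED = 2_471_758   # corpus size from the methods sentence


def precision_adjusted(count: int, precision: float) -> int:
    """Expected number of true positives among `count` flagged items."""
    return round(count * precision)


def in_millions(n: int, decimals: int = 1) -> float:
    """Express n in millions, rounded to `decimals` decimal places."""
    return round(n / 1_000_000, decimals)


adjusted = precision_adjusted(HEADLINE_FABRICATED, PIPELINE_PRECISION)
likely_false_positives = HEADLINE_FABRICATED - adjusted
corpus_millions = in_millions(PAPERS_SCANNED)

print(adjusted, likely_false_positives, corpus_millions)
```

The adjusted figure is an expectation under the reported precision, not a count of individually confirmed fabrications.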
Proof Logic
The claim is a four-part compound assertion about what a single, named source — the Topaz et al. 2026 correspondence in The Lancet — reports. Because the claim is about what the paper says (not about the ground-truth fabrication rate in PubMed Central), the appropriate evidence is verbatim text from that paper.
SC1 — Corpus size (≈2.5 million papers in PubMed Central). The paper's methods sentence (B1) states the system scanned "2 471 758 papers and 125 615 773 structured references" from PubMed Central's Open Access subset. The paper's own title rounds this to "2·5 million biomedical papers," matching the claim's "2.5 million" figure.
SC2 — Time window (January 2023 through February 2026). The same methods sentence (B2) specifies "Jan 1, 2023, to Feb 18, 2026." The claim's month-resolution phrasing "January 2023 through February 2026" is a faithful summary of this interval; both endpoints fall within the named months, with the early-2026 quarter explicitly flagged in the paper as an incomplete (7-week) period.
SC3 — 4,046 fabricated references. The paper's results sentence (B3) states verbatim: "Among 97·1 million verified references, we identified 4046 fabricated references across 2810 papers." The paper defines "fabricated references" as "references whose claimed titles correspond to no existing publication" — semantically equivalent to the claim's "references pointing to studies that do not exist."
SC4 — 2,810 affected papers. The same results sentence (B3) covers this count, and the paper's limitations section provides an independent in-paper restatement (B4): "Of the 2810 affected papers, 98·4% had received no publisher action at the time of our audit." Two distinct sentences in the same article report the same figure, providing internal corroboration.
Each verbatim quote was located in a pypdf-extracted text snapshot of the uploaded source PDF and confirmed with the proof-engine's citation verifier. All four sub-claim evaluations return True, so the compound claim_holds = n_holding == n_total = 4 == 4 = True.
Source: author analysis
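The AND-compound logic described above can be sketched independently of the proof-engine modules. The helper below is a hypothetical stand-in written for illustration; the actual proof uses the engine's own compare() function, whose internals are not shown here.

```python
from typing import Dict


def evaluate_compound(verified_quotes: Dict[str, int], threshold: int = 1) -> bool:
    """Return True only if every sub-claim has at least `threshold` verified quotes."""
    holds = {sc: n >= threshold for sc, n in verified_quotes.items()}
    n_holding = sum(holds.values())
    # AND semantics: the number of holding sub-claims must equal the total.
    return n_holding == len(holds)


# One verified in-source quote per sub-claim, as in this proof.
counts = {"SC1": 1, "SC2": 1, "SC3": 1, "SC4": 1}
claim_holds = evaluate_compound(counts)

# If any single quote had failed verification, the compound claim would fail.
one_missing = evaluate_compound({**counts, "SC3": 0})

print(claim_holds, one_missing)
```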
Before any verdict ships, the engine runs adversarial searches for evidence that could break the proof. Four were run here.
| subject | Topaz, Roguin, Gupta, Zhang, Peltonen: 'Fabricated citations: an audit across 2.5 million biomedical papers', Lancet 2026; 407: 1779-81 (correspondence) |
|---|---|
| threshold | |
| note | All four sub-claims must hold for the compound claim to be PROVED. Each sub-claim is verified against verbatim text from the Topaz et al. 2026 correspondence article in The Lancet (the primary and only authoritative source for what that paper reports). |
| sub-claims | SC1, SC2, SC3, SC4 |
[✓] sc1_corpus_size [snapshot]: Full quote verified for sc1_corpus_size (source: tier 4/academic)
[✓] sc2_date_range [snapshot]: Full quote verified for sc2_date_range (source: tier 4/academic)
[✓] sc3_fab_count: Full quote verified for sc3_fab_count (source: tier 4/academic)
[✓] sc4_affected_papers [snapshot]: Full quote verified for sc4_affected_papers (source: tier 4/academic)
SC1: corpus size ≈ 2.5M papers in PMC: 1 >= 1 = True
SC2: date range Jan 2023 - Feb 2026: 1 >= 1 = True
SC3: 4046 fabricated references: 1 >= 1 = True
SC4: 2810 affected papers: 1 >= 1 = True
compound: all sub-claims hold: 4 == 4 = True
Source: proof.py inline output (execution trace)
audit trail · Detailed Evidence
All 4 citations verified.
Original audit log
All four citations resolve to the same source URL (the Lancet article PIIS0140-6736(26)00603-3) and are verified against a pypdf-extracted text snapshot of the user-uploaded PDF.
- B1 (SC1 corpus size): Status verified · Method full_quote · Fetch mode snapshot (snapshot_source: user_uploaded_pdf:pypdf_extract). Verbatim status: verbatim.
- B2 (SC2 date range): Status verified · Method full_quote · Fetch mode snapshot. Verbatim status: verbatim.
- B3 (SC3 fab count): Status verified · Method full_quote · Fetch mode live (the verifier was able to confirm this quote without snapshot fallback, likely because the substring is short and stable enough to match against the Lancet page text). Verbatim status: verbatim.
- B4 (SC4 affected papers): Status verified · Method full_quote · Fetch mode snapshot. Verbatim status: verbatim.
No citations are in not_found, partial, or fetch_failed state. No citation recovery loop was needed.
Source: proof.py JSON summary — evidence[].verification
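The audit log above can be tallied mechanically to confirm that no citation sits in a failure state. The dictionary layout here is an assumption made for illustration and is not the engine's actual JSON schema; the status and fetch-mode values are transcribed from the log.

```python
from collections import Counter

# (status, fetch mode) per citation, transcribed from the audit log above.
audit = {
    "B1": ("verified", "snapshot"),
    "B2": ("verified", "snapshot"),
    "B3": ("verified", "live"),
    "B4": ("verified", "snapshot"),
}

# States that would trigger a citation recovery loop.
FAILURE_STATES = {"not_found", "partial", "fetch_failed"}

status_counts = Counter(status for status, _ in audit.values())
mode_counts = Counter(mode for _, mode in audit.values())
needs_recovery = [fid for fid, (status, _) in audit.items() if status in FAILURE_STATES]

print(dict(status_counts), dict(mode_counts), needs_recovery)
```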
| Field | Value |
|---|---|
| Subject | Topaz, Roguin, Gupta, Zhang, Peltonen — Lancet 2026; 407: 1779-81 |
| Purpose | fact_verification |
| Compound operator | AND |
| Proof direction | affirm |
| SC1 property | Corpus size ≈ 2.5 million biomedical papers in PubMed Central |
| SC2 property | Time window: January 2023 through February 2026 |
| SC3 property | Number of fabricated references identified = 4,046 |
| SC4 property | Number of distinct papers containing fabricated refs = 2,810 |
| Threshold per SC | 1 verified in-source quote |
Source: proof.py JSON summary — claim_formal
The natural-language claim asserts four specific facts about a single named source: the Topaz et al. 2026 correspondence in The Lancet titled "Fabricated citations: an audit across 2·5 million biomedical papers" (vol 407, pp 1779-1781, published May 9, 2026). The claim's logical structure is Source S reports facts F1 ∧ F2 ∧ F3 ∧ F4, where:
- F1: corpus = ~2.5 million biomedical papers in PubMed Central
- F2: window = January 2023 through February 2026
- F3: fabricated references identified = 4,046
- F4: distinct papers affected = 2,810
The compound operator is AND — all four must hold. The proof_direction is "affirm" (this is not a disproof). The formal interpretation is a faithful 1:1 mapping of the natural-language claim with one explicit operationalization: "studies that do not exist" is interpreted as the paper's own definition, "references whose claimed titles correspond to no existing publication." The claim's "2.5 million" is interpreted as the paper's own headline rounding of 2,471,758 (the exact figure reported in methods).
Formalization scope: The claim is verified against what the paper reports, not against the ground-truth fabrication rate in PubMed Central. The paper's own supplementary appendix uses the more cautious term "suspected fabricated references" and reports 91% pipeline precision. This is documented in the operator_note for SC3 and in the adversarial checks.
Source: proof.py JSON summary — claim_natural, claim_formal
| Fact ID | Domain | Type | Note |
|---|---|---|---|
| B1 | thelancet.com | Academic | Tier 4 — known academic/scholarly publisher |
| B2 | thelancet.com | Academic | Tier 4 — known academic/scholarly publisher |
| B3 | thelancet.com | Academic | Tier 4 — known academic/scholarly publisher |
| B4 | thelancet.com | Academic | Tier 4 — known academic/scholarly publisher |
All four citations resolve to the same source — The Lancet, a tier-4 academic publisher. No flagged or unclassified sources.
Source: proof.py JSON summary — evidence[].verification.credibility
This proof has an intrinsic single-source structure: the claim is about what the Topaz et al. 2026 paper itself reports. There is no second authoritative source for "what the paper says" other than the paper. The proof therefore declares a single in-source quote per sub-claim and accepts that the validator's "≥2 sources per SC" guidance does not apply to this kind of claim.
For SC4 (2,810 affected papers), the paper itself provides a measure of internal corroboration: the same count appears in two distinct sentences (the results sentence "4046 fabricated references across 2810 papers" and the limitations sentence "Of the 2810 affected papers, 98·4% had received no publisher action"). B3 and B4 are extracted from these two sentences respectively.
COI assessment: No COI flags are recorded. The single source (Topaz et al.) is the paper being characterized by the claim — this is by design of the claim, not a conflict of interest. The proof does not evaluate whether Topaz et al.'s reported counts reflect ground-truth fabrications in PubMed Central; that is a separate empirical claim outside the scope of this proof.
Source: proof.py JSON summary — cross_checks
- Rule 1 (No hand-typed extracted values): PASS. Sub-claim outcomes derive from verify_all_citations() output, not hand-typed.
- Rule 2 (Citation verification): PASS. All four citations fetched/snapshot-verified via the proof_citations package.
- Rule 3 (System time): N/A. Proof is not time-sensitive.
- Rule 4 (Claim interpretation): PASS. CLAIM_FORMAL contains operator_note for each sub-claim and an aggregate operator_note.
- Rule 5 (Adversarial checks): PASS. Four counter-evidence questions investigated; findings documented.
- Rule 6 (Cross-checks): Single-source by design. The proof structure is correct for "what does the paper report" claims; the validator's per-SC source-count warning is acknowledged and explained above.
- Rule 7 (No hard-coded constants/formulas): PASS. Uses compare() and apply_verdict_qualifier() from the proof-engine computations module.
- validate_proof.py result: PASS with warnings. 23/27 checks passed, 0 issues, 4 warnings (all four warnings are the per-SC single-source notes addressed above).
| Fact ID | Extracted value | Found in quote | Quote snippet |
|---|---|---|---|
| B1 | verified | true | We developed an automated reference verification system scanning PubMed Central's O |
| B2 | verified | true | scanning PubMed Central's Open Access subset from Jan 1, 2023, to Feb 18, 2026 |
| B3 | verified | true | Among 97·1 million verified references, we identified 4046 fabricated references a |
| B4 | verified | true | Of the 2810 affected papers, 98·4% had received no publisher action at the time of |
Extraction method: full-quote presence check against snapshot. No value parsing was needed — the claim is a presence/match of named figures inside named quotes, not a derived numeric computation.
Source: proof.py JSON summary — evidence[].extraction; method narrative: author analysis
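A full-quote presence check of the kind described above can be approximated with a whitespace-tolerant substring test. This is a minimal sketch under stated assumptions: normalize() and quote_present() are hypothetical helpers, not the proof-engine's citation verifier, and the sample snapshot is a short excerpt built from the quoted results sentence with a line break added to imitate PDF text extraction.

```python
import re


def normalize(text: str) -> str:
    """Collapse every run of whitespace (spaces, tabs, newlines) to a single space."""
    return re.sub(r"\s+", " ", text).strip()


def quote_present(quote: str, snapshot: str) -> bool:
    """True if the quote occurs in the snapshot once whitespace is normalized."""
    return normalize(quote) in normalize(snapshot)


# Hypothetical excerpt: the results sentence, wrapped as a PDF extractor might wrap it.
snapshot = (
    "Among 97·1 million verified references, we identified 4046\n"
    "fabricated references across 2810 papers"
)

found = quote_present(
    "we identified 4046 fabricated references across 2810 papers", snapshot
)
not_found = quote_present("4047 fabricated references", snapshot)

print(found, not_found)
```

Because the check tests only for presence, a quote that differs by a single digit fails to match, which is the property the extraction step relies on.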
| ID | Fact | Verified |
|---|---|---|
| B1 | SC1: corpus size from Topaz et al. 2026 methods | Yes |
| B2 | SC2: date range from Topaz et al. 2026 methods | Yes |
| B3 | SC3: 4046 fabricated references (Topaz et al. 2026 results) | Yes |
| B4 | SC4: 2810 affected papers (Topaz et al. 2026 limitations) | Yes |
| A1 | SC1 verified-source count | Computed: 1 quote verified in source paper |
| A2 | SC2 verified-source count | Computed: 1 quote verified in source paper |
| A3 | SC3 verified-source count | Computed: 1 quote verified in source paper |
| A4 | SC4 verified-source count | Computed: 1 quote verified in source paper |
All four citations resolve against the Topaz et al. 2026 PDF snapshot at tier-4 academic credibility (thelancet.com).
Cite this proof
Proof Engine. (2026). Claim Verification: “Topaz et al. (2026) analyzed 2.5 million biomedical papers in PubMed Central from January 2023 through February 2026 and identified 4,046 references pointing to studies that do not exist, distributed across 2,810 papers.” — Proved. https://doi.org/10.5281/zenodo.20306620
Proof Engine. "Claim Verification: “Topaz et al. (2026) analyzed 2.5 million biomedical papers in PubMed Central from January 2023 through February 2026 and identified 4,046 references pointing to studies that do not exist, distributed across 2,810 papers.” — Proved." 2026. https://doi.org/10.5281/zenodo.20306620.
@misc{proofengine_topaz_et_al_2026_analyzed_2_5_million_biomedical_papers_in_pubmed_central_from,
title = {Claim Verification: “Topaz et al. (2026) analyzed 2.5 million biomedical papers in PubMed Central from January 2023 through February 2026 and identified 4,046 references pointing to studies that do not exist, distributed across 2,810 papers.” — Proved},
author = {{Proof Engine}},
year = {2026},
url = {https://proofengine.info/proofs/topaz-et-al-2026-analyzed-2-5-million-biomedical-papers-in-pubmed-central-from/},
note = {Verdict: PROVED. Generated by proof-engine v1.34.0},
doi = {10.5281/zenodo.20306620},
}
TY  - DATA
TI  - Claim Verification: “Topaz et al. (2026) analyzed 2.5 million biomedical papers in PubMed Central from January 2023 through February 2026 and identified 4,046 references pointing to studies that do not exist, distributed across 2,810 papers.” — Proved
AU  - Proof Engine
PY  - 2026
UR  - https://proofengine.info/proofs/topaz-et-al-2026-analyzed-2-5-million-biomedical-papers-in-pubmed-central-from/
N1  - Verdict: PROVED. Generated by proof-engine v1.34.0
DO  - 10.5281/zenodo.20306620
ER  -
View proof source
This is the exact proof.py that was deposited to Zenodo and runs when you re-execute via Binder. Every fact in the verdict above traces to code below.
"""
Proof: Topaz et al. (2026) analyzed 2.5 million biomedical papers in PubMed
Central from January 2023 through February 2026 and identified 4,046 references
pointing to studies that do not exist, distributed across 2,810 papers.
Generated: 2026-05-20
"""
import os
import sys
PROOF_ENGINE_ROOT = os.environ.get("PROOF_ENGINE_ROOT")
if not PROOF_ENGINE_ROOT:
    _d = os.path.dirname(os.path.abspath(__file__))
    while _d != os.path.dirname(_d):
        if os.path.isdir(os.path.join(_d, "proof-engine", "skills", "proof-engine", "scripts")):
            PROOF_ENGINE_ROOT = os.path.join(_d, "proof-engine", "skills", "proof-engine")
            break
        _d = os.path.dirname(_d)
    if not PROOF_ENGINE_ROOT:
        # Fall back to local proof-citations package if skill scripts unavailable.
        PROOF_ENGINE_ROOT = None
if PROOF_ENGINE_ROOT:
    sys.path.insert(0, PROOF_ENGINE_ROOT)
    from scripts.verify_citations import verify_all_citations  # noqa: E402
    from scripts.computations import compare, apply_verdict_qualifier  # noqa: E402
    from scripts.proof_summary import ProofSummaryBuilder  # noqa: E402
else:
    # Use proof-citations package directly (skill scripts are thin shims over it).
    from proof_citations.verify import verify_all_citations  # type: ignore
    from proof_citations.computations import compare, apply_verdict_qualifier  # type: ignore
    from proof_citations.proof_summary import ProofSummaryBuilder  # type: ignore
# 1. CLAIM INTERPRETATION (Rule 4)
CLAIM_NATURAL = (
"Topaz et al. (2026) analyzed 2.5 million biomedical papers in PubMed "
"Central from January 2023 through February 2026 and identified 4,046 "
"references pointing to studies that do not exist, distributed across "
"2,810 papers."
)
CLAIM_FORMAL = {
"subject": "Topaz, Roguin, Gupta, Zhang, Peltonen — 'Fabricated citations: "
"an audit across 2.5 million biomedical papers', Lancet 2026; "
"407: 1779-81 (correspondence)",
"purpose": "fact_verification",
"sub_claims": [
{
"id": "SC1",
"property": "Corpus size ≈ 2.5 million biomedical papers in PubMed Central",
"operator": "==",
"threshold": True,
"operator_note": (
"Paper reports 2,471,758 papers scanned from PMC Open Access "
"subset; this rounds to '2.5 million' as used in the paper's "
"own title and the claim."
),
},
{
"id": "SC2",
"property": "Time window: January 2023 through February 2026",
"operator": "==",
"threshold": True,
"operator_note": (
"Paper specifies exactly Jan 1, 2023 to Feb 18, 2026. The "
"claim's month-resolution phrasing 'January 2023 through "
"February 2026' is a faithful summary of this interval."
),
},
{
"id": "SC3",
"property": "Number of fabricated references identified = 4,046",
"operator": "==",
"threshold": True,
"operator_note": (
"Paper's text says '4046 fabricated references'. The paper's "
"supplementary appendix uses the more cautious term 'suspected "
"fabricated references' and reports pipeline precision of 91% "
"— see adversarial checks. The claim describes them as "
"'references pointing to studies that do not exist', which "
"matches the paper's own definition: 'references whose "
"claimed titles correspond to no existing publication'."
),
},
{
"id": "SC4",
"property": "Number of distinct papers containing fabricated refs = 2,810",
"operator": "==",
"threshold": True,
"operator_note": (
"Paper states '4046 fabricated references across 2810 papers' "
"and later 'Of the 2810 affected papers, 98·4% had received "
"no publisher action'."
),
},
],
"compound_operator": "AND",
"proof_direction": "affirm",
"operator_note": (
"All four sub-claims must hold for the compound claim to be PROVED. "
"Each sub-claim is verified against verbatim text from the Topaz et "
"al. 2026 correspondence article in The Lancet (the primary and only "
"authoritative source for what that paper reports)."
),
"subclaim_to_sources": {
"SC1": ["sc1_corpus_size"],
"SC2": ["sc2_date_range"],
"SC3": ["sc3_fab_count"],
"SC4": ["sc4_affected_papers"],
},
}
# 2. FACT REGISTRY
FACT_REGISTRY = {
"B1": {"key": "sc1_corpus_size", "label": "SC1: corpus size from Topaz et al. 2026 methods"},
"B2": {"key": "sc2_date_range", "label": "SC2: date range from Topaz et al. 2026 methods"},
"B3": {"key": "sc3_fab_count", "label": "SC3: 4046 fabricated references (Topaz et al. 2026 results)"},
"B4": {"key": "sc4_affected_papers", "label": "SC4: 2810 affected papers (Topaz et al. 2026 limitations)"},
"A1": {"label": "SC1 verified-source count", "method": None, "result": None},
"A2": {"label": "SC2 verified-source count", "method": None, "result": None},
"A3": {"label": "SC3 verified-source count", "method": None, "result": None},
"A4": {"label": "SC4 verified-source count", "method": None, "result": None},
}
# 3. EMPIRICAL FACTS
# Source: the Topaz et al. 2026 Lancet correspondence (PIIS0140-6736(26)00603-3).
# The article is uploaded as a PDF; we rely on a verbatim text snapshot of the
# PDF (extracted with pypdf) as the citation-verification surface. The live URL
# on thelancet.com requires institutional access for the full text.
_PROOF_DIR = os.path.dirname(os.path.abspath(__file__))
def _load_snapshot(fname):
    fpath = os.path.join(_PROOF_DIR, fname)
    try:
        with open(fpath) as f:
            return f.read()
    except FileNotFoundError:
        return None
_PAPER_SNAPSHOT = _load_snapshot("snapshots/topaz_paper.txt")
_PAPER_URL = (
"https://www.thelancet.com/journals/lancet/article/"
"PIIS0140-6736(26)00603-3/fulltext"
)
_SOURCE_NAME = (
"Topaz M, Roguin N, Gupta P, Zhang Z, Peltonen L-M. Fabricated citations: "
"an audit across 2·5 million biomedical papers. Lancet 2026; 407: 1779-81."
)
empirical_facts = {
# SC1 — corpus size. Quote is the methods sentence reporting the corpus.
"sc1_corpus_size": {
"quote": (
"We developed an automated reference verification system scanning "
"PubMed Central's Open Access subset from Jan 1, 2023, to Feb 18, "
"2026: 2 471 758 papers and 125 615 773 structured references."
),
"url": _PAPER_URL,
"source_name": _SOURCE_NAME,
"snapshot": _PAPER_SNAPSHOT,
"snapshot_source": "user_uploaded_pdf:pypdf_extract",
},
# SC2 — date range. Same methods sentence (Jan 1, 2023 to Feb 18, 2026).
"sc2_date_range": {
"quote": (
"scanning PubMed Central's Open Access subset from Jan 1, 2023, "
"to Feb 18, 2026"
),
"url": _PAPER_URL,
"source_name": _SOURCE_NAME,
"snapshot": _PAPER_SNAPSHOT,
"snapshot_source": "user_uploaded_pdf:pypdf_extract",
},
# SC3 — 4046 fabricated references. Results sentence.
"sc3_fab_count": {
"quote": (
"Among 97·1 million verified references, we identified 4046 "
"fabricated references across 2810 papers"
),
"url": _PAPER_URL,
"source_name": _SOURCE_NAME,
"snapshot": _PAPER_SNAPSHOT,
"snapshot_source": "user_uploaded_pdf:pypdf_extract",
},
# SC4 — 2810 affected papers. Limitations sentence that re-states the
# affected-paper count gives independent textual confirmation within the
# same article.
"sc4_affected_papers": {
"quote": (
"Of the 2810 affected papers, 98·4% had received no publisher "
"action at the time of our audit"
),
"url": _PAPER_URL,
"source_name": _SOURCE_NAME,
"snapshot": _PAPER_SNAPSHOT,
"snapshot_source": "user_uploaded_pdf:pypdf_extract",
},
}
# 4. CITATION VERIFICATION (Rule 2)
citation_results = verify_all_citations(empirical_facts, wayback_fallback=False)
# 5. PER-SUB-CLAIM VERIFICATION COUNTS
COUNTABLE_STATUSES = ("verified", "partial")
sc1_keys = ["sc1_corpus_size"]
sc2_keys = ["sc2_date_range"]
sc3_keys = ["sc3_fab_count"]
sc4_keys = ["sc4_affected_papers"]
n_sc1 = sum(1 for k in sc1_keys if citation_results[k]["status"] in COUNTABLE_STATUSES)
n_sc2 = sum(1 for k in sc2_keys if citation_results[k]["status"] in COUNTABLE_STATUSES)
n_sc3 = sum(1 for k in sc3_keys if citation_results[k]["status"] in COUNTABLE_STATUSES)
n_sc4 = sum(1 for k in sc4_keys if citation_results[k]["status"] in COUNTABLE_STATUSES)
# 6. PER-SUB-CLAIM EVALUATION
sc1_holds = compare(n_sc1, ">=", 1, label="SC1: corpus size ≈ 2.5M papers in PMC")
sc2_holds = compare(n_sc2, ">=", 1, label="SC2: date range Jan 2023 - Feb 2026")
sc3_holds = compare(n_sc3, ">=", 1, label="SC3: 4046 fabricated references")
sc4_holds = compare(n_sc4, ">=", 1, label="SC4: 2810 affected papers")
# 7. COMPOUND EVALUATION
n_holding = sum([sc1_holds, sc2_holds, sc3_holds, sc4_holds])
n_total = len(CLAIM_FORMAL["sub_claims"])
claim_holds = compare(n_holding, "==", n_total, label="compound: all sub-claims hold")
# 8. COI FLAGS (per sub-claim)
# All four sub-claims share a single source: the Topaz paper itself. Since the
# CLAIM under verification is "what does the Topaz paper REPORT," self-reporting
# by the paper IS the appropriate evidence — this is by design of the claim,
# not a COI on the proof. (A COI gate is for the proof's reasoning; we are not
# evaluating ground-truth fabrication counts, only whether the paper states
# those counts.)
sc1_coi_flags = []
sc2_coi_flags = []
sc3_coi_flags = []
sc4_coi_flags = []
# 9. ADVERSARIAL CHECKS (Rule 5)
adversarial_checks = [
    {
        "question": "Does the paper itself use weaker language than 'studies "
                    "that do not exist' for these 4,046 references?",
        "verification_performed": (
            "Searched the supplementary appendix (uploaded mmc1.pdf) for the "
            "exact language used to describe the 4,046 entries. The main "
            "Lancet correspondence uses 'fabricated references' (defined as "
            "'references whose claimed titles correspond to no existing "
            "publication'). The supplementary appendix consistently uses "
            "'suspected fabricated references' and reports pipeline precision "
            "of 91% (Fleiss' kappa = 0.71) on a 500-entry masked validation. "
            "This means roughly 9% of the 4,046 entries may be false positives."
        ),
        "finding": (
            "The user's phrasing 'references pointing to studies that do not "
            "exist' tracks the paper's own definition and headline term. "
            "However, the rigorous statement is that the pipeline FLAGGED "
            "4,046 entries as suspected fabrications with 91% precision, so "
            "the true number of references-to-nothing is approximately "
            "4046 * 0.91 ≈ 3682, not exactly 4,046. The claim under "
            "verification is about what the paper REPORTS, and the paper "
            "reports the figure 4,046 explicitly. No precision adjustment is "
            "applied to the headline number in the paper's own text."
        ),
        "breaks_proof": False,
    },
    {
        "question": "Are there any independent retractions, corrections, or "
                    "rebuttals of the Topaz et al. 2026 finding that would "
                    "change the headline numbers?",
        "verification_performed": (
            "The paper was published as a Lancet correspondence on May 9, "
            "2026 (vol 407, pp 1779-1781); today's date is May 20, 2026. "
            "Eleven days post-publication leaves essentially no time for a "
            "formal retraction or correction to appear. No such notice is "
            "attached to the uploaded PDF. The user-supplied PDF and "
            "supplement are the canonical source for what the paper reports."
        ),
        "finding": (
            "No retractions, corrections, or errata are known for the source "
            "paper at the time of this proof. The headline figures (2.5M / "
            "Jan 2023 - Feb 2026 / 4,046 / 2,810) appear unchanged."
        ),
        "breaks_proof": False,
    },
    {
        "question": "Could the date phrasing 'January 2023 through February "
                    "2026' overstate the actual interval?",
        "verification_performed": (
            "Compared the claim's interval ('January 2023 through February "
            "2026') against the paper's exact interval ('Jan 1, 2023, to "
            "Feb 18, 2026'). The claim describes both bounds at month "
            "resolution; the paper specifies day-of-month bounds within "
            "those months. The paper also notes the early-2026 quarter is "
            "incomplete (Jan 1 - Feb 18 represents the first 7 weeks of "
            "2026)."
        ),
        "finding": (
            "The claim's month-level phrasing is consistent with the paper. "
            "It does not imply analysis through Feb 28, 2026; saying 'through "
            "February 2026' to describe a period ending Feb 18, 2026 is a "
            "common and accurate summary. No overstatement."
        ),
        "breaks_proof": False,
    },
    {
        "question": "Is the corpus-size figure '2.5 million' a fair "
                    "rounding of the paper's actual 2,471,758?",
        "verification_performed": (
            "2,471,758 rounded to two significant figures is 2.5 million "
            "(2.47 million rounds up to 2.5 million). "
            "The paper itself uses '2·5 million' in its title and in the "
            "supplementary appendix subtitle. The user's claim adopts the "
            "paper's own rounding."
        ),
        "finding": (
            "'2.5 million' is the paper's own headline rounding, so the "
            "claim matches the paper's wording exactly."
        ),
        "breaks_proof": False,
    },
]

# 10. VERDICT
if __name__ == "__main__":
    any_unverified = any(
        cr["status"] != "verified" for cr in citation_results.values()
    )
    any_breaks = any(ac.get("breaks_proof") for ac in adversarial_checks)
    is_disproof = CLAIM_FORMAL.get("proof_direction") == "disprove"

    # COI gates: none active for this proof; all sub-claims confirmed by the
    # paper itself, which is the correct evidence for "what the paper reports."
    any_coi_override = False

    if any_breaks:
        base_verdict = "UNDETERMINED"
    elif any_coi_override:
        base_verdict = "UNDETERMINED"
    elif not claim_holds and n_holding > 0:
        base_verdict = "PARTIALLY VERIFIED"
    elif claim_holds:
        base_verdict = "DISPROVED" if is_disproof else "PROVED"
    elif not claim_holds and n_holding == 0:
        base_verdict = "UNDETERMINED"
    else:
        base_verdict = "UNDETERMINED"
    verdict = apply_verdict_qualifier(base_verdict, any_unverified)

    builder = ProofSummaryBuilder(CLAIM_NATURAL, CLAIM_FORMAL)
    sc_keys_map = {"SC1": sc1_keys, "SC2": sc2_keys, "SC3": sc3_keys, "SC4": sc4_keys}

    for fid, info in FACT_REGISTRY.items():
        if not fid.startswith("B"):
            continue
        ef_key = info["key"]
        ef = empirical_facts[ef_key]
        cr = citation_results.get(ef_key, {})
        sub_claim = None
        for sc, keys in sc_keys_map.items():
            if ef_key in keys:
                sub_claim = sc
                break
        builder.add_empirical_fact(
            fid,
            label=info["label"],
            source_name=ef["source_name"],
            source_url=ef["url"],
            source_quote=ef["quote"],
            sub_claim=sub_claim,
        )
        builder.set_verification(
            fid,
            status=cr.get("status", "unknown"),
            method=cr.get("method", "full_quote"),
            coverage_pct=cr.get("coverage_pct"),
            fetch_mode=cr.get("fetch_mode", "snapshot"),
            credibility=cr.get("credibility", {}),
        )
        builder.set_extraction(
            fid,
            value=cr.get("status", "unknown"),
            value_in_quote=cr.get("status") in COUNTABLE_STATUSES,
            quote_snippet=ef["quote"][:80],
        )

    fact_ids_by_sc = {
        sc: [fid for fid, info in FACT_REGISTRY.items()
             if fid.startswith("B") and info["key"] in keys]
        for sc, keys in sc_keys_map.items()
    }
    n_by_sc = {"SC1": n_sc1, "SC2": n_sc2, "SC3": n_sc3, "SC4": n_sc4}
    holds_by_sc = {"SC1": sc1_holds, "SC2": sc2_holds, "SC3": sc3_holds, "SC4": sc4_holds}

    for i, sc in enumerate(["SC1", "SC2", "SC3", "SC4"], start=1):
        builder.add_computed_fact(
            f"A{i}",
            label=f"{sc} verified-source count",
            method=f"count(verified {sc.lower()} citations) = {n_by_sc[sc]}",
            result=n_by_sc[sc],
            depends_on=fact_ids_by_sc[sc],
            sub_claim=sc,
        )
        builder.add_cross_check(
            description=f"{sc}: in-source quote verification",
            fact_ids=fact_ids_by_sc[sc],
            n_sources_consulted=len(sc_keys_map[sc]),
            n_sources_verified=n_by_sc[sc],
            sources={k: citation_results[k]["status"] for k in sc_keys_map[sc]},
            independence_note=(
                "Single authoritative source (Topaz et al. 2026): claim is "
                "about what that paper reports."
            ),
            coi_flags=[],
            agreement=holds_by_sc[sc],
        )
        builder.add_sub_claim_result(
            id=sc,
            n_confirming=n_by_sc[sc],
            threshold=1,
            holds=holds_by_sc[sc],
        )

    for ac in adversarial_checks:
        builder.add_adversarial_check(
            question=ac["question"],
            verification_performed=ac["verification_performed"],
            finding=ac["finding"],
            breaks_proof=ac["breaks_proof"],
        )

    builder.set_verdict(base_verdict, any_unverified=any_unverified)
    builder.set_key_results(
        n_holding=n_holding,
        n_total=n_total,
        claim_holds=claim_holds,
    )
    builder.emit()
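
The arithmetic quoted in the adversarial checks (the 2.5 million rounding, the 91% precision adjustment, the seven-week 2026 window, and the eleven-day gap since publication) can be reproduced with the standard library alone. The sketch below is an illustrative addition and not part of proof.py; its variable names are assumptions chosen for readability, and the input figures are copied from the quotes and checks above.

```python
from datetime import date

# Figures copied from the proof text above (illustrative names, not from proof.py).
CORPUS_PAPERS = 2_471_758      # methods sentence quoted for SC1
FLAGGED_REFERENCES = 4046      # results sentence quoted for SC3
PIPELINE_PRECISION = 0.91      # appendix figure cited in the first adversarial check

# Rounding check: the corpus size in millions, to one decimal place.
corpus_millions = round(CORPUS_PAPERS / 1_000_000, 1)

# Precision-adjusted estimate of how many flagged references are true fabrications.
expected_true_fabrications = round(FLAGGED_REFERENCES * PIPELINE_PRECISION)

# Length of the 2026 part of the scan window, counting both end dates.
days_in_2026 = (date(2026, 2, 18) - date(2026, 1, 1)).days + 1
weeks_in_2026 = days_in_2026 / 7

# Days between the stated publication date and the date the proof was written.
days_since_publication = (date(2026, 5, 20) - date(2026, 5, 9)).days

print(f"corpus: {corpus_millions} million papers")
print(f"precision-adjusted fabrications: {expected_true_fabrications}")
print(f"2026 window: {days_in_2026} days ({weeks_in_2026:g} weeks)")
print(f"days since publication: {days_since_publication}")
```

None of these values feed into the verdict: the claim concerns what the paper reports, so the checks only confirm that the supporting arithmetic in the adversarial notes is internally consistent.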
Re-execute this proof
The verdict above is cached from when this proof was minted. To re-run the exact
proof.py shown in "View proof source" and see the verdict recomputed live,
launch it in your browser — no install required.
Re-execute the exact bytes deposited at Zenodo.
Re-execute in Binder: runs in your browser, takes about 60 s, and needs no install. The first run takes longer while Binder builds the container image; subsequent runs are cached.