
Topaz et al. (2026) analyzed 2.5 million biomedical papers in PubMed Central from January 2023 through February 2026 and identified 4,046 references pointing to studies that do not exist, distributed across 2,810 papers.

Name: proof.json
Creator: Proof Engine
License: https://opensource.org/licenses/MIT

ai health · 4 adversarial checks · 4 sources · generated 2026-05-20 · v1.34.0

verdict	PROVED
transparency	4 / 4 citations URL-verified
robustness	4 / 4 adversarial challenges withstood
doi	10.5281/zenodo.20306620 (immutable Zenodo deposit)

narrative

The Topaz et al. 2026 Lancet correspondence reports exactly these four facts, and each one is confirmed by verbatim text in the paper itself.

What Was Claimed?

The claim summarizes the headline findings of a recent study about citation integrity in biomedical research. A team led by Maxim Topaz at Columbia University built an automated pipeline to scan a very large slice of the published medical literature, looking for references that pointed to studies which — once you actually checked — did not appear to exist anywhere. The claim names the corpus size, the time period, the number of fabricated references found, and the number of distinct papers that contained them.

Why this matters: scientific papers cite each other to build chains of evidence. If a paper cites a study that does not exist, then any reader, peer reviewer, or clinical guideline writer relying on that citation is leaning on something fictional. The numbers in this claim describe the scale of that problem in 2023–2026 — a period that roughly overlaps with the rapid adoption of large language models, which are known to invent plausible-looking but fictitious references.

What Did We Find?

The source paper does state all four figures, with very close textual matches.

The methods section says the team scanned "PubMed Central's Open Access subset from Jan 1, 2023, to Feb 18, 2026: 2 471 758 papers and 125 615 773 structured references." That is the corpus and the time window. The paper's own title rounds the corpus to "2·5 million biomedical papers," matching the claim's "2.5 million." The time window is summarized in the claim as "January 2023 through February 2026," which is a faithful month-level description of the exact endpoints.

The results section then says: "Among 97·1 million verified references, we identified 4046 fabricated references across 2810 papers." That is both of the remaining figures in the claim, in a single sentence. The paper's limitations section also independently restates the affected-paper count: "Of the 2810 affected papers, 98·4% had received no publisher action at the time of our audit."

One nuance from the supplementary appendix is worth surfacing. The main paper uses the word "fabricated" as if these references definitely point to nothing. The supplement is more careful: it calls them "suspected fabricated references" and reports that the pipeline had 91% precision on a blinded validation by three independent reviewers. So while the paper headlines 4,046, the more rigorous reading is that roughly 91% of those (about 3,680) very likely point to nothing, and the rest are likely false positives. That does not change what the paper reports, which is what the claim is about. But it is worth knowing if you are quoting the figure yourself.
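The precision adjustment is simple arithmetic. The Python sketch below assumes the 91% precision measured on the validation sample applies uniformly to all 4,046 flagged references, which the supplement does not guarantee; the variable names are illustrative and are not taken from the paper or from proof.py.

```python
# Figures quoted from the paper (4046) and its supplement (91% precision).
flagged_references = 4046
pipeline_precision = 0.91  # ASSUMPTION: holds uniformly across all flagged entries

# Expected split of the flagged set under that assumption.
expected_true_fabrications = flagged_references * pipeline_precision
expected_false_positives = flagged_references - expected_true_fabrications

print(f"expected true fabrications: {expected_true_fabrications:.0f}")
print(f"expected false positives:   {expected_false_positives:.0f}")
```

The result is a point estimate only; the validation sample size limits how tightly the 91% figure itself is known.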

The paper was published only eleven days before this proof was run, so there has been essentially no time for retractions or corrections to appear. No such notices are attached to the version on file.

What Should You Keep In Mind?

This proof verifies what the paper says, not whether the world is exactly as the paper says. The pipeline's 91% precision means the 4,046 headline figure is the pipeline's count, not a ground-truth count of confirmed nonexistent studies; the precision-adjusted count is somewhat lower. The pipeline also cannot estimate recall, so the true total of fabricated references in this corpus may be higher than 4,046 if any slipped past the filters. The paper excluded the 23% of references that lacked a PubMed identifier or DOI; fabricated references could be more or less common in that excluded slice. And PubMed Central's Open Access subset is not the full biomedical literature, as the paper acknowledges. None of these caveats affects the claim, but they matter for anyone planning to use the 4,046 figure.
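The 23% exclusion figure can be roughly cross-checked against the two reference counts quoted from the paper. The sketch below assumes that the "97·1 million verified references" are exactly the references that were not excluded; this page does not spell out that relationship, so treat the result as a consistency check rather than a derivation. Note that 97.1 million is itself a rounded figure.

```python
# Counts quoted from the paper's methods and results sentences.
total_references = 125_615_773   # structured references scanned (exact)
verified_references = 97.1e6     # "97·1 million verified references" (rounded in the paper)

# ASSUMPTION: every reference not counted as verified was excluded.
excluded_share = 1 - verified_references / total_references

print(f"share of references not verified: {excluded_share:.1%}")
```

Under that assumption the share lands close to the 23% the paper reports.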

The paper itself is a single Lancet correspondence, not an independently replicated finding. As replication and follow-up work appear over time, the figures may need updating.

How Was This Verified?

The proof located each numeric figure in a verbatim quote from the source PDF, then used the proof-engine's citation verifier to confirm that each quote actually appears in the paper. Four sub-claims, all confirmed. See the structured proof report for the evidence summary, the full verification audit for the raw computation traces and source credibility assessment, or re-run the proof yourself.
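A full-quote presence check of this kind can be sketched in a few lines of Python. This is a hypothetical illustration, not the proof-engine's citation verifier: the helper names `normalize` and `quote_in_snapshot` are invented here, and the whitespace normalization is an assumption about how one might cope with line breaks in PDF-extracted text.

```python
import re


def normalize(text: str) -> str:
    """Collapse runs of whitespace so PDF line breaks do not block a match."""
    return re.sub(r"\s+", " ", text).strip()


def quote_in_snapshot(quote: str, snapshot: str) -> bool:
    """Return True when the normalized quote is a substring of the normalized snapshot."""
    return normalize(quote) in normalize(snapshot)


# Illustrative snapshot fragment with a PDF-style line break (not the real snapshot file).
snapshot_fragment = (
    "Among 97·1 million verified references, we identified 4046\n"
    "fabricated references across 2810 papers"
)

found = quote_in_snapshot(
    "we identified 4046 fabricated references across 2810 papers", snapshot_fragment
)
altered = quote_in_snapshot(
    "we identified 4047 fabricated references across 2810 papers", snapshot_fragment
)
print(found, altered)
```

A substring check like this confirms only that the words appear in the source; it says nothing about whether the quote is used in context.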

What could challenge this verdict?

Four adversarial questions were investigated; none break the proof.

The paper's headline term is "fabricated references," but the supplementary appendix uses the more cautious "suspected fabricated references" and reports a pipeline precision of 91% (Fleiss' κ = 0.71, from a 500-entry masked validation by three independent reviewers). Strictly, this means the true number of references-to-nothing is approximately 4046 × 0.91 ≈ 3,682 — not exactly 4,046. The claim under verification, however, is about what the paper reports, and the paper reports 4,046 as the headline figure without precision adjustment.

The paper was published as Lancet correspondence on May 9, 2026; today's date is May 20, 2026. Eleven days post-publication leaves essentially no window for a formal retraction or correction to appear, and none is attached to the uploaded PDF.

The claim phrases the date range as "January 2023 through February 2026," while the paper specifies "Jan 1, 2023, to Feb 18, 2026." The month-level phrasing is a common and accurate way to summarize an interval ending Feb 18 — it does not imply analysis through Feb 28.

The "2.5 million" figure matches the paper's own headline rounding of 2,471,758 (which is 2.47M, rounding to 2.5M at two significant figures).

Source: proof.py JSON summary — adversarial_checks
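The date-range question can also be checked mechanically. The sketch below uses only the two endpoints quoted from the paper and Python's standard `datetime` module; the variable names are illustrative.

```python
from datetime import date

# Exact endpoints quoted from the paper's methods sentence.
window_start = date(2023, 1, 1)
window_end = date(2026, 2, 18)

# The claim names the months; check that each endpoint falls inside its named month.
starts_in_january_2023 = (window_start.year, window_start.month) == (2023, 1)
ends_in_february_2026 = (window_end.year, window_end.month) == (2026, 2)

# Length of the 2026 portion, counting both Jan 1 and Feb 18.
days_covered_in_2026 = (window_end - date(2026, 1, 1)).days + 1
weeks_covered_in_2026 = days_covered_in_2026 / 7

print(starts_in_january_2023, ends_in_february_2026, weeks_covered_in_2026)
```

The 2026 portion comes to 49 days, which agrees with the "7-week" incomplete period the proof logic attributes to the paper.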

argument

Proof Logic

The claim is a four-part compound assertion about what a single, named source — the Topaz et al. 2026 correspondence in The Lancet — reports. Because the claim is about what the paper says (not about the ground-truth fabrication rate in PubMed Central), the appropriate evidence is verbatim text from that paper.

SC1 — Corpus size (≈2.5 million papers in PubMed Central). The paper's methods sentence (B1) states the system scanned "2 471 758 papers and 125 615 773 structured references" from PubMed Central's Open Access subset. The paper's own title rounds this to "2·5 million biomedical papers," matching the claim's "2.5 million" figure.

SC2 — Time window (January 2023 through February 2026). The same methods sentence (B2) specifies "Jan 1, 2023, to Feb 18, 2026." The claim's month-resolution phrasing "January 2023 through February 2026" is a faithful summary of this interval; both endpoints fall within the named months, with the early-2026 quarter explicitly flagged in the paper as an incomplete (7-week) period.

SC3 — 4,046 fabricated references. The paper's results sentence (B3) states verbatim: "Among 97·1 million verified references, we identified 4046 fabricated references across 2810 papers." The paper defines "fabricated references" as "references whose claimed titles correspond to no existing publication" — semantically equivalent to the claim's "references pointing to studies that do not exist."

SC4 — 2,810 affected papers. The same results sentence (B3) covers this count, and the paper's limitations section provides an independent in-paper restatement (B4): "Of the 2810 affected papers, 98·4% had received no publisher action at the time of our audit." Two distinct sentences in the same article report the same figure, providing internal corroboration.

Each verbatim quote was located in a pypdf-extracted text snapshot of the uploaded source PDF and confirmed with the proof-engine's citation verifier. All four sub-claim evaluations return True, so the compound claim_holds = n_holding == n_total = 4 == 4 = True.

Source: author analysis
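The compound evaluation in the Proof Logic above can be mirrored in plain Python. This is a sketch of the logic only: it does not call the proof-engine's `compare()` or `apply_verdict_qualifier()`, and the dictionary of verified-quote counts is typed in by hand here for illustration (the real proof derives it from citation verification).

```python
# One verified in-source quote per sub-claim, as shown in the execution trace.
verified_quote_counts = {"SC1": 1, "SC2": 1, "SC3": 1, "SC4": 1}
THRESHOLD_PER_SC = 1  # "1 verified in-source quote" per sub-claim

# Each sub-claim holds when it meets the per-sub-claim threshold.
sub_claim_holds = {
    sc: count >= THRESHOLD_PER_SC for sc, count in verified_quote_counts.items()
}

# Compound operator AND: every sub-claim must hold.
n_holding = sum(sub_claim_holds.values())
n_total = len(sub_claim_holds)
claim_holds = n_holding == n_total

print(f"compound: {n_holding} == {n_total} = {claim_holds}")
```

If any count dropped to zero, `n_holding` would fall below `n_total` and the compound claim would no longer hold.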


Topaz M, Roguin N, Gupta P, Zhang Z, Peltonen L-M. Fabricated citations: an audit across 2·5 million biomedical papers. Lancet 2026; 407: 1779-81.

www.thelancet.com/journals/lancet/article/PIIS0140-6736(26)00603-3/fulltext

"We developed an automated reference verification system scanning PubMed Central's Open Access subset from Jan 1, 2023, to Feb 18, 2026: 2 471 758 papers and 125 615 773 structured references."

✓ verified tier-4 · Academic

Topaz M, Roguin N, Gupta P, Zhang Z, Peltonen L-M. Fabricated citations: an audit across 2·5 million biomedical papers. Lancet 2026; 407: 1779-81.

www.thelancet.com/journals/lancet/article/PIIS0140-6736(26)00603-3/fulltext

"scanning PubMed Central's Open Access subset from Jan 1, 2023, to Feb 18, 2026"

✓ verified tier-4 · Academic

Topaz M, Roguin N, Gupta P, Zhang Z, Peltonen L-M. Fabricated citations: an audit across 2·5 million biomedical papers. Lancet 2026; 407: 1779-81.

www.thelancet.com/journals/lancet/article/PIIS0140-6736(26)00603-3/fulltext

"Among 97·1 million verified references, we identified 4046 fabricated references across 2810 papers"

✓ verified tier-4 · Academic

Topaz M, Roguin N, Gupta P, Zhang Z, Peltonen L-M. Fabricated citations: an audit across 2·5 million biomedical papers. Lancet 2026; 407: 1779-81.

www.thelancet.com/journals/lancet/article/PIIS0140-6736(26)00603-3/fulltext

"Of the 2810 affected papers, 98·4% had received no publisher action at the time of our audit"

✓ verified tier-4 · Academic

Before any verdict ships, the engine runs adversarial searches for evidence that could break the proof. 4 were run here.

Does the paper itself use weaker language than 'studies that do not exist' for these 4,046 references?

held


search performed

Searched the supplementary appendix (uploaded mmc1.pdf) for the exact language used to describe the 4,046 entries. The main Lancet correspondence uses 'fabricated references' (defined as 'references whose claimed titles correspond to no existing publication'). The supplementary appendix consistently uses 'suspected fabricated references' and reports pipeline precision of 91% (Fleiss' kappa = 0.71) on a 500-entry masked validation. This means roughly 9% of the 4,046 entries may be false positives.

finding

The user's phrasing 'references pointing to studies that do not exist' tracks the paper's own definition and headline term. However, the rigorous statement is that the pipeline FLAGGED 4,046 entries as suspected fabrications with 91% precision, so the true number of references-to-nothing is approximately 4046 * 0.91 ≈ 3682, not exactly 4,046. The claim under verification is about what the paper REPORTS, and the paper reports the figure 4,046 explicitly. No precision adjustment is applied to the headline number in the paper's own text.

Are there any independent retractions, corrections, or rebuttals of the Topaz et al. 2026 finding that would change the headline numbers?

held


search performed

The paper was published as a Lancet correspondence on May 9, 2026 (vol 407, pp 1779-1781); today's date is May 20, 2026. Eleven days post-publication leaves essentially no time for a formal retraction or correction to appear. No such notice is attached to the uploaded PDF. The user-supplied PDF and supplement are the canonical source for what the paper reports.

finding

No retractions, corrections, or errata are known for the source paper at the time of this proof. The headline figures (2.5M / Jan 2023 - Feb 2026 / 4,046 / 2,810) appear unchanged.

Could the date phrasing 'January 2023 through February 2026' overstate the actual interval?

held


search performed

Compared the claim's interval ('January 2023 through February 2026') against the paper's exact interval ('Jan 1, 2023, to Feb 18, 2026'). The claim describes both bounds at month resolution; the paper specifies day-of-month bounds within those months. The paper also notes the early-2026 quarter is incomplete (Jan 1 - Feb 18 represents the first 7 weeks of 2026).

finding

The claim's month-level phrasing is consistent with the paper. It does not imply analysis through Feb 28, 2026; saying 'through February 2026' to describe a period ending Feb 18, 2026 is a common and accurate summary. No overstatement.

Is the corpus-size figure '2.5 million' a fair rounding of the paper's actual 2,471,758?

held


search performed

Rounded to two significant figures, 2,471,758 is 2.5 million (2.47 rounds up to 2.5). The paper itself uses '2·5 million' in its title and in the supplementary appendix subtitle. The user's claim adopts the paper's own rounding.

finding

'2.5 million' is the paper's own headline rounding. Match is exact.
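The rounding in this last check is easy to reproduce. The sketch below uses Python's `g` format specifier to round to two significant figures; the variable names are illustrative.

```python
corpus_size = 2_471_758  # exact paper count from the methods sentence

# Round to two significant figures via the "g" presentation type, then parse back.
rounded_two_sig_figs = float(f"{corpus_size:.2g}")
rounded_in_millions = rounded_two_sig_figs / 1_000_000

print(rounded_in_millions)
```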

subject	Topaz, Roguin, Gupta, Zhang, Peltonen — 'Fabricated citations: an audit across 2.5 million biomedical papers', Lancet 2026; 407: 1779-81 (correspondence)
threshold
note	All four sub-claims must hold for the compound claim to be PROVED. Each sub-claim is verified against verbatim text from the Topaz et al. 2026 correspondence article in The Lancet (the primary and only authoritative source for what that paper reports).
sub-claims	SC1, SC2, SC3, SC4

[✓] sc1_corpus_size [snapshot]: Full quote verified for sc1_corpus_size (source: tier 4/academic)
[✓] sc2_date_range [snapshot]: Full quote verified for sc2_date_range (source: tier 4/academic)
[✓] sc3_fab_count: Full quote verified for sc3_fab_count (source: tier 4/academic)
[✓] sc4_affected_papers [snapshot]: Full quote verified for sc4_affected_papers (source: tier 4/academic)
SC1: corpus size ≈ 2.5M papers in PMC: 1 >= 1 = True
SC2: date range Jan 2023 - Feb 2026: 1 >= 1 = True
SC3: 4046 fabricated references: 1 >= 1 = True
SC4: 2810 affected papers: 1 >= 1 = True
compound: all sub-claims hold: 4 == 4 = True

Source: proof.py inline output (execution trace)


sub-claim confidence

SC1	Corpus size ≈ 2.5 million biomedical papers in PubMed Central	indeterminate
SC2	Time window: January 2023 through February 2026	indeterminate
SC3	Number of fabricated references identified = 4,046	indeterminate
SC4	Number of distinct papers containing fabricated refs = 2,810	indeterminate

4 sources

B1 www.thelancet.com/journ... T4·acad ✓

B2 www.thelancet.com/journ... T4·acad ✓

B3 www.thelancet.com/journ... T4·acad ✓

B4 www.thelancet.com/journ... T4·acad ✓

verified by

⬡ Proof Engine
open-source · re-runnable

audit trail · Detailed Evidence

Citation Verification 4/4 verified ▸

All 4 citations verified.

Original audit log

All four citations resolve to the same source URL (the Lancet article PIIS0140-6736(26)00603-3). B1, B2, and B4 were verified against a pypdf-extracted text snapshot of the user-uploaded PDF; B3 was verified in live fetch mode, as itemized below.

B1 — SC1 corpus size: Status verified · Method full_quote · Fetch mode snapshot (snapshot_source: user_uploaded_pdf:pypdf_extract). Verbatim status: verbatim.
B2 — SC2 date range: Status verified · Method full_quote · Fetch mode snapshot. Verbatim status: verbatim.
B3 — SC3 fab count: Status verified · Method full_quote · Fetch mode live (the verifier was able to confirm this quote without snapshot fallback — likely because the substring is short and stable enough to match against the Lancet page text). Verbatim status: verbatim.
B4 — SC4 affected papers: Status verified · Method full_quote · Fetch mode snapshot. Verbatim status: verbatim.

No citations are in not_found, partial, or fetch_failed state. No citation recovery loop was needed.

Source: proof.py JSON summary — evidence[].verification

Claim Specification▸

Field	Value
Subject	Topaz, Roguin, Gupta, Zhang, Peltonen — Lancet 2026; 407: 1779-81
Purpose	fact_verification
Compound operator	AND
Proof direction	affirm
SC1 property	Corpus size ≈ 2.5 million biomedical papers in PubMed Central
SC2 property	Time window: January 2023 through February 2026
SC3 property	Number of fabricated references identified = 4,046
SC4 property	Number of distinct papers containing fabricated refs = 2,810
Threshold per SC	1 verified in-source quote

Source: proof.py JSON summary — claim_formal

Claim Interpretation▸

The natural-language claim asserts four specific facts about a single named source: the Topaz et al. 2026 correspondence in The Lancet titled "Fabricated citations: an audit across 2·5 million biomedical papers" (vol 407, pp 1779-1781, published May 9, 2026). The claim's logical structure is Source S reports facts F1 ∧ F2 ∧ F3 ∧ F4, where:

F1: corpus = ~2.5 million biomedical papers in PubMed Central
F2: window = January 2023 through February 2026
F3: fabricated references identified = 4,046
F4: distinct papers affected = 2,810

The compound operator is AND — all four must hold. The proof_direction is "affirm" (this is not a disproof). The formal interpretation is a faithful 1:1 mapping of the natural-language claim with one explicit operationalization: "studies that do not exist" is interpreted as the paper's own definition, "references whose claimed titles correspond to no existing publication." The claim's "2.5 million" is interpreted as the paper's own headline rounding of 2,471,758 (the exact figure reported in methods).

Formalization scope: The claim is verified against what the paper reports, not against the ground-truth fabrication rate in PubMed Central. The paper's own supplementary appendix uses the more cautious term "suspected fabricated references" and reports 91% pipeline precision. This is documented in the operator_note for SC3 and in the adversarial checks.

Source: proof.py JSON summary — claim_natural, claim_formal

Source Credibility Assessment▸

Fact ID	Domain	Type	Note
B1	thelancet.com	Academic	Tier 4 — known academic/scholarly publisher
B2	thelancet.com	Academic	Tier 4 — known academic/scholarly publisher
B3	thelancet.com	Academic	Tier 4 — known academic/scholarly publisher
B4	thelancet.com	Academic	Tier 4 — known academic/scholarly publisher

All four citations resolve to the same source — The Lancet, a tier-4 academic publisher. No flagged or unclassified sources.

Source: proof.py JSON summary — evidence[].verification.credibility

Independent Source Agreement▸

This proof has an intrinsic single-source structure: the claim is about what the Topaz et al. 2026 paper itself reports. There is no second authoritative source for "what the paper says" other than the paper. The proof therefore declares a single in-source quote per sub-claim and accepts that the validator's "≥2 sources per SC" guidance does not apply to this kind of claim.

For SC4 (2,810 affected papers), the paper itself provides a measure of internal corroboration: the same count appears in two distinct sentences (the results sentence "4046 fabricated references across 2810 papers" and the limitations sentence "Of the 2810 affected papers, 98·4% had received no publisher action"). B3 and B4 are extracted from these two sentences respectively.
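The internal corroboration for SC4 can be illustrated by pulling the affected-paper count out of each of the two quoted sentences and comparing them. This is a hypothetical sketch: the proof itself performs only a quote-presence check and does no value parsing, and the regular expressions below are written for these two sentences specifically.

```python
import re

# The two verbatim quotes registered as B3 and B4.
b3_quote = (
    "Among 97·1 million verified references, we identified 4046 "
    "fabricated references across 2810 papers"
)
b4_quote = (
    "Of the 2810 affected papers, 98·4% had received no publisher action "
    "at the time of our audit"
)

# Extract the affected-paper count from each sentence independently.
b3_match = re.search(r"across (\d+) papers", b3_quote)
b4_match = re.search(r"Of the (\d+) affected papers", b4_quote)

count_b3 = int(b3_match.group(1))
count_b4 = int(b4_match.group(1))

print(count_b3, count_b4, count_b3 == count_b4)
```

Agreement between two sentences of the same article shows internal consistency only; it is not independent confirmation of the count.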

COI assessment: No COI flags are recorded. The single source (Topaz et al.) is the paper being characterized by the claim — this is by design of the claim, not a conflict of interest. The proof does not evaluate whether Topaz et al.'s reported counts reflect ground-truth fabrications in PubMed Central; that is a separate empirical claim outside the scope of this proof.

Source: proof.py JSON summary — cross_checks

Quality Checks▸

Rule 1 — No hand-typed extracted values: PASS. Sub-claim outcomes derive from verify_all_citations() output, not hand-typed.
Rule 2 — Citation verification: PASS. All four citations fetched/snapshot-verified via the proof_citations package.
Rule 3 — System time: N/A. Proof is not time-sensitive.
Rule 4 — Claim interpretation: PASS. CLAIM_FORMAL contains operator_note for each sub-claim and an aggregate operator_note.
Rule 5 — Adversarial checks: PASS. Four counter-evidence questions investigated; findings documented.
Rule 6 — Cross-checks: Single-source by design. The proof structure is correct for "what does the paper report" claims; the validator's per-SC source-count warning is acknowledged and explained above.
Rule 7 — No hard-coded constants/formulas: PASS. Uses compare() and apply_verdict_qualifier() from the proof-engine computations module.
validate_proof.py result: PASS with warnings — 23/27 checks passed, 0 issues, 4 warnings (all four warnings are the per-SC single-source notes addressed above).

Source Data▸

Fact ID	Extracted value	Found in quote	Quote snippet
B1	verified	true	We developed an automated reference verification system scanning PubMed Central's O
B2	verified	true	scanning PubMed Central's Open Access subset from Jan 1, 2023, to Feb 18, 2026
B3	verified	true	Among 97·1 million verified references, we identified 4046 fabricated references a
B4	verified	true	Of the 2810 affected papers, 98·4% had received no publisher action at the time of

Extraction method: full-quote presence check against snapshot. No value parsing was needed — the claim is a presence/match of named figures inside named quotes, not a derived numeric computation.

Source: proof.py JSON summary — evidence[].extraction; method narrative: author analysis

Evidence Summary▸

ID	Fact	Verified
B1	SC1: corpus size from Topaz et al. 2026 methods	Yes
B2	SC2: date range from Topaz et al. 2026 methods	Yes
B3	SC3: 4046 fabricated references (Topaz et al. 2026 results)	Yes
B4	SC4: 2810 affected papers (Topaz et al. 2026 limitations)	Yes
A1	SC1 verified-source count	Computed: 1 quote verified in source paper
A2	SC2 verified-source count	Computed: 1 quote verified in source paper
A3	SC3 verified-source count	Computed: 1 quote verified in source paper
A4	SC4 verified-source count	Computed: 1 quote verified in source paper

All four citations resolve against the Topaz et al. 2026 PDF snapshot at tier-4 academic credibility (thelancet.com).

Cite this proof

DOI: 10.5281/zenodo.20306620 · all versions

APA Chicago BibTeX RIS

Proof Engine. (2026). Claim Verification: “Topaz et al. (2026) analyzed 2.5 million biomedical papers in PubMed Central from January 2023 through February 2026 and identified 4,046 references pointing to studies that do not exist, distributed across 2,810 papers.” — Proved. https://doi.org/10.5281/zenodo.20306620

Proof Engine. "Claim Verification: “Topaz et al. (2026) analyzed 2.5 million biomedical papers in PubMed Central from January 2023 through February 2026 and identified 4,046 references pointing to studies that do not exist, distributed across 2,810 papers.” — Proved." 2026. https://doi.org/10.5281/zenodo.20306620.

@misc{proofengine_topaz_et_al_2026_analyzed_2_5_million_biomedical_papers_in_pubmed_central_from,
  title   = {Claim Verification: “Topaz et al. (2026) analyzed 2.5 million biomedical papers in PubMed Central from January 2023 through February 2026 and identified 4,046 references pointing to studies that do not exist, distributed across 2,810 papers.” — Proved},
  author  = {{Proof Engine}},
  year    = {2026},
  url     = {https://proofengine.info/proofs/topaz-et-al-2026-analyzed-2-5-million-biomedical-papers-in-pubmed-central-from/},
  note    = {Verdict: PROVED. Generated by proof-engine v1.34.0},
  doi     = {10.5281/zenodo.20306620},
}

TY  - DATA
TI  - Claim Verification: “Topaz et al. (2026) analyzed 2.5 million biomedical papers in PubMed Central from January 2023 through February 2026 and identified 4,046 references pointing to studies that do not exist, distributed across 2,810 papers.” — Proved
AU  - Proof Engine
PY  - 2026
UR  - https://proofengine.info/proofs/topaz-et-al-2026-analyzed-2-5-million-biomedical-papers-in-pubmed-central-from/
N1  - Verdict: PROVED. Generated by proof-engine v1.34.0
DO  - 10.5281/zenodo.20306620
ER  -

View proof source 452 lines · 18.3 KB

This is the exact proof.py that was deposited to Zenodo and runs when you re-execute via Binder. Every fact in the verdict above traces to code below.

"""
Proof: Topaz et al. (2026) analyzed 2.5 million biomedical papers in PubMed
Central from January 2023 through February 2026 and identified 4,046 references
pointing to studies that do not exist, distributed across 2,810 papers.

Generated: 2026-05-20
"""
import os
import sys

PROOF_ENGINE_ROOT = os.environ.get("PROOF_ENGINE_ROOT")
if not PROOF_ENGINE_ROOT:
    _d = os.path.dirname(os.path.abspath(__file__))
    while _d != os.path.dirname(_d):
        if os.path.isdir(os.path.join(_d, "proof-engine", "skills", "proof-engine", "scripts")):
            PROOF_ENGINE_ROOT = os.path.join(_d, "proof-engine", "skills", "proof-engine")
            break
        _d = os.path.dirname(_d)
    if not PROOF_ENGINE_ROOT:
        # Fall back to local proof-citations package if skill scripts unavailable.
        PROOF_ENGINE_ROOT = None
if PROOF_ENGINE_ROOT:
    sys.path.insert(0, PROOF_ENGINE_ROOT)
    from scripts.verify_citations import verify_all_citations  # noqa: E402
    from scripts.computations import compare, apply_verdict_qualifier  # noqa: E402
    from scripts.proof_summary import ProofSummaryBuilder  # noqa: E402
else:
    # Use proof-citations package directly (skill scripts are thin shims over it).
    from proof_citations.verify import verify_all_citations  # type: ignore
    from proof_citations.computations import compare, apply_verdict_qualifier  # type: ignore
    from proof_citations.proof_summary import ProofSummaryBuilder  # type: ignore


# 1. CLAIM INTERPRETATION (Rule 4)
CLAIM_NATURAL = (
    "Topaz et al. (2026) analyzed 2.5 million biomedical papers in PubMed "
    "Central from January 2023 through February 2026 and identified 4,046 "
    "references pointing to studies that do not exist, distributed across "
    "2,810 papers."
)

CLAIM_FORMAL = {
    "subject": "Topaz, Roguin, Gupta, Zhang, Peltonen — 'Fabricated citations: "
               "an audit across 2.5 million biomedical papers', Lancet 2026; "
               "407: 1779-81 (correspondence)",
    "purpose": "fact_verification",
    "sub_claims": [
        {
            "id": "SC1",
            "property": "Corpus size ≈ 2.5 million biomedical papers in PubMed Central",
            "operator": "==",
            "threshold": True,
            "operator_note": (
                "Paper reports 2,471,758 papers scanned from PMC Open Access "
                "subset; this rounds to '2.5 million' as used in the paper's "
                "own title and the claim."
            ),
        },
        {
            "id": "SC2",
            "property": "Time window: January 2023 through February 2026",
            "operator": "==",
            "threshold": True,
            "operator_note": (
                "Paper specifies exactly Jan 1, 2023 to Feb 18, 2026. The "
                "claim's month-resolution phrasing 'January 2023 through "
                "February 2026' is a faithful summary of this interval."
            ),
        },
        {
            "id": "SC3",
            "property": "Number of fabricated references identified = 4,046",
            "operator": "==",
            "threshold": True,
            "operator_note": (
                "Paper's text says '4046 fabricated references'. The paper's "
                "supplementary appendix uses the more cautious term 'suspected "
                "fabricated references' and reports pipeline precision of 91% "
                "— see adversarial checks. The claim describes them as "
                "'references pointing to studies that do not exist', which "
                "matches the paper's own definition: 'references whose "
                "claimed titles correspond to no existing publication'."
            ),
        },
        {
            "id": "SC4",
            "property": "Number of distinct papers containing fabricated refs = 2,810",
            "operator": "==",
            "threshold": True,
            "operator_note": (
                "Paper states '4046 fabricated references across 2810 papers' "
                "and later 'Of the 2810 affected papers, 98·4% had received "
                "no publisher action'."
            ),
        },
    ],
    "compound_operator": "AND",
    "proof_direction": "affirm",
    "operator_note": (
        "All four sub-claims must hold for the compound claim to be PROVED. "
        "Each sub-claim is verified against verbatim text from the Topaz et "
        "al. 2026 correspondence article in The Lancet (the primary and only "
        "authoritative source for what that paper reports)."
    ),
    "subclaim_to_sources": {
        "SC1": ["sc1_corpus_size"],
        "SC2": ["sc2_date_range"],
        "SC3": ["sc3_fab_count"],
        "SC4": ["sc4_affected_papers"],
    },
}


# 2. FACT REGISTRY
FACT_REGISTRY = {
    "B1": {"key": "sc1_corpus_size",       "label": "SC1: corpus size from Topaz et al. 2026 methods"},
    "B2": {"key": "sc2_date_range",        "label": "SC2: date range from Topaz et al. 2026 methods"},
    "B3": {"key": "sc3_fab_count",         "label": "SC3: 4046 fabricated references (Topaz et al. 2026 results)"},
    "B4": {"key": "sc4_affected_papers",   "label": "SC4: 2810 affected papers (Topaz et al. 2026 limitations)"},
    "A1": {"label": "SC1 verified-source count", "method": None, "result": None},
    "A2": {"label": "SC2 verified-source count", "method": None, "result": None},
    "A3": {"label": "SC3 verified-source count", "method": None, "result": None},
    "A4": {"label": "SC4 verified-source count", "method": None, "result": None},
}


# 3. EMPIRICAL FACTS
# Source: the Topaz et al. 2026 Lancet correspondence (PIIS0140-6736(26)00603-3).
# The article is uploaded as a PDF; we rely on a verbatim text snapshot of the
# PDF (extracted with pypdf) as the citation-verification surface. The live URL
# on thelancet.com requires institutional access for the full text.

_PROOF_DIR = os.path.dirname(os.path.abspath(__file__))


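def _load_snapshot(fname):
    # Read a text snapshot stored next to this script; return None if absent.
    # The encoding is set explicitly because the snapshot contains non-ASCII
    # characters (e.g. the Lancet's mid-dot decimals, "2·5 million"); the
    # file is assumed to be UTF-8.
    fpath = os.path.join(_PROOF_DIR, fname)
    try:
        with open(fpath, encoding="utf-8") as f:
            return f.read()
    except FileNotFoundError:
        return None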


_PAPER_SNAPSHOT = _load_snapshot("snapshots/topaz_paper.txt")
_PAPER_URL = (
    "https://www.thelancet.com/journals/lancet/article/"
    "PIIS0140-6736(26)00603-3/fulltext"
)
_SOURCE_NAME = (
    "Topaz M, Roguin N, Gupta P, Zhang Z, Peltonen L-M. Fabricated citations: "
    "an audit across 2·5 million biomedical papers. Lancet 2026; 407: 1779-81."
)

empirical_facts = {
    # SC1 — corpus size. Quote is the methods sentence reporting the corpus.
    "sc1_corpus_size": {
        "quote": (
            "We developed an automated reference verification system scanning "
            "PubMed Central's Open Access subset from Jan 1, 2023, to Feb 18, "
            "2026: 2 471 758 papers and 125 615 773 structured references."
        ),
        "url": _PAPER_URL,
        "source_name": _SOURCE_NAME,
        "snapshot": _PAPER_SNAPSHOT,
        "snapshot_source": "user_uploaded_pdf:pypdf_extract",
    },
    # SC2 — date range. Same methods sentence (Jan 1, 2023 to Feb 18, 2026).
    "sc2_date_range": {
        "quote": (
            "scanning PubMed Central's Open Access subset from Jan 1, 2023, "
            "to Feb 18, 2026"
        ),
        "url": _PAPER_URL,
        "source_name": _SOURCE_NAME,
        "snapshot": _PAPER_SNAPSHOT,
        "snapshot_source": "user_uploaded_pdf:pypdf_extract",
    },
    # SC3 — 4046 fabricated references. Results sentence.
    "sc3_fab_count": {
        "quote": (
            "Among 97·1 million verified references, we identified 4046 "
            "fabricated references across 2810 papers"
        ),
        "url": _PAPER_URL,
        "source_name": _SOURCE_NAME,
        "snapshot": _PAPER_SNAPSHOT,
        "snapshot_source": "user_uploaded_pdf:pypdf_extract",
    },
    # SC4 — 2810 affected papers. Limitations sentence that re-states the
    # affected-paper count gives independent textual confirmation within the
    # same article.
    "sc4_affected_papers": {
        "quote": (
            "Of the 2810 affected papers, 98·4% had received no publisher "
            "action at the time of our audit"
        ),
        "url": _PAPER_URL,
        "source_name": _SOURCE_NAME,
        "snapshot": _PAPER_SNAPSHOT,
        "snapshot_source": "user_uploaded_pdf:pypdf_extract",
    },
}


# 4. CITATION VERIFICATION (Rule 2)
citation_results = verify_all_citations(empirical_facts, wayback_fallback=False)


# 5. PER-SUB-CLAIM VERIFICATION COUNTS
COUNTABLE_STATUSES = ("verified", "partial")
sc1_keys = ["sc1_corpus_size"]
sc2_keys = ["sc2_date_range"]
sc3_keys = ["sc3_fab_count"]
sc4_keys = ["sc4_affected_papers"]

n_sc1 = sum(1 for k in sc1_keys if citation_results[k]["status"] in COUNTABLE_STATUSES)
n_sc2 = sum(1 for k in sc2_keys if citation_results[k]["status"] in COUNTABLE_STATUSES)
n_sc3 = sum(1 for k in sc3_keys if citation_results[k]["status"] in COUNTABLE_STATUSES)
n_sc4 = sum(1 for k in sc4_keys if citation_results[k]["status"] in COUNTABLE_STATUSES)


# 6. PER-SUB-CLAIM EVALUATION
sc1_holds = compare(n_sc1, ">=", 1, label="SC1: corpus size ≈ 2.5M papers in PMC")
sc2_holds = compare(n_sc2, ">=", 1, label="SC2: date range Jan 2023 - Feb 2026")
sc3_holds = compare(n_sc3, ">=", 1, label="SC3: 4046 fabricated references")
sc4_holds = compare(n_sc4, ">=", 1, label="SC4: 2810 affected papers")


# 7. COMPOUND EVALUATION
n_holding = sum([sc1_holds, sc2_holds, sc3_holds, sc4_holds])
n_total = len(CLAIM_FORMAL["sub_claims"])
claim_holds = compare(n_holding, "==", n_total, label="compound: all sub-claims hold")


# 8. COI FLAGS (per sub-claim)
# All four sub-claims share a single source: the Topaz paper itself. Since the
# CLAIM under verification is "what does the Topaz paper REPORT," self-reporting
# by the paper IS the appropriate evidence — this is by design of the claim,
# not a COI on the proof. (A COI gate is for the proof's reasoning; we are not
# evaluating ground-truth fabrication counts, only whether the paper states
# those counts.)
sc1_coi_flags = []
sc2_coi_flags = []
sc3_coi_flags = []
sc4_coi_flags = []


# 9. ADVERSARIAL CHECKS (Rule 5)
adversarial_checks = [
    {
        "question": "Does the paper itself use weaker language than 'studies "
                    "that do not exist' for these 4,046 references?",
        "verification_performed": (
            "Searched the supplementary appendix (uploaded mmc1.pdf) for the "
            "exact language used to describe the 4,046 entries. The main "
            "Lancet correspondence uses 'fabricated references' (defined as "
            "'references whose claimed titles correspond to no existing "
            "publication'). The supplementary appendix consistently uses "
            "'suspected fabricated references' and reports pipeline precision "
            "of 91% (Fleiss' kappa = 0.71) on a 500-entry masked validation. "
            "This means roughly 9% of the 4,046 entries may be false positives."
        ),
        "finding": (
            "The user's phrasing 'references pointing to studies that do not "
            "exist' tracks the paper's own definition and headline term. "
            "Strictly, though, the pipeline FLAGGED 4,046 entries as "
            "suspected fabrications at an estimated 91% precision, so a "
            "precision-adjusted point estimate of the true count is about "
            "4046 * 0.91 ≈ 3682 rather than exactly 4,046. The claim under "
            "verification is about what the paper REPORTS, and the paper "
            "reports the figure 4,046 explicitly, without applying a "
            "precision adjustment to the headline number."
        ),
        "breaks_proof": False,
    },
    {
        "question": "Are there any independent retractions, corrections, or "
                    "rebuttals of the Topaz et al. 2026 finding that would "
                    "change the headline numbers?",
        "verification_performed": (
            "The paper was published as a Lancet correspondence on May 9, "
            "2026 (vol 407, pp 1779-1781); today's date is May 20, 2026. "
            "Eleven days post-publication leaves essentially no time for a "
            "formal retraction or correction to appear. No such notice is "
            "attached to the uploaded PDF. The user-supplied PDF and "
            "supplement are the canonical source for what the paper reports."
        ),
        "finding": (
            "No retractions, corrections, or errata are known for the source "
            "paper at the time of this proof. The headline figures (2.5M / "
            "Jan 2023 - Feb 2026 / 4,046 / 2,810) appear unchanged."
        ),
        "breaks_proof": False,
    },
    {
        "question": "Could the date phrasing 'January 2023 through February "
                    "2026' overstate the actual interval?",
        "verification_performed": (
            "Compared the claim's interval ('January 2023 through February "
            "2026') against the paper's exact interval ('Jan 1, 2023, to "
            "Feb 18, 2026'). The claim describes both bounds at month "
            "resolution; the paper specifies day-of-month bounds within "
            "those months. The paper also notes the early-2026 quarter is "
            "incomplete (Jan 1 - Feb 18 represents the first 7 weeks of "
            "2026)."
        ),
        "finding": (
            "The claim's month-level phrasing is consistent with the paper. "
            "It does not imply analysis through Feb 28, 2026; saying 'through "
            "February 2026' to describe a period ending Feb 18, 2026 is a "
            "common and accurate summary. No overstatement."
        ),
        "breaks_proof": False,
    },
    {
        "question": "Is the corpus-size figure '2.5 million' a fair "
                    "rounding of the paper's actual 2,471,758?",
        "verification_performed": (
            "2,471,758 rounded to two significant figures is 2.5 million "
            "(2.47 rounds up to 2.5). The paper itself uses '2·5 million' "
            "in its title and in the supplementary appendix subtitle. The "
            "user's claim adopts the paper's own rounding."
        ),
        "finding": (
            "'2.5 million' is the paper's own headline rounding. Match is "
            "exact."
        ),
        "breaks_proof": False,
    },
]


# 10. VERDICT
if __name__ == "__main__":
    any_unverified = any(
        cr["status"] != "verified" for cr in citation_results.values()
    )
    any_breaks = any(ac.get("breaks_proof") for ac in adversarial_checks)
    is_disproof = CLAIM_FORMAL.get("proof_direction") == "disprove"

    # COI gates — none active for this proof; all sub-claims confirmed by the
    # paper itself, which is the correct evidence for "what the paper reports."
    any_coi_override = False

    if any_breaks or any_coi_override:
        base_verdict = "UNDETERMINED"
    elif claim_holds:
        base_verdict = "DISPROVED" if is_disproof else "PROVED"
    elif n_holding > 0:
        base_verdict = "PARTIALLY VERIFIED"
    else:
        base_verdict = "UNDETERMINED"
    verdict = apply_verdict_qualifier(base_verdict, any_unverified)

    builder = ProofSummaryBuilder(CLAIM_NATURAL, CLAIM_FORMAL)

    sc_keys_map = {"SC1": sc1_keys, "SC2": sc2_keys, "SC3": sc3_keys, "SC4": sc4_keys}

    for fid, info in FACT_REGISTRY.items():
        if not fid.startswith("B"):
            continue
        ef_key = info["key"]
        ef = empirical_facts[ef_key]
        cr = citation_results.get(ef_key, {})
        sub_claim = None
        for sc, keys in sc_keys_map.items():
            if ef_key in keys:
                sub_claim = sc
                break
        builder.add_empirical_fact(
            fid,
            label=info["label"],
            source_name=ef["source_name"],
            source_url=ef["url"],
            source_quote=ef["quote"],
            sub_claim=sub_claim,
        )
        builder.set_verification(
            fid,
            status=cr.get("status", "unknown"),
            method=cr.get("method", "full_quote"),
            coverage_pct=cr.get("coverage_pct"),
            fetch_mode=cr.get("fetch_mode", "snapshot"),
            credibility=cr.get("credibility", {}),
        )
        builder.set_extraction(
            fid,
            value=cr.get("status", "unknown"),
            value_in_quote=cr.get("status") in COUNTABLE_STATUSES,
            quote_snippet=ef["quote"][:80],
        )

    fact_ids_by_sc = {
        sc: [fid for fid, info in FACT_REGISTRY.items()
             if fid.startswith("B") and info["key"] in keys]
        for sc, keys in sc_keys_map.items()
    }

    n_by_sc = {"SC1": n_sc1, "SC2": n_sc2, "SC3": n_sc3, "SC4": n_sc4}
    holds_by_sc = {"SC1": sc1_holds, "SC2": sc2_holds, "SC3": sc3_holds, "SC4": sc4_holds}

    for i, sc in enumerate(["SC1", "SC2", "SC3", "SC4"], start=1):
        builder.add_computed_fact(
            f"A{i}",
            label=f"{sc} verified-source count",
            method=f"count(verified {sc.lower()} citations) = {n_by_sc[sc]}",
            result=n_by_sc[sc],
            depends_on=fact_ids_by_sc[sc],
            sub_claim=sc,
        )
        builder.add_cross_check(
            description=f"{sc}: in-source quote verification",
            fact_ids=fact_ids_by_sc[sc],
            n_sources_consulted=len(sc_keys_map[sc]),
            n_sources_verified=n_by_sc[sc],
            sources={k: citation_results[k]["status"] for k in sc_keys_map[sc]},
            independence_note=(
                "Single authoritative source (Topaz et al. 2026) — claim is "
                "about what that paper reports."
            ),
            coi_flags=[],
            agreement=holds_by_sc[sc],
        )
        builder.add_sub_claim_result(
            id=sc,
            n_confirming=n_by_sc[sc],
            threshold=1,
            holds=holds_by_sc[sc],
        )

    for ac in adversarial_checks:
        builder.add_adversarial_check(
            question=ac["question"],
            verification_performed=ac["verification_performed"],
            finding=ac["finding"],
            breaks_proof=ac["breaks_proof"],
        )

    builder.set_verdict(base_verdict, any_unverified=any_unverified)
    builder.set_key_results(
        n_holding=n_holding,
        n_total=n_total,
        claim_holds=claim_holds,
    )
    builder.emit()
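
The arithmetic quoted in the adversarial checks above can be reproduced without the proof framework. What follows is a minimal sketch and is not part of the deposited proof.py; the helper names (`precision_adjusted_count`, `in_millions`, `inclusive_days`) are hypothetical and introduced only for illustration. The input figures are the ones quoted in the checks: 4,046 flagged references, 91% reported precision, 2,471,758 papers, publication on May 9, 2026, and a proof date of May 20, 2026.

```python
from datetime import date


def precision_adjusted_count(flagged: int, precision: float) -> int:
    """Point estimate of true positives among `flagged` items."""
    return round(flagged * precision)


def in_millions(n: int, decimals: int = 1) -> float:
    """Express n in millions, rounded to `decimals` decimal places."""
    return round(n / 1_000_000, decimals)


def inclusive_days(start: date, end: date) -> int:
    """Calendar days from start to end, counting both endpoints."""
    return (end - start).days + 1


if __name__ == "__main__":
    # 4046 * 0.91 = 3681.86, which rounds to 3682.
    print(precision_adjusted_count(4046, 0.91))   # 3682
    # 2,471,758 papers expressed as the headline "2.5 million".
    print(in_millions(2_471_758))                 # 2.5
    # Days elapsed between publication and the proof date.
    print((date(2026, 5, 20) - date(2026, 5, 9)).days)   # 11
    # Jan 1 to Feb 18, 2026 inclusive is 49 days, i.e. 7 weeks.
    print(inclusive_days(date(2026, 1, 1), date(2026, 2, 18)) // 7)   # 7
```

Note that the precision-adjusted figure is only a point estimate: the 91% precision was itself measured on a 500-entry validation sample, so the true count carries sampling uncertainty that this sketch does not model.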



