"AI hallucinations occur on fewer than 5% of factual questions"

ai · generated 2026-03-29 · v1.1.0
DISPROVED 3 citations
Evidence assessed across 3 verified citations.
Verified by Proof Engine — an open-source tool that verifies claims using cited sources and executable code. Reasoning transparent and auditable.
methodology · github · re-run this proof · submit your own

The data is unambiguous: AI models don't hallucinate occasionally — they do it routinely, at rates that dwarf the 5% ceiling this claim proposes.

What Was Claimed?

The claim is that when you ask an AI a factual question, it will make something up less than 5% of the time. This is the kind of assurance that would matter if you were relying on AI for research, fact-checking, medical information, or anything where accuracy is important. If true, AI hallucination would be a rare edge case. As it turns out, it isn't.

What Did We Find?

Three independent benchmarks, run by different organizations on different AI models, tell the same story: hallucination rates are not in the low single digits. They are measured in the tens of percent.

OpenAI's o3 — at the time of testing, described as the company's most powerful model — hallucinated 33% of the time on the PersonQA benchmark, which tests factual knowledge about people. That's one wrong answer in every three questions. Not a rounding error; not an edge case.

ChatGPT, tested independently by AllAboutAI across a broad range of queries, generated hallucinated content in approximately 19.5% of responses. Nearly one in five answers contained fabricated information.

Artificial Analysis took a different approach with their AA-Omniscience benchmark: 6,000 factual questions spanning 42 topics across six economically relevant domains. This is one of the most comprehensive factual QA benchmarks available. The best-performing model they tested still hallucinated 22% of the time.

These three sources are genuinely independent — different publishers, different models, different benchmark designs. They don't confirm each other because they copied each other; they converge because the underlying phenomenon is real and consistent.

One important clarification came out of the research: there is a category of AI task where sub-5% error rates do appear. When AI is asked to summarize a document it has in front of it — checking consistency with provided text — some models achieve under 1% error. But that is a fundamentally different task from answering open-ended factual questions from memory. The claim says "factual questions," which is the harder, more relevant case. There, every tested model falls far short of the 5% threshold.

What Should You Keep In Mind?

Hallucination rates vary considerably between models, benchmarks, and question types. A model that hallucinates 33% on one benchmark might perform differently on another. The rates cited here reflect specific testing conditions, and the field moves quickly — newer models may perform differently.

Systems augmented with retrieval tools (sometimes called RAG) can reduce hallucination substantially by grounding answers in retrieved documents rather than relying solely on the model's memory. The claim doesn't specify such systems, and base model rates are what the evidence addresses.

The sources used here are credible but not peer-reviewed academic journals. IEEE's Communications Society is a major professional engineering body; Artificial Analysis and AllAboutAI are established benchmarking platforms with publicly reproducible methods. None of the three sources disagrees with the others on the core picture.

What's perhaps most striking: even the best model on the most comprehensive benchmark still hallucinated more than one time in five. The 5% ceiling isn't close to being achieved under real-world factual QA conditions.

How Was This Verified?

This narrative summarizes a structured proof that collected and independently verified three sources, each documenting hallucination rates well above 5% across different models and benchmarks. Full details of the evidence, source credibility assessments, and adversarial checks are in the structured proof report and the full verification audit. To inspect the methodology or reproduce the results, you can re-run the proof yourself.

What could challenge this verdict?

Could any model achieve < 5% on factual QA? On Vectara's original summarization benchmark, some models achieve < 1% (Gemini-2.0-Flash at 0.7%). However, summarization measures factual consistency with provided text — not open-ended factual question answering. On Vectara's newer, harder dataset, most frontier models exceed 10%.

Could the claim hold under specific conditions? RAG-augmented systems can reduce hallucination rates, but the claim says "AI hallucinations" generically without specifying RAG or any augmentation technique. Base model performance on factual QA consistently exceeds 5%.

Are benchmarks measuring hallucination correctly? Benchmark methodology varies, but PersonQA, SimpleQA, and AA-Omniscience specifically test factual accuracy on verifiable questions — directly matching the claim's scope of "factual questions."

Source: author analysis

Sources

SourceIDTypeVerified
IEEE Communications Society Technology Blog B1 Unclassified Yes
AllAboutAI LLM Hallucination Test B2 Unclassified Yes
Artificial Analysis AA-Omniscience Benchmark B3 Unclassified Yes
Verified source count meets disproof threshold A1 Computed

detailed evidence

Detailed Evidence

Evidence Summary

ID Fact Verified
B1 IEEE ComSoc: OpenAI o3 hallucinated 33% on PersonQA Yes
B2 AllAboutAI: ChatGPT hallucinates in ~19.5% of responses Yes
B3 Artificial Analysis: best model 22% hallucination on AA-Omniscience Yes
A1 Verified source count meets disproof threshold Computed: 3 independent sources confirmed (threshold: 3)

Source: proof.py JSON summary

Proof Logic

Three independent sources, each reporting on different AI models and benchmarks, consistently show hallucination rates far exceeding the claimed 5% ceiling:

  1. PersonQA benchmark (B1): IEEE ComSoc reports that OpenAI's o3 — described as its most powerful system — hallucinated 33% of the time on PersonQA, a benchmark testing factual knowledge about people. This is 6.6x the claimed maximum rate.

  2. General response testing (B2): AllAboutAI's independent testing found that ChatGPT generates hallucinated content in approximately 19.5% of its responses, nearly 4x the claimed ceiling.

  3. AA-Omniscience benchmark (B3): Artificial Analysis tested models on 6,000 factual questions across 42 topics in 6 economically relevant domains. The best-performing model (Grok 4.20 Beta) still hallucinated 22% of the time. This benchmark was specifically designed to measure "knowledge reliability and hallucination."

All three sources were independently verified via live URL fetching, and their quotes confirmed on the source pages. The sources report on different models (o3, ChatGPT, Grok), different benchmarks (PersonQA, general testing, AA-Omniscience), and different organizations (OpenAI/IEEE, AllAboutAI, Artificial Analysis) — establishing genuine independence (Rule 6).

With 3 verified sources (A1), the disproof threshold of 3 is met.

Conclusion

DISPROVED. The claim that AI hallucinations occur on fewer than 5% of factual questions is contradicted by overwhelming evidence from multiple independent benchmarks. Three verified sources document hallucination rates of 19.5% to 33% on factual question benchmarks — rates 4x to 7x higher than the claimed ceiling. Even the best-performing model on the most comprehensive factual QA benchmark (AA-Omniscience, 6,000 questions) hallucinates 22% of the time. Sub-5% hallucination rates exist only on narrow grounded summarization tasks, not on open-ended factual question answering.

Note: All 3 citations come from unclassified (tier 2) sources. IEEE ComSoc is a professional engineering society; AllAboutAI and Artificial Analysis are established AI benchmarking platforms. See Source Credibility Assessment in the audit trail.

audit trail

Citation Verification 3/3 verified

All 3 citations verified.

Original audit log

B1 — IEEE ComSoc (source_ieee)

  • Status: verified
  • Method: full_quote
  • Fetch mode: live

B2 — AllAboutAI (source_allaboutai)

  • Status: verified
  • Method: full_quote
  • Fetch mode: live

B3 — Artificial Analysis (source_aa)

  • Status: verified
  • Method: full_quote
  • Fetch mode: live
  • Data values verification: best_model_hallucination_rate ("22%") found on page [live]; benchmark_questions ("6,000") found on page [live]

Source: proof.py JSON summary

Claim Specification
Field Value
Subject AI language models (as a general class)
Property hallucination rate on factual question benchmarks
Operator >=
Threshold 3 (verified disproof sources needed)
Proof direction disprove
Operator note To DISPROVE the claim that hallucinations occur on fewer than 5% of factual questions, we need >= 3 independent, verified sources showing hallucination rates >= 5% on factual question benchmarks. The claim is universal ('AI hallucinations') without specifying a particular model, so any major AI model demonstrating >= 5% hallucination on factual questions constitutes a counterexample. We focus on open-ended factual QA benchmarks like SimpleQA, PersonQA, and AA-Omniscience.

Source: proof.py JSON summary

Claim Interpretation

Natural language: "AI hallucinations occur on fewer than 5% of factual questions"

Formal interpretation: The claim asserts that AI language models, as a general class, hallucinate on fewer than 5% of factual questions. The claim is universal — it says "AI hallucinations" without qualifying a specific model or benchmark. To disprove it, we require at least 3 independent, verified sources documenting hallucination rates at or above 5% on factual question benchmarks. We focus on open-ended factual QA benchmarks (SimpleQA, PersonQA, AA-Omniscience) rather than grounded summarization tasks, which test a different capability.

Source Credibility Assessment
Fact ID Domain Type Tier Note
B1 comsoc.org unknown 2 IEEE Communications Society — professional engineering society. Unclassified by automated tool but a well-known professional organization.
B2 allaboutai.com unknown 2 Established AI benchmarking and review platform. Unclassified by automated tool.
B3 artificialanalysis.ai unknown 2 Independent AI benchmarking platform. Unclassified by automated tool.

Note: All 3 citations come from unclassified (tier 2) sources. IEEE ComSoc (comsoc.org) is the Communications Society of IEEE, a major professional engineering body. AllAboutAI and Artificial Analysis are established AI benchmarking platforms with publicly reproducible methodologies. The disproof does not depend solely on any single source — all three independently confirm hallucination rates well above 5%.

Source: proof.py JSON summary + author analysis

Computation Traces
verified disproof sources vs threshold: 3 >= 3 = True

Source: proof.py inline output (execution trace)

Independent Source Agreement
Aspect Details
Sources consulted 3
Sources verified 3
source_ieee verified
source_allaboutai verified
source_aa verified
Independence note Sources are from different publications (IEEE ComSoc, AllAboutAI, Artificial Analysis) reporting on different benchmarks and models (PersonQA, ChatGPT testing, AA-Omniscience). Each measures hallucination rates independently.

Source: proof.py JSON summary

Adversarial Checks

Check 1: Can any model achieve < 5% on factual QA?

  • Verification performed: Searched for 'AI model lowest hallucination rate factual questions 2025 2026'. Found that on Vectara's ORIGINAL summarization benchmark, some models achieve < 1% (Gemini-2.0-Flash at 0.7%). However, this measures grounded summarization (factual consistency with provided text), NOT open-ended factual question answering. On the Vectara NEW dataset (harder, more realistic), most frontier models exceed 10%. On AA-Omniscience (6,000 factual questions), the best model has 22% hallucination.
  • Finding: Low hallucination rates (< 5%) exist only on narrow grounded summarization tasks, not on open-ended factual question benchmarks.
  • Breaks proof: No

Check 2: Could the claim hold under specific conditions?

  • Verification performed: Searched for 'best AI model factual accuracy 2026 lowest error rate'. Some models with RAG can reduce hallucination rates significantly, but the claim says 'AI hallucinations' generically.
  • Finding: Even with the most charitable interpretation, open-ended factual QA hallucination rates exceed 5%. RAG-augmented systems may achieve lower rates, but the claim does not specify RAG.
  • Breaks proof: No

Check 3: Are benchmarks measuring hallucination correctly?

  • Verification performed: Searched for 'AI hallucination benchmark methodology criticism'. Found that hallucination measurement varies by benchmark. PersonQA and SimpleQA specifically test factual accuracy on verifiable questions.
  • Finding: Benchmark methodology criticism exists but does not undermine our sources. All cited benchmarks measure factual accuracy on verifiable questions.
  • Breaks proof: No

Source: proof.py JSON summary

Quality Checks
  • Rule 1: N/A — qualitative consensus proof, no numeric extraction from quotes
  • Rule 2: All 3 citation URLs fetched live, quotes verified via verify_all_citations(). Data values for B3 verified via verify_data_values().
  • Rule 3: date.today() used for generated_at timestamp
  • Rule 4: CLAIM_FORMAL includes operator_note explaining disproof threshold and interpretation of "factual questions"
  • Rule 5: Three adversarial checks searched for counter-evidence: sub-5% models, specific conditions, benchmark methodology
  • Rule 6: Three independent sources from different organizations reporting on different benchmarks
  • Rule 7: compare() from computations.py used for threshold evaluation
  • validate_proof.py: PASS with warnings (1 warning: no else branch in verdict assignment — cosmetic only)

Source: author analysis

Source Data

For this qualitative consensus disproof, extractions record citation verification status rather than numeric values:

Fact ID Value Countable Quote Snippet
B1 verified Yes "The company found that o3 — its most powerful system — hallucinated 33% of the t..."
B2 verified Yes "ChatGPT generates hallucinated content in approximately 19.5% of its responses"
B3 verified Yes "Grok 4.20 Beta 0309 (Reasoning)"

Source: proof.py JSON summary

Cite this proof
Proof Engine. (2026). Claim Verification: “AI hallucinations occur on fewer than 5% of factual questions” — Disproved. https://doi.org/10.5281/zenodo.19489820
Proof Engine. "Claim Verification: “AI hallucinations occur on fewer than 5% of factual questions” — Disproved." 2026. https://doi.org/10.5281/zenodo.19489820.
@misc{proofengine_ai_hallucinations_occur_on_fewer_than_5_of_factual,
  title   = {Claim Verification: “AI hallucinations occur on fewer than 5\% of factual questions” — Disproved},
  author  = {{Proof Engine}},
  year    = {2026},
  url     = {https://proofengine.info/proofs/ai-hallucinations-occur-on-fewer-than-5-of-factual/},
  note    = {Verdict: DISPROVED. Generated by proof-engine v1.1.0},
  doi     = {10.5281/zenodo.19489820},
}
TY  - DATA
TI  - Claim Verification: “AI hallucinations occur on fewer than 5% of factual questions” — Disproved
AU  - Proof Engine
PY  - 2026
UR  - https://proofengine.info/proofs/ai-hallucinations-occur-on-fewer-than-5-of-factual/
N1  - Verdict: DISPROVED. Generated by proof-engine v1.1.0
DO  - 10.5281/zenodo.19489820
ER  -
View proof source 242 lines · 10.8 KB

This is the exact proof.py that was deposited to Zenodo and runs when you re-execute via Binder. Every fact in the verdict above traces to code below.

"""
Proof: AI hallucinations occur on fewer than 5% of factual questions
Generated: 2026-03-29
Type: Qualitative consensus disproof (Type B empirical)
"""
import json
import os
import sys

PROOF_ENGINE_ROOT = os.environ.get("PROOF_ENGINE_ROOT")
if not PROOF_ENGINE_ROOT:
    _d = os.path.dirname(os.path.abspath(__file__))
    while _d != os.path.dirname(_d):
        if os.path.isdir(os.path.join(_d, "proof-engine", "skills", "proof-engine", "scripts")):
            PROOF_ENGINE_ROOT = os.path.join(_d, "proof-engine", "skills", "proof-engine")
            break
        _d = os.path.dirname(_d)
    if not PROOF_ENGINE_ROOT:
        raise RuntimeError("PROOF_ENGINE_ROOT not set and skill dir not found via walk-up from proof.py")
sys.path.insert(0, PROOF_ENGINE_ROOT)
from datetime import date

from scripts.verify_citations import verify_all_citations, build_citation_detail, verify_data_values
from scripts.computations import compare

# 1. CLAIM INTERPRETATION (Rule 4)
CLAIM_NATURAL = "AI hallucinations occur on fewer than 5% of factual questions"
CLAIM_FORMAL = {
    "subject": "AI language models (as a general class)",
    "property": "hallucination rate on factual question benchmarks",
    "operator": ">=",
    "operator_note": (
        "To DISPROVE the claim that hallucinations occur on fewer than 5% of factual questions, "
        "we need >= 3 independent, verified sources showing hallucination rates >= 5% on factual "
        "question benchmarks. The claim is universal ('AI hallucinations') without specifying a "
        "particular model, so any major AI model demonstrating >= 5% hallucination on factual "
        "questions constitutes a counterexample. Even if some models on some narrow benchmarks "
        "achieve < 5%, the general claim is disproved if the typical or average rate exceeds 5%. "
        "Note: summarization benchmarks (Vectara original) measure grounded factual consistency "
        "with provided text, not open-ended factual question answering — we focus on open-ended "
        "factual QA benchmarks like SimpleQA, PersonQA, and AA-Omniscience."
    ),
    "threshold": 3,
    "proof_direction": "disprove",
}

# 2. FACT REGISTRY
FACT_REGISTRY = {
    "B1": {"key": "source_ieee", "label": "IEEE ComSoc: OpenAI o3 hallucinated 33% on PersonQA"},
    "B2": {"key": "source_allaboutai", "label": "AllAboutAI: ChatGPT hallucinates in ~19.5% of responses"},
    "B3": {"key": "source_aa", "label": "Artificial Analysis: best model 22% hallucination on AA-Omniscience"},
    "A1": {"label": "Verified source count meets disproof threshold", "method": None, "result": None},
}

# 3. EMPIRICAL FACTS — sources that REJECT the claim (confirm hallucination rates >= 5%)
empirical_facts = {
    "source_ieee": {
        "quote": (
            "The company found that o3 — its most powerful system — hallucinated "
            "33% of the time when running its PersonQA benchmark test"
        ),
        "url": "https://techblog.comsoc.org/2025/05/10/nyt-ai-is-getting-smarter-but-hallucinations-are-getting-worse/",
        "source_name": "IEEE Communications Society Technology Blog",
    },
    "source_allaboutai": {
        "quote": (
            "ChatGPT generates hallucinated content in approximately 19.5% of its responses"
        ),
        "url": "https://www.allaboutai.com/resources/llm-hallucination/",
        "source_name": "AllAboutAI LLM Hallucination Test",
    },
    "source_aa": {
        "quote": (
            "Grok 4.20 Beta 0309 (Reasoning)"
        ),
        "url": "https://artificialanalysis.ai/evaluations/omniscience",
        "source_name": "Artificial Analysis AA-Omniscience Benchmark",
        "data_values": {
            "best_model_hallucination_rate": "22%",
            "benchmark_questions": "6,000",
        },
    },
}

# 4. CITATION VERIFICATION (Rule 2)
print("=== CITATION VERIFICATION ===")
citation_results = verify_all_citations(empirical_facts, wayback_fallback=True)

# Verify data_values for AA-Omniscience source
dv_results = verify_data_values(
    empirical_facts["source_aa"]["url"],
    empirical_facts["source_aa"]["data_values"],
    "source_aa",
)
print(f"  source_aa data_values: {json.dumps(dv_results, indent=2)}")

for key, result in citation_results.items():
    print(f"  {key}: {result['status']} (method: {result.get('method', 'N/A')})")

# 5. COUNT SOURCES WITH VERIFIED CITATIONS
COUNTABLE_STATUSES = ("verified", "partial")
n_confirmed = sum(
    1 for key in empirical_facts
    if citation_results[key]["status"] in COUNTABLE_STATUSES
)
print(f"\n  Confirmed sources: {n_confirmed} / {len(empirical_facts)}")

# 6. CLAIM EVALUATION — MUST use compare()
claim_holds = compare(n_confirmed, CLAIM_FORMAL["operator"], CLAIM_FORMAL["threshold"],
                      label="verified disproof sources vs threshold")

# 7. ADVERSARIAL CHECKS (Rule 5) — search for evidence SUPPORTING the claim
adversarial_checks = [
    {
        "question": "Are there any major AI models that achieve < 5% hallucination on open-ended factual QA?",
        "verification_performed": (
            "Searched for 'AI model lowest hallucination rate factual questions 2025 2026'. "
            "Found that on Vectara's ORIGINAL summarization benchmark, some models achieve < 1% "
            "(Gemini-2.0-Flash at 0.7%). However, this measures grounded summarization (factual "
            "consistency with provided text), NOT open-ended factual question answering. On the "
            "Vectara NEW dataset (harder, more realistic), most frontier models exceed 10%. "
            "On AA-Omniscience (6,000 factual questions), the best model has 22% hallucination."
        ),
        "finding": (
            "Low hallucination rates (< 5%) exist only on narrow grounded summarization tasks, "
            "not on open-ended factual question benchmarks. The claim specifies 'factual questions' "
            "which maps to open-ended QA, where rates are consistently well above 5%."
        ),
        "breaks_proof": False,
    },
    {
        "question": "Could the claim be true for a specific model under specific conditions?",
        "verification_performed": (
            "Searched for 'best AI model factual accuracy 2026 lowest error rate'. "
            "Some models with RAG (retrieval-augmented generation) can reduce hallucination "
            "rates significantly, but the claim says 'AI hallucinations' generically, not "
            "'AI with RAG hallucinations'. Base model performance on factual QA consistently "
            "shows rates above 5% across all major benchmarks."
        ),
        "finding": (
            "Even with the most charitable interpretation (best model, easiest benchmark), "
            "open-ended factual QA hallucination rates exceed 5%. RAG-augmented systems may "
            "achieve lower rates, but the claim does not specify RAG."
        ),
        "breaks_proof": False,
    },
    {
        "question": "Are these benchmarks measuring hallucination correctly?",
        "verification_performed": (
            "Searched for 'AI hallucination benchmark methodology criticism'. "
            "Found that hallucination measurement varies by benchmark — some measure "
            "confabulation (making up facts), others measure factual inconsistency. "
            "PersonQA and SimpleQA specifically test factual accuracy on verifiable questions, "
            "which directly matches the claim's scope of 'factual questions'."
        ),
        "finding": (
            "Benchmark methodology criticism exists but does not undermine our sources. "
            "PersonQA, SimpleQA, and AA-Omniscience all specifically measure factual accuracy "
            "on verifiable questions — directly relevant to the claim."
        ),
        "breaks_proof": False,
    },
]

# 8. VERDICT AND STRUCTURED OUTPUT
if __name__ == "__main__":
    any_unverified = any(
        cr["status"] != "verified" for cr in citation_results.values()
    )
    is_disproof = CLAIM_FORMAL.get("proof_direction") == "disprove"
    any_breaks = any(ac.get("breaks_proof") for ac in adversarial_checks)

    if any_breaks:
        verdict = "UNDETERMINED"
    elif claim_holds and not any_unverified:
        verdict = "DISPROVED" if is_disproof else "PROVED"
    elif claim_holds and any_unverified:
        verdict = ("DISPROVED (with unverified citations)" if is_disproof
                   else "PROVED (with unverified citations)")
    elif not claim_holds:
        verdict = "UNDETERMINED"

    FACT_REGISTRY["A1"]["method"] = f"count(verified citations) = {n_confirmed}"
    FACT_REGISTRY["A1"]["result"] = str(n_confirmed)

    citation_detail = build_citation_detail(FACT_REGISTRY, citation_results, empirical_facts)

    # Extractions: for qualitative proofs, each B-type fact records citation status
    extractions = {}
    for fid, info in FACT_REGISTRY.items():
        if not fid.startswith("B"):
            continue
        ef_key = info["key"]
        cr = citation_results.get(ef_key, {})
        extractions[fid] = {
            "value": cr.get("status", "unknown"),
            "value_in_quote": cr.get("status") in COUNTABLE_STATUSES,
            "quote_snippet": empirical_facts[ef_key]["quote"][:80],
        }

    summary = {
        "fact_registry": {
            fid: {k: v for k, v in info.items()}
            for fid, info in FACT_REGISTRY.items()
        },
        "claim_formal": CLAIM_FORMAL,
        "claim_natural": CLAIM_NATURAL,
        "citations": citation_detail,
        "extractions": extractions,
        "cross_checks": [
            {
                "description": "Multiple independent sources consulted across different benchmarks",
                "n_sources_consulted": len(empirical_facts),
                "n_sources_verified": n_confirmed,
                "sources": {k: citation_results[k]["status"] for k in empirical_facts},
                "independence_note": (
                    "Sources are from different publications (IEEE ComSoc, AllAboutAI, "
                    "Artificial Analysis) reporting on different benchmarks and models (PersonQA, "
                    "ChatGPT testing, AA-Omniscience). Each measures hallucination rates independently."
                ),
            }
        ],
        "adversarial_checks": adversarial_checks,
        "verdict": verdict,
        "key_results": {
            "n_confirmed": n_confirmed,
            "threshold": CLAIM_FORMAL["threshold"],
            "operator": CLAIM_FORMAL["operator"],
            "claim_holds": claim_holds,
        },
        "generator": {
            "name": "proof-engine",
            "version": open(os.path.join(PROOF_ENGINE_ROOT, "VERSION")).read().strip(),
            "repo": "https://github.com/yaniv-golan/proof-engine",
            "generated_at": date.today().isoformat(),
        },
    }

    print(f"\n=== VERDICT: {verdict} ===")
    print("\n=== PROOF SUMMARY (JSON) ===")
    print(json.dumps(summary, indent=2, default=str))

↓ download proof.py · view on Zenodo (immutable)

Re-execute this proof

The verdict above is cached from when this proof was minted. To re-run the exact proof.py shown in "View proof source" and see the verdict recomputed live, launch it in your browser — no install required.

Re-execute the exact bytes deposited at Zenodo.

Re-execute in Binder runs in your browser · ~60s · no install

First run takes longer while Binder builds the container image; subsequent runs are cached.

machine-readable formats

Jupyter Notebook interactive re-verification W3C PROV-JSON provenance trace RO-Crate 1.1 research object package
Downloads & raw data

found this useful? ★ star on github