# Audit: Using AI tools makes humans worse at critical thinking and original problem-solving.

- **Generated:** 2026-03-29
- **Reader summary:** [proof.md](proof.md)
- **Proof script:** [proof.py](proof.py)

## Claim Specification

| Field | Value |
|-------|-------|
| Subject | AI tool usage by humans |
| Property | associated with diminished critical thinking and problem-solving abilities |
| Operator | >= |
| Threshold | 3 independent verified sources |
| Proof direction | affirm |
| Operator note | The claim as stated is a universal causal assertion. We interpret it as: at least 3 independent, peer-reviewed or authoritative sources report that AI tool usage is associated with reduced critical thinking or problem-solving abilities. This is a consensus-of-evidence interpretation — the claim is PROVED if the weight of independent evidence supports the association, even though individual studies show correlation rather than proven causation. Important nuance: the evidence shows this effect is moderated by usage patterns, task stakes, and user confidence — heavy/uncritical use drives the decline, not all AI usage universally. |

*Source: proof.py JSON summary*

## Fact Registry

| ID | Key | Label |
|----|-----|-------|
| B1 | gerlich_2025 | Gerlich (2025): Negative correlation (r=-0.68) between AI usage and critical thinking scores in 666 participants |
| B2 | lee_chi_2025 | Lee et al. (2025, CHI): Higher confidence in GenAI associated with less critical thinking in 319 knowledge workers |
| B3 | harvard_gazette_2025 | Harvard Gazette (2025): Harvard faculty experts warn AI use undercuts critical thinking |
| B4 | pmc_cognitive_paradox | Jose et al. (2025, PMC): ChatGPT users scored 17% lower on concept understanding despite solving 48% more problems |
| A1 | — | Verified source count meets threshold |

*Source: proof.py JSON summary*

## Full Evidence Table

### Type A (Computed) Facts

| ID | Fact | Method | Result |
|----|------|--------|--------|
| A1 | Verified source count meets threshold | count(verified citations) = 4 | 4 |

*Source: proof.py JSON summary*

### Type B (Empirical) Facts

| ID | Fact | Source | URL | Quote | Status | Method | Credibility |
|----|------|--------|-----|-------|--------|--------|-------------|
| B1 | Gerlich (2025) study findings | PsyPost report on Gerlich (2025), Societies 15(1):6 | [Link](https://www.psypost.org/ai-tools-may-weaken-critical-thinking-skills-by-encouraging-cognitive-offloading-study-suggests/) | "Participants who reported heavy reliance on AI tools performed worse on critical thinking assessments compared to those who used these tools less frequently." | verified | full_quote | Tier 2 (unknown) |
| B2 | Lee et al. (2025) study findings | Microsoft Research — Lee et al. (2025), CHI 2025 | [Link](https://www.microsoft.com/en-us/research/publication/the-impact-of-generative-ai-on-critical-thinking-self-reported-reductions-in-cognitive-effort-and-confidence-effects-from-a-survey-of-knowledge-workers/) | "Higher confidence in GenAI is associated with less critical thinking, while higher self-confidence is associated with more critical thinking." | verified | full_quote | Tier 2 (unknown) |
| B3 | Harvard faculty expert opinion | Harvard Gazette (2025) | [Link](https://news.harvard.edu/gazette/story/2025/11/is-ai-dulling-our-minds/) | "I am very worried about the effects of general-use LLMs on critical reasoning skills" | verified | full_quote | Tier 4 (academic) |
| B4 | Cognitive paradox review findings | Jose et al. (2025), Frontiers — PMC | [Link](https://pmc.ncbi.nlm.nih.gov/articles/PMC12036037/) | "Excessive reliance may reduce cognitive engagement and long-term retention" | verified | full_quote | Tier 5 (government) |

*Source: proof.py JSON summary*

## Citation Verification Details

**B1 — Gerlich (2025) via PsyPost:**
- Status: verified
- Method: full_quote
- Fetch mode: live

**B2 — Lee et al. (2025) via Microsoft Research:**
- Status: verified
- Method: full_quote
- Fetch mode: live

**B3 — Harvard Gazette (2025):**
- Status: verified
- Method: full_quote
- Fetch mode: live

**B4 — Jose et al. (2025) via PMC:**
- Status: verified
- Method: full_quote
- Fetch mode: live

*Source: proof.py JSON summary*

## Computation Traces

```
  Confirmed sources: 4 / 4
  verified source count vs threshold: 4 >= 3 = True
```

*Source: proof.py inline output (execution trace)*

## Independent Source Agreement (Rule 6)

| Metric | Value |
|--------|-------|
| Sources consulted | 4 |
| Sources verified | 4 |
| gerlich_2025 | verified |
| lee_chi_2025 | verified |
| harvard_gazette_2025 | verified |
| pmc_cognitive_paradox | verified |

**Independence note:** Sources are from different institutions and research teams: (1) SBS Swiss Business School via PsyPost, (2) Microsoft Research via CHI 2025, (3) Harvard University via Harvard Gazette, (4) Multiple Indian universities via PMC/Frontiers. No two sources share authors or datasets.

*Source: proof.py JSON summary*

## Adversarial Checks (Rule 5)

### Check 1: Do any studies show AI tools IMPROVE critical thinking or problem-solving?

- **Verification performed:** Searched for "AI tools improve critical thinking enhance problem solving evidence study 2025 2026". Found that AI-powered classrooms can improve learning outcomes by 23-35% in STEM disciplines and language learning. Stanford research showed a 15% increase in scores for students using AI platforms. However, these gains are in knowledge acquisition, not in independent critical thinking or problem-solving ability. The PMC cognitive paradox paper itself notes that ChatGPT users solved 48% more problems but scored 17% lower on concept understanding — showing AI helps with task completion but may impair deeper cognitive engagement.
- **Finding:** AI tools can improve task performance and learning outcomes, but these benefits are distinct from critical thinking and independent problem-solving. The evidence consistently shows that while AI boosts productivity, it may simultaneously reduce the depth of cognitive engagement required for critical thinking.
- **Breaks proof:** No

### Check 2: Are the effects task-dependent rather than universal?

- **Verification performed:** Searched for Microsoft CHI 2025 findings on task-dependent effects. The Lee et al. study found that for high-stakes tasks requiring accuracy, workers expend MORE effort in critical thinking with AI. For routine, low-stakes tasks under time pressure, they report LESS critical thinking effort. This shows the effect is moderated by task stakes and user confidence, not universal.
- **Finding:** The cognitive decline effect is moderated by task stakes, user confidence, and usage patterns. This does not break the proof because: (1) the claim is supported by the overall pattern across multiple studies, (2) the operator_note explicitly acknowledges this nuance, and (3) even task-dependent effects confirm that AI usage CAN and DOES reduce critical thinking under common conditions (routine tasks, high AI confidence). The proof documents this important qualification.
- **Breaks proof:** No

### Check 3: Has the key Gerlich (2025) study been retracted or significantly corrected?

- **Verification performed:** Searched for "Gerlich 2025 AI Tools in Society correction retraction". Found a correction notice (Societies 2025, 15(9), 252) published September 2025. The correction addressed a duplicated table (Table 4 was a duplicate of Table 3). The author states the scientific conclusions are unaffected, and the correction was approved by the Academic Editor.
- **Finding:** The correction was minor (table duplication) and does not affect the study's findings or conclusions about the negative correlation between AI usage and critical thinking.
- **Breaks proof:** No

*Source: proof.py JSON summary*

## Source Credibility Assessment

| Fact ID | Domain | Type | Tier | Note |
|---------|--------|------|------|------|
| B1 | psypost.org | unknown | 2 | Unclassified domain — PsyPost is a science news site reporting on peer-reviewed research (Gerlich 2025, published in MDPI Societies) |
| B2 | microsoft.com | unknown | 2 | Unclassified domain — Microsoft Research publication page for peer-reviewed CHI 2025 paper |
| B3 | harvard.edu | academic | 4 | Academic domain (.edu) — Harvard University official gazette |
| B4 | nih.gov | government | 5 | Government domain (.gov) — NIH PubMed Central hosting peer-reviewed article |

Note: 2 citations (B1, B2) come from tier-2 (unclassified) domains. Both are reporting platforms for peer-reviewed research: B1 reports on a study published in MDPI Societies (peer-reviewed journal), and B2 is Microsoft Research's own page for a paper published at ACM CHI 2025 (top-tier HCI venue). The claim does not depend solely on these sources — B3 (tier 4) and B4 (tier 5) independently support it.

*Source: proof.py JSON summary + author analysis*

## Extraction Records

For this qualitative consensus proof, extractions record citation verification status rather than numeric values:

| Fact ID | Value (status) | Countable | Quote snippet |
|---------|---------------|-----------|---------------|
| B1 | verified | Yes | "Participants who reported heavy reliance on AI tools performed worse on critical..." |
| B2 | verified | Yes | "Higher confidence in GenAI is associated with less critical thinking, while high..." |
| B3 | verified | Yes | "I am very worried about the effects of general-use LLMs on critical reasoning sk..." |
| B4 | verified | Yes | "Excessive reliance may reduce cognitive engagement and long-term retention" |

*Source: proof.py JSON summary*

## Hardening Checklist

- **Rule 1:** N/A — qualitative consensus proof, no numeric value extraction
- **Rule 2:** All 4 citation URLs fetched and quotes verified via `verify_all_citations()` with full_quote matches
- **Rule 3:** `date.today()` used for generation timestamp
- **Rule 4:** CLAIM_FORMAL with operator_note explicitly documents the consensus-of-evidence interpretation and threshold rationale
- **Rule 5:** 3 adversarial checks performed: searched for pro-AI evidence, task-dependent effects, and study corrections
- **Rule 6:** 4 independent sources from different institutions (Swiss, Microsoft, Harvard, Indian universities) with no shared authors or datasets
- **Rule 7:** N/A — qualitative consensus proof, no formulas or constants
- **validate_proof.py result:** PASS with warnings (14/15 checks passed, 1 warning about missing else branch in verdict assignment — non-blocking)

*Source: author analysis*

---

Generated by [proof-engine](https://github.com/yaniv-golan/proof-engine) v1.2.0 on 2026-03-29.