The TLDR
I saw an AI write a full research paper. It got accepted with zero human edits.
1 in 5 researchers are already using AI for peer reviews while retractions are piling up.
The AI looked smart but still botched a basic citation. No one caught it.
I’m all for progress, but if we outsource judgment, peer review stops meaning anything.
I’m Both Impressed and Alarmed
I watched an AI write a complete research paper and push it through peer review at ICLR’s “I Can’t Believe It’s Not Better” workshop.
Sakana’s system handled:
The hypothesis
Code
Data
Prose
Then it landed an acceptance with no human edits.
BUT, I think it's important to point out that the workshop track sets a lighter bar than the main conference.
It still has human reviewers who check relevance and rigour, but the caveat is worth keeping in mind.
Clearing that bar proves an autonomous model can navigate real scholarly gatekeeping, not just toy benchmarks.
The Early Signs Were Everywhere
A Wiley survey of nearly 5,000 researchers showed that 19% are already using large language models to speed up peer review.
Stanford's James Zou reported that up to 17% of recent peer reviews at major computer-science venues contain AI-generated text that goes far beyond spelling fixes.
Those numbers tell me the community was priming itself for automation long before Sakana's demo.
Retractions Keep Climbing
Last year, journals retracted more than ten thousand papers!
10,000.
That’s a lot.
In fact, it’s the steepest spike on record.
Eight percent of Dutch scientists admitted to falsifying or fabricating data at least once during a recent three-year window.
These stats fuel the argument that humans alone are struggling to police integrity at scale.
I Think It's a Dangerous Game
LLMs lack domain intuition.
They flag style issues instantly but cannot judge whether a negative result matters or whether a dataset violates ethics rules. If editors defer to the machine on substance, the peer-review filter stops being a filter.
I see three takeaways.
First, fully automated authorship has crossed the proof-of-concept line. The next iteration will aim for the main conference track, and someone will eventually hit that mark.
Second, peer review itself is already part human, part AI. The trend will accelerate because the incentives line up for speed and cost savings.
Third, scholarly trust will hinge on transparent disclosure. If reviewers, editors, or authors use AI, readers deserve to know how and where. That is the only way to keep accountability intact.
A Path Forward
Personally, I want journals to publish clear AI-usage policies that separate acceptable assistance from unacceptable delegation.
Grammar fixes are fine.
Final judgment on novelty, ethics, or statistical validity must stay in human hands.
I want reviewers trained to read AI-generated critiques critically, not copy-paste them. Credit reviewers who invest real effort, and weed out those who phone it in with automated praise.
I also want better provenance tools that can trace a passage back to its original source.
Embedding cryptographic signatures in text or code could show exactly which passages came from which source, whether human or model. Someone say blockchain?
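To make that concrete, here's a minimal sketch of what passage-level signing could look like, using Ed25519 keys from the third-party Python `cryptography` package. The workflow, variable names, and example passage are illustrative assumptions on my part, not an existing standard.

```python
# A minimal sketch of passage-level provenance: each contributor (human or
# model) signs the passages it produced with a private key, and anyone can
# verify attribution later with the matching public key.
# Requires: pip install cryptography. Names here are illustrative.
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

# In practice each author or model would hold its own long-lived key pair.
author_key = Ed25519PrivateKey.generate()
author_pub = author_key.public_key()

passage = "We observe that the regularizer degrades generalization."
signature = author_key.sign(passage.encode("utf-8"))

# Later, a reader (or journal) checks which source a passage came from.
try:
    author_pub.verify(signature, passage.encode("utf-8"))
    print("Passage verified: signed by this key holder.")
except InvalidSignature:
    print("Passage was altered or came from a different source.")
```

A real scheme would also need key registries and a way to bind signatures to specific text spans, which is where the blockchain crowd usually enters the chat.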
I Don't Think There's a Rollback Plan
AI can relieve the mundane.
It can scan thousands of references, flag shaky stats, and surface conflicting data. Used wisely, it frees experts to focus on insight instead of clerical chores.
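As one concrete example of "flag shaky stats": tools in the spirit of statcheck recompute reported p-values from the test statistics printed in a paper. Here's a toy Python sketch of that idea; the function name, tolerance, and example numbers are my own illustrative assumptions, not a published method.

```python
# Toy consistency check: does a reported two-sided p-value agree with the
# p-value implied by the reported t-statistic and degrees of freedom?
from scipy import stats

def flag_inconsistent_p(t_stat: float, df: int, reported_p: float,
                        tol: float = 0.005) -> bool:
    """Return True if the reported p-value disagrees with the recomputed one."""
    recomputed_p = 2 * stats.t.sf(abs(t_stat), df)  # two-sided p from |t|
    return abs(recomputed_p - reported_p) > tol

# Example: a paper reports t(28) = 2.10, p = .001.
# The implied p is roughly .045, so the reported value gets flagged.
print(flag_inconsistent_p(t_stat=2.10, df=28, reported_p=0.001))  # True
```

An assistant quietly running thousands of checks like this per submission is exactly the kind of clerical vigilance worth automating.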
The Sakana experiment shows the upside and the cliff edge in the same breath. An autonomous system can assemble a credible narrative, but it can also smuggle errors past reviewers who assume the model never blinks.
I refuse to treat this as an either-or choice.
We can embrace AI assistance while enforcing guardrails that preserve the meaning of authorship and review.
Where You Can Dig Deeper
For those who are interested, here are the papers:
Compositional Regularization: Unexpected Obstacles in Enhancing Neural Network Generalization
Real-World Challenges in Pest Detection Using Deep Learning: An Investigation into Failures and Solutions
Unveiling the Impact of Label Noise on Model Calibration in Deep Learning