
SakanaAI’s AI Scientist-v2: The Autonomous Research System That Wrote a Peer-Reviewed Paper

For decades, scientific discovery has been a fundamentally human endeavor, driven by curiosity, intuition, and the painstaking process of formulating hypotheses, running experiments, and writing papers. But a new system from SakanaAI is challenging that assumption in a very concrete way: it just produced the first workshop paper written entirely by an AI and accepted through peer review.

The system is called AI Scientist-v2, and it is a generalized, end-to-end agentic framework for automated scientific research. Unlike its predecessor (v1), which relied on human-authored templates, v2 removes all such scaffolding and employs a progressive agentic tree search guided by an experiment manager agent to explore and discover across the full landscape of machine learning research.

How AI Scientist-v2 Works

At its core, the system operates in three autonomous phases. First, it generates hypotheses by analyzing existing literature and identifying gaps or promising new angles. It does this by interacting with tools like Semantic Scholar to check novelty against the existing body of research. Second, it runs experiments, actually executing code written by large language models to test those hypotheses. Third, it analyzes results and synthesizes everything into a complete scientific manuscript, ready for submission.
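The three-phase loop can be sketched as follows. This is a minimal, hypothetical illustration: every name here is an illustrative stand-in, not the actual AI Scientist-v2 API, and the real system delegates each step to LLM agents and external tools such as Semantic Scholar.

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    statement: str
    novel: bool  # in the real system, novelty is checked against the literature

def generate_hypotheses(topic: str) -> list[Hypothesis]:
    # Phase 1: propose candidate hypotheses and screen them for novelty
    # (placeholder logic; an LLM would do this from the literature on `topic`)
    return [Hypothesis(f"{topic}: idea {i}", novel=(i % 2 == 0)) for i in range(4)]

def run_experiment(h: Hypothesis) -> dict:
    # Phase 2: execute LLM-written experiment code; here, a placeholder score
    return {"hypothesis": h.statement, "score": len(h.statement)}

def write_manuscript(results: list[dict]) -> str:
    # Phase 3: analyze results and synthesize them into a manuscript
    best = max(results, key=lambda r: r["score"])
    return f"Paper on: {best['hypothesis']}"

def research_cycle(topic: str) -> str:
    novel = [h for h in generate_hypotheses(topic) if h.novel]
    return write_manuscript([run_experiment(h) for h in novel])
```

The point of the sketch is the shape of the pipeline, idea to experiment to paper, with no human step in between.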

The key architectural innovation is the progressive agentic tree search. Rather than following a linear pipeline, the experiment manager agent explores branches of the hypothesis tree dynamically, deciding which directions are most promising and worth pursuing further. This gives the system a much more open-ended, exploratory capability than traditional automated research systems.
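A best-first expansion loop conveys the flavor of this kind of tree search. The scoring and branching below are toy stand-ins for the experiment manager agent's judgment, not SakanaAI's implementation; in the real system, "expanding" a node means running and evaluating an actual experiment.

```python
import heapq

def tree_search(root, expand, score, budget=10):
    """Repeatedly expand the most promising node until the budget runs out."""
    counter = 0  # tie-breaker so the heap never compares nodes directly
    frontier = [(-score(root), counter, root)]  # max-heap via negated scores
    best = root
    while frontier and budget > 0:
        neg_score, _, node = heapq.heappop(frontier)
        if -neg_score > score(best):
            best = node
        for child in expand(node):  # branch: refine or vary the hypothesis
            counter += 1
            heapq.heappush(frontier, (-score(child), counter, child))
        budget -= 1
    return best
```

Unlike a linear pipeline, nothing forces the search down a single branch: a weak line of inquiry simply stops being selected, while promising branches keep getting expanded.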

As the AI Scientist-v2 team notes: “This system autonomously generates hypotheses, runs experiments, analyzes data, and writes scientific manuscripts.” It supports multiple model backends including OpenAI models, Google Gemini (via OpenAI API compatibility), and Claude models through Amazon Bedrock.

The AI Scientist-v2 uses a progressive agentic tree search guided by an experiment manager agent to autonomously explore the space of scientific hypotheses.

What It Produced, and Why It Matters

The resulting paper, produced entirely without human writing, was submitted to an academic workshop and accepted through that venue's peer-review process. This is a milestone, even if the paper is not destined for a top-tier journal. It demonstrates that the pipeline from idea to publication, the core workflow of academic research, is increasingly automatable.

It is worth noting that the system has tradeoffs compared to v1. The template-free, exploratory approach of v2 produces a broader range of ideas but has a lower overall success rate than v1, which benefits from well-defined starting templates. For tasks with clear objectives and a solid foundation, v1 remains superior. But v2 shines in open-ended scientific exploration: exactly the regime where human creativity has traditionally been most essential.

Real Risks Require Real Caution

The AI Scientist-v2 team is unusually forthright about the dangers. Their GitHub repo carries a prominent caution notice:

This codebase will execute Large Language Model (LLM)-written code. There are various risks and challenges associated with this autonomy, including the potential use of dangerous packages, uncontrolled web access, and the possibility of spawning unintended processes. Ensure that you run this within a controlled sandbox environment (e.g., a Docker container).

This is not boilerplate. Running LLM-generated code autonomously in an uncontrolled environment is genuinely dangerous: code could install malicious packages, exfiltrate data, or compromise systems. Anyone using this system needs to understand those risks fully.
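To make the isolation idea concrete, here is a minimal sketch of running untrusted code in a separate process with a hard timeout. To be clear, a subprocess is not a real sandbox, and this is not how AI Scientist-v2 does it; the repo's actual guidance is to run the whole system inside a container such as Docker. This only illustrates the principle of never executing generated code in your main process.

```python
import os
import subprocess
import sys
import tempfile

def run_untrusted(code: str, timeout: int = 10) -> str:
    """Execute a string of (potentially LLM-written) Python in a separate,
    isolated interpreter process, killed after `timeout` seconds.
    NOTE: this is NOT a security boundary; use a container for real isolation."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        # -I: isolated mode (ignores user site-packages and environment vars)
        result = subprocess.run(
            [sys.executable, "-I", path],
            capture_output=True, text=True, timeout=timeout,
        )
        return result.stdout
    finally:
        os.unlink(path)
```

Even this weak form of isolation catches the most common failure mode of autonomous code execution, a runaway or hung process, via the timeout.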

AI Scientist-v2: from hypothesis generation to published paper, all without human authorship.

The Bigger Picture

AI Scientist-v2 represents a significant step toward the long-anticipated goal of fully automated scientific discovery. The question is no longer whether AI can do science; it is whether the scientific community will adapt its institutions and norms to accommodate increasingly capable autonomous systems.

The first AI-authored peer-reviewed paper is a milestone worth noting. The next one may not be a workshop paper; it may be a Nature submission. And the one after that may raise uncomfortable questions about authorship, accountability, and the future of human scientific inquiry.

The full project, including code, paper, and experimental results, is available on GitHub.
