Ask any researcher what they wish they had more of, and the answer is almost always time. The pace of scientific discovery has always been limited by human bandwidth. Researchers spend years mastering narrow domains, writing grant proposals, running experiments, analyzing data, and wrestling results into publishable papers. Now, a new system from SakanaAI is pushing that boundary further than ever, and it did something remarkable: it wrote a paper that was accepted through peer review at an AI workshop, with no human authorship.
What Is AI Scientist-v2?
AI Scientist-v2 is a generalized, end-to-end agentic system that autonomously generates hypotheses, designs experiments, runs them, analyzes the results, and produces a complete scientific manuscript. It is the successor to the original AI Scientist, but with a crucial difference: version 2 removes all reliance on human-authored templates and generalizes across machine learning domains.
Where the first version worked best when given a strong starting template (following well-defined structures with high success rates), version 2 takes a broader, more exploratory approach. It uses a technique called progressive agentic tree search, guided by an experiment manager agent, to explore a space of scientific ideas the way a human researcher might: by forming hypotheses, testing them, learning from failures, and refining the next attempt.
The system generates research ideas from a high-level topic description, runs actual experiments using code it writes itself, produces plots and data analyses, and then synthesizes everything into a formatted scientific paper with citations, related work, and methodology sections.
The Agentic Tree Search Engine
At the heart of AI Scientist-v2 is a best-first tree search (BFTS) algorithm. The system starts with multiple independent search trees and explores them in parallel. Each node in the tree represents a partial or complete experimental result. The experiment manager agent decides which nodes to expand, when to debug a failing experiment, and when a particular line of inquiry has reached a natural conclusion.
This approach is designed to handle the open-ended nature of scientific exploration. Real science does not follow a linear script. Experiments fail. Hypotheses turn out to be wrong. Sometimes a wrong hypothesis opens the door to a more interesting discovery. AI Scientist-v2 is built to navigate that mess, not just execute a fixed pipeline.
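The best-first loop can be sketched in a few lines of Python. This is an illustrative toy, not SakanaAI's code: `expand` and `score` are placeholder functions standing in for the experiment manager agent's LLM-driven node expansion and evaluation, and the toy "experiment" just searches for a target number.

```python
import heapq
import itertools

def best_first_tree_search(roots, expand, score, budget=20):
    """Best-first search over multiple independent trees: always expand
    the most promising frontier node until the node budget is spent."""
    counter = itertools.count()  # tie-breaker so the heap never compares states
    frontier = [(-score(r), next(counter), r) for r in roots]
    heapq.heapify(frontier)
    best_state, best_score = None, float("-inf")
    while frontier and budget > 0:
        neg, _, state = heapq.heappop(frontier)  # most promising node first
        budget -= 1
        if -neg > best_score:
            best_state, best_score = state, -neg
        for child in expand(state):  # e.g. tweak a hypothesis, retry a failed run
            heapq.heappush(frontier, (-score(child), next(counter), child))
    return best_state, best_score

# Toy stand-ins: states are numbers, score rewards proximity to a target of 42.
expand = lambda x: [x + 1, x * 2]
score = lambda x: -abs(42 - x)
state, s = best_first_tree_search([0, 10], expand, score, budget=200)
```

In the real system the scoring and expansion decisions are made by the experiment manager agent rather than fixed functions, but the control flow is the same: a shared frontier across several root trees, greedily expanded by promise.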
The system uses multiple AI models for different stages: a frontier model for writing and review, another for running experiments, and citation models to contextualize the work within existing literature. SakanaAI says using Claude 3.5 Sonnet for the experimentation phase typically costs around $15 to $20 per run, with an additional $5 for the writing phase.
A Peer-Accepted AI-Written Paper
The milestone that got the AI research community’s attention was not just that AI Scientist-v2 could produce a paper; it was that a paper it produced was accepted through peer review at an actual workshop. That paper, submitted to an ICLR workshop, went through human reviewers who evaluated it on scientific rigor, novelty, and clarity. An AI wrote it. Humans reviewed it. It passed.
That is a meaningful threshold. Workshop papers at top venues are typically short, focused contributions: the kind of paper that AI might plausibly generate given a strong idea. And SakanaAI is clear-eyed about the system's limitations: it does not always produce better papers than version 1, especially when a strong starting template is available. Version 1 follows well-defined templates with higher success rates. Version 2 is for open-ended exploration where the path forward is not obvious.
What It Runs (And Why You Should Be Careful)
AI Scientist-v2 executes LLM-written code. That is worth emphasizing. The system will generate Python code, run it against real machine learning models, produce actual experimental results, and use those results to write a paper. SakanaAI explicitly warns that there are risks: dangerous packages, uncontrolled web access, unintended processes. The company recommends running everything inside a controlled sandbox, specifically a Docker container, and treating the output with appropriate skepticism.
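As a sketch of what that Docker-based isolation can look like, the helper below builds a `docker run` command with networking disabled, resource caps, and a read-only mount of the generated script. The image name, limits, and file layout here are placeholder assumptions for illustration, not SakanaAI's actual configuration.

```python
import shlex

def sandboxed_run_command(image, script_path, workdir="/workspace"):
    """Build a `docker run` invocation that executes an LLM-written
    script in isolation. Illustrative only: image and limits are
    placeholders, not SakanaAI's setup."""
    cmd = [
        "docker", "run", "--rm",
        "--network=none",                            # no outbound web access
        "--memory=4g", "--cpus=2",                   # cap runaway processes
        "-v", f"{script_path}:{workdir}/run.py:ro",  # mount the script read-only
        image,
        "python", f"{workdir}/run.py",
    ]
    return shlex.join(cmd)

cmd = sandboxed_run_command("python:3.11-slim", "/tmp/exp.py")
```

The key ideas are the ones SakanaAI's warning points at: no network, bounded resources, and an ephemeral container (`--rm`) so nothing the generated code does outlives the run.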
This is not a toy demo. It is a system that writes and runs arbitrary code in the pursuit of novel scientific results. SakanaAI’s license includes usage restrictions, and any resulting paper must clearly state that it was autonomously generated by AI.
What This Means for Science
If systems like AI Scientist-v2 continue to improve, the implications for scientific research are profound. Automated hypothesis generation could accelerate discovery in fields where the bottleneck is not data but researcher time. A system that can run hundreds of experimental variations in the time a human lab assistant might run a handful could compress months of work into hours.
The caveat, of course, is quality. A paper that clears a workshop review is not the same as a paper that advances a field. The question of whether automated science can produce genuinely novel insights remains open. Early signs suggest the system is most useful for exploring well-defined hypothesis spaces, not for the kind of conceptual leaps that historically mark major discoveries.
Getting Started
The codebase is publicly available on GitHub under a derivative of the Responsible AI License. Installation requires a Linux machine with NVIDIA GPUs, CUDA, and PyTorch. The process takes about an hour. Users need to provide API keys for whatever language models they want to use: OpenAI, Google Gemini, or Claude via AWS Bedrock are all supported.
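Before launching a run, it helps to confirm that at least one provider is configured. The check below looks for the providers' conventional environment variables; the helper name and the exact key names (particularly `GEMINI_API_KEY`) are assumptions for illustration, and Bedrock access additionally relies on standard AWS credentials.

```python
import os

# Conventional provider key names (assumed for illustration).
PROVIDER_KEYS = ["OPENAI_API_KEY", "GEMINI_API_KEY", "AWS_ACCESS_KEY_ID"]

def configured_providers(env=os.environ):
    """Return which provider credentials are present in the environment."""
    return [key for key in PROVIDER_KEYS if env.get(key)]
```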
The pipeline has two main stages: ideation and experiment. The ideation script takes a topic description and generates a set of research ideas, checking each against Semantic Scholar for novelty. The main pipeline then runs the experiments and produces a paper. All output goes into timestamped folders inside the experiments directory.
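The timestamped-folder convention can be illustrated with a small helper that creates one output directory per run. The naming scheme here is an assumption for illustration, not the repo's actual format.

```python
import re
from datetime import datetime
from pathlib import Path

def make_run_dir(base, idea_name):
    """Create <base>/<timestamp>_<slug> for one pipeline run.
    Illustrative: the real repo's naming may differ."""
    slug = re.sub(r"[^a-z0-9]+", "-", idea_name.lower()).strip("-")
    stamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
    run_dir = Path(base) / f"{stamp}_{slug}"
    run_dir.mkdir(parents=True, exist_ok=False)  # fail loudly on collision
    return run_dir
```

Keeping each run in its own timestamped folder means failed experiments, plots, and the final manuscript from one attempt never overwrite another's.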
AI Scientist-v2 is a compelling glimpse of where automated science is heading. It is not replacing researchers (not yet, and not for the hardest problems), but it is already doing work that would have seemed like science fiction five years ago. And the paper it wrote? It is peer-reviewed, published, and out there.