
AI Scientist-v2: SakanaAI’s Autonomous System Generates Peer-Reviewed Scientific Papers

SakanaAI has released AI Scientist-v2, a generalized end-to-end agentic system that has generated the first workshop paper written entirely by AI and accepted through peer review. The system autonomously generates hypotheses, runs experiments, analyzes data, and writes scientific manuscripts.

Beyond Templates: A New Approach to Autonomous Science

Unlike its predecessor (AI Scientist-v1), which relied on human-authored templates, AI Scientist-v2 removes this dependency, generalizes across machine learning domains, and employs a progressive agentic tree search guided by an experiment manager agent. This lets the system explore research directions that are not bounded by a predefined template.

The system uses a best-first tree search (BFTS) approach to exploration. The configuration allows users to set parameters like num_workers (number of parallel exploration paths) and steps (maximum number of nodes to explore). For example, if num_workers=3 and steps=21, the tree search explores up to 21 nodes, expanding 3 nodes concurrently at each step.
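To make the two parameters concrete, here is a minimal, greedy best-first sketch. It assumes each node is one experiment that yields a single score, and `run_experiment`, `toy_experiment`, and `Node` are hypothetical names, not the project's code. The first round drafts `num_workers` independent roots; every later round picks the `num_workers` highest-scoring nodes found so far and grows one child from each, in parallel, until `steps` nodes exist. It omits everything else the real system does at each node.

```python
import heapq
import random
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Node:
    node_id: int
    parent_id: Optional[int]  # None for a freshly drafted root
    score: float  # hypothetical metric from running this node's experiment


def best_first_tree_search(
    run_experiment: Callable[[int, Optional[int]], float],
    num_workers: int = 3,
    steps: int = 21,
) -> List[Node]:
    """Explore at most `steps` nodes, expanding up to `num_workers` nodes concurrently per round."""
    nodes: List[Node] = []
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        while len(nodes) < steps:
            batch_size = min(num_workers, steps - len(nodes))
            if nodes:
                # Best-first: grow children from the highest-scoring nodes seen so far.
                best = heapq.nlargest(batch_size, nodes, key=lambda n: n.score)
                parents: List[Optional[int]] = [n.node_id for n in best]
            else:
                # First round: draft independent root nodes.
                parents = [None] * batch_size
            ids = list(range(len(nodes), len(nodes) + len(parents)))
            # One experiment per worker, run in parallel; map() keeps input order.
            scores = pool.map(run_experiment, ids, parents)
            nodes.extend(Node(i, p, s) for i, p, s in zip(ids, parents, scores))
    return nodes


def toy_experiment(node_id: int, parent_id: Optional[int]) -> float:
    # Hypothetical stand-in for "write code, run it, read off a validation metric".
    return random.Random(node_id).random()


tree = best_first_tree_search(toy_experiment, num_workers=3, steps=21)
print(len(tree))  # 21
best = max(tree, key=lambda n: n.score)
print(best.node_id, best.parent_id, round(best.score, 3))
```

With `num_workers=3` and `steps=21` the loop runs seven rounds of three expansions each. Threads are used here because the expensive work in a real run would be waiting on LLM calls and experiment subprocesses.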

How It Works

The AI Scientist-v2 pipeline consists of several stages. First, the system uses an LLM to brainstorm and refine research ideas based on a high-level topic description, interacting with Semantic Scholar to check for novelty. Then, it runs experiments via agentic tree search, analyzing results and generating paper drafts.
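The novelty check can be pictured as: search the literature for the idea, then judge whether anything returned already covers it. The sketch below only builds a keyword-search URL for the public Semantic Scholar Graph API (`/graph/v1/paper/search`; the chosen `fields` are an assumption) and replaces the LLM's judgment with a crude, hypothetical title-similarity heuristic (`looks_novel`, threshold 0.8). It is an illustration of the idea, not how AI Scientist-v2 decides novelty, and it makes no network call.

```python
from difflib import SequenceMatcher
from typing import List
from urllib.parse import urlencode

S2_SEARCH = "https://api.semanticscholar.org/graph/v1/paper/search"


def build_search_url(query: str, limit: int = 10) -> str:
    """URL for a Semantic Scholar keyword search; the requested fields are an assumption."""
    params = {"query": query, "fields": "title,year,abstract", "limit": limit}
    return f"{S2_SEARCH}?{urlencode(params)}"


def looks_novel(idea_title: str, found_titles: List[str], threshold: float = 0.8) -> bool:
    """Hypothetical heuristic: reject an idea whose title closely matches an existing paper."""
    for title in found_titles:
        ratio = SequenceMatcher(None, idea_title.lower(), title.lower()).ratio()
        if ratio >= threshold:
            return False
    return True


url = build_search_url("adaptive dropout schedules")
existing = ["Adaptive Dropout Schedules for Transformers", "A Survey of Regularization"]
print(looks_novel("Adaptive dropout schedules for transformers", existing))  # False
print(looks_novel("Compositional regularization for small language models", existing))  # True
```

In the real system an LLM reads the returned titles and abstracts and makes this call, which catches overlaps that string similarity would miss.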

The ideation step writes its output to a JSON file of structured research ideas, each including hypotheses, proposed experiments, and a related-work analysis. The main pipeline then reads this JSON file to run experiments, analyze results, and produce a paper.
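The hand-off between ideation and experimentation is just structured data. The sketch below shows one way such a file could be produced and consumed; the `ResearchIdea` fields and the example idea are hypothetical and may not match the project's actual schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class ResearchIdea:
    # Hypothetical schema: the real ideas file may use different field names.
    name: str
    hypothesis: str
    experiments: List[str] = field(default_factory=list)
    related_work: str = ""


def ideas_to_json(ideas: List[ResearchIdea]) -> str:
    """Serialize ideas as an ideation step might before handing off to the main pipeline."""
    return json.dumps([asdict(idea) for idea in ideas], indent=2)


def ideas_from_json(text: str) -> List[ResearchIdea]:
    """Parse the ideas back into structured records for the experiment stage."""
    return [ResearchIdea(**record) for record in json.loads(text)]


ideas = [
    ResearchIdea(
        name="adaptive_dropout",  # illustrative idea, not one from the paper
        hypothesis="Scheduling dropout by training loss improves generalization on small datasets.",
        experiments=["baseline with fixed dropout", "loss-scheduled dropout"],
        related_work="Standard dropout; curriculum-based regularization.",
    )
]
text = ideas_to_json(ideas)
print(ideas_from_json(text)[0].name)  # adaptive_dropout
```

Keeping the ideas in a plain JSON file means a human can inspect or prune them before committing compute to experiments.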

A typical run using Claude 3.5 Sonnet for the experimentation phase costs around $15–20, with the subsequent writing phase adding approximately $5 when using default models. The complete pipeline typically finishes within several hours.
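Taking the figures above at face value, one run costs roughly $15–20 for experiments plus about $5 for writing, or $20–25 in total. A tiny helper makes that arithmetic explicit for a batch of runs; the constants are only the estimates quoted here, and actual spend will vary with models and run length.

```python
from typing import Tuple

EXPERIMENT_COST = (15.0, 20.0)  # USD per run, experimentation with Claude 3.5 Sonnet (article's estimate)
WRITEUP_COST = 5.0  # USD per run, writing phase with default models (approximate)


def estimate_cost(num_runs: int) -> Tuple[float, float]:
    """Low and high total cost in USD for `num_runs` end-to-end runs."""
    low = num_runs * (EXPERIMENT_COST[0] + WRITEUP_COST)
    high = num_runs * (EXPERIMENT_COST[1] + WRITEUP_COST)
    return low, high


print(estimate_cost(1))  # (20.0, 25.0)
print(estimate_cost(10))  # (200.0, 250.0)
```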

Real-World Results

The system has already produced a paper accepted at an academic workshop through peer review. The research team notes that v2 doesn’t necessarily produce better papers than v1, especially when a strong starting template is available. v1 follows well-defined templates, which leads to high success rates, while v2 takes a broader, more exploratory approach that may succeed less often but has greater potential for novelty.

v1 works best for tasks with clear objectives and solid foundations, whereas v2 is designed for open-ended scientific exploration where the research direction is less constrained.

Safety Considerations

The codebase will execute LLM-written code, which carries various risks including the potential use of dangerous packages, uncontrolled web access, and unintended process spawning. The developers strongly recommend running the system within a controlled sandbox environment such as a Docker container.
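The sketch below illustrates the two layers of that advice. `run_generated_code` is a hypothetical helper that runs generated code in a throwaway directory with a timeout and Python's isolated mode; that limits runaway loops but is not a sandbox. `docker_command` builds a `docker run` invocation using real Docker flags to cut network access and cap memory, CPU, and process count; the image name and limits are assumptions, not the project's settings.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import List, Optional, Tuple


def docker_command(script_dir: str, image: str = "python:3.11-slim") -> List[str]:
    """Build a `docker run` invocation that blocks networking and caps resources."""
    return [
        "docker", "run", "--rm",
        "--network", "none",  # no uncontrolled web access
        "--pids-limit", "64",  # limit runaway process spawning
        "--memory", "2g", "--cpus", "1",
        "-v", f"{script_dir}:/work",  # mount only the scratch directory
        "-w", "/work",
        image, "python", "experiment.py",
    ]


def run_generated_code(code: str, timeout_s: float = 60.0) -> Tuple[Optional[int], str]:
    """Run LLM-written code in a temporary directory with a timeout (NOT a real sandbox)."""
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "experiment.py"
        script.write_text(code, encoding="utf-8")
        try:
            result = subprocess.run(
                # -I: isolated mode, ignores PYTHON* env vars and the user site-packages dir
                [sys.executable, "-I", str(script)],
                cwd=workdir,
                capture_output=True,
                text=True,
                timeout=timeout_s,  # subprocess.run kills the child when this expires
            )
        except subprocess.TimeoutExpired:
            return None, "timed out"
        return result.returncode, result.stdout


print(run_generated_code("print('ok')"))  # (0, 'ok\n')
```

The local runner is only a convenience for trusted experiments; for code an LLM wrote, the container route (or stronger isolation) is what actually contains network access and process spawning.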

As AI systems become increasingly capable of autonomous scientific research, questions about oversight, safety, and the changing nature of scientific discovery become more pressing. AI Scientist-v2 represents a significant step toward fully autonomous research systems and may change how scientific discoveries are made.
