SakanaAI has released AI Scientist-v2, a system that autonomously generates scientific research papers, including the first fully AI-written paper to be accepted through peer review at an academic workshop.
The AI Scientist-v2 represents a fundamental shift in how scientific discoveries can be made. Unlike its predecessor, which relied on human-authored templates, this new version removes those constraints entirely and employs a progressive agentic tree search guided by an experiment manager agent. The system autonomously generates hypotheses, runs experiments, analyzes data, and writes complete scientific manuscripts.
How AI Scientist-v2 Works
The system operates through a sophisticated multi-stage pipeline. First, it uses an LLM to brainstorm and refine research ideas based on a high-level topic description, interacting with tools like Semantic Scholar to check for novelty. Then, the main AI Scientist-v2 pipeline runs experiments via agentic tree search, analyzes results, and generates a paper draft, all without human intervention.
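To make the idea of searching a tree of experiments concrete, here is a minimal sketch of a best-first tree search in Python. It is a toy stand-in, not SakanaAI's implementation: the `Node` class and the `propose` and `evaluate` callbacks are hypothetical names. In a real system, proposing children would mean asking an LLM for revised experiment code, and evaluating would mean running that code and scoring the result.

```python
import heapq
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass(order=True)
class Node:
    # heapq is a min-heap, so the negated score is stored as the priority.
    priority: float
    plan: str = field(compare=False)
    score: float = field(compare=False)
    depth: int = field(compare=False, default=0)
    parent: Optional["Node"] = field(compare=False, default=None)


def tree_search(
    root_plan: str,
    propose: Callable[[str], List[str]],   # hypothetical: returns child plans
    evaluate: Callable[[str], float],      # hypothetical: scores one plan
    max_expansions: int = 10,
) -> Node:
    """Repeatedly expand the highest-scoring unexplored node; return the best node seen."""
    root_score = evaluate(root_plan)
    root = Node(-root_score, root_plan, root_score)
    frontier = [root]
    best = root
    for _ in range(max_expansions):
        if not frontier:
            break
        node = heapq.heappop(frontier)
        for child_plan in propose(node.plan):
            child_score = evaluate(child_plan)
            child = Node(-child_score, child_plan, child_score, node.depth + 1, node)
            heapq.heappush(frontier, child)
            if child_score > best.score:
                best = child
    return best


# Toy problem: a "plan" is a string, and more "a" characters means a better plan.
def toy_propose(plan: str) -> List[str]:
    return [plan + "a", plan + "b"] if len(plan) < 3 else []


def toy_evaluate(plan: str) -> float:
    return plan.count("a")


best = tree_search("", toy_propose, toy_evaluate)
print(best.plan, best.score, best.depth)  # aaa 3 3
```

Following the `parent` links from the returned node back to the root recovers the chain of refinements that led to the best result, which is roughly the information a write-up stage would need.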
According to the SakanaAI team, ‘We are excited to introduce The AI Scientist-v2, a generalized end-to-end agentic system that has generated the first workshop paper written entirely by AI and accepted through peer review.’
Key Features
- Autonomous Hypothesis Generation: The system creates novel research hypotheses without human templates
- Self-Running Experiments: Executes LLM-written code to test hypotheses
- Data Analysis: Automatically processes and interprets experimental results
- Paper Writing: Generates complete scientific manuscripts in academic format
- Peer-Reviewed Output: Produces work that has passed workshop peer review
Technical Implementation
The system requires Linux with NVIDIA GPUs running CUDA and PyTorch. Installation involves creating a conda environment with Python 3.11, installing PyTorch with CUDA support, and setting up PDF and LaTeX tools for paper generation.
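Before installing, it can help to confirm that a machine meets these requirements. The following is a small preflight sketch using only the Python standard library plus an optional PyTorch import. The `check_environment` function is a hypothetical helper, not part of the project, and checking for `pdflatex` is an assumption about which LaTeX tool is needed.

```python
import importlib.util
import platform
import shutil
import sys


def check_environment() -> dict:
    """Report whether the host looks suitable; every value is a plain bool."""
    report = {
        "linux": platform.system() == "Linux",
        "python_3_11": sys.version_info[:2] == (3, 11),
        # Assumption: pdflatex stands in for "LaTeX tools" here.
        "pdflatex": shutil.which("pdflatex") is not None,
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": False,
    }
    if report["torch_installed"]:
        try:
            import torch

            report["cuda_available"] = bool(torch.cuda.is_available())
        except Exception:
            # A broken PyTorch install is treated the same as no CUDA.
            report["cuda_available"] = False
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name:16s} {'ok' if ok else 'MISSING'}")
```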
By default, the system uses OpenAI’s API for GPT models and can also utilize Google’s Gemini through OpenAI-compatible endpoints. For Claude models, users can configure Amazon Bedrock with appropriate AWS credentials.
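Because each provider needs its own credentials, a quick check before launching a long run can save a failed job. The sketch below maps a model name to a provider and lists any credentials that are missing. The environment variable names and the model-name matching are assumptions for illustration; consult the project's README for the variables it actually reads.

```python
import os
from typing import List, Mapping

# Assumed variable names; the project may expect different ones.
REQUIRED_ENV = {
    "openai": ["OPENAI_API_KEY"],
    "gemini": ["GEMINI_API_KEY"],
    "bedrock": ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"],
}


def provider_for(model: str) -> str:
    """Guess the provider from the model name (illustrative heuristic only)."""
    name = model.lower()
    if "claude" in name:
        return "bedrock"
    if "gemini" in name:
        return "gemini"
    return "openai"


def missing_credentials(model: str, env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variables that are unset or empty for this model."""
    return [var for var in REQUIRED_ENV[provider_for(model)] if not env.get(var)]


if __name__ == "__main__":
    # The model names below are placeholders.
    for model in ("gpt-4o", "gemini-pro", "claude-3-5-sonnet"):
        print(model, "->", missing_credentials(model) or "ready")
```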
Cost and Performance
The team estimates that ideation costs are generally low (a few dollars), while the main experiment pipeline typically costs around – per run when using Claude 3.5 Sonnet. The writing phase adds approximately $5 with default models.
The success rate depends on the foundation model chosen and the complexity of the research idea. Higher success rates are observed with more powerful models like Claude 3.5 Sonnet for the experimentation phase.
Implications for Scientific Research
The release of AI Scientist-v2 raises important questions about the future of scientific research. While the system produces papers that pass workshop-level peer review, the SakanaAI team acknowledges that ‘The AI Scientist-v2 doesn’t necessarily produce better papers than v1, especially when a strong starting template is available.’
v1 follows well-defined templates, leading to high success rates, while v2 takes a broader, more exploratory approach with lower success rates. v1 works best for tasks with clear objectives, whereas v2 is designed for open-ended scientific exploration.
Safety Considerations
The developers include a caution notice: ‘This codebase will execute Large Language Model (LLM)-written code. There are various risks and challenges associated with this autonomy, including the potential use of dangerous packages, uncontrolled web access, and the possibility of spawning unintended processes.’
Users are strongly advised to run the system within a controlled sandbox environment such as Docker.
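As one way to follow that advice, the sketch below assembles a `docker run` command with resource limits and optional network isolation. It only builds the argument list and does not execute anything. The image name, mount path, and script name are hypothetical placeholders. Because the pipeline calls hosted LLM APIs, fully disabling the network would likely block those calls, so the network switch is left as a parameter.

```python
import shlex
from typing import List


def sandbox_command(
    image: str,
    workdir: str,
    inner_cmd: List[str],
    allow_network: bool = False,
    use_gpu: bool = True,
    memory: str = "16g",
) -> List[str]:
    """Build (but do not run) a docker command that confines the workload."""
    argv = [
        "docker", "run", "--rm",        # remove the container when it exits
        "--memory", memory,             # cap RAM
        "--pids-limit", "512",          # limit the number of spawned processes
        "-v", f"{workdir}:/workspace",  # expose only one host directory
        "-w", "/workspace",
    ]
    if use_gpu:
        argv += ["--gpus", "all"]
    if not allow_network:
        argv += ["--network", "none"]   # no network access at all
    argv.append(image)
    argv += inner_cmd
    return argv


if __name__ == "__main__":
    # "ai-scientist:local" and "run.py" are placeholder names.
    cmd = sandbox_command("ai-scientist:local", "/tmp/run1", ["python", "run.py"])
    print(shlex.join(cmd))
```

Passing the returned list to `subprocess.run` would start the container; keeping the builder separate makes the limits easy to review before anything executes.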
Conclusion
AI Scientist-v2 represents a significant milestone in automated scientific discovery. While still requiring human oversight and safety precautions, it demonstrates that AI systems can now contribute meaningfully to the scientific research process, not just as tools, but as active participants in discovery itself.