Claw AI Lab is a lab-native multi-agent research platform for interactive and scalable AI-driven science. It enables users to create a full AI research lab from a single prompt, with customizable roles, research directions, and collaborative workflows, rather than relying on a single-agent or fixed serial pipeline. Claw orchestrates multiple agents and projects in parallel through a FIFO-based scheduling framework, maximizing compute utilization while supporting cross-project knowledge sharing and mutual improvement. Crucially, the system keeps humans in the loop: users can intervene whenever needed, provide feedback under ambiguity, inject new ideas, and iteratively refine the research process through rollback and continuation. Combined with a simple UI that reduces everything to prompts and clicks, Claw transforms automated research into a more intuitive, steerable, and laboratory-like experience.
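The FIFO-based scheduling mentioned above can be pictured with a minimal Python sketch. This is not Claw's actual scheduler: the function name `run_fifo`, the `(project, job)` task shape, and the worker count are all assumptions made for illustration. It only shows the first-in, first-out idea, where jobs from several projects enter one queue and a small worker pool starts them in submission order while running them in parallel.

```python
import queue
import threading


def run_fifo(tasks, num_workers=2):
    """Run (project, job) pairs in submission order across a pool of threads.

    Hypothetical helper for illustration only; not part of Claw AI Lab.
    Returns (results, start_order).
    """
    pending = queue.Queue()  # queue.Queue hands items out first-in, first-out
    for item in tasks:
        pending.put(item)

    results = {}
    start_order = []
    lock = threading.Lock()

    def worker():
        while True:
            # Dequeue and record the start under one lock so that
            # start_order always matches the submission order.
            with lock:
                try:
                    project, job = pending.get_nowait()
                except queue.Empty:
                    return  # queue drained, this worker is done
                start_order.append(project)
            outcome = job()  # jobs themselves run outside the lock, in parallel
            with lock:
                results[project] = outcome

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, start_order
```

With more workers, more jobs run at once, but the order in which they are picked up stays the order in which they were submitted.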
We welcome contributions from the community to make this project better together!
You are warmly invited to scroll to the bottom of the page to join our group for beta testing and discussion.
Launch projects, monitor agents, and inspect every artifact — all from a single interface.
Real-time event stream · Multi-project overview · One-click rollback & resume · Artifact inspector
| | Feature | Description |
|---|---|---|
| 🖥️ | Interactive UI | Real-time web dashboard with event stream, data shelf, and multi-project monitoring |
| 🧬 | Claw Code Harness | Reads your local codebases, datasets & checkpoints, and writes runnable code back to disk |
| 🔬 | End-to-End Pipeline | One prompt → paper + code + figures + experiment logs, fully autonomous |
| 🤝 | Three Research Modes | Explore · Discussion (multi-agent debate) · Reproduce |
Each project autonomously produces a full research deliverable: Paper · Code · Figures · Experiment Logs
| Project | Mode · Area · Task | Headline Result |
|---|---|---|
| OATH: Quantifying Video Hallucination via Occlusion Debt | Lab Explore · CV · Video Generation Evaluation | Best method achieves 0.1714 primary error vs. CLIP-T baseline 0.2393 (↓28%) |
| Reproducing PhyCustom on FLUX | Reproduce · Image Gen · Multi-Concept Customization | 5 methods × 3 seeds = 15 runs; output-space decoupling edges at 0.2813 |
Multi-agent discussion on: "What is the most deployable direction for Video Action Models in Embodied AI?"
Agent A — World Model + MPC (Model Predictive Control) is the most industrially stable path.
Agent B — "Train with video, infer with action" is the most deployable policy paradigm.
Agent C — Execution monitoring & SOP (Standard Operating Procedure) automation lands fastest as a product.
Consensus: The most deployable form is not a single end-to-end model, but a layered, modular system — use video supervision during training to learn rich dynamics, output actions directly at inference for low latency, and layer planning/MPC/safety modules on top for closed-loop robustness and recovery.
Top 3 Research Directions (ranked by deployability)

| # | Direction | Deployability |
|---|---|---|
| 1 | Layered Video-Action Stack: video-action joint training + direct action inference + MPC safety | Highest: best balance of latency, interpretability & safety |
| 2 | Video-to-Plan / SOP: demo videos → step sequences & skill graphs for existing robots | High: smallest embodiment gap, clearest commercial path |
| 3 | Execution Monitor: real-time step tracking, anomaly detection, re-planning triggers | High: fastest to production; critical for industrial reliability |

| Open Question | Resolution |
|---|---|
| World Model + MPC vs. Direct Action? | Combine both: world model for representation, direct action for control, MPC for safety |
| Human video: valuable or too much gap? | Pre-training yes; direct low-level transfer not yet reliable |
| Is monitoring a "real" action model? | Not the backbone, but fastest to reach production value |
→ Full Transcript · → Consensus Synthesis
Fill in the following configuration fields in examples/config_template.yaml:
Open http://localhost:5903/ → Submit your research topic and let the agents work.
| # | Tip | Why It Matters |
|---|---|---|
| 1 | Prepare local codebases, datasets & checkpoints, then enter their paths when submitting a project | Avoids download delays and network failures during runs |
| 2 | Use a strong coding model like GPT 5.4 | Significantly better code quality and fewer iteration cycles |
| 3 | Review the IMPORTANT fields in Configuration Details | Misconfigured API keys or resource limits are the #1 cause of failed runs |
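For the first tip, a quick pre-flight check can catch a mistyped path before a run starts. The sketch below is an assumption-laden illustration, not part of Claw AI Lab: `missing_paths` is a hypothetical helper and the example paths are placeholders. It uses only the Python standard library to report which of the local paths you plan to submit do not exist.

```python
from pathlib import Path


def missing_paths(paths):
    """Return the entries of `paths` that do not exist on disk, in input order.

    Hypothetical helper for illustration; Claw AI Lab may do its own checks.
    """
    return [p for p in paths if not Path(p).expanduser().exists()]


if __name__ == "__main__":
    # Placeholder paths; replace with your own codebase, dataset and checkpoint.
    candidates = ["./my_codebase", "./data/train", "./checkpoints/base.pt"]
    for p in missing_paths(candidates):
        print(f"missing: {p}")
```

Running it before submitting a project lists anything that still needs to be downloaded or moved into place.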
Every field in examples/config_template.yaml explained. Fields marked IMPORTANT are the ones you almost always need to set.
We learned from and reused code from the following projects: AutoResearchClaw, AutoResearch, claw-code.
We thank the authors for their contributions to the community!
MIT — see LICENSE for details.
If you find Claw AI Lab useful, please cite: