Code Generation
Developing AI systems that automatically generate code from specifications, design documents, and natural language descriptions, advancing program synthesis and software development automation.

Data Systems Group · MIT CSAIL
Our mission is to establish the engineering principles for AI-driven data systems. We move beyond ad-hoc experimentation to build a foundation of new abstractions, reusable software components, and rigorous development tools.
Building AI-powered systems for data analysis, transformation, and insight extraction, applying optimization techniques to make data processing more efficient and accessible.
Developing efficient systems for retrieving relevant information from massive datasets, optimizing both accuracy and performance for real-world applications.
Investigating operational frameworks for deploying and managing AI agents in production environments, addressing challenges in monitoring, debugging, and maintaining autonomous systems.
Developing methods for AI agents to autonomously plan and execute complex multi-step tasks, combining reasoning, decision-making, and adaptive execution strategies.
D4 achieves a 54.8% average pass rate on the commit-0 benchmark, outperforming the next-best system by 14 percentage points. The system is an LLM-based compiler that takes design documents as input and automatically generates the implementation code.
Palimpzest achieves a 90.3x speedup and 9.1x cost reduction on document processing benchmarks. The system applies cost-based query optimization techniques to AI-powered data processing, automatically selecting optimal combinations of models, prompts, and execution strategies for data transformation tasks.
A2rchi is an open-source RAG framework for AI support systems. The system enables retrieval-augmented generation for technical assistance and has been deployed at MIT (SubMIT system and multiple courses) and CERN for particle physics data processing support.
BRAD (Blueprint for Relational Adaptive Databases) virtualizes cloud data infrastructures, automatically optimizing cloud database deployments across different storage engines and compute resources to balance performance and cost.
KramaBench is a comprehensive benchmark for evaluating AI agents on knowledge-intensive tasks. The benchmark provides a systematic framework for assessing agent performance across diverse scenarios requiring deep domain knowledge and reasoning capabilities.

Postdoctoral Researchers: 3
Research Engineers: 1
PhD Students: 15
For a complete list of publications from Everest Lab members, please visit the DSG Publications page.
Our research is made possible through the generous support of industry partners and funding agencies who share our vision for advancing AI-driven data systems.







For sponsorship opportunities, please contact us at everest-info@csail.mit.edu
MIT Computer Science & Artificial Intelligence Laboratory
32 Vassar Street
Cambridge, MA 02139
We welcome inquiries from prospective PhD students interested in AI-powered data systems. Please reach out to faculty members directly regarding research opportunities.