AMLP
AMLP integrates dataset creation, input/output handling, and analysis for machine learning interatomic potentials. It supports Gaussian, VASP, and CP2K, with LLM agents for code selection and ASE-based AMLP-Analysis for molecular simulations and validation.

The Automated Machine Learning Pipeline (AMLP) provides an integrated framework that unifies the entire workflow from dataset creation to model validation. It leverages large language model (LLM) agents to assist with electronic-structure code selection, thereby reducing the manual effort typically required. AMLP also incorporates automated dataset tools for efficient input generation—including geometry (or cell) optimizations and ab initio molecular dynamics (AIMD)—as well as output conversion and preparation of data in the MACE-compatible format. It supports three DFT packages—Gaussian, VASP, and CP2K—ensuring flexibility across different electronic-structure environments. Its analysis module, AMLP-Analysis (built on ASE), further supports a broad range of molecular simulations, enabling systematic evaluation and validation of machine learning interatomic potentials.
Table of Contents
- Multi-Agent DFT Research System
- AMLP-analysis: Automated Machine Learning Pipeline - Analysis Module
- Batch Evaluation Tool
Automated Machine Learning Pipeline
Overview
The Automated Machine Learning Pipeline uses a multi-agent DFT research system as an integrated framework that combines:
- AI-driven research analysis - Uses specialized AI agents to analyze research topics and generate summaries
- DFT code expertise - Provides expert recommendations for Gaussian, VASP, and CP2K simulations
- Input file generation - Efficiently processes crystallographic structures for DFT calculations
- Output data processing - Extracts and formats simulation results for analysis or ML model training
Features
🤖 AI-Agent Research Assistance
AMLP includes multiple AI agents to assist with different aspects of computational chemistry research:
- Experimental Chemist Agent: Summarizes and interprets experimental aspects of research topics.
- Theoretical Chemist Agent: Analyzes theoretical foundations and computational methodologies.
- DFT Expert Agents: Specialized agents for Gaussian, VASP, and CP2K that provide code-specific recommendations.
- Supervisor Agents: Integrate information from all agents and generate comprehensive reports.
📝 Input Generation
- Multi-code support: Generate inputs for CP2K, VASP, and Gaussian
- Batch processing: Convert multiple structure files automatically
- Format conversion: Process CIF and XYZ files with validation
- Supercell creation: Build supercells with custom dimensions
- Interactive guidance: Step-by-step parameter selection for DFT calculations
📊 Output Processing
- DFT output extraction: Extract energies, forces, and coordinates from simulation results
- ML-ready dataset creation: Convert DFT outputs to HDF5 format for machine learning potentials
- AIMD processing: Generate AIMD inputs from optimized structures at multiple temperatures
📥 Installation
Requirements
- Python 3.8+
- Required Python packages:
- NumPy
- PyYAML
- ASE (Atomic Simulation Environment, optional but recommended)
- openai (for AI agent functionality)
- requests
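Before running AMLP, a quick check like the following can show which of these packages are importable in your environment. This is a minimal standard-library sketch, not a script from the repository; the import names are assumptions based on the list above (PyYAML is imported as `yaml`).

```python
import importlib.util

# Import names assumed from the requirements list; ASE is optional but recommended.
REQUIRED = ["numpy", "yaml", "openai", "requests"]
OPTIONAL = ["ase"]


def check_dependencies(required=REQUIRED, optional=OPTIONAL):
    """Return a dict mapping each import name to True if the module can be found."""
    status = {}
    for name in list(required) + list(optional):
        # find_spec returns None when a top-level module is not installed
        status[name] = importlib.util.find_spec(name) is not None
    return status


if __name__ == "__main__":
    status = check_dependencies()
    for name, found in status.items():
        kind = "optional" if name in OPTIONAL else "required"
        print(f"{name:10s} ({kind}): {'found' if found else 'MISSING'}")
    if not status["ase"]:
        print("ASE not found; install it with: pip install ase")
```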
Setup
- Clone the repository:
git clone https://github.com/adamlaho/AMLP.git
cd AMLP
- Install dependencies:
pip install -r requirements.txt
API Configuration
The AI agents in this system use OpenAI's API for text generation. Follow these steps to configure API access:
- Get API Key:
  - Sign up for an account at OpenAI Platform
  - Navigate to the API keys section and create a new secret key
  - Copy the key (you will not be able to view it again)
- Set Environment Variable:
  🔑 The system looks for the API key in the OPENAI_API_KEY environment variable:
  # On Linux/macOS
  export OPENAI_API_KEY="your-api-key-here"
  # On Windows
  set OPENAI_API_KEY=your-api-key-here
- Model Configuration:
  By default, the agents use predefined settings. These can be customized in the configuration file:
  AMLP/multi_agent_dft/config/default_config.yaml
  In this file, you can adjust:
  • The type of AI models (e.g., OpenAI models).
  • PublicationAPI parameters.
  • Other runtime conditions for agent behavior.
- Usage Monitoring:
  - Be aware of your OpenAI API usage limits
  - The AI agent functionality will consume tokens based on the length of inputs and outputs
  - The system implements basic retry logic for API rate limiting (3 attempts with exponential backoff)
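The key lookup and the retry behavior described above can be sketched as follows. This is an illustrative stand-in, not AMLP's code: `get_api_key` and `call_with_backoff` are hypothetical names, and the local `RateLimitError` class takes the place of the client library's exception (in openai 1.x this is `openai.RateLimitError`). The wait doubles after each failed attempt, with a little random jitter so that parallel clients do not retry in lockstep.

```python
import os
import random
import time


class RateLimitError(Exception):
    """Local stand-in for the client's rate-limit exception (assumption)."""


def get_api_key() -> str:
    """Read the key from OPENAI_API_KEY, failing early with a clear message."""
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    return key


def call_with_backoff(fn, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on RateLimitError wait base_delay * 2**attempt (plus jitter) and retry.

    With the defaults this makes at most 3 calls, waiting about 1 s and then about 2 s.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller see the error
            jitter = random.uniform(0, 0.1 * base_delay)
            sleep(base_delay * 2 ** attempt + jitter)
```

Passing `sleep` as a parameter keeps the function testable without real waiting.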
Usage
Basic Usage
Run the main script to start the system:
python3 amlpt.py
The system will present a menu with five operation modes:
- AI-agent feedback (research summaries & reports)
- Input generation (CP2K/VASP/Gaussian)
- Output processing (extract forces, energies, coordinates)
- ML potential dataset creation (JSON to MACE HDF5)
- AIMD processing (JSON to CP2K AIMD inputs)
AI-Assisted Research Workflow
This mode helps you explore research topics with AI assistance:
- Enter a research topic or question
- The system will refine your query and analyze literature
- Review reports from Experimental and Theoretical Chemist agents
- Examine DFT-specific recommendations from expert agents
- Use the generated reports to guide your computational research
Example:
Enter your research topic or question: Metal oxide catalysts for water splitting
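The workflow above (refine the query, consult specialist agents, then let a supervisor integrate their reports) can be pictured with the hypothetical sketch below. The role prompts, the `run_agents` function, and the plain prompt-to-reply callable standing in for an LLM client are assumptions made to keep the example self-contained; literature retrieval is not modeled, and this is not AMLP's implementation.

```python
from typing import Callable, Dict

# Hypothetical: an "LLM" is any function mapping a prompt string to a reply string.
LLM = Callable[[str], str]

# Hypothetical role prompts; AMLP's actual prompts are not shown in this README.
ROLES: Dict[str, str] = {
    "experimental": "As an experimental chemist, summarize the experimental aspects of: {topic}",
    "theoretical": "As a theoretical chemist, analyze the theory and methods for: {topic}",
    "gaussian": "As a Gaussian expert, recommend calculation settings for: {topic}",
    "vasp": "As a VASP expert, recommend calculation settings for: {topic}",
    "cp2k": "As a CP2K expert, recommend calculation settings for: {topic}",
}


def run_agents(topic: str, llm: LLM) -> Dict[str, str]:
    """Refine the query, ask each specialist role, then have a supervisor merge the reports."""
    refined = llm(f"Rewrite as a precise research question: {topic}")
    reports = {role: llm(template.format(topic=refined)) for role, template in ROLES.items()}
    combined = "\n\n".join(f"[{role}]\n{text}" for role, text in reports.items())
    reports["supervisor"] = llm("Integrate these reports into one summary:\n" + combined)
    return reports
```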
Structure File Support
The system supports the following structure file formats:
- CIF (Crystallographic Information File)
- XYZ (Cartesian coordinates)
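For XYZ input, periodic cell information can only come from the comment line. A minimal pure-Python reader is sketched below, assuming single-frame files and the extended-XYZ `Lattice="..."` convention used by ASE. In practice `ase.io.read` handles both XYZ and CIF; `read_xyz` here is a hypothetical helper.

```python
import re
from typing import List, Optional, Tuple

Atom = Tuple[str, float, float, float]
Cell = List[List[float]]


def read_xyz(text: str) -> Tuple[List[Atom], Optional[Cell]]:
    """Parse a single-frame XYZ string and return (atoms, cell or None).

    Line 1 is the atom count, line 2 a comment, then 'symbol x y z' lines.
    If the comment holds an extended-XYZ Lattice="..." entry, its nine
    numbers are read as three cell vectors (in Angstrom).
    """
    lines = text.strip().splitlines()
    n_atoms = int(lines[0].split()[0])
    comment = lines[1] if len(lines) > 1 else ""
    atoms: List[Atom] = []
    for line in lines[2:2 + n_atoms]:
        symbol, x, y, z = line.split()[:4]
        atoms.append((symbol, float(x), float(y), float(z)))
    if len(atoms) != n_atoms:
        raise ValueError(f"expected {n_atoms} atoms, found {len(atoms)}")
    cell = None
    match = re.search(r'Lattice="([^"]+)"', comment)
    if match:
        values = [float(v) for v in match.group(1).split()]
        if len(values) != 9:
            raise ValueError("Lattice must contain 9 numbers")
        cell = [values[0:3], values[3:6], values[6:9]]
    return atoms, cell
```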
📝 Input Generation for Cell and Geometry Optimizations
Generate input files for DFT calculations using either batch mode or guided mode:
Batch Mode
Automatically convert all supported files using default templates:
Batch-mode: which DFT code? (CP2K/VASP/Gaussian): cp2k
Path to file or directory: ./structures
Output directory: ./cp2k_inputs
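Conceptually, batch mode has two steps: collect every supported structure file under the given path, then map each one to an output location for the chosen code. The sketch below illustrates this under stated assumptions: the function names are hypothetical, only the top level of a directory is scanned, and the output layout simply follows the Output Files list in this README (`.inp` for CP2K, `.com` for Gaussian, one subdirectory per structure for VASP).

```python
from pathlib import Path
from typing import List

SUPPORTED_SUFFIXES = {".cif", ".xyz"}


def collect_structures(path: str) -> List[Path]:
    """Return supported structure files at `path` (a single file or a directory), sorted."""
    p = Path(path)
    if p.is_file():
        candidates = [p]
    elif p.is_dir():
        candidates = [f for f in p.iterdir() if f.is_file()]
    else:
        raise FileNotFoundError(path)
    # Suffix match is case-insensitive so that e.g. "b.XYZ" is also picked up
    return sorted(f for f in candidates if f.suffix.lower() in SUPPORTED_SUFFIXES)


def output_path_for(structure: Path, out_dir: str, code: str) -> Path:
    """Map a structure file to a target path for the chosen DFT code (hypothetical layout)."""
    code = code.lower()
    if code == "cp2k":
        return Path(out_dir) / f"{structure.stem}.inp"
    if code == "gaussian":
        return Path(out_dir) / f"{structure.stem}.com"
    if code == "vasp":
        # One subdirectory per structure, holding INCAR, POSCAR, KPOINTS, and POTCAR
        return Path(out_dir) / structure.stem
    raise ValueError(f"unsupported DFT code: {code}")
```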
Guided Mode
Step through detailed parameter selection for your DFT calculation:
Which DFT code? (CP2K/VASP/Gaussian): VASP
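Among the VASP inputs, POSCAR carries the structure. A minimal writer is sketched below, assuming the VASP 5 layout (with an element-symbols line) and Cartesian coordinates in Å. `write_poscar` is a hypothetical helper; AMLP's guided mode also produces INCAR, KPOINTS, and POTCAR. Atoms are grouped by element because POSCAR gives one count per element, and the POTCAR must list the same elements in the same order.

```python
from typing import Dict, List, Sequence, Tuple

Atom = Tuple[str, float, float, float]


def write_poscar(atoms: Sequence[Atom], cell: Sequence[Sequence[float]],
                 comment: str = "generated") -> str:
    """Return POSCAR text with atoms grouped by element in order of first appearance."""
    groups: Dict[str, List[Tuple[float, float, float]]] = {}
    for symbol, x, y, z in atoms:
        groups.setdefault(symbol, []).append((x, y, z))
    lines = [comment, "1.0"]  # universal scaling factor: cell vectors used as given
    for vector in cell:
        lines.append("  " + "  ".join(f"{v:.10f}" for v in vector))
    lines.append("  " + "  ".join(groups))  # element symbols (VASP 5 style)
    lines.append("  " + "  ".join(str(len(c)) for c in groups.values()))  # counts per element
    lines.append("Cartesian")  # coordinates below are in Angstrom, not fractional
    for coords in groups.values():
        for x, y, z in coords:
            lines.append(f"  {x:.10f}  {y:.10f}  {z:.10f}")
    return "\n".join(lines) + "\n"
```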
📊 Output Processing
Extract data from DFT calculation outputs:
Select DFT code (1/2/3): 1
Path to CP2K input file (.inp): ./cp2k_calcs/input.inp
Path to CP2K output file: ./cp2k_calcs/output.out
Path for output JSON file [output_data.json]: results.json
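Output extraction amounts to scanning the output for known patterns. The sketch below pulls total energies from a CP2K output and writes a small JSON summary. The regular expression assumes CP2K's `ENERGY| Total FORCE_EVAL ( QS ) energy` lines (unit label `[a.u.]` or `[hartree]`, depending on version), and the JSON keys are illustrative, not the schema AMLP writes. Forces and coordinates would need their own parsers.

```python
import json
import re
from typing import List

# Assumption: CP2K reports the total energy on lines such as
#   ENERGY| Total FORCE_EVAL ( QS ) energy [a.u.]:   -17.146839701286493
ENERGY_RE = re.compile(r"ENERGY\|\s+Total FORCE_EVAL.*?:\s+(-?\d+\.\d+(?:[EeDd][-+]?\d+)?)")


def extract_energies(output_text: str) -> List[float]:
    """Return every total energy (Hartree) found in the output, in order of appearance."""
    return [float(m.group(1).replace("D", "E").replace("d", "e"))
            for m in ENERGY_RE.finditer(output_text)]


def write_summary(energies: List[float], path: str) -> None:
    """Write a small JSON summary; the keys are illustrative, not AMLP's schema."""
    data = {
        "energies_hartree": energies,
        "final_energy_hartree": energies[-1] if energies else None,
        "n_steps": len(energies),
    }
    with open(path, "w") as fh:
        json.dump(data, fh, indent=2)
```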
🔥 AIMD Input Generation
Generate AIMD inputs at multiple temperatures from optimized structures, using the JSON file produced by processing the output of a cell or geometry optimization:
Path to your JSON file or directory: ./optimized_structures
Output directory for generated files: ./aimd_inputs
Select template (1-5) [1]: 2
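The generated AIMD inputs differ mainly in their MD settings. The sketch below emits one CP2K `&MOTION`/`&MD` block per temperature. The step count, 0.5 fs time step, NVT ensemble, and file-naming pattern are placeholder assumptions rather than AMLP's templates; a complete input also needs `RUN_TYPE MD` in `&GLOBAL` and the `&FORCE_EVAL` section carried over from the optimization.

```python
from typing import Dict, Sequence


def md_section(temperature_k: float, steps: int = 10000, timestep_fs: float = 0.5,
               ensemble: str = "NVT") -> str:
    """Return a CP2K &MOTION/&MD block for one temperature (illustrative defaults)."""
    return "\n".join([
        "&MOTION",
        "  &MD",
        f"    ENSEMBLE {ensemble}",
        f"    STEPS {steps}",
        f"    TIMESTEP {timestep_fs}",       # CP2K's default unit here is fs
        f"    TEMPERATURE {temperature_k}",  # CP2K's default unit here is K
        "  &END MD",
        "&END MOTION",
    ])


def md_inputs_for_temperatures(base_name: str, temperatures: Sequence[float]) -> Dict[str, str]:
    """Map hypothetical file names such as 'name_300K.inp' to their &MOTION blocks."""
    return {f"{base_name}_{t:g}K.inp": md_section(t) for t in temperatures}
```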
🧮 ML Dataset Creation
Convert DFT outputs to machine learning potential training data:
Full path to JSON file containing DFT data: ./results/dft_data.json
Output directory for HDF5 datasets [current directory]: ./ml_datasets
Dataset base name [dft_data]: water_system
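Before writing HDF5, energies and forces must be in consistent units. MACE, like ASE, conventionally works in eV and eV/Å, whereas CP2K reports Hartree and Hartree/Bohr. The sketch below shows that conversion together with a basic shape check. `to_training_record` is a hypothetical helper, and this README does not say where AMLP performs the conversion.

```python
from typing import Dict, List

# CODATA 2018 values
HARTREE_TO_EV = 27.211386245988
BOHR_TO_ANGSTROM = 0.529177210903
HARTREE_PER_BOHR_TO_EV_PER_ANGSTROM = HARTREE_TO_EV / BOHR_TO_ANGSTROM  # ~51.422


def to_training_record(symbols: List[str], energy_ha: float,
                       forces_ha_bohr: List[List[float]]) -> Dict:
    """Convert one configuration to eV and eV/Angstrom after checking array shapes."""
    if len(forces_ha_bohr) != len(symbols):
        raise ValueError(f"{len(symbols)} atoms but {len(forces_ha_bohr)} force rows")
    if any(len(f) != 3 for f in forces_ha_bohr):
        raise ValueError("each force row must have 3 components")
    return {
        "symbols": list(symbols),
        "energy": energy_ha * HARTREE_TO_EV,
        "forces": [[c * HARTREE_PER_BOHR_TO_EV_PER_ANGSTROM for c in f]
                   for f in forces_ha_bohr],
    }
```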
Output Files
Depending on the mode, the system generates:
- CP2K: .inp input files
- VASP: INCAR, POSCAR, KPOINTS, and POTCAR files in subdirectories
- Gaussian: .com
- Research Reports: .txt
- Processed Data: .json and .h5 data files
Troubleshooting
API-Related Issues
- Authentication Errors: Verify your API key is correct and properly set in the environment or config file
- Rate Limiting: If you see RateLimitError, the system will automatically retry with exponential backoff
- Model Not Available: Ensure you're using a model that's available to your API key level
Common Issues
- File validation errors: Check if your CIF or XYZ files follow standard format
- Missing cell parameters: Ensure cell information is properly defined for periodic systems
- ASE import errors: Install ASE for full functionality:
pip install ase
AMLP-analysis: Automated Machine Learning Pipeline - Analysis module
AMLP-Analysis (AMLP-A) is a tool for analyzing atomic structures with machine learning interatomic potentials. It combines several analysis methods into a single workflow:
- ⚡ Single Point calculation
- 🔄 Geometry optimization
- 📦 Cell optimization
- 🌡️ Molecular dynamics simulations with different ensembles
- 📈 Structural and stability analysis (RDF, coordination number, and energy drift)
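The RDF and coordination analyses can be illustrated with a short pure-Python sketch. It assumes a single species in a cubic periodic box and uses the minimum-image convention, so `r_max` should not exceed half the box length. The function names are hypothetical, and AMLP-Analysis's own implementation is not shown in this README. Here g(r) is the pair count in each shell divided by the count expected for an ideal gas at the same density.

```python
import math
from itertools import combinations
from typing import List, Sequence

Vec = Sequence[float]


def min_image_distance(a: Vec, b: Vec, box: float) -> float:
    """Distance between a and b in a cubic periodic box (minimum-image convention)."""
    d2 = 0.0
    for ai, bi in zip(a, b):
        d = ai - bi
        d -= box * round(d / box)  # wrap the separation into roughly [-box/2, box/2]
        d2 += d * d
    return math.sqrt(d2)


def rdf(positions: List[Vec], box: float, r_max: float, dr: float) -> List[float]:
    """Radial distribution function g(r) on bins of width dr up to r_max (<= box/2)."""
    n = len(positions)
    nbins = int(round(r_max / dr))
    hist = [0] * nbins
    for a, b in combinations(positions, 2):
        k = int(min_image_distance(a, b, box) / dr)
        if k < nbins:
            hist[k] += 2  # each pair counts once for each of its two atoms
    rho = n / box ** 3
    g = []
    for k, count in enumerate(hist):
        shell = 4.0 / 3.0 * math.pi * (((k + 1) * dr) ** 3 - (k * dr) ** 3)
        g.append(count / (n * rho * shell))  # normalize by the ideal-gas expectation
    return g


def coordination_number(positions: List[Vec], box: float, r_cut: float) -> float:
    """Average number of neighbors closer than r_cut."""
    pairs = sum(1 for a, b in combinations(positions, 2)
                if min_image_distance(a, b, box) < r_cut)
    return 2.0 * pairs / len(positions)
```

For a simple cubic lattice with unit spacing, `coordination_number` with a 1.2 cutoff gives 6, the six nearest neighbors.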
What Can AMLP-Analysis Do?
- Use Pre-trained Models: Works with MACE machine learning potentials
- Run Multiple Analyses: Perform different analyses in a single workflow
- Easy Configuration: Change simulation settings using a simple YAML file
- Reproducible Research: Get consistent results for scientific work
Getting Started with AMLP-Analysis
System Requirements
- Python 3.7 or newer (Python 3.9 recommended)
- Required packages: numpy, matplotlib, pyyaml, torch, tqdm, scipy, ase, mace-torch
How to Use AMLP-Analysis
Run the main analysis script:
python3 amlpa.py <input_file.xyz> config.yaml
Configuration Guide
Create a config.yaml file to customize your analysis. Here's what you can configure:
Basic Settings
base_name: 'acridine_test'
output_dir: './test_results'
Structure Settings
# Cell parameters
readcell_info: true # Try to read cell from XYZ header
cell_params: null # Fallba
