# InfraMind
A fine-tuning toolkit for training small language models on Infrastructure-as-Code using reinforcement learning (GRPO/DAPO).
InfraMind fine-tunes SLMs using GRPO/DAPO with domain-specific rewards to generate valid Terraform, Kubernetes, Docker, and CI/CD configurations.
## Trained Models
| Model | Method | Accuracy | HuggingFace |
|-------|--------|----------|-------------|
| inframind-0.5b-grpo | GRPO | 97.3% | srallabandi0225/inframind-0.5b-grpo |
| inframind-0.5b-dapo | DAPO | 96.4% | srallabandi0225/inframind-0.5b-dapo |
## What is InfraMind?
InfraMind is a fine-tuning toolkit that:
- Takes an existing small language model (Qwen, Llama, etc.)
- Fine-tunes it using reinforcement learning (GRPO)
- Uses infrastructure-specific reward functions to guide learning
- Produces a model capable of generating valid Infrastructure-as-Code
```
┌─────────────────┐      ┌─────────────────┐      ┌─────────────────────┐
│   Base Model    │  →   │    InfraMind    │  →   │  Fine-tuned Model   │
│  Qwen2.5-0.5B   │      │  GRPO Training  │      │ inframind-0.5b-grpo │
│   -Instruct     │      │  + IaC Rewards  │      │  (97.3% accuracy)   │
└─────────────────┘      └─────────────────┘      └─────────────────────┘
                                                            │
                                                            ▼
                                                  ┌─────────────────────┐
                                                  │    DAPO Training    │
                                                  │ inframind-0.5b-dapo │
                                                  │  (96.4% accuracy)   │
                                                  └─────────────────────┘
```
## What InfraMind Provides
| Component | Description |
|-----------|-------------|
| InfraMind-Bench | Benchmark dataset with 500+ IaC tasks |
| IaC Rewards | Domain-specific reward functions for Terraform, K8s, Docker, CI/CD |
| Training Pipeline | GRPO implementation for infrastructure-focused fine-tuning |
## The Problem
Large Language Models (GPT-4, Claude) can generate Infrastructure-as-Code, but:
- **Cost**: API calls add up ($100s-$1000s/month for teams)
- **Privacy**: Your infrastructure code is sent to external servers
- **Offline**: Cloud APIs don't work in air-gapped/secure environments
- **Customization**: You can't fine-tune them on your specific patterns
Small open-source models (< 1B parameters) fail at IaC because:
- They hallucinate resource names (`aws_ec2` instead of `aws_instance`)
- They generate invalid syntax that won't pass `terraform validate`
- They ignore security best practices
- Traditional fine-tuning (SFT/LoRA) only memorizes patterns; it doesn't teach reasoning
## Our Solution
InfraMind fine-tunes small models using reinforcement learning to reason about infrastructure, not just memorize examples.
| Approach | Method | Result |
|----------|--------|--------|
| SFT/LoRA | "Memorize this Terraform example" | Copies patterns, fails on novel tasks |
| InfraMind | "Generate Terraform, I'll score if it's valid" | Learns reasoning, handles new tasks |
## Reward Function
InfraMind uses domain-specific rewards:
```
Reward = α × Syntax + β × Correctness + γ × Format
```
Where:
- Syntax: Does it pass `terraform validate`?
- Correctness: Are the right resources used?
- Format: Is the structure proper?
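With the default weights used later in this README (α = 0.4, β = 0.3, γ = 0.3), the reward is a plain weighted sum. A minimal sketch of the arithmetic, with hypothetical component scores:

```python
def iac_reward(syntax: float, correctness: float, fmt: float,
               alpha: float = 0.4, beta: float = 0.3, gamma: float = 0.3) -> float:
    """Weighted sum of the three reward components, each scored in [0, 1]."""
    return alpha * syntax + beta * correctness + gamma * fmt

# Hypothetical scores: valid syntax, mostly-right resources, clean format
print(iac_reward(syntax=1.0, correctness=0.8, fmt=0.9))  # 0.4 + 0.24 + 0.27 = 0.91
```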
## Features
- InfraMind-Bench: 500+ tasks across Terraform, Kubernetes, Docker, CI/CD
- GRPO Training: Reinforcement learning that teaches reasoning
- Model Agnostic: Works with Qwen, Llama, Mistral, or any HuggingFace model
- Alpaca Format: Compatible with standard training pipelines
- Local-first: Runs entirely on your machine
## Installation
```bash
pip install inframind
```
Or from source:
```bash
git clone https://github.com/saikiranrallabandi/inframind.git
cd inframind
pip install -e .
```
## Quick Start
```python
from inframind import create_dataset, InfraMindTrainer

# Load 500+ IaC tasks
dataset = create_dataset(size=100)

# Fine-tune with InfraMind (GRPO + IaC rewards)
trainer = InfraMindTrainer(model_name="Qwen/Qwen2.5-0.5B-Instruct")
trainer.train(dataset, epochs=1)

# Save your fine-tuned model
trainer.save("./qwen-0.5b-inframind")
```
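The saved directory can then be loaded back with `transformers` for inference, assuming `trainer.save()` writes a standard HuggingFace checkpoint (the prompt and decoding settings below are illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the fine-tuned checkpoint saved above
model = AutoModelForCausalLM.from_pretrained("./qwen-0.5b-inframind")
tokenizer = AutoTokenizer.from_pretrained("./qwen-0.5b-inframind")

prompt = "Create Terraform for an AWS S3 bucket"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```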
## How It Works
### 1. InfraMind-Bench Dataset
529 infrastructure tasks in Alpaca format:
```json
{
  "instruction": "Create Terraform for AWS EC2 instance",
  "input": "t2.micro instance type",
  "output": ""
}
```
| Category | Tasks | Examples |
|----------|-------|----------|
| Terraform | 225 | EC2, S3, VPC, RDS, EKS, Lambda, IAM |
| Kubernetes | 138 | Deployments, Services, Ingress, RBAC |
| Docker | 70 | Dockerfiles, docker-compose |
| CI/CD | 96 | GitHub Actions, GitLab CI, Jenkins |
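Because the tasks use Alpaca fields, they slot into standard pipelines. A minimal sketch of rendering one record as a prompt string, using the common Alpaca template (an assumption for illustration, not a format InfraMind prescribes):

```python
def to_prompt(record: dict) -> str:
    """Render an Alpaca-format record as a single prompt string."""
    prompt = f"### Instruction:\n{record['instruction']}\n"
    if record.get("input"):
        prompt += f"\n### Input:\n{record['input']}\n"
    prompt += "\n### Response:\n"
    return prompt

print(to_prompt({
    "instruction": "Create Terraform for AWS EC2 instance",
    "input": "t2.micro instance type",
}))
```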
### 2. Training Loop
```
┌─────────────────────────────────────────────────────────────────┐
│                       InfraMind TRAINING                        │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  For each IaC task:                                             │
│                                                                 │
│  1. GENERATE: Model produces multiple IaC outputs               │
│     "Create EC2" → [output1, output2]                           │
│                                                                 │
│  2. SCORE: Reward function evaluates each                       │
│     output1: syntax=1.0, correct=0.8, format=0.9 → 0.89         │
│     output2: syntax=0.0, correct=0.5, format=0.7 → 0.38         │
│                                                                 │
│  3. ADVANTAGE: Compare within group (GRPO)                      │
│     output1: above average → positive advantage                 │
│     output2: below average → negative advantage                 │
│                                                                 │
│  4. UPDATE: Increase probability of better outputs              │
│     Model learns: "valid syntax = higher reward"                │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```
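Step 3 is what makes this GRPO rather than PPO: advantages come from comparing outputs within the same group instead of from a learned value model. A minimal sketch of one common formulation (reward minus group mean, scaled by group standard deviation), not necessarily the toolkit's exact implementation:

```python
import statistics

def group_advantages(rewards: list[float]) -> list[float]:
    """GRPO-style advantage: each reward relative to its group mean,
    normalized by the group's standard deviation."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against identical rewards
    return [(r - mean) / std for r in rewards]

# Rewards from step 2 above: output1 = 0.89, output2 = 0.38
print(group_advantages([0.89, 0.38]))  # ≈ [1.0, -1.0]: output1 up-weighted, output2 down-weighted
```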
### 3. Reward Function
```python
from inframind import IaCReward

reward = IaCReward(alpha=0.4, beta=0.3, gamma=0.3)

# Score a Terraform output
score, details = reward.score(terraform_code, category="terraform")
# score: 0.85
# details: {"syntax": 1.0, "correctness": 0.8, "format": 0.75}
```
Reward components:
| Component | Weight | What it measures |
|-----------|--------|------------------|
| Syntax | 0.4 | Valid resource declarations |
| Correctness | 0.3 | Right resource types used |
| Format | 0.3 | Proper structure (balanced braces, etc.) |
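The format component can be as simple as structural checks. A minimal sketch of a balanced-braces check, one plausible ingredient of such a score (an assumption for illustration, not InfraMind's actual scorer):

```python
def braces_balanced(code: str) -> bool:
    """True if every '{' has a matching '}' in order."""
    depth = 0
    for ch in code:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:  # closing brace before any opener
                return False
    return depth == 0

print(braces_balanced('resource "aws_instance" "web" { instance_type = "t2.micro" }'))  # True
```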
## Training
InfraMind supports multiple training environments:
| Platform | Script | GPU Required |
|----------|--------|--------------|
| Local GPU | `python train_local.py` | Yes |
| Modal.com | `modal run grpo_training.py` | Provided |
| AWS SageMaker | Upload + HF Estimator | Yes |
| GCP Vertex AI | Custom training job | Yes |
| Azure ML | HF integration | Yes |
| HuggingFace Spaces | `accelerate launch train_local.py` | Yes |
| Google Colab | Run notebook | Free GPU |
### Local Training (Any GPU)
```bash
# GRPO Training
python train_local.py --method grpo --epochs 3 --output ./models/grpo

# DAPO Training (from GRPO checkpoint)
python train_local.py --method dapo --checkpoint ./models/grpo --output ./models/dapo

# Quick test with 100 samples
python train_local.py --method grpo --samples 100 --epochs 1

# Evaluate trained model
python train_local.py --evaluate ./models/grpo

# Generate IaC
python train_local.py --generate ./models/grpo --prompt "Create Terraform for AWS EC2"

# Multi-GPU with Accelerate
accelerate launch train_local.py --method grpo --epochs 3
```
### Modal.com (Cloud GPU)
```bash
# GRPO Training (Stage 1)
modal run grpo_training.py

# DAPO Training (Stage 2 - starts from GRPO checkpoint)
modal run dapo_training.py

# Evaluate GRPO model
modal run grpo_training.py::evaluate

# Evaluate DAPO model
modal run dapo_training.py::evaluate

# Quick test DAPO (110 samples)
modal run dapo_training.py::quick_test
```
### Category-Specific Training
```bash
# Train only on Terraform
python scripts/train.py --category terraform --epochs 5

# Train only on Kubernetes
python scripts/train.py --category kubernetes --epochs 5
```
### Custom Training
```python
from inframind import create_dataset, InfraMindTrainer

# Load specific categories
dataset = create_dataset(categories=["terraform", "kubernetes"], size=200)

# Configure trainer
trainer = InfraMindTrainer(
    model_name="Qwen/Qwen2.5-0.5B-Instruct",
    lr=1e-5,
    group_size=4,  # More samples per task for better GRPO
)

# Train
history = trainer.train(dataset, epochs=3)

# Check progress
for epoch in history:
    print(f"Epoch {epoch['epoch']}: Reward = {epoch['mean_reward']:.3f}")
```
## Comparison with Existing Work
| Project | Type | Method | IaC-specific |
|---------|------|--------|--------------|
| devops-slm-v1 | Fine-tuned Model | LoRA/SFT | Yes |
| AIAC | CLI Tool | Prompting (API) | Yes |
| GPT-4 / Claude | API Service | - | No |
| InfraMind | Fine-tuning Toolkit | GRPO | Yes |
**Key difference:** InfraMind is a fine-tuning toolkit, not a prompting wrapper or a single pre-trained checkpoint: it trains models with GRPO rather than relying on external APIs or one-shot SFT/LoRA.