ZabbixFinOps
Zabbix Finops module.
Install / Use
/learn @Lfijho/ZabbixFinOpsREADME
Zabbix FinOps Toolkit
A Zabbix frontend module that identifies underutilized servers by analyzing historical metrics, helping teams reduce infrastructure costs through data-driven right-sizing recommendations. <img width="1600" height="744" alt="image" src="https://github.com/user-attachments/assets/f6118adc-6d44-453c-baee-0b6559a5c030" />
Features
- Waste Score & Efficiency Score — quantifies how much each server is under or well-utilized
- Growth Trend Detection — compares first week vs. last week averages to avoid premature downsizing
- Smart Safeguards — won't recommend reduction if disk, network or load average are saturated
- Sortable & Filterable Table — filter by host group, sort by any score column
- Top 10 Highlight — visually highlights the most underutilized servers
- Color-coded Badges — 🟢 Healthy, 🟡 Moderate, 🔴 High waste at a glance
- CSV Export — one-click download for offline analysis or reporting
- 30-day Historical Analysis — uses Zabbix
trendstables for efficient queries - Right-Sizing Simulation — suggests concrete vCPU and RAM targets based on P95 usage
How It Works
The module queries the Zabbix trends and trends_uint tables for the last 30 days and calculates:
| Metric | Source Item Key |
|--------|----------------|
| CPU utilization (%) | system.cpu.util |
| Memory utilization (%) | vm.memory.utilization or vm.memory.size[pavailable] (auto-inverted) |
| Disk usage (%) | vfs.fs.size[/,pused] |
| Network In/Out | net.if.in / net.if.out |
| Load Average | system.cpu.load |
Waste Score
waste_score = 100 - ((cpu_avg + ram_avg) / 2)
| Score | Level | Meaning | |-------|-------|---------| | ≥ 80 | HIGH | Server is heavily underutilized | | 60–79 | MEDIUM | Potential for optimization | | 40–59 | LOW | Moderate usage | | < 40 | HEALTHY | Well utilized |
Efficiency Score
efficiency_score = (cpu_avg + ram_avg) / 2
| Score | Level | |-------|-------| | 70–100 | Healthy usage | | 40–69 | Can be optimized | | 0–39 | High waste |
P95 Peak (Percentile 95)
Instead of using the absolute maximum (which can be skewed by a single 1-hour spike in 720 hours), the module uses the 95th percentile of hourly peaks.
Example: A server has 720 trend rows (30 days × 24 hours).
P95 ignores the top 5% (36 hours) of highest peaks.
If P95 is still high, it means the server regularly reaches that load — not just a rare spike.
The P95 value is calculated by sorting all hourly value_max entries ascending and picking the value at position floor(count × 0.95).
| Metric | P95 Threshold | What it means | |--------|--------------|---------------| | CPU P95 | ≥ 60% | Server regularly hits high CPU — not safe to downsize | | RAM P95 | ≥ 80% | Server regularly hits high RAM — not safe to downsize |
When P95 peaks are high but averages are low, the module shows: "Server mostly idle but with periodic load spikes. Investigate spike patterns before downsizing."
Detection Rules
A server is flagged as oversized only when ALL conditions are met:
- CPU average < 20% AND CPU P95 < 60%
- RAM average < 40% AND RAM P95 < 80%
- Disk usage is NOT near saturation (< 85%)
- Network is NOT persistently high (< 100 MB/s avg)
- No growth trend projected to exceed thresholds (see below)
If any safeguard triggers, the module explains why reduction is not recommended.
Trend Analysis
The module compares the average of week 1 (days 1–7) against the average of week 4 (days 24–30) to detect workload growth.
Data quality gate: Each week must have at least 24 hours of trend data. If a host was recently added or trend data was purged, the trend shows "N/A" instead of producing misleading results.
Projection-based blocking: Instead of blocking on any small increase, the module projects forward:
cpu_projected = cpu_avg + cpu_trend
ram_projected = ram_avg + ram_trend
Downsizing is only blocked if the projected value would reach or exceed the threshold (CPU ≥ 20% or RAM ≥ 40%). This means a server at 5% CPU growing +5pp (projected 10%) is still flagged for reduction, while a server at 15% CPU growing +8pp (projected 23%) is correctly held back.
Right-Sizing Simulation
Beyond detecting waste, the module suggests concrete right-sizing targets — answering the question every manager asks: "If I reduce, what should I reduce to?"
Formula:
recommended = current_allocation × 0.80
The module recommends 80% of the current allocation — directly, without rounding to predefined VM sizes. A safety check ensures the recommendation is never below the server's actual P95 peak usage.
| Step | Description |
|------|-------------|
| 1. Read current specs | system.cpu.num (vCPUs) and vm.memory.size[total] (RAM bytes) |
| 2. Calculate 80% of current | CPU: floor(current × 0.80) — RAM: round(current × 0.80, 1 decimal) |
| 3. Safety check | Ensure recommended ≥ P95 actual usage — if not, no recommendation is made |
Example:
| Host | vCPUs | vCPU Rec. | RAM | RAM Rec. | |------|-------|-----------|-----|----------| | db-mongo-dev | 4 vCPU | 3 vCPU | 7.5 GB | 6.0 GB |
A server with 7.5 GB RAM → 80% = 6.0 GB. P95 RAM usage is 21% (1.57 GB actual) — 6.0 ≥ 1.57, so the recommendation is safe.
Notes:
- Recommendations only appear when the suggested size is smaller than current (no upsizing suggestions).
- If
system.cpu.numorvm.memory.size[total]items are not available, the columns show "N/A". - The "—" symbol means no reduction is recommended (current size is already optimal or near-optimal).
Safeguards:
- P95 = 0% → recommendation is skipped entirely (likely missing or broken data).
- P95 safety floor → if 80% of current would be below actual P95 usage, no recommendation is made.
- Minimum RAM: 2 GB — the module never recommends less than 2 GB.
- Minimum vCPU: 1 — the module never recommends less than 1 vCPU.
Requirements
- Zabbix: 7.0.0 to 7.4.x (tested on 7.4.7)
- PHP: 8.0 or higher
- Hosts must be monitored with standard OS templates (Linux by Zabbix Agent, etc.)
- Trends data must be available (at least 1 hour of collection for trends to populate)
Installation
Git Clone
cd /usr/share/zabbix/ui/modules/
git clone https://github.com/Lfijho/ZabbixFinOps.git
Enable the Module
- Log in to the Zabbix frontend as an Admin user
- Navigate to Administration → General → Modules
- Click Scan directory
- Find "Zabbix FinOps Toolkit" in the list
- Click Enable
The module will appear in the menu under Monitoring → Infrastructure Cost Analyzer.
Usage
- Navigate to Monitoring → Infrastructure Cost Analyzer
- (Optional) Filter by host group using the multiselect dropdown
- Click Apply to filter results
- Click any column header (Waste Score, Efficiency, CPU Avg, RAM Avg) to sort
- Click Export CSV to download the report
Understanding the Results Table
| Column | Description | |--------|-------------| | Host | Server hostname | | Host Group | Zabbix host group(s) | | CPU Avg % | Average CPU utilization over 30 days | | CPU Max % | Absolute peak CPU over 30 days | | CPU P95 % | 95th percentile of hourly CPU peaks (ignores top 5% spikes) | | RAM Avg % | Average memory utilization over 30 days | | RAM Max % | Absolute peak memory over 30 days | | RAM P95 % | 95th percentile of hourly RAM peaks (ignores top 5% spikes) | | Disk Avg % | Average root filesystem usage | | Net In / Net Out | Average network throughput | | Load Avg | Average system load | | Waste Score | How underutilized (higher = more waste) | | Efficiency | How well utilized (higher = better) | | Trend | CPU/RAM usage direction (+ growth, - decline) | | Recommendation | Actionable suggestion |
Module Structure
ZabbixFinOpsToolkit/
├── manifest.json # Module metadata and action registration
├── Module.php # Menu registration (Monitoring → Infrastructure Cost Analyzer)
├── actions/
│ ├── CostAnalyzer.php # Main controller — queries trends, calculates scores
│ └── CostAnalyzerCsvExport.php # CSV export controller
├── views/
│ ├── finops.costanalyzer.view.php # HTML table view with filters and badges
│ └── finops.costanalyzer.csv.php # CSV output template
├── assets/
│ └── css/
│ └── finops-toolkit.css # Color indicators and card styles
├── docs/
│ └── screenshot-placeholder.png # Screenshot (replace with actual)
├── README.md
├── CONTRIBUTING.md
├── CHANGELOG.md
└── LICENSE
Contributing
Contributions are welcome! Please read CONTRIBUTING.md for guidelines.
Quick Start for Contributors
- Fork this repository
- Create a feature branch:
git checkout -b feature/my-feature - Make your changes
- Test in a Zabbix 7.x environment
- Submit a Pull Request
Changelog
See CHANGELOG.md for version history.
License
This project is licensed under the Apache License 2.0 — see the LICENSE file for details.
Acknowledgments
- Zabbix — the enterprise monitoring platform this module extends
- The FinOps community for inspiration on cloud/infrastructure cost optimization
