Zabbix FinOps Toolkit

A Zabbix frontend module that identifies underutilized servers by analyzing historical metrics, helping teams reduce infrastructure costs through data-driven right-sizing recommendations.

<img width="1600" height="744" alt="image" src="https://github.com/user-attachments/assets/f6118adc-6d44-453c-baee-0b6559a5c030" />

Features

Waste Score & Efficiency Score: quantify how underutilized or well utilized each server is
Growth Trend Detection: compares first-week and last-week averages to avoid premature downsizing
Smart Safeguards: no reduction is recommended if disk, network, or load average is saturated
Sortable & Filterable Table: filter by host group, sort by any score column
Top 10 Highlight: visually highlights the most underutilized servers
Color-coded Badges: 🟢 Healthy, 🟡 Moderate, 🔴 High waste at a glance
CSV Export: one-click download for offline analysis or reporting
30-day Historical Analysis: uses Zabbix trends tables for efficient queries
Right-Sizing Simulation: suggests concrete vCPU and RAM targets (80% of the current allocation, checked against P95 usage)

How It Works

The module queries the Zabbix trends and trends_uint tables for the last 30 days and calculates:

| Metric | Source Item Key |
|--------|----------------|
| CPU utilization (%) | system.cpu.util |
| Memory utilization (%) | vm.memory.utilization or vm.memory.size[pavailable] (auto-inverted) |
| Disk usage (%) | vfs.fs.size[/,pused] |
| Network In/Out | net.if.in / net.if.out |
| Load Average | system.cpu.load |
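The "auto-inverted" note for vm.memory.size[pavailable] can be read as follows: that key reports the percentage of memory still available, so it has to be flipped to get a utilization figure. The sketch below is one plausible reading, not the module's actual code; the function name memory_used_percent is hypothetical.

```python
def memory_used_percent(item_key: str, value: float) -> float:
    """Normalize a memory reading to 'percent used'.

    Assumption: 'auto-inverted' means used% = 100 - available%.
    """
    if item_key == "vm.memory.size[pavailable]":
        # pavailable is the share of memory still free, so invert it.
        return 100.0 - value
    # vm.memory.utilization is already a 'percent used' value.
    return value
```

With this reading, a host reporting 79% available memory is treated as 21% utilized.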

Waste Score

waste_score = 100 - ((cpu_avg + ram_avg) / 2)

| Score | Level | Meaning |
|-------|-------|---------|
| ≥ 80 | HIGH | Server is heavily underutilized |
| 60–79 | MEDIUM | Potential for optimization |
| 40–59 | LOW | Moderate usage |
| < 40 | HEALTHY | Well utilized |

Efficiency Score

efficiency_score = (cpu_avg + ram_avg) / 2

| Score | Level |
|-------|-------|
| 70–100 | Healthy usage |
| 40–69 | Can be optimized |
| 0–39 | High waste |
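The two score formulas and their level tables can be sketched in a few lines of Python. This is an illustration of the arithmetic only, not the module's PHP implementation; the function names are hypothetical, and treating the table boundaries as "greater than or equal to the lower bound" for fractional scores is an assumption.

```python
def waste_score(cpu_avg: float, ram_avg: float) -> float:
    # waste_score = 100 - ((cpu_avg + ram_avg) / 2)
    return 100.0 - (cpu_avg + ram_avg) / 2.0


def efficiency_score(cpu_avg: float, ram_avg: float) -> float:
    # efficiency_score = (cpu_avg + ram_avg) / 2
    return (cpu_avg + ram_avg) / 2.0


def waste_level(score: float) -> str:
    # Boundaries follow the Waste Score table (assumed inclusive lower bounds).
    if score >= 80:
        return "HIGH"
    if score >= 60:
        return "MEDIUM"
    if score >= 40:
        return "LOW"
    return "HEALTHY"


def efficiency_level(score: float) -> str:
    # Boundaries follow the Efficiency Score table (assumed inclusive lower bounds).
    if score >= 70:
        return "Healthy usage"
    if score >= 40:
        return "Can be optimized"
    return "High waste"
```

For example, a host averaging 10% CPU and 30% RAM gets a waste score of 80 (HIGH) and an efficiency score of 20 (High waste).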

P95 Peak (Percentile 95)

Instead of using the absolute maximum (which can be skewed by a single 1-hour spike in 720 hours), the module uses the 95th percentile of hourly peaks.

Example: A server has 720 trend rows (30 days × 24 hours).
P95 ignores the top 5% (36 hours) of highest peaks.
If P95 is still high, it means the server regularly reaches that load — not just a rare spike.

The P95 value is calculated by sorting all hourly value_max entries ascending and picking the value at position floor(count × 0.95).
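The selection rule above can be sketched in Python as follows. This is a minimal illustration rather than the module's code: the function name p95 is hypothetical, and reading "position" as a zero-based index (clamped to the last element) is an assumption.

```python
def p95(hourly_max: list[float]) -> float:
    """Return the value at position floor(count * 0.95) of the sorted input.

    Assumption: 'position' is a zero-based index into the ascending list.
    """
    if not hourly_max:
        raise ValueError("no trend rows to analyze")
    ordered = sorted(hourly_max)
    # Integer arithmetic gives floor(count * 0.95) without float rounding issues.
    index = min(len(ordered) * 95 // 100, len(ordered) - 1)
    return ordered[index]
```

With 720 rows the index is 684, so a single one-hour spike among otherwise flat values does not affect the result.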

| Metric | P95 Threshold | What it means |
|--------|--------------|---------------|
| CPU P95 | ≥ 60% | Server regularly hits high CPU; not safe to downsize |
| RAM P95 | ≥ 80% | Server regularly hits high RAM; not safe to downsize |

When P95 peaks are high but averages are low, the module shows: "Server mostly idle but with periodic load spikes. Investigate spike patterns before downsizing."

Detection Rules

A server is flagged as oversized only when ALL conditions are met:

CPU average < 20% AND CPU P95 < 60%
RAM average < 40% AND RAM P95 < 80%
Disk usage is NOT near saturation (< 85%)
Network is NOT persistently high (< 100 MB/s avg)
No growth trend projected to exceed thresholds (see below)

If any safeguard triggers, the module explains why reduction is not recommended.
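The rule set above can be expressed as a list of checks that each contribute a reason when they fail. The sketch below is illustrative only: HostStats, blocking_reasons, and is_oversized are hypothetical names, the reason strings are invented for the example, and growth_blocks is assumed to be computed separately by the trend analysis.

```python
from dataclasses import dataclass


@dataclass
class HostStats:
    cpu_avg: float        # 30-day average CPU utilization, percent
    cpu_p95: float        # 95th percentile of hourly CPU peaks, percent
    ram_avg: float        # 30-day average memory utilization, percent
    ram_p95: float        # 95th percentile of hourly RAM peaks, percent
    disk_used_pct: float  # root filesystem usage, percent
    net_avg_mb_s: float   # average network throughput, MB/s
    growth_blocks: bool   # True if the projected trend exceeds a threshold


def blocking_reasons(s: HostStats) -> list[str]:
    """Collect every rule that prevents a downsizing recommendation."""
    reasons = []
    if s.cpu_avg >= 20:
        reasons.append("CPU average is 20% or higher")
    if s.cpu_p95 >= 60:
        reasons.append("CPU P95 is 60% or higher")
    if s.ram_avg >= 40:
        reasons.append("RAM average is 40% or higher")
    if s.ram_p95 >= 80:
        reasons.append("RAM P95 is 80% or higher")
    if s.disk_used_pct >= 85:
        reasons.append("Disk usage is near saturation")
    if s.net_avg_mb_s >= 100:
        reasons.append("Network throughput is persistently high")
    if s.growth_blocks:
        reasons.append("Growth trend is projected to exceed thresholds")
    return reasons


def is_oversized(s: HostStats) -> bool:
    # Flag the host only when no rule blocks the reduction.
    return not blocking_reasons(s)
```

Returning the reasons rather than a bare boolean mirrors the behavior described above, where the module explains why a reduction is not recommended.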

Trend Analysis

The module compares the average of week 1 (days 1–7) against the average of week 4 (days 24–30) to detect workload growth.

Data quality gate: Each week must have at least 24 hours of trend data. If a host was recently added or trend data was purged, the trend shows "N/A" instead of producing misleading results.

Projection-based blocking: Instead of blocking on any small increase, the module projects forward:

cpu_projected = cpu_avg + cpu_trend
ram_projected = ram_avg + ram_trend

Downsizing is only blocked if the projected value would reach or exceed the threshold (CPU ≥ 20% or RAM ≥ 40%). This means a server at 5% CPU growing +5pp (projected 10%) is still flagged for reduction, while a server at 15% CPU growing +8pp (projected 23%) is correctly held back.
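The week-over-week comparison and the projection check can be sketched as below. This is a hedged illustration, not the module's code: weekly_trend and projection_blocks are hypothetical names, and the README does not say how a host with an "N/A" trend is treated by the detection rules, so that case is left to the caller.

```python
from statistics import mean
from typing import Optional

MIN_HOURS_PER_WEEK = 24  # data quality gate described above


def weekly_trend(week1: list[float], week4: list[float]) -> Optional[float]:
    """Return week 4 average minus week 1 average, in percentage points.

    Returns None (shown as "N/A") when either week has too little data.
    """
    if len(week1) < MIN_HOURS_PER_WEEK or len(week4) < MIN_HOURS_PER_WEEK:
        return None
    return mean(week4) - mean(week1)


def projection_blocks(avg: float, trend: float, threshold: float) -> bool:
    # projected = avg + trend; block only if it reaches the threshold.
    return avg + trend >= threshold
```

Using the CPU threshold of 20%, projection_blocks(5, 5, 20) is False (projected 10%) and projection_blocks(15, 8, 20) is True (projected 23%), matching the two cases described above.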

Right-Sizing Simulation

Beyond detecting waste, the module suggests concrete right-sizing targets — answering the question every manager asks: "If I reduce, what should I reduce to?"

Formula:

recommended = current_allocation × 0.80

The module recommends 80% of the current allocation — directly, without rounding to predefined VM sizes. A safety check ensures the recommendation is never below the server's actual P95 peak usage.

| Step | Description |
|------|-------------|
| 1. Read current specs | system.cpu.num (vCPUs) and vm.memory.size[total] (RAM bytes) |
| 2. Calculate 80% of current | CPU: floor(current × 0.80); RAM: round(current × 0.80, 1 decimal) |
| 3. Safety check | Ensure recommended ≥ P95 actual usage; if not, no recommendation is made |

Example:

| Host | vCPUs | vCPU Rec. | RAM | RAM Rec. |
|------|-------|-----------|-----|----------|
| db-mongo-dev | 4 vCPU | 3 vCPU | 7.5 GB | 6.0 GB |

A server with 7.5 GB RAM → 80% = 6.0 GB. P95 RAM usage is 21% (1.57 GB actual) — 6.0 ≥ 1.57, so the recommendation is safe.

Notes:

Recommendations only appear when the suggested size is smaller than current (no upsizing suggestions).
If system.cpu.num or vm.memory.size[total] items are not available, the columns show "N/A".
The "—" symbol means no reduction is recommended (current size is already optimal or near-optimal).

Safeguards:

P95 = 0% → recommendation is skipped entirely (likely missing or broken data).
P95 safety floor → if 80% of current would be below actual P95 usage, no recommendation is made.
Minimum RAM: 2 GB — the module never recommends less than 2 GB.
Minimum vCPU: 1 — the module never recommends less than 1 vCPU.
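Putting the formula, notes, and safeguards together, a right-sizing calculation might look like the sketch below. It is an interpretation under stated assumptions, not the module's implementation: the function names are hypothetical, P95 percentages are converted to absolute usage by multiplying by the current allocation (consistent with the 21% of 7.5 GB example), and the minimums are applied by clamping the recommendation upward.

```python
from typing import Optional

MIN_VCPU = 1
MIN_RAM_GB = 2.0


def recommend_vcpu(current: int, cpu_p95_pct: float) -> Optional[int]:
    if cpu_p95_pct <= 0:
        return None  # P95 = 0% suggests missing or broken data
    # floor(current * 0.80) via integer arithmetic, clamped to the minimum
    recommended = max(current * 80 // 100, MIN_VCPU)
    p95_vcpus = current * cpu_p95_pct / 100.0
    if recommended < p95_vcpus:
        return None  # P95 safety floor
    if recommended >= current:
        return None  # no upsizing or no-op suggestions
    return recommended


def recommend_ram_gb(current_gb: float, ram_p95_pct: float) -> Optional[float]:
    if ram_p95_pct <= 0:
        return None  # P95 = 0% suggests missing or broken data
    recommended = max(round(current_gb * 0.80, 1), MIN_RAM_GB)
    p95_gb = current_gb * ram_p95_pct / 100.0
    if recommended < p95_gb:
        return None  # P95 safety floor
    if recommended >= current_gb:
        return None  # no upsizing or no-op suggestions
    return recommended
```

For the db-mongo-dev example, recommend_ram_gb(7.5, 21.0) gives 6.0 GB. A None result corresponds to the cases where the table shows no reduction.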

Requirements

Zabbix: 7.0.0 to 7.4.x (tested on 7.4.7)
PHP: 8.0 or higher
Hosts must be monitored with standard OS templates (Linux by Zabbix Agent, etc.)
Trends data must be available (at least 1 hour of collection for trends to populate)

Installation

Git Clone

cd /usr/share/zabbix/ui/modules/
git clone https://github.com/Lfijho/ZabbixFinOps.git

Enable the Module

Log in to the Zabbix frontend as an Admin user
Navigate to Administration → General → Modules
Click Scan directory
Find "Zabbix FinOps Toolkit" in the list
Click Enable

The module will appear in the menu under Monitoring → Infrastructure Cost Analyzer.

Usage

Navigate to Monitoring → Infrastructure Cost Analyzer
(Optional) Filter by host group using the multiselect dropdown
Click Apply to filter results
Click any column header (Waste Score, Efficiency, CPU Avg, RAM Avg) to sort
Click Export CSV to download the report

Understanding the Results Table

| Column | Description |
|--------|-------------|
| Host | Server hostname |
| Host Group | Zabbix host group(s) |
| CPU Avg % | Average CPU utilization over 30 days |
| CPU Max % | Absolute peak CPU over 30 days |
| CPU P95 % | 95th percentile of hourly CPU peaks (ignores top 5% spikes) |
| RAM Avg % | Average memory utilization over 30 days |
| RAM Max % | Absolute peak memory over 30 days |
| RAM P95 % | 95th percentile of hourly RAM peaks (ignores top 5% spikes) |
| Disk Avg % | Average root filesystem usage |
| Net In / Net Out | Average network throughput |
| Load Avg | Average system load |
| Waste Score | How underutilized (higher = more waste) |
| Efficiency | How well utilized (higher = better) |
| Trend | CPU/RAM usage direction (+ growth, - decline) |
| Recommendation | Actionable suggestion |

Module Structure

ZabbixFinOpsToolkit/
├── manifest.json                          # Module metadata and action registration
├── Module.php                             # Menu registration (Monitoring → Infrastructure Cost Analyzer)
├── actions/
│   ├── CostAnalyzer.php                   # Main controller — queries trends, calculates scores
│   └── CostAnalyzerCsvExport.php          # CSV export controller
├── views/
│   ├── finops.costanalyzer.view.php       # HTML table view with filters and badges
│   └── finops.costanalyzer.csv.php        # CSV output template
├── assets/
│   └── css/
│       └── finops-toolkit.css             # Color indicators and card styles
├── docs/
│   └── screenshot-placeholder.png         # Screenshot (replace with actual)
├── README.md
├── CONTRIBUTING.md
├── CHANGELOG.md
└── LICENSE

Contributing

Contributions are welcome! Please read CONTRIBUTING.md for guidelines.

Quick Start for Contributors

Fork this repository
Create a feature branch: git checkout -b feature/my-feature
Make your changes
Test in a Zabbix 7.x environment
Submit a Pull Request

Changelog

See CHANGELOG.md for version history.

License

This project is licensed under the Apache License 2.0 — see the LICENSE file for details.

Acknowledgments

Zabbix — the enterprise monitoring platform this module extends
The FinOps community for inspiration on cloud/infrastructure cost optimization
