Matano
Open source security data lake for threat hunting, detection & response, and cybersecurity analytics at petabyte scale on AWS
Open source security data lake for AWS
Matano is an open-source, cloud-native security data lake built for security teams on AWS.
<div> <h3 align="center"> <a href="https://www.matano.dev/docs">Docs</a> <span> | </span> <a href="https://www.matano.dev">Website</a> <span> | </span> <a href="https://discord.gg/YSYfHMbfZQ">Community</a> </h3> </div>

> [!NOTE]
> Matano offers a commercial managed Cloud SIEM for a complete enterprise Security Operations platform. Learn more.
Features
<div align="center"> <br> <img src="assets/matano_athena.png" width="650"> </div>
<br>

- Security Data Lake: Normalize unstructured security logs into a structured realtime data lake in your AWS account.
- Collect All Your Logs: Integrates out of the box with 50+ sources for security logs and can easily be extended with custom sources.
- Detection-as-Code: Use Python to build realtime detections as code. Support for automatic import of Sigma detections to Matano.
- Log Transformation Pipeline: Supports custom VRL (Vector Remap Language) scripting to parse, enrich, normalize and transform your logs as they are ingested without managing any servers.
- No Vendor Lock-In: Uses an open table format (Apache Iceberg) and open schema standards (ECS) to give you full ownership of your security data in a vendor-neutral format.
- Bring Your Own Analytics: Query your security lake directly from any Iceberg-compatible engine (AWS Athena, Snowflake, Spark, Trino etc.) without having to copy data around.
- Serverless: Fully serverless, designed specifically for AWS, and focused on high scale, low cost, and zero-ops.
Architecture
<div align="center"> <br> <img src="assets/diagram.png" width="600"> </div>

👀 Use cases
- Reduce SIEM costs.
- Augment your SIEM with a security data lake for additional context during investigations.
- Write detections-as-code using Python to detect suspicious behavior & create contextualized alerts.
- ECS-compatible serverless alternative to ELK / Elastic Security stack.
✨ Integrations
Managed log sources
- AWS CloudTrail
- AWS Route53
- AWS VPC Flow
- AWS Config
- AWS ELB
- Amazon S3 Server Access
- Amazon S3 Inventory Reports
- Amazon Inspector
- AWS WAF
- Cloudflare
- CrowdStrike
- Duo
- Okta
- GitHub
- Google Workspace
- Office 365
- Snyk
- Suricata
- Zeek
- Custom 🔧
Alert destinations
Query engines
- Amazon Athena (default)
- Snowflake (preview)
- Spark
- Trino
- BigQuery Omni (BigLake)
- Dremio
Quick start
View the complete installation instructions
Installation
Install the `matano` CLI to deploy Matano into your AWS account and manage your deployment.
Linux
```shell
curl -OL https://github.com/matanolabs/matano/releases/download/nightly/matano-linux-x64.sh
chmod +x matano-linux-x64.sh
sudo ./matano-linux-x64.sh
```
macOS
```shell
curl -OL https://github.com/matanolabs/matano/releases/download/nightly/matano-macos-x64.sh
chmod +x matano-macos-x64.sh
sudo ./matano-macos-x64.sh
```
Deployment
Read the complete docs on getting started
To get started, run the `matano init` command.
- Make sure you have AWS credentials in your environment (or in an AWS CLI profile).
- The interactive CLI wizard will walk you through getting started by generating an initial Matano directory for you, initializing your AWS account, and deploying into your AWS account.
- Initial deployment takes a few minutes.
Directory structure
Once initialized, your Matano directory is used to control and manage all resources in your project, e.g. log sources, detections, and other configuration. It is structured as follows:
```
➜  example-matano-dir git:(main) tree
├── detections
│   └── aws_root_credentials
│       ├── detect.py
│       └── detection.yml
├── log_sources
│   ├── cloudtrail
│   │   ├── log_source.yml
│   │   └── tables
│   │       └── default.yml
│   └── zeek
│       ├── log_source.yml
│       └── tables
│           └── dns.yml
├── matano.config.yml
└── matano.context.json
```
When onboarding a new log source or authoring a detection, run `matano deploy` from anywhere in your project to deploy the changes to your account.
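To make the layout above concrete, here is a minimal sketch of what a `detections/aws_root_credentials/detect.py` might contain. It assumes Matano's Python detection interface of a `detect(record)` function that is called per event and whose truthy return value raises an alert; the exact ECS-style field paths used below are illustrative assumptions, not guaranteed to match your normalized schema.

```python
# detections/aws_root_credentials/detect.py (illustrative sketch)
# Assumption: Matano invokes detect(record) for each ingested event
# and treats a truthy return value as an alert.

def detect(record: dict) -> bool:
    """Alert when the AWS root account performs an API call.

    Field paths follow an assumed ECS-style CloudTrail normalization.
    """
    user_type = (
        record.get("aws", {})
        .get("cloudtrail", {})
        .get("user_identity", {})
        .get("type")
    )
    event_action = record.get("event", {}).get("action")
    # Ignore routine MFA checks to cut noise on root usage.
    return user_type == "Root" and event_action != "CheckMfa"
```

Pairing this with a `detection.yml` that names the detection and the tables it runs against completes the directory entry shown in the tree.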
🔧 Log Transformation & Data Normalization
Read the complete docs on configuring custom log sources
Vector Remap Language (VRL) allows you to easily onboard custom log sources, and encourages you to normalize fields according to the Elastic Common Schema (ECS) to enable enhanced pivoting and bulk search for IOCs across your security data lake.
Users can define custom VRL programs to parse and transform unstructured logs as they are being ingested through one of the supported mechanisms for a log source (e.g. S3, SQS).
VRL is an expression-oriented language designed for transforming observability data (e.g. logs) in a safe and performant manner. It features a simple syntax and a rich set of built-in functions tailored specifically to observability use cases.
Example: parsing JSON
Let's have a look at a simple example. Imagine that you're working with HTTP log events that look like this:
```json
{
  "line": "{\"status\":200,\"srcIpAddress\":\"1.1.1.1\",\"message\":\"SUCCESS\",\"username\":\"ub40fan4life\"}"
}
```
You want to apply these changes to each event:
- Parse the raw `line` string into JSON, and explode the fields to the top level
- Rename `srcIpAddress` to the `source.ip` ECS field
- Remove the `username` field
- Convert the `message` to lowercase
Adding such a VRL program to your log source as a transform step would accomplish all of those changes.
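A sketch of that transform in VRL follows. Function behavior (`parse_json!`, `del`, `downcase`) is per the VRL documentation; the assumption that the raw event exposes the payload directly as `.line` is illustrative and may differ in your deployment.

```
# Parse the raw JSON string and explode its fields to the top level
. = object!(parse_json!(string!(.line)))
# Rename srcIpAddress to the ECS source.ip field (del returns the removed value)
.source.ip = del(.srcIpAddress)
# Drop the username field
del(.username)
# Lowercase the message
.message = downcase(string!(.message))
```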
