# file.d

## Overview
file.d is a blazing fast tool for building data pipelines: read, process, and output events. Primarily developed to read from files, but also supports numerous input/action/output plugins.
⚠ Although we use it in production, it is not yet v1.0.0. Please test your pipelines carefully in dev/stage environments.
## Contributing
file.d is an open-source project and contributions are very welcome!
Please make sure to read our contributing guide before creating an issue and opening a PR!
## Motivation
Well, we already have several similar tools: vector, filebeat, logstash, fluentd, fluent-bit, etc.
Performance tests show that the best of them achieve a throughput of roughly 100MB/sec. Guys, it's 2023 now. HDDs and NICs can handle a throughput of a few GB/sec, and CPUs process dozens of GB/sec. Are you sure 100MB/sec is what we deserve? Are you sure it is fast?
## Main features
- Fast: more than 10x faster compared to similar tools
- Predictable: it uses pooling, so memory consumption is limited
- Reliable: doesn't lose data thanks to its commit mechanism
- Container / cloud / kubernetes native
- Simply configurable with YAML
- Prometheus-friendly: transform your events into metrics on any pipeline stage
- Vault-friendly: store sensitive info and get it for any pipeline parameter
- Well-tested and used in production to collect logs from Kubernetes cluster with 3000+ total CPU cores
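
To give a feel for the YAML configuration style mentioned above, here is a minimal pipeline sketch. The directory path and parameter names are illustrative assumptions, not taken from the official plugin docs:

```yaml
# A minimal sketch of a file.d pipeline: one input, one action, one output.
# Paths and parameter names are assumptions for illustration only.
pipelines:
  example:
    input:
      type: file
      watching_dir: /var/log/apps   # directory to read from (assumed parameter name)
    actions:
      - type: json_decode           # parse each event's payload as JSON
        field: log
    output:
      type: stdout                  # print resulting events
```

A pipeline is a named unit combining exactly one input, an ordered list of actions, and one output; check the plugin documentation for the exact parameters each plugin accepts.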
## Performance
On a MacBook Pro 2017 with two physical cores, file.d can achieve the following throughput:
- 1.7GB/s in the `files > devnull` case
- 1.0GB/s in the `files > json decode > devnull` case
More benchmarks can be found here.
TBD: throughput on production servers.
## Plugins
Input: dmesg, fake, file, http, journalctl, k8s, kafka, socket
Action: add_file_name, add_host, cardinality, convert_date, convert_log_level, convert_utf8_bytes, debug, decode, discard, flatten, hash, join, join_template, json_decode, json_encode, json_extract, keep_fields, mask, modify, move, parse_es, parse_re2, remove_fields, rename, set_time, split, throttle
Output: clickhouse, devnull, elasticsearch, file, gelf, http, kafka, loki, postgres, s3, splunk, stdout
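
As a sketch of how these plugins chain together, the hypothetical pipeline below reads Kubernetes pod logs, trims and filters events, and ships them to Elasticsearch. Field names, the match syntax, and the endpoint are assumptions for illustration:

```yaml
# Hypothetical pipeline combining plugins from the lists above.
# Field names and matching parameters are assumed, not verified.
pipelines:
  k8s_logs:
    input:
      type: k8s                        # read pod logs on a Kubernetes node
    actions:
      - type: keep_fields              # drop all fields except these
        fields: [time, level, message]
      - type: discard                  # drop events matching the condition
        match_fields:                  # (assumed match syntax)
          level: debug
    output:
      type: elasticsearch
      endpoints:
        - http://elasticsearch:9200    # placeholder address
```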
## Logging system
We have also developed a scalable and high-performance database for logs — seq-db. File.d can send data to seq-db using Elasticsearch Bulk API. An example of sending data from file.d to seq-db can be found here.
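
Since seq-db accepts the Elasticsearch Bulk API, file.d's elasticsearch output can simply be pointed at a seq-db instance. A sketch, with a placeholder address and assumed parameter names:

```yaml
# Sketch: shipping events to seq-db via the elasticsearch output plugin.
# The seq-db host and port are placeholders, not real defaults.
pipelines:
  to_seq_db:
    input:
      type: file
      watching_dir: /var/log/apps      # assumed source directory
    output:
      type: elasticsearch              # seq-db speaks the Elasticsearch Bulk API
      endpoints:
        - http://seq-db:9002           # placeholder seq-db address
```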
## What's next
Join our community in Telegram: https://t.me/file_d_community