703 skills found · Page 1 of 24
risingwavelabs / Risingwave: Event streaming platform for agents, apps, and analytics. Continuously ingest, transform, and serve event data in real time, at scale.
adithya-s-k / OmniParse: Ingest, parse, and optimize any data format, from documents to multimedia, for enhanced compatibility with GenAI frameworks.
jitsucom / Jitsu: Jitsu is an open-source Segment alternative, a fully scriptable data ingestion engine for modern data teams. Set up a real-time data pipeline in minutes, not days.
lakesoul-io / LakeSoul: LakeSoul is an end-to-end, real-time, cloud-native Lakehouse framework with fast data ingestion, concurrent updates, and incremental data analytics on cloud storage for both BI and AI applications.
apache / incubator-devlake: Apache DevLake is an open-source dev data platform that ingests, analyzes, and visualizes fragmented data from DevOps tools, extracting insights for engineering excellence, developer experience, and community growth.
airbnb / StreamAlert: StreamAlert is a serverless, real-time data analysis framework that empowers you to ingest, analyze, and alert on data from any environment, using data sources and alerting logic you define.
dashbitco / Broadway: Concurrent and multi-stage data ingestion and processing with Elixir.
apache / Gobblin: A distributed data integration framework that simplifies common aspects of big data integration, such as data ingestion, replication, organization, and lifecycle management, for both streaming and batch data ecosystems.
bruin-data / Bruin: Build data pipelines with SQL and Python, ingest data from different sources, add quality checks, and build end-to-end flows.
tilo / SmarterCSV: The fastest end-to-end CSV ingestion for Ruby (with C acceleration). SmarterCSV auto-detects formats, applies smart defaults, and returns Rails-ready hashes for seamless use with ActiveRecord, Sidekiq, parallel jobs, and S3 pipelines, even for messy, user-uploaded real-world data.
datazip-inc / OLake: The fastest replication of databases, Kafka, and S3 to Apache Iceberg or plain Parquet. Efficient, quick, and scalable data ingestion for real-time analytics. Supported sources: Postgres, MongoDB, MySQL, Oracle, MSSQL, DB2, Kafka, S3.
kevwan / go-stash: go-stash is a high-performance, free, and open-source server-side data processing pipeline that ingests data from Kafka, processes it, and then sends it to Elasticsearch.
NH-RED-TEAM / RustHound: Active Directory data ingestor for BloodHound Legacy, written in Rust. 🦀
sitewhere / SiteWhere: SiteWhere is an industrial-strength, open-source application enablement platform for the Internet of Things (IoT). It provides a multi-tenant, microservice-based infrastructure that includes device/asset management, data ingestion, big-data storage, and integration through a modern, scalable architecture. REST APIs expose all system functionality, and SDKs are available for many common device platforms, including Android, iOS, Arduino, and any Java-capable platform such as Raspberry Pi, rapidly accelerating the speed of innovation.
alanchn31 / Data Engineering Projects: Personal data engineering projects.
danielealbano / cachegrand: A modern data ingestion, processing, and serving platform built for today's hardware.
jonfairbanks / local-rag: Ingest files for retrieval-augmented generation (RAG) with open-source Large Language Models (LLMs), all without third parties or sensitive data leaving your network.
dgarnitz / VectorFlow: VectorFlow is a high-volume vector embedding pipeline that ingests raw data, transforms it into vectors, and writes it to a vector DB of your choice.
NationalSecurityAgency / DataWave: DataWave is an ingest/query framework that leverages Apache Accumulo to provide fast, secure data access.
rax-maas / Blueflood: A distributed system designed to ingest and process time-series data.