703 skills found · Page 1 of 24
risingwavelabs / Risingwave: Event streaming platform for agents, apps, and analytics. Continuously ingest, transform, and serve event data in real time, at scale.
adithya-s-k / OmniParse: Ingest, parse, and optimize any data format, from documents to multimedia, for enhanced compatibility with GenAI frameworks.
jitsucom / Jitsu: Jitsu is an open-source Segment alternative, a fully scriptable data ingestion engine for modern data teams. Set up a real-time data pipeline in minutes, not days.
lakesoul-io / LakeSoul: LakeSoul is an end-to-end, real-time, cloud-native Lakehouse framework with fast data ingestion, concurrent updates, and incremental data analytics on cloud storage for both BI and AI applications.
apache / incubator-devlake: Apache DevLake is an open-source dev data platform that ingests, analyzes, and visualizes fragmented data from DevOps tools, extracting insights for engineering excellence, developer experience, and community growth.
airbnb / StreamAlert: StreamAlert is a serverless, real-time data analysis framework that empowers you to ingest, analyze, and alert on data from any environment, using data sources and alerting logic you define.
dashbitco / Broadway: Concurrent and multi-stage data ingestion and processing with Elixir.
apache / Gobblin: A distributed data integration framework that simplifies common aspects of big data integration, such as data ingestion, replication, organization, and lifecycle management, for both streaming and batch data ecosystems.
bruin-data / Bruin: Build data pipelines with SQL and Python, ingest data from different sources, add quality checks, and build end-to-end flows.
tilo / SmarterCSV: The fastest end-to-end CSV ingestion for Ruby (with C acceleration). SmarterCSV auto-detects formats, applies smart defaults, and returns Rails-ready hashes for seamless use with ActiveRecord, Sidekiq, parallel jobs, and S3 pipelines, even for messy, user-uploaded real-world data.
datazip-inc / OLake: The fastest replication of databases, Kafka, and S3 to Apache Iceberg or plain Parquet. Efficient, quick, and scalable data ingestion for real-time analytics. Supported sources: Postgres, MongoDB, MySQL, Oracle, MSSQL, DB2, Kafka, S3.
kevwan / go-stash: go-stash is a high-performance, free, and open-source server-side data processing pipeline that ingests data from Kafka, processes it, and then sends it to Elasticsearch.
NH-RED-TEAM / RustHound: Active Directory data ingestor for BloodHound Legacy, written in Rust. 🦀
sitewhere / SiteWhere: SiteWhere is an industrial-strength, open-source application enablement platform for the Internet of Things (IoT). It provides a multi-tenant, microservice-based infrastructure that includes device/asset management, data ingestion, big-data storage, and integration through a modern, scalable architecture. REST APIs expose all system functionality, and SDKs are available for many common device platforms, including Android, iOS, Arduino, and any Java-capable platform such as Raspberry Pi, rapidly accelerating the speed of innovation.
alanchn31 / Data Engineering Projects: Personal data engineering projects.
danielealbano / cachegrand: A modern data ingestion, processing, and serving platform built for today's hardware.
jonfairbanks / local-rag: Ingest files for retrieval-augmented generation (RAG) with open-source Large Language Models (LLMs), all without third parties or sensitive data leaving your network.
dgarnitz / VectorFlow: VectorFlow is a high-volume vector embedding pipeline that ingests raw data, transforms it into vectors, and writes it to a vector DB of your choice.
NationalSecurityAgency / DataWave: DataWave is an ingest/query framework that leverages Apache Accumulo to provide fast, secure data access.
rax-maas / Blueflood: A distributed system designed to ingest and process time-series data.