abhionlyone / Us Car Models Data: Introducing the most comprehensive and up-to-date open source dataset on US car models on GitHub. With over 15,000 entries covering car models manufactured between 1992 and 2023, this repository offers valuable information for anyone looking to incorporate car data into their applications. Best of all, it's completely free to use!
shafiab / HashtagCashtag: My Insight Data Engineering Fellowship project. I implemented a big data processing pipeline based on the lambda architecture that aggregates Twitter and US stock market data for user sentiment analysis using open source tools: Apache Kafka for data ingestion, Apache Spark & Spark Streaming for batch & real-time processing, Apache Cassandra for storage, and Flask, Bootstrap and HighCharts for the frontend.
AltimateAI / Altimate Code: Open-source agentic data engineering harness for dbt, SQL, and cloud warehouses. 100+ tools, 10 warehouses, AI-powered.
lefterisloukas / Edgar Crawler: The only open-source toolkit that can download SEC EDGAR financial reports and extract textual data from specific item sections into nice & clean structured JSON files. Presented at WWW 2025 @ Sydney, Australia (https://dl.acm.org/doi/10.1145/3701716.3715289)
GoogleCloudPlatform / Professional Services Data Validator: Utility to compare data between homogeneous or heterogeneous environments to ensure source and target tables match
estebanpdl / Osintgpt: An open-source intelligence (OSINT) analysis tool leveraging GPT-powered embeddings and vector search engines for efficient data processing
Gsync / Jobsync: JobSync is a self-hosted, open-source job application tracker and AI-powered career assistant. Built with Next.js and Shadcn UI, it helps job seekers manage their search journey with AI resume review, job matching, task logging, and application analytics, all while keeping their data private.
cptn-io / El Cptn: cptn.io is an open-source platform that helps develop and deploy integrations and data pipelines quickly and easily.
oazabir / Droptiles: Droptiles is a "Windows 8 Start"-like, Metro-style Web 2.0 dashboard. It comprises Live Tiles. Tiles are mini apps that can fetch data from external sources. Clicking on a tile launches the full application.
xark-argo / Argo: ARGO is an open-source AI Agent platform that brings Local Manus to your desktop. With one-click model downloads, seamless closed LLM integration, and offline-first RAG knowledge bases, ARGO becomes a DeepResearch powerhouse for autonomous thinking and task planning, while 100% of your data stays local. Supports Win/Mac/Docker.
dromara / CloudEon: CloudEon uses Kubernetes to install and deploy open-source big data components, enabling the containerized operation of an open-source big data platform. This allows you to reduce your focus on underlying resource management and maintenance.
raspofabs / Dodbooksourcecode: Source code to the data-oriented design book
tim-learn / SHOT: Code released for our ICML 2020 paper "Do We Really Need to Access the Source Data? Source Hypothesis Transfer for Unsupervised Domain Adaptation"
OpenClinica / OpenClinica: OpenClinica is the world's first commercial open source clinical trial software for Electronic Data Capture (EDC) and Clinical Data Management (CDM).
jamdotdev / Jam Dev Utilities: Lightweight utils set, fast and open-source. It's got cmd+k search & everything's client-side. No ads, your data stays local.
jdorn / Php Reports: A PHP framework for displaying reports from any data source, including SQL and MongoDB
eclipse-streamsheets / Streamsheets: An open-source tool for processing stream data using a spreadsheet-like interface.
osalvador / ReplicaDB: ReplicaDB is an open source tool for database replication, designed for efficiently transferring bulk data between relational and non-relational databases.
irods / Irods: Open Source Data Management Software
allenai / Lumos: Code and data for "Lumos: Learning Agents with Unified Data, Modular Design, and Open-Source LLMs"