Results for "skewed-data"

50 skills found · Page 1 of 2

ZhiningLiu1998 / Awesome Imbalanced Learning

1.5k

😎 Everything about class-imbalanced/long-tail learning: papers, code, frameworks, and libraries

universal

awesome, awesome-list, class-imbalance, +10

Updated 6d ago

theodesp / Go Heaps

101

Reference implementations of heap data structures in Go - treap, skew, leftist, pairing, fibonacci

universal

2-3-heap, data-structures, fibonacci-heap, +7

Updated 7mo ago

IBM / Xgboost Smote Detect Fraud

Can we predict accurately on skewed data? Which sampling techniques can be used? Which models and techniques suit this scenario? Find the answers in this code pattern!

universal

analytics, data-analysis, data-mining, +10

Updated 4d ago

zxjcarrot / 2 Tree

Tiered Indexing is a general approach to improve the memory utilization of buffer-managed data structures including B+tree, Hashing, Heap, and Log-Structured-Merge Tree for skewed workloads.

universal

Updated 2mo ago

grimmlab / PermGWAS

Efficient Permutation-based GWAS for Normal and Skewed Phenotypic Distributions

universal

gpu-acceleration, gwas, linear-mixed-models, +3

Updated 26d ago

ajayshewale / Sentiment Analysis Of Text Data Tweets

This project addresses sentiment analysis on Twitter. The goal was to predict the sentiment of a given Twitter post using Python. Sentiment analysis can detect many different emotions in text, but this report considers only three major classes: positive, negative, and neutral. The training dataset was small (just over 5,900 examples) and highly skewed, which made building a good classifier considerably harder. After creating many custom features, using bag-of-words representations, and applying the Extreme Gradient Boosting algorithm, the classifier reached an accuracy of 58%. Applications of public sentiment analysis include firms gauging the market response to their products, predicting political elections, and predicting socioeconomic phenomena such as the stock exchange.

universal

Updated 5mo ago

oslabs-beta / Podz

A lightweight Kubernetes health monitor and architecture visualizer. It observes a Kubernetes cluster externally, offering an alternative to internal monitoring and its potential data skewing and unnecessary resource usage.

universal

css, docker, electron, +10

Updated 1y ago

YosiSF / EinsteinDB

In a nutshell, EinsteinDB is a persistent indexing scheme based on LSH-KVX that combines the distinct merits of hash indexes and B+-Tree indexes to support range scans while avoiding long NVM writes for maintaining consistency. It thereby improves on LSH’s performance guarantees for skewed data, and it adopts ordered-write consistency to ensure crash consistency while retaining the same storage and query overhead.

universal

distributed-systems, honeybadgerbft, htap, +5

Updated 2mo ago

DevO2012 / Cedar

Cedar implements an updatable double-array trie, which offers fast update/lookup for skewed queries in real-world data.

universal

Updated 22d ago

dfelix / Skewt Js

Plot a skew-T log-P diagram based on sounding data

universal

Updated 17d ago

philhagen / Timeshift

A Python script to shift the timestamps in syslog data. Useful for forensicators combating time skew.

universal

Updated 10mo ago

CGCL-codes / PStream

PStream is a popularity-aware, differentiated distributed stream processing system that identifies the popularity of keys in the stream data and applies a differentiated partitioning scheme. PStream greatly outperforms Storm on skewed data in terms of throughput and processing latency.

universal

Updated 26d ago

icsa-caps / CcKVS

An RDMA-based, skew-aware key-value store that implements the Scale-Out ccNUMA design to exploit skew and increase the performance of data-serving applications.

universal

Updated 1y ago

DibyaRath / Data Engineering Labs

Production-grade distributed data engineering labs focused on Spark internals, large-scale performance tuning, skew handling, streaming state management, and FAANG-level system design.

universal

Updated 17d ago

gmgeorg / Pylambertw

pylambertw - sklearn interface to analyze and gaussianize heavy-tailed, skewed data

universal

feature-engineering, gaussianize, heavy-tailed-distributions, +7

Updated 4mo ago

atmacvit / Bincrowd

Official Implementation of ACMMM'21 paper "Wisdom of (Binned) Crowds: A Bayesian Stratification Paradigm for Crowd Counting"

universal

acm-multimedia-2021, binning-analysis, computer-vision, +6

Updated 1y ago

SabbaghCodes / ImbalancedLearningForSingleCellFoundationModels

Code for benchmarking single-cell foundation models (scGPT, scBERT, and Geneformer) on the cell-type annotation task using data with skewed cell-type distributions.

universal

Updated 28d ago

CGCL-codes / Simois

Simois is a scalable distributed stream join system that supports efficient join operations between two streams with highly skewed data distributions. Simois supports completeness of the join results and greatly outperforms existing stream join systems in terms of system throughput and average processing latency.

universal

Updated 1y ago

FarzadNekouee / Flight EDA To Preprocessing

An extensive exploration and preprocessing of flight data. The project covers detailed EDA (univariate, bivariate, and multivariate analysis) along with comprehensive data preprocessing techniques: missing value treatment, outlier management, categorical encoding, feature scaling, and skewness transformation.

universal

Updated 9mo ago

gunaprsd / SkewedDataGenerator

Skewed Data Generator for TPC-H

universal

Updated 9mo ago