ZhiningLiu1998 / Awesome Imbalanced Learning: 😎 Everything about class-imbalanced/long-tail learning: papers, code, frameworks, and libraries
theodesp / Go Heaps: Reference implementations of heap data structures in Go: treap, skew, leftist, pairing, fibonacci
IBM / Xgboost Smote Detect Fraud: Can we predict accurately on skewed data? What sampling techniques can be used? Which models and techniques apply in this scenario? Find the answers in this code pattern!
zxjcarrot / 2 Tree: Tiered Indexing is a general approach to improve the memory utilization of buffer-managed data structures including B+tree, Hashing, Heap, and Log-Structured-Merge Tree for skewed workloads.
grimmlab / PermGWAS: Efficient Permutation-based GWAS for Normal and Skewed Phenotypic Distributions
ajayshewale / Sentiment Analysis Of Text Data Tweets: This project addresses the problem of sentiment analysis on Twitter. The goal was to predict the sentiment of a given Twitter post using Python. Sentiment analysis can predict many different emotions attached to a text, but this report considers only three major ones: positive, negative, and neutral. The training dataset was small (just over 5900 examples) and highly skewed, which made it considerably harder to build a good classifier. After creating many custom features, using bag-of-words representations, and applying the Extreme Gradient Boosting algorithm, a classification accuracy of 58% was achieved. Applications of public sentiment analysis include firms gauging the market response to their products, predicting political elections, and predicting socioeconomic phenomena such as the stock exchange.
oslabs-beta / Podz: A lightweight Kubernetes health monitor and architecture visualizer. Designed to observe a Kubernetes cluster externally, as an alternative to internal monitoring and its potential data skewing and unnecessary resource usage.
YosiSF / EinsteinDB: In a nutshell, EinsteinDB is a persistent indexing scheme based on LSH-KVX that exploits the distinct merits of the hash index and the B+-Tree index to support range scans and avoid long NVM writes for maintaining consistency. It thereby improves on LSH’s performance guarantees for skewed data, and it adopts ordered-write consistency to ensure crash consistency while retaining the same storage and query overhead.
DevO2012 / Cedar: Cedar implements an updatable double-array trie, which offers fast update/lookup for skewed queries in real-world data.
dfelix / Skewt Js: Plot a skew-T log-P diagram based on sounding data
philhagen / Timeshift: A Python script to shift the timestamps in syslog data. Useful for forensicators combating time skew.
CGCL-codes / PStream: PStream is a popularity-aware differentiated distributed stream processing system, which identifies the popularity of keys in the stream data and uses a differentiated partitioning scheme. PStream greatly outperforms Storm on skew-distributed data in terms of throughput and processing latency.
icsa-caps / CcKVS: A skew-aware RDMA key-value store that implements the Scale-Out ccNUMA design, exploiting skew to increase the performance of data-serving applications.
DibyaRath / Data Engineering Labs: Production-grade distributed data engineering labs focused on Spark internals, large-scale performance tuning, skew handling, streaming state management, and FAANG-level system design.
gmgeorg / Pylambertw: pylambertw, an sklearn interface to analyze and gaussianize heavy-tailed, skewed data
atmacvit / Bincrowd: Official Implementation of ACMMM'21 paper "Wisdom of (Binned) Crowds: A Bayesian Stratification Paradigm for Crowd Counting"
SabbaghCodes / ImbalancedLearningForSingleCellFoundationModels: Code for benchmarking single-cell foundation models (scGPT, scBERT, and Geneformer) on the cell-type annotation task using single-cell data with skewed cell-type distributions.
CGCL-codes / Simois: Simois is a scalable distributed stream join system, which supports efficient join operations between two streams with highly skewed data distributions. Simois can support the completeness of the join results, and greatly outperforms existing stream join systems in terms of system throughput and average processing latency.
FarzadNekouee / Flight EDA To Preprocessing: An extensive exploration and preprocessing of Flight data. The project encompasses detailed EDA (Univariate, Bivariate, and Multivariate analysis), along with comprehensive data preprocessing techniques: missing value treatment, outlier management, categorical encoding, feature scaling, and skewness transformation.
gunaprsd / SkewedDataGenerator: Skewed Data Generator for TPC-H