Smrt
Handle class imbalance intelligently by using variational auto-encoders to generate synthetic observations of your minority class.
Synthetic Minority Reconstruction Technique (SMRT)
Handle your class imbalance more intelligently by using SMOTE's younger, more sophisticated cousin
Installation
Installation is easy. After cloning the project onto your machine and installing the required dependencies,
simply use the setup.py file:
$ git clone https://github.com/tgsmith61591/smrt.git
$ cd smrt
$ python setup.py install
About
SMRT (Synthetic Minority Reconstruction Technique) is the new SMOTE (Synthetic Minority Oversampling TEchnique). Using variational auto-encoders, SMRT learns the latent factors that best reconstruct the observations in each minority class, and then generates synthetic observations until the minority class is represented at a user-defined ratio relative to the majority class size.
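As a rough illustration of the "user-defined ratio" idea, the sketch below computes how many synthetic observations would be needed for a minority class to reach a target share of the majority class size. The function name `synthetic_needed` and its parameters are hypothetical and are not part of the SMRT API; the rounding rule (ceiling) is also an assumption, since the way SMRT itself resolves fractional targets is not stated here.

```python
import math


def synthetic_needed(n_minority, n_majority, target_ratio):
    """Return how many synthetic minority samples reach the target ratio.

    Hypothetical helper for illustration only (not SMRT's API).
    target_ratio is the desired minority/majority size ratio, in (0, 1].
    """
    if not 0 < target_ratio <= 1:
        raise ValueError("target_ratio must be in (0, 1]")
    # Assumed rounding rule: round the target count up to a whole sample.
    target_count = math.ceil(target_ratio * n_majority)
    # Never return a negative number if the class is already large enough.
    return max(0, target_count - n_minority)


# A ~1:100 imbalance balanced up to half the majority class size.
print(synthetic_needed(n_minority=10, n_majority=1000, target_ratio=0.5))  # 490
```

A minority class that already meets the target needs no synthetic observations, so the helper returns 0 in that case.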
SMRT avoids one of SMOTE's greatest risks. SMOTE randomly draws minority observations and interpolates
between each one and its k-nearest neighbors, so it may select a "border point," an observation very close to
the decision boundary. The resulting synthetic observations can lie too close to the decision boundary
for reliable classification, which can degrade the performance of an estimator.
SMRT avoids this risk implicitly, as the VariationalAutoencoder
learns a distribution that is generalizable to the lowest-error (i.e., most archetypal) observations.
See the paper for more in-depth reference.
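To make the border-point risk concrete, here is a minimal sketch of SMOTE-style interpolation in plain Python. It is not the SMOTE reference implementation and not part of SMRT; the function name `smote_like_sample` and the toy data are hypothetical. It assumes the minority class has at least two observations, each given as a tuple of floats.

```python
import math
import random


def smote_like_sample(minority, k=2, rng=None):
    """Create one synthetic point by SMOTE-style interpolation (sketch).

    Hypothetical helper for illustration only.
    Returns the seed, the chosen neighbor, and the synthetic point.
    """
    rng = rng or random.Random()
    # Draw a random minority observation to act as the seed.
    seed = rng.choice(minority)
    # Find the seed's k nearest minority neighbors (excluding itself).
    others = [p for p in minority if p is not seed]
    neighbors = sorted(others, key=lambda p: math.dist(seed, p))[:k]
    neighbor = rng.choice(neighbors)
    # Interpolate a random fraction of the way from seed to neighbor.
    u = rng.random()
    synthetic = tuple(s + u * (n - s) for s, n in zip(seed, neighbor))
    return seed, neighbor, synthetic


minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
seed, neighbor, synthetic = smote_like_sample(minority, rng=random.Random(0))
print(seed, neighbor, synthetic)
```

Because the synthetic point always lies on the segment between the seed and one of its neighbors, a seed that sits next to the decision boundary produces synthetic points next to the boundary as well. Nothing in the interpolation step checks how typical the seed is.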
Example
The SMRT example is an IPython notebook with reproducible code and data. It takes an imbalanced variant of the MNIST dataset, balances it with both SMOTE and SMRT, and compares the results. The following are several of the images each method produced. Even visually, it's evident that SMRT better synthesizes data that resembles the input data.
Original:
The MNIST dataset was amended to contain only zeros and ones in an unbalanced (~1:100, respectively) ratio. The top row shows the original MNIST images, the second row the SMRT-generated images, and the bottom row the SMOTE-generated images: <br/> <img src="examples/img/mnist_smrt_smote.png" width="600" alt="Original"/>
Notes
- See examples for usage
- See the paper for more in-depth documentation
- Information on the authors
