Glove
Compute GloVe vectors using a co-occurrence matrix.
A general Cython implementation of multi-threaded GloVe training.
GloVe is an unsupervised learning algorithm for generating vector representations of words. Training uses a co-occurrence matrix built from a corpus. The resulting representations contain structure that is useful for many other tasks.
The paper describing the model is here.
The original implementation for this Machine Learning model can be found here.
@author Jonathan Raiman
Example
To use this package you need a sparse co-occurrence matrix. This matrix is represented by nested dictionaries keyed by 0-based integer indices.
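One way to produce such a matrix from raw text is sketched below. This is a minimal illustration and not part of this package: the helper name `build_cooccurrence`, the window size, and the 1/distance weighting (borrowed from the GloVe paper) are all assumptions chosen for the example.

```python
from collections import defaultdict


def build_cooccurrence(sentences, window=2):
    """Return (vocab, cooccur) for a list of tokenized sentences.

    Hypothetical helper for illustration; not provided by the glove package.
    """
    vocab = {}  # word -> 0-based index, assigned in order of first appearance
    cooccur = defaultdict(lambda: defaultdict(float))
    for sentence in sentences:
        ids = [vocab.setdefault(word, len(vocab)) for word in sentence]
        for pos, center in enumerate(ids):
            # Look only to the right, then mirror the count so the
            # matrix stays symmetric.
            for offset in range(1, window + 1):
                if pos + offset >= len(ids):
                    break
                context = ids[pos + offset]
                weight = 1.0 / offset  # closer words count more
                cooccur[center][context] += weight
                cooccur[context][center] += weight
    # Convert to plain nested dicts of the form described above.
    return vocab, {i: dict(row) for i, row in cooccur.items()}


sentences = [["the", "cat", "sat"], ["the", "cat", "ran"]]
vocab, cooccur = build_cooccurrence(sentences)
```

Keeping `vocab` around makes it possible to map a word back to its row in the trained embeddings later.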
For instance, the matrix below covers 3 indices, and index 0 co-occurs with index 2 with a count of 3.5:
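import glove

# Sparse co-occurrence matrix: cooccur[i][j] is the count for indices i and j
cooccur = {
    0: {
        0: 1.0,
        2: 3.5
    },
    1: {
        2: 0.5
    },
    2: {
        0: 3.5,
        1: 0.5,
        2: 1.2
    }
}

# d is the embedding size; alpha and x_max are described under Usage
model = glove.Glove(cooccur, d=50, alpha=0.75, x_max=100.0)

for epoch in range(25):
    # train() returns the error for this pass over the data
    err = model.train(batch_size=200, workers=9)
    print("epoch %d, error %.3f" % (epoch, err), flush=True)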
The trained embeddings are now present under model.W.
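To inspect the result, one can compare rows of the embedding matrix by cosine similarity. The sketch below assumes that `model.W` can be indexed row by row, with row `i` holding the vector for index `i`; this README does not spell that out, and the helpers `cosine` and `most_similar` are hypothetical. A small hand-written matrix stands in for `model.W` so the snippet runs without a trained model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(W, index, topn=2):
    """Rank the other rows of W by cosine similarity to row `index`."""
    scores = [(cosine(W[index], row), i) for i, row in enumerate(W) if i != index]
    scores.sort(reverse=True)
    return [(i, score) for score, i in scores[:topn]]


# Stand-in for model.W (assumption: one row per index)
W = [
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
]
neighbours = most_similar(W, 0)
```

Here `neighbours` lists index 1 before index 2, since row 1 points in nearly the same direction as row 0.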
Usage
The model is controlled by setting several hyperparameters.
Glove.__init__()
cooccurence (dict<int, dict<int, float>>): the co-occurrence matrix.
alpha (float, default 0.75): exponent applied to the normalized co-occurrence counts.
x_max (float, default 100.0): controls smoothing for common items in the co-occurrence matrix.
d (int, default 50): number of embedding dimensions for the learnt vectors.
seed (int, default 1234): the random seed.
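The roles of `alpha` and `x_max` are easiest to see in the weighting function from the GloVe paper: f(x) = (x / x_max)^alpha for x < x_max, and 1 otherwise. The sketch below is a standalone illustration of that formula. It is assumed, not verified, that this package applies it in exactly this form, and `glove_weight` is a hypothetical name.

```python
def glove_weight(x, alpha=0.75, x_max=100.0):
    """Weight given to a co-occurrence count x in the GloVe loss (paper formula)."""
    if x < x_max:
        return (x / x_max) ** alpha
    return 1.0


# Weights for a rare pair, a moderately common pair, and two very common pairs
weights = {x: glove_weight(x) for x in (1.0, 50.0, 100.0, 500.0)}
```

Rare pairs are down-weighted, while every pair with a count at or above `x_max` contributes with full weight 1.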
Glove.train()
step_size (float): the learning rate for the model.
workers (int): number of worker threads used for training.
batch_size (int): number of examples each thread receives (controls the size of the job queue).
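To give a feel for how `batch_size` relates to the number of jobs, the sketch below splits the non-zero entries of a co-occurrence matrix into batches. It is only one plausible reading of "job queue": how this package actually schedules work across threads is not described here, and `iter_batches` is a hypothetical helper.

```python
def iter_batches(cooccur, batch_size):
    """Yield lists of (row, column, count) triples with at most batch_size items."""
    batch = []
    for i, row in cooccur.items():
        for j, count in row.items():
            batch.append((i, j, count))
            if len(batch) == batch_size:
                yield batch
                batch = []
    if batch:  # final, possibly smaller, batch
        yield batch


cooccur = {0: {0: 1.0, 2: 3.5}, 1: {2: 0.5}, 2: {0: 3.5, 1: 0.5, 2: 1.2}}
jobs = list(iter_batches(cooccur, batch_size=4))
```

With 6 non-zero entries and `batch_size=4`, this yields 2 jobs; a smaller `batch_size` yields more, shorter jobs to distribute among workers.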
