Octomender
Get repo recommendations based on your GitHub star history. (EoS)
Octomender
GitHub Repo Recommender System.
Octomender = Octocat + Recommender
Get repo recommendations based on your GitHub star history.
<a href="https://octomend.com">~~[HELP] Algorithm Testing~~</a> End of Service
~~The recommendation algorithm is deployed and being tested on octomend.com.~~
~~Visit octomend.com to help improve the recommendation.~~
The service has ended because GitHub published its own "Discover Repositories" service.
Dependencies
- redis: An in-memory database that persists on disk
Core
- hiredis: Minimalistic C client for Redis >= 1.2
- OpenMP>=4.0: C/C++ API that supports multi-platform shared memory multiprocessing programming
Preprocessing
- redis-py: Redis Python Client
Website
- Flask: A microframework for Python based on Werkzeug, Jinja 2 and good intentions
- GitHub-Flask: Flask extension for authenticating users with GitHub and making requests to the API
- gunicorn: A Python WSGI HTTP Server for UNIX
- google-cloud-datastore: Low-level Java and Python client libraries for Google Cloud Datastore
Dataset
Build Core
cd core; make
Preprocessing
parse.py
Parse raw json data files into three pickle data files.
- output-data-basename.user: map of user id (str) to user name (str)
- output-data-basename.repo: map of repo id (int) to repo name (str)
- output-data-basename.edge: list of tuples of user-repo edge (str, int)
Usage: parse.py {-m|--member|-w|--watch} {<input-json-directory>|<input-json-file>} <output-data-basename>
-m, --member parse MemberEvent.
-w, --watch parse WatchEvent.
Ex: parse.py -m 2017-06-01-0.json data
Ex: parse.py --watch json/2017-05/ data/2017-05
For the raw JSON data format, refer to the GitHub API v3 documentation.
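The parsing step can be sketched roughly as below. This is a minimal illustration, not the actual parse.py: it assumes the input is one JSON event per line, that each event carries `type`, `actor.id`, `actor.login`, `repo.id` and `repo.name` as in GitHub API v3 event objects, and that the actor is the user of interest. The function names `parse_events` and `dump_data` are hypothetical.

```python
import json
import pickle


def parse_events(lines, event_type="WatchEvent"):
    """Collect users, repos and user-repo edges from an iterable of JSON lines."""
    users, repos, edges = {}, {}, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if event.get("type") != event_type:
            continue
        # Assumed field layout; the real script may read other fields.
        user_id = str(event["actor"]["id"])  # user id kept as str
        repo_id = int(event["repo"]["id"])   # repo id kept as int
        users[user_id] = event["actor"]["login"]
        repos[repo_id] = event["repo"]["name"]
        edges.append((user_id, repo_id))
    return users, repos, edges


def dump_data(basename, users, repos, edges):
    """Write <basename>.user, <basename>.repo and <basename>.edge as pickles."""
    for suffix, obj in ((".user", users), (".repo", repos), (".edge", edges)):
        with open(basename + suffix, "wb") as f:
            pickle.dump(obj, f)
```

With a file opened as `f`, `dump_data("data", *parse_events(f))` would produce the three files described above.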
parse_mp.py
Same as parse.py, but runs with multiprocessing. The default number of processes is 16.
Usage: parse_mp.py {-m|--member|-w|--watch} {<input-json-directory>|<input-json-file>} <output-data-basename> [n-process]
-m, --member parse MemberEvent.
-w, --watch parse WatchEvent.
n-process number of processes when multiprocessing.
Ex: parse_mp.py -m 2017-06-01-0.json data
Ex: parse_mp.py --watch json/2017-05/ data/2017-05 32
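One way to spread the parsing over several processes is to give each worker one input file and merge the partial results afterwards. The sketch below is an assumption about how this could be done, not the actual parse_mp.py. The names `parse_file`, `merge_partials` and `parse_parallel` are hypothetical, and the event field layout is assumed to match GitHub API v3 event objects stored one per line.

```python
import json
from functools import partial
from multiprocessing import Pool


def parse_file(path, event_type="WatchEvent"):
    """Worker: parse one JSON-lines file into (users, repos, edges)."""
    users, repos, edges = {}, {}, []
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            event = json.loads(line)
            if event.get("type") != event_type:
                continue
            user_id = str(event["actor"]["id"])
            repo_id = int(event["repo"]["id"])
            users[user_id] = event["actor"]["login"]
            repos[repo_id] = event["repo"]["name"]
            edges.append((user_id, repo_id))
    return users, repos, edges


def merge_partials(partials):
    """Combine per-file results into one (users, repos, edges) triple."""
    users, repos, edges = {}, {}, []
    for part_users, part_repos, part_edges in partials:
        users.update(part_users)
        repos.update(part_repos)
        edges.extend(part_edges)
    return users, repos, edges


def parse_parallel(paths, event_type="WatchEvent", n_process=16):
    """Parse the files in a pool of worker processes, then merge the results."""
    worker = partial(parse_file, event_type=event_type)
    with Pool(processes=n_process) as pool:
        return merge_partials(pool.map(worker, paths))
```

On platforms that start workers by spawning a fresh interpreter, `parse_parallel` should be called from under an `if __name__ == "__main__":` guard.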
mergedata.py
Merge multiple pickle data files into one.
Usage: mergedata.py <input-data-dir> <output-data-basename>
Ex: mergedata.py data/2016-010203/ data/2016-Q1
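Merging could look roughly like the sketch below. It assumes the input directory holds `.user`, `.repo` and `.edge` pickles in the format described for parse.py, and it drops duplicate edges and sorts the result, which is a choice made here and not necessarily what mergedata.py does. The function names are hypothetical.

```python
import glob
import os
import pickle


def load_pickle(path):
    """Load one pickle file."""
    with open(path, "rb") as f:
        return pickle.load(f)


def merge_data(input_dir, output_basename):
    """Merge every *.user, *.repo and *.edge pickle found in input_dir."""
    users, repos, edges = {}, {}, set()
    for path in sorted(glob.glob(os.path.join(input_dir, "*.user"))):
        users.update(load_pickle(path))
    for path in sorted(glob.glob(os.path.join(input_dir, "*.repo"))):
        repos.update(load_pickle(path))
    for path in sorted(glob.glob(os.path.join(input_dir, "*.edge"))):
        edges.update(load_pickle(path))  # set union removes repeated edges
    edge_list = sorted(edges)
    outputs = ((".user", users), (".repo", repos), (".edge", edge_list))
    for suffix, obj in outputs:
        with open(output_basename + suffix, "wb") as f:
            pickle.dump(obj, f)
    return users, repos, edge_list
```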
graph2redis.py
Insert graph data into a Redis database.
Usage: graph2redis.py <input-edgelist> <redis-port>
Ex: graph2redis.py data/2016-Q1.edge 6379
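As an illustration, the edge list could be loaded into Redis with redis-py as sketched below. The key scheme (`user:<id>` and `repo:<id>` sets) is an assumption made for this example and may differ from what graph2redis.py and the core expect. The function names and the `batch_size` parameter are hypothetical too.

```python
import pickle


def load_edges(path):
    """Read the pickled list of (user_id, repo_id) tuples."""
    with open(path, "rb") as f:
        return pickle.load(f)


def insert_edges(client, edges, batch_size=10000):
    """Store every edge in both directions as Redis sets (assumed key scheme)."""
    pipe = client.pipeline(transaction=False)
    count = 0
    for user_id, repo_id in edges:
        pipe.sadd(f"user:{user_id}", repo_id)  # repos linked to this user
        pipe.sadd(f"repo:{repo_id}", user_id)  # users linked to this repo
        count += 1
        if count % batch_size == 0:
            pipe.execute()  # flush a full batch to the server
    pipe.execute()  # flush whatever is left
    return count


def main(edge_path, port):
    """Connect to a local Redis server on the given port and load the edges."""
    import redis  # redis-py, imported here so the helpers above work without it

    client = redis.Redis(host="localhost", port=int(port))
    return insert_edges(client, load_edges(edge_path))
```

Batching through a pipeline keeps the number of round trips to the server low when the edge list is large.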
Thanks
importpython and reddit.


License
MIT
