Jobdescription2jobtitle
Classify a job description (or a noisy job title) into an ONET job title.
Introduction
Given a piece of text such as a CV, a job summary or a LinkedIn profile, this program converts it to a 300-dimensional vector (the average of its word vectors) and ranks ONET job titles by their similarity to that text. ONET is a standard dataset of about 1100 job titles and their descriptions. It also includes other information about jobs that is not used here.
A 300-dimensional average word vector is built for each job title and its description. Given a piece of text, the program then finds the job titles most similar to it.
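The averaging and ranking steps can be sketched roughly as follows. This is a minimal illustration, not the program's actual source: the function names, the whitespace tokenizer, the choice to concatenate title and description, and the toy 3-dimensional vectors are all assumptions. A plain dict stands in for the word2vec model here; a gensim `KeyedVectors` object supports the same `in` and `[]` lookups.

```python
import numpy as np

def average_vector(text, vectors, dim):
    """Average the vectors of the tokens in `text` that have an embedding."""
    tokens = text.lower().split()  # assumed tokenizer: lowercase + whitespace
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return np.zeros(dim)  # no known word: fall back to a zero vector
    return np.mean(known, axis=0)

def cosine_similarity(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom else 0.0

def rank_titles(text, jobs, vectors, dim):
    """Return (title, similarity) pairs, most similar first."""
    query = average_vector(text, vectors, dim)
    scored = []
    for title, description in jobs.items():
        # Assumption: title and description are averaged together.
        job_vec = average_vector(title + " " + description, vectors, dim)
        scored.append((title, cosine_similarity(query, job_vec)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy 3-d embeddings and two made-up entries, for illustration only.
toy_vectors = {
    "python": np.array([1.0, 0.0, 0.0]),
    "software": np.array([0.9, 0.1, 0.0]),
    "code": np.array([0.8, 0.2, 0.0]),
    "patients": np.array([0.0, 1.0, 0.0]),
    "hospital": np.array([0.0, 0.9, 0.1]),
    "care": np.array([0.1, 0.8, 0.1]),
}
toy_jobs = {
    "Software Developers": "write software code",
    "Registered Nurses": "care for patients in a hospital",
}

ranking = rank_titles("I write Python code", toy_jobs, toy_vectors, dim=3)
for title, score in ranking:
    print(f"{score:.3f}  {title}")
```

With the real model, `dim` would be 300 and `jobs` would hold the ONET titles and descriptions.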
The similarities (or distances) of a piece of text to the roughly 1100 job titles form a vector of about 1100 values. Two such vectors can be compared using the cosine distance between them to judge whether two pieces of text describe the same person.
If two pieces of text correspond to the same person, their similarities to the 1100 job titles should follow a similar pattern, so the cosine distance between the two vectors should be low.
This cosine distance can be used as a single feature when deciding whether two pieces of text correspond to the same person.
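A small sketch of this comparison, under assumptions: `similarity_profile` and `cosine_distance` are hypothetical helper names, and the 4 x 3 `job_matrix` is a toy stand-in for the roughly 1100 x 300 matrix of job title vectors. Each text is first mapped to its vector of similarities against all job titles, and the two resulting profiles are then compared.

```python
import numpy as np

def cosine_distance(a, b):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0:
        return 1.0
    return 1.0 - float(np.dot(a, b) / denom)

def similarity_profile(text_vector, job_matrix):
    """Cosine similarity of one text vector to every row of job_matrix."""
    norms = np.linalg.norm(job_matrix, axis=1) * np.linalg.norm(text_vector)
    norms[norms == 0] = 1.0  # avoid division by zero
    return job_matrix @ text_vector / norms

# Toy data: 4 "job title" vectors in 3 dimensions (illustrative values only).
job_matrix = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
])
cv_vec = np.array([0.9, 0.1, 0.0])        # average vector of a CV
profile_vec = np.array([0.8, 0.2, 0.1])   # average vector of a profile, same person
other_vec = np.array([0.0, 0.2, 0.9])     # average vector of an unrelated text

cv_profile = similarity_profile(cv_vec, job_matrix)
same = cosine_distance(cv_profile, similarity_profile(profile_vec, job_matrix))
diff = cosine_distance(cv_profile, similarity_profile(other_vec, job_matrix))
print(f"same person: {same:.3f}, different person: {diff:.3f}")
```

The resulting distance is a single number, so it can be passed to a downstream classifier alongside other features.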
To run the program, install gensim, download the pre-trained Google word2vec file, and change the path in the source accordingly.
Pre-trained word vectors
Download the pre-trained vectors from https://docs.google.com/uc?id=0B7XkCwpI5KDYNlNUTTlSS21pQmM&export=download and move the file into the resources directory.
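Loading the downloaded file with gensim might look like the sketch below. The file name `GoogleNews-vectors-negative300.bin.gz` is an assumption about what the link delivers, and `vectors_path` and `load_vectors` are hypothetical helpers rather than functions from this repository. `KeyedVectors.load_word2vec_format` with `binary=True` is the gensim call for reading word2vec binary files.

```python
import os

# Assumed name of the downloaded archive; adjust if your file is named differently.
VECTORS_FILE = "GoogleNews-vectors-negative300.bin.gz"

def vectors_path(resources_dir="resources"):
    """Build the path to the word2vec file inside the resources directory."""
    return os.path.join(resources_dir, VECTORS_FILE)

def load_vectors(path):
    """Load pre-trained word2vec vectors in binary format with gensim."""
    # Imported here so this module can be imported without gensim installed.
    from gensim.models import KeyedVectors
    return KeyedVectors.load_word2vec_format(path, binary=True)
```

Usage would be along the lines of `vectors = load_vectors(vectors_path())`, after which `vectors["engineer"]` returns the vector for a word in the vocabulary. Loading the full model takes a while and needs several gigabytes of memory.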
Job Title and Description
The job titles and descriptions can be downloaded from the ONET dataset: https://www.onetcenter.org/dl_files/database/db_21_0_text/Occupation%20Data.txt
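A minimal sketch for reading the downloaded file into a title-to-description mapping. It assumes the file is tab-separated with a header row containing `Title` and `Description` columns; check the first line of your copy before relying on this. `read_onet_occupations` is a hypothetical helper name, and the sample rows are placeholders, not real ONET entries.

```python
import csv
import io

def read_onet_occupations(lines):
    """Map each job title to its description from tab-separated ONET text."""
    # QUOTE_NONE keeps quote characters inside descriptions as literal text.
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    return {row["Title"]: row["Description"] for row in reader}

# Placeholder rows in the assumed layout, used only to exercise the parser.
sample = io.StringIO(
    "O*NET-SOC Code\tTitle\tDescription\n"
    "00-0000.00\tExample Title A\tExample description A.\n"
    "00-0000.01\tExample Title B\tExample \"quoted\" description B.\n"
)
jobs = read_onet_occupations(sample)
print(len(jobs), "occupations loaded")
```

For the real file, something like `with open("resources/Occupation Data.txt", encoding="utf-8") as f: jobs = read_onet_occupations(f)` should work, assuming the file was saved under that name in the resources directory.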
Contact
Afshin Rahimi afshinrahimi@gmail.com
