JobFunnel
Scrape job websites into a single spreadsheet with no duplicates.
Install / Use
/learn @PaulMcInnis/JobFunnelREADME
<img src="logo/jobfunnel_banner.svg" alt="JobFunnel Banner" width=400/><br/>
Automated tool for scraping job postings into a .csv file.
Benefits over job search sites:
- Never see the same job twice!
- No advertising.
- See jobs from multiple job search websites all in one place.

Project status: Archived
JobFunnel was built in an era when major job boards exposed mostly static HTML with simple pagination. It made it easy to pull your search results from multiple sources into a single CSV/Excel file.
Since then, most job boards have moved to much more aggressive anti-automation and bot-detection. Re-implementing JobFunnel on top of full browser automation (e.g., Playwright/Selenium) is technically possible, but too slow, fragile, and operationally complex to deliver the “fast, one-shot CLI” experience this project was designed for.
Rather than pretend this still works, I'm archiving the project as a historical/educational artifact. Thanks to everyone who used and starred JobFunnel over the years ❤️
Installation
JobFunnel requires Python 3.11 or later.
pip install git+https://github.com/PaulMcInnis/JobFunnel.git
Usage
By performing regular scraping and reviewing, you can cut through the noise of even the busiest job markets.
Configure
You can search for jobs with YAML configuration files or by passing command arguments.
Download the demo settings.yaml by running the below command:
wget https://git.io/JUWeP -O my_settings.yaml
NOTE:
-
It is recommended to provide as few search keywords as possible (i.e.
Python,AI). -
JobFunnel currently supports
CANADA_ENGLISH,USA_ENGLISH,UK_ENGLISH,FRANCE_FRENCH, andGERMANY_GERMANlocales.
Scrape
Run funnel with your settings YAML to populate your master CSV file with jobs from available providers:
funnel load -s my_settings.yaml
Review
Open the master CSV file and update the per-job status:
-
Set to
interested,applied,intervieworofferto reflect your progression on the job. -
Set to
archive,rejectedordeleteto remove a job from this search. You can review 'blocked' jobs within yourblock_list_file.
Advanced Usage
-
Automating Searches <br /> JobFunnel can be easily automated to run nightly with crontab <br /> For more information see the crontab document.
-
Writing your own Scrapers <br /> If you have a job website you'd like to write a scraper for, you are welcome to implement it, Review the Base Scraper for implementation details.
-
Remote Work <br /> Bypass a frustrating user experience looking for remote work by setting the search parameter
remotenessto match your desired level, i.e.FULLY_REMOTE. -
Adding Support for X Language / Job Website <br /> JobFunnel supports scraping jobs from the same job website across locales & domains. If you are interested in adding support, you may only need to define session headers and domain strings, Review the Base Scraper for further implementation details.
-
Blocking Companies <br /> Filter undesired companies by adding them to your
company_block_listin your YAML or pass them by command line as-cbl. -
Job Age Filter <br /> You can configure the maximum age of scraped listings (in days) by configuring
max_listing_days. -
Reviewing Jobs in Terminal <br /> You can review the job list in the command line:
column -s, -t < master_list.csv | less -#2 -N -S -
Respectful Delaying <br /> Respectfully scrape your job posts with our built-in delaying algorithms.
To better understand how to configure delaying, check out this Jupyter Notebook which breaks down the algorithm step by step with code and visualizations.
-
Recovering Lost Data <br /> JobFunnel can re-build your master CSV from your
cache_folderwhere all the historic scrape data is located:funnel --recover -
Running by CLI <br /> You can run JobFunnel using CLI only, review the command structure via:
funnel inline -h
CAPTCHA
JobFunnel does not solve CAPTCHA. If, while scraping, you receive a
Unable to extract jobs from initial search result page:\ error.
Then open that url on your browser and solve the CAPTCHA manually.
Developer Guide
For contributors and developers who want to work on JobFunnel, this section will guide you through setting up the development environment and the tools we use to maintain code quality and consistency.
Developer Mode Installation
To get started, install JobFunnel in developer mode. This will install all necessary dependencies, including development tools such as testing, linting, and formatting utilities.
To install JobFunnel in developer mode, use the following command:
pip install -e '.[dev]'
This command not only installs the package in an editable state but also sets up pre-commit hooks for automatic code quality checks.
Pre-Commit Hooks
The following pre-commit hooks are configured to run automatically when you commit changes to ensure the code follows consistent style and quality guidelines:
Black: Automatically formats Python code to ensure consistency.isort: Sorts and organizes imports according to the Black style.Prettier: Formats non-Python files such as YAML and JSON.Flake8: Checks Python code for style guide violations.
While the pre-commit package is installed when you run pip install -e '.[dev]', you still need to initialize the hooks by running the following command once:
pre-commit install
How Pre-Commit Hooks Work
The pre-commit hooks will automatically run when you attempt to make a commit. If any formatting issues are found, the hooks will fix them (for Black and isort), or warn you about style violations (for Flake8). This ensures that all committed code meets the project’s quality standards.
You can also manually run the pre-commit hooks at any time with:
pre-commit run --all-files
This is useful to check the entire codebase before committing or as part of a larger code review. Please fix all style guide violations (or provide a reason to ignore) before committing to the repository.
Running Tests
We use pytest to run tests and ensure that the code behaves as expected. Code coverage is automatically generated every time you run the tests.
To run all tests, use the following command:
pytest
This will execute the test suite and automatically generate a code coverage report.
If you want to see a detailed code coverage report, you can run:
pytest --cov-report=term-missing
This will display which lines of code were missed in the test coverage directly in your terminal output.
<!-- links -->