Sciscraper
A bulk academic PDF extractor program, designed specifically for papers about behavioral science and design.
sciscraper
sciscraper is a Python package and command line interface (CLI) for scraping scientific articles from various sources.
It should work with Python 3.8+ on macOS, Windows, and Linux.
Installation
You can install sciscraper via pip:
pip install sciscraper
Usage
sciscraper offers the following scraping choices:
- directory: takes a directory of .pdf files, and returns a .csv file of bibliographic data for each;
- wordscore: takes a .csv file of bibliographic data for multiple papers and returns a .csv with a percentage value of each paper's relevance to the configured query;
- citations: takes a .csv file of bibliographic data for multiple papers and returns a .csv of their citations (i.e. the ensuing papers that cited them);
- reference: takes a .csv file of bibliographic data for multiple papers and returns a .csv of their references (i.e. the papers that were referenced in the originals);
- download (experimental): takes a .csv file of bibliographic data for multiple papers and attempts to download .pdfs of each into a directory; and,
- images: takes a .csv file of bibliographic data for multiple papers and attempts to download the charts and figures of each paper from Semantic Scholar.
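This README does not describe how wordscore computes its percentage. As a rough illustration only, the sketch below assumes relevance is the share of query terms that appear in a paper's abstract. The function names, the `abstract` column, and the metric itself are hypothetical and may differ from what sciscraper actually does.

```python
import csv
import io
import re
from typing import Dict, Iterable, List


def wordscore(text: str, query_terms: Iterable[str]) -> float:
    # Hypothetical metric: percentage of distinct query terms found in the text.
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    terms = {t.lower() for t in query_terms}
    if not terms:
        return 0.0
    hits = len(terms & words)
    return round(100.0 * hits / len(terms), 1)


def score_rows(
    csv_text: str, query_terms: Iterable[str], column: str = "abstract"
) -> List[Dict[str, str]]:
    # Reads bibliographic rows and adds a "wordscore" column to each one.
    # The "abstract" column name is an assumption, not sciscraper's schema.
    terms = list(query_terms)
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    for row in rows:
        row["wordscore"] = str(wordscore(row.get(column) or "", terms))
    return rows
```

For example, scoring the abstract "Default options nudge choice architecture" against the query terms nudge, choice, and default yields 100.0, since all three terms appear.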
You can run sciscraper from the terminal by entering sciscraper, followed by -m, the chosen scraping mode (see above), and the target folder or file. For example:
sciscraper -m directory <folder pathname goes here...>
Or alternatively:
sciscraper -m wordscore <filename.csv>
And so forth.
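To make the command shape concrete, here is a hypothetical argparse parser that mirrors the documented sciscraper -m <mode> <target> form. It is not sciscraper's own code; the `parse` helper and the .csv suffix check are assumptions added for illustration, based only on the mode descriptions above.

```python
import argparse
from pathlib import Path
from typing import List, Optional

# Scraping modes as listed in this README.
MODES = ("directory", "wordscore", "citations", "reference", "download", "images")


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring "sciscraper -m MODE TARGET"; not the package's source.
    parser = argparse.ArgumentParser(prog="sciscraper")
    parser.add_argument("-m", dest="mode", choices=MODES, required=True,
                        help="scraping choice")
    parser.add_argument("target", type=Path,
                        help="folder of .pdf files (directory) or a .csv file")
    return parser


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # Per the mode list, "directory" takes a folder; every other mode takes a .csv file.
    if args.mode != "directory" and args.target.suffix.lower() != ".csv":
        parser.error(f"mode {args.mode!r} expects a .csv file")  # exits with status 2
    return args
```

For example, parse(["-m", "directory", "papers"]) returns a namespace with mode set to "directory", while passing a folder to wordscore exits with a usage error.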
As Featured on ArjanCodes' Code Roast
- PART ONE: -> https://youtu.be/MXM6VEtf8SE
- PART TWO: -> https://www.youtube.com/watch?v=6ac4Um2Vicg
Special Thanks
- ArjanCodes
- Michele Cotrufo
- Nathan Lippi
- Jon Watson Rooney
- Colin Meret
- James Murphy (mCoding)
- Micael Jarniac
Maintainer
John Fallot john.fallot@gmail.com
License
The MIT License Copyright (c) 2021- John Fallot
