Paper Picnic 2.0
A weekly basket with the latest published research in political science. On Fridays at 2 AM UTC, we query the Crossref API for new research articles that appeared in the previous 7 days across many journals in political science and adjacent fields. Website: https://paper-picnic.com/

The crawler (backend) lives in the `main` branch, while the website is rendered from the `gh-pages` branch.
Setup
Local Development
- Install Python 3.11

  ```
  pyenv install 3.11
  pyenv local 3.11
  ```

- Create virtual environment

  ```
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  ```

- Install dependencies

  ```
  pip install -r requirements.txt
  # For development (includes testing tools)
  pip install -r requirements-dev.txt
  ```

- Configure environment variables

  Create a `.env` file in the project root:

  ```
  OPENAI_APIKEY=your_openai_api_key
  CROSSREF_EMAIL=your_email@example.com
  ```
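If you want to load that file in code during local runs, a minimal standard-library sketch follows; the crawler's actual loading mechanism may differ (many projects use the python-dotenv package instead):

```python
import os

def load_dotenv(path=".env"):
    """Minimal .env loader: export KEY=VALUE pairs into os.environ.

    Skips blank lines, comments, and lines without an '='. Existing
    environment variables are not overwritten.
    """
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())
```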
GitHub Actions Setup
After forking the repository, you need to configure repository settings:
- Enable Workflow Permissions
  - Go to Settings > Actions > General
  - Scroll to "Workflow permissions"
  - Allow workflows to read and write in the repository
- Set Repository Secrets
  - Go to Security > Secrets and Variables > Actions
  - Add the following secrets:
    - `OPENAI_APIKEY` - OpenAI API key for article classification
    - `CROSSREF_EMAIL` - Your email for polite Crossref API requests
    - `RESEND_API_KEY` - Resend.com API key for email notifications
    - `RESEND_EMAIL_FROM` - Sender email address
    - `RESEND_EMAIL_TO` - Recipient email address
Usage
Local Crawl
Run the crawler:

```
python main.py
```
Use the parameters in `./src/config.py` to disable some features of the crawler for local testing.
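For a local run you will typically shrink the crawl window and switch off the slower steps. The flag names below are hypothetical placeholders for the kind of toggles `src/config.py` exposes (crawl window, memory updates, filter toggles); check the file for the actual names:

```python
# Hypothetical local-testing overrides -- real names live in src/config.py.
CRAWL_DAYS_BACK = 3      # shrink the default 14-day window
USE_AI_FILTER = False    # skip the GPT-4o-mini classification step
UPDATE_MEMORY = False    # don't write to memory/ while experimenting
```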
Run Tests
```
# Run all tests
pytest

# Run with coverage
pytest --cov=src --cov-report=term --cov-report=html

# Run specific test file
pytest tests/test_parsers.py
```
Project Structure
```
picnic/
├── src/                    # Source code modules
│   ├── config.py           # Configuration and constants
│   ├── crossref_client.py  # Crossref API client
│   ├── openai_client.py    # OpenAI API integration
│   ├── osf_client.py       # OSF API client
│   ├── parsers.py          # Response parsing
│   ├── filters.py          # Article filtering logic
│   ├── data_processor.py   # Data cleaning and deduplication
│   ├── json_renderer.py    # JSON output formatting
│   └── stats_updater.py    # Statistics management
├── main.py                 # Main crawl script
├── tests/                  # Unit tests
├── parameters/             # Journal/OSF configurations
├── memory/                 # Crawl history for deduplication
├── output/                 # Generated JSON files and statistics
├── notification/           # Email notification system (Node.js)
├── .github/workflows/      # GitHub Actions automation
└── requirements.txt        # Python dependencies
```
How It Works
The crawler (main.py) runs two parallel crawl workflows, followed by statistics and automation:
1. Crossref Journal Crawl
   - Tests Crossref API endpoints (public vs polite) to select the faster one
   - Queries the `/works` endpoint with batched ISSNs from `parameters/journals.json`
   - Searches both `created` and `published` dates (default: 14 days ago to 1 day ago)
   - Parses metadata and removes duplicates using `memory/doi.csv`
   - Merges journal info and applies filters:
     - Standard: Removes editorials, ToCs, errata by title pattern
     - Nature: Keeps only articles with `/s` in the URL
     - Science: Keeps only articles with abstracts ≥200 chars
     - AI (optional): Uses GPT-4o-mini to classify social science relevance
   - Outputs to `output/publications.json`
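The Crossref query in step one can be sketched against the public Crossref REST API. The filter names (`issn`, `from-created-date`, `until-created-date`) and the `mailto` parameter are part of that API; the function itself is an illustration under the default window described above, not the project's actual client code:

```python
import datetime as dt

CROSSREF_API = "https://api.crossref.org/works"

def build_params(issns, days_back=14, days_stop=1,
                 email="you@example.com", rows=200):
    """Build a polite Crossref /works query for a batch of ISSNs.

    The date window mirrors the crawler's default (14 days ago to
    1 day ago); supplying mailto routes the request to the polite pool.
    """
    start = dt.date.today() - dt.timedelta(days=days_back)
    end = dt.date.today() - dt.timedelta(days=days_stop)
    filters = [f"issn:{i}" for i in issns]
    filters += [f"from-created-date:{start}", f"until-created-date:{end}"]
    return {"filter": ",".join(filters), "rows": rows, "mailto": email}

# A real request would then be:
#   requests.get(CROSSREF_API, params=build_params(["0003-0554"]))
```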
2. OSF Preprints Crawl
   - Loads subject filter from `parameters/osf_subjects.json` ("Social and Behavioral Sciences")
   - Queries the OSF API date-by-date within the crawl window
   - Parses metadata, deduplicates versions (keeps the latest), and removes previously seen preprints using `memory/osf_ids.csv`
   - Outputs to `output/preprints.json`
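The keep-the-latest version deduplication can be sketched as below; the field names (`guid`, `version`) are placeholders for whatever the OSF response actually carries, not the project's real schema:

```python
def latest_versions(preprints):
    """Keep only the newest version of each preprint.

    Assumes each record has a stable base id ('guid') and an integer
    'version'; both names are hypothetical stand-ins for the real fields.
    """
    latest = {}
    for p in preprints:
        guid = p["guid"]
        if guid not in latest or p["version"] > latest[guid]["version"]:
            latest[guid] = p
    return list(latest.values())
```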
3. Statistics & Automation
   - Stats: Counts articles per journal, updates `output/stats.csv`
   - GitHub Actions:
     - Update Website workflow syncs outputs to the `gh-pages` branch
     - Send Notification workflow triggers after the Crawl workflow completes and sends an email with a subset of publications via Resend.com

Behavior is configurable via `src/config.py` (crawl window, memory updates, filter toggles, etc.).
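The standard title filter described above (dropping editorials, ToCs, and errata by pattern) can be sketched like this; the patterns are illustrative, and the project's actual patterns live in `src/filters.py`:

```python
import re

# Illustrative title patterns for non-research front matter; the real
# patterns used by the crawler live in src/filters.py.
SKIP_TITLE = re.compile(
    r"^(editorial|table of contents|erratum|corrigendum)\b", re.IGNORECASE
)

def keep_article(title):
    """Return True if the title does not match a front-matter pattern."""
    return not SKIP_TITLE.search(title.strip())
```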
History
The first version of the crawler went live in August 2024. Paper Picnic 2.0, rewritten in Python by Claude Code based on the original R version, launched in February 2026 after running side-by-side with it since January. The legacy R crawler remains available in the `main_v0` branch, and the original website in `gh-pages_v0`.