Mediacommtools
Tools/resources for journalism, media, communication, computational social science
Tools for media and communication research
A list of digital tools and resources for journalism, media and communication research, and computational social science.
Table of contents
- Find datasets
- Content analysis, text analysis, text mining, annotation
- Compare differences between texts, find duplicate files
- Television
- Social networking sites and specific sites
- Scrape and extract news articles
- Online archives and archiving
- Journal articles, citations, bibliometrics
- Literature search
- Find retracted articles
- Behavioral and cognitive experiments
- Graphics, network visualization, and maps
- Convert and clean data
- Survey scales and measures
- Survey software
- Statistics and questionable research practices (QRPs)
- Statistical software
- Organize photos, citations, and references
- Education
- Organizations
- Open science, preregistration, code/data sharing
- Text, writing
- Humor
- See also
Find datasets
Search engines:
- Google Dataset Search - search engine for datasets.
- re3data.org
- Metadata Search
- Dimensions - search among 8+ million datasets (about 2 million of which link to the original article).
Archives and lists:
- A dataset with political datasets - Cabinets, citizens, constitutions, political institutions, parties and politicians, democracy, economics, elections, international relations, media, policy, political elites (.xlsx, .csv, .Rdata, .sav).
- Consortium of European Social Science Data Archives (CESSDA)
- Inter-university Consortium for Political and Social Research (ICPSR)
- Open Stats Lab - data from psychological studies, intended for use in education.
- Awesome Public Datasets - awesome list of (large-scale) public datasets on the Internet (on-going collection).
- Common Crawl - download a copy of the web, several billion web pages (250+ terabytes of data), updated regularly.
- Öppna Data - open data in Sweden (from Riksarkivet).
- PPEG Database - information on political parties, presidents, elections, and governments around the world.
Survey data:
- European Social Survey - survey conducted across Europe since 2001. Face-to-face interviews every two years on new cross-sectional samples.
- European Values Study - large-scale, cross-national survey since 1981 about basic human values (e.g., ideas, beliefs, preferences, attitudes). New surveys every nine years.
- The General Social Survey 1972– (USA)
- Latin American Public Opinion Project (North and South America)
- SOM Institute 1986– (Sweden)
- Norsk senter for forskningsdata (Norway)
Media data:
- IMDB Ratings for TV/Streaming Series - dataset of ratings given in IMDB to episodes of popular TV and Streaming series (includes R code).
- The Twitter Parliamentarian Database - database of parliamentarians on Twitter across 26 countries.
- Hate speech data - datasets in many languages annotated for hate speech, online abuse, and offensive language. Useful for training machine learning models.
- Upworthy Research Archive - a time series of 32,487 behavior experiments from the U.S. media website Upworthy.
Content analysis, text analysis, text mining, annotation
- Text Mining for Social Scientists and Digital Humanists (GitHub) (R)
- Lexicoder - multi-platform software for automated content analysis of text (Java).
- SentimentAnalysis - sentiment analysis of text (R).
- Topic Models Learning and R Resources
- Count Words in a PDF Document (online tool).
- TAPoR - curated lists of widely used research tools in the digital humanities for studying texts.
- Datavyu - code and annotate video (Win/Mac app).
- Sentiment Classification for News Articles - easy-to-use, high-quality sentiment classification for news articles (Python).
Compare differences between texts, find duplicate files
- Diff Checker - compare two texts for differences (online tool).
- Online LaTeX diff tool - to compare text differences in LaTeX documents using latexdiff (online tool).
- Auslogics Duplicate File Finder - finds duplicate files regardless of their filenames (Windows app).
- comparefiles - scans a directory for identical files or similar text files (Python).
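The two tasks in this section, finding identical files and scoring how similar two texts are, can be sketched with the Python standard library alone. This is a minimal illustration and not how comparefiles or any other tool listed above is implemented; the function names `file_digest`, `find_identical_files`, and `text_similarity` are hypothetical and chosen for this example.

```python
import hashlib
from collections import defaultdict
from difflib import SequenceMatcher
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes (hypothetical helper)."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def find_identical_files(directory: Path) -> list[list[Path]]:
    """Group files under `directory` whose contents are byte-identical.

    File names are ignored; only the content hash decides the grouping.
    """
    groups = defaultdict(list)
    for path in sorted(directory.rglob("*")):
        if path.is_file():
            groups[file_digest(path)].append(path)
    # Keep only hashes shared by two or more files.
    return [paths for paths in groups.values() if len(paths) > 1]


def text_similarity(a: str, b: str) -> float:
    """Return a similarity ratio between 0.0 and 1.0 for two texts."""
    return SequenceMatcher(None, a, b).ratio()
```

Calling `find_identical_files(Path("some_folder"))` returns lists of paths that share the same content, and `text_similarity(text_a, text_b)` gives a rough score for near-duplicate texts. Hashing catches only exact copies, so the similarity ratio is the part to use for texts that differ slightly, for example two versions of the same news article.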
Television
- Stanford Cable TV News Analyzer - tool to count screen time of who and what is in cable TV news (from the Internet Archive TV News).
Social networking sites and specific sites
- Facepager - fetches publicly available data from Facebook, Twitter and other JSON-based APIs (Python).
- facebook-page-post-scraper - data scraper for Facebook Pages (Python).
- PolitEcho (GitHub) - shows you the political biases of your Facebook friends and news feed (Chrome extension).
- netvizz - collection of scripts that help with downloading data from the Facebook platform for research purposes. Important: regarding Facebook API changes, read Facebook’s app review and how independent research just got a lot harder.
- Facebook API.
- twarc - command line tool for archiving Twitter JSON (Python).
- tweetbotornot - detect Twitter bots via machine learning (R).
- Twint - Twitter scraping and open source intelligence (OSINT) tool that doesn't use Twitter's API, allowing you to scrape a user's followers, following, Tweets and more while evading most API limitations (Python).
- Tinfoleak (GitHub) - open-source tool for Twitter intelligence analysis (Python).
- Tweetbeaver - convert @name to ID, check if two accounts follow each other, download a user's favorites, search within a user's favorites, download a user's timeline, etc. (online tool).
- scrape-twitter - command line interfaces to scrape profiles, timelines, connections, likes, search and conversations with the use of the API (Node.js).
- Chorus - free Twitter harvesting and visual analytics suite for social science research (Windows).
- Twitter API.
- Twitter - helpful tools - Twitter lists helpful tools for data access, data analysis, data visualization, and hosting.
Wikipedia
- Pageviews Analysis tool for Wikimedia Foundation wikis (GitHub) - number of page views for any Wikipedia page (online tool, PHP).
- WikiMedia REST API - access to Wikipedia content, data, and statistics (online API).
- WikiShark - Wikipedia article traffic (page views) since 2008, updated every hour or so.
- Page view statistics for Wikimedia projects 2008-2016 - download all dumps from Wikipedia with page views for all projects from 2008 to 2016.
- Analytics Datasets: Pageviews 2016 onwards - download all dumps from Wikipedia with page views for all projects from 2016 onwards. Don't forget that you can use the API instead for easier access, but note that the API only has data from year 2
