Tm4ss.github.io
A text mining course for social scientists and digital humanists
tm4ss - Text Mining for Social Scientists and Digital Humanists
This course consists of eight tutorials written in R Markdown. They are described in more detail in the paper cited under License & Citation.
You can use knitr to create the tutorial sheets as HTML notebooks from the R-markdown source code.
The rendered tutorials are available in the /docs folder.
Tutorials
- Web crawling and scraping
- Text data import in R
- Frequency analysis
- Key term extraction
- Co-occurrence analysis
- Topic models (LDA)
- Text classification
- Part-of-Speech tagging / Named Entity Recognition
Click here for the rendered tutorials.
Render from source
Clone the repository
git clone https://github.com/tm4ss/tm4ss.github.io.git
Open the Tutorials.Rproj R-project file and run
rmarkdown::render_site(output_format = "html_document")
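If you only want to rebuild individual tutorial sheets rather than the whole site, something like the following may work. It is a minimal sketch: `render_tutorials` is a hypothetical helper that is not part of this repository, and it assumes the tutorials are `.Rmd` files in a single folder and that the `rmarkdown` package is installed. Site-wide settings applied by `render_site()` are not picked up this way, so the output may look different.

```r
# Sketch: render every R Markdown file in a folder to a standalone HTML page.
# `render_tutorials` is a hypothetical helper, not part of the course repository.
render_tutorials <- function(src_dir = ".", out_dir = file.path(src_dir, "html")) {
  # Collect all R Markdown sources in the folder (non-recursive).
  rmd_files <- list.files(src_dir, pattern = "\\.[Rr]md$", full.names = TRUE)
  if (length(rmd_files) == 0) {
    return(character(0))
  }

  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

  # rmarkdown::render() returns the path of the file it wrote.
  vapply(
    rmd_files,
    function(f) {
      rmarkdown::render(
        f,
        output_format = "html_document",
        output_dir = out_dir,
        quiet = TRUE
      )
    },
    character(1),
    USE.NAMES = FALSE
  )
}
```

Calling `render_tutorials()` from the project root would then return the paths of the generated HTML files, or an empty character vector if no `.Rmd` files are found.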
License & Citation
This course was created by Gregor Wiedemann and Andreas Niekler. It was released under the GPLv3 license in September 2017. If you use it, or parts of it, for your own teaching or analysis, please cite:
Wiedemann, Gregor; Niekler, Andreas (2017): [Hands-on: a five day text mining course for humanists and social scientists in R](http://ceur-ws.org/Vol-1918/wiedemann.pdf). Proceedings of the 1st Workshop Teaching NLP for Digital Humanities (Teach4DH@GSCL 2017), Berlin.
BibTeX
@inproceedings{WN17,
author = {Gregor Wiedemann and Andreas Niekler},
title = {Hands-On: {A} Five Day Text Mining Course for Humanists and Social Scientists in {R}},
booktitle = {Proceedings of the Workshop on Teaching {NLP} for Digital Humanities
({Teach4DH@GSCL 2017}), Berlin, Germany, September 12, 2017.},
pages = {57--65},
year = {2017},
crossref = {DBLP:conf/gldv/2017teach4dh},
url = {http://ceur-ws.org/Vol-1918/wiedemann.pdf},
}
