quanteda.tidy
Tidyverse extensions for quanteda
About
quanteda.tidy extends the quanteda package with functionality
from the “tidyverse”, especially dplyr.
Note that this is not the same as tidytext, which reshapes tokens
into data.frames. Instead, the functions in quanteda.tidy operate only
on document variables (docvars): they extend the dplyr verbs to work on
quanteda objects as if they were tibbles or data.frames.
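To illustrate the distinction, here is a minimal sketch, assuming quanteda.tidy is installed and that `filter()` behaves as described in the function overview. It uses the `Year` docvar of `data_corpus_inaugural`; the object name `recent` is only illustrative.

```r
library("quanteda.tidy", warn.conflicts = FALSE)

# Subset documents using a condition on the Year docvar.
recent <- data_corpus_inaugural %>%
  filter(Year > 2000)

# The result should still be a quanteda corpus, not a data.frame of tokens.
is.corpus(recent)

# Only the documents whose Year docvar satisfies the condition are kept.
docvars(recent, "Year")
```

The texts themselves are untouched; only the docvars drive the subsetting.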
Installation
You can install the stable version of quanteda.tidy from CRAN:
install.packages("quanteda.tidy")
Or install the development version from GitHub:
pak::pkg_install("quanteda/quanteda.tidy")
Overview of Functions
The functions in quanteda.tidy are organized into four categories, following the dplyr documentation:
| Category | Function | Description |
|:---|:---|:---|
| Rows | filter() | Subset documents based on docvar conditions |
| Rows | slice(), slice_head(), slice_tail() | Subset documents by position |
| Rows | slice_sample() | Randomly sample documents |
| Rows | slice_min(), slice_max() | Select documents with min/max docvar values |
| Rows | arrange(), distinct() | Reorder documents; keep unique documents |
| Columns | select() | Keep or drop docvars by name |
| Columns | rename(), rename_with() | Rename docvars |
| Columns | relocate() | Change docvar column order |
| Columns | mutate(), transmute() | Create or modify docvars |
| Columns | pull() | Extract a single docvar as a vector |
| Columns | glimpse() | Get a quick overview of the corpus |
| Groups of rows | add_count() | Add count by group as a docvar |
| Groups of rows | add_tally() | Add total count as a docvar |
| Pairs of data frames | left_join() | Join corpus with external data frame |
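As a rough sketch of how the row and column verbs in the table might be combined, the following chains `filter()`, `select()`, `slice_head()` and `pull()` on `data_corpus_inaugural`. It assumes the functions behave as the table describes; the object names `modern` and `presidents` are hypothetical.

```r
library("quanteda.tidy", warn.conflicts = FALSE)

modern <- data_corpus_inaugural %>%
  filter(Year >= 1901) %>%      # Rows: keep documents by a docvar condition
  select(Year, President) %>%   # Columns: keep only these two docvars
  slice_head(n = 3)             # Rows: keep the first three documents

# The pipeline returns a corpus with the reduced set of docvars.
names(docvars(modern))

# pull() extracts a single docvar as a plain vector.
presidents <- modern %>% pull(President)
presidents
```

Because each verb returns a corpus, the result can be passed on to other quanteda functions, such as `tokens()`.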
Example
Adding a document variable for full president name:
library("quanteda.tidy", warn.conflicts = FALSE)
## Loading required package: quanteda
## Package version: 4.3.1
## Unicode version: 14.0
## ICU version: 71.1
## Parallel computing: disabled
## See https://quanteda.io for tutorials and examples.
data_corpus_inaugural %>%
  mutate(fullname = paste(FirstName, President, sep = ", ")) %>%
  summary(n = 5)
## Corpus consisting of 60 documents, showing 5 documents:
##
## Text Types Tokens Sentences Year President FirstName
## 1789-Washington 625 1537 23 1789 Washington George
## 1793-Washington 96 147 4 1793 Washington George
## 1797-Adams 826 2577 37 1797 Adams John
## 1801-Jefferson 717 1923 41 1801 Jefferson Thomas
## 1805-Jefferson 804 2380 45 1805 Jefferson Thomas
## Party fullname
## none George, Washington
## none George, Washington
## Federalist John, Adams
## Democratic-Republican Thomas, Jefferson
## Democratic-Republican Thomas, Jefferson
