DatenguideR
R wrapper for the datengui.de GraphQL API to easily access German regional statistics
datenguideR <img src='man/figures/logo.png' align="right" height="139" />
Access and download German regional statistics from Datenguide
(http://datengui.de). datenguideR provides a wrapper for the Datenguide
GraphQL API and also includes metadata for all available statistics and regions.
Usage
First, install datenguideR from GitHub:
devtools::install_github("CorrelAid/datenguideR")
Load package:
library(datenguideR)
Examples
Get IDs of all available NUTS-1 regions:
datenguideR::dg_regions %>%
dplyr::filter(level == "nuts1") %>%
knitr::kable()
| id | name                   | level | parent |
|:---|:-----------------------|:------|:-------|
| 01 | Schleswig-Holstein     | nuts1 | DG     |
| 02 | Hamburg                | nuts1 | DG     |
| 03 | Niedersachsen          | nuts1 | DG     |
| 04 | Bremen                 | nuts1 | DG     |
| 05 | Nordrhein-Westfalen    | nuts1 | DG     |
| 06 | Hessen                 | nuts1 | DG     |
| 07 | Rheinland-Pfalz        | nuts1 | DG     |
| 08 | Baden-Württemberg      | nuts1 | DG     |
| 09 | Bayern                 | nuts1 | DG     |
| 10 | Saarland               | nuts1 | DG     |
| 11 | Berlin                 | nuts1 | DG     |
| 12 | Brandenburg            | nuts1 | DG     |
| 13 | Mecklenburg-Vorpommern | nuts1 | DG     |
| 14 | Sachsen                | nuts1 | DG     |
| 15 | Sachsen-Anhalt         | nuts1 | DG     |
| 16 | Thüringen              | nuts1 | DG     |
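Since regions are identified by the id column rather than by name, a small lookup helper can save some scrolling. The sketch below is not part of datenguideR: region_id_for is a hypothetical helper, and toy_regions copies three rows from the table above so that the snippet runs without the package installed. With the package loaded, datenguideR::dg_regions could presumably be passed in its place.

```r
# Hypothetical helper (not part of datenguideR): look up a region ID by name.
region_id_for <- function(regions, region_name, region_level = "nuts1") {
  match_rows <- regions$name == region_name & regions$level == region_level
  regions$id[match_rows]
}

# Toy copy of three rows from the table above, so the sketch runs standalone.
toy_regions <- data.frame(
  id     = c("02", "09", "11"),
  name   = c("Hamburg", "Bayern", "Berlin"),
  level  = "nuts1",
  parent = "DG",
  stringsAsFactors = FALSE
)

berlin_id <- region_id_for(toy_regions, "Berlin")
print(berlin_id)
```

The helper returns an empty character vector when no region matches, so callers may want to check the length of the result before using it.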
Get all available metadata on statistics, substatistics, and parameters:
datenguideR::dg_descriptions
#> # A tibble: 3,419 x 11
#> stat_name stat_description stat_description_~ substat_name substat_descrip~
#> <chr> <chr> <chr> <chr> <chr>
#> 1 AENW01 Entsorgte/behande~ "**Entsorgte/beha~ NA NA
#> 2 AENW02 Abgelagerte Abfal~ "**Abgelagerte Ab~ NA NA
#> 3 AENW03 Entsorg.u.Behandl~ "**Entsorg.u.Beha~ NA NA
#> 4 AENW04 Entsorgte/behande~ "**Entsorgte/beha~ NA NA
#> 5 AENW05 Abgelagerte Abfal~ "**Abgelagerte Ab~ NA NA
#> 6 AENW06 Entsorg.u.Behandl~ "**Entsorg.u.Beha~ NA NA
#> 7 AEW001 Entsorgungs- und ~ "**Entsorgungs- u~ NA NA
#> 8 AEW001 Entsorgungs- und ~ "**Entsorgungs- u~ EBANL1 Entsorgungs- un~
#> 9 AEW001 Entsorgungs- und ~ "**Entsorgungs- u~ EBANL1 Entsorgungs- un~
#> 10 AEW001 Entsorgungs- und ~ "**Entsorgungs- u~ EBANL1 Entsorgungs- un~
#> # ... with 3,409 more rows, and 6 more variables: param_name <chr>,
#> # param_description <chr>, stat_description_en <chr>,
#> # stat_description_full_en <chr>, substat_description_en <chr>,
#> # param_description_en <chr>
dg_search
You can also use dg_search to look for a variable of interest. The
function matches your search string against the text in the
dg_descriptions data frame and returns only the rows that contain a match.
For example, to find variables where the string “vote” appears somewhere in the documentation:
dg_search("vote")
#> # A tibble: 90 x 11
#> stat_name stat_description stat_description_~ substat_name substat_descrip~
#> <chr> <chr> <chr> <chr> <chr>
#> 1 AI0501 Zweitstimmenantei~ "**Zweitstimmenan~ NA NA
#> 2 AI0502 Zweitstimmenantei~ "**Zweitstimmenan~ NA NA
#> 3 AI0503 Zweitstimmenantei~ "**Zweitstimmenan~ NA NA
#> 4 AI0504 Zweitstimmenantei~ "**Zweitstimmenan~ NA NA
#> 5 AI0505 Zweitstimmenantei~ "**Zweitstimmenan~ NA NA
#> 6 AI0506 Wahlbeteiligung, ~ "**Wahlbeteiligun~ NA NA
#> 7 AI0601 Stimmenanteil CDU~ "**Stimmenanteil ~ NA NA
#> 8 AI0602 Stimmenanteil SPD~ "**Stimmenanteil ~ NA NA
#> 9 AI0603 Stimmenanteil FDP~ "**Stimmenanteil ~ NA NA
#> 10 AI0604 Stimmenanteil GRÜ~ "**Stimmenanteil ~ NA NA
#> # ... with 80 more rows, and 6 more variables: param_name <chr>,
#> # param_description <chr>, stat_description_en <chr>,
#> # stat_description_full_en <chr>, substat_description_en <chr>,
#> # param_description_en <chr>
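To make the matching behaviour described above more concrete, here is a minimal sketch of a row-wise text search in base R. It is an illustration only and not the package's implementation of dg_search: the function search_rows, the toy data frame, and the choice of case-insensitive matching are all assumptions made for this example.

```r
# Illustrative only: NOT the package's implementation of dg_search().
# Keeps every row in which at least one column contains `pattern`.
search_rows <- function(df, pattern) {
  column_hits <- lapply(df, function(column) {
    # grepl() returns FALSE for NA entries, so missing values never match.
    grepl(pattern, as.character(column), ignore.case = TRUE)
  })
  any_hit <- Reduce(`|`, column_hits)
  df[any_hit, , drop = FALSE]
}

# Toy stand-in for dg_descriptions (shortened, partly made-up rows).
toy_descriptions <- data.frame(
  stat_name = c("AI0506", "AI0602", "TOY001"),
  stat_description_en = c("Voter turnout", "SPD vote share",
                          "Made-up unrelated row"),
  stringsAsFactors = FALSE
)

vote_rows <- search_rows(toy_descriptions, "vote")
print(vote_rows$stat_name)
```

Searching every column is what allows a single English keyword to find statistics whose German description would not match.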
Note: Variable descriptions are now also available in English,
translated via the googleLanguageR package.
dg_search("vote") %>%
dplyr::select(stat_name, dplyr::contains("_en"))
#> # A tibble: 90 x 5
#> stat_name stat_descriptio~ stat_descriptio~ substat_descrip~ param_descripti~
#> <chr> <chr> <chr> <chr> <chr>
#> 1 AI0501 Second Vote Sha~ "** CDU / CSU s~ NA NA
#> 2 AI0502 SPD Second Vote~ "** SPD second ~ NA NA
#> 3 AI0503 FDP Second Vote~ "** Second vote~ NA NA
#> 4 AI0504 Second Vote Sha~ "** GREEN secon~ NA NA
#> 5 AI0505 Second Vote Sha~ "** Second vote~ NA NA
#> 6 AI0506 Voter Turnout, ~ "** Voter turno~ NA NA
#> 7 AI0601 CDU / CSU, Euro~ "** CDU / CSU v~ NA NA
#> 8 AI0602 SPD Vote Share,~ "** SPD vote sh~ NA NA
#> 9 AI0603 FDP Share of Vo~ "** FDP vote sh~ NA NA
#> 10 AI0604 Share of Votes ~ "** GREEN share~ NA NA
#> # ... with 80 more rows
dg_call
The main function of the package is dg_call. It gives access to all
API endpoints.
Simply pick a statistic and pass its name to dg_call() (the available
statistics are listed in dg_descriptions).
For example:
- stat_name: AI0506 (Wahlbeteiligung, Bundestagswahl; i.e. voter turnout, federal election)
- region_id: 11 (stands for Berlin)
dg_call(region_id = "11",
year = 2017,
stat_name = "AI0506")
#> New names:
#> * name -> name...2
#> * name -> name...6
#> # A tibble: 1 x 9
#> id name...2 year value GENESIS_source name...6 stat_name stat_description
#> <chr> <chr> <int> <dbl> <chr> <chr> <chr> <chr>
#> 1 11 Berlin 2017 75.6 Regionalatlas ~ 99910 AI0506 Wahlbeteiligung~
#> # ... with 1 more variable: stat_description_en <chr>
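The "New names" message indicates that the result contained two columns called name, which were made unique by appending their position. A possible follow-up step is to give them descriptive names. The sketch below rebuilds the printed row by hand (leaving out columns that are cut off in the printout) so that it runs without an API call. The replacement names region_name and source_name are hypothetical, and reading the second name column as the name of the source is an assumption based on its position next to GENESIS_source.

```r
# One-row reconstruction of the printed result above; columns that are cut
# off in the printout are left out.
turnout <- data.frame(
  id = "11",
  `name...2` = "Berlin",
  year = 2017L,
  value = 75.6,
  `name...6` = "99910",
  stat_name = "AI0506",
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Hypothetical, more descriptive names for the two de-duplicated columns.
new_names <- c("name...2" = "region_name", "name...6" = "source_name")
to_rename <- names(turnout) %in% names(new_names)
names(turnout)[to_rename] <- unname(new_names[names(turnout)[to_rename]])

print(names(turnout))
```

Note that the numeric suffix depends on the column position, so the second column is name...6 in this call but may carry a different suffix in calls that return more columns.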
A slightly more complex call with substatistics:
- stat_name: BETR08 (Landwirtschaftliche Betriebe mit Tierhaltung; i.e. farms keeping animals)
- substat_name: TIERA8 (Landwirtschaftliche Betriebe mit Viehhaltung; i.e. farms keeping livestock)
- parameter:
  - TIERART2 (Rinder; cattle)
  - TIERART3 (Schweine; pigs)
dg_call(region_id = "11",
year = c(2001, 2003, 2007),
stat_name = "BETR08",
substat_name = "TIERA8",
parameter = c("TIERART2", "TIERART3"))
#> New names:
#> * name -> name...2
#> * name -> name...7
#> # A tibble: 6 x 15
#> id name...2 year TIERA8 value GENESIS_source name...7 stat_name
#> <chr> <chr> <int> <chr> <int> <chr> <chr> <chr>
#> 1 11 Berlin 2001 TIERAR~ 8 Allgemeine Agrarstruktu~ 41120 BETR08
#> 2 11 Berlin 2001 TIERAR~ 7 Allgemeine Agrarstruktu~ 41120 BETR08
#> 3 11 Berlin 2003 TIERAR~ 9 Allgemeine Agrarstruktu~ 41120 BETR08
#> 4 11 Berlin 2003 TIERAR~ 7 Allgemeine Agrarstruktu~ 41120 BETR08
#> 5 11 Berlin 2007 TIERAR~ 11 Allgemeine Agrarstruktu~ 41120 BETR08
#> 6 11 Berlin 2007 TIERAR~ 5 Allgemeine Agrarstruktu~ 41120 BETR08
#> # ... with 7 more variables: stat_description <chr>, substat_name <chr>,
#> # substat_description <chr>, param_description <chr>,
#> # stat_description_en <chr>, substat_description_en <chr>,
#> # param_description_en <chr>
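Before building a call like this, it can help to check which parameters are documented for a given statistic and substatistic. The sketch below assumes that the metadata holds one row per parameter with the columns stat_name, substat_name, and param_name, as the dg_descriptions printout suggests. The helper available_params is hypothetical and not part of datenguideR, and the toy data frame stands in for datenguideR::dg_descriptions so that the snippet runs on its own.

```r
# Hypothetical helper (not part of datenguideR): list the parameter names
# documented for one statistic/substatistic pair.
available_params <- function(descriptions, stat, substat) {
  keep <- !is.na(descriptions$substat_name) &
    descriptions$stat_name == stat &
    descriptions$substat_name == substat
  params <- unique(descriptions$param_name[keep])
  params[!is.na(params)]
}

# Toy stand-in for dg_descriptions, reduced to the three relevant columns.
toy_descriptions <- data.frame(
  stat_name    = c("BETR08", "BETR08", "BETR08", "AI0506"),
  substat_name = c(NA, "TIERA8", "TIERA8", NA),
  param_name   = c(NA, "TIERART2", "TIERART3", NA),
  stringsAsFactors = FALSE
)

tiera8_params <- available_params(toy_descriptions, "BETR08", "TIERA8")
print(tiera8_params)
```

With the package loaded, passing datenguideR::dg_descriptions as the first argument should work the same way, provided the column layout matches the assumption above.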
If you do not specify any parameters for a substatistic, dg_call returns results for all of them by default.
dg_call(region_id = "11",
year = c(2001, 2003, 2007),
stat_name = "BETR08",
substat_name = "TIERA8")
#> New names:
#> * name -> name...2
#> * name -> name...7
#> # A tibble: 23 x 15
#> id name...2 year TIERA8 value GENESIS_source name...7 stat_name
#> <chr> <chr> <int> <chr> <int> <chr> <chr> <chr>
#> 1 11 Berlin 2001 TIER

