Wakefield
Generate random data sets
wakefield
wakefield is designed to quickly generate random data sets. The user
passes n (number of rows) and predefined vectors to the r_data_frame
function to produce a tibble (via dplyr::as_tibble).

Installation
To download the development version of wakefield:
Download the zip ball or tar ball, decompress and run R CMD INSTALL on
it, or use the pacman package to install the development version:
if (!require("pacman")) install.packages("pacman")
pacman::p_load_gh("trinker/wakefield")
pacman::p_load(dplyr, tidyr, ggplot2)
Contact
You are welcome to:
* submit suggestions and bug-reports at: https://github.com/trinker/wakefield/issues
* send a pull request on: https://github.com/trinker/wakefield/
* compose a friendly e-mail to: tyler.rinker@gmail.com
Demonstration
Getting Started
The r_data_frame function (random data frame) takes n (the number of
rows) and any number of variables (columns). These columns are typically
produced from a wakefield variable function. Each of these variable
functions has a pre-set behavior that produces a named vector of length
n, allowing the user to lazily pass unnamed functions (optionally,
without call parentheses). The column name is stored as a varname
attribute. For example, here we see the race variable function:
race(n=10)
## [1] Bi-Racial White Bi-Racial Native White White White Asian White Hispanic
## Levels: White Hispanic Black Asian Bi-Racial Native Other Hawaiian
attributes(race(n=10))
## $levels
## [1] "White" "Hispanic" "Black" "Asian" "Bi-Racial" "Native" "Other" "Hawaiian"
##
## $class
## [1] "variable" "factor"
##
## $varname
## [1] "Race"
When this variable is used inside of r_data_frame the varname is
used as a column name. Additionally, the n argument is not set within
variable functions but is set once in r_data_frame:
r_data_frame(
    n = 500,
    race
)
## Warning: `as_tibble()` is deprecated as of dplyr 1.0.0.
## Please use `tibble::as_tibble()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.
## # A tibble: 500 x 1
## Race
## <fct>
## 1 White
## 2 White
## 3 White
## 4 White
## 5 Black
## 6 Black
## 7 White
## 8 White
## 9 Hispanic
## 10 White
## # ... with 490 more rows
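The link between the varname attribute and the resulting column name can be checked directly. The following is a minimal sketch, assuming wakefield is installed and attached; the object names x, col_label, and dat are chosen here for illustration only.

```r
# Sketch only: assumes the wakefield package is installed.
library(wakefield)

# Called directly, a variable function needs n and returns a vector
# that carries a "varname" attribute.
x <- race(n = 10)
col_label <- attr(x, "varname")

# Inside r_data_frame the function is passed bare and n is supplied once.
dat <- r_data_frame(n = 10, race)

col_label   # the label stored on the vector
names(dat)  # the column name r_data_frame derived from it
nrow(dat)   # one row per requested observation
```

Both col_label and names(dat) should read "Race", matching the attributes() output and the tibble header shown above.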
The power of r_data_frame is apparent when we use many modular
variable functions:
r_data_frame(
    n = 500,
    id,
    race,
    age,
    sex,
    hour,
    iq,
    height,
    died
)
## # A tibble: 500 x 8
## ID Race Age Sex Hour IQ Height Died
## <chr> <fct> <int> <fct> <times> <dbl> <dbl> <lgl>
## 1 001 White 25 Female 00:00:00 93 69 TRUE
## 2 002 White 80 Male 00:00:00 87 59 FALSE
## 3 003 White 60 Female 00:00:00 119 74 TRUE
## 4 004 Bi-Racial 54 Female 00:00:00 109 72 FALSE
## 5 005 White 75 Female 00:00:00 106 70 FALSE
## 6 006 White 54 Male 00:00:00 89 67 TRUE
## 7 007 Hispanic 67 Male 00:00:00 94 73 TRUE
## 8 008 Bi-Racial 86 Female 00:00:00 100 65 TRUE
## 9 009 Hispanic 56 Male 00:00:00 92 76 FALSE
## 10 010 Hispanic 52 Female 00:00:00 104 71 FALSE
## # ... with 490 more rows
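Because the values are drawn at random, the rows differ from run to run. A hedged sketch of one way to make a generated data set repeatable is to call base R's set.seed before r_data_frame. This assumes the chosen variable functions draw from R's standard random number generator, which the README does not state explicitly; the names first and second are illustrative.

```r
# Sketch only: assumes the wakefield package is installed and that these
# variable functions rely on R's standard random number generator.
library(wakefield)

set.seed(42)
first <- r_data_frame(n = 25, id, race, age, sex)

set.seed(42)
second <- r_data_frame(n = 25, id, race, age, sex)

# With the same seed, the two draws are expected to match.
same <- identical(as.data.frame(first), as.data.frame(second))
same
```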
There are 49 wakefield-based variable functions to choose from,
spanning R’s various data types (see ?variables for details).
However, the user may also pass their own vector-producing functions or
vectors to r_data_frame. For functions that have an n argument, n is set
by r_data_frame:
r_data_frame(
    n = 500,
    id,
    Scoring = rnorm,
    Smoker = valid,
    race,
    age,
    sex,
    hour,
    iq,
    height,
    died
)
## # A tibble: 500 x 10
## ID Scoring Smoker Race Age Sex Hour IQ Height Died
## <chr> <dbl> <lgl> <fct> <int> <fct> <times> <dbl> <dbl> <lgl>
## 1 001 0.833 FALSE White 20 Female 00:00:00 92 69 TRUE
## 2 002 -0.529 TRUE Hispanic 83 Female 00:00:00 99 74 TRUE
## 3 003 -0.704 TRUE Hispanic 24 Male 00:00:00 115 62 TRUE
## 4 004 -0.839 TRUE Asian 19 Female 00:00:00 113 69 TRUE
## 5 005 0.606 TRUE White 70 Male 00:00:00 95 68 FALSE
## 6 006 1.46 FALSE Other 45 Female 00:00:00 110 78 FALSE
## 7 007 -0.681 TRUE Black 47 Female 00:00:00 98 64 TRUE
## 8 008 0.541 FALSE White 88 Male 00:30:00 75 70 TRUE
## 9 009 -0.294 FALSE Hispanic 89 Male 00:30:00 104 63 FALSE
## 10 010 0.0749 FALSE Hispanic 74 Female 00:30:00 105 69 TRUE
## # ... with 490 more rows
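The same pattern should extend to a function the user writes. The sketch below assumes wakefield is installed; shoe_size is a hypothetical generator invented for this illustration and is not part of the package. It is defined at the top level and given an n argument so that r_data_frame can fill it in, as it does for rnorm above.

```r
# Sketch only: assumes the wakefield package is installed.
library(wakefield)

# Hypothetical user-defined generator (not a wakefield function):
# any function with an `n` argument that returns a vector of length n.
shoe_size <- function(n) sample(5:13, n, replace = TRUE)

dat <- r_data_frame(
    n = 20,
    id,
    Shoe = shoe_size,   # named explicitly: a plain function has no varname
    Scoring = rnorm     # base R function with an n argument, as above
)

names(dat)
range(dat$Shoe)
```

Naming the argument (Shoe = ...) supplies the column name that a wakefield variable function would otherwise provide through its varname attribute.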
Passing the same variable function more than once produces numbered
columns:
r_data_frame(
    n = 500,
    id,
    age, age, age,
    grade, grade, grade
)
## # A tibble: 500 x 7
## ID Age_1 Age_2 Age_3 Grade_1 Grade_2 Grade_3
## <chr> <int> <int> <int> <dbl> <dbl> <dbl>
## 1 001 67 24 89 82.4 86.8 90.6
## 2 002 55 76 27 87.3 85.4 89.8
## 3 003 60 61 22 82.2 87 90.1
## 4 004 50 19 56 96.4 86.6 95.6
## 5 005 83 77 71 88.8 87.5 84.4
## 6 006 55 71 76 87.3 96.5 86.5
## 7 007 88 36 75 92.1 91.6 93.4
## 8 008 71 48 81 87.9 91.4 80.9
## 9 009 76 78 21 86.9 93.6 84.3
## 10 010 49 68 47 85.5 93 86.6
## # ... with 490 more rows
While passing variable functions to r_data_frame without call
parentheses is handy, the user may wish to set arguments. This can be
done through call parentheses, as we do with data.frame or
dplyr::data_frame:
r_data_frame(
    n = 500,
    id,
    Scoring = rnorm,
    Smoker = valid,
    `Reading(mins)` = rpois(lambda = 20),
    race,
    age(x = 8:14),
    sex,
    hour,
    iq,
    height(mean = 50, sd = 10),
    died
)
## # A tibble: 500 x 11
## ID Scoring Smoker `Reading(mins)` Race Age Sex Hour IQ Height Died
## <chr> <dbl> <lgl> <int> <fct> <int> <fct> <times> <dbl> <dbl> <lgl>
## 1 001 2.48 FALSE 10 White 9 Male 00:00:00 93 44 TRUE
## 2 002 0.566 FALSE 14 Hispanic 10 Male 00:00:00 116 58 FALSE
## 3 003 -0.563 FALSE 19 Hispanic 8 Female 00:00:00 97 64 TR
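To see what an argument such as x = 8:14 does in isolation, the variable function can also be called on its own with n supplied. This is a small sketch, assuming wakefield is installed; the names kids and dat are illustrative. It reuses only the arguments that appear in the call above.

```r
# Sketch only: assumes the wakefield package is installed.
library(wakefield)

# Called directly, n must be given; x restricts the values sampled from.
kids <- age(n = 50, x = 8:14)
range(as.integer(kids))

# Inside r_data_frame, n is omitted from the inner calls and set once.
dat <- r_data_frame(
    n = 50,
    id,
    age(x = 8:14),
    height(mean = 50, sd = 10)
)

names(dat)
all(as.integer(dat$Age) %in% 8:14)
```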
