SkillAgentSearch skills...

Wakefield

Generate random data sets

Install / Use

/learn @trinker/Wakefield
About this skill

Quality Score

0/100

Supported Platforms

Universal

README

wakefield

Project Status: Active - The project has reached a stable, usable
state and is being actively
developed. Build
Status Coverage
Status DOI

wakefield is designed to quickly generate random data sets. The user passes n (number of rows) and predefined vectors to the r_data_frame function to produce a dplyr::as_tibble object.

Table of Contents

Installation

To download the development version of wakefield:

Download the zip ball or tar ball, decompress and run R CMD INSTALL on it, or use the pacman package to install the development version:

if (!require("pacman")) install.packages("pacman")
pacman::p_load_gh("trinker/wakefield")
pacman::p_load(dplyr, tidyr, ggplot2)

Contact

You are welcome to: * submit suggestions and bug-reports at: <a href="https://github.com/trinker/wakefield/issues" class="uri">https://github.com/trinker/wakefield/issues</a> * send a pull request on: <a href="https://github.com/trinker/wakefield/" class="uri">https://github.com/trinker/wakefield/</a> * compose a friendly e-mail to: <a href="mailto:tyler.rinker@gmail.com" class="email">tyler.rinker@gmail.com</a>

Demonstration

Getting Started

The r_data_frame function (random data frame) takes n (the number of rows) and any number of variables (columns). These columns are typically produced from a wakefield variable function. Each of these variable functions has a pre-set behavior that produces a named vector of n length, allowing the user to lazily pass unnamed functions (optionally, without call parenthesis). The column name is hidden as a varname attribute. For example here we see the race variable function:

race(n=10)

##  [1] Bi-Racial White     Bi-Racial Native    White     White     White     Asian     White     Hispanic 
## Levels: White Hispanic Black Asian Bi-Racial Native Other Hawaiian

attributes(race(n=10))

## $levels
## [1] "White"     "Hispanic"  "Black"     "Asian"     "Bi-Racial" "Native"    "Other"     "Hawaiian" 
## 
## $class
## [1] "variable" "factor"  
## 
## $varname
## [1] "Race"

When this variable is used inside of r_data_frame the varname is used as a column name. Additionally, the n argument is not set within variable functions but is set once in r_data_frame:

r_data_frame(
    n = 500,
    race
)

## Warning: `as_tibble()` is deprecated as of dplyr 1.0.0.
## Please use `tibble::as_tibble()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.

## # A tibble: 500 x 1
##    Race    
##    <fct>   
##  1 White   
##  2 White   
##  3 White   
##  4 White   
##  5 Black   
##  6 Black   
##  7 White   
##  8 White   
##  9 Hispanic
## 10 White   
## # ... with 490 more rows

The power of r_data_frame is apparent when we use many modular variable functions:

r_data_frame(
    n = 500,
    id,
    race,
    age,
    sex,
    hour,
    iq,
    height,
    died
)

## # A tibble: 500 x 8
##    ID    Race        Age Sex    Hour        IQ Height Died 
##    <chr> <fct>     <int> <fct>  <times>  <dbl>  <dbl> <lgl>
##  1 001   White        25 Female 00:00:00    93     69 TRUE 
##  2 002   White        80 Male   00:00:00    87     59 FALSE
##  3 003   White        60 Female 00:00:00   119     74 TRUE 
##  4 004   Bi-Racial    54 Female 00:00:00   109     72 FALSE
##  5 005   White        75 Female 00:00:00   106     70 FALSE
##  6 006   White        54 Male   00:00:00    89     67 TRUE 
##  7 007   Hispanic     67 Male   00:00:00    94     73 TRUE 
##  8 008   Bi-Racial    86 Female 00:00:00   100     65 TRUE 
##  9 009   Hispanic     56 Male   00:00:00    92     76 FALSE
## 10 010   Hispanic     52 Female 00:00:00   104     71 FALSE
## # ... with 490 more rows

There are 49 wakefield based variable functions to chose from, spanning R’s various data types (see ?variables for details).

<!-- html table generated in R 4.0.2 by xtable 1.8-4 package --> <!-- Sat Sep 12 11:46:45 2020 --> <table> <tr> <td> age </td> <td> dice </td> <td> hair </td> <td> military </td> <td> sex\_inclusive </td> </tr> <tr> <td> animal </td> <td> dna </td> <td> height </td> <td> month </td> <td> smokes </td> </tr> <tr> <td> answer </td> <td> dob </td> <td> income </td> <td> name </td> <td> speed </td> </tr> <tr> <td> area </td> <td> dummy </td> <td> internet\_browser </td> <td> normal </td> <td> state </td> </tr> <tr> <td> car </td> <td> education </td> <td> iq </td> <td> political </td> <td> string </td> </tr> <tr> <td> children </td> <td> employment </td> <td> language </td> <td> race </td> <td> upper </td> </tr> <tr> <td> coin </td> <td> eye </td> <td> level </td> <td> religion </td> <td> valid </td> </tr> <tr> <td> color </td> <td> grade </td> <td> likert </td> <td> sat </td> <td> year </td> </tr> <tr> <td> date\_stamp </td> <td> grade\_level </td> <td> lorem\_ipsum </td> <td> sentence </td> <td> zip\_code </td> </tr> <tr> <td> death </td> <td> group </td> <td> marital </td> <td> sex </td> <td> </td> </tr> </table> <p class="caption"> <b><em>Available Variable Functions</em></b> </p>

However, the user may also pass their own vector producing functions or vectors to r_data_frame. Those with an n argument can be set by r_data_frame:

r_data_frame(
    n = 500,
    id,
    Scoring = rnorm,
    Smoker = valid,
    race,
    age,
    sex,
    hour,
    iq,
    height,
    died
)

## # A tibble: 500 x 10
##    ID    Scoring Smoker Race       Age Sex    Hour        IQ Height Died 
##    <chr>   <dbl> <lgl>  <fct>    <int> <fct>  <times>  <dbl>  <dbl> <lgl>
##  1 001    0.833  FALSE  White       20 Female 00:00:00    92     69 TRUE 
##  2 002   -0.529  TRUE   Hispanic    83 Female 00:00:00    99     74 TRUE 
##  3 003   -0.704  TRUE   Hispanic    24 Male   00:00:00   115     62 TRUE 
##  4 004   -0.839  TRUE   Asian       19 Female 00:00:00   113     69 TRUE 
##  5 005    0.606  TRUE   White       70 Male   00:00:00    95     68 FALSE
##  6 006    1.46   FALSE  Other       45 Female 00:00:00   110     78 FALSE
##  7 007   -0.681  TRUE   Black       47 Female 00:00:00    98     64 TRUE 
##  8 008    0.541  FALSE  White       88 Male   00:30:00    75     70 TRUE 
##  9 009   -0.294  FALSE  Hispanic    89 Male   00:30:00   104     63 FALSE
## 10 010    0.0749 FALSE  Hispanic    74 Female 00:30:00   105     69 TRUE 
## # ... with 490 more rows

r_data_frame(
    n = 500,
    id,
    age, age, age,
    grade, grade, grade
)

## # A tibble: 500 x 7
##    ID    Age_1 Age_2 Age_3 Grade_1 Grade_2 Grade_3
##    <chr> <int> <int> <int>   <dbl>   <dbl>   <dbl>
##  1 001      67    24    89    82.4    86.8    90.6
##  2 002      55    76    27    87.3    85.4    89.8
##  3 003      60    61    22    82.2    87      90.1
##  4 004      50    19    56    96.4    86.6    95.6
##  5 005      83    77    71    88.8    87.5    84.4
##  6 006      55    71    76    87.3    96.5    86.5
##  7 007      88    36    75    92.1    91.6    93.4
##  8 008      71    48    81    87.9    91.4    80.9
##  9 009      76    78    21    86.9    93.6    84.3
## 10 010      49    68    47    85.5    93      86.6
## # ... with 490 more rows

While passing variable functions to r_data_frame without call parenthesis is handy, the user may wish to set arguments. This can be done through call parenthesis as we do with data.frame or dplyr::data_frame:

r_data_frame(
    n = 500,
    id,
    Scoring = rnorm,
    Smoker = valid,
    `Reading(mins)` = rpois(lambda=20),  
    race,
    age(x = 8:14),
    sex,
    hour,
    iq,
    height(mean=50, sd = 10),
    died
)

## # A tibble: 500 x 11
##    ID    Scoring Smoker `Reading(mins)` Race       Age Sex    Hour        IQ Height Died 
##    <chr>   <dbl> <lgl>            <int> <fct>    <int> <fct>  <times>  <dbl>  <dbl> <lgl>
##  1 001    2.48   FALSE               10 White        9 Male   00:00:00    93     44 TRUE 
##  2 002    0.566  FALSE               14 Hispanic    10 Male   00:00:00   116     58 FALSE
##  3 003   -0.563  FALSE               19 Hispanic     8 Female 00:00:00    97     64 TR

Related Skills

View on GitHub
GitHub Stars256
CategoryDevelopment
Updated27d ago
Forks29

Languages

R

Security Score

85/100

Audited on Mar 5, 2026

No findings