Wakefield

Generate random data sets

wakefield is designed to quickly generate random data sets. The user passes n (the number of rows) and predefined vector-producing functions to the r_data_frame function to produce a tibble (via dplyr::as_tibble).

Installation
Contact
Demonstration

Installation

To install the development version of wakefield, download the zip ball or tar ball, decompress it, and run R CMD INSTALL on it, or use the pacman package:

if (!require("pacman")) install.packages("pacman")
pacman::p_load_gh("trinker/wakefield")
pacman::p_load(dplyr, tidyr, ggplot2)

Contact

You are welcome to:

* submit suggestions and bug-reports at: https://github.com/trinker/wakefield/issues
* send a pull request on: https://github.com/trinker/wakefield/
* compose a friendly e-mail to: tyler.rinker@gmail.com

Demonstration

Getting Started

The r_data_frame function (random data frame) takes n (the number of rows) and any number of variables (columns). These columns are typically produced by wakefield variable functions. Each variable function has a preset behavior that produces a vector of length n carrying its own column name, so the user can lazily pass unnamed functions, optionally without call parentheses. The column name is stored in a varname attribute. For example, here is the race variable function:

race(n=10)

##  [1] Bi-Racial White     Bi-Racial Native    White     White     White     Asian     White     Hispanic 
## Levels: White Hispanic Black Asian Bi-Racial Native Other Hawaiian

attributes(race(n=10))

## $levels
## [1] "White"     "Hispanic"  "Black"     "Asian"     "Bi-Racial" "Native"    "Other"     "Hawaiian" 
## 
## $class
## [1] "variable" "factor"  
## 
## $varname
## [1] "Race"
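To make the varname mechanism concrete, here is a minimal base-R sketch of a variable function in the same spirit. The function name my_race, its arguments (x, prob, name), and the four levels are hypothetical and chosen for illustration; this is not wakefield's source for race.

```r
# Hypothetical variable function (not wakefield's race()): returns a factor of
# length n whose intended column name is stored in a "varname" attribute.
my_race <- function(n, x = c("White", "Hispanic", "Black", "Asian"),
                    prob = NULL, name = "Race") {
  # Draw n categories (uniformly unless prob is given)
  out <- factor(sample(x, size = n, replace = TRUE, prob = prob), levels = x)
  # Mark the vector as a "variable" and attach its column name
  class(out) <- c("variable", class(out))
  attr(out, "varname") <- name
  out
}

v <- my_race(n = 10)
attributes(v)  # lists levels, class, and varname, like the race() output above
```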

When this variable function is used inside r_data_frame, its varname becomes the column name. Additionally, n is not set within the individual variable functions; it is set once in r_data_frame:

r_data_frame(
    n = 500,
    race
)

## Warning: `as_tibble()` is deprecated as of dplyr 1.0.0.
## Please use `tibble::as_tibble()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.

## # A tibble: 500 x 1
##    Race    
##    <fct>   
##  1 White   
##  2 White   
##  3 White   
##  4 White   
##  5 Black   
##  6 Black   
##  7 White   
##  8 White   
##  9 Hispanic
## 10 White   
## # ... with 490 more rows
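The division of labor described above (each variable function takes n, while r_data_frame supplies n once and reads varname for the column name) can be sketched in base R. toy_age, toy_sex, and toy_data_frame are hypothetical names, the sketch assumes every function sets varname, and it returns a plain data.frame rather than a tibble; it is not how wakefield implements r_data_frame.

```r
# Toy variable functions (hypothetical): each returns n values and records
# its intended column name in a "varname" attribute.
toy_age <- function(n) {
  structure(sample(18:89, n, replace = TRUE), varname = "Age")
}
toy_sex <- function(n) {
  structure(factor(sample(c("Female", "Male"), n, replace = TRUE)), varname = "Sex")
}

# Simplified stand-in for r_data_frame(): n is given once and passed to every
# function; each column is named from its "varname" attribute.
toy_data_frame <- function(n, ...) {
  cols <- lapply(list(...), function(f) f(n))
  names(cols) <- vapply(cols, function(col) attr(col, "varname"), character(1))
  data.frame(cols, check.names = FALSE)
}

df <- toy_data_frame(n = 500, toy_age, toy_sex)
str(df)
```

Passing the function objects themselves (toy_age rather than toy_age(500)) is what lets the sketch, like r_data_frame, accept functions without call parentheses.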

The power of r_data_frame is apparent when we use many modular variable functions:

r_data_frame(
    n = 500,
    id,
    race,
    age,
    sex,
    hour,
    iq,
    height,
    died
)

## # A tibble: 500 x 8
##    ID    Race        Age Sex    Hour        IQ Height Died 
##    <chr> <fct>     <int> <fct>  <times>  <dbl>  <dbl> <lgl>
##  1 001   White        25 Female 00:00:00    93     69 TRUE 
##  2 002   White        80 Male   00:00:00    87     59 FALSE
##  3 003   White        60 Female 00:00:00   119     74 TRUE 
##  4 004   Bi-Racial    54 Female 00:00:00   109     72 FALSE
##  5 005   White        75 Female 00:00:00   106     70 FALSE
##  6 006   White        54 Male   00:00:00    89     67 TRUE 
##  7 007   Hispanic     67 Male   00:00:00    94     73 TRUE 
##  8 008   Bi-Racial    86 Female 00:00:00   100     65 TRUE 
##  9 009   Hispanic     56 Male   00:00:00    92     76 FALSE
## 10 010   Hispanic     52 Female 00:00:00   104     71 FALSE
## # ... with 490 more rows

There are 49 wakefield-based variable functions to choose from, spanning R’s various data types (see ?variables for details).

<table>
<tr> <td> age </td> <td> dice </td> <td> hair </td> <td> military </td> <td> sex_inclusive </td> </tr>
<tr> <td> animal </td> <td> dna </td> <td> height </td> <td> month </td> <td> smokes </td> </tr>
<tr> <td> answer </td> <td> dob </td> <td> income </td> <td> name </td> <td> speed </td> </tr>
<tr> <td> area </td> <td> dummy </td> <td> internet_browser </td> <td> normal </td> <td> state </td> </tr>
<tr> <td> car </td> <td> education </td> <td> iq </td> <td> political </td> <td> string </td> </tr>
<tr> <td> children </td> <td> employment </td> <td> language </td> <td> race </td> <td> upper </td> </tr>
<tr> <td> coin </td> <td> eye </td> <td> level </td> <td> religion </td> <td> valid </td> </tr>
<tr> <td> color </td> <td> grade </td> <td> likert </td> <td> sat </td> <td> year </td> </tr>
<tr> <td> date_stamp </td> <td> grade_level </td> <td> lorem_ipsum </td> <td> sentence </td> <td> zip_code </td> </tr>
<tr> <td> death </td> <td> group </td> <td> marital </td> <td> sex </td> <td> </td> </tr>
</table>
<p class="caption"> <b><em>Available Variable Functions</em></b> </p>

However, the user may also pass their own vector-producing functions, or vectors, to r_data_frame. For functions that have an n argument, r_data_frame sets n itself, and naming the argument (e.g., Scoring = rnorm) sets the column name:

r_data_frame(
    n = 500,
    id,
    Scoring = rnorm,
    Smoker = valid,
    race,
    age,
    sex,
    hour,
    iq,
    height,
    died
)

## # A tibble: 500 x 10
##    ID    Scoring Smoker Race       Age Sex    Hour        IQ Height Died 
##    <chr>   <dbl> <lgl>  <fct>    <int> <fct>  <times>  <dbl>  <dbl> <lgl>
##  1 001    0.833  FALSE  White       20 Female 00:00:00    92     69 TRUE 
##  2 002   -0.529  TRUE   Hispanic    83 Female 00:00:00    99     74 TRUE 
##  3 003   -0.704  TRUE   Hispanic    24 Male   00:00:00   115     62 TRUE 
##  4 004   -0.839  TRUE   Asian       19 Female 00:00:00   113     69 TRUE 
##  5 005    0.606  TRUE   White       70 Male   00:00:00    95     68 FALSE
##  6 006    1.46   FALSE  Other       45 Female 00:00:00   110     78 FALSE
##  7 007   -0.681  TRUE   Black       47 Female 00:00:00    98     64 TRUE 
##  8 008    0.541  FALSE  White       88 Male   00:30:00    75     70 TRUE 
##  9 009   -0.294  FALSE  Hispanic    89 Male   00:30:00   104     63 FALSE
## 10 010    0.0749 FALSE  Hispanic    74 Female 00:30:00   105     69 TRUE 
## # ... with 490 more rows
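The Scoring = rnorm and Smoker = valid columns above suggest a precedence for column names: an explicit argument name wins over a function's own varname. A minimal sketch of one plausible rule follows, assuming a hypothetical helper pick_column_name and a made-up fallback name for unnamed values that carry no varname (wakefield's own fallback is not shown in this README).

```r
# Hypothetical rule for choosing a column name: an explicit argument name
# wins, then the value's "varname" attribute, then a fallback.
pick_column_name <- function(arg_name, value, fallback) {
  if (!is.null(arg_name) && nzchar(arg_name)) return(arg_name)
  varname <- attr(value, "varname")
  if (!is.null(varname)) return(varname)
  fallback
}

# Toy values; the varname "Flag" is made up for illustration
age_like    <- structure(sample(18:89, 5, replace = TRUE), varname = "Age")
smoker_like <- structure(sample(c(TRUE, FALSE), 5, replace = TRUE), varname = "Flag")

pick_column_name("", age_like, "V1")           # "Age": falls back to varname
pick_column_name("Smoker", smoker_like, "V2")  # "Smoker": argument name wins
pick_column_name("", rnorm(5), "V3")           # "V3": no name and no varname
```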

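When the same variable function is passed more than once, each copy becomes its own column and the shared name is suffixed with _1, _2, and so on:

r_data_frame(
    n = 500,
    id,
    age, age, age,
    grade, grade, grade
)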

## # A tibble: 500 x 7
##    ID    Age_1 Age_2 Age_3 Grade_1 Grade_2 Grade_3
##    <chr> <int> <int> <int>   <dbl>   <dbl>   <dbl>
##  1 001      67    24    89    82.4    86.8    90.6
##  2 002      55    76    27    87.3    85.4    89.8
##  3 003      60    61    22    82.2    87      90.1
##  4 004      50    19    56    96.4    86.6    95.6
##  5 005      83    77    71    88.8    87.5    84.4
##  6 006      55    71    76    87.3    96.5    86.5
##  7 007      88    36    75    92.1    91.6    93.4
##  8 008      71    48    81    87.9    91.4    80.9
##  9 009      76    78    21    86.9    93.6    84.3
## 10 010      49    68    47    85.5    93      86.6
## # ... with 490 more rows
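The column headers above (Age_1, Age_2, Age_3, Grade_1, Grade_2, Grade_3) follow a simple pattern: names that occur more than once get a running _k suffix, while unique names such as ID are left alone. A base-R sketch of that renaming, using a hypothetical helper suffix_duplicates that is not part of wakefield:

```r
# Hypothetical helper (not part of wakefield): names that appear more than once
# get a running suffix _1, _2, ...; unique names are left unchanged.
suffix_duplicates <- function(x) {
  repeated <- x %in% x[duplicated(x)]
  # Position of each element within its group of identical names
  within_group <- ave(seq_along(x), x, FUN = seq_along)
  x[repeated] <- paste0(x[repeated], "_", within_group[repeated])
  x
}

suffix_duplicates(c("ID", "Age", "Age", "Age", "Grade", "Grade", "Grade"))
# "ID" "Age_1" "Age_2" "Age_3" "Grade_1" "Grade_2" "Grade_3"
```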

While passing variable functions to r_data_frame without call parentheses is handy, the user may wish to set other arguments. This can be done with call parentheses, as we do with data.frame or dplyr::data_frame, while still leaving n to r_data_frame:

r_data_frame(
    n = 500,
    id,
    Scoring = rnorm,
    Smoker = valid,
    `Reading(mins)` = rpois(lambda=20),  
    race,
    age(x = 8:14),
    sex,
    hour,
    iq,
    height(mean=50, sd = 10),
    died
)

## # A tibble: 500 x 11
##    ID    Scoring Smoker `Reading(mins)` Race       Age Sex    Hour        IQ Height Died 
##    <chr>   <dbl> <lgl>            <int> <fct>    <int> <fct>  <times>  <dbl>  <dbl> <lgl>
##  1 001    2.48   FALSE               10 White        9 Male   00:00:00    93     44 TRUE 
##  2 002    0.566  FALSE               14 Hispanic    10 Male   00:00:00   116     58 FALSE
##  3 003   -0.563  FALSE               19 Hispanic     8 Female 00:00:00    97     64 TR
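Because rpois(lambda=20) would fail on its own for lack of n, r_data_frame presumably captures such arguments unevaluated and adds n before evaluating them. The base-R sketch below shows one way to do that with quote() and call modification; call_with_n is a hypothetical helper, and wakefield's actual mechanism may differ.

```r
# Hypothetical helper (not wakefield's code): supply n to a captured expression.
call_with_n <- function(expr, n, env = parent.frame()) {
  if (is.call(expr)) {
    # A call with user arguments, e.g. rpois(lambda = 20):
    # append n = <n> and evaluate it in the caller's environment
    expr$n <- n
    eval(expr, env)
  } else {
    # A bare function name, e.g. rnorm: look it up and call it with n
    f <- eval(expr, env)
    f(n)
  }
}

reading <- call_with_n(quote(rpois(lambda = 20)), n = 500)
scoring <- call_with_n(quote(rnorm), n = 500)
length(reading)  # 500
```

Modifying the call object with `$<-` is the same idiom base R modeling functions use on match.call() results, so user-supplied arguments such as lambda are kept untouched.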
