DataCombine
R tools for combining data sets.
Christopher Gandrud
Please report any bugs or suggestions at: https://github.com/christophergandrud/DataCombine/issues.
Motivation and Functions
DataCombine is a set of miscellaneous tools intended to make combining data sets--especially time-series cross-section data--easier. The package is continually being developed as I turn lines of code that I frequently use into single functions. It currently includes the following functions:
- CasesTable: reports cases after list-wise deletion of missing values for time-series cross-sectional data.
- change: calculates the absolute, percentage, and proportion change from a specified lag, including within groups.
- CountSpell: returns a variable counting the spell number for an observation. Works with grouped data.
- dMerge: merges 2 data frames and reports, drops, or keeps only duplicates.
- DropNA: drops rows from a data frame when they have missing (NA) values on a given variable(s).
- FillDown: fills in missing (NA) values with the previous non-missing value.
- FillIn: fills in missing values of a variable from one data frame with the values from another variable.
- FindDups: finds duplicated values in a data frame and subsets it to either include or not include them.
- FindReplace: replaces multiple patterns found in a character string column of a data frame.
- grepl.sub: subsets a data frame if a specified pattern is found in a character string.
- InsertRow: allows the user to insert a row into a data frame. Largely implements Ari B. Friedman's function.
- MoveFront: moves variables to the front of a data frame. This can be useful if you have a data frame with many variables and want to move a variable or variables to the front.
- NaVar: creates new variable(s) indicating if there are missing values in other variable(s).
- shift: creates lag and lead variables, including for time-series cross-sectional data. The shifted variable is returned as a new vector. This function is largely based on TszKin Julian's shift function.
- slide: creates lag and lead variables, including for time-series cross-sectional data. The slid variables are added to the original data frame. This expands the capabilities of shift.
- slideMA: creates a moving average for a period before or after each time point for a given variable.
- SpreadDummy: spreads a dummy variable (1's and 0's) over a specified time period and for specified groups.
- StartEnd: finds the starting and ending time points of a spell, including for time-series cross-sectional data.
- rmExcept: removes all objects from a workspace except those specified by the user.
- TimeExpand: expands a data set so that it includes an observation for each time point in a sequence. Works with grouped data.
- TimeFill: creates a continuous Unit-Time-Dummy data frame from a data frame with Unit-Start-End times.
- VarDrop: drops one or more variables from a data frame.
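As a rough illustration of how a few of these functions fit together, the sketch below applies FillDown, DropNA, and slide to a small made-up panel. The data frame, its column names (country, year, gdp), and the new variable name gdp_lag1 are hypothetical. The argument names follow the package documentation as best I recall it, so check ?slide, ?FillDown, and ?DropNA against your installed version.

```r
library(DataCombine)

# Hypothetical toy panel: two countries observed over three years,
# with one missing gdp value (country B, 2001)
Data <- data.frame(
  country = rep(c("A", "B"), each = 3),
  year = rep(2000:2002, times = 2),
  gdp = c(10, 12, 15, 20, NA, 26),
  stringsAsFactors = FALSE
)

# FillDown: replace each NA in gdp with the previous non-missing value.
# Note: this walks down the rows and is not group-aware, so sort the
# data first and watch for fills that cross a group boundary.
Filled <- FillDown(Data, Var = "gdp")

# DropNA: alternatively, drop the rows where gdp is missing
Complete <- DropNA(Data, Var = "gdp")

# slide: add a one-period lag of gdp within each country.
# A negative slideBy creates a lag; a positive one creates a lead.
Lagged <- slide(Data, Var = "gdp", TimeVar = "year", GroupVar = "country",
                NewVar = "gdp_lag1", slideBy = -1)

print(Lagged)
```

The first observation of each country has no earlier value to lag from, so its gdp_lag1 should be NA. Supplying GroupVar is what keeps country A's last value from sliding into country B's first row.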
Updates
I will continue to add to the package as I build data sets and run across other pesky tasks I do repeatedly that would be simpler if they were completed by a single function.
Installation
DataCombine is on CRAN.
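Assuming the package is still available from your configured CRAN mirror, the standard base R installer should be enough:

```r
# Install the released version from CRAN (needs network access)
install.packages("DataCombine")
```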
You can also install the most recent stable version with install_github from the devtools package:
devtools::install_github('christophergandrud/DataCombine')

