Impute.jl
Imputation methods for missing data in Julia
Impute
Impute.jl provides various methods for handling missing data in Vectors, Matrices and Tables.
Installation
julia> using Pkg; Pkg.add("Impute")
Quickstart
Let's start by loading our dependencies:
julia> using DataFrames, Impute
We'll also want some test data containing missings to work with:
julia> df = Impute.dataset("test/table/neuro") |> DataFrame
469×6 DataFrame
Row │ V1 V2 V3 V4 V5 V6
│ Float64? Float64? Float64 Float64? Float64? Float64?
─────┼───────────────────────────────────────────────────────────────
1 │ missing -203.7 -84.1 18.5 missing missing
2 │ missing -203.0 -97.8 25.8 134.7 missing
3 │ missing -249.0 -92.1 27.8 177.1 missing
4 │ missing -231.5 -97.5 27.0 150.3 missing
5 │ missing missing -130.1 25.8 160.0 missing
6 │ missing -223.1 -70.7 62.1 197.5 missing
7 │ missing -164.8 -12.2 76.8 202.8 missing
8 │ missing -221.6 -81.9 27.5 144.5 missing
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮
463 │ -242.6 -142.0 -21.8 69.8 148.7 missing
464 │ -235.9 -128.8 -33.1 68.8 177.1 missing
465 │ missing -140.8 -38.7 58.1 186.3 missing
466 │ missing -149.5 -40.3 62.8 139.7 242.5
467 │ -247.6 -157.8 -53.3 28.3 122.9 227.6
468 │ missing -154.9 -50.8 28.1 119.9 201.1
469 │ missing -180.7 -70.9 33.7 114.8 222.5
454 rows omitted
Our first instinct might be to drop every observation (row) that contains a missing value, but this leaves us with too few rows to work with:
julia> Impute.filter(df; dims=:rows)
4×6 DataFrame
Row │ V1 V2 V3 V4 V5 V6
│ Float64 Float64 Float64 Float64 Float64 Float64
─────┼──────────────────────────────────────────────────────
1 │ -247.0 -132.2 -18.8 28.2 81.4 237.9
2 │ -234.0 -140.8 -56.5 28.0 114.3 222.9
3 │ -215.8 -114.8 -18.4 65.3 171.6 249.7
4 │ -247.6 -157.8 -53.3 28.3 122.9 227.6
We could try imputing the values with linear interpolation, but that still leaves missing data at the head and tail of our dataset:
julia> Impute.interp(df)
469×6 DataFrame
Row │ V1 V2 V3 V4 V5 V6
│ Float64? Float64? Float64 Float64? Float64? Float64?
─────┼───────────────────────────────────────────────────────────────────
1 │ missing -203.7 -84.1 18.5 missing missing
2 │ missing -203.0 -97.8 25.8 134.7 missing
3 │ missing -249.0 -92.1 27.8 177.1 missing
4 │ missing -231.5 -97.5 27.0 150.3 missing
5 │ missing -227.3 -130.1 25.8 160.0 missing
6 │ missing -223.1 -70.7 62.1 197.5 missing
7 │ missing -164.8 -12.2 76.8 202.8 missing
8 │ missing -221.6 -81.9 27.5 144.5 missing
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮
463 │ -242.6 -142.0 -21.8 69.8 148.7 224.125
464 │ -235.9 -128.8 -33.1 68.8 177.1 230.25
465 │ -239.8 -140.8 -38.7 58.1 186.3 236.375
466 │ -243.7 -149.5 -40.3 62.8 139.7 242.5
467 │ -247.6 -157.8 -53.3 28.3 122.9 227.6
468 │ missing -154.9 -50.8 28.1 119.9 201.1
469 │ missing -180.7 -70.9 33.7 114.8 222.5
454 rows omitted
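The head-and-tail behaviour is easier to see on a small vector. The following is a minimal sketch; the vector `v` is made up for illustration and is not part of the package's test data. It assumes that `Impute.interp` accepts a plain vector, as the package description (Vectors, Matrices and Tables) suggests.

```julia
using Impute

# Hypothetical example data: missing values at the head, in the middle, and at the tail.
v = [missing, 1.0, missing, 3.0, missing]

# Linear interpolation can only fill gaps that have an observed value on both sides.
interpolated = Impute.interp(v)

# The interior gap is filled; the leading and trailing missings remain.
println(interpolated)
```

Only the interior gap has two neighbours to interpolate between, so the first and last elements stay `missing`.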
Finally, we can chain multiple simple methods together to give a complete dataset:
julia> Impute.interp(df) |> Impute.locf() |> Impute.nocb()
469×6 DataFrame
Row │ V1 V2 V3 V4 V5 V6
│ Float64? Float64? Float64 Float64? Float64? Float64?
─────┼────────────────────────────────────────────────────────────
1 │ -233.6 -203.7 -84.1 18.5 134.7 222.7
2 │ -233.6 -203.0 -97.8 25.8 134.7 222.7
3 │ -233.6 -249.0 -92.1 27.8 177.1 222.7
4 │ -233.6 -231.5 -97.5 27.0 150.3 222.7
5 │ -233.6 -227.3 -130.1 25.8 160.0 222.7
6 │ -233.6 -223.1 -70.7 62.1 197.5 222.7
7 │ -233.6 -164.8 -12.2 76.8 202.8 222.7
8 │ -233.6 -221.6 -81.9 27.5 144.5 222.7
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮
463 │ -242.6 -142.0 -21.8 69.8 148.7 224.125
464 │ -235.9 -128.8 -33.1 68.8 177.1 230.25
465 │ -239.8 -140.8 -38.7 58.1 186.3 236.375
466 │ -243.7 -149.5 -40.3 62.8 139.7 242.5
467 │ -247.6 -157.8 -53.3 28.3 122.9 227.6
468 │ -247.6 -154.9 -50.8 28.1 119.9 201.1
469 │ -247.6 -180.7 -70.9 33.7 114.8 222.5
454 rows omitted
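The same chain can be tried on a small vector to see what each step contributes. This is a sketch on made-up data (`v` is hypothetical), assuming the curried forms `Impute.locf()` and `Impute.nocb()` behave on vectors as they do on the table above.

```julia
using Impute

# Hypothetical example data with missings at both ends and in the middle.
v = [missing, 1.0, missing, 3.0, missing]

# interp fills the interior gap, locf carries the last observation forward
# into the tail, and nocb carries the next observation backward into the head.
filled = Impute.interp(v) |> Impute.locf() |> Impute.nocb()

println(filled)
println(any(ismissing, filled))  # check whether any missing values remain
```

The order matters: running `locf` before `interp` would fill interior gaps with the previous value instead of an interpolated one.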
Warning:
- Your approach should depend on the properties of your data (e.g., MCAR, MAR, MNAR).
- In-place calls aren't guaranteed to mutate the original data, but they will try to avoid copying where possible.
In the future, it may be possible to detect whether in-place operations are permitted on an array or table using traits:
- https://github.com/JuliaData/Tables.jl/issues/116
- https://github.com/JuliaDiffEq/ArrayInterface.jl/issues/22
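Because mutation is not guaranteed, a defensive pattern is to always use the value returned by an in-place call rather than relying on the original binding. The sketch below assumes the in-place variant is named `Impute.interp!`, following the usual Julia `!` convention; the vector `v` is made up for illustration.

```julia
using Impute

# Hypothetical example data.
v = [1.0, missing, 3.0]

# Rebind to the returned value: this is correct whether or not
# the call was able to mutate `v` in place.
v = Impute.interp!(v)

println(v)
```

Code written this way keeps working if a particular array or table type forces a copy.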
