SubsetSelection.jl
Fast Subset Selection algorithm for Statistics/Machine Learning
Install / Use
/learn @jeanpauphilet/SubsetSelection.jlREADME
SubsetSelection
SubsetSelection is a Julia package that computes sparse L2-regularized estimators. Sparsity is enforced through explicit cardinality constraint or L0-penalty. Supported loss functions for regression are least squares, L1 and L2 SVR; for classification, logistic, L1 and L2 Hinge loss. The algorithm formulates the problem as a mixed-integer saddle-point problem and solves its boolean relaxation using a dual sub-gradient approach.
Quick start
To install the package:
julia> Pkg.install("SubsetSelection")
or the have the latest version
julia> Pkg.clone("git://github.com/jeanpauphilet/SubsetSelection.jl.git")
To fit a basic model:
julia> using SubsetSelection, StatsBase
julia> n = 100; p = 10000; k = 10;
julia> indices = sort(sample(1:p, StatsBase.Weights(ones(p)/p), k, replace=false));
julia> w = sample(-1:2:1, k);
julia> X = randn(n,p); Y = X[:,indices]*w;
julia> Sparse_Regressor = subsetSelection(OLS(), Constraint(k), Y, X)
SubsetSelection.SparseEstimator(SubsetSelection.OLS(),SubsetSelection.Constraint(10),10.0,[362,1548,2361,3263,3369,3598,5221,7314,7748,9267],[5.37997,-5.51019,-5.77256,-7.27197,-6.32432,-4.97585,5.94814,4.75648,5.48098,-5.91967],[-0.224588,-1.1446,2.81566,0.582427,-0.923311,4.1153,-2.43833,0.117831,0.0982258,-1.60631 … 0.783925,-1.1055,0.841752,-1.09645,-0.397962,3.48083,-1.33903,1.44676,4.03583,1.05817],0.0,19)
The algorithm returns a SparseEstimator object with the following fields: loss (loss function used), sparsity (model to enforce sparsity), indices (features selected), w (value of the estimator on the selected features only), α (values of the associated dual variables), b (bias term), iter (number of iterations required by the algorithm).
For instance, you can access the selected features directly in the indices field:
julia> Sparse_Regressor.indices
10-element Array{Int64,1}:
362
1548
2361
3263
3369
3598
5221
7314
7748
9267
or compute predictions
julia> Y_pred = X[:,Sparse_Regressor.indices]*Sparse_Regressor.w
100-element Array{Float64,1}:
4.62918
8.59952
-16.2796
-5.611
1.62764
-50.4562
37.407
-12.3341
-4.75339
25.122
⋮
-7.98349
11.0327
-8.58172
16.904
-9.04211
-36.5475
17.2558
-22.3915
-57.9727
-6.06553
For classification, we use +1/-1 labels and the convention
P ( Y = y | X = x ) = 1 / (1+e^{- y x^T w}).
Required and optional parameters
subsetSelection has four required parameters:
- the loss function to be minimized, to be chosen among least squares (
OLS()), L1SVR (L1SVR(ɛ)), L2SVR (L2SVR(ɛ)), Logistic loss (LogReg()), Hinge Loss (L1SVM()), L2-SVM (L2SVM()). For classification, we recommend using Hinge loss or L2-SVM functions. Indeed, the Fenchel conjugate of the Logistic loss exhibits unbounded gradients, which largely hinders convergence of the algorithm and might require smaller and more steps (see optional parameters). - the model used to enforce sparsity; either by adding a hard constraint of the form "||w||_0 < k" (
Constraint(k)) or by adding a penalty of the form "+ λ ||w||_0" (Penalty(λ)) to the objective. For tractability issues, we highly recommend using an explicit constraint instead of a penalty, for it ensures the size of the support remains bounded through the algorithm. - the vector of outputs
Yof sizen, the sample size. In classification settings,Yshould be a vector of ±1s. - the matrix of covariates
Xof sizen×p, wherenandpare the number of samples and features respectively.
In addition, subsetSelection accepts the following optional parameters:
- an initialization for the selected features,
indInit. - an initialization for the dual variables,
αInit. - the value of the ℓ2-regularization parameter
γ, set to 1/√n by default. intercept, a boolean. If true, an intercept/bias term is computed as well. By default, set to false.- the maximum number of iterations in the sub-gradient algorithm,
maxIter. - the value of the gradient stepsize
δ. By default, the stepsize is set to 1e-3, which demonstrates very good empirical performance. However, smaller stepsizes might be needed when dealing with very large datasets or when the Logistic loss is used. - the number of gradient updates of dual variable α performed per update of primal variable s,
gradUp. anticyclinga boolean. If true, the algorithm stops as soon as the support is not unchanged from one iteration to another. Empirically, the accuracy of the resulting support is strongly sensitive to noise - to use with caution. By default, set to false.averaginga boolean. If true, the dual solution is averaged over past iterates. By default, set to true.
Best practices
- Tuning the regularization parameter
γ: By default,γis set to 1/√n, which is an appropriate scaling in most regression instances. For an optimal performance, and especially in classification or noisy settings, we recommend performing a grid search and using cross-validation to assess out-of-sample performance. The grid search should start with a very low value forγ, such as
γ = 1.*p / k / n / maximum(sum(X[train,:].^2,2))
and iteratively increase it by a factor 2. Mean square error or Area Under the Curve (see ROCAnalysis for implementation) are commonly used performance metrics for regression and classification tasks respectively.
- Instances where the algorithm fails to converge have been reported. If you occur such cases, try normalize the data matrix
Xand relaunch the algorithm. If the algorithm still fails to converge, reduce the stepsizeδby a factor 10 or 100 and increase the number of iterationsmaxIterby a factor at least 2.
Reference
Related Skills
YC-Killer
2.7kA library of enterprise-grade AI agents designed to democratize artificial intelligence and provide free, open-source alternatives to overvalued Y Combinator startups. If you are excited about democratizing AI access & AI agents, please star ⭐️ this repository and use the link in the readme to join our open source AI research team.
best-practices-researcher
The most comprehensive Claude Code skills registry | Web Search: https://skills-registry-web.vercel.app
groundhog
398Groundhog's primary purpose is to teach people how Cursor and all these other coding agents work under the hood. If you understand how these coding assistants work from first principles, then you can drive these tools harder (or perhaps make your own!).
isf-agent
a repo for an agent that helps researchers apply for isf funding
