
<div align="center"> <h1> 🐾 zoofs ( Zoo Feature Selection ) </h1>

</div>

zoofs is a Python library for performing feature selection using a variety of nature-inspired wrapper algorithms. The algorithms range from swarm intelligence to physics-based to evolutionary methods. It's an easy-to-use, flexible and powerful tool for reducing the number of features in your dataset.

🌟 Like this project? Give us a star!

📘 Documentation

https://jaswinder9051998.github.io/zoofs/

🔗 What's new in V0.1.24

- Pass kwargs through the objective function
- Improved logger for results
- Added the Harris Hawk algorithm
- You can now pass timeout as a parameter to stop the operation after the given number of seconds, a convenient alternative to setting the number of iterations
- Hashing of feature scores for already-visited feature sets, which avoids re-evaluating them and improves overall performance
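The feature score hashing listed above can be illustrated with a small memoization sketch. This is not zoofs' internal code: make_cached_scorer and expensive_score are hypothetical names, and the sketch assumes that a feature set's score depends only on which columns it contains (not their order) and that the objective is deterministic.

```python
from typing import Callable, Dict, FrozenSet, Sequence


def make_cached_scorer(score_fn: Callable[[Sequence[str]], float]):
    """Wrap score_fn so that each distinct feature set is scored only once."""
    cache: Dict[FrozenSet[str], float] = {}

    def cached(features: Sequence[str]) -> float:
        key = frozenset(features)  # hashable and order-insensitive
        if key not in cache:
            cache[key] = score_fn(features)  # expensive call happens only on a miss
        return cache[key]

    cached.cache = cache  # exposed so the cache can be inspected
    return cached


calls = []  # records every real (uncached) evaluation


def expensive_score(features: Sequence[str]) -> float:
    # Hypothetical stand-in for "fit the model on these columns and score it"
    calls.append(tuple(features))
    return float(len(features))


scorer = make_cached_scorer(expensive_score)
scorer(["a", "b"])
scorer(["b", "a"])  # same set of features, served from the cache
scorer(["c"])
print(len(calls))  # 2
```

Keying on a frozenset is a design choice: it treats ["a", "b"] and ["b", "a"] as the same subset, which matches how a feature mask works.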

🛠 Installation

Using pip

Use the package manager to install zoofs.

pip install zoofs

📜 Available Algorithms

| Algorithm Name | Class Name | Description | Reference |
|----------|-------------|-------------|-------------|
| Particle Swarm Algorithm | ParticleSwarmOptimization | Utilizes swarm behaviour | https://doi.org/10.1007/978-3-319-13563-2_51 |
| Grey Wolf Algorithm | GreyWolfOptimization | Utilizes wolf hunting behaviour | https://doi.org/10.1016/j.neucom.2015.06.083 |
| Dragon Fly Algorithm | DragonFlyOptimization | Utilizes dragonfly swarm behaviour | https://doi.org/10.1016/j.knosys.2020.106131 |
| Harris Hawk Algorithm | HarrisHawkOptimization | Utilizes hawk hunting behaviour | https://link.springer.com/chapter/10.1007/978-981-32-9990-0_12 |
| Genetic Algorithm | GeneticOptimization | Utilizes genetic mutation behaviour | https://doi.org/10.1109/ICDAR.2001.953980 |
| Gravitational Algorithm | GravitationalOptimization | Utilizes Newton's gravitational behaviour | https://doi.org/10.1109/ICASSP.2011.5946916 |

More algorithms coming soon, stay tuned!


⚡️ Usage

Define your own objective function for optimization!

Classification Example

# import the metric, an algorithm and the model library
from sklearn.metrics import log_loss
from zoofs import ParticleSwarmOptimization
import lightgbm as lgb

# Define your own objective function. It must accept five parameters
# (model, X_train, y_train, X_valid, y_valid), fit the model and return the objective value.
def objective_function_topass(model, X_train, y_train, X_valid, y_valid):
    model.fit(X_train, y_train)
    P = log_loss(y_valid, model.predict_proba(X_valid))
    return P

# create an object of the algorithm (log loss should be minimized)
algo_object = ParticleSwarmOptimization(objective_function_topass, n_iteration=20,
                                        population_size=20, minimize=True)
lgb_model = lgb.LGBMClassifier()
# fit the algorithm; X_train, y_train, X_valid, y_valid are your own pandas training/validation data
algo_object.fit(lgb_model, X_train, y_train, X_valid, y_valid, verbose=True)
# plot your results
algo_object.plot_history()
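To make the objective-function contract concrete, here is a dependency-free sketch of what a wrapper method does with it: keep a subset of columns, call the five-argument objective on that subset, and let minimize decide whether lower or higher scores win. It is an illustration, not zoofs' implementation; select_columns, evaluate, best_mask, toy_objective and the dict-of-lists Frame type are hypothetical stand-ins for pandas DataFrames and a real model.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical stand-in for a pandas DataFrame: column name -> column values
Frame = Dict[str, List[float]]


def select_columns(frame: Frame, mask: Sequence[bool]) -> Frame:
    """Keep only the columns whose mask entry is True, preserving column order."""
    return {col: vals for (col, vals), keep in zip(frame.items(), mask) if keep}


def evaluate(objective_function: Callable, model, mask: Sequence[bool],
             X_train: Frame, y_train: List[float],
             X_valid: Frame, y_valid: List[float]) -> float:
    """Score one candidate subset by calling the five-argument objective on it."""
    return objective_function(model, select_columns(X_train, mask), y_train,
                              select_columns(X_valid, mask), y_valid)


def best_mask(objective_function: Callable, model, masks, data, minimize: bool = True):
    """Return (score, mask) of the best candidate; minimize picks the direction."""
    scored = [(evaluate(objective_function, model, m, *data), m) for m in masks]
    pick = min if minimize else max
    return pick(scored, key=lambda pair: pair[0])


def toy_objective(model, X_train, y_train, X_valid, y_valid):
    # "model" is unused here: each row is predicted as the sum of its selected
    # features, and the mean squared error on the validation data is returned.
    preds = [sum(row) for row in zip(*X_valid.values())]
    return sum((p - t) ** 2 for p, t in zip(preds, y_valid)) / len(y_valid)


X = {"a": [1.0, 2.0], "b": [0.5, 0.5], "noise": [9.0, 9.0]}
y = [1.5, 2.5]
masks = [(True, False, False), (True, True, False), (True, True, True)]
print(best_mask(toy_objective, None, masks, (X, y, X, y)))  # (0.0, (True, True, False))
```

Dropping the "noise" column gives a perfect fit here, so the subset of "a" and "b" wins when minimizing; with minimize=False the same code would pick the worst-scoring (largest error) subset, which is why minimize must match the metric you return.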

Regression Example

# import the metric, an algorithm and the model library
from sklearn.metrics import mean_squared_error
from zoofs import ParticleSwarmOptimization
import lightgbm as lgb

# Define your own objective function. It must accept five parameters
# (model, X_train, y_train, X_valid, y_valid), fit the model and return the objective value.
def objective_function_topass(model, X_train, y_train, X_valid, y_valid):
    model.fit(X_train, y_train)
    P = mean_squared_error(y_valid, model.predict(X_valid))
    return P

# create an object of the algorithm (mean squared error should be minimized)
algo_object = ParticleSwarmOptimization(objective_function_topass, n_iteration=20,
                                        population_size=20, minimize=True)
lgb_model = lgb.LGBMRegressor()
# fit the algorithm; X_train, y_train, X_valid, y_valid are your own pandas training/validation data
algo_object.fit(lgb_model, X_train, y_train, X_valid, y_valid, verbose=True)
# plot your results
algo_object.plot_history()

Suggestions for Usage

- Because the available algorithms are wrapper methods that refit your model for every candidate feature subset, prefer ML models that train quickly, e.g. LightGBM or CatBoost.
- Choose a sufficiently large 'population_size', as it determines how extensively the algorithm explores and exploits the search space.
- Make sure your ML model's hyperparameters are already optimized before passing it to a zoofs algorithm.
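A quick back-of-the-envelope estimate shows why fit time matters. Assuming each iteration scores every member of the population once (the exact count depends on the algorithm, and score hashing of visited feature sets can skip repeats), the number of model fits grows with population_size times n_iteration. max_objective_calls is a hypothetical helper used only for this estimate.

```python
def max_objective_calls(population_size: int, n_iteration: int) -> int:
    """Rough upper bound on model fits: one objective call per candidate per iteration."""
    return population_size * n_iteration


# The settings used in the usage examples above
print(max_objective_calls(population_size=20, n_iteration=20))  # 400
```

Even this small configuration means on the order of hundreds of fits, so a model that takes seconds to train turns a feature-selection run into minutes or hours.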


Algorithms

<details> <summary markdown="span"> Particle Swarm Algorithm </summary>


In computational science, particle swarm optimization (PSO) is a computational method that optimizes a problem by iteratively trying to improve a candidate solution with regard to a given measure of quality. It solves a problem by having a population of candidate solutions, here dubbed particles, and moving these particles around in the search space according to simple mathematical formulae over each particle's position and velocity. Each particle's movement is influenced by its local best known position, but is also guided toward the best known positions in the search space, which are updated as better positions are found by other particles. This is expected to move the swarm toward the best solutions.
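The velocity-and-position update described above can be sketched for feature selection using the common binary PSO variant, in which each particle is a 0/1 feature mask and a sigmoid turns each velocity into the probability of keeping that feature. This is a textbook sketch, not necessarily how zoofs implements it; update_particle and the velocity clamp v_max are assumptions for illustration, while w, c1 and c2 mirror the ParticleSwarmOptimization parameters of the same names.

```python
import math
import random
from typing import List, Optional, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def update_particle(position: List[int], velocity: List[float],
                    personal_best: List[int], global_best: List[int],
                    w: float = 0.9, c1: float = 2.0, c2: float = 2.0,
                    v_max: float = 6.0,
                    rng: Optional[random.Random] = None) -> Tuple[List[int], List[float]]:
    """One binary-PSO step: update velocities, then sample a new 0/1 feature mask."""
    rng = rng if rng is not None else random.Random()
    new_position, new_velocity = [], []
    for x, v, p, g in zip(position, velocity, personal_best, global_best):
        r1, r2 = rng.random(), rng.random()
        # inertia + pull toward the particle's own best + pull toward the swarm's best
        v_new = w * v + c1 * r1 * (p - x) + c2 * r2 * (g - x)
        v_new = max(-v_max, min(v_max, v_new))  # clamp so probabilities don't saturate
        new_velocity.append(v_new)
        # keep this feature with probability sigmoid(v_new)
        new_position.append(1 if rng.random() < sigmoid(v_new) else 0)
    return new_position, new_velocity


rng = random.Random(42)
mask, vel = update_particle([0, 1, 0, 1], [0.0] * 4, [1, 1, 0, 0], [1, 0, 0, 1], rng=rng)
print(mask, vel)
```

When a particle already sits on both its personal best and the global best, only the inertia term w * v remains, so w controls how long a particle keeps drifting before it settles.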

class zoofs.ParticleSwarmOptimization(objective_function,n_iteration=50,population_size=50,minimize=True,c1=2,c2=2,w=0.9)

| | |
|----------|-------------|
| Parameters | objective_function : user-defined function with the signature 'func(model,X_train,y_train,X_test,y_test)'. <br/> <dl> <dd> The function must return a value that is to be minimized/maximized. </dd> </dl> n_iteration : int, default=1000 <br/> <dl> <dd> Number of iterations the algorithm will run </dd> </dl> timeout : int, default=None <br/> <dl> <dd> Stop the operation after the given number of second(s). If this argument is set to None, the operation is executed without a time limit and n_iteration is followed </dd> </dl> population_size : int, default=50 <br/> <dl> <dd> Total size of the population </dd> </dl> minimize : bool, default=True <br/> <dl> <dd> Whether the objective value is to be minimized (True) or maximized (False) </dd> </dl> c1 : float, default=2.0 <br/> <dl> <dd> First acceleration coefficient of particle swarm </dd> </dl> c2 : float, default=2.0 <br/> <dl> <dd> Second acceleration coefficient of particle swarm </dd> </dl> w : float, default=0.9 <br/> <dl> <dd> Inertia weight parameter </dd> </dl> |
| Attributes | best_feature_list : array-like <br/> <dl> <dd> Final best set of features </dd> </dl> |
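The interaction between timeout and n_iteration can be sketched as a loop that stops at whichever limit is reached first. Treating the two as competing limits is an assumption about zoofs' behaviour, and run_iterations and step are hypothetical names; the sketch only shows the stopping logic, not an optimizer.

```python
import time
from typing import Callable, Optional


def run_iterations(step: Callable[[int], None], n_iteration: int,
                   timeout: Optional[float] = None) -> int:
    """Run step(i) up to n_iteration times, stopping early once timeout seconds pass.

    With timeout=None only n_iteration limits the run. Returns the number of
    iterations that were actually completed.
    """
    start = time.monotonic()  # monotonic clock is unaffected by system clock changes
    completed = 0
    for i in range(n_iteration):
        if timeout is not None and time.monotonic() - start >= timeout:
            break  # time budget exhausted
        step(i)
        completed += 1
    return completed


print(run_iterations(lambda i: None, n_iteration=5))  # 5
print(run_iterations(lambda i: time.sleep(0.05), n_iteration=1000, timeout=0.2))
```

The check happens before each iteration, so an iteration that has already started always finishes; a run can therefore overshoot the timeout by up to one iteration's duration.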

Methods

| Method | Description |
|----------|-------------|
| fit | Run the algorithm |
| plot_history | Plot results achieved across iterations |

fit(model, X_train, y_train, X_valid, y_valid, verbose=True)

| | |
|----------|-------------|
| Parameters | model : <br/> <dl> <dd> machine learning model's object </dd> </dl> X_train : pandas.core.frame.DataFrame of shape (n_samples, n_features) <br/><dl> <dd> Training input samples to be used for machine learning model </dd> </dl> y_train : pandas.core.frame.DataFrame or pandas.core.series.Series of shape (n_samples) <br/> <dl> <dd> The target values (class labels in classification, real numbers in regression). </dd> </dl> X_valid : pandas.core.frame.DataFrame of shape (n_samples, n_features) <br/> <dl> <dd> Validation input samples </dd> </dl> y_valid : pandas.core.frame.DataFrame or pandas.core.series.Series of shape (n_samples) <br/> <dl> <dd> The Valid
