18 skills found
sherpa-ai / Sherpa: Hyperparameter optimization that enables researchers to experiment, visualize, and scale quickly.
chncwang / Laozhongyi: Laozhongyi is an automatic hyperparameter tuning program based on grid search and simulated annealing.
storopoli / Topic Modelling: Handy Jupyter Notebooks that I use for Topic Modeling, including text mining from PDF files, text preprocessing, Latent Dirichlet Allocation (LDA), hyperparameter grid search, and Topic Modeling visualization.
paudelprabesh / Hyperparameter Tuning In LSTM Network: Hyperparameter Tuning in LSTM using Genetic Algorithm, Bayesian Optimization, Random Search, and Grid Search.
manajalali / Voltage Regulation Using SVM: This code implements the following paper: M. Jalali, V. Kekatos, N. Gatsis and D. Deka, "Designing Reactive Power Control Rules for Smart Inverters Using Support Vector Machines," in IEEE Transactions on Smart Grid, vol. 11, no. 2, pp. 1759-1770, March 2020, doi: 10.1109/TSG.2019.2942850. The main code 1) loads the data (the data is not included here, so the code should be modified accordingly); 2) finds the hyperparameters using cross-validation (the main file performs the optimization with the l2 loss function, while the functions for l1 optimization are included with the suffix "2"); 3) solves the voltage regulation optimization problem using the Mosek solver; and 4) solves the voltage regulation problem using the optimal power flow problem and the local control rules as well. The main function includes the following functions: 1) Preprocessing: scaling, oversizing, centering, and normalizing the data. 2) KFCrossvalid_SVM: finds the hyperparameters using cross-validation. 3) mosek_crossValid (mosek2_crossValid): located inside KFCrossvalid_SVM; solves the optimization problem. 4) SVM_gauss_mosek and SVM_lin_mosek: solve the actual optimization problems for finding the parameters a and b of the reactive power control rules. 5) localControl: finds the reactive power local control rules. 6) eval_SVM_gauss, eval_SVM_lin: evaluate the reactive power control rules given the measurements and obtained parameters. 7) optimalGlobal (SOCP): solves the central optimal power flow problem.
MOUHASSINE-badreddine / Predicting Rental Price Morocco: Creation of a regression model from a dataset extracted (web scraping with BeautifulSoup) from a Moroccan announcements website (mubawab.ma), trying linear algorithms (Ordinary Least Squares and Lasso), tree-based algorithms, and ensemble algorithms (Random Forest Regressor and Gradient Boosting Regressor), and using grid search to optimize the Gradient Boosting Regressor hyperparameters.
Yifeng-He / Human Activity Recognition From Accelerometer Data Using Ensemble Learning: This project aims to classify human activities using ensemble learning. In this project, we compared the recognition accuracy of different classifiers, visualized the data using the seaborn library and t-SNE, and tuned the hyperparameters using grid search and k-fold cross-validation.
danny-1k / Torch Gs: PyTorch wrapper for grid search of hyperparameters
matthieuheitz / Sweetsweep: Run parameter sweeps and visualize results effortlessly
shashwat23 / Titanic Survival Prediction: Titanic-Machine-Learning-from-Disaster. This repository contains a machine learning project for predicting the survival of passengers who travelled on the Titanic in 1912. Problem Description: This project highlights my approach to the introductory machine learning competition on the Kaggle website, Titanic: Machine Learning from Disaster [1]. The sinking of the RMS Titanic is one of the most infamous shipwrecks in history. On April 15, 1912, during her maiden voyage, the Titanic sank after colliding with an iceberg, killing 1502 out of 2224 passengers and crew. This sensational tragedy shocked the international community and led to better safety regulations for ships. One of the reasons that the shipwreck led to such loss of life was that there were not enough lifeboats for the passengers and crew. Although there was some element of luck involved in surviving the sinking, some groups of people were more likely to survive than others, such as women, children, and the upper class. This project analyses which people were likely to survive. In particular, machine learning tools have been used to predict which passengers survived the tragedy. Project Description: This project was written in Python v3.4. It uses various data processing, visualisation, and machine learning packages such as numpy, pandas, matplotlib, and scikit-learn, which should be installed if the code is run on a local machine. The project uses a 5-step process (general procedure) for its prediction task, which is as follows [2]: Perform a statistical analysis of the data and look over its characteristics, such as the data type of each column, the number of instances, the correlation of each attribute with the output variable, the mean and other summary information, the correlation matrix, etc. After performing the statistical analysis, do a visual analysis by plotting the data: analyse the scatter_matrix, plot box plots, etc., so as to know which attributes are relevant and which are not. Remove irrelevant attributes from the dataset for further analysis. Make a list of all machine learning algorithms that can give good prediction results and spot-check each one of them (apply each one to the dataset) to find which is better for prediction. Use k-fold cross-validation to calculate performance characteristics of each of the learners (accuracy, precision, recall, area under the ROC curve, etc.). Take some of the well-performing algorithms and perform a grid search or randomised search over their hyperparameters to find the optimal hyperparameters for the prediction task. Ensure that the optimal hyperparameters do not overfit the data by also performing k-fold cross-validation on learners using these tuned hyperparameters. Use an ensemble or Voting Classifier on the selected algorithms to achieve better performance, or use any one of the above algorithms directly to perform predictions. Keep iterating over the above steps and tune them as needed to achieve better performance. File Description: titanic_predictor - contains Python code for predicting survival. my_solution.csv - contains a sample output file generated by the algorithm. train.csv - contains training data. test.csv - contains testing data for making predictions. readme.md - a guide to this project.
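The tuning step described in the entry above (grid search over hyperparameters, scored by k-fold cross-validation) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the repository's code: the toy 1-D data, the tiny nearest-neighbour classifier, and the names `k_fold_indices`, `knn_predict`, `cv_accuracy`, and `grid_search` are all hypothetical stand-ins for the scikit-learn tooling the project mentions.

```python
from itertools import product
from statistics import mean


def k_fold_indices(n_samples, n_folds):
    """Yield (train_idx, val_idx) pairs for a plain, unshuffled k-fold split."""
    folds = [list(range(i, n_samples, n_folds)) for i in range(n_folds)]
    for held_out, val_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != held_out for j in fold]
        yield train_idx, val_idx


def knn_predict(train_X, train_y, x, k, weighted):
    """Classify a 1-D point by an (optionally distance-weighted) vote of its k nearest neighbours."""
    neighbours = sorted(zip(train_X, train_y), key=lambda p: abs(p[0] - x))[:k]
    votes = {}
    for xi, yi in neighbours:
        weight = 1.0 / (abs(xi - x) + 1e-9) if weighted else 1.0
        votes[yi] = votes.get(yi, 0.0) + weight
    return max(votes, key=votes.get)


def cv_accuracy(X, y, params, n_folds):
    """Mean validation accuracy of the toy classifier over all folds."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(X), n_folds):
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        correct = sum(knn_predict(train_X, train_y, X[i], **params) == y[i] for i in val_idx)
        scores.append(correct / len(val_idx))
    return mean(scores)


def grid_search(X, y, param_grid, n_folds=5):
    """Try every combination in param_grid and keep the one with the best CV accuracy."""
    names = list(param_grid)
    best_params, best_score = None, -1.0
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = cv_accuracy(X, y, params, n_folds)
        if score > best_score:  # ties keep the earlier combination
            best_params, best_score = params, score
    return best_params, best_score


# Toy, well-separated data (hypothetical): class 0 below 1.5, class 1 above.
X = [0.1, 0.4, 0.5, 0.9, 1.1, 1.3, 2.0, 2.2, 2.5, 2.9, 3.1, 3.3]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
best_params, best_score = grid_search(X, y, {"k": [1, 3, 5], "weighted": [False, True]}, n_folds=3)
print(best_params, best_score)
```

Re-running `cv_accuracy` with the winning parameters on a different fold layout is one simple way to check that the tuned values are not overfitted to a single split.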
klimanyusuf / Combating Twitter Hate Speech Using ML And NLP: Using NLP and ML, make a model to identify hate speech (racist or sexist tweets) on Twitter. Problem Statement: Twitter is the biggest platform where anybody and everybody can have their views heard. Some of these voices spread hate and negativity. Twitter is wary of its platform being used as a medium to spread hate. You are a data scientist at Twitter, and you will help Twitter identify the tweets with hate speech and remove them from the platform. You will use NLP techniques, perform cleanup specific to tweet data, and make a robust model. Domain: Social Media. Analysis to be done: Clean up tweets and build a classification model by using NLP techniques, cleanup specific to tweet data, regularization, and hyperparameter tuning using stratified k-fold cross-validation to get the best model. Content: id: identifier number of the tweet; Label: 0 (non-hate) / 1 (hate); Tweet: the text in the tweet. Tasks: Load the tweets file using the read_csv function from the Pandas package. Get the tweets into a list for easy text cleanup and manipulation. To clean up: Normalize the casing. Using regular expressions, remove user handles. These begin with '@'. Using regular expressions, remove URLs. Using TweetTokenizer from NLTK, tokenize the tweets into individual terms. Remove stop words. Remove redundant terms like 'amp', 'rt', etc. Remove '#' symbols from the tweet while retaining the term. Extra cleanup by removing terms with a length of 1. Check out the top terms in the tweets: First, get all the tokenized terms into one large list. Use the counter and find the 10 most common terms. Data formatting for predictive modeling: Join the tokens back to form strings. This will be required for the vectorizers. Assign x and y. Perform train_test_split using sklearn. We'll use TF-IDF values for the terms as features to get into a vector space model. Import the TF-IDF vectorizer from sklearn. Instantiate it with a maximum of 5000 terms in your vocabulary. Fit and apply on the train set. Apply on the test set. Model building: Ordinary Logistic Regression. Instantiate Logistic Regression from sklearn with default parameters. Fit on the train data. Make predictions for the train and the test set. Model evaluation: Accuracy, recall, and F1 score. Report the accuracy on the train set. Report the recall on the train set: decent, high, or low. Get the F1 score on the train set. Looks like you need to adjust for the class imbalance, as the model seems to focus on the 0s. Adjust the appropriate class in the LogisticRegression model. Train again with the adjustment and evaluate. Train the model on the train set. Evaluate the predictions on the train set: accuracy, recall, and F1 score. Regularization and hyperparameter tuning: Import GridSearch and StratifiedKFold because of class imbalance. Provide the parameter grid to choose for the 'C' and 'penalty' parameters. Use a balanced class weight while instantiating the logistic regression. Find the parameters with the best recall in cross-validation. Choose 'recall' as the metric for scoring. Choose a stratified 4-fold cross-validation scheme. Fit on the train set. What are the best parameters? Predict and evaluate using the best estimator. Use the best estimator from the grid search to make predictions on the test set. What is the recall on the test set for the toxic comments? What is the F1 score?
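The cleanup tasks listed in the entry above (lowercasing, stripping handles and URLs, tokenizing, dropping stop words, redundant terms, '#' symbols and one-character terms, then counting the top terms) might look roughly like this. It is a sketch under assumptions: a simple regular-expression tokenizer stands in for NLTK's TweetTokenizer so the example needs only the standard library, and the stop-word list, sample tweets, and the name `clean_tweet` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real run would use a fuller one.
STOP_WORDS = {"the", "a", "an", "is", "to", "and", "of", "in", "for", "you", "are"}
REDUNDANT_TERMS = {"amp", "rt"}

HANDLE_RE = re.compile(r"@\w+")                    # user handles begin with '@'
URL_RE = re.compile(r"https?://\S+|www\.\S+")      # bare and http(s) URLs
TOKEN_RE = re.compile(r"#?\w+")                    # stand-in for TweetTokenizer


def clean_tweet(text):
    """Return the cleaned list of terms for one tweet."""
    text = text.lower()                            # normalize the casing
    text = HANDLE_RE.sub(" ", text)                # remove user handles
    text = URL_RE.sub(" ", text)                   # remove URLs
    tokens = TOKEN_RE.findall(text)                # tokenize into terms
    tokens = [t.lstrip("#") for t in tokens]       # drop '#', keep the term
    return [
        t for t in tokens
        if t not in STOP_WORDS and t not in REDUNDANT_TERMS and len(t) > 1
    ]


# Hypothetical sample tweets, not taken from the dataset.
tweets = [
    "RT @user1: Love is the answer &amp; hate is not #love http://example.com/x",
    "@user2 you are a star #love",
]
cleaned = [clean_tweet(t) for t in tweets]

# Flatten all terms into one list and find the most common ones.
top_terms = Counter(term for terms in cleaned for term in terms).most_common(10)
print(cleaned)
print(top_terms)
```

Joining each cleaned list back with `" ".join(terms)` gives the strings that a vectorizer would consume in the later modeling steps.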
junmoan / Mnist Hyperparameter Optimization: Hyperparameter Optimization using Grid Search, Randomized Search, and Bayesian Optimization
saadhaxxan / Optimizing Hyperparameters Using Grid Search: Optimizing-Hyperparameters-Using-Grid-Search-Deep-Learning
willjobs / Hyperparameter Search: Comparison of Bayesian hyperparameter optimization with grid search and random search, on neural networks, decision trees, random forests, and KNN
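As a rough illustration of how grid search and random search differ (Bayesian optimization is left out here), the sketch below runs both on a made-up objective. Everything in it is an assumption for illustration: the `objective` function is a hypothetical stand-in for a validation loss, and the parameter names `lr` and `depth` do not come from the repository.

```python
import random
from itertools import product


def objective(lr, depth):
    """Hypothetical validation loss with its minimum at lr=0.1, depth=6."""
    return (lr - 0.1) ** 2 + 0.01 * (depth - 6) ** 2


def grid_search(lrs, depths):
    """Evaluate every (lr, depth) pair on a fixed grid and return the best one."""
    return min(product(lrs, depths), key=lambda c: objective(*c))


def random_search(n_trials, seed=0):
    """Sample lr log-uniformly in [1e-3, 1] and depth uniformly in [2, 10]."""
    rng = random.Random(seed)
    candidates = [
        (10 ** rng.uniform(-3, 0), rng.randint(2, 10)) for _ in range(n_trials)
    ]
    return min(candidates, key=lambda c: objective(*c))


grid_best = grid_search([0.001, 0.01, 0.1, 1.0], [2, 4, 6, 8, 10])  # 20 evaluations
random_best = random_search(n_trials=20)                             # 20 evaluations
print(grid_best, random_best)
```

Both strategies spend the same budget of 20 evaluations; the grid only ever tries four distinct learning rates, while random search tries a different one on every trial.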
trcook / Rbc Model: A reinforcement learning approach to a basic RBC model
ohmthanap / Telecom Customer Churn Predictions: Developed a churn prediction classification model using various techniques, including EDA, Decision Trees, Naive Bayes, AdaBoost, MLP, Bagging, RF, KNN, logistic regression, SVM, and hyperparameter tuning using Grid Search CV and Randomized Search CV.
kamruleee51 / Diabetes Classification Dataset: In this article, we proposed a new labeled diabetes dataset from a South Asian country (Bangladesh). Additionally, we recommended an automated classification pipeline, introducing a weighted ensemble of several Machine Learning (ML) classifiers: Naive Bayes (NB), Random Forest (RF), Decision Tree (DT), XGBoost (XGB), and LightGBM (LGB). The critical hyperparameters of these ML models are tuned using a grid search hyperparameter optimization approach. Missing value imputation, feature selection, and K-fold cross-validation were also incorporated into the designed framework.
klimanyusuf / Twitter Hate: Description: Using NLP and ML, make a model to identify hate speech (racist or sexist tweets) on Twitter. Problem Statement: Twitter is the biggest platform where anybody and everybody can have their views heard. Some of these voices spread hate and negativity. Twitter is wary of its platform being used as a medium to spread hate. You are a data scientist at Twitter, and you will help Twitter identify the tweets with hate speech and remove them from the platform. You will use NLP techniques, perform cleanup specific to tweet data, and make a robust model. Domain: Social Media. Analysis to be done: Clean up tweets and build a classification model by using NLP techniques, cleanup specific to tweet data, regularization, and hyperparameter tuning using stratified k-fold cross-validation to get the best model. Content: id: identifier number of the tweet; Label: 0 (non-hate) / 1 (hate); Tweet: the text in the tweet. Tasks: Load the tweets file using the read_csv function from the Pandas package. Get the tweets into a list for easy text cleanup and manipulation. To clean up: Normalize the casing. Using regular expressions, remove user handles. These begin with '@'. Using regular expressions, remove URLs. Using TweetTokenizer from NLTK, tokenize the tweets into individual terms. Remove stop words. Remove redundant terms like 'amp', 'rt', etc. Remove '#' symbols from the tweet while retaining the term. Extra cleanup by removing terms with a length of 1. Check out the top terms in the tweets: First, get all the tokenized terms into one large list. Use the counter and find the 10 most common terms. Data formatting for predictive modeling: Join the tokens back to form strings. This will be required for the vectorizers. Assign x and y. Perform train_test_split using sklearn. We'll use TF-IDF values for the terms as features to get into a vector space model. Import the TF-IDF vectorizer from sklearn. Instantiate it with a maximum of 5000 terms in your vocabulary. Fit and apply on the train set. Apply on the test set. Model building: Ordinary Logistic Regression. Instantiate Logistic Regression from sklearn with default parameters. Fit on the train data. Make predictions for the train and the test set. Model evaluation: Accuracy, recall, and F1 score. Report the accuracy on the train set. Report the recall on the train set: decent, high, or low. Get the F1 score on the train set. Looks like you need to adjust for the class imbalance, as the model seems to focus on the 0s. Adjust the appropriate class in the LogisticRegression model. Train again with the adjustment and evaluate. Train the model on the train set. Evaluate the predictions on the train set: accuracy, recall, and F1 score. Regularization and hyperparameter tuning: Import GridSearch and StratifiedKFold because of class imbalance. Provide the parameter grid to choose for the 'C' and 'penalty' parameters. Use a balanced class weight while instantiating the logistic regression. Find the parameters with the best recall in cross-validation. Choose 'recall' as the metric for scoring. Choose a stratified 4-fold cross-validation scheme. Fit on the train set. What are the best parameters? Predict and evaluate using the best estimator. Use the best estimator from the grid search to make predictions on the test set. What is the recall on the test set for the toxic comments?
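Two ideas in the entry above carry most of the evaluation logic: stratified folds, which keep the hate/non-hate ratio the same in every fold, and recall and F1 as the scoring metrics for an imbalanced label. The sketch below shows both in plain Python. It is illustrative only: the labels and predictions are made up, and `stratified_folds` and `recall_precision_f1` are hypothetical helpers, not scikit-learn's StratifiedKFold or its metric functions.

```python
from collections import defaultdict


def stratified_folds(labels, n_folds):
    """Assign sample indices to folds so each fold keeps roughly the same class ratio."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        # Deal each class's samples out round-robin across the folds.
        for position, idx in enumerate(indices):
            folds[position % n_folds].append(idx)
    return folds


def recall_precision_f1(y_true, y_pred, positive=1):
    """Recall, precision, and F1 for the positive (hate) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return recall, precision, f1


# Imbalanced toy labels (hypothetical): 12 non-hate tweets, 4 hate tweets.
labels = [0] * 12 + [1] * 4
folds = stratified_folds(labels, n_folds=4)   # each fold: 3 zeros and 1 one

# Made-up predictions: one hate tweet missed, one non-hate tweet flagged.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
recall, precision, f1 = recall_precision_f1(y_true, y_pred)
print(folds)
print(recall, precision, f1)
```

Recall is the natural scoring choice here because a missed hate tweet (a false negative) is the costly error, while accuracy alone can look high simply by predicting the majority class.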