11 skills found
theusual / Kaggle Yelp Business Rating Prediction
Code for the Kaggle contest "RecSys2013: Yelp Business Rating Prediction" -- predicting business ratings on Yelp.com using machine learning. My final submission scored an RMSE of 1.2307, which earned rank #7 out of ~250 teams. This is a collection of code meant to be run from a Python console; it is not a stand-alone program.
AshwiniDPrabhu / Stock Prediction Using Twitter Sentiment Analysis
No description available
tanishq-ctrl / House Price Prediction And Visualization
This repository contains code and data for analyzing real estate trends, predicting house prices, estimating time on the market, and building an interactive dashboard for visualization. It is structured to cater to data scientists, real estate analysts, and developers looking to understand property market dynamics.
KrishArul26 / Air Quality Index Prediction With Deployment
India is one of the countries with the highest levels of air pollution. Air pollution is generally assessed by particulate matter (PM) or air quality index values. For this analysis, I selected the PM-2.5 value to predict air quality for the Bangalore, India region. The data was collected through web scraping with the help of Beautiful Soup.
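The Beautiful Soup scraping step might look something like the sketch below; the HTML snippet, table id and column names are illustrative stand-ins, not the repository's actual source page:

```python
from bs4 import BeautifulSoup

# Hypothetical HTML snippet standing in for a scraped air-quality page.
html = """
<table id="aqi">
  <tr><th>Date</th><th>PM2.5</th></tr>
  <tr><td>2021-01-01</td><td>182.4</td></tr>
  <tr><td>2021-01-02</td><td>141.0</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
rows = soup.find("table", id="aqi").find_all("tr")[1:]  # skip the header row
readings = {
    cells[0].get_text(): float(cells[1].get_text())
    for cells in (row.find_all("td") for row in rows)
}
print(readings)
```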
ashinde8 / Data Preprocessing And Machine Learning
- The dataset consists of 1042 rows and 20 columns. This is a regression problem where the target variable is 'price', which I predicted using machine learning models.
- Dropped the columns 'id', 'time_created', 'time_updated', 'external_id', 'url', 'latitude' and 'longitude', as these variables do not provide information significant to modeling.
- The variable 'status' has only one value throughout the dataset ('active'), so it was dropped as well, since it provides no significant information.
- The variables 'bedrooms', 'bathrooms', 'garages', 'parkings', 'offering', 'erf_size' and 'floor_size' have missing values, as does the target variable 'price'. I handled this by filling the missing values of both the independent features and the target.
- Two rows have the value '[None]' in the 'property_type' column. I filled these with 'house', because their 'agency' is 'rawson', the mode of 'property_type' for the agency 'rawson' is 'house', and the mode of 'property_type' for the area 'Constantia' is also 'house'.
- Predicted the missing values using imputers from sklearn.impute.
- Used the KNNImputer to fill the missing values in 'price', 'garages', 'parkings', 'erf_size' and 'floor_size'.
- Swept the KNNImputer parameter 'n_neighbors' over the range 1 to 20 to find the value that maximizes the correlation between the target 'price' and the feature 'floor_size'.
- I chose 'floor_size' for this check because, before imputation, it had the highest correlation with 'price' (about 0.5320). For each value of 'n_neighbors', the post-imputation correlation is compared against this baseline.
- The best KNNImputer result was a correlation of about 0.4234, at 'n_neighbors' = 6. Since this is lower than the original correlation, which is undesirable, KNNImputer was rejected and another imputer was tried.
- With the IterativeImputer, the correlation between 'price' and 'floor_size' rose to about 0.6704, higher than in the original dataset, so the IterativeImputer values were accepted.
- The variables 'bathrooms' and 'bedrooms' have 4 and 14 NaN values respectively, so I filled them case by case based on 'property_type': rows with 'property_type' 'house' were filled with the mode of 'bathrooms' and 'bedrooms' for houses, and likewise for 'apartment'.
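The imputer comparison described above can be sketched as follows. The data here is synthetic, so the correlation values will differ from the write-up's, but the 'n_neighbors' sweep and the KNNImputer-versus-IterativeImputer decision follow the same logic:

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import KNNImputer, IterativeImputer

rng = np.random.default_rng(0)

# Synthetic stand-in for the listing data: 'price' correlates with 'floor_size'.
n = 300
floor_size = rng.uniform(50, 400, n)
df = pd.DataFrame({
    "floor_size": floor_size,
    "erf_size": floor_size * 2 + rng.normal(0, 60, n),
    "price": floor_size * 1000 + rng.normal(0, 40000, n),
})
df.loc[rng.choice(n, 30, replace=False), "price"] = np.nan  # knock out targets

def corr_after(imputer):
    """Impute, then measure the price/floor_size correlation."""
    filled = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)
    return filled["price"].corr(filled["floor_size"])

# Sweep n_neighbors 1..20 and keep the value that best preserves
# the correlation, as described above.
best_k, best_corr = max(
    ((k, corr_after(KNNImputer(n_neighbors=k))) for k in range(1, 21)),
    key=lambda kc: kc[1],
)
iter_corr = corr_after(IterativeImputer(random_state=0))
print(best_k, round(best_corr, 4), round(iter_corr, 4))
```

Whichever imputer yields the higher post-imputation correlation would be applied to the real dataset, mirroring the choice of IterativeImputer above.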
- Performed data visualizations on the features to draw more insights.
- The target variable 'price' shows outliers. Outliers in the target are not a major concern, but outliers in the predictors (there are none here) would affect the model's performance; detecting them and choosing an appropriate scaling method to minimize their effect would ultimately improve performance.
- From the correlation matrix, the independent variables are correlated with the target to varying extents. A low correlation means a weak linear relationship, but a strong non-linear relationship may still exist, so no judgement is passed at this stage; the algorithms can work that out.
- Built the regression models Linear Regression, XGBoost, AdaBoost, Decision Tree, Random Forest, KNN and SVM.
- Performed hyperparameter tuning for all of the above algorithms.
- Predicted the prices with each model and evaluated them with the metrics RMSE, R² and Adjusted R².
- As expected, the Adjusted R² score is slightly lower than the R² score for each model. By this metric, the best-fit model is XGBoost, with the highest Adjusted R², and the worst is the SVM regressor, with the lowest.
- However, this metric is only a relative measure of fit, so the RMSE values must be checked as well. Here XGBoost and SVM have the lowest and highest RMSE respectively, and the remaining models fall in exactly the same order as their Adjusted R² scores.
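The Adjusted R² used for the model comparison above is a simple correction of R² for the number of predictors; a minimal computation, with illustrative numbers rather than the repository's actual scores:

```python
def adjusted_r2(r2: float, n_samples: int, n_features: int) -> float:
    """Adjusted R^2 penalizes extra predictors: it can only match or
    fall below the plain R^2 for the same model and data."""
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_features - 1)

# Illustrative values: 1042 rows as in the dataset, 13 remaining features,
# and a hypothetical R^2 of 0.85.
print(adjusted_r2(0.85, n_samples=1042, n_features=13))  # slightly below 0.85
```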
sharmaroshan / Predicting Money Spent At Resort
From the Analytics Vidhya hackathons, sponsored by Club Mahindra. It is a regression problem where accuracy matters most, measured by RMSE score. Techniques such as stacking, ensembling and boosting were used, along with Box-Cox transformations to reduce the skewness of the data.
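The Box-Cox step mentioned above maps a skewed, strictly positive variable toward normality; a small sketch on synthetic right-skewed data (not the hackathon dataset):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
spend = rng.lognormal(mean=8, sigma=1, size=5000)  # right-skewed, positive

# Box-Cox fits the power-transform parameter lambda by maximum likelihood;
# it requires strictly positive input.
transformed, lmbda = stats.boxcox(spend)

# Skewness drops from strongly positive to near zero.
print(round(stats.skew(spend), 2), round(stats.skew(transformed), 2))
```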
hulaba / Crop Yield Prediction Comparison Using ML DL Techniques
In this project, we compare and predict the yield of five crops (wheat, barley, jowar, rapeseed & mustard, and bajra) in Rajasthan (district-wise) using three machine learning techniques: random forest, lasso regression and SVM, and two deep learning techniques: gradient descent and RNN LSTM. To apply the models to our data, we divided it into training and testing datasets. Each model is tested twice: once with only "area" and "production" as features, and again with additional factors (rainfall and soil type) included. To find the model that most accurately predicts the yield, the R² score, Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE) are calculated for each model.
ajayarunachalam / RegressorMetricGraphPlot
Python package to simplify plotting of common evaluation metrics for regression models. Metrics included are the Pearson correlation coefficient (r), coefficient of determination (r-squared), mean squared error (mse), root mean squared error (rmse), root mean squared relative error (rmsre), mean absolute error (mae), mean absolute percentage error (mape), etc.
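The metrics this package plots can also be computed directly; a pure-NumPy sketch (the helper name and example values are illustrative, not the package's API):

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Compute common regression evaluation metrics from two 1-D arrays."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    err = y_true - y_pred
    mse = np.mean(err ** 2)
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {
        "r": np.corrcoef(y_true, y_pred)[0, 1],       # Pearson correlation
        "r2": 1 - ss_res / ss_tot,                    # coefficient of determination
        "mse": mse,
        "rmse": np.sqrt(mse),
        "mae": np.mean(np.abs(err)),
        "mape": np.mean(np.abs(err / y_true)) * 100,  # assumes no zeros in y_true
    }

m = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.5, 5.0, 7.5, 8.0])
print({k: round(v, 4) for k, v in m.items()})
```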
newsteps8 / Air Pollution Forecast Based On DL And ML Models
This repository was created to forecast air pollution in Beijing over the coming hours.
mahendra047 / Stock Price Prediction Using Recurrent Neural Network LSTM
In this project, I built an LSTM-RNN model to predict stock prices using Keras with a TensorFlow backend. The training data comes from historical closing prices of various stock indices and a news sentiment score. The accuracy of the stock price prediction is measured by Root Mean Square Error (RMSE). We experimented with the network's hyperparameters, such as the LSTM cell hidden state size, the truncated backpropagation length and the depth of the network. Last but not least, we built a website with Flask that uses this prediction model as its engine.
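Before training a model like this, closing prices are typically sliced into fixed-length windows (related to the truncated backpropagation length mentioned above); a pure-NumPy sketch of that supervised framing, on a synthetic price series:

```python
import numpy as np

def make_windows(series, window):
    """Slice a 1-D price series into (input window, next-step target) pairs,
    the usual supervised framing for LSTM price prediction."""
    series = np.asarray(series, dtype=float)
    X = np.stack([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    return X[..., np.newaxis], y  # Keras LSTMs expect (samples, timesteps, features)

closes = np.linspace(100, 120, 50)  # synthetic closing prices
X, y = make_windows(closes, window=10)
print(X.shape, y.shape)  # (40, 10, 1) (40,)
```

Each row of `X` holds ten consecutive closes and the matching entry of `y` is the next close, so the model learns one-step-ahead prediction.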
dataclergy / Forecasting Hourly Energy Consumption With Python
The energy sector is one of the largest and most important sectors. The ability to efficiently forecast hourly energy consumption plays an important role in how energy is distributed and consumed, and deep learning algorithms have played vital roles in such prediction and forecasting problems. In this example, Recurrent Neural Networks (RNN) and Long Short-Term Memory (LSTM) networks are applied to a time series dataset of hourly energy consumption for different counties, broken down by clients and activities, with the aim of forecasting future energy consumption. Models generally performed better with a smaller batch size and more epochs. Evaluated with RMSE, MAE and R² scores, both the LSTM and RNN models showed excellent performance in forecasting hourly energy consumption.