Results for "elbow-method"

19 skills found

arvkevi / Kneed

805

Knee point detection in Python

universal

data-analysis, data-science, elbow-method, +4

Updated 12d ago

smazzanti / Are You Still Using Elbow Method

150

No description available

universal

Updated 1mo ago

jtemporal / Kmeans E Cotovelo

No description available

universal

clustering, clustering-validation, datascience, +3

Updated 3y ago

hl-public / SOMPY Robust Clustering

Modification of SOMPY repo with robust K-means clustering (bootstrapped SSE elbow method)

universal

Updated 6mo ago

stefmolin / Ml Utils

Machine learning utility functions and classes.

universal

adjusted-r-squared, confusion-matrix, elbow-method, +11

Updated 2y ago

kennedyCzar / NLP PROJECT BOOK INSIGHTS WITH PLOTLY

Plotly-Dash NLP project. Document similarity measured using Latent Dirichlet Allocation and principal component analysis, followed by KMeans clustering. The project is completed with dynamic visual interaction.

universal

callbacks, clustering-algorithm, corpus-processing, +17

Updated 2mo ago

sharmaroshan / Clustering Of Mall Customers

Clustering analysis performed on the customers of a mall, based on common attributes such as salary, buying habits, age, and purchasing power, using machine learning algorithms.

universal

agglomerative-clustering, beginner, clustering, +6

Updated 1y ago

Amey-Thakur / TSF UNSUPERVISED MACHINE LEARNING

Task: From the given ‘Iris’ dataset, predict the optimum number of clusters and represent it visually.

universal

amey, ameythakur, data-science, +17

Updated 1mo ago

AbbasPak / Pattern Recognition By Using Principal Component Analysis PCA

Analysing practical examples using principal component analysis (PCA) and clustering

universal

elbow-method, hierarchical-clustering, pca, +1

Updated 5mo ago

kennedyCzar / HIGH DIMENSIONAL DATA CLUSTERING

Implementation of hierarchical clustering on a small-sample dataset with very high dimension, together with visualization of the results, implemented in R and Python

universal

clustering, elbow-method, hierachical, +3

Updated 2y ago

sanketmaneDS / Principal Component Analysis

This repository contains introductory notebooks for principal component analysis.

universal

elbow-method, principal-component-analysis, silhouette-score

Updated 1y ago

uah-crablab / Imu Constraint Matlab El Wr

Code pertaining to the manuscript 'Drift-free joint angle calculation using inertial measurement units without magnetometers: an exploration of sensor fusion methods for the elbow and wrist'

universal

Updated 1y ago

aiok03 / Final Mining

Descriptive statistics and exploratory data analysis

To get an idea of the received data, we looked through our two tables, transactions and train. The train table has 6000 rows and 2 columns (client_id and target, which is gender). We also checked the info of transactions and found no empty values: every column count is equal to 130039. After that we merged the two tables and called the result data. Using the 'unique' function we found 173 unique codes and 61 unique types. Using the 'describe' function we can see the minimum and maximum of code, type and sum.

The first hypothesis was to find which gender makes the most requests. For convenience we used a for loop to express the values as percentages. According to the bar plot, the largest number of requests is made by females.

The second hypothesis was to find the code with the largest sum. For that we grouped by code and computed the mean of all sums. We converted the resulting series to a data frame for further work. The problem was that the code was interpreted as the index, so we had to fix it with the 'reset_index' function. After that we plotted the graph and saw that the highest sum belongs to code 4722, which we confirmed with additional code under the graph.

The third hypothesis was to find the distribution of sums relative to gender. The first graph did not reveal this information because the scatter of the data is too high. The feature is not normally distributed and is not symmetrical. It is hard to assess, so we grouped the information by gender and computed the mean of the sum. According to this, males spend more money than females. We repeated the process with the median and reached the same conclusion. Since the mean and median values are not equal, our assumption that the data is not normally distributed was confirmed.

The last hypothesis was to find the number of clients for each type and code, in order to find the most popular request among clients. For that we applied 'str' to each parameter for correct visualization on the graph. We counted the number of requests for each type and code and showed them in the graphs. According to them, the most popular are type 1010 and code 6011. Lastly, for further work we converted type and code back to the int type.

Feature engineering

Client's balance condition
We took every sum from the data dataframe, grouped by client and found the total for each of them. We calculated the income and expenses for each client. A client with a negative total spent more than they received, while a client with a positive total received more income. A negative balance is encoded as 0 and a positive one as 1.

RFM
In the RFM section we started with Recency. For each client we grouped their records and found the latest date on which a transaction was made. The datetime column consisted of two values, date and time, which we split into separate columns for later use in feature engineering. We set the most recent day to 457 and computed the recency of each client's last transaction by subtracting from this value. The next step is Frequency. We used the 'group by' function and counted the occurrences of each client in our database. The last step is Monetary (counting expenses). Using the group by function with the condition that the sum is less than 0 (expenses are negative values), we computed the total expenses of each client and noticed that some clients did not spend any money at all.

Segmentation based on RFM
We merged all the tables into one and ranked clients according to the best values in each segment using percentages. Using the formula, we scored clients on a 5-point scale. On this data we applied the elbow method and plotted the graph, where 3 clusters were the optimal solution. With the KMeans library we plotted the k-means illustration of clients according to their distance from randomly chosen centroids, and showed the distribution of clients across clusters. Afterwards we assembled a base table with the clusters, adding a prefix to each of them.

Clustering for codes
Next we cluster the codes, using TF-IDF and k-means. We also apply lemmatization, tokenization, and stop word removal. We import the pymorphy2 library for lemmatization, which reduces words to their base form. Sentence tokenization is the process of dividing written text into its component sentences. We also need to delete stop words. A stop word is a commonly used word (such as “the”, “a”, “an”, “in”) that a search engine has been programmed to ignore, both when indexing entries for searching and when retrieving them as the result of a search query. We do not want these words to take up space in our database or valuable processing time. We also use re, the regular expression operations library. In this section we also use MorphAnalyzer(). Morphological analysis is the identification of a word's features based on how it is spelt, and does not use information about nearby words. For morphological analysis of words, pymorphy2 provides the MorphAnalyzer class.

If we apply clustering directly to these matrices, we run into problems, because the matrices are very sparse and the computation of distances becomes unreliable. Instead, we reduce the data to a dense matrix of dimension 156 by applying SVD. Singular Value Decomposition (SVD) is one of the widely used methods for dimensionality reduction. We determined that 156 is the right number in our case. We used the Silhouette score to evaluate the quality of the clusters created with K-Means. Based on the Silhouette score we chose the number of clusters and performed k-means clustering on our TF-IDF matrix. Then we visualized our clusters with t-SNE, a tool for visualizing high-dimensional data. We then added the clusters to the data and df dataframes. Finally, we created a word cloud for each of our clusters.

Clustering for types

Data cleaning for types
First, we noticed that there were 155 types. However, in data there are 61 types. When we merge data with the types, the total number of types becomes 58. This means that 3 types have no description, so we replaced them with the mode value. We also found that some types have the description 'н.д', which means no data, and their total number in data is 26. We also noticed that the same type description repeats for several types, so we dropped the duplicates and replaced them with the first occurring type in data.

Creating clusters for types
We manually divided the types into 5 categories according to some keywords in the description, and merged them with our dataframe. Then we noticed outliers in recency and frequency. We computed the 0.999 and 0.001 quantiles, where the first is taken as the high boundary and the second as the low boundary. Everything above the 0.999 quantile and below the 0.001 quantile is considered an outlier. We removed these for both recency and frequency. After that we checked the dataframe with describe and concluded that everything looked normal.

Supervised learning
The time for prediction came. We split our dataframe into train and test sets and used KNN, Decision Tree Classifier, Random Forest, and Logistic Regression for predictions. For KNN we examined the accuracy on train and test for the number of neighbors from 1 to 20 with step 2, and plotted the result. The best result is accuracy 58 for 19 neighbors. Decision Tree gave us 54 on the test set, and Random Forest's accuracy was 64. We investigated feature importance for both of them and noticed that monetary had the most influence on the predictions.

For grid search we set the hyperparameters manually and used four-fold cross validation. The best estimator for the random forest classifier was found by grid search. After that, good estimators were chosen for random forest, and the same accuracy occurred. The best accuracy was that of the random forest with default hyperparameters. We built a confusion matrix and calculated recall, precision and F1 score. We also built a logistic regression, but its accuracy was too low, so we plotted ROC-AUC and precision-recall curves.

Conclusion
All the models showed that the available data was insufficient and not well suited for gender prediction. We took steps to increase the accuracy, such as adding more features and removing outliers. According to this investigation, the best choice was random forest.
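
The elbow step in the description above (plotting SSE against the number of clusters and picking the bend) is usually done by eye. As a rough illustration only, the sketch below picks the bend automatically by finding the point farthest from the straight line joining the first and last points of the curve. This is one common heuristic, not the method used in the repository; the function name `find_elbow` and the SSE values are made up for the example.

```python
from math import hypot


def find_elbow(ks, sse):
    """Return the k whose (k, SSE) point lies farthest from the chord
    joining the first and last points of the curve.

    Hypothetical helper for illustration; ks must be strictly increasing.
    """
    if len(ks) != len(sse) or len(ks) < 3:
        raise ValueError("need at least three (k, SSE) pairs of equal length")
    s_min, s_max = min(sse), max(sse)
    if s_max == s_min:
        raise ValueError("SSE curve is flat; no elbow to find")

    # Normalise both axes to [0, 1] so neither scale dominates the distance.
    xs = [(k - ks[0]) / (ks[-1] - ks[0]) for k in ks]
    ys = [(s - s_min) / (s_max - s_min) for s in sse]

    x0, y0, x1, y1 = xs[0], ys[0], xs[-1], ys[-1]
    chord = hypot(x1 - x0, y1 - y0)

    best_k, best_dist = ks[0], -1.0
    for k, x, y in zip(ks, xs, ys):
        # Perpendicular distance from (x, y) to the chord.
        dist = abs((x1 - x0) * (y0 - y) - (x0 - x) * (y1 - y0)) / chord
        if dist > best_dist:
            best_k, best_dist = k, dist
    return best_k


# Made-up SSE values for k = 1..8, chosen only to show a visible bend.
example_ks = list(range(1, 9))
example_sse = [1000, 520, 210, 170, 140, 120, 105, 95]
print(find_elbow(example_ks, example_sse))  # prints 3
```

In practice the SSE values would come from fitting k-means once per candidate k; the heuristic is sensitive to the range of k examined, so it is worth checking the result against the plot.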

ronikobrosly / Automated Elbow Method

My implementation of Mu Zhu's method for an automated elbow method

universal

Updated 1y ago

AIArpi / Auto Find Optimal Kmeans

Automatic selection of k in k-means clustering using the Elbow method

universal

Updated 3y ago

muhammadhamzah8 / Airline Customer Value Analysis Case With Clustering K Means

Segment airline customers, analyze the characteristics of different customer categories, compare the value of customers from different customer categories, provide personalized services for categories of customers with different values, and formulate the right marketing strategy.

zed

airline, analysis, clustering, +8

Updated 25d ago

Elisayiqin / Machine Learning Sec.3

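Data analysis of the global COVID-19 pandemic, covering the basics: k-means algorithm design, SSE algorithm design, hierarchical clustering algorithm design, and cophenetic distance algorithm design.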

universal

cophenetic-distance, covid19, elbow-method, +3

Updated 2y ago

DharshanSR / Partitioning Clustering

This project clusters white wines based on their chemical properties to understand their relationship with quality ratings, using techniques like k-means and PCA.

universal

bss, clustering, elbow-method, +6

Updated 7mo ago

mdarm / Hyperspectral Image Clustering

Project on hyperspectral-image clustering for the Μ402 - Clustering Algorithms course, NKUA, Fall 2022.

universal

adjusted-rand-index, calinski-harabaz-score, clustering-algorithms, +7

Updated 2y ago