45 skills found · Page 1 of 2
Teichlab / CelltypistA tool for semi-automatic cell type classification
Aastha2104 / Parkinson Disease PredictionIntroduction Parkinson’s Disease is the second most prevalent neurodegenerative disorder after Alzheimer’s, affecting more than 10 million people worldwide. Parkinson’s is characterized primarily by the deterioration of motor and cognitive ability. There is no single test which can be administered for diagnosis. Instead, doctors must perform a careful clinical analysis of the patient’s medical history. Unfortunately, this method of diagnosis is highly inaccurate. A study from the National Institute of Neurological Disorders finds that early diagnosis (having symptoms for 5 years or less) is only 53% accurate. This is not much better than random guessing, but an early diagnosis is critical to effective treatment. Because of these difficulties, I investigate a machine learning approach to accurately diagnose Parkinson’s, using a dataset of various speech features (a non-invasive yet characteristic tool) from the University of Oxford. Why speech features? Speech is very predictive and characteristic of Parkinson’s disease; almost every Parkinson’s patient experiences severe vocal degradation (inability to produce sustained phonations, tremor, hoarseness), so it makes sense to use voice to diagnose the disease. Voice analysis gives the added benefit of being non-invasive, inexpensive, and very easy to extract clinically. Background Parkinson's Disease Parkinson’s is a progressive neurodegenerative condition resulting from the death of the dopamine containing cells of the substantia nigra (which plays an important role in movement). Symptoms include: “frozen” facial features, bradykinesia (slowness of movement), akinesia (impairment of voluntary movement), tremor, and voice impairment. Typically, by the time the disease is diagnosed, 60% of nigrostriatal neurons have degenerated, and 80% of striatal dopamine have been depleted. Performance Metrics TP = true positive, FP = false positive, TN = true negative, FN = false negative Accuracy: (TP+TN)/(P+N) Matthews Correlation Coefficient: 1=perfect, 0=random, -1=completely inaccurate Algorithms Employed Logistic Regression (LR): Uses the sigmoid logistic equation with weights (coefficient values) and biases (constants) to model the probability of a certain class for binary classification. An output of 1 represents one class, and an output of 0 represents the other. Training the model will learn the optimal weights and biases. Linear Discriminant Analysis (LDA): Assumes that the data is Gaussian and each feature has the same variance. LDA estimates the mean and variance for each class from the training data, and then uses properties of statistics (Bayes theorem , Gaussian distribution, etc) to compute the probability of a particular instance belonging to a given class. The class with the largest probability is the prediction. k Nearest Neighbors (KNN): Makes predictions about the validation set using the entire training set. KNN makes a prediction about a new instance by searching through the entire set to find the k “closest” instances. “Closeness” is determined using a proximity measurement (Euclidean) across all features. The class that the majority of the k closest instances belong to is the class that the model predicts the new instance to be. Decision Tree (DT): Represented by a binary tree, where each root node represents an input variable and a split point, and each leaf node contains an output used to make a prediction. Neural Network (NN): Models the way the human brain makes decisions. Each neuron takes in 1+ inputs, and then uses an activation function to process the input with weights and biases to produce an output. Neurons can be arranged into layers, and multiple layers can form a network to model complex decisions. Training the network involves using the training instances to optimize the weights and biases. Naive Bayes (NB): Simplifies the calculation of probabilities by assuming that all features are independent of one another (a strong but effective assumption). Employs Bayes Theorem to calculate the probabilities that the instance to be predicted is in each class, then finds the class with the highest probability. Gradient Boost (GB): Generally used when seeking a model with very high predictive performance. Used to reduce bias and variance (“error”) by combining multiple “weak learners” (not very good models) to create a “strong learner” (high performance model). Involves 3 elements: a loss function (error function) to be optimized, a weak learner (decision tree) to make predictions, and an additive model to add trees to minimize the loss function. Gradient descent is used to minimize error after adding each tree (one by one). Engineering Goal Produce a machine learning model to diagnose Parkinson’s disease given various features of a patient’s speech with at least 90% accuracy and/or a Matthews Correlation Coefficient of at least 0.9. Compare various algorithms and parameters to determine the best model for predicting Parkinson’s. Dataset Description Source: the University of Oxford 195 instances (147 subjects with Parkinson’s, 48 without Parkinson’s) 22 features (elements that are possibly characteristic of Parkinson’s, such as frequency, pitch, amplitude / period of the sound wave) 1 label (1 for Parkinson’s, 0 for no Parkinson’s) Project Pipeline pipeline Summary of Procedure Split the Oxford Parkinson’s Dataset into two parts: one for training, one for validation (evaluate how well the model performs) Train each of the following algorithms with the training set: Logistic Regression, Linear Discriminant Analysis, k Nearest Neighbors, Decision Tree, Neural Network, Naive Bayes, Gradient Boost Evaluate results using the validation set Repeat for the following training set to validation set splits: 80% training / 20% validation, 75% / 25%, and 70% / 30% Repeat for a rescaled version of the dataset (scale all the numbers in the dataset to a range from 0 to 1: this helps to reduce the effect of outliers) Conduct 5 trials and average the results Data a_o a_r m_o m_r Data Analysis In general, the models tended to perform the best (both in terms of accuracy and Matthews Correlation Coefficient) on the rescaled dataset with a 75-25 train-test split. The two highest performing algorithms, k Nearest Neighbors and the Neural Network, both achieved an accuracy of 98%. The NN achieved a MCC of 0.96, while KNN achieved a MCC of 0.94. These figures outperform most existing literature and significantly outperform current methods of diagnosis. Conclusion and Significance These robust results suggest that a machine learning approach can indeed be implemented to significantly improve diagnosis methods of Parkinson’s disease. Given the necessity of early diagnosis for effective treatment, my machine learning models provide a very promising alternative to the current, rather ineffective method of diagnosis. Current methods of early diagnosis are only 53% accurate, while my machine learning model produces 98% accuracy. This 45% increase is critical because an accurate, early diagnosis is needed to effectively treat the disease. Typically, by the time the disease is diagnosed, 60% of nigrostriatal neurons have degenerated, and 80% of striatal dopamine have been depleted. With an earlier diagnosis, much of this degradation could have been slowed or treated. My results are very significant because Parkinson’s affects over 10 million people worldwide who could benefit greatly from an early, accurate diagnosis. Not only is my machine learning approach more accurate in terms of diagnostic accuracy, it is also more scalable, less expensive, and therefore more accessible to people who might not have access to established medical facilities and professionals. The diagnosis is also much simpler, requiring only a 10-15 second voice recording and producing an immediate diagnosis. Future Research Given more time and resources, I would investigate the following: Create a mobile application which would allow the user to record his/her voice, extract the necessary vocal features, and feed it into my machine learning model to diagnose Parkinson’s. Use larger datasets in conjunction with the University of Oxford dataset. Tune and improve my models even further to achieve even better results. Investigate different structures and types of neural networks. Construct a novel algorithm specifically suited for the prediction of Parkinson’s. Generalize my findings and algorithms for all types of dementia disorders, such as Alzheimer’s. References Bind, Shubham. "A Survey of Machine Learning Based Approaches for Parkinson Disease Prediction." International Journal of Computer Science and Information Technologies 6 (2015): n. pag. International Journal of Computer Science and Information Technologies. 2015. Web. 8 Mar. 2017. Brooks, Megan. "Diagnosing Parkinson's Disease Still Challenging." Medscape Medical News. National Institute of Neurological Disorders, 31 July 2014. Web. 20 Mar. 2017. Exploiting Nonlinear Recurrence and Fractal Scaling Properties for Voice Disorder Detection', Little MA, McSharry PE, Roberts SJ, Costello DAE, Moroz IM. BioMedical Engineering OnLine 2007, 6:23 (26 June 2007) Hashmi, Sumaiya F. "A Machine Learning Approach to Diagnosis of Parkinson’s Disease."Claremont Colleges Scholarship. Claremont College, 2013. Web. 10 Mar. 2017. Karplus, Abraham. "Machine Learning Algorithms for Cancer Diagnosis." Machine Learning Algorithms for Cancer Diagnosis (n.d.): n. pag. Mar. 2012. Web. 20 Mar. 2017. Little, Max. "Parkinsons Data Set." UCI Machine Learning Repository. University of Oxford, 26 June 2008. Web. 20 Feb. 2017. Ozcift, Akin, and Arif Gulten. "Classifier Ensemble Construction with Rotation Forest to Improve Medical Diagnosis Performance of Machine Learning Algorithms." Computer Methods and Programs in Biomedicine 104.3 (2011): 443-51. Semantic Scholar. 2011. Web. 15 Mar. 2017. "Parkinson’s Disease Dementia." UCI MIND. N.p., 19 Oct. 2015. Web. 17 Feb. 2017. Salvatore, C., A. Cerasa, I. Castiglioni, F. Gallivanone, A. Augimeri, M. Lopez, G. Arabia, M. Morelli, M.c. Gilardi, and A. Quattrone. "Machine Learning on Brain MRI Data for Differential Diagnosis of Parkinson's Disease and Progressive Supranuclear Palsy."Journal of Neuroscience Methods 222 (2014): 230-37. 2014. Web. 18 Mar. 2017. Shahbakhi, Mohammad, Danial Taheri Far, and Ehsan Tahami. "Speech Analysis for Diagnosis of Parkinson’s Disease Using Genetic Algorithm and Support Vector Machine."Journal of Biomedical Science and Engineering 07.04 (2014): 147-56. Scientific Research. July 2014. Web. 2 Mar. 2017. "Speech and Communication." Speech and Communication. Parkinson's Disease Foundation, n.d. Web. 22 Mar. 2017. Sriram, Tarigoppula V. S., M. Venkateswara Rao, G. V. Satya Narayana, and D. S. V. G. K. Kaladhar. "Diagnosis of Parkinson Disease Using Machine Learning and Data Mining Systems from Voice Dataset." SpringerLink. Springer, Cham, 01 Jan. 1970. Web. 17 Mar. 2017.
rezakj / ICellRiCellR is an interactive R package designed to facilitate the analysis and visualization of high-throughput single-cell sequencing data. It supports a variety of single-cell technologies, including scRNA-seq, scVDJ-seq, scATAC-seq, CITE-Seq, and Spatial Transcriptomics (ST).
cole-trapnell-lab / GarnettAutomated cell type classification
ZJUFanLab / ScDeepSortCell-type Annotation for Single-cell Transcriptomics using Deep Learning with a Weighted Graph Neural Network
nsalomonis / AltanalyzeAltAnalyze is a multi-functional and easy-to-use software package for automated single-cell and bulk gene and splicing analyses. Easy-to-use precompiled graphical user-interface versions available from our website.
deweylab / CellOCellO: Gene expression-based hierarchical cell type classification using the Cell Ontology
CahanLab / CellNetCellNet: network biology applied to stem cell engineering
jianhuupenn / ItClustIterative transfer learning with neural network improves clustering and cell type classification in single-cell RNA-seq analysis
majidghgol / TabularCellTypeClassificationCode and experiment data for ICDM'19 paper, tabular cell classification using pre-trained cell embeddings. Note that the code and data is cleaned for release and is not the exact version used in the paper, and may not produce exact results.
sayantann11 / Clustering Modelsfor MLlustering in Machine Learning Introduction to Clustering It is basically a type of unsupervised learning method . An unsupervised learning method is a method in which we draw references from datasets consisting of input data without labelled responses. Generally, it is used as a process to find meaningful structure, explanatory underlying processes, generative features, and groupings inherent in a set of examples. Clustering is the task of dividing the population or data points into a number of groups such that data points in the same groups are more similar to other data points in the same group and dissimilar to the data points in other groups. It is basically a collection of objects on the basis of similarity and dissimilarity between them. For ex– The data points in the graph below clustered together can be classified into one single group. We can distinguish the clusters, and we can identify that there are 3 clusters in the below picture. It is not necessary for clusters to be a spherical. Such as : DBSCAN: Density-based Spatial Clustering of Applications with Noise These data points are clustered by using the basic concept that the data point lies within the given constraint from the cluster centre. Various distance methods and techniques are used for calculation of the outliers. Why Clustering ? Clustering is very much important as it determines the intrinsic grouping among the unlabeled data present. There are no criteria for a good clustering. It depends on the user, what is the criteria they may use which satisfy their need. For instance, we could be interested in finding representatives for homogeneous groups (data reduction), in finding “natural clusters” and describe their unknown properties (“natural” data types), in finding useful and suitable groupings (“useful” data classes) or in finding unusual data objects (outlier detection). This algorithm must make some assumptions which constitute the similarity of points and each assumption make different and equally valid clusters. Clustering Methods : Density-Based Methods : These methods consider the clusters as the dense region having some similarity and different from the lower dense region of the space. These methods have good accuracy and ability to merge two clusters.Example DBSCAN (Density-Based Spatial Clustering of Applications with Noise) , OPTICS (Ordering Points to Identify Clustering Structure) etc. Hierarchical Based Methods : The clusters formed in this method forms a tree-type structure based on the hierarchy. New clusters are formed using the previously formed one. It is divided into two category Agglomerative (bottom up approach) Divisive (top down approach) examples CURE (Clustering Using Representatives), BIRCH (Balanced Iterative Reducing Clustering and using Hierarchies) etc. Partitioning Methods : These methods partition the objects into k clusters and each partition forms one cluster. This method is used to optimize an objective criterion similarity function such as when the distance is a major parameter example K-means, CLARANS (Clustering Large Applications based upon Randomized Search) etc. Grid-based Methods : In this method the data space is formulated into a finite number of cells that form a grid-like structure. All the clustering operation done on these grids are fast and independent of the number of data objects example STING (Statistical Information Grid), wave cluster, CLIQUE (CLustering In Quest) etc. Clustering Algorithms : K-means clustering algorithm – It is the simplest unsupervised learning algorithm that solves clustering problem.K-means algorithm partition n observations into k clusters where each observation belongs to the cluster with the nearest mean serving as a prototype of the cluster . Applications of Clustering in different fields Marketing : It can be used to characterize & discover customer segments for marketing purposes. Biology : It can be used for classification among different species of plants and animals. Libraries : It is used in clustering different books on the basis of topics and information. Insurance : It is used to acknowledge the customers, their policies and identifying the frauds. City Planning: It is used to make groups of houses and to study their values based on their geographical locations and other factors present. Earthquake studies: By learning the earthquake-affected areas we can determine the dangerous zones. References : Wiki Hierarchical clustering Ijarcs matteucc analyticsvidhya knowm
yanailab / MoanaMoana: Robust cell type classification of single-cell RNA-Seq data
ruiyi-zhang / ScPretrainscPretrain: Multi-task self-supervised learning for cell type classification
dissatisfaction-ai / ScHierarchyA toolking for cell type hierarchies: marker selection & consistent classification
sxslabjhu / CellTypeClassificationDeep Learning Based Cell Type Classification with Low-Res Brightfield Single Cell Images
LucasESBS / ScoreCTscoreCT: Automated Cell Type Annotation
CahanLab / PySingleCellNetsingleCellNet in Python
igordot / ClustermoleUnbiased single-cell transcriptomic data cell type identification
fsrt16 / Introduction To Genomic Data Sciences Breast Cancer Detection# Breast-cancer-risk-prediction > Necessity, who is the mother of invention. – Plato* ## Welcome to my GitHub repository on Using Predictive Analytics model to diagnose breast cancer. --- ### Objective: The repository is a learning exercise to: * Apply the fundamental concepts of machine learning from an available dataset * Evaluate and interpret my results and justify my interpretation based on observed data set * Create notebooks that serve as computational records and document my thought process. The analysis is divided into four sections, saved in juypter notebooks in this repository 1. Identifying the problem and Data Sources 2. Exploratory Data Analysis 3. Pre-Processing the Data 4. Build model to predict whether breast cell tissue is malignant or Benign ### [Notebook 1](https://github.com/ShiroJean/Breast-cancer-risk-prediction/blob/master/NB1_IdentifyProblem%2BDataClean.ipynb): Identifying the problem and Getting data. **Notebook goal:Identify the types of information contained in our data set** In this notebook I used Python modules to import external data sets for the purpose of getting to know/familiarize myself with the data to get a good grasp of the data and think about how to handle the data in different ways. ### [Notebook 2](https://github.com/ShiroJean/Breast-cancer-risk-prediction/blob/master/NB2_ExploratoryDataAnalysis.ipynb) Exploratory Data Analysis **Notebook goal: Explore the variables to assess how they relate to the response variable** In this notebook, I am getting familiar with the data using data exploration and visualization techniques using python libraries (Pandas, matplotlib, seaborn. Familiarity with the data is important which will provide useful knowledge for data pre-processing) ### [Notebook 3](https://github.com/ShiroJean/Breast-cancer-risk-prediction/blob/master/NB3_DataPreprocesing.ipynb) Pre-Processing the data **Notebook goal:Find the most predictive features of the data and filter it so it will enhance the predictive power of the analytics model.** In this notebook I use feature selection to reduce high-dimension data, feature extraction and transformation for dimensionality reduction. This is essential in preparing the data before predictive models are developed. ### [Notebook 4](https://github.com/ShiroJean/Breast-cancer-risk-prediction/blob/master/NB4_PredictiveModelUsingSVM.ipynb) Predictive model using Support Vector Machine (svm) **Notebook goal: Construct predictive models to predict the diagnosis of a breast tumor.** In this notebook, I construct a predictive model using SVM machine learning algorithm to predict the diagnosis of a breast tumor. The diagnosis of a breast tumor is a binary variable (benign or malignant). I also evaluate the model using confusion matrix the receiver operating curves (ROC), which are essential in assessing and interpreting the fitted model. ### [Notebook 5](https://github.com/ShiroJean/Breast-cancer-risk-prediction/blob/master/NB_5%20OptimizingSVMClassifier.ipynb): Optimizing the Support Vector Classifier **Notebook goal: Construct predictive models to predict the diagnosis of a breast tumor.** In this notebook, I aim to tune parameters of the SVM Classification model using scikit-learn.
dia2018 / What Is The Difference Between AI And Machine LearningArtificial Intelligence and Machine Learning have empowered our lives to a large extent. The number of advancements made in this space has revolutionized our society and continue making society a better place to live in. In terms of perception, both Artificial Intelligence and Machine Learning are often used in the same context which leads to confusion. AI is the concept in which machine makes smart decisions whereas Machine Learning is a sub-field of AI which makes decisions while learning patterns from the input data. In this blog, we would dissect each term and understand how Artificial Intelligence and Machine Learning are related to each other. What is Artificial Intelligence? The term Artificial Intelligence was recognized first in the year 1956 by John Mccarthy in an AI conference. In layman terms, Artificial Intelligence is about creating intelligent machines which could perform human-like actions. AI is not a modern-day phenomenon. In fact, it has been around since the advent of computers. The only thing that has changed is how we perceive AI and define its applications in the present world. The exponential growth of AI in the last decade or so has affected every sphere of our lives. Starting from a simple google search which gives the best results of a query to the creation of Siri or Alexa, one of the significant breakthroughs of the 21st century is Artificial Intelligence. The Four types of Artificial Intelligence are:- Reactive AI – This type of AI lacks historical data to perform actions, and completely reacts to a certain action taken at the moment. It works on the principle of Deep Reinforcement learning where a prize is awarded for any successful action and penalized vice versa. Google’s AlphaGo defeated experts in Go using this approach. Limited Memory – In the case of the limited memory, the past data is kept on adding to the memory. For example, in the case of selecting the best restaurant, the past locations would be taken into account and would be suggested accordingly. Theory of Mind – Such type of AI is yet to be built as it involves dealing with human emotions, and psychology. Face and gesture detection comes close but nothing advanced enough to understand human emotions. Self-Aware – This is the future advancement of AI which could configure self-representations. The machines could be conscious, and super-intelligent. Two of the most common usage of AI is in the field of Computer Vision, and Natural Language Processing. Computer Vision is the study of identifying objects such as Face Recognition, Real-time object detection, and so on. Detection of such movements could go a long way in analyzing the sentiments conveyed by a human being. Natural Language Processing, on the other hand, deals with textual data to extract insights or sentiments from it. From ChatBot Development to Speech Recognition like Amazon’s Alexa or Apple’s Siri all uses Natural Language to extract relevant meaning from the data. It is one of the widely popular fields of AI which has found its usefulness in every organization. One other application of AI which has gained popularity in recent times is the self-driving cars. It uses reinforcement learning technique to learn its best moves and identify the restrictions or blockage in front of the road. Many automobile companies are gradually adopting the concept of self-driving cars. What is Machine Learning? Machine Learning is a state-of-the-art subset of Artificial Intelligence which let machines learn from past data, and make accurate predictions. Machine Learning has been around for decades, and the first ML application that got popular was the Email Spam Filter Classification. The system is trained with a set of emails labeled as ‘spam’ and ‘not spam’ known as the training instance. Then a new set of unknown emails is fed to the trained system which then categorizes it as ‘spam’ or ‘not spam.’ All these predictions are made by a certain group of Regression, and Classification algorithms like – Linear Regression, Logistic Regression, Decision Tree, Random Forest, XGBoost, and so on. The usability of these algorithms varies based on the problem statement and the data set in operation. Along with these basic algorithms, a sub-field of Machine Learning which has gained immense popularity in recent times is Deep Learning. However, Deep Learning requires enormous computational power and works best with a massive amount of data. It uses neural networks whose architecture is similar to the human brain. Machine Learning could be subdivided into three categories – Supervised Learning – In supervised learning problems, both the input feature and the corresponding target variable is present in the dataset. Unsupervised Learning – The dataset is not labeled in an unsupervised learning problem i.e., only the input features are present, but not the target variable. The algorithms need to find out the separate clusters in the dataset based on certain patterns. Reinforcement Learning – In this type of problems, the learner is rewarded with a prize for every correct move, and penalized for every incorrect move. The application of Machine Learning is diversified in various domains like Banking, Healthcare, Retail, etc. One of the use cases in the banking industry is predicting the probability of credit loan default by a borrower given its past transactions, credit history, debt ratio, annual income, and so on. In Healthcare, Machine Learning is often been used to predict patient’s stay in the hospital, the likelihood of occurrence of a disease, identifying abnormal patterns in the cell, etc. Many software companies have incorporated Machine Learning in their workflow to steadfast the process of testing. Various manual, repetitive tasks are being replaced by machine learning models. Comparison Between AI and Machine Learning Machine Learning is the subset of Artificial Intelligence which has taken the advancement in AI to a whole new level. The thought behind letting the computer learn from themselves and voluminous data that are getting generated from various sources in the present world has led to the emergence of Machine Learning. In Machine Learning, the concept of neural networks plays a significant role in allowing the system to learn from themselves as well as maintaining its speed, and accuracy. The group of neural nets lets a model rectifying its prior decision and make a more accurate prediction next time. Artificial Intelligence is about acquiring knowledge and applying them to ensure success instead of accuracy. It makes the computer intelligent to make smart decisions on its own akin to the decisions made by a human being. The more complex the problem is, the better it is for AI to solve the complexity. On the other hand, Machine Learning is mostly about acquiring knowledge and maintaining better accuracy instead of success. The primary aim is to learn from the data to automate specific tasks. The possibilities around Machine Learning and Neural Networks are endless. A set of sentiments could be understood from raw text. A machine learning application could also listen to music, and even play a piece of appropriate music based on a person’s mood. NLP, a field of AI which has made some ground-breaking innovations in recent years uses Machine Learning to understand the nuances in natural language and learn to respond accordingly. Different sectors like banking, healthcare, manufacturing, etc., are reaping the benefits of Artificial Intelligence, particularly Machine Learning. Several tedious tasks are getting automated through ML which saves both time and money. Machine Learning has been sold these days consistently by marketers even before it has reached its full potential. AI could be seen as something of the old by the marketers who believe Machine Learning is the Holy Grail in the field of analytics. The future is not far when we would see human-like AI. The rapid advancement in technology has taken us closer than ever before to inevitability. The recent progress in the working AI is much down to how Machine Learning operates. Both Artificial Intelligence and Machine Learning has its own business applications and its usage is completely dependent on the requirements of an organization. AI is an age-old concept with Machine Learning picking up the pace in recent times. Companies like TCS, Infosys are yet to unleash the full potential of Machine Learning and trying to incorporate ML in their applications to keep pace with the rapidly growing Analytics space. Conclusion The hype around Artificial Intelligence and Machine Learning are such that various companies and even individuals want to master the skills without even knowing the difference between the two. Often both the terms are misused in the same context. To master Machine Learning, one needs to have a natural intuition about the data, ask the right questions, and find out the correct algorithms to use to build a model. It often doesn’t requiem how computational capacity. On the other hand, AI is about building intelligent systems which require advanced tools and techniques and often used in big companies like Google, Facebook, etc. There is a whole host of resources to master Machine Learning and AI. The Data Science blogs of Dimensionless is a good place to start with. Also, There are Online Data Science Courses which cover the various nitty gritty of Machine Learning.