38 skills found · Page 1 of 2
Aastha2104 / Parkinson Disease PredictionIntroduction Parkinson’s Disease is the second most prevalent neurodegenerative disorder after Alzheimer’s, affecting more than 10 million people worldwide. Parkinson’s is characterized primarily by the deterioration of motor and cognitive ability. There is no single test which can be administered for diagnosis. Instead, doctors must perform a careful clinical analysis of the patient’s medical history. Unfortunately, this method of diagnosis is highly inaccurate. A study from the National Institute of Neurological Disorders finds that early diagnosis (having symptoms for 5 years or less) is only 53% accurate. This is not much better than random guessing, but an early diagnosis is critical to effective treatment. Because of these difficulties, I investigate a machine learning approach to accurately diagnose Parkinson’s, using a dataset of various speech features (a non-invasive yet characteristic tool) from the University of Oxford. Why speech features? Speech is very predictive and characteristic of Parkinson’s disease; almost every Parkinson’s patient experiences severe vocal degradation (inability to produce sustained phonations, tremor, hoarseness), so it makes sense to use voice to diagnose the disease. Voice analysis gives the added benefit of being non-invasive, inexpensive, and very easy to extract clinically. Background Parkinson's Disease Parkinson’s is a progressive neurodegenerative condition resulting from the death of the dopamine containing cells of the substantia nigra (which plays an important role in movement). Symptoms include: “frozen” facial features, bradykinesia (slowness of movement), akinesia (impairment of voluntary movement), tremor, and voice impairment. Typically, by the time the disease is diagnosed, 60% of nigrostriatal neurons have degenerated, and 80% of striatal dopamine have been depleted. Performance Metrics TP = true positive, FP = false positive, TN = true negative, FN = false negative Accuracy: (TP+TN)/(P+N) Matthews Correlation Coefficient: 1=perfect, 0=random, -1=completely inaccurate Algorithms Employed Logistic Regression (LR): Uses the sigmoid logistic equation with weights (coefficient values) and biases (constants) to model the probability of a certain class for binary classification. An output of 1 represents one class, and an output of 0 represents the other. Training the model will learn the optimal weights and biases. Linear Discriminant Analysis (LDA): Assumes that the data is Gaussian and each feature has the same variance. LDA estimates the mean and variance for each class from the training data, and then uses properties of statistics (Bayes theorem , Gaussian distribution, etc) to compute the probability of a particular instance belonging to a given class. The class with the largest probability is the prediction. k Nearest Neighbors (KNN): Makes predictions about the validation set using the entire training set. KNN makes a prediction about a new instance by searching through the entire set to find the k “closest” instances. “Closeness” is determined using a proximity measurement (Euclidean) across all features. The class that the majority of the k closest instances belong to is the class that the model predicts the new instance to be. Decision Tree (DT): Represented by a binary tree, where each root node represents an input variable and a split point, and each leaf node contains an output used to make a prediction. Neural Network (NN): Models the way the human brain makes decisions. Each neuron takes in 1+ inputs, and then uses an activation function to process the input with weights and biases to produce an output. Neurons can be arranged into layers, and multiple layers can form a network to model complex decisions. Training the network involves using the training instances to optimize the weights and biases. Naive Bayes (NB): Simplifies the calculation of probabilities by assuming that all features are independent of one another (a strong but effective assumption). Employs Bayes Theorem to calculate the probabilities that the instance to be predicted is in each class, then finds the class with the highest probability. Gradient Boost (GB): Generally used when seeking a model with very high predictive performance. Used to reduce bias and variance (“error”) by combining multiple “weak learners” (not very good models) to create a “strong learner” (high performance model). Involves 3 elements: a loss function (error function) to be optimized, a weak learner (decision tree) to make predictions, and an additive model to add trees to minimize the loss function. Gradient descent is used to minimize error after adding each tree (one by one). Engineering Goal Produce a machine learning model to diagnose Parkinson’s disease given various features of a patient’s speech with at least 90% accuracy and/or a Matthews Correlation Coefficient of at least 0.9. Compare various algorithms and parameters to determine the best model for predicting Parkinson’s. Dataset Description Source: the University of Oxford 195 instances (147 subjects with Parkinson’s, 48 without Parkinson’s) 22 features (elements that are possibly characteristic of Parkinson’s, such as frequency, pitch, amplitude / period of the sound wave) 1 label (1 for Parkinson’s, 0 for no Parkinson’s) Project Pipeline pipeline Summary of Procedure Split the Oxford Parkinson’s Dataset into two parts: one for training, one for validation (evaluate how well the model performs) Train each of the following algorithms with the training set: Logistic Regression, Linear Discriminant Analysis, k Nearest Neighbors, Decision Tree, Neural Network, Naive Bayes, Gradient Boost Evaluate results using the validation set Repeat for the following training set to validation set splits: 80% training / 20% validation, 75% / 25%, and 70% / 30% Repeat for a rescaled version of the dataset (scale all the numbers in the dataset to a range from 0 to 1: this helps to reduce the effect of outliers) Conduct 5 trials and average the results Data a_o a_r m_o m_r Data Analysis In general, the models tended to perform the best (both in terms of accuracy and Matthews Correlation Coefficient) on the rescaled dataset with a 75-25 train-test split. The two highest performing algorithms, k Nearest Neighbors and the Neural Network, both achieved an accuracy of 98%. The NN achieved a MCC of 0.96, while KNN achieved a MCC of 0.94. These figures outperform most existing literature and significantly outperform current methods of diagnosis. Conclusion and Significance These robust results suggest that a machine learning approach can indeed be implemented to significantly improve diagnosis methods of Parkinson’s disease. Given the necessity of early diagnosis for effective treatment, my machine learning models provide a very promising alternative to the current, rather ineffective method of diagnosis. Current methods of early diagnosis are only 53% accurate, while my machine learning model produces 98% accuracy. This 45% increase is critical because an accurate, early diagnosis is needed to effectively treat the disease. Typically, by the time the disease is diagnosed, 60% of nigrostriatal neurons have degenerated, and 80% of striatal dopamine have been depleted. With an earlier diagnosis, much of this degradation could have been slowed or treated. My results are very significant because Parkinson’s affects over 10 million people worldwide who could benefit greatly from an early, accurate diagnosis. Not only is my machine learning approach more accurate in terms of diagnostic accuracy, it is also more scalable, less expensive, and therefore more accessible to people who might not have access to established medical facilities and professionals. The diagnosis is also much simpler, requiring only a 10-15 second voice recording and producing an immediate diagnosis. Future Research Given more time and resources, I would investigate the following: Create a mobile application which would allow the user to record his/her voice, extract the necessary vocal features, and feed it into my machine learning model to diagnose Parkinson’s. Use larger datasets in conjunction with the University of Oxford dataset. Tune and improve my models even further to achieve even better results. Investigate different structures and types of neural networks. Construct a novel algorithm specifically suited for the prediction of Parkinson’s. Generalize my findings and algorithms for all types of dementia disorders, such as Alzheimer’s. References Bind, Shubham. "A Survey of Machine Learning Based Approaches for Parkinson Disease Prediction." International Journal of Computer Science and Information Technologies 6 (2015): n. pag. International Journal of Computer Science and Information Technologies. 2015. Web. 8 Mar. 2017. Brooks, Megan. "Diagnosing Parkinson's Disease Still Challenging." Medscape Medical News. National Institute of Neurological Disorders, 31 July 2014. Web. 20 Mar. 2017. Exploiting Nonlinear Recurrence and Fractal Scaling Properties for Voice Disorder Detection', Little MA, McSharry PE, Roberts SJ, Costello DAE, Moroz IM. BioMedical Engineering OnLine 2007, 6:23 (26 June 2007) Hashmi, Sumaiya F. "A Machine Learning Approach to Diagnosis of Parkinson’s Disease."Claremont Colleges Scholarship. Claremont College, 2013. Web. 10 Mar. 2017. Karplus, Abraham. "Machine Learning Algorithms for Cancer Diagnosis." Machine Learning Algorithms for Cancer Diagnosis (n.d.): n. pag. Mar. 2012. Web. 20 Mar. 2017. Little, Max. "Parkinsons Data Set." UCI Machine Learning Repository. University of Oxford, 26 June 2008. Web. 20 Feb. 2017. Ozcift, Akin, and Arif Gulten. "Classifier Ensemble Construction with Rotation Forest to Improve Medical Diagnosis Performance of Machine Learning Algorithms." Computer Methods and Programs in Biomedicine 104.3 (2011): 443-51. Semantic Scholar. 2011. Web. 15 Mar. 2017. "Parkinson’s Disease Dementia." UCI MIND. N.p., 19 Oct. 2015. Web. 17 Feb. 2017. Salvatore, C., A. Cerasa, I. Castiglioni, F. Gallivanone, A. Augimeri, M. Lopez, G. Arabia, M. Morelli, M.c. Gilardi, and A. Quattrone. "Machine Learning on Brain MRI Data for Differential Diagnosis of Parkinson's Disease and Progressive Supranuclear Palsy."Journal of Neuroscience Methods 222 (2014): 230-37. 2014. Web. 18 Mar. 2017. Shahbakhi, Mohammad, Danial Taheri Far, and Ehsan Tahami. "Speech Analysis for Diagnosis of Parkinson’s Disease Using Genetic Algorithm and Support Vector Machine."Journal of Biomedical Science and Engineering 07.04 (2014): 147-56. Scientific Research. July 2014. Web. 2 Mar. 2017. "Speech and Communication." Speech and Communication. Parkinson's Disease Foundation, n.d. Web. 22 Mar. 2017. Sriram, Tarigoppula V. S., M. Venkateswara Rao, G. V. Satya Narayana, and D. S. V. G. K. Kaladhar. "Diagnosis of Parkinson Disease Using Machine Learning and Data Mining Systems from Voice Dataset." SpringerLink. Springer, Cham, 01 Jan. 1970. Web. 17 Mar. 2017.
chopicalqui / TurboDataMinerThe objective of this Burp Suite extension is the flexible and dynamic extraction, correlation, and structured presentation of information from the Burp Suite project as well as the flexible and dynamic on-the-fly modification of outgoing or incoming HTTP requests using Python scripts. Thus, Turbo Data Miner shall aid in gaining a better and faster understanding of the data collected by Burp Suite.
zhangprofessor / Fast Non Local Means And Asymptotic Non Local MeansNon-Local means denoising (NLM) algorithm is a milestone algorithm in the field of image processing. The proposal of NLM has opened up the non-local method which has a deep influence. This paper performed a revisit for NLM from two aspects as follows: 1. To alleviate the high computational complexity problem of NLM, a fast algorithm was constructed, which was based on cross-correlation and fast Fourier transform; 2. NLM always blur structures and textures during the noise removal, especially in the case of strong noise. To solve this problem, an Asymptotic Non-Local Means image denoising algorithm is put forward, which uses the property of noise variance to control the filtering parameters. Numerical experiments illustrate that the fast algorithm is 27 times faster than classical implementation with standard parameter configuration, and the ANLM uniformly outperforms classical NLM, in terms of both PSNR and visual effects.
linhnguyen215538 / Volatility Study• Conducted a volatility study to develop pairs trading strategy by writing web crawlers that automated extracting 30 equity and ETF spot and options prices data from CBOE and Yahoo Finance • Utilized NumPy, Pandas, and SciPy packages to calculate implied volatility, realized volatility, and risk premiums to measure how the market prices risk • Gathered and plotted daily VIX futures data with Selenium, Seaborn and Matplotlib to study volatility term structure • Examined volatility clustering and built forecasting tools for market risk using correlations of daily absolute returns and volatility at different time lags
ignaciorlando / Fundus Vessel Segmentation TbmeIn this work, we present an extensive description and evaluation of our method for blood vessel segmentation in fundus images based on a discriminatively trained, fully connected conditional random field model. Standard segmentation priors such as a Potts model or total variation usually fail when dealing with thin and elongated structures. We overcome this difficulty by using a conditional random field model with more expressive potentials, taking advantage of recent results enabling inference of fully connected models almost in real-time. Parameters of the method are learned automatically using a structured output support vector machine, a supervised technique widely used for structured prediction in a number of machine learning applications. Our method, trained with state of the art features, is evaluated both quantitatively and qualitatively on four publicly available data sets: DRIVE, STARE, CHASEDB1 and HRF. Additionally, a quantitative comparison with respect to other strategies is included. The experimental results show that this approach outperforms other techniques when evaluated in terms of sensitivity, F1-score, G-mean and Matthews correlation coefficient. Additionally, it was observed that the fully connected model is able to better distinguish the desired structures than the local neighborhood based approach. Results suggest that this method is suitable for the task of segmenting elongated structures, a feature that can be exploited to contribute with other medical and biological applications.
KimManjin / StructViTThe official repository of the paper "Learning Correlation Structures for Vision Transformers" accepted to CVPR 2024.
benroberts999 / AmpsciA c++ program for high-precision atomic structure calculations of one and two valence systems. Uses Hartree-Fock + correlation potential method. Can calculate ionisation cross sections with large of energy/momentum transfer.
NaveenKaliannan / StructureFactorA simple matlab code to compute the structure factor S(q) from pair correlation function g(r)
konstantinklemmer / SpaceganSpaceGAN - Augmenting spatial correlation structures
reddyprasade / Machine Learning Interview PreparationPrepare to Technical Skills Here are the essential skills that a Machine Learning Engineer needs, as mentioned Read me files. Within each group are topics that you should be familiar with. Study Tip: Copy and paste this list into a document and save to your computer for easy referral. Computer Science Fundamentals and Programming Topics Data structures: Lists, stacks, queues, strings, hash maps, vectors, matrices, classes & objects, trees, graphs, etc. Algorithms: Recursion, searching, sorting, optimization, dynamic programming, etc. Computability and complexity: P vs. NP, NP-complete problems, big-O notation, approximate algorithms, etc. Computer architecture: Memory, cache, bandwidth, threads & processes, deadlocks, etc. Probability and Statistics Topics Basic probability: Conditional probability, Bayes rule, likelihood, independence, etc. Probabilistic models: Bayes Nets, Markov Decision Processes, Hidden Markov Models, etc. Statistical measures: Mean, median, mode, variance, population parameters vs. sample statistics etc. Proximity and error metrics: Cosine similarity, mean-squared error, Manhattan and Euclidean distance, log-loss, etc. Distributions and random sampling: Uniform, normal, binomial, Poisson, etc. Analysis methods: ANOVA, hypothesis testing, factor analysis, etc. Data Modeling and Evaluation Topics Data preprocessing: Munging/wrangling, transforming, aggregating, etc. Pattern recognition: Correlations, clusters, trends, outliers & anomalies, etc. Dimensionality reduction: Eigenvectors, Principal Component Analysis, etc. Prediction: Classification, regression, sequence prediction, etc.; suitable error/accuracy metrics. Evaluation: Training-testing split, sequential vs. randomized cross-validation, etc. Applying Machine Learning Algorithms and Libraries Topics Models: Parametric vs. nonparametric, decision tree, nearest neighbor, neural net, support vector machine, ensemble of multiple models, etc. Learning procedure: Linear regression, gradient descent, genetic algorithms, bagging, boosting, and other model-specific methods; regularization, hyperparameter tuning, etc. Tradeoffs and gotchas: Relative advantages and disadvantages, bias and variance, overfitting and underfitting, vanishing/exploding gradients, missing data, data leakage, etc. Software Engineering and System Design Topics Software interface: Library calls, REST APIs, data collection endpoints, database queries, etc. User interface: Capturing user inputs & application events, displaying results & visualization, etc. Scalability: Map-reduce, distributed processing, etc. Deployment: Cloud hosting, containers & instances, microservices, etc. Move on to the final lesson of this course to find lots of sample practice questions for each topic!
jacobgil / PccPreserving Clusters and Correlations: Dimensionality reduction with exceptional high global structure preservation
Chenguoz / PointSCNet[Symmetry 2022] Code Release of PointSCNet: Point Cloud Structure and Correlation Learning based on Space Filling Curve guided Sampling
hyunsooseol / SeolmatrixThis module is a tool for calculating correlations such as Partial, Tetrachoric, Intraclass correlation coefficients, Bootstrap agreement, Rater reliability, Generalizability Theory, Analytic Hierarchy Process, and allows users to produce Gaussian Graphical Model and Partial plot.
Ordinary0x / The 3rd EyeThe 3rd Eye is a modular OSINT (Open Source Intelligence) framework built on an agent-based, graph-driven architecture. It automates public information discovery, identity correlation, and exposure analysis across multiple platforms, and generates structured intelligence reports. The system follows a LangGraph agent design.
ShengzhiLuan / 2D Cellular MetamaterialsData and code for structure-property correlation in 2D Cellular Metamaterials
YiruS / KCNetCode for "Mining Point Cloud Local Structures by Kernel Correlation and Graph Max Pooling", CVPR, 2018.
itsoukal / AnySimAn R package for the stochastic simulation of processes with any marginal distribution and correlation structure
Arrondissement5etDemi / InverseStatMech.jlEfficient inverse statistical mechanical algorithms to generate effective potentials or configurations from pair correlation functions or structure factors.
ZhiyaoZhao / Towards Measuring Shape Similarity Of Polygons Based On Multiscale Features And Grid Context DescripIn spatial analysis application, measuring the shape similarity of polygons is crucial for polyg-onal object retrieval and shape clustering. As a complex cognition process, measuring shape similarity should be able to find out the difference of polygons as objects in observation in terms of visual perception and the difference of the region, boundary, and structure formed by the polygons from the mathematic point of view. In the existing approaches, shape similarity of polygons is calculated by only comparing their mathematic characteristics while not taking the human perception into consideration. Aiming to solve this problem, we use the features of con-text and texture of polygons, since they are basic visual perception elements to fit the cognition purpose. In this paper, we propose a contour diffusion method for the similarity measurement of polygons. By converting a polygon into grid representation, the contour feature is represent-ed as a multiscale statistic feature, and the region feature is transformed into condensed grid context features. Instead of treating shape similarity as a distance between two representations of polygons, the proposed method observes it as a correlation between textures extracted by shape feature. The experiments show that the accuracy of the proposed method is superior to that of the turning function and Fourier descriptor.
M3RG-IITD / ML DOSImplementation of unsupervised GNN (GraphSAGE) to generate node representations for disordered systems. OPTICS clustering of obtained node embeddings reveals structure-dynamics correlation.