gwastro / Pycbc
Core package to analyze gravitational-wave data, find signals, and study their parameters. This package was used in the first direct detection of gravitational waves (GW150914), and is used in the ongoing analysis of LIGO/Virgo data.
Shanghua-Gao / SOD100K
The official repo of the TPAMI 2021/ECCV 2020 work CSNet: A Highly Efficient Model with 100K Parameters to Study the Semantics of Salient Object Detection
baku89 / Ui Study
Parameters UI Study
Aastha2104 / Parkinson Disease Prediction

Introduction

Parkinson’s Disease is the second most prevalent neurodegenerative disorder after Alzheimer’s, affecting more than 10 million people worldwide. Parkinson’s is characterized primarily by the deterioration of motor and cognitive ability. There is no single test which can be administered for diagnosis. Instead, doctors must perform a careful clinical analysis of the patient’s medical history. Unfortunately, this method of diagnosis is highly inaccurate. A study from the National Institute of Neurological Disorders finds that early diagnosis (having symptoms for 5 years or less) is only 53% accurate. This is not much better than random guessing, yet an early diagnosis is critical to effective treatment. Because of these difficulties, I investigate a machine learning approach to accurately diagnose Parkinson’s, using a dataset of various speech features (a non-invasive yet characteristic tool) from the University of Oxford.

Why speech features?

Speech is very predictive and characteristic of Parkinson’s disease; almost every Parkinson’s patient experiences severe vocal degradation (inability to produce sustained phonations, tremor, hoarseness), so it makes sense to use voice to diagnose the disease. Voice analysis has the added benefit of being non-invasive, inexpensive, and very easy to extract clinically.

Background

Parkinson's Disease

Parkinson’s is a progressive neurodegenerative condition resulting from the death of the dopamine-containing cells of the substantia nigra (which plays an important role in movement). Symptoms include “frozen” facial features, bradykinesia (slowness of movement), akinesia (impairment of voluntary movement), tremor, and voice impairment. Typically, by the time the disease is diagnosed, 60% of nigrostriatal neurons have degenerated, and 80% of striatal dopamine has been depleted.

Performance Metrics

TP = true positive, FP = false positive, TN = true negative, FN = false negative
Accuracy: (TP+TN)/(P+N), where P and N are the total numbers of positive and negative instances
Matthews Correlation Coefficient: 1=perfect, 0=random, -1=completely inaccurate

Algorithms Employed

Logistic Regression (LR): Uses the sigmoid logistic equation with weights (coefficient values) and biases (constants) to model the probability of a certain class for binary classification. An output of 1 represents one class, and an output of 0 represents the other. Training the model learns the weights and biases that best fit the training data.
Linear Discriminant Analysis (LDA): Assumes that the data is Gaussian and each feature has the same variance. LDA estimates the mean and variance for each class from the training data, and then uses properties of statistics (Bayes' theorem, the Gaussian distribution, etc.) to compute the probability of a particular instance belonging to a given class. The class with the largest probability is the prediction.
k Nearest Neighbors (KNN): Makes predictions about the validation set using the entire training set. KNN makes a prediction about a new instance by searching through the entire training set to find the k “closest” instances. “Closeness” is determined using a proximity measurement (Euclidean distance) across all features. The class that the majority of the k closest instances belong to is the class that the model predicts for the new instance.
Decision Tree (DT): Represented by a binary tree, where each internal node represents an input variable and a split point, and each leaf node contains an output used to make a prediction.
Neural Network (NN): Loosely models the way the human brain makes decisions. Each neuron takes in one or more inputs, and then uses an activation function to process the inputs with weights and biases to produce an output. Neurons can be arranged into layers, and multiple layers can form a network to model complex decisions. Training the network involves using the training instances to optimize the weights and biases.
Naive Bayes (NB): Simplifies the calculation of probabilities by assuming that all features are independent of one another (a strong but effective assumption). Employs Bayes' theorem to calculate the probabilities that the instance to be predicted is in each class, then finds the class with the highest probability.
Gradient Boost (GB): Generally used when seeking a model with very high predictive performance. Used to reduce bias and variance (“error”) by combining multiple “weak learners” (not very good models) to create a “strong learner” (high performance model). Involves 3 elements: a loss function (error function) to be optimized, a weak learner (decision tree) to make predictions, and an additive model to add trees to minimize the loss function. Gradient descent is used to minimize error after adding each tree (one by one).

Engineering Goal

Produce a machine learning model to diagnose Parkinson’s disease given various features of a patient’s speech with at least 90% accuracy and/or a Matthews Correlation Coefficient of at least 0.9. Compare various algorithms and parameters to determine the best model for predicting Parkinson’s.
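
The two metrics named in the engineering goal can be computed directly from the four confusion counts. The sketch below is a minimal illustration rather than the project's own code: the function names and the example counts are hypothetical, and returning 0.0 when the MCC denominator is zero is a common convention assumed here.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all instances classified correctly: (TP + TN) / (P + N)."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from confusion counts.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).
    Returns 0.0 when the denominator is zero (assumed convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical confusion counts for a 49-instance validation set.
tp, tn, fp, fn = 36, 11, 1, 1
print(f"accuracy = {accuracy(tp, tn, fp, fn):.3f}")
print(f"MCC      = {matthews_corrcoef(tp, tn, fp, fn):.3f}")
```

Because the dataset is imbalanced, the MCC can be noticeably lower than the accuracy for the same predictions, which is one reason to report both.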

Dataset Description

Source: the University of Oxford
195 instances (147 subjects with Parkinson’s, 48 without Parkinson’s)
22 features (elements that are possibly characteristic of Parkinson’s, such as frequency, pitch, amplitude / period of the sound wave)
1 label (1 for Parkinson’s, 0 for no Parkinson’s)

Project Pipeline

[Figure: pipeline]

Summary of Procedure

Split the Oxford Parkinson’s Dataset into two parts: one for training, one for validation (to evaluate how well the model performs)
Train each of the following algorithms with the training set: Logistic Regression, Linear Discriminant Analysis, k Nearest Neighbors, Decision Tree, Neural Network, Naive Bayes, Gradient Boost
Evaluate results using the validation set
Repeat for the following training set to validation set splits: 80% training / 20% validation, 75% / 25%, and 70% / 30%
Repeat for a rescaled version of the dataset (scale all the numbers in the dataset to a range from 0 to 1: this helps to reduce the effect of outliers)
Conduct 5 trials and average the results

Data

[Figures: a_o, a_r, m_o, m_r]

Data Analysis

In general, the models tended to perform the best (both in terms of accuracy and Matthews Correlation Coefficient) on the rescaled dataset with a 75-25 train-test split. The two highest performing algorithms, k Nearest Neighbors and the Neural Network, both achieved an accuracy of 98%. The NN achieved an MCC of 0.96, while KNN achieved an MCC of 0.94. These figures outperform most results in the existing literature and significantly outperform current methods of diagnosis.

Conclusion and Significance

These robust results suggest that a machine learning approach can indeed be implemented to significantly improve diagnosis methods of Parkinson’s disease. Given the necessity of early diagnosis for effective treatment, my machine learning models provide a very promising alternative to the current, rather ineffective method of diagnosis.
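
The procedure summarized above (rescale every feature to the range 0 to 1, split into training and validation sets, train, evaluate, repeat for several split ratios) can be sketched in a few lines. This is a minimal illustration using only the Python standard library and a small synthetic dataset, not the project's code or the Oxford data: the function names `rescale`, `train_validation_split`, and `knn_predict`, the two made-up features, and the choice of k = 3 are all assumptions.

```python
import math
import random
from collections import Counter


def rescale(rows):
    """Min-max rescale every feature column to the range [0, 1]."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
        for row in rows
    ]


def train_validation_split(X, y, train_fraction, seed=0):
    """Shuffle the instances and split them into training and validation parts."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = int(round(train_fraction * len(X)))
    train, val = idx[:cut], idx[cut:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in val], [y[i] for i in val])


def knn_predict(X_train, y_train, x, k=3):
    """Majority class among the k training instances closest to x (Euclidean)."""
    nearest = sorted(zip(X_train, y_train), key=lambda p: math.dist(p[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Synthetic stand-in data: two hypothetical speech features per instance.
rng = random.Random(1)
X = [[rng.gauss(150, 10), rng.gauss(0.006, 0.001)] for _ in range(30)]
X += [[rng.gauss(200, 10), rng.gauss(0.003, 0.001)] for _ in range(30)]
y = [1] * 30 + [0] * 30

X = rescale(X)
for fraction in (0.80, 0.75, 0.70):
    X_tr, y_tr, X_va, y_va = train_validation_split(X, y, fraction)
    correct = sum(knn_predict(X_tr, y_tr, x) == label for x, label in zip(X_va, y_va))
    print(f"{fraction:.0%} training: validation accuracy {correct / len(y_va):.2f}")
```

The sketch rescales the whole dataset before splitting, as the procedure describes. Computing the minimum and maximum on the training part only is a stricter alternative that keeps validation data from influencing the scaling.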
Current methods of early diagnosis are only 53% accurate, while my machine learning model produces 98% accuracy. This 45-percentage-point increase is critical because an accurate, early diagnosis is needed to effectively treat the disease. Typically, by the time the disease is diagnosed, 60% of nigrostriatal neurons have degenerated, and 80% of striatal dopamine has been depleted. With an earlier diagnosis, much of this degradation could have been slowed or treated. My results are very significant because Parkinson’s affects over 10 million people worldwide who could benefit greatly from an early, accurate diagnosis. Not only is my machine learning approach more accurate, it is also more scalable, less expensive, and therefore more accessible to people who might not have access to established medical facilities and professionals. The diagnosis is also much simpler, requiring only a 10-15 second voice recording and producing an immediate diagnosis.

Future Research

Given more time and resources, I would investigate the following:
Create a mobile application which would allow users to record their voice, extract the necessary vocal features, and feed them into my machine learning model to diagnose Parkinson’s.
Use larger datasets in conjunction with the University of Oxford dataset.
Tune and improve my models even further to achieve even better results.
Investigate different structures and types of neural networks.
Construct a novel algorithm specifically suited for the prediction of Parkinson’s.
Generalize my findings and algorithms for all types of dementia disorders, such as Alzheimer’s.
himanshub1007 / Alzhimers Disease Prediction Using Deep Learning

# AD-Prediction

Convolutional Neural Networks for Alzheimer's Disease Prediction Using Brain MRI Image

## Abstract

Alzheimer's disease (AD) is characterized by severe memory loss and cognitive impairment. It is associated with significant changes in brain structure, which can be measured by magnetic resonance imaging (MRI) scans. The observable preclinical structural changes provide an opportunity for early AD detection using image classification tools, such as convolutional neural networks (CNN). However, most AD-related studies to date have been limited by sample size. Finding an efficient way to train an image classifier on limited data is critical. In our project, we explored different CNN-based transfer-learning methods for AD prediction from brain structural MRI images. We find that both a pretrained 2D AlexNet with a 2D-representation method and a simple neural network with a pretrained 3D autoencoder improved the prediction performance compared to a deep CNN trained from scratch. The pretrained 2D AlexNet performed even better (**86%**) than the 3D CNN with autoencoder (**77%**).

## Method

#### 1. Data

In this project, we used public brain MRI data from the **Alzheimers Disease Neuroimaging Initiative (ADNI)** study. ADNI is an ongoing, multicenter cohort study that started in 2004. It focuses on understanding the diagnostic and predictive value of Alzheimer's disease specific biomarkers. The ADNI study has three phases: ADNI1, ADNI-GO, and ADNI2. Both ADNI1 and ADNI2 recruited new AD patients and normal controls as research participants. Our data included a total of 686 structural MRI scans from both the ADNI1 and ADNI2 phases, with 310 AD cases and 376 normal controls. We randomly divided the total sample into a training dataset (n = 519), a validation dataset (n = 100), and a testing dataset (n = 67).

#### 2. Image preprocessing

Image preprocessing was conducted using Statistical Parametric Mapping (SPM) software, version 12.
The original MRI scans were first skull-stripped and segmented using a segmentation algorithm based on 6-tissue probability mapping and then normalized to the International Consortium for Brain Mapping template of European brains using affine registration. Other configuration steps include bias, noise, and global intensity normalization. The standard preprocessing process outputs 3D image files with a uniform size of 121x145x121. Skull-stripping and normalization ensured the comparability between images by transforming the original brain image into a standard image space, so that the same brain substructures are aligned at the same image coordinates for different participants. Diluted or enhanced intensity was used to compensate for the structural changes. In our project, we used both the whole brain (including both grey matter and white matter) and grey matter only.

#### 3. AlexNet and Transfer Learning

Convolutional Neural Networks (CNN) are very similar to ordinary Neural Networks. A CNN consists of an input and an output layer, as well as multiple hidden layers. The hidden layers are either convolutional, pooling or fully connected. ConvNet architectures make the explicit assumption that the inputs are images, which allows us to encode certain properties into the architecture. These then make the forward function more efficient to implement and vastly reduce the number of parameters in the network.

#### 3.1. AlexNet

The net contains eight layers with weights; the first five are convolutional and the remaining three are fully connected. The overall architecture is shown in Figure 1. The output of the last fully-connected layer is fed to a 1000-way softmax which produces a distribution over the 1000 class labels. AlexNet maximizes the multinomial logistic regression objective, which is equivalent to maximizing the average across training cases of the log-probability of the correct label under the prediction distribution.
The kernels of the second, fourth, and fifth convolutional layers are connected only to those kernel maps in the previous layer which reside on the same GPU (as shown in Figure 1). The kernels of the third convolutional layer are connected to all kernel maps in the second layer. The neurons in the fully connected layers are connected to all neurons in the previous layer. Response-normalization layers follow the first and second convolutional layers. Max-pooling layers follow both response-normalization layers as well as the fifth convolutional layer. The ReLU non-linearity is applied to the output of every convolutional and fully-connected layer.

The first convolutional layer filters the 224x224x3 input image with 96 kernels of size 11x11x3 with a stride of 4 pixels (this is the distance between the receptive field centers of neighboring neurons in a kernel map). The second convolutional layer takes as input the (response-normalized and pooled) output of the first convolutional layer and filters it with 256 kernels of size 5x5x48. The third, fourth, and fifth convolutional layers are connected to one another without any intervening pooling or normalization layers. The third convolutional layer has 384 kernels of size 3x3x256 connected to the (normalized, pooled) outputs of the second convolutional layer. The fourth convolutional layer has 384 kernels of size 3x3x192, and the fifth convolutional layer has 256 kernels of size 3x3x192. The fully-connected layers have 4096 neurons each.

#### 3.2. Transfer Learning

Training an entire Convolutional Network from scratch (with random initialization) is impractical [14] because it is relatively rare to have a dataset of sufficient size. An alternative is to pretrain a ConvNet on a very large dataset (e.g. ImageNet), and then use the ConvNet either as an initialization or as a fixed feature extractor for the task of interest.
Typically, there are three major transfer learning scenarios:

**ConvNet as fixed feature extractor:** We can take a ConvNet pretrained on ImageNet, remove the last fully-connected layer, and then treat the rest of the network as a fixed feature extractor for the target dataset. In AlexNet, this would be a 4096-D vector. These features are usually called CNN codes. Once we have these features, we can train a linear classifier (e.g. linear SVM or Softmax classifier) for our target dataset.

**Fine-tuning the ConvNet:** Another idea is to not only replace the last fully-connected layer in the classifier, but to also fine-tune the parameters of the pretrained network. Due to overfitting concerns, we can fine-tune only some higher-level part of the network. This suggestion is motivated by the observation that the earlier layers of a ConvNet contain more generic features (e.g. edge detectors or color blob detectors) that can be useful for many kinds of tasks, while the later layers of the network become progressively more specific to the details of the classes contained in the original dataset.

**Pretrained models:** The released pretrained model is usually the final ConvNet checkpoint, so it is common to see people use the network for fine-tuning.

#### 4. 3D Autoencoder and Convolutional Neural Network

We take a two-stage approach where we first train a 3D sparse autoencoder to learn filters for convolution operations, and then build a convolutional neural network whose first layer uses the filters learned with the autoencoder.

#### 4.1. Sparse Autoencoder

An autoencoder is a 3-layer neural network that is used to extract features from an input such as an image. Sparse representations can provide a simple interpretation of the input data in terms of a small number of parts by extracting the structure hidden in the data.
The autoencoder has an input layer, a hidden layer and an output layer. The input and output layers have the same number of units, while the hidden layer contains more units for a sparse and overcomplete representation. The encoder function maps the input x to a representation h, and the decoder function maps the representation h back to an output that approximates x. In our problem, we extract 3D patches from scans as the input to the network. The decoder function aims to reconstruct the input from the hidden representation h.

#### 4.2. 3D Convolutional Neural Network

Training the 3D convolutional neural network (CNN) is the second stage. The CNN we use in this project has one convolutional layer, one pooling layer, two linear layers, and finally a log softmax layer. After training the sparse autoencoder, we take the weights and biases of the encoder from the trained model, and use them as the 3D filters of the 3D convolutional layer of the 1-layer convolutional neural network. Figure 2 shows the architecture of the network.

#### 5. Tools

In this project, we used Nibabel for MRI image processing and PyTorch for the neural network implementation.
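
As a rough check on the kernel sizes quoted in section 3.1, the number of learnable parameters in each AlexNet convolutional layer can be worked out from the kernel counts and shapes alone. The sketch below is a small arithmetic illustration, not part of the project's code: the helper name `conv_params` is hypothetical, and one bias term per kernel is an assumption.

```python
# (name, number of kernels, kernel height, kernel width, kernel depth),
# taken from the layer description in section 3.1.
CONV_LAYERS = [
    ("conv1", 96, 11, 11, 3),
    ("conv2", 256, 5, 5, 48),
    ("conv3", 384, 3, 3, 256),
    ("conv4", 384, 3, 3, 192),
    ("conv5", 256, 3, 3, 192),
]


def conv_params(n_kernels, height, width, depth, bias=True):
    """Weights in a convolutional layer, plus one bias per kernel if requested."""
    weights = n_kernels * height * width * depth
    return weights + (n_kernels if bias else 0)


total = 0
for name, n_kernels, height, width, depth in CONV_LAYERS:
    count = conv_params(n_kernels, height, width, depth)
    total += count
    print(f"{name}: {count:,} parameters")
print(f"all convolutional layers: {total:,} parameters")
```

The counts depend only on kernel shape and number, not on the image size, which is why convolutional layers need far fewer parameters than fully connected layers applied to the same input.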
attaoveisi / AFISMC
A new observer-based adaptive fuzzy integral sliding mode controller (AFISMC) is proposed based on the Lyapunov stability theorem. The plant under study is subjected to a square-integrable disturbance and is assumed to have mismatch uncertainties in both the state and input matrices. In addition, a norm-bounded time-varying term is introduced to address the possible existence of un-modelled/nonlinear dynamics. Based on the classical sliding mode controller (SMC), the equivalent control effort is obtained to satisfy the sufficient requirement of SMC, and then the control law is modified to guarantee the reachability of the system trajectory to the sliding manifold. The sliding surface is compensated based on the observed states in the form of a linear matrix inequality (LMI). In order to relax the norm-bounded constraints on the control law and solve the chattering problem of SMC, a fuzzy logic (FL) inference mechanism is combined with the controller. An adaptive law is then introduced to tune the parameters of the fuzzy system on-line. Finally, to evaluate the validity of the controller and the robust performance of the closed-loop system, the proposed regulator is implemented on a real-time mechanical vibrating system.
OUCyf / MCMTpy
MCMTpy is a Python package designed for seismic source study. It provides functionality for focal mechanism inversion and source parameters analysis.
tobyli / Sugilite Development
SUGILITE is a new programming-by-demonstration (PBD) system that enables users to create automation on smartphones. SUGILITE uses Android’s accessibility API to support automating arbitrary tasks in any Android app (or even across multiple apps). When the user gives verbal commands that SUGILITE does not know how to execute, the user can demonstrate by directly manipulating the regular apps’ user interface. By leveraging the verbal instructions, the demonstrated procedures, and the apps’ UI hierarchy structures, SUGILITE can automatically generalize the script from the recorded actions, so SUGILITE learns how to perform tasks with different variations and parameters from a single demonstration. Extensive error handling and context checking support forking the script when new situations are encountered, and provide robustness if the apps change their user interface. Our lab study suggests that users with little or no programming knowledge can successfully automate smartphone tasks using SUGILITE.
ShelvanLee / XFEM

# XFEM_Fracture2D

### Description

This is a Matlab program that can be used to solve fracture problems involving arbitrary multiple crack propagations in a 2D linear-elastic solid based on the principle of minimum potential energy. The extended finite element method is used to discretise the solid continuum considering cracks as discontinuities in the displacement field. To this end, a strong discontinuity enrichment and a square-root singular crack tip enrichment are used to describe each crack. Several crack growth criteria are available to determine the evolution of cracks over time; apart from the classic maximum tension (or hoop-stress) criterion, the minimum total energy criterion and the local symmetry criterion are implemented implicitly with respect to the discrete time-stepping.

### Key features

* *Fast.* The stiffness matrix and the force vector (i.e. the equations' system) and the enrichment tracking data structures are updated at each time step only with respect to the changes in the fracture topology. As a result, the major part of the computational expense lies in the solution of the linear system of equations rather than in the post-processing of the solution or in the assembly and updating of the equations. As Matlab offers fast and robust direct solvers, the computational times are reasonably fast.

* *Robust.* Suitable for multiple crack propagations with intersections. Furthermore, the stress intensity factors are computed robustly via the interaction integral approach (with the inclusion of the terms to account for crack surface pressure, residual stresses or strains). The minimum total energy criterion and the principle of local symmetry are implemented implicitly in time. The energy release rates are computed based on the stiffness derivative approach using algebraic differentiation (rather than finite differencing of the potential energy).
On the other hand, the crack growth direction based on the local symmetry criterion is determined such that the local mode-II stress intensity factor vanishes; the change in a crack tip kink angle is approximated using the ratio of the crack tip stress intensity factors.

* *Easy to run.* Each job has its own input files which are independent from those of all other jobs. The code especially lends itself to running parametric studies. Various results can be saved relating to the fracture geometry, fracture mechanics parameters, and the elastic fields in the solid domain. An extensive visualisation library is available for plotting results.

### Instructions

1. Get started by running the demo to showcase some of the capabilities of the program and to determine if it can be useful for you. At the Matlab command line enter:

   ```Matlab
   >> RUN_JOBS.m
   ```

   This will execute a series of jobs located inside the *jobs directory* `./JOBS_LIBRARY/`. These jobs do not take very long to execute (around 5 minutes in total).

2. Subsequently, you can pick one of the jobs inside `./JOBS_LIBRARY/` by defining the job title:

   ```Matlab
   >> job_title = 'several_cracks/edge/vertical_tension'
   ```

3. Then you can open all the relevant scripts for this job as follows:

   ```Matlab
   >> open_job
   ```

   The following input scripts for the *job* will be opened in the Matlab editor:

   1. `JOB_MAIN.m`: This is the job's main script. It is called when executing `RUN_JOB` (or `RUN_JOBS`) and acts like a wrapper. Notably, it can serve as a convenient interface to run parametric studies and to save intermediate simulation results.
   2. `Input_Scope.m`: This defines the scope of the simulation, from which crack growth criteria to use, to what to compute and what results to show via plots and/or movies. To put it simply, the script is a set of "switches" that tell the program what the user wants to be done.
   3. `Input_Material.m`: Defines the material's elastic properties in different regions or layers (called "phases") of the computational domain. Moreover, it defines the fracture toughness of the material (assumed to be constant in all material phases).
   4. `Input_Crack.m`: Defines the initial crack geometry.
   5. `Input_BC.m`: Defines boundary conditions, such as displacements, tractions, crack surface pressure (assumed to be constant in all cracks), body loads (e.g. gravity, pre-stress or pre-strain).
   6. `Mesh_make.m`: In-house structured mesh generator for rectangular domains using either linear triangle or bilinear quadrilateral elements. It is possible to mesh horizontal layers using different mesh sizes.
   7. `Mesh_read.m`: Gmsh based mesh reader for version-1 mesh files. Of course you can use your own mesh reader provided the output variables are of the correct format (see later).
   8. `Mesh_file.m`: Specifies the mesh input file (.msh). At the moment, only Gmsh mesh files of version-1 are allowed.

### Mesh_file.m

A mesh file needs to be able to output the following data or variables:

* `mNdCrd`: Node coordinates, size = `[nNdStd, 2]`
* `mLNodS`: Element connectivities, size = `[nElemn,nLNodS]`
* `vElPhz`: Element material phase (or region) ID's, size = `[nElemn,1]`
* `cBCNod`: cell of boundary nodes, cell size = `{nBound,1}`, cell element size = `[nBnNod,2]`

Example mesh files are located in `./JOBS_LIBRARY/`. Gmsh version-1 file format is described [here](http://www.manpagez.com/info/gmsh/gmsh-2.4.0/gmsh_60.php).

### Additional notes

* Global variables are defined in `.\Routines_AuxInput\Declare_Global.m`
* External libraries are `.\Other_Libs\distmesh` and `.\Other_Libs\mesh2d`

### References

Two external meshing libraries are used for the local mesh refinement and remeshing at the crack tip during crack propagation or prior to a crack intersection with another crack or with a boundary of the domain.
Specifically, these libraries, which are located in `.\Other_Libs\`, are the following:

* [*mesh2d*](https://people.sc.fsu.edu/~jburkardt/m_src/mesh2d/mesh2d.html) by Darren Engwirda
* [*distmesh*](http://persson.berkeley.edu/distmesh/) by Per-Olof Persson and Gilbert Strang.

### Issues and Support

For support or questions please email [sutula.danas@gmail.com](mailto:sutula.danas@gmail.com).

### Authors

Danas Sutula, University of Luxembourg, Luxembourg. If you find this code useful, we kindly ask that you consider citing us.

* [Minimum energy multiple crack propagation](http://hdl.handle.net/10993/29414)
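
To make the crack growth criteria mentioned in this README more concrete, the sketch below evaluates the textbook closed-form kink angle of the maximum hoop-stress criterion together with a first-order small-angle estimate based on the ratio of the stress intensity factors. It is written in Python purely for illustration and is not taken from the Matlab program: the function names, the sign convention, and the assumption that the program's approximation has the form of -2 K_II / K_I are all hypothetical.

```python
import math


def kink_angle_max_hoop_stress(k1, k2):
    """Kink angle (radians) from the maximum hoop-stress criterion.

    Textbook closed form:
    theta = 2 * atan((K_I - sqrt(K_I^2 + 8 K_II^2)) / (4 K_II)).
    """
    if k2 == 0:
        return 0.0  # pure mode I: the crack grows straight ahead
    return 2.0 * math.atan((k1 - math.sqrt(k1 ** 2 + 8.0 * k2 ** 2)) / (4.0 * k2))


def kink_angle_small_mode2(k1, k2):
    """First-order estimate for |K_II| much smaller than K_I: theta = -2 K_II / K_I."""
    return -2.0 * k2 / k1


# Hypothetical stress intensity factors at a crack tip.
k1, k2 = 1.0, 0.05
print(f"max hoop stress: {math.degrees(kink_angle_max_hoop_stress(k1, k2)):.3f} deg")
print(f"small-angle estimate: {math.degrees(kink_angle_small_mode2(k1, k2)):.3f} deg")
```

For small mode-II contributions the two values nearly coincide, which is why an incremental update based on the ratio of the stress intensity factors can be iterated until the local mode-II factor vanishes.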
OSU-MLB / ViT PEFT Vision
[CVPR'25 (Highlight)] Lessons and Insights from a Unifying Study of Parameter-Efficient Fine-Tuning (PEFT) in Visual Recognition
ngriere / High Frequency Data Order Book Analyser
If you are a professional, a retail trader, or an organization trading on a financial market, you know that the data given on online platforms is not precise enough. Banks and huge financial institutions use powerful computers to transact a large number of orders at very high speed, so thousands of buy and sell orders are launched within a fraction of a second. So why can’t you see it? High Frequency Data analyser is an innovative program that helps you analyze high frequency data in order to increase your understanding of the market. High Frequency Data takes trading to the next level, transforming your order book into an interactive visual map. You can now observe visual trading patterns and create new trading strategies. You can choose the parameters you are interested in (volume, ask/bid). You can observe executed orders and analyse the impact of these executions on the market. You can pause and save the graph whenever you want in order to study a situation in more depth. It displays real-time information and offers you many market indicators to give you an accurate view of the market. You can visualize the available shares for each value with a very deep order book (move deeper into the order book). You can also export your data to Excel and draw your most significant graphs. For a deeper and more precise order book analysis, download High Frequency Data analyser now!
sayantann11 / Clustering Modelsfor ML
Clustering in Machine Learning

Introduction to Clustering

Clustering is a type of unsupervised learning method. An unsupervised learning method is a method in which we draw inferences from datasets consisting of input data without labelled responses. Generally, it is used as a process to find meaningful structure, explanatory underlying processes, generative features, and groupings inherent in a set of examples. Clustering is the task of dividing the population or data points into a number of groups such that data points in the same group are more similar to each other than to the data points in other groups. It is basically a grouping of objects on the basis of the similarity and dissimilarity between them. For example, the data points in the graph below that are clustered together can be classified into one single group; we can distinguish the clusters and identify that there are 3 clusters in the picture.

[Figure: data points forming 3 clusters]

Clusters do not need to be spherical. An example is DBSCAN (Density-Based Spatial Clustering of Applications with Noise), in which data points are clustered using the basic concept that a data point lies within a given constraint from the cluster centre. Various distance methods and techniques are used for the calculation of outliers.

Why Clustering?

Clustering is important because it determines the intrinsic grouping among the unlabeled data present. There is no single criterion for a good clustering; it depends on the user and on which criteria satisfy their need. For instance, we could be interested in finding representatives for homogeneous groups (data reduction), in finding “natural clusters” and describing their unknown properties (“natural” data types), in finding useful and suitable groupings (“useful” data classes) or in finding unusual data objects (outlier detection).
A clustering algorithm must make some assumptions about what constitutes the similarity of points, and each assumption produces different and equally valid clusters. Clustering Methods:
Density-Based Methods: These methods consider clusters to be dense regions that have some internal similarity and differ from the lower-density regions of the space. These methods have good accuracy and the ability to merge two clusters. Examples: DBSCAN (Density-Based Spatial Clustering of Applications with Noise), OPTICS (Ordering Points to Identify Clustering Structure), etc.
Hierarchical Based Methods: The clusters formed by these methods make up a tree-type structure based on the hierarchy. New clusters are formed using previously formed ones. These methods are divided into two categories: agglomerative (bottom-up approach) and divisive (top-down approach). Examples: CURE (Clustering Using Representatives), BIRCH (Balanced Iterative Reducing Clustering and using Hierarchies), etc.
Partitioning Methods: These methods partition the objects into k clusters, and each partition forms one cluster. They are used to optimize an objective criterion (a similarity function), for example when distance is the major parameter. Examples: K-means, CLARANS (Clustering Large Applications based upon Randomized Search), etc.
Grid-based Methods: In these methods the data space is divided into a finite number of cells that form a grid-like structure. All clustering operations performed on these grids are fast and independent of the number of data objects. Examples: STING (Statistical Information Grid), WaveCluster, CLIQUE (CLustering In Quest), etc.
Clustering Algorithms: K-means clustering algorithm: It is the simplest unsupervised learning algorithm that solves the clustering problem. The K-means algorithm partitions n observations into k clusters, where each observation belongs to the cluster with the nearest mean, which serves as a prototype of the cluster.
Applications of Clustering in different fields: Marketing: It can be used to characterize and discover customer segments for marketing purposes. Biology: It can be used for classification among different species of plants and animals. Libraries: It is used to cluster books on the basis of topics and information. Insurance: It is used to understand customers and their policies and to identify fraud. City Planning: It is used to group houses and to study their values based on their geographical locations and other factors. Earthquake studies: By learning which areas are affected by earthquakes, we can determine the dangerous zones.
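The K-means description above (each observation belongs to the cluster with the nearest mean) can be illustrated with a minimal pure-Python sketch. The function name `kmeans`, the sample points, the fixed iteration cap, and the choice of initial centroids are all illustrative assumptions, not code taken from the repository.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Lloyd's iteration: assign points to the nearest mean, then recompute the means."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for every point.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its assigned points.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centroids.append(old)  # an empty cluster keeps its old centroid
        if new_centroids == centroids:
            break  # assignments can no longer change
        centroids = new_centroids
    return labels, centroids

# Hypothetical 2-D data with two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centroids = kmeans(points, centroids=[points[0], points[3]])
print(labels)
```

The result depends on the initial centroids, which is one reason practical implementations usually restart from several initializations.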
dipankarsk / Feature Selection Hybrid
Intrusion detection is a technique for identifying abnormal system behavior caused by an attack. The unusual behavior of the environment is identified, and methods are then developed to classify and recognize attacks. A data set containing a large number of records may sometimes decrease a classifier's performance because of redundant data. Other problems include memory requirements and processing power, so we need to reduce either the amount of data or the number of records. Feature selection techniques are used to reduce the vertical size of the data set. This project makes a comparative study of Particle Swarm Optimization (PSO), the Genetic Algorithm (GA), and a hybrid of the two. PSO, being a simpler swarm algorithm, works for feature selection problems, but because it is problem dependent, and because of its stochastic approach, it is less efficient than GA in terms of error reduction. In standard PSO, the non-oscillatory route can quickly cause a particle to stagnate, and the algorithm may prematurely converge on suboptimal solutions that are not even guaranteed to be local optima. A further drawback is that stochastic approaches have problem-dependent performance. This dependency usually results from the parameter settings in each algorithm: different parameter settings for a stochastic search algorithm result in high performance variance. In this project, modification strategies for PSO using GA are proposed. Experimental results show that GA performs better than PSO for feature selection in terms of error reduction, whereas the hybrid outperforms both models in terms of error reduction.
shishirdas / Rain Fall Data Analysis Using Data Science
Context: Rainfall is crucial for any type of agricultural task. Climate-related data is important for analysing agriculture and crop seeding, where the data can be used to predict rainfall in different seasons and for different types of crops. The developed application can be found at http://ml.bigalogy.com/ Paper: http://dspace.uiu.ac.bd/handle/52243/178 Abstract: Mankind has been attempting to predict the weather since prehistory, and for good reason: to know when to plant crops, when to build, and when to prepare for drought and flood. In a nation such as Bangladesh, being able to predict the weather, especially rainfall, has never been so vitally important. The proposed research work seeks to produce a prediction model for rainfall using machine learning algorithms. The base data for this work was collected from the Bangladesh Meteorological Department. The work focuses mainly on developing models for long-term rainfall prediction for Bangladesh divisions and districts (weather stations). Rainfall prediction is very important for the Bangladesh economy and for day-to-day life. Both scarce and heavy rainfall affect rural and urban life to a great extent as the climate pattern changes. Unusual rainfall and a long-lasting rainy season are major factors to take into account. We want to see whether enough unusual behavior is taking place to amount to a new climatological pattern. Agriculture depends on rain, and heavy rainfall frequently causes floods that lead to great crop losses. Rainfall is a very complex phenomenon that depends on various atmospheric, oceanic, and geographical parameters. The relationship between these parameters and rainfall is unstable. In addition, this changing climatological behavior makes existing meteorological forecasting less usable for its users.
Initially, linear regression models were developed for monthly rainfall prediction at station and national level by day, month, and year. Here humidity, temperature, and wind parameters are used as predictors. The study is further extended by developing another popular regression algorithm, Random Forest Regression. After that, a few classification algorithms were used for model building, training, and prediction: Naive Bayes Classification, Decision Tree Classification (Entropy and Gini), and Random Forest Classification. In all model building and training, the predictor parameters were Station, Year, Month, and Day. Because the effect of the parameters that influence rainfall is embedded in the rainfall itself, rainfall was the label (dependent variable) in these models. The developed and trained model is capable of predicting rainfall in advance for a month of a given year for a given area (the areas used here are the stations whose weather parameter values are measured by the Bangladesh Meteorological Department). The accuracy of rainfall estimation is above 65%, and the accuracy percentage varies from algorithm to algorithm. Two regression models and three classification models have been developed for rainfall prediction at 33 Bangladeshi weather stations. The Apache Spark library was used for machine learning in the Scala programming language. The main idea behind using both classification and regression analysis is to compare the prediction output of the different types of algorithms, along with their predictability and usability. This thesis is a contribution to the effort of rainfall prediction within Bangladesh. It takes the strategy of applying machine learning models to historical weather data gathered in Bangladesh. As part of this work, a web-based software application was written using Apache Spark, Scala, and HighCharts to demonstrate rainfall prediction using multiple machine learning models.
The models are successively improved in rainfall prediction accuracy. Content: The data contains monthly rainfall data for Bangladesh by weather station and year. It covers 46 years (33 weather stations), from 1970 to 2016, and comes in two formats: daily rainfall data and monthly rainfall data. Columns: Station (weather station, along with station index), Year, Month, Day (daily data file only).
alenai97 / PEFT MLLM
Official code and data for the ACL 2024 Findings paper "An Empirical Study on Parameter-Efficient Fine-Tuning for MultiModal Large Language Models".
ita-social-projects / TeachUA
The project aims to promote the Ukrainian language for study clubs in Russian-speaking regions of Ukraine. This is a web application that contains a database of clubs with the Ukrainian language of instruction. Clubs have the opportunity to register on the site and provide information about themselves. Users can search for clubs by various parameters (activity type, location, etc.).
adhikari-statgen-lab / Gwas Power
R functions to calculate the power of GWAS studies for a single associated SNP under various parameters. Suitable for classical (i.e., single-SNP, single-trait) GWAS studies using linear regression models, i.e., for quantitative traits.
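To make the idea of single-SNP power concrete, here is a rough Python sketch of a common textbook approximation for a quantitative trait under linear regression. It is not the repository's R code, and the function name `gwas_power`, its parameters, and the modelling assumptions (additive effect, trait standardized to unit variance, Hardy-Weinberg equilibrium, 1-degree-of-freedom test) are all assumptions made for illustration.

```python
from statistics import NormalDist

def gwas_power(n, maf, beta, alpha=5e-8):
    """Approximate power to detect one SNP in a linear-regression GWAS.

    n     : sample size
    maf   : minor allele frequency of the SNP
    beta  : per-allele effect in trait standard deviations (assumed scale)
    alpha : significance threshold (5e-8 is a conventional genome-wide level)
    """
    std_normal = NormalDist()
    # Fraction of trait variance explained by the SNP under an additive model.
    q2 = 2.0 * maf * (1.0 - maf) * beta ** 2
    if not 0.0 <= q2 < 1.0:
        raise ValueError("the SNP must explain less than all of the trait variance")
    # Non-centrality parameter of the 1-df chi-square test statistic.
    ncp = n * q2 / (1.0 - q2)
    shift = ncp ** 0.5
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    # A 1-df non-central chi-square is the square of a shifted standard normal,
    # so the two-sided power is the sum of both rejection tails.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)

small_study = gwas_power(n=1000, maf=0.3, beta=0.1)
large_study = gwas_power(n=5000, maf=0.3, beta=0.1)
print(small_study, large_study)
```

With all other parameters held fixed, power grows with sample size, allele frequency (up to 0.5), and effect size, which is the kind of trade-off such functions are typically used to explore.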
MichaelTJC96 / Label Flipping Attack
The project aims to evaluate the vulnerability of Federated Learning systems to a targeted data poisoning attack known as the Label Flipping Attack. The project studies the scenario in which a malicious participant can only manipulate the raw training data on their own device. Hence, non-expert malicious participants can achieve poisoning without knowing the model type, the parameters, or the Federated Learning process. In addition, the project analyses the possibility and effectiveness of concealing the tracks while poisoning the raw data of other devices.
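As a minimal illustration of what label flipping means at the data level, the sketch below relabels every example of one class as another class while leaving the features untouched. The helper name `flip_labels`, the toy dataset, and the class names are hypothetical and are not taken from the project.

```python
def flip_labels(dataset, source_label, target_label):
    """Return a poisoned copy in which every `source_label` becomes `target_label`.

    `dataset` is a list of (features, label) pairs; features are left unchanged,
    which is why the attack needs no knowledge of the model or its parameters.
    """
    return [
        (features, target_label if label == source_label else label)
        for features, label in dataset
    ]

# Hypothetical local training data held by one participant.
clean = [([0.1, 0.2], "cat"), ([0.9, 0.8], "dog"), ([0.2, 0.1], "cat")]
poisoned = flip_labels(clean, source_label="cat", target_label="dog")
print([label for _, label in poisoned])
```

A participant who trains on `poisoned` instead of `clean` would then send a biased model update to the server; how strongly that affects the global model is the question the project evaluates.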
abdallahkhairy / GP Data Analysis And ML
Human locomotion affects our activities of daily living. Losing limbs or having neurological disorders with motor deficits can affect quality of life. Gait analysis is the systematic study of human locomotion, which is defined as body movement through aerial, aquatic, or terrestrial space. This analysis has been used to study human ambulation and to register and reconstruct the physical location and orientation of individual limbs, in order to quantify and characterize human locomotion using different gait parameters, including gait activities (such as walking and stair ascending/descending), gait phases, and spatiotemporal parameters of human gait. Additionally, gait analysis parameters can be used to evaluate the functionality of patients and wearable system users. The evaluation is based on the patient's stability, energy consumption, gait symmetry, ability to recover from perturbations, and ability to perform activities of daily living. Many companies develop assistive, wearable, and rehabilitation devices for patients with lower limb neurological disorders. These devices are tested and evaluated inside controlled lab environments. However, the companies do not have enough data on patients' performance in real-world and harsh environments. Collecting large datasets of device users and their gait performance in real environments is notoriously difficult. Additionally, collecting data on less prevalent activities, or on gait activities other than level walking, stair ascending/descending, sitting, standing, etc. on hard surfaces, is rarely attempted. However, gait data could be collected from sources other than traditional gait labs with the help of IoT data collection embedded in the wearable and assistive devices, together with well-established cloud platforms equipped with big-data analytics and data visualization capabilities.
This project aims to develop a cloud platform capable of collecting data from wearable and assistive devices such as prostheses, exoskeletons, gait analysis wearable sensors, etc. using IoT technologies. The platform automatically applies data mining and visualization tools. Additionally, it uses statistical and machine learning techniques to estimate gait events, gait symmetry, gait speed, gait activities, stability, energy consumption, etc. It is also capable of predicting a patient's progress over time. The project is composed of two major components: a hardware component and a software component. In the hardware component, the students will design and implement the IoT device that collects the different readings for gait analysis and sends them to the cloud. In the software component, the students will design and implement a set of algorithms to visualize the collected data, and then design and implement data analytics that automatically analyze the collected data in order to estimate gait events, gait symmetry, and gait speed, classify gait activities, assess stability and energy consumption, etc., and predict the patient's progress over time. Additionally, these data can be used by manufacturers of prosthetic legs to improve their products, as well as by health-care centers to assess patients' performance. The following figures describe the main modules of our graduation project.
kdzimm / PseudoreplicationPaper
Code used to carry out parameter estimation, correlation estimation, type 1 error analysis, and power analysis for our "Pseudoreplication in Single-Cell" study