204 skills found · Page 6 of 7
ruffyleaf / Py Crisp The Cross Industry Standard Process for Data Mining, commonly known by its acronym CRISP-DM, is a process model that describes common approaches data mining experts use to tackle problems.
awsm-research / TutorialSoftware Analytics in Action: A Hands-on Tutorial on Mining, Analyzing, Modelling, and Explaining Software Data
satyamt13 / Project Amazon Reviews NLP Recommender SystemMining, pre-processing, and embedding over 1 million Amazon Movie & TV reviews to build a multi-class Naive Bayes model and later a CNN-LSTM model (which uses the Naive Bayes model as a baseline) to predict ratings from text. The original classifier is interpreted with local surrogate models via the LIME library. LDA topic modeling is used to build a theme-based recommender from the reviews, and a model-based collaborative filtering system using SVD matrix factorization provides a second recommender.
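The multi-class Naive Bayes baseline described in this entry can be sketched in a few lines. This is only a toy illustration of the technique: the function names and the miniature corpus below are invented here, not taken from the project.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """docs: list of (tokens, label). Returns class counts, per-class
    word counts, and the vocabulary."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def predict_nb(tokens, class_counts, word_counts, vocab):
    """Pick the label maximizing log P(label) + sum log P(token | label),
    with Laplace (add-one) smoothing."""
    total = sum(class_counts.values())
    best, best_lp = None, float("-inf")
    for label, n in class_counts.items():
        lp = math.log(n / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for t in tokens:
            lp += math.log((word_counts[label][t] + 1) / denom)
        if lp > best_lp:
            best, best_lp = label, lp
    return best
```

On real review text, the tokens would come from the pre-processing step the entry mentions; the smoothing keeps unseen words from zeroing out a class.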
thukg / AMinerOpenAn open-source community that focuses on developing and publishing elegant algorithms, models and tools for science big data mining and knowledge intelligence with AMiner resources.
Adehghan / UK Road Accident Severity PredictionThis project explores UK road accident data with the goal of predicting accident severity using machine learning. Techniques include clustering (KMeans, DBSCAN), association rule mining (Apriori), and classification models (Random Forest, Decision Tree, Gradient Boosting) enhanced by SMOTE for class imbalance handling.
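The SMOTE step this entry mentions synthesizes minority-class points by interpolating toward a same-class neighbour. The project presumably used a library implementation (e.g. imbalanced-learn); the sketch below, with hypothetical names and toy data, only illustrates that core idea in plain Python.

```python
import random

def smote_sample(minority, k=1, n_new=4, seed=0):
    """Generate synthetic minority-class points: pick a minority point,
    find its k nearest same-class neighbours (Euclidean), and interpolate
    a random fraction of the way toward one of them."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        a = rng.choice(minority)
        nbrs = sorted((p for p in minority if p is not a),
                      key=lambda p: sum((x - y) ** 2 for x, y in zip(a, p)))[:k]
        b = rng.choice(nbrs)
        t = rng.random()
        synthetic.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return synthetic
```

Because each synthetic point lies on a segment between two real minority points, it stays inside the minority region rather than being a plain duplicate, which is what distinguishes SMOTE from naive oversampling.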
lhmtriet / Function Level Vulnerability AssessmentReproduction package of the paper "On the Use of Fine-grained Vulnerable Code Statements for Software Vulnerability Assessment Models" in Mining Software Repositories (MSR) 2022
haroldeustaquio / Data Mining UNAMThis repository showcases projects from the Data Mining course at UNAM, Mexico. It includes analyses of customer behavior, sales transactions, and a sequence-to-sequence model for text generation based on the Harry Potter series, all developed and presented throughout the semester.
mboukabous / Security Intelligence On Exchanged Multimedia Messages Based On Deep LearningDeep learning (DL) approaches use various processing layers to learn hierarchical representations of data. Recently, many methods and designs of natural language processing (NLP) models have shown significant progress, especially in text mining and analysis. Famous models for learning vector-space representations of text include Word2vec, GloVe, and fastText; NLP then took a big step forward when BERT and, more recently, GPT-3 came out. Deep learning algorithms cannot deal with textual data in its natural, typically unstructured form; they require a special representation of the data as input. Natural language text therefore needs to be converted into an internal representation that DL algorithms can read, such as feature vectors, hence the need for representation learning models. These models have made a big leap in recent years. They range from methods that embed words into distributed representations and use a language modeling objective to adjust them as model parameters (like Word2vec, fastText, and GloVe) to more recent transfer learning models (like ELMo, BERT, ULMFiT, XLNet, ALBERT, RoBERTa, and GPT-2). The latter use larger corpora, more parameters, and more computing resources, and instead of assigning each word a fixed vector, they use multilayer neural networks to compute dynamic representations for words according to their context, which is especially useful for words with multiple meanings.
anikethsukhtankar / Bitcoin Mining ElixirThe goal of this project was to use Elixir and the actor model to build a good solution to the bitcoin mining problem that runs well on multi-core machines. Our code was simultaneously run on 8 machines with one functioning as server and the other 7 as workers, and we were able to mine a coin with 7 leading zeroes. Technologies used: Erlang, Elixir
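The proof-of-work search at the heart of this project is a brute-force hunt for a nonce whose hash carries enough leading zeros. The actual code is Elixir with actors distributing the search; the minimal sketch below (names hypothetical) shows the same check in Python for a single worker.

```python
import hashlib

def mine(prefix, zeros, start=0):
    """Try nonces from `start` upward until SHA-256(prefix + nonce)
    has `zeros` leading hex zeros; return (nonce, digest)."""
    target = "0" * zeros
    nonce = start
    while True:
        digest = hashlib.sha256(f"{prefix}{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce, digest
        nonce += 1
```

Distributing the work, as the project did across 8 machines, amounts to giving each worker a disjoint range of `start` values; difficulty grows roughly 16x per extra zero, which is why 7 leading zeros is a meaningful result.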
BlackThompson / HCI Hot Research Topic Analysis And ModelingHCI Hot Research Topic Analysis and Modeling ——NUS Summer Workshop SWS2023 Web Mining
zhengyangca / Blockchain Python SampleTiny blockchain model example in Python with a Flask REST API, implementing a customized cryptocurrency with mining, a distributed ledger, decentralization & consensus
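The hash-linked chain such a model is built on fits in a short sketch. This is a hypothetical toy version (Flask endpoints, consensus, and networking omitted; all names invented here): each block stores the previous block's hash, so tampering anywhere breaks validation.

```python
import hashlib
import json
import time

class Blockchain:
    """Toy chain of dict blocks linked by SHA-256 hashes."""

    def __init__(self):
        self.chain = [self._block(data="genesis", prev_hash="0" * 64, ts=0)]

    def _block(self, data, prev_hash, ts):
        block = {"data": data, "prev_hash": prev_hash, "ts": ts}
        payload = json.dumps(block, sort_keys=True).encode()
        block["hash"] = hashlib.sha256(payload).hexdigest()
        return block

    def add(self, data):
        """Append a block whose prev_hash points at the current tip."""
        block = self._block(data, self.chain[-1]["hash"], time.time())
        self.chain.append(block)
        return block

    def valid(self):
        """Every block must reference the hash of its predecessor."""
        return all(b["prev_hash"] == a["hash"]
                   for a, b in zip(self.chain, self.chain[1:]))
```

A real cryptocurrency adds a proof-of-work condition on each block's hash and a consensus rule for picking among competing chains; this sketch only shows the linkage.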
fossology / FOSSologyMLMachine learning for a FOSSology server: rigel mines data from license conclusions, clearing-expert corrections, and bulk scans, creates a model, and uses that model to provide a new classifier for licenses.
OrysyaStus / UCSD Data Mining CertificateModern databases can contain massive volumes of data. Within this data lies important information that can only be effectively analyzed using data mining. Data mining tools and techniques can be used to predict future trends and behaviors, allowing individuals and organizations to make proactive, knowledge-driven decisions. This expanded Data Mining for Advanced Analytics certificate provides individuals with the skills necessary to design, build, verify, and test predictive data models. Newly updated with added data sets, a robust practicum course, a survey of popular data mining tools, and additional algorithms, this program equips students with the skills to make data-driven decisions in any industry. Students begin by learning foundational data analysis and machine learning techniques for model and knowledge creation. Then students take a deep-dive into the crucial step of cleaning, filtering, and preparing the data for mining and predictive or descriptive modeling. Building upon the skills learned in the previous courses, students will then learn advanced models, machine learning algorithms, methods, and applications. In the practicum course, students will use real-life data sets from various industries to complete data mining projects, planning and executing all the steps of data preparation, analysis, learning and modeling, and identifying the predictive/descriptive model that produces the best evaluation scores. Electives allow students to learn further high-demand techniques, tools, and languages.
lfoppiano / MatSci LumEnMatSci-LumEn: Materials Science Large Language Models Evaluation for text and data mining
MrRobotsAA / FlowMinerFlowMiner: A Powerful GNN Model Based on Flow Correlation Mining for Encrypted Traffic Classification
shuangyinli / TWTMThe code (demo) is about the paper "Tag-Weighted Topic Model for Mining Semi-Structured Documents"
NamrataThakur / Social Network Link PredictionA graph mining problem where the task was to predict a link between given nodes. Engineered features such as Jaccard distance, cosine similarity, shortest path, PageRank, the Adamic-Adar index, HITS score, and Katz centrality. Finally built non-linear models to reach a final F1 score of 0.92.
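Two of the neighbourhood features this entry lists, Jaccard similarity and the Adamic/Adar index, can be computed directly from adjacency lists. A minimal sketch (toy graph and function names invented here, not from the project):

```python
import math

def jaccard(g, u, v):
    """Shared neighbours over all neighbours of u and v."""
    nu, nv = set(g[u]), set(g[v])
    return len(nu & nv) / len(nu | nv)

def adamic_adar(g, u, v):
    """Sum 1/log(degree) over common neighbours, so that
    low-degree shared neighbours count more than hubs."""
    return sum(1 / math.log(len(g[w])) for w in set(g[u]) & set(g[v]))
```

Feature values like these, computed per candidate node pair, become the input columns for the non-linear classifier that scores whether the link exists.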
XuJin1992 / The Research And Implementation Of Data Mining For Geological DataData mining and knowledge discovery, that is, discovering knowledge from huge amounts of data, has broad application prospects. When faced with geological data, however, even the relatively mature existing models show defects in performance and effect. The main reason lies in the inherent characteristics of geological data: high dimensionality, unstructured form, rich interrelations, and so on, which make data modeling, indexing, knowledge representation, storage, and mining far more complicated than for traditional data. Geological data usually come in raster, vector, and other forms; this work focuses on raster data processing. Tobler's first law of geography states that everything is related to everything else, but near things are more related than distant things. Exploiting this spatial correlation of geological data, the author builds an R-tree spatial index and, taking spatial co-location pattern mining as the guiding idea, uses raster scanning to materialize the adjacency relationships between spatial objects as transactions. Spatial co-location pattern mining is thereby reduced to traditional association rule mining, and commonly used association rule algorithms can then be applied to the geological data to find association rules of interest.
A simulation program is used to generate the experimental geological data. The experiments show that the R-tree index significantly speeds up generation of the spatial transaction set; a performance comparison of the classic Apriori and FP-growth algorithms shows FP-growth to be much faster, mainly because Apriori generates a large number of candidate itemsets. The main work of this paper is as follows: (1) To speed up neighborhood search, an R-tree spatial index is built, after a survey of common scenarios for spatial indexing technology and its advantages and disadvantages. (2) Based on an analysis of traditional and spatial association rule mining algorithms, an event-centric co-location pattern mining model is described, and a raster-scan-based co-location rule mining algorithm is proposed: for each center cell, a transaction is built from the cells in its R-neighborhood, reducing the problem to traditional association rule mining. (3) During R-tree insertion, to prevent a leaf-node split from triggering recursive splits that break the one-way traversal, any full node (one holding the maximum M entries) encountered while searching for the insertion position is split first, avoiding later layers of recursive splitting and speeding up insertion. (4) After preprocessing the spatial transaction set, the two classic association rule mining algorithms, Apriori and FP-growth, are implemented and their performance compared.
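The Apriori algorithm compared in this thesis grows candidate itemsets level by level and prunes any candidate with an infrequent subset; the large candidate sets this generates are exactly why FP-growth outruns it. A minimal sketch of Apriori over a transaction set (identifiers hypothetical, not from the thesis code):

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frequent itemset: support count}. Candidates of size k+1
    are unions of frequent size-k sets, pruned by the Apriori property:
    every subset of a frequent itemset must itself be frequent."""
    transactions = [frozenset(t) for t in transactions]
    singletons = {frozenset([i]) for t in transactions for i in t}
    frequent = {}
    level = {c for c in singletons
             if sum(c <= t for t in transactions) >= min_support}
    k = 1
    while level:
        for c in level:
            frequent[c] = sum(c <= t for t in transactions)
        k += 1
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        level = {c for c in candidates
                 if all(frozenset(s) in frequent for s in combinations(c, k - 1))
                 and sum(c <= t for t in transactions) >= min_support}
    return frequent
```

In the thesis setting, each transaction would be the set of feature types found in a raster cell's R-neighborhood, as produced by the raster-scan materialization step.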
Aryia-Behroziuan / Robot LearningIn developmental robotics, robot learning algorithms generate their own sequences of learning experiences, also known as a curriculum, to cumulatively acquire new skills through self-guided exploration and social interaction with humans. These robots use guidance mechanisms such as active learning, maturation, motor synergies and imitation. Association rules: Association rule learning is a rule-based machine learning method for discovering relationships between variables in large databases. It is intended to identify strong rules discovered in databases using some measure of "interestingness".[60] Rule-based machine learning is a general term for any machine learning method that identifies, learns, or evolves "rules" to store, manipulate or apply knowledge. The defining characteristic of a rule-based machine learning algorithm is the identification and utilization of a set of relational rules that collectively represent the knowledge captured by the system. This is in contrast to other machine learning algorithms that commonly identify a singular model that can be universally applied to any instance in order to make a prediction.[61] Rule-based machine learning approaches include learning classifier systems, association rule learning, and artificial immune systems. Based on the concept of strong rules, Rakesh Agrawal, Tomasz Imieliński and Arun Swami introduced association rules for discovering regularities between products in large-scale transaction data recorded by point-of-sale (POS) systems in supermarkets.[62] For example, the rule {onions, potatoes} ⇒ {burger} found in the sales data of a supermarket would indicate that if a customer buys onions and potatoes together, they are likely to also buy hamburger meat.
Such information can be used as the basis for decisions about marketing activities such as promotional pricing or product placements. In addition to market basket analysis, association rules are employed today in application areas including Web usage mining, intrusion detection, continuous production, and bioinformatics. In contrast with sequence mining, association rule learning typically does not consider the order of items either within a transaction or across transactions. Learning classifier systems (LCS) are a family of rule-based machine learning algorithms that combine a discovery component, typically a genetic algorithm, with a learning component, performing either supervised learning, reinforcement learning, or unsupervised learning. They seek to identify a set of context-dependent rules that collectively store and apply knowledge in a piecewise manner in order to make predictions.[63] Inductive logic programming (ILP) is an approach to rule-learning using logic programming as a uniform representation for input examples, background knowledge, and hypotheses. Given an encoding of the known background knowledge and a set of examples represented as a logical database of facts, an ILP system will derive a hypothesized logic program that entails all positive and no negative examples. Inductive programming is a related field that considers any kind of programming language for representing hypotheses (and not only logic programming), such as functional programs. Inductive logic programming is particularly useful in bioinformatics and natural language processing. 
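The support and confidence measures that make a rule like {onions, potatoes} ⇒ {burger} "strong" are simple ratios over the transaction set. A minimal sketch (toy baskets and function name invented here):

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent => consequent.
    Support: fraction of transactions containing both sides.
    Confidence: of the transactions containing the antecedent,
    the fraction also containing the consequent."""
    a, c = set(antecedent), set(consequent)
    n = len(transactions)
    n_a = sum(a <= set(t) for t in transactions)
    n_ac = sum((a | c) <= set(t) for t in transactions)
    return n_ac / n, (n_ac / n_a if n_a else 0.0)
```

Mining then amounts to enumerating candidate rules and keeping those whose support and confidence clear user-chosen thresholds.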
Gordon Plotkin and Ehud Shapiro laid the initial theoretical foundation for inductive machine learning in a logical setting.[64][65][66] Shapiro built their first implementation (Model Inference System) in 1981: a Prolog program that inductively inferred logic programs from positive and negative examples.[67] The term inductive here refers to philosophical induction, suggesting a theory to explain observed facts, rather than mathematical induction, proving a property for all members of a well-ordered set. Models: Performing machine learning involves creating a model, which is trained on some training data and then can process additional data to make predictions. Various types of models have been used and researched for machine learning systems. Artificial neural networks: An artificial neural network is an interconnected group of nodes, akin to the vast network of neurons in a brain; in diagrams of such networks, each circular node represents an artificial neuron and an arrow represents a connection from the output of one artificial neuron to the input of another. Artificial neural networks (ANNs), or connectionist systems, are computing systems vaguely inspired by the biological neural networks that constitute animal brains. Such systems "learn" to perform tasks by considering examples, generally without being programmed with any task-specific rules. An ANN is a model based on a collection of connected units or nodes called "artificial neurons", which loosely model the neurons in a biological brain. Each connection, like the synapses in a biological brain, can transmit information, a "signal", from one artificial neuron to another. An artificial neuron that receives a signal can process it and then signal additional artificial neurons connected to it.
In common ANN implementations, the signal at a connection between artificial neurons is a real number, and the output of each artificial neuron is computed by some non-linear function of the sum of its inputs. The connections between artificial neurons are called "edges". Artificial neurons and edges typically have a weight that adjusts as learning proceeds. The weight increases or decreases the strength of the signal at a connection. Artificial neurons may have a threshold such that the signal is only sent if the aggregate signal crosses that threshold. Typically, artificial neurons are aggregated into layers. Different layers may perform different kinds of transformations on their inputs. Signals travel from the first layer (the input layer) to the last layer (the output layer), possibly after traversing the layers multiple times. The original goal of the ANN approach was to solve problems in the same way that a human brain would. However, over time, attention moved to performing specific tasks, leading to deviations from biology. Artificial neural networks have been used on a variety of tasks, including computer vision, speech recognition, machine translation, social network filtering, playing board and video games and medical diagnosis. Deep learning consists of multiple hidden layers in an artificial neural network. This approach tries to model the way the human brain processes light and sound into vision and hearing. Some successful applications of deep learning are computer vision and speech recognition.[68]
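The computation described here, a non-linear function of the weighted sum of a neuron's inputs, is small enough to sketch directly. The choice of tanh as the non-linearity and all names below are illustrative assumptions, not from any source in this listing:

```python
import math

def neuron(inputs, weights, bias):
    """One artificial neuron: non-linear function (tanh here) of the
    weighted sum of its inputs plus a bias."""
    return math.tanh(sum(w * x for w, x in zip(weights, inputs)) + bias)

def layer(inputs, weight_rows, biases):
    """A layer is just several neurons reading the same inputs;
    its outputs become the inputs of the next layer."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]
```

Stacking calls to `layer`, with the weights adjusted during learning, is the essence of the multi-hidden-layer networks that the deep learning paragraph describes.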
chankwpj / CUROP2016Automatic Analysis of Music Performance Style One fundamental problem in computational music is the analysis and modeling of performance style. Last year's successful CUROP project revealed, through perceptual experiments, that players' control over rhythm is the strongest factor in the perceived quality of a performance (already a publishable result). This year's project will hence investigate the computer analysis of the rhythmic component of performances in more detail, with the following aims: implement and improve upon state-of-the-art beat detection methods; carry out statistical analysis of rhythmic variation on a corpus of performances; train a classifier to distinguish professional from amateur performances; investigate to what extent rhythmic variations are controlled as opposed to random; devise rhythmic style signatures of various performers for style recognition and retrieval; and investigate operations on rhythmic styles, e.g. applying Rachmaninoff's style to one's amateur recording. Solving the above problems is paramount to our understanding of what makes a good performance and what, quantitatively, the differences between professional musicians' styles are. Applications include musicology, teaching, automatic performance of music, and high-level editing of music. This project requires the integration of data mining, machine learning, and digital signal processing techniques, which are closely aligned with the expertise of the two supervisors, Dr Kirill Sidorov and Dr Andrew Jones, who are also experienced musicians. Via this project, the student will learn a variety of digital signal processing and machine learning techniques and will develop enhanced MATLAB programming skills that are increasingly in demand for graduates. The student will work in our lab, with state-of-the-art facilities (powerful audio workstation, digital piano, audio gear).
We will work collaboratively to ensure successful completion, including daily 30-minute meetings and longer weekly review meetings. The student will participate in the recently established Computational Music research sub-group. This project will contribute to the longer-term development of this sub-group and foster new research avenues. Project Start/End Dates: any 8-week period from 13 June 2016 to 19 September 2016. Contact/Supervisors: Kirill Sidorov, Andrew Jones