19 skills found
microsoft / DELT - DELT: Data Efficacy for Language Model Training
paras2612 / CauseBox - Causal inference is a critical task in various fields such as healthcare, economics, marketing, and education. Recently, there have been significant advances through the application of machine learning techniques, especially deep neural networks. Unfortunately, to date many of the proposed methods have been evaluated on different setups (data, software/hardware, hyperparameters), and consequently it is nearly impossible to compare the efficacy of the available methods or to reproduce the results presented in the original research manuscripts. In this paper, we propose a causal inference toolbox (CauseBox) that addresses these problems. At the time of writing, the toolbox includes seven state-of-the-art causal inference methods and two benchmark datasets. By providing convenient command-line and GUI-based interfaces, the CauseBox toolbox helps researchers fairly compare state-of-the-art methods in their chosen application context against benchmark datasets.
Kasyfil97 / Fraud Transaction Detection By Balancing Distribution - Imagine standing at the check-out counter at the grocery store with a long line behind you and the cashier not-so-quietly announces that your card has been declined. In this moment, you probably aren’t thinking about the data science that determined your fate. Embarrassed, and certain you have the funds to cover everything needed for an epic nacho party for 50 of your closest friends, you try your card again. Same result. As you step aside and allow the cashier to tend to the next customer, you receive a text message from your bank. “Press 1 if you really tried to spend $500 on cheddar cheese.” While perhaps cumbersome (and often embarrassing) in the moment, this fraud prevention system is actually saving consumers millions of dollars per year. Researchers from the IEEE Computational Intelligence Society (IEEE-CIS) want to improve this figure, while also improving the customer experience. With higher-accuracy fraud detection, you can get on with your chips without the hassle. IEEE-CIS works across a variety of AI and machine learning areas, including deep neural networks, fuzzy systems, evolutionary computation, and swarm intelligence. Today they’re partnering with the world’s leading payment service company, Vesta Corporation, seeking the best solutions for the fraud prevention industry, and now you are invited to join the challenge. In this competition, you’ll benchmark machine learning models on a challenging large-scale dataset. The data comes from Vesta's real-world e-commerce transactions and contains a wide range of features from device type to product features. You also have the opportunity to create new features to improve your results. If successful, you’ll improve the efficacy of fraudulent transaction alerts for millions of people around the world, helping hundreds of thousands of businesses reduce their fraud loss and increase their revenue.
And of course, you will save party people just like you the hassle of false positives. Acknowledgements: Vesta Corporation provided the dataset for this competition. Vesta Corporation is the forerunner in guaranteed e-commerce payment solutions. Founded in 1995, Vesta pioneered the process of fully guaranteed card-not-present (CNP) payment transactions for the telecommunications industry. Since then, Vesta has firmly expanded data science and machine learning capabilities across the globe and solidified its position as the leader in guaranteed ecommerce payments. Today, Vesta guarantees more than $18B in transactions annually.
lemonadeaumiel / Hybrid IDS CICIDS2018 - The proposed hybrid IDS is tested on two public network datasets, CSE-CIC-IDS2018 and TON IoT, representing internal and external network traffic data. Various measures, such as accuracy, detection rate, false alarm rate, F1 score, and model execution time, are used to assess the model's feasibility, efficacy, and efficiency. Through model learning and optimization, a complete and resilient IDS capable of detecting both known and unknown attacks can be produced.
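The detection-oriented measures named above all derive from a binary confusion matrix. The sketch below assumes label 1 means "attack" and 0 means "benign", and uses the common conventions detection rate = TP / (TP + FN) and false alarm rate = FP / (FP + TN); the repository's own evaluation code may define or compute them differently.

```python
from dataclasses import dataclass


@dataclass
class IdsMetrics:
    accuracy: float
    detection_rate: float    # TP / (TP + FN): recall on the attack class
    false_alarm_rate: float  # FP / (FP + TN): benign traffic wrongly flagged
    f1: float


def ids_metrics(y_true, y_pred):
    """Binary IDS metrics; assumes label 1 = attack, 0 = benign."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    detection_rate = tp / (tp + fn) if tp + fn else 0.0
    false_alarm_rate = fp / (fp + tn) if fp + tn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    # F1 is the harmonic mean of precision and detection rate (recall).
    denom = precision + detection_rate
    f1 = 2 * precision * detection_rate / denom if denom else 0.0
    return IdsMetrics(accuracy, detection_rate, false_alarm_rate, f1)


# Made-up example: 4 attack flows followed by 4 benign flows.
m = ids_metrics([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 0, 1])
print(m)
```

Reporting the false alarm rate next to the detection rate matters for IDS work because benign traffic usually dominates, so a small false positive rate can still mean many alerts.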
VirtualPharmacist / Vp - Virtual Pharmacist is a web tool that interprets personal genomes for the impact of genetic variation on drug response. It accepts variant data (VCF format), microarray SNP genotyping data, and high-throughput sequencing data as input, and reports to users how the variants in their personal genomes affect their drug response, including drug efficacy, dosage, and toxicity.
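In the VCF format, each non-header data line is tab-separated and starts with the fixed columns CHROM, POS, ID, REF, ALT, QUAL, FILTER, and INFO. The following is a minimal sketch of reading those columns from one line. It is not Virtual Pharmacist's code: it ignores per-sample genotype columns and header metadata, and the example line is hypothetical.

```python
from typing import Dict, List, NamedTuple, Union


class Variant(NamedTuple):
    chrom: str
    pos: int
    id: str
    ref: str
    alt: List[str]
    info: Dict[str, Union[str, bool]]


def parse_vcf_line(line: str) -> Variant:
    """Parse the eight fixed columns of one VCF data line (header lines start with '#')."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError("a VCF data line needs at least 8 tab-separated columns")
    chrom, pos, vid, ref, alt, _qual, _filter, info = fields[:8]
    info_dict: Dict[str, Union[str, bool]] = {}
    if info != ".":
        for entry in info.split(";"):
            key, sep, value = entry.partition("=")
            # Flag-type INFO keys carry no '=value' and are recorded as True.
            info_dict[key] = value if sep else True
    # ALT may list several alternate alleles separated by commas.
    return Variant(chrom, int(pos), vid, ref, alt.split(","), info_dict)


# Hypothetical multi-allelic variant with one key=value and one flag INFO entry.
v = parse_vcf_line("chr1\t12345\t.\tG\tA,T\t50\tPASS\tDP=14;DB")
print(v.pos, v.alt, v.info)
```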
HongzhiQ / SupervisedVsLLM EfficacyEval - This repository contains the data and code for the paper "Evaluating the Efficacy of Supervised Learning vs. Large Language Models for Identifying Cognitive Distortions and Suicidal Risks in Chinese Social Media."
datamade / Open Ee Meter - Data analysis & visualization of energy savings projects, to ultimately empower utilities and contractors to improve the efficacy of energy savings programs.
drmuskangarg / CAMS - This repository supports the paper 'CAMS: An Annotated Corpus for Causal Analysis of Mental health on Social media', submitted to the Language Resources and Evaluation Conference (LREC) 2022. We introduce a new dataset for Causal Analysis of Mental health illness in Social media posts (CAMS). We first introduce the annotation schema for this causal analysis task, which comprises two types of annotation: causal interpretation and causal categorization. We show the efficacy of our scheme in two ways: (i) crawling and annotating 3,155 Reddit posts and (ii) re-annotating the publicly available SDCNL dataset of 1,896 instances for interpretable causal analysis. We then combine the two as the CAMS dataset.
facebookresearch / Aqa Study - This repository contains the data (question/answer pairs and their associated Wikipedia passages) collected and used in the following paper: Divyansh Kaushik, Douwe Kiela, Zachary C. Lipton, Wen-tau Yih. "On the Efficacy of Adversarial Data Collection for Question Answering: Results from a Large-Scale Randomized Study." In ACL 2021.
JiaQiSJTU / IterIT - An Approach to Enhancing the Efficacy of Post-Training Using Synthetic Data by Iterative Data Selection
RuixinZhangMarty / Coursera UIUC Applying Data Analytics In Finance - Course description: In this course, we will introduce a number of financial analytics techniques. You will learn why, when, and how to apply financial analytics in real-world situations. We will explore techniques for analyzing time series data and for evaluating the risk-reward trade-off expounded in modern portfolio theory. While most of the focus will be on the prices, returns, and risks of corporate stocks, the analytical techniques can be applied in other domains. Finally, a short introduction to algorithmic trading concludes the course. After completing this course, you should be able to understand time series data, create forecasts, and determine the efficacy of the estimates. You will also be able to create a portfolio of assets from actual stock price data while optimizing risk and reward. Understanding financial data is an important skill for an analyst, manager, or consultant. Course goals and objectives: Upon completion of this course, you should be able to understand the forecasting process; evaluate a forecast; describe time series data; perform moving average analysis; perform exponential smoothing; develop a Holt-Winters model; develop an ARIMA model; understand how to create a portfolio of assets; and understand a basic trading algorithm.
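Two of the listed objectives, moving average analysis and exponential smoothing, are simple enough to show directly. This plain-Python sketch is not taken from the course materials. It assumes a trailing window for the moving average and initializes simple exponential smoothing with the first observation (s_0 = x_0), which is one common convention among several. Holt-Winters and ARIMA extend these ideas and are not covered here.

```python
def moving_average(xs, window):
    """Trailing simple moving average; returns len(xs) - window + 1 values."""
    if window <= 0 or window > len(xs):
        raise ValueError("window must be between 1 and len(xs)")
    running = sum(xs[:window])
    out = [running / window]
    for i in range(window, len(xs)):
        # Slide the window: add the newest value, drop the oldest.
        running += xs[i] - xs[i - window]
        out.append(running / window)
    return out


def exponential_smoothing(xs, alpha):
    """Simple exponential smoothing: s_0 = x_0, s_t = alpha * x_t + (1 - alpha) * s_{t-1}."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = [xs[0]]
    for x in xs[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


prices = [10.0, 12.0, 11.0, 13.0, 15.0]  # made-up price series
print(moving_average(prices, 3))         # [11.0, 12.0, 13.0]
print(exponential_smoothing(prices, 0.5))
```

With simple exponential smoothing, the last smoothed value serves as the one-step-ahead forecast. A larger `alpha` tracks recent prices more closely, and a smaller one smooths more aggressively.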
haataa / IEEE CIS Fraud Detection - Build machine learning models to improve the efficacy of fraudulent transaction alerts. The data comes from Vesta's real-world e-commerce transactions and contains a wide range of features from device type to product features.
vishwateja49 / Oncology Efficacy Orr Sas Vs R - Industry-style oncology objective response rate (ORR) efficacy table built from ADaM data, implemented in both SAS and R (pharmaverse) to demonstrate real-world clinical reporting and SAS→R transition skills.
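An ORR table typically reports the number of responders (best overall response of complete or partial response) over the number of analyzed subjects, together with an exact Clopper-Pearson confidence interval. The Python sketch below is language-neutral and is not the repository's SAS or R code. It assumes one best-overall-response code per subject, with "NE" (not evaluable) subjects kept in the denominator as non-responders. The input data is made up.

```python
from math import comb

RESPONDERS = {"CR", "PR"}  # complete or partial response counts toward ORR


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_increasing(g, target, iters=100):
    """Bisection on [0, 1] for p with g(p) == target, where g increases in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided (1 - alpha) confidence interval for the proportion x / n."""
    # Lower bound: P(X >= x) = alpha / 2; upper bound: P(X <= x) = alpha / 2.
    lower = 0.0 if x == 0 else _solve_increasing(
        lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2)
    upper = 1.0 if x == n else _solve_increasing(
        lambda p: 1 - binom_cdf(x, n, p), 1 - alpha / 2)
    return lower, upper


def orr_summary(best_responses, alpha=0.05):
    """Return (responders, subjects, ORR, CI lower, CI upper)."""
    n = len(best_responses)
    if n == 0:
        raise ValueError("no subjects")
    x = sum(1 for r in best_responses if r in RESPONDERS)
    lower, upper = clopper_pearson(x, n, alpha)
    return x, n, x / n, lower, upper


bor = ["CR", "PR", "PR", "SD", "PD", "SD", "NE", "PR", "PD", "SD"]  # made-up data
x, n, rate, lo, hi = orr_summary(bor)
print(f"ORR {x}/{n} = {rate:.1%} (95% CI {lo:.1%} to {hi:.1%})")
```

Bisection on the binomial tail keeps the sketch dependency-free. In production code, the beta-quantile form of the same interval is the usual shortcut.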
HarshaVaradhanGopal / Automotive Radar Range And Velocity Estimations Interference Analysis - This repository analyzes interference in automotive FMCW radar systems, focusing on its impact on target detection under various conditions. It also explores data-driven insights and the efficacy of different filters in mitigating interference.
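In FMCW radar, a static target at range R produces a beat frequency f_b = 2RS/c, where S is the chirp slope (bandwidth divided by chirp duration). Range is therefore estimated by taking an FFT of the beat signal and applying R = c f_b / (2S). The NumPy sketch below shows that baseline step on an ideal, noise-free, interference-free signal. The chirp parameters are illustrative assumptions, not the repository's configuration. Velocity estimation (a second FFT across chirps) and interference are left out.

```python
import numpy as np

C = 3e8  # speed of light in m/s (rounded)


def simulate_beat(range_m, slope_hz_per_s, fs_hz, n_samples):
    """Ideal complex beat signal for one static target: no noise, no interference."""
    f_beat = 2 * range_m * slope_hz_per_s / C  # f_b = 2 R S / c
    t = np.arange(n_samples) / fs_hz
    return np.exp(2j * np.pi * f_beat * t)


def estimate_range(beat, slope_hz_per_s, fs_hz):
    """Range FFT: find the strongest beat frequency, then R = c * f_b / (2 * S)."""
    n = len(beat)
    # A Hann window lowers sidelobe leakage before the FFT.
    spectrum = np.abs(np.fft.fft(beat * np.hanning(n)))
    freqs = np.fft.fftfreq(n, d=1 / fs_hz)
    f_peak = abs(freqs[np.argmax(spectrum)])
    return C * f_peak / (2 * slope_hz_per_s)


# Illustrative parameters: a 150 MHz sweep over 50 us gives S = 3e12 Hz/s.
# With fs = 10 MHz and 500 samples, each FFT bin is 20 kHz, i.e. 1 m of range.
slope = 150e6 / 50e-6
fs = 10e6
beat = simulate_beat(40.0, slope, fs, 500)
print(estimate_range(beat, slope, fs))
```

Interference from another radar appears as extra energy spread across the beat spectrum. That raises the noise floor around the target peak, which is the effect the repository's filters aim to reduce.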
UW-GDA / ICESat 2 Snowdepth - Evaluating the efficacy of ICESat-2 data for measuring snow depth over Tuolumne Meadows, CA.
Daniel-Andarge / AiML Ethiopian Medical Biz Datawarehouse - The Ethiopian Medical Business Data Warehouse & Analytics Platform is a comprehensive data solution tailored to enhance the efficiency and efficacy of Ethiopia's healthcare and medical sectors.
IonutMotoi / CutPasteSatSeg - Repository for the paper "Evaluating the Efficacy of Cut-and-Paste Data Augmentation in Semantic Segmentation for Satellite Imagery" (IEEE IGARSS 2024).
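The core of cut-and-paste augmentation for segmentation is copying the pixels of a labelled object from one image into another and writing the same class into the target label map. The minimal NumPy sketch below pastes in place, with no translation, scaling, or blending. Variants of the technique commonly add those steps, and the paper's exact procedure is not described here. The class id and the tiny arrays are hypothetical.

```python
import numpy as np


def cut_paste(src_img, src_mask, dst_img, dst_mask, class_id):
    """Copy every source pixel labelled class_id onto the destination (same location)."""
    if src_img.shape[:2] != dst_img.shape[:2] or src_mask.shape != dst_mask.shape:
        raise ValueError("source and destination must share spatial dimensions")
    region = src_mask == class_id       # boolean H x W mask of the pasted object
    out_img = dst_img.copy()            # leave the inputs untouched
    out_mask = dst_mask.copy()
    out_img[region] = src_img[region]   # also works for H x W x C images
    out_mask[region] = class_id         # keep labels consistent with the pixels
    return out_img, out_mask


# Hypothetical 4x4 RGB tiles: a 2x2 object of class 1 is pasted onto an empty scene.
src_img = np.full((4, 4, 3), 200, dtype=np.uint8)
src_mask = np.zeros((4, 4), dtype=np.uint8)
src_mask[1:3, 1:3] = 1
dst_img = np.zeros((4, 4, 3), dtype=np.uint8)
dst_mask = np.zeros((4, 4), dtype=np.uint8)
aug_img, aug_mask = cut_paste(src_img, src_mask, dst_img, dst_mask, class_id=1)
print(aug_mask)
```

Updating the label map together with the pixels is the essential design choice. Pasting pixels without their labels would teach the model contradictory targets.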
dkv204p / SPAM HAM Classification Using NLP - This repository features a spam/ham text message classifier built with NLP techniques. It uses TF-IDF for feature extraction and a Naive Bayes classifier: the pipeline preprocesses the data, trains the model, and evaluates its performance, demonstrating the efficacy of NLP for spam detection.
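The combination described above, TF-IDF features feeding a Naive Bayes classifier, can be sketched with scikit-learn's `TfidfVectorizer` and `MultinomialNB`. The repository does not say here which library it uses, so treat this as one plausible setup. The six training messages are made up for illustration. A real run would load a labelled SMS dataset and evaluate on a held-out split.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Made-up toy messages, for illustration only.
train_texts = [
    "WIN a FREE prize now, claim your cash reward",
    "Congratulations, you won a free ticket, reply now to claim",
    "URGENT: claim your free cash prize today",
    "Are we still meeting for lunch tomorrow?",
    "Can you send me the notes from class",
    "I will call you when I get home tonight",
]
train_labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

# TF-IDF turns each message into a weighted bag of words; MultinomialNB then
# models per-class word weights with additive smoothing (alpha=1.0 by default).
model = make_pipeline(TfidfVectorizer(lowercase=True), MultinomialNB())
model.fit(train_texts, train_labels)

preds = model.predict(["claim your free prize now", "are we meeting for lunch tomorrow"])
print(preds)
```

Wrapping both steps in a pipeline ensures that the vectorizer's vocabulary and IDF weights are learned only from training data and are reapplied unchanged at prediction time.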
seni1 / Cleaning Data - Cleaning your data is the third step in data wrangling. It is where you fix the quality and tidiness issues that you identified in the assess step. In this lesson, you'll clean all of the issues you identified in Lesson 3 using Python and pandas. The lesson is structured as follows. First, you'll get remotivated (if you aren't already) to clean the dataset for Lessons 3 and 4: Phase II clinical trial data comparing the efficacy and safety of a new oral insulin for treating diabetes with those of injectable insulin. Next, you'll learn about the data cleaning process: defining, coding, and testing. You'll address the missing data first (and learn why it is usually important to address these completeness issues first), then tackle the tidiness issues (and learn why this is usually the next logical step), and finally clean up the quality issues. This lesson consists primarily of Jupyter Notebooks of two types: one quiz notebook that you'll work with throughout the whole lesson (i.e., your work will carry over from page to page) and three solution notebooks. I'll pop in and out to introduce the larger conceptual bits. You will use the most common cleaning functions and methods in the pandas library to clean the nineteen quality issues and four tidiness issues identified in Lesson 3. Given your pandas experience, and since this isn't a course on pandas, these functions and methods won't be covered in detail. Regardless, with this experience and your research and documentation skills, you can be confident that you'll leave this course able to clean any dirty and/or messy data that comes your way.
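The define-code-test cycle can be illustrated on a single quality issue. In this pandas sketch, the table and its problem (ZIP codes read as floats, so leading zeros are lost) are hypothetical and do not come from the lesson's clinical trial dataset. The point is the three-step structure: a written definition of the fix, the code that applies it to a copy, and an assertion that checks the result.

```python
import pandas as pd

# Hypothetical dirty table: zip codes were read as floats, so leading zeros are gone.
patients = pd.DataFrame({
    "patient_id": [1, 2, 3],
    "zip_code": [2134.0, 92390.0, None],
})

# Define: convert zip_code from float to a five-character string,
# restoring leading zeros and leaving missing values missing.

# Code: clean a copy so the original stays available for comparison.
patients_clean = patients.copy()
patients_clean["zip_code"] = patients_clean["zip_code"].map(
    lambda z: f"{int(z):05d}" if pd.notna(z) else pd.NA
)

# Test: every non-missing zip code is now exactly five digits.
assert patients_clean["zip_code"].dropna().str.fullmatch(r"\d{5}").all()
print(patients_clean)
```

Keeping the test as an assertion means it can be rerun after later cleaning steps, so a regression surfaces immediately.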