NAalytics / Assemblies Of Putative SARS CoV2 Spike Encoding MRNA Sequences For Vaccines BNT 162b2 And MRNA 1273
RNA vaccines have become a key tool in moving forward through the challenges raised both in the current pandemic and in numerous other public health and medical challenges. With the rollout of vaccines for COVID-19, these synthetic mRNAs have become broadly distributed RNA species in numerous human populations. Despite their ubiquity, sequences are not always available for such RNAs. Standard methods facilitate such sequencing. In this note, we provide experimental sequence information for the RNA components of the initial Moderna (https://pubmed.ncbi.nlm.nih.gov/32756549/) and Pfizer/BioNTech (https://pubmed.ncbi.nlm.nih.gov/33301246/) COVID-19 vaccines, allowing a working assembly of the former and a confirmation of previously reported sequence information for the latter RNA. Sharing of sequence information for broadly used therapeutics has the benefit of allowing any researchers or clinicians using sequencing approaches to rapidly identify such sequences as therapeutic-derived rather than host or infectious in origin. For this work, RNAs were obtained as discards from the small portions of vaccine doses that remained in vials after immunization; such portions would have been required to be otherwise discarded and were analyzed under FDA authorization for research use. To obtain the small amounts of RNA needed for characterization, vaccine remnants were phenol-chloroform extracted using TRIzol Reagent (Invitrogen), with intactness assessed by Agilent 2100 Bioanalyzer before and after extraction. Although our analysis mainly focused on RNAs obtained as soon as possible following discard, we also analyzed samples which had been refrigerated (~4 ℃) for up to 42 days with and without the addition of EDTA. Interestingly, a substantial fraction of the RNA remained intact in these preparations.
We note that the formulation of the vaccines includes numerous key chemical components which are quite possibly unstable under these conditions, so these data certainly do not suggest that the vaccine as a biological agent is stable. It is nonetheless of interest that the chemical stability of the RNA itself does not appear to preclude the eventual development of vaccines with much less demanding cold-chain storage and transportation. For further analysis, the initial RNAs were fragmented by heating to 94 ℃, primed with a random hexamer-tailed adaptor, amplified through a template-switch protocol (Takara SMARTer Stranded RNA-seq kit), and sequenced using a MiSeq instrument (Illumina) with paired-end sequencing (78 bases per end). As a reference material in specific assays, we included RNA of known concentration and sequence (from bacteriophage MS2). From these data, we obtained partial information on strandedness and a set of segments that could be used for assembly. This was particularly useful for the Moderna vaccine, for which the original vaccine RNA sequence was not available at the time our study was carried out. Contigs encoding full-length spikes were assembled from the Moderna and Pfizer datasets. The Pfizer/BioNTech data [Figure 1] verified the reported sequence for that vaccine (https://berthub.eu/articles/posts/reverse-engineering-source-code-of-the-biontech-pfizer-vaccine/), while the Moderna sequence [Figure 2] could not be checked against a published reference. RNA preparations lacking dsRNA are desirable in generating vaccine formulations, as these will minimize an otherwise dramatic biological (and nonspecific) response that vertebrates have to double-stranded character in RNA (https://www.nature.com/articles/nrd.2017.243). In the sequence data that we analyzed, we found that the vast majority of reads were from the expected sense strand.
In addition, the minority of antisense reads appeared different from sense reads in lacking the characteristic extensions expected from the template-switching protocol. Examining only the reads with an evident template switch (as an indicator for strand-of-origin), we observed that both vaccines overwhelmingly yielded sense reads (>99.99%). Independent sequencing assays and other experimental measurements are ongoing and will be needed to determine whether this template-switched sense read fraction in the SmarterSeq protocol indeed represents the actual dsRNA content in the original material. This work provides an initial assessment of two RNAs that are now a part of the human ecosystem and that are likely to appear in numerous other high-throughput RNA-seq studies in which a fraction of the individuals may have previously been vaccinated. ProtoAcknowledgements: Thanks to our colleagues for help and suggestions (Nimit Jain, Emily Greenwald, Lamia Wahba, William Wang, Amisha Kumar, Sameer Sundrani, David Lipman, Bijoyita Roy). Figure 1: Spike-encoding contig assembled from BioNTech/Pfizer BNT-162b2 vaccine. Although the full coding region is included, the nature of the methodology used for sequencing and assembly is such that the assembled contig could lack some sequence from the ends of the RNA. Within the assembled sequence, this hypothetical sequence shows a perfect match to the corresponding sequence from documents available online derived from manufacturer communications with the World Health Organization [as reported by https://berthub.eu/articles/posts/reverse-engineering-source-code-of-the-biontech-pfizer-vaccine/]. The 5’ end for the assembly matches the start site noted in these documents, while the read-based assembly lacks an interrupted poly(A) tail (A30(GCATATGACT)A70) that is expected to be present in the mRNA.
AI4HEALTH-LAB-THU / LLM Aging
Leveraging Large Language Models to Assess Overall and Organ-specific Aging in Diverse Populations
Aleksobrad / Single Cell Rcc Pipeline
Data files and code for analysis of single-cell ccRCC data for the manuscript "Tumor-Specific Cell Populations in Clear Cell Renal Carcinoma Associated with Clinical Outcome Identified Using Single-Cell Protein Activity Inference." Includes code for the VIPER protein activity inference pipeline.
reddyprasade / Machine Learning Interview Preparation
Technical skills preparation. Here are the essential skills that a Machine Learning Engineer needs, as mentioned in the README files. Within each group are topics that you should be familiar with. Study tip: copy and paste this list into a document and save it to your computer for easy referral.
Computer Science Fundamentals and Programming Topics
Data structures: Lists, stacks, queues, strings, hash maps, vectors, matrices, classes & objects, trees, graphs, etc.
Algorithms: Recursion, searching, sorting, optimization, dynamic programming, etc.
Computability and complexity: P vs. NP, NP-complete problems, big-O notation, approximate algorithms, etc.
Computer architecture: Memory, cache, bandwidth, threads & processes, deadlocks, etc.
Probability and Statistics Topics
Basic probability: Conditional probability, Bayes rule, likelihood, independence, etc.
Probabilistic models: Bayes Nets, Markov Decision Processes, Hidden Markov Models, etc.
Statistical measures: Mean, median, mode, variance, population parameters vs. sample statistics, etc.
Proximity and error metrics: Cosine similarity, mean-squared error, Manhattan and Euclidean distance, log-loss, etc.
Distributions and random sampling: Uniform, normal, binomial, Poisson, etc.
Analysis methods: ANOVA, hypothesis testing, factor analysis, etc.
Data Modeling and Evaluation Topics
Data preprocessing: Munging/wrangling, transforming, aggregating, etc.
Pattern recognition: Correlations, clusters, trends, outliers & anomalies, etc.
Dimensionality reduction: Eigenvectors, Principal Component Analysis, etc.
Prediction: Classification, regression, sequence prediction, etc.; suitable error/accuracy metrics.
Evaluation: Training-testing split, sequential vs. randomized cross-validation, etc.
Applying Machine Learning Algorithms and Libraries Topics
Models: Parametric vs. nonparametric, decision tree, nearest neighbor, neural net, support vector machine, ensemble of multiple models, etc.
Learning procedure: Linear regression, gradient descent, genetic algorithms, bagging, boosting, and other model-specific methods; regularization, hyperparameter tuning, etc.
Tradeoffs and gotchas: Relative advantages and disadvantages, bias and variance, overfitting and underfitting, vanishing/exploding gradients, missing data, data leakage, etc.
Software Engineering and System Design Topics
Software interface: Library calls, REST APIs, data collection endpoints, database queries, etc.
User interface: Capturing user inputs & application events, displaying results & visualization, etc.
Scalability: Map-reduce, distributed processing, etc.
Deployment: Cloud hosting, containers & instances, microservices, etc.
Move on to the final lesson of this course to find lots of sample practice questions for each topic!
facebookresearch / EgocentricUserAdaptation
In this codebase we establish a benchmark for egocentric user adaptation based on Ego4d. First, we start from a population model which has data from many users to learn user-agnostic representations. As the user gains more experience over its lifetime, we aim to tailor the general model to user-specific expert models.
mohit9949 / Autonomous Forest Surveillance Safety System Using OpenCV
Current forest surveillance methods consume a lot of resources, are inefficient and unreliable, and require a constant human presence for tasks that can easily be automated using new technology. To solve these problems, we propose an autonomous surveillance system that uses object detection to identify specified animals. It can monitor forest fires, intruders, wildlife, etc., all at once, and it alerts the concerned officials immediately and precisely. It has a hybrid object detection system using HAAR and backpropagation neural network algorithms, which are used to train on and detect animals and to predict from the data obtained, respectively. This helps in detecting unwanted visitors, dangerous animals, or restricted tools entering the forest. The system not only stores the video feed but can also determine population, track a specific animal or human, and send pictures directly to your email, along with real-time video monitoring over the internet that lets users monitor from anywhere in the world. In emergencies it sends instant alerts to your phone via SMS, even in remote areas, and it stores all the data in a repository. The system is controlled through a Windows app, which lets us select which animals the camera modules should detect and their alert levels, along with other settings, and which also presents detailed analysis of forest fires, animal population, trespassed areas, etc., to users in simple charts. It is a smart, automatic, modular system that is cheap and easily expandable.
M-Anwar-Hussaini / JavaScriptCapstone
This project is a web application that provides information about different countries. It allows users to explore details such as the continent, capital, area, and population of various countries. Users can also view comments and add their own insights about specific countries.
NIST-MNI / Build Average Model
Collection of scripts to create population-specific average anatomical models
Serien3 / SimPG
A population-specific haplotype genome simulation tool developed based on pangenome data
huwenboshi / Pesca
Estimating proportion of population-specific and shared causal variants
rishabhathiya / Bank Marketing
# Bank Marketing Dataset
## Marketing Introduction:
The process by which companies create value for customers and build strong customer relationships in order to capture value from customers in return. - Kotler and Armstrong (2010).
Marketing campaigns are characterized by focusing on customer needs and overall customer satisfaction. Nevertheless, several variables determine whether a marketing campaign will be successful, and we need to take them into consideration when designing one.
## The 4 Ps:
1) Segment of the Population: Which segment of the population is the marketing campaign going to address, and why? This aspect of the campaign is extremely important since it determines which part of the population is most likely to receive its message.
2) Distribution channel to reach the customer's place: Implementing the most effective strategy in order to get the most out of this marketing campaign. What segment of the population should we address? Which instrument should we use to get our message out? (Ex: Telephones, Radio, TV, Social Media, etc.)
3) Price: What is the best price to offer to potential clients? (In the case of the bank's marketing campaign this is not necessary, since the bank's main interest is for potential clients to open deposit accounts so that its operative activities keep running.)
4) Promotional Strategy: This is the way the strategy is going to be implemented and how potential clients are going to be addressed. This should be the last part of the marketing campaign analysis, since there has to be an in-depth analysis of previous campaigns (if possible) in order to learn from previous mistakes and to determine how to make the marketing campaign much more effective.
## What is a Term Deposit?
A term deposit is a deposit that a bank or a financial institution offers with a fixed rate (often better than that of a regular deposit account), in which your money is returned at a specific maturity time. For more information on term deposits, see this page from Investopedia: https://www.investopedia.com/terms/t/termdeposit.asp
## Outline:
1. Import data from the dataset and perform an initial high-level analysis: look at the number of rows, the missing values, and the dataset columns and their values with respect to the campaign outcome.
2. Clean the data: remove irrelevant columns, deal with missing and incorrect values, turn categorical columns into dummy variables.
3. Use machine learning techniques to predict the marketing campaign outcome and to find the factors that affect the success of the campaign.
## Dataset Link
https://archive.ics.uci.edu/ml/datasets/Bank+Marketing
## Dataset Information
The data is related to direct marketing campaigns of a Portuguese banking institution. The marketing campaigns were based on phone calls. Often, more than one contact with the same client was required in order to assess whether the product (bank term deposit) would be ('yes') or would not be ('no') subscribed.
There are four datasets:
1) bank-additional-full.csv with all examples (41188) and 20 inputs, ordered by date (from May 2008 to November 2010), very close to the data analyzed in [Moro et al., 2014]
2) bank-additional.csv with 10% of the examples (4119), randomly selected from 1), and 20 inputs.
3) bank-full.csv with all examples and 17 inputs, ordered by date (older version of this dataset with fewer inputs).
4) bank.csv with 10% of the examples and 17 inputs, randomly selected from 3) (older version of this dataset with fewer inputs).
The smallest datasets are provided to test more computationally demanding machine learning algorithms (e.g., SVM).
The classification goal is to predict if the client will subscribe (yes/no) a term deposit (variable y).
## Attribute Information
Input variables:
#### bank client data:
1-age (numeric)
2-job: type of job (categorical: 'admin.','blue-collar','entrepreneur','housemaid','management','retired','self-employed','services','student','technician','unemployed','unknown')
3-marital: marital status (categorical: 'divorced','married','single','unknown'; note: 'divorced' means divorced or widowed)
4-education (categorical: 'basic.4y','basic.6y','basic.9y','high.school','illiterate','professional.course','university.degree','unknown')
5-default: has credit in default? (categorical: 'no','yes','unknown')
6-housing: has housing loan? (categorical: 'no','yes','unknown')
7-loan: has personal loan? (categorical: 'no','yes','unknown')
#### related with the last contact of the current campaign:
8-contact: contact communication type (categorical: 'cellular','telephone')
9-month: last contact month of year (categorical: 'jan', 'feb', 'mar', ..., 'nov', 'dec')
10-day_of_week: last contact day of the week (categorical: 'mon','tue','wed','thu','fri')
11-duration: last contact duration, in seconds (numeric). Important note: this attribute highly affects the output target (e.g., if duration=0 then y='no'). Yet, the duration is not known before a call is performed. Also, after the end of the call y is obviously known. Thus, this input should only be included for benchmark purposes and should be discarded if the intention is to have a realistic predictive model.
#### other attributes:
12-campaign: number of contacts performed during this campaign and for this client (numeric, includes last contact)
13-pdays: number of days that passed by after the client was last contacted from a previous campaign (numeric; 999 means client was not previously contacted)
14-previous: number of contacts performed before this campaign and for this client (numeric)
15-poutcome: outcome of the previous marketing campaign (categorical: 'failure','nonexistent','success')
#### social and economic context attributes
16-emp.var.rate: employment variation rate - quarterly indicator (numeric)
17-cons.price.idx: consumer price index - monthly indicator (numeric)
18-cons.conf.idx: consumer confidence index - monthly indicator (numeric)
19-euribor3m: euribor 3 month rate - daily indicator (numeric)
20-nr.employed: number of employees - quarterly indicator (numeric)
Output variable (desired target):
21-y: has the client subscribed a term deposit? (binary: 'yes','no')
## License
This dataset is publicly available for research.
Citations:
1. [Moro et al., 2014] S. Moro, P. Cortez and P. Rita. A Data-Driven Approach to Predict the Success of Bank Telemarketing. Decision Support Systems, Elsevier, 62:22-31, June 2014
2. Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.
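The cleaning step described in the outline can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the repository's code: the three rows are made-up values, `preprocess` and the `previously_contacted` flag are hypothetical names, and only the column names and the meaning of `duration` and `pdays = 999` come from the attribute list above.

```python
# Made-up rows for illustration; column names follow the attribute list.
ROWS = [
    {"age": 41, "marital": "married", "contact": "cellular", "duration": 210, "pdays": 999, "y": "no"},
    {"age": 29, "marital": "single", "contact": "telephone", "duration": 95, "pdays": 6, "y": "yes"},
    {"age": 53, "marital": "divorced", "contact": "cellular", "duration": 0, "pdays": 999, "y": "no"},
]

CATEGORICAL = ("marital", "contact")


def preprocess(rows, categorical=CATEGORICAL, drop=("duration",)):
    """Drop leaky columns, one-hot encode categoricals, and decode pdays."""
    # Observed levels of each categorical column, in a stable order
    levels = {col: sorted({r[col] for r in rows}) for col in categorical}
    features, targets = [], []
    for r in rows:
        out = {}
        for key, value in r.items():
            if key in drop or key == "y":
                continue  # duration is unknown before the call, so it is dropped
            if key in categorical:
                # One dummy (0/1) column per observed level
                for level in levels[key]:
                    out[f"{key}_{level}"] = 1 if value == level else 0
            elif key == "pdays":
                # 999 is a sentinel for "not previously contacted"
                out["previously_contacted"] = 0 if value == 999 else 1
                out["pdays"] = None if value == 999 else value
            else:
                out[key] = value
        features.append(out)
        targets.append(1 if r["y"] == "yes" else 0)
    return features, targets


features, targets = preprocess(ROWS)
```

Keeping `duration` out of the feature set follows the dataset's own note that it is only suitable for benchmark purposes; splitting `pdays` into a flag plus a nullable count is one possible way to avoid treating 999 as a real number of days.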
bensutherland / Simple Pop Stats
A short analysis of population statistics given specific inputs
radrumond / Chameleon
Parametric models, and particularly neural networks, require weight initialization as a starting point for gradient-based optimization. In most current practices, this is accomplished by using some form of random initialization. Instead, recent work shows that a specific initial parameter set can be learned from a population of tasks, i.e., dataset and target variable for supervised learning tasks. Using this initial parameter set leads to faster convergence for new tasks (model-agnostic meta-learning). Currently, methods for learning model initializations are limited to a population of tasks sharing the same schema, i.e., the same number, order, type and semantics of predictor and target variables. In this paper, we address the problem of meta-learning parameter initialization across tasks with different schemas, i.e., if the number of predictors varies across tasks, while they still share some variables. We propose Chameleon, a model that learns to align different predictor schemas to a common representation. We use permutations and masks of the predictors of the training tasks at hand. In experiments on real-life data sets, we show that Chameleon can successfully learn parameter initializations across tasks with different schemas, providing a 26% lift in accuracy on average over random initialization and a 5% lift over a state-of-the-art method for fixed-schema learning of model initializations. To the best of our knowledge, our paper is the first work on the problem of learning model initialization across tasks with different schemas.
entjos / ExclusionTable
Creating tables of excluded observations. Especially useful for any research involving specific populations, e.g. epidemiology or social science.
EddyShimwa / Metrics Webapp
This mobile app provides quick and easy access to country-specific information. Users can select a country and instantly view relevant details and statistics, such as its population, capital city, currency, and more.
aniket20june / Time Table Scheduling Using Genetic Algorithm
One of the most challenging tasks for every college, school, and company administrator is to create an efficient timetable every year or every semester. This herculean task takes a huge amount of time and staff involvement. Moreover, a minute mistake leads to re-planning of the entire timetable. In the present project, we have tried to resolve this problem to a considerable extent so as to save precious man-hours. Switching to a digital way of scheduling a timetable automates the workflow. The algorithm can generate timetables automatically while ensuring that no classes overlap with each other. It helps align proper schedules and allot faculty by mapping the details of classes, subjects, and teachers into the system. We have implemented it using Genetic Algorithms. A Genetic Algorithm is a search-based optimization technique based on the principles of genetics and natural selection. It is frequently used to find optimal or near-optimal solutions to difficult problems (NP-hard problems) that would otherwise take a lifetime to solve. The performance of nature-inspired algorithms depends on their convergence rate, and the convergence rate depends on the fitness function and the algorithmic parameters. Therefore, considering the above arguments and after consulting several research papers, we decided to use the Genetic Algorithm for this project. However, population-based algorithms like the Genetic Algorithm tend to be time-consuming. GAs also have more difficulty handling constraints. Hybrid approaches often outperform specific meta-heuristics. I have implemented a combination of heuristics and Genetic Algorithms in this project.
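As a rough illustration of the approach described above, here is a minimal plain genetic algorithm for a toy timetable. It is a sketch under stated assumptions rather than the project's implementation: the class list, the four time slots, and the names `conflicts` and `evolve` are hypothetical, the fitness simply counts clashes, and the heuristic component mentioned in the description is not reproduced.

```python
import random

# Hypothetical toy data: (class name, teacher, student group)
CLASSES = [
    ("Math-A", "Rao", "G1"),
    ("Math-B", "Rao", "G2"),
    ("Physics-A", "Iyer", "G1"),
    ("Physics-B", "Iyer", "G2"),
    ("Chem-A", "Das", "G1"),
    ("Chem-B", "Das", "G2"),
]
N_SLOTS = 4  # assumed number of available time slots


def conflicts(timetable):
    """Count pairs of classes in the same slot that share a teacher or a group."""
    count = 0
    for i in range(len(CLASSES)):
        for j in range(i + 1, len(CLASSES)):
            if timetable[i] == timetable[j]:
                _, teacher_i, group_i = CLASSES[i]
                _, teacher_j, group_j = CLASSES[j]
                if teacher_i == teacher_j or group_i == group_j:
                    count += 1
    return count


def evolve(pop_size=30, generations=200, mutation_rate=0.1, seed=0):
    """Evolve slot assignments; a chromosome holds one slot index per class."""
    rng = random.Random(seed)
    population = [[rng.randrange(N_SLOTS) for _ in CLASSES] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=conflicts)
        if conflicts(population[0]) == 0:
            break  # a clash-free timetable has been found
        survivors = population[: pop_size // 2]  # truncation selection, keeps the best
        children = []
        while len(survivors) + len(children) < pop_size:
            parent_a, parent_b = rng.sample(survivors, 2)
            cut = rng.randrange(1, len(CLASSES))  # one-point crossover
            child = parent_a[:cut] + parent_b[cut:]
            # Mutation: occasionally move a class to a random slot
            child = [rng.randrange(N_SLOTS) if rng.random() < mutation_rate else slot
                     for slot in child]
            children.append(child)
        population = survivors + children
    return min(population, key=conflicts)


best = evolve()
```

Because the fitness only counts hard clashes, soft preferences such as balanced teacher workloads would need extra penalty terms; that is one place where constraint handling in a GA becomes harder, as the description notes.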
VirtualPregnancy / Placentagen
Python libraries to generate computational descriptions of the placenta that statistically match morphometric data. These generated models are population-based and not subject-specific.
41way5 / Organ PAC
Proteomic organ-specific aging clock across diverse populations
AI-sandbox / AsPopGenStats
Ancestry-specific tool for calculating population genetics statistics from Gnomix output
JingfangSI / SnpCountCU
Count common and unique SNPs among several populations from a VCF format file.
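One way to read "common and unique SNPs" can be sketched as follows. This is not SnpCountCU's actual logic, which the description does not spell out: the miniature VCF, the sample-to-population map `POP_OF`, and the function `count_snps` are all hypothetical, and the sketch assumes a SNP is "present" in a population when at least one of its samples carries a non-reference allele.

```python
from collections import Counter

# Hypothetical miniature VCF; a real file would be read from disk.
VCF_TEXT = """\
##fileformat=VCFv4.2
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\tS4
1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0/1\t0/0\t1/1\t0/0
1\t200\t.\tC\tT\t.\tPASS\t.\tGT\t0/0\t0/0\t0/1\t0/0
1\t300\t.\tG\tA\t.\tPASS\t.\tGT\t0/0\t0/0\t0/0\t0/0
1\t400\t.\tT\tTA\t.\tPASS\t.\tGT\t0/1\t0/1\t0/1\t0/1
"""

# Assumed mapping from sample name to population label
POP_OF = {"S1": "popA", "S2": "popA", "S3": "popB", "S4": "popB"}


def count_snps(vcf_text, pop_of):
    """Count SNPs carried by every population ("common") or by exactly one ("unique")."""
    counts = Counter()
    samples = []
    n_pops = len(set(pop_of.values()))
    for line in vcf_text.splitlines():
        if line.startswith("##") or not line.strip():
            continue  # skip meta-information and blank lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        ref, alt = fields[3], fields[4]
        # Keep biallelic single-base substitutions; skip indels and multi-allelic sites
        if len(ref) != 1 or len(alt) != 1 or alt == ".":
            continue
        carriers = set()
        for sample, call in zip(samples, fields[9:]):
            genotype = call.split(":")[0]  # GT is the first FORMAT field
            alleles = genotype.replace("|", "/").split("/")
            if any(allele not in ("0", ".") for allele in alleles):
                carriers.add(pop_of[sample])
        if len(carriers) == 1:
            counts["unique:" + next(iter(carriers))] += 1
        elif carriers and len(carriers) == n_pops:
            counts["common"] += 1
        # SNPs shared by some but not all populations are left uncounted here
    return counts


result = count_snps(VCF_TEXT, POP_OF)
```

With more than two populations, SNPs shared by a subset fall into neither bucket in this sketch; whether the tool reports those separately is not stated in the description.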