399 skills found · Page 1 of 14
NAalytics / Assemblies Of Putative SARS CoV2 Spike Encoding MRNA Sequences For Vaccines BNT 162b2 And MRNA 1273RNA vaccines have become a key tool in moving forward through the challenges raised both in the current pandemic and in numerous other public health and medical challenges. With the rollout of vaccines for COVID-19, these synthetic mRNAs have become broadly distributed RNA species in numerous human populations. Despite their ubiquity, sequences are not always available for such RNAs. Standard methods facilitate such sequencing. In this note, we provide experimental sequence information for the RNA components of the initial Moderna (https://pubmed.ncbi.nlm.nih.gov/32756549/) and Pfizer/BioNTech (https://pubmed.ncbi.nlm.nih.gov/33301246/) COVID-19 vaccines, allowing a working assembly of the former and a confirmation of previously reported sequence information for the latter RNA. Sharing of sequence information for broadly used therapeutics has the benefit of allowing any researchers or clinicians using sequencing approaches to rapidly identify such sequences as therapeutic-derived rather than host or infectious in origin. For this work, RNAs were obtained as discards from the small portions of vaccine doses that remained in vials after immunization; such portions would have been required to be otherwise discarded and were analyzed under FDA authorization for research use. To obtain the small amounts of RNA needed for characterization, vaccine remnants were phenol-chloroform extracted using TRIzol Reagent (Invitrogen), with intactness assessed by Agilent 2100 Bioanalyzer before and after extraction. Although our analysis mainly focused on RNAs obtained as soon as possible following discard, we also analyzed samples which had been refrigerated (~4 ℃) for up to 42 days with and without the addition of EDTA. Interestingly a substantial fraction of the RNA remained intact in these preparations. We note that the formulation of the vaccines includes numerous key chemical components which are quite possibly unstable under these conditions-- so these data certainly do not suggest that the vaccine as a biological agent is stable. But it is of interest that chemical stability of RNA itself is not sufficient to preclude eventual development of vaccines with a much less involved cold-chain storage and transportation. For further analysis, the initial RNAs were fragmented by heating to 94℃, primed with a random hexamer-tailed adaptor, amplified through a template-switch protocol (Takara SMARTerer Stranded RNA-seq kit), and sequenced using a MiSeq instrument (Illumina) with paired end 78-per end sequencing. As a reference material in specific assays, we included RNA of known concentration and sequence (from bacteriophage MS2). From these data, we obtained partial information on strandedness and a set of segments that could be used for assembly. This was particularly useful for the Moderna vaccine, for which the original vaccine RNA sequence was not available at the time our study was carried out. Contigs encoding full-length spikes were assembled from the Moderna and Pfizer datasets. The Pfizer/BioNTech data [Figure 1] verified the reported sequence for that vaccine (https://berthub.eu/articles/posts/reverse-engineering-source-code-of-the-biontech-pfizer-vaccine/), while the Moderna sequence [Figure 2] could not be checked against a published reference. RNA preparations lacking dsRNA are desirable in generating vaccine formulations as these will minimize an otherwise dramatic biological (and nonspecific) response that vertebrates have to double stranded character in RNA (https://www.nature.com/articles/nrd.2017.243). In the sequence data that we analyzed, we found that the vast majority of reads were from the expected sense strand. In addition, the minority of antisense reads appeared different from sense reads in lacking the characteristic extensions expected from the template switching protocol. Examining only the reads with an evident template switch (as an indicator for strand-of-origin), we observed that both vaccines overwhelmingly yielded sense reads (>99.99%). Independent sequencing assays and other experimental measurements are ongoing and will be needed to determine whether this template-switched sense read fraction in the SmarterSeq protocol indeed represents the actual dsRNA content in the original material. This work provides an initial assessment of two RNAs that are now a part of the human ecosystem and that are likely to appear in numerous other high throughput RNA-seq studies in which a fraction of the individuals may have previously been vaccinated. ProtoAcknowledgements: Thanks to our colleagues for help and suggestions (Nimit Jain, Emily Greenwald, Lamia Wahba, William Wang, Amisha Kumar, Sameer Sundrani, David Lipman, Bijoyita Roy). Figure 1: Spike-encoding contig assembled from BioNTech/Pfizer BNT-162b2 vaccine. Although the full coding region is included, the nature of the methodology used for sequencing and assembly is such that the assembled contig could lack some sequence from the ends of the RNA. Within the assembled sequence, this hypothetical sequence shows a perfect match to the corresponding sequence from documents available online derived from manufacturer communications with the World Health Organization [as reported by https://berthub.eu/articles/posts/reverse-engineering-source-code-of-the-biontech-pfizer-vaccine/]. The 5’ end for the assembly matches the start site noted in these documents, while the read-based assembly lacks an interrupted polyA tail (A30(GCATATGACT)A70) that is expected to be present in the mRNA.
yago-naga / Yago3YAGO is a large semantic knowledge base, derived from Wikipedia, WordNet, WikiData, GeoNames, and other data sources
linkedin / VeniceVenice, Derived Data Platform for Planet-Scale Workloads.
derive4j / Derive4jJava 8 annotation processor and framework for deriving algebraic data types constructors, pattern-matching, folds, optics and typeclasses.
Callidon / Bloom FiltersJS implementation of probabilistic data structures: Bloom Filter (and its derived), HyperLogLog, Count-Min Sketch, Top-K and MinHash
microsoft / Conversation Knowledge Mining Solution AcceleratorThis solution accelerator leverages Microsoft Foundry, Azure Content Understanding, Azure OpenAI Service, and Foundry IQ to enable organizations to derive insights from volumes of conversational data using generative AI. It offers key phrase extraction, topic modeling, and interactive chat experiences through an intuitive web interface.
macmade / XcleanXclean is a macOS menu bar app that provides a convenient way to clear Xcode's derived data or module cache.
sayantann11 / All Classification Templetes For MLClassification - Machine Learning This is ‘Classification’ tutorial which is a part of the Machine Learning course offered by Simplilearn. We will learn Classification algorithms, types of classification algorithms, support vector machines(SVM), Naive Bayes, Decision Tree and Random Forest Classifier in this tutorial. Objectives Let us look at some of the objectives covered under this section of Machine Learning tutorial. Define Classification and list its algorithms Describe Logistic Regression and Sigmoid Probability Explain K-Nearest Neighbors and KNN classification Understand Support Vector Machines, Polynomial Kernel, and Kernel Trick Analyze Kernel Support Vector Machines with an example Implement the Naïve Bayes Classifier Demonstrate Decision Tree Classifier Describe Random Forest Classifier Classification: Meaning Classification is a type of supervised learning. It specifies the class to which data elements belong to and is best used when the output has finite and discrete values. It predicts a class for an input variable as well. There are 2 types of Classification: Binomial Multi-Class Classification: Use Cases Some of the key areas where classification cases are being used: To find whether an email received is a spam or ham To identify customer segments To find if a bank loan is granted To identify if a kid will pass or fail in an examination Classification: Example Social media sentiment analysis has two potential outcomes, positive or negative, as displayed by the chart given below. https://www.simplilearn.com/ice9/free_resources_article_thumb/classification-example-machine-learning.JPG This chart shows the classification of the Iris flower dataset into its three sub-species indicated by codes 0, 1, and 2. https://www.simplilearn.com/ice9/free_resources_article_thumb/iris-flower-dataset-graph.JPG The test set dots represent the assignment of new test data points to one class or the other based on the trained classifier model. Types of Classification Algorithms Let’s have a quick look into the types of Classification Algorithm below. Linear Models Logistic Regression Support Vector Machines Nonlinear models K-nearest Neighbors (KNN) Kernel Support Vector Machines (SVM) Naïve Bayes Decision Tree Classification Random Forest Classification Logistic Regression: Meaning Let us understand the Logistic Regression model below. This refers to a regression model that is used for classification. This method is widely used for binary classification problems. It can also be extended to multi-class classification problems. Here, the dependent variable is categorical: y ϵ {0, 1} A binary dependent variable can have only two values, like 0 or 1, win or lose, pass or fail, healthy or sick, etc In this case, you model the probability distribution of output y as 1 or 0. This is called the sigmoid probability (σ). If σ(θ Tx) > 0.5, set y = 1, else set y = 0 Unlike Linear Regression (and its Normal Equation solution), there is no closed form solution for finding optimal weights of Logistic Regression. Instead, you must solve this with maximum likelihood estimation (a probability model to detect the maximum likelihood of something happening). It can be used to calculate the probability of a given outcome in a binary model, like the probability of being classified as sick or passing an exam. https://www.simplilearn.com/ice9/free_resources_article_thumb/logistic-regression-example-graph.JPG Sigmoid Probability The probability in the logistic regression is often represented by the Sigmoid function (also called the logistic function or the S-curve): https://www.simplilearn.com/ice9/free_resources_article_thumb/sigmoid-function-machine-learning.JPG In this equation, t represents data values * the number of hours studied and S(t) represents the probability of passing the exam. Assume sigmoid function: https://www.simplilearn.com/ice9/free_resources_article_thumb/sigmoid-probability-machine-learning.JPG g(z) tends toward 1 as z -> infinity , and g(z) tends toward 0 as z -> infinity K-nearest Neighbors (KNN) K-nearest Neighbors algorithm is used to assign a data point to clusters based on similarity measurement. It uses a supervised method for classification. The steps to writing a k-means algorithm are as given below: https://www.simplilearn.com/ice9/free_resources_article_thumb/knn-distribution-graph-machine-learning.JPG Choose the number of k and a distance metric. (k = 5 is common) Find k-nearest neighbors of the sample that you want to classify Assign the class label by majority vote. KNN Classification A new input point is classified in the category such that it has the most number of neighbors from that category. For example: https://www.simplilearn.com/ice9/free_resources_article_thumb/knn-classification-machine-learning.JPG Classify a patient as high risk or low risk. Mark email as spam or ham. Keen on learning about Classification Algorithms in Machine Learning? Click here! Support Vector Machine (SVM) Let us understand Support Vector Machine (SVM) in detail below. SVMs are classification algorithms used to assign data to various classes. They involve detecting hyperplanes which segregate data into classes. SVMs are very versatile and are also capable of performing linear or nonlinear classification, regression, and outlier detection. Once ideal hyperplanes are discovered, new data points can be easily classified. https://www.simplilearn.com/ice9/free_resources_article_thumb/support-vector-machines-graph-machine-learning.JPG The optimization objective is to find “maximum margin hyperplane” that is farthest from the closest points in the two classes (these points are called support vectors). In the given figure, the middle line represents the hyperplane. SVM Example Let’s look at this image below and have an idea about SVM in general. Hyperplanes with larger margins have lower generalization error. The positive and negative hyperplanes are represented by: https://www.simplilearn.com/ice9/free_resources_article_thumb/positive-negative-hyperplanes-machine-learning.JPG Classification of any new input sample xtest : If w0 + wTxtest > 1, the sample xtest is said to be in the class toward the right of the positive hyperplane. If w0 + wTxtest < -1, the sample xtest is said to be in the class toward the left of the negative hyperplane. When you subtract the two equations, you get: https://www.simplilearn.com/ice9/free_resources_article_thumb/equation-subtraction-machine-learning.JPG Length of vector w is (L2 norm length): https://www.simplilearn.com/ice9/free_resources_article_thumb/length-of-vector-machine-learning.JPG You normalize with the length of w to arrive at: https://www.simplilearn.com/ice9/free_resources_article_thumb/normalize-equation-machine-learning.JPG SVM: Hard Margin Classification Given below are some points to understand Hard Margin Classification. The left side of equation SVM-1 given above can be interpreted as the distance between the positive (+ve) and negative (-ve) hyperplanes; in other words, it is the margin that can be maximized. Hence the objective of the function is to maximize with the constraint that the samples are classified correctly, which is represented as : https://www.simplilearn.com/ice9/free_resources_article_thumb/hard-margin-classification-machine-learning.JPG This means that you are minimizing ‖w‖. This also means that all positive samples are on one side of the positive hyperplane and all negative samples are on the other side of the negative hyperplane. This can be written concisely as : https://www.simplilearn.com/ice9/free_resources_article_thumb/hard-margin-classification-formula.JPG Minimizing ‖w‖ is the same as minimizing. This figure is better as it is differentiable even at w = 0. The approach listed above is called “hard margin linear SVM classifier.” SVM: Soft Margin Classification Given below are some points to understand Soft Margin Classification. To allow for linear constraints to be relaxed for nonlinearly separable data, a slack variable is introduced. (i) measures how much ith instance is allowed to violate the margin. The slack variable is simply added to the linear constraints. https://www.simplilearn.com/ice9/free_resources_article_thumb/soft-margin-calculation-machine-learning.JPG Subject to the above constraints, the new objective to be minimized becomes: https://www.simplilearn.com/ice9/free_resources_article_thumb/soft-margin-calculation-formula.JPG You have two conflicting objectives now—minimizing slack variable to reduce margin violations and minimizing to increase the margin. The hyperparameter C allows us to define this trade-off. Large values of C correspond to larger error penalties (so smaller margins), whereas smaller values of C allow for higher misclassification errors and larger margins. https://www.simplilearn.com/ice9/free_resources_article_thumb/machine-learning-certification-video-preview.jpg SVM: Regularization The concept of C is the reverse of regularization. Higher C means lower regularization, which increases bias and lowers the variance (causing overfitting). https://www.simplilearn.com/ice9/free_resources_article_thumb/concept-of-c-graph-machine-learning.JPG IRIS Data Set The Iris dataset contains measurements of 150 IRIS flowers from three different species: Setosa Versicolor Viriginica Each row represents one sample. Flower measurements in centimeters are stored as columns. These are called features. IRIS Data Set: SVM Let’s train an SVM model using sci-kit-learn for the Iris dataset: https://www.simplilearn.com/ice9/free_resources_article_thumb/svm-model-graph-machine-learning.JPG Nonlinear SVM Classification There are two ways to solve nonlinear SVMs: by adding polynomial features by adding similarity features Polynomial features can be added to datasets; in some cases, this can create a linearly separable dataset. https://www.simplilearn.com/ice9/free_resources_article_thumb/nonlinear-classification-svm-machine-learning.JPG In the figure on the left, there is only 1 feature x1. This dataset is not linearly separable. If you add x2 = (x1)2 (figure on the right), the data becomes linearly separable. Polynomial Kernel In sci-kit-learn, one can use a Pipeline class for creating polynomial features. Classification results for the Moons dataset are shown in the figure. https://www.simplilearn.com/ice9/free_resources_article_thumb/polynomial-kernel-machine-learning.JPG Polynomial Kernel with Kernel Trick Let us look at the image below and understand Kernel Trick in detail. https://www.simplilearn.com/ice9/free_resources_article_thumb/polynomial-kernel-with-kernel-trick.JPG For large dimensional datasets, adding too many polynomial features can slow down the model. You can apply a kernel trick with the effect of polynomial features without actually adding them. The code is shown (SVC class) below trains an SVM classifier using a 3rd-degree polynomial kernel but with a kernel trick. https://www.simplilearn.com/ice9/free_resources_article_thumb/polynomial-kernel-equation-machine-learning.JPG The hyperparameter coefθ controls the influence of high-degree polynomials. Kernel SVM Let us understand in detail about Kernel SVM. Kernel SVMs are used for classification of nonlinear data. In the chart, nonlinear data is projected into a higher dimensional space via a mapping function where it becomes linearly separable. https://www.simplilearn.com/ice9/free_resources_article_thumb/kernel-svm-machine-learning.JPG In the higher dimension, a linear separating hyperplane can be derived and used for classification. A reverse projection of the higher dimension back to original feature space takes it back to nonlinear shape. As mentioned previously, SVMs can be kernelized to solve nonlinear classification problems. You can create a sample dataset for XOR gate (nonlinear problem) from NumPy. 100 samples will be assigned the class sample 1, and 100 samples will be assigned the class label -1. https://www.simplilearn.com/ice9/free_resources_article_thumb/kernel-svm-graph-machine-learning.JPG As you can see, this data is not linearly separable. https://www.simplilearn.com/ice9/free_resources_article_thumb/kernel-svm-non-separable.JPG You now use the kernel trick to classify XOR dataset created earlier. https://www.simplilearn.com/ice9/free_resources_article_thumb/kernel-svm-xor-machine-learning.JPG Naïve Bayes Classifier What is Naive Bayes Classifier? Have you ever wondered how your mail provider implements spam filtering or how online news channels perform news text classification or even how companies perform sentiment analysis of their audience on social media? All of this and more are done through a machine learning algorithm called Naive Bayes Classifier. Naive Bayes Named after Thomas Bayes from the 1700s who first coined this in the Western literature. Naive Bayes classifier works on the principle of conditional probability as given by the Bayes theorem. Advantages of Naive Bayes Classifier Listed below are six benefits of Naive Bayes Classifier. Very simple and easy to implement Needs less training data Handles both continuous and discrete data Highly scalable with the number of predictors and data points As it is fast, it can be used in real-time predictions Not sensitive to irrelevant features Bayes Theorem We will understand Bayes Theorem in detail from the points mentioned below. According to the Bayes model, the conditional probability P(Y|X) can be calculated as: P(Y|X) = P(X|Y)P(Y) / P(X) This means you have to estimate a very large number of P(X|Y) probabilities for a relatively small vector space X. For example, for a Boolean Y and 30 possible Boolean attributes in the X vector, you will have to estimate 3 billion probabilities P(X|Y). To make it practical, a Naïve Bayes classifier is used, which assumes conditional independence of P(X) to each other, with a given value of Y. This reduces the number of probability estimates to 2*30=60 in the above example. Naïve Bayes Classifier for SMS Spam Detection Consider a labeled SMS database having 5574 messages. It has messages as given below: https://www.simplilearn.com/ice9/free_resources_article_thumb/naive-bayes-spam-machine-learning.JPG Each message is marked as spam or ham in the data set. Let’s train a model with Naïve Bayes algorithm to detect spam from ham. The message lengths and their frequency (in the training dataset) are as shown below: https://www.simplilearn.com/ice9/free_resources_article_thumb/naive-bayes-spam-spam-detection.JPG Analyze the logic you use to train an algorithm to detect spam: Split each message into individual words/tokens (bag of words). Lemmatize the data (each word takes its base form, like “walking” or “walked” is replaced with “walk”). Convert data to vectors using scikit-learn module CountVectorizer. Run TFIDF to remove common words like “is,” “are,” “and.” Now apply scikit-learn module for Naïve Bayes MultinomialNB to get the Spam Detector. This spam detector can then be used to classify a random new message as spam or ham. Next, the accuracy of the spam detector is checked using the Confusion Matrix. For the SMS spam example above, the confusion matrix is shown on the right. Accuracy Rate = Correct / Total = (4827 + 592)/5574 = 97.21% Error Rate = Wrong / Total = (155 + 0)/5574 = 2.78% https://www.simplilearn.com/ice9/free_resources_article_thumb/confusion-matrix-machine-learning.JPG Although confusion Matrix is useful, some more precise metrics are provided by Precision and Recall. https://www.simplilearn.com/ice9/free_resources_article_thumb/precision-recall-matrix-machine-learning.JPG Precision refers to the accuracy of positive predictions. https://www.simplilearn.com/ice9/free_resources_article_thumb/precision-formula-machine-learning.JPG Recall refers to the ratio of positive instances that are correctly detected by the classifier (also known as True positive rate or TPR). https://www.simplilearn.com/ice9/free_resources_article_thumb/recall-formula-machine-learning.JPG Precision/Recall Trade-off To detect age-appropriate videos for kids, you need high precision (low recall) to ensure that only safe videos make the cut (even though a few safe videos may be left out). The high recall is needed (low precision is acceptable) in-store surveillance to catch shoplifters; a few false alarms are acceptable, but all shoplifters must be caught. Learn about Naive Bayes in detail. Click here! Decision Tree Classifier Some aspects of the Decision Tree Classifier mentioned below are. Decision Trees (DT) can be used both for classification and regression. The advantage of decision trees is that they require very little data preparation. They do not require feature scaling or centering at all. They are also the fundamental components of Random Forests, one of the most powerful ML algorithms. Unlike Random Forests and Neural Networks (which do black-box modeling), Decision Trees are white box models, which means that inner workings of these models are clearly understood. In the case of classification, the data is segregated based on a series of questions. Any new data point is assigned to the selected leaf node. https://www.simplilearn.com/ice9/free_resources_article_thumb/decision-tree-classifier-machine-learning.JPG Start at the tree root and split the data on the feature using the decision algorithm, resulting in the largest information gain (IG). This splitting procedure is then repeated in an iterative process at each child node until the leaves are pure. This means that the samples at each node belonging to the same class. In practice, you can set a limit on the depth of the tree to prevent overfitting. The purity is compromised here as the final leaves may still have some impurity. The figure shows the classification of the Iris dataset. https://www.simplilearn.com/ice9/free_resources_article_thumb/decision-tree-classifier-graph.JPG IRIS Decision Tree Let’s build a Decision Tree using scikit-learn for the Iris flower dataset and also visualize it using export_graphviz API. https://www.simplilearn.com/ice9/free_resources_article_thumb/iris-decision-tree-machine-learning.JPG The output of export_graphviz can be converted into png format: https://www.simplilearn.com/ice9/free_resources_article_thumb/iris-decision-tree-output.JPG Sample attribute stands for the number of training instances the node applies to. Value attribute stands for the number of training instances of each class the node applies to. Gini impurity measures the node’s impurity. A node is “pure” (gini=0) if all training instances it applies to belong to the same class. https://www.simplilearn.com/ice9/free_resources_article_thumb/impurity-formula-machine-learning.JPG For example, for Versicolor (green color node), the Gini is 1-(0/54)2 -(49/54)2 -(5/54) 2 ≈ 0.168 https://www.simplilearn.com/ice9/free_resources_article_thumb/iris-decision-tree-sample.JPG Decision Boundaries Let us learn to create decision boundaries below. For the first node (depth 0), the solid line splits the data (Iris-Setosa on left). Gini is 0 for Setosa node, so no further split is possible. The second node (depth 1) splits the data into Versicolor and Virginica. If max_depth were set as 3, a third split would happen (vertical dotted line). https://www.simplilearn.com/ice9/free_resources_article_thumb/decision-tree-boundaries.JPG For a sample with petal length 5 cm and petal width 1.5 cm, the tree traverses to depth 2 left node, so the probability predictions for this sample are 0% for Iris-Setosa (0/54), 90.7% for Iris-Versicolor (49/54), and 9.3% for Iris-Virginica (5/54) CART Training Algorithm Scikit-learn uses Classification and Regression Trees (CART) algorithm to train Decision Trees. CART algorithm: Split the data into two subsets using a single feature k and threshold tk (example, petal length < “2.45 cm”). This is done recursively for each node. k and tk are chosen such that they produce the purest subsets (weighted by their size). The objective is to minimize the cost function as given below: https://www.simplilearn.com/ice9/free_resources_article_thumb/cart-training-algorithm-machine-learning.JPG The algorithm stops executing if one of the following situations occurs: max_depth is reached No further splits are found for each node Other hyperparameters may be used to stop the tree: min_samples_split min_samples_leaf min_weight_fraction_leaf max_leaf_nodes Gini Impurity or Entropy Entropy is one more measure of impurity and can be used in place of Gini. https://www.simplilearn.com/ice9/free_resources_article_thumb/gini-impurity-entrophy.JPG It is a degree of uncertainty, and Information Gain is the reduction that occurs in entropy as one traverses down the tree. Entropy is zero for a DT node when the node contains instances of only one class. Entropy for depth 2 left node in the example given above is: https://www.simplilearn.com/ice9/free_resources_article_thumb/entrophy-for-depth-2.JPG Gini and Entropy both lead to similar trees. DT: Regularization The following figure shows two decision trees on the moons dataset. https://www.simplilearn.com/ice9/free_resources_article_thumb/dt-regularization-machine-learning.JPG The decision tree on the right is restricted by min_samples_leaf = 4. The model on the left is overfitting, while the model on the right generalizes better. Random Forest Classifier Let us have an understanding of Random Forest Classifier below. A random forest can be considered an ensemble of decision trees (Ensemble learning). Random Forest algorithm: Draw a random bootstrap sample of size n (randomly choose n samples from the training set). Grow a decision tree from the bootstrap sample. At each node, randomly select d features. Split the node using the feature that provides the best split according to the objective function, for instance by maximizing the information gain. Repeat the steps 1 to 2 k times. (k is the number of trees you want to create, using a subset of samples) Aggregate the prediction by each tree for a new data point to assign the class label by majority vote (pick the group selected by the most number of trees and assign new data point to that group). Random Forests are opaque, which means it is difficult to visualize their inner workings. https://www.simplilearn.com/ice9/free_resources_article_thumb/random-forest-classifier-graph.JPG However, the advantages outweigh their limitations since you do not have to worry about hyperparameters except k, which stands for the number of decision trees to be created from a subset of samples. RF is quite robust to noise from the individual decision trees. Hence, you need not prune individual decision trees. The larger the number of decision trees, the more accurate the Random Forest prediction is. (This, however, comes with higher computation cost). Key Takeaways Let us quickly run through what we have learned so far in this Classification tutorial. Classification algorithms are supervised learning methods to split data into classes. They can work on Linear Data as well as Nonlinear Data. Logistic Regression can classify data based on weighted parameters and sigmoid conversion to calculate the probability of classes. K-nearest Neighbors (KNN) algorithm uses similar features to classify data. Support Vector Machines (SVMs) classify data by detecting the maximum margin hyperplane between data classes. Naïve Bayes, a simplified Bayes Model, can help classify data using conditional probability models. Decision Trees are powerful classifiers and use tree splitting logic until pure or somewhat pure leaf node classes are attained. Random Forests apply Ensemble Learning to Decision Trees for more accurate classification predictions. Conclusion This completes ‘Classification’ tutorial. In the next tutorial, we will learn 'Unsupervised Learning with Clustering.'
socialfoundations / FolktablesDatasets derived from US census data
Lattice-zjj / On Device FinLLMOD-FinLLM is a refined model derived from the LLaMA series, with specific enhancements for Chinese financial knowledge. This model is built by fine-tuning LLaMA using a specialized instruction dataset created from publicly available Chinese financial Q&A data and additional web-scraped financial information.
DPIRD-DMA / SmoothifyA Python package for smoothing and refining geometries derived from raster data classifications. Smoothify transforms jagged polygons and lines resulting from raster-to-vector conversion into smooth, visually appealing features using an optimized implementation of Chaikin's corner-cutting algorithm.
GIScience / Openpoiservice:round_pushpin: Openpoiservice is a flask application which hosts a highly customizable points of interest database derived from OpenStreetMap data.
himanshub1007 / Alzhimers Disease Prediction Using Deep Learning# AD-Prediction Convolutional Neural Networks for Alzheimer's Disease Prediction Using Brain MRI Image ## Abstract Alzheimers disease (AD) is characterized by severe memory loss and cognitive impairment. It associates with significant brain structure changes, which can be measured by magnetic resonance imaging (MRI) scan. The observable preclinical structure changes provides an opportunity for AD early detection using image classification tools, like convolutional neural network (CNN). However, currently most AD related studies were limited by sample size. Finding an efficient way to train image classifier on limited data is critical. In our project, we explored different transfer-learning methods based on CNN for AD prediction brain structure MRI image. We find that both pretrained 2D AlexNet with 2D-representation method and simple neural network with pretrained 3D autoencoder improved the prediction performance comparing to a deep CNN trained from scratch. The pretrained 2D AlexNet performed even better (**86%**) than the 3D CNN with autoencoder (**77%**). ## Method #### 1. Data In this project, we used public brain MRI data from **Alzheimers Disease Neuroimaging Initiative (ADNI)** Study. ADNI is an ongoing, multicenter cohort study, started from 2004. It focuses on understanding the diagnostic and predictive value of Alzheimers disease specific biomarkers. The ADNI study has three phases: ADNI1, ADNI-GO, and ADNI2. Both ADNI1 and ADNI2 recruited new AD patients and normal control as research participants. Our data included a total of 686 structure MRI scans from both ADNI1 and ADNI2 phases, with 310 AD cases and 376 normal controls. We randomly derived the total sample into training dataset (n = 519), validation dataset (n = 100), and testing dataset (n = 67). #### 2. Image preprocessing Image preprocessing were conducted using Statistical Parametric Mapping (SPM) software, version 12. The original MRI scans were first skull-stripped and segmented using segmentation algorithm based on 6-tissue probability mapping and then normalized to the International Consortium for Brain Mapping template of European brains using affine registration. Other configuration includes: bias, noise, and global intensity normalization. The standard preprocessing process output 3D image files with an uniform size of 121x145x121. Skull-stripping and normalization ensured the comparability between images by transforming the original brain image into a standard image space, so that same brain substructures can be aligned at same image coordinates for different participants. Diluted or enhanced intensity was used to compensate the structure changes. the In our project, we used both whole brain (including both grey matter and white matter) and grey matter only. #### 3. AlexNet and Transfer Learning Convolutional Neural Networks (CNN) are very similar to ordinary Neural Networks. A CNN consists of an input and an output layer, as well as multiple hidden layers. The hidden layers are either convolutional, pooling or fully connected. ConvNet architectures make the explicit assumption that the inputs are images, which allows us to encode certain properties into the architecture. These then make the forward function more efficient to implement and vastly reduce the amount of parameters in the network. #### 3.1. AlexNet The net contains eight layers with weights; the first five are convolutional and the remaining three are fully connected. The overall architecture is shown in Figure 1. The output of the last fully-connected layer is fed to a 1000-way softmax which produces a distribution over the 1000 class labels. AlexNet maximizes the multinomial logistic regression objective, which is equivalent to maximizing the average across training cases of the log-probability of the correct label under the prediction distribution. The kernels of the second, fourth, and fifth convolutional layers are connected only to those kernel maps in the previous layer which reside on the same GPU (as shown in Figure1). The kernels of the third convolutional layer are connected to all kernel maps in the second layer. The neurons in the fully connected layers are connected to all neurons in the previous layer. Response-normalization layers follow the first and second convolutional layers. Max-pooling layers follow both response-normalization layers as well as the fifth convolutional layer. The ReLU non-linearity is applied to the output of every convolutional and fully-connected layer.  The first convolutional layer filters the 224x224x3 input image with 96 kernels of size 11x11x3 with a stride of 4 pixels (this is the distance between the receptive field centers of neighboring neurons in a kernel map). The second convolutional layer takes as input the (response-normalized and pooled) output of the first convolutional layer and filters it with 256 kernels of size 5x5x48. The third, fourth, and fifth convolutional layers are connected to one another without any intervening pooling or normalization layers. The third convolutional layer has 384 kernels of size 3x3x256 connected to the (normalized, pooled) outputs of the second convolutional layer. The fourth convolutional layer has 384 kernels of size 3x3x192 , and the fifth convolutional layer has 256 kernels of size 3x3x192. The fully-connected layers have 4096 neurons each. #### 3.2. Transfer Learning Training an entire Convolutional Network from scratch (with random initialization) is impractical[14] because it is relatively rare to have a dataset of sufficient size. An alternative is to pretrain a Conv-Net on a very large dataset (e.g. ImageNet), and then use the ConvNet either as an initialization or a fixed feature extractor for the task of interest. Typically, there are three major transfer learning scenarios: **ConvNet as fixed feature extractor:** We can take a ConvNet pretrained on ImageNet, and remove the last fully-connected layer, then treat the rest structure as a fixed feature extractor for the target dataset. In AlexNet, this would be a 4096-D vector. Usually, we call these features as CNN codes. Once we get these features, we can train a linear classifier (e.g. linear SVM or Softmax classifier) for our target dataset. **Fine-tuning the ConvNet:** Another idea is not only replace the last fully-connected layer in the classifier, but to also fine-tune the parameters of the pretrained network. Due to overfitting concerns, we can only fine-tune some higher-level part of the network. This suggestion is motivated by the observation that earlier features in a ConvNet contains more generic features (e.g. edge detectors or color blob detectors) that can be useful for many kind of tasks. But the later layer of the network becomes progressively more specific to the details of the classes contained in the original dataset. **Pretrained models:** The released pretrained model is usually the final ConvNet checkpoint. So it is common to see people use the network for fine-tuning. #### 4. 3D Autoencoder and Convolutional Neural Network We take a two-stage approach where we first train a 3D sparse autoencoder to learn filters for convolution operations, and then build a convolutional neural network whose first layer uses the filters learned with the autoencoder.  #### 4.1. Sparse Autoencoder An autoencoder is a 3-layer neural network that is used to extract features from an input such as an image. Sparse representations can provide a simple interpretation of the input data in terms of a small number of \parts by extracting the structure hidden in the data. The autoencoder has an input layer, a hidden layer and an output layer, and the input and output layers have same number of units, while the hidden layer contains more units for a sparse and overcomplete representation. The encoder function maps input x to representation h, and the decoder function maps the representation h to the output x. In our problem, we extract 3D patches from scans as the input to the network. The decoder function aims to reconstruct the input form the hidden representation h. #### 4.2. 3D Convolutional Neural Network Training the 3D convolutional neural network(CNN) is the second stage. The CNN we use in this project has one convolutional layer, one pooling layer, two linear layers, and finally a log softmax layer. After training the sparse autoencoder, we take the weights and biases of the encoder from trained model, and use them a 3D filter of a 3D convolutional layer of the 1-layer convolutional neural network. Figure 2 shows the architecture of the network. #### 5. Tools In this project, we used Nibabel for MRI image processing and PyTorch Neural Networks implementation.
xploitspeeds / Bookmarklet Hacks For School* READ THE README FOR INFO!! * Incoming Tags- z score statistics,find mean median mode statistics in ms excel,variance,standard deviation,linear regression,data processing,confidence intervals,average value,probability theory,binomial distribution,matrix,random numbers,error propagation,t statistics analysis,hypothesis testing,theorem,chi square,time series,data collection,sampling,p value,scatterplots,statistics lectures,statistics tutorials,business mathematics statistics,share stock market statistics in calculator,business analytics,GTA,continuous frequency distribution,statistics mathematics in real life,modal class,n is even,n is odd,median mean of series of numbers,math help,Sujoy Krishna Das,n+1/2 element,measurement of variation,measurement of central tendency,range of numbers,interquartile range,casio fx991,casio fx82,casio fx570,casio fx115es,casio 9860,casio 9750,casio 83gt,TI BAII+ financial,casio piano,casio calculator tricks and hacks,how to cheat in exam and not get caught,grouped interval data,equation of triangle rectangle curve parabola hyperbola,graph theory,operation research(OR),numerical methods,decision making,pie chart,bar graph,computer data analysis,histogram,statistics formula,matlab tutorial,find arithmetic mean geometric mean,find population standard deviation,find sample standard deviation,how to use a graphic calculator,pre algebra,pre calculus,absolute deviation,TI Nspire,TI 84 TI83 calculator tutorial,texas instruments calculator,grouped data,set theory,IIT JEE,AIEEE,GCSE,CAT,MAT,SAT,GMAT,MBBS,JELET,JEXPO,VOCLET,Indiastudychannel,IAS,IPS,IFS,GATE,B-Tech,M-Tech,AMIE,MBA,BBA,BCA,MCA,XAT,TOEFL,CBSE,ICSE,HS,WBUT,SSC,IUPAC,Narendra Modi,Sachin Tendulkar Farewell Speech,Dhoom 3,Arvind Kejriwal,maths revision,how to score good marks in exams,how to pass math exams easily,JEE 12th physics chemistry maths PCM,JEE maths shortcut techniques,quadratic equations,competition exams tips and ticks,competition maths,govt job,JEE KOTA,college math,mean value theorem,L hospital rule,tech guru awaaz,derivation,cryptography,iphone 5 fingerprint hack,crash course,CCNA,converting fractions,solve word problem,cipher,game theory,GDP,how to earn money online on youtube,demand curve,computer science,prime factorization,LCM & GCF,gauss elimination,vector,complex numbers,number systems,vector algebra,logarithm,trigonometry,organic chemistry,electrical math problem,eigen value eigen vectors,runge kutta,gauss jordan,simpson 1/3 3/8 trapezoidal rule,solved problem example,newton raphson,interpolation,integration,differentiation,regula falsi,programming,algorithm,gauss seidal,gauss jacobi,taylor series,iteration,binary arithmetic,logic gates,matrix inverse,determinant of matrix,matrix calculator program,sex in ranchi,sex in kolkata,vogel approximation VAM optimization problem,North west NWCR,Matrix minima,Modi method,assignment problem,transportation problem,simplex,k map,boolean algebra,android,casio FC 200v 100v financial,management mathematics tutorials,net present value NPV,time value of money TVM,internal rate of return IRR Bond price,present value PV and future value FV of annuity casio,simple interest SI & compound interest CI casio,break even point,amortization calculation,HP 10b financial calculator,banking and money,income tax e filing,economics,finance,profit & loss,yield of investment bond,Sharp EL 735S,cash flow casio,re finance,insurance and financial planning,investment appraisal,shortcut keys,depreciation,discounting
CellProfiler / CellProfiler AnalystOpen-source software for exploring and analyzing large, high-dimensional image-derived data.
dwcaress / MB SystemMB-System is an open source software package for the processing and display of bathymetry and backscatter imagery data derived from multibeam, interferometry, and sidescan sonars.
helgeho / ArchiveSparkAn Apache Spark framework for easy data processing, extraction as well as derivation for web archives and archival collections, developed at Internet Archive.
COVIDExposureIndices / COVIDExposureIndicesExposure indices derived from PlaceIQ movement data by Couture, Dingel, Green, Handbury, and Williams
jettbrains / L W3C Strategic Highlights September 2019 This report was prepared for the September 2019 W3C Advisory Committee Meeting (W3C Member link). See the accompanying W3C Fact Sheet — September 2019. For the previous edition, see the April 2019 W3C Strategic Highlights. For future editions of this report, please consult the latest version. A Chinese translation is available. ☰ Contents Introduction Future Web Standards Meeting Industry Needs Web Payments Digital Publishing Media and Entertainment Web & Telecommunications Real-Time Communications (WebRTC) Web & Networks Automotive Web of Things Strengthening the Core of the Web HTML CSS Fonts SVG Audio Performance Web Performance WebAssembly Testing Browser Testing and Tools WebPlatform Tests Web of Data Web for All Security, Privacy, Identity Internationalization (i18n) Web Accessibility Outreach to the world W3C Developer Relations W3C Training Translations W3C Liaisons Introduction This report highlights recent work of enhancement of the existing landscape of the Web platform and innovation for the growth and strength of the Web. 33 working groups and a dozen interest groups enable W3C to pursue its mission through the creation of Web standards, guidelines, and supporting materials. We track the tremendous work done across the Consortium through homogeneous work-spaces in Github which enables better monitoring and management. We are in the middle of a period where we are chartering numerous working groups which demonstrate the rapid degree of change for the Web platform: After 4 years, we are nearly ready to publish a Payment Request API Proposed Recommendation and we need to soon charter follow-on work. In the last year we chartered the Web Payment Security Interest Group. In the last year we chartered the Web Media Working Group with 7 specifications for next generation Media support on the Web. We have Accessibility Guidelines under W3C Member review which includes Silver, a new approach. We have just launched the Decentralized Identifier Working Group which has tremendous potential because Decentralized Identifier (DID) is an identifier that is globally unique, resolveable with high availability, and cryptographically verifiable. We have Privacy IG (PING) under W3C Member review which strengthens our focus on the tradeoff between privacy and function. We have a new CSS charter under W3C Member review which maps the group's work for the next three years. In this period, W3C and the WHATWG have succesfully completed the negotiation of a Memorandum of Understanding rooted in the mutual belief that that having two distinct specifications claiming to be normative is generally harmful for the Web community. The MOU, signed last May, describes how the two organizations are to collaborate on the development of a single authoritative version of the HTML and DOM specifications. W3C subsequently rechartered the HTML Working Group to assist the W3C community in raising issues and proposing solutions for the HTML and DOM specifications, and for the production of W3C Recommendations from WHATWG Review Drafts. As the Web evolves continuously, some groups are looking for ways for specifications to do so as well. So-called "evergreen recommendations" or "living standards" aim to track continuous development (and maintenance) of features, on a feature-by-feature basis, while getting review and patent commitments. We see the maturation and further development of an incredible number of new technologies coming to the Web. Continued progress in many areas demonstrates the vitality of the W3C and the Web community, as the rest of the report illustrates. Future Web Standards W3C has a variety of mechanisms for listening to what the community thinks could become good future Web standards. These include discussions with the Membership, discussions with other standards bodies, the activities of thousands of participants in over 300 community groups, and W3C Workshops. There are lots of good ideas. The W3C strategy team has been identifying promising topics and invites public participation. Future, recent and under consideration Workshops include: Inclusive XR (5-6 November 2019, Seattle, WA, USA) to explore existing and future approaches on making Virtual and Augmented Reality experiences more inclusive, including to people with disabilities; W3C Workshop on Data Models for Transportation (12-13 September 2019, Palo Alto, CA, USA) W3C Workshop on Web Games (27-28 June 2019, Redmond, WA, USA), view report Second W3C Workshop on the Web of Things (3-5 June 2019, Munich, Germany) W3C Workshop on Web Standardization for Graph Data; Creating Bridges: RDF, Property Graph and SQL (4-6 March 2019, Berlin, Germany), view report Web & Machine Learning. The Strategy Funnel documents the staff's exploration of potential new work at various phases: Exploration and Investigation, Incubation and Evaluation, and eventually to the chartering of a new standards group. The Funnel view is a GitHub Project where new area are issues represented by “cards” which move through the columns, usually from left to right. Most cards start in Exploration and move towards Chartering, or move out of the funnel. Public input is welcome at any stage but particularly once Incubation has begun. This helps W3C identify work that is sufficiently incubated to warrant standardization, to review the ecosystem around the work and indicate interest in participating in its standardization, and then to draft a charter that reflects an appropriate scope. Ongoing feedback can speed up the overall standardization process. Since the previous highlights document, W3C has chartered a number of groups, and started discussion on many more: Newly Chartered or Rechartered Web Application Security WG (03-Apr) Web Payment Security IG (17-Apr) Patent and Standards IG (24-Apr) Web Applications WG (14-May) Web & Networks IG (16-May) Media WG (23-May) Media and Entertainment IG (06-Jun) HTML WG (06-Jun) Decentralized Identifier WG (05-Sep) Extended Privacy IG (PING) (30-Sep) Verifiable Claims WG (30-Sep) Service Workers WG (31-Dec) Dataset Exchange WG (31-Dec) Web of Things Working Group (31-Dec) Web Audio Working Group (31-Dec) Proposed charters / Advance Notice Accessibility Guidelines WG Privacy IG (PING) RDF Literal Direction WG Timed Text WG CSS WG Web Authentication WG Closed Internationalization Tag Set IG Meeting Industry Needs Web Payments All Web Payments specifications W3C's payments standards enable a streamlined checkout experience, enabling a consistent user experience across the Web with lower front end development costs for merchants. Users can store and reuse information and more quickly and accurately complete online transactions. The Web Payments Working Group has republished Payment Request API as a Candidate Recommendation, aiming to publish a Proposed Recommendation in the Fall 2019, and is discussing use cases and features for Payment Request after publication of the 1.0 Recommendation. Browser vendors have been finalizing implementation of features added in the past year (view the implementation report). As work continues on the Payment Handler API and its implementation (currently in Chrome and Edge Canary), one focus in 2019 is to increase adoption in other browsers. Recently, Mastercard demonstrated the use of Payment Request API to carry out EMVCo's Secure Remote Commerce (SRC) protocol whose payment method definition is being developed with active participation by Visa, Mastercard, American Express, and Discover. Payment method availability is a key factor in merchant considerations about adopting Payment Request API. The ability to get uniform adoption of a new payment method such as Secure Remote Commerce (SRC) also depends on the availability of the Payment Handler API in browsers, or of proprietary alternatives. Web Monetization, which the Web Payments Working Group will discuss again at its face-to-face meeting in September, can be used to enable micropayments as an alternative revenue stream to advertising. Since the beginning of 2019, Amazon, Brave Software, JCB, Certus Cybersecurity Solutions and Netflix have joined the Web Payments Working Group. In April, W3C launched the Web Payment Security Group to enable W3C, EMVCo, and the FIDO Alliance to collaborate on a vision for Web payment security and interoperability. Participants will define areas of collaboration and identify gaps between existing technical specifications in order to increase compatibility among different technologies, such as: How do SRC, FIDO, and Payment Request relate? The Payment Services Directive 2 (PSD2) regulations in Europe are scheduled to take effect in September 2019. What is the role of EMVCo, W3C, and FIDO technologies, and what is the current state of readiness for the deadline? How can we improve privacy on the Web at the same time as we meet industry requirements regarding user identity? Digital Publishing All Digital Publishing specifications, Publication milestones The Web is the universal publishing platform. Publishing is increasingly impacted by the Web, and the Web increasingly impacts Publishing. Topic of particular interest to Publishing@W3C include typography and layout, accessibility, usability, portability, distribution, archiving, offline access, print on demand, and reliable cross referencing. And the diverse publishing community represented in the groups consist of the traditional "trade" publishers, ebook reading system manufacturers, but also publishers of audio book, scholarly journals or educational materials, library scientists or browser developers. The Publishing Working Group currently concentrates on Audiobooks which lack a comprehensive standard, thus incurring extra costs and time to publish in this booming market. Active development is ongoing on the future standard: Publication Manifest Audiobook profile for Web Publications Lightweight Packaging Format The BD Comics Manga Community Group, the Synchronized Multimedia for Publications Community Group, the Publishing Community Group and a future group on archival, are companions to the working group where specific work is developed and incubated. The Publishing Community Group is a recently launched incubation channel for Publishing@W3C. The goal of the group is to propose, document, and prototype features broadly related to: publications on the Web reading modes and systems and the user experience of publications The EPUB 3 Community Group has successfully completed the revision of EPUB 3.2. The Publishing Business Group fosters ongoing participation by members of the publishing industry and the overall ecosystem in the development of Web infrastructure to better support the needs of the industry. The Business Group serves as an additional conduit to the Publishing Working Group and several Community Groups for feedback between the publishing ecosystem and W3C. The Publishing BG has played a vital role in fostering and advancing the adoption and continued development of EPUB 3. In particular the BG provided critical support to the update of EPUBCheck to validate EPUB content to the new EPUB 3.2 specification. This resulted in the development, in conjunction with the EPUB3 Community Group, of a new generation of EPUBCheck, i.e., EPUBCheck 4.2 production-ready release. Media and Entertainment All Media specifications The Media and Entertainment vertical tracks media-related topics and features that create immersive experiences for end users. HTML5 brought standard audio and video elements to the Web. Standardization activities since then have aimed at turning the Web into a professional platform fully suitable for the delivery of media content and associated materials, enabling missing features to stream video content on the Web such as adaptive streaming and content protection. Together with Microsoft, Comcast, Netflix and Google, W3C received an Technology & Engineering Emmy Award in April 2019 for standardization of a full TV experience on the Web. Current goals are to: Reinforce core media technologies: Creation of the Media Working Group, to develop media-related specifications incubated in the WICG (e.g. Media Capabilities, Picture-in-picture, Media Session) and maintain maintain/evolve Media Source Extensions (MSE) and Encrypted Media Extensions (EME). Improve support for Media Timed Events: data cues incubation. Enhance color support (HDR, wide gamut), in scope of the CSS WG and in the Color on the Web CG. Reduce fragmentation: Continue annual releases of a common and testable baseline media devices, in scope of the Web Media APIs CG and in collaboration with the CTA WAVE Project. Maintain the Road-map of Media Technologies for the Web which highlights Web technologies that can be used to build media applications and services, as well as known gaps to enable additional use cases. Create the future: Discuss perspectives for Media and Entertainment for the Web. Bring the power of GPUs to the Web (graphics, machine learning, heavy processing), under incubation in the GPU for the Web CG. Transition to a Working Group is under discussion. Determine next steps after the successful W3C Workshop on Web Games of June 2019. View the report. Timed Text The Timed Text Working Group develops and maintains formats used for the representation of text synchronized with other timed media, like audio and video, and notably works on TTML, profiles of TTML, and WebVTT. Recent progress includes: A robust WebVTT implementation report poises the specification for publication as a proposed recommendation. Discussions around re-chartering, notably to add a TTML Profile for Audio Description deliverable to the scope of the group, and clarify that rendering of captions within XR content is also in scope. Immersive Web Hardware that enables Virtual Reality (VR) and Augmented Reality (AR) applications are now broadly available to consumers, offering an immersive computing platform with both new opportunities and challenges. The ability to interact directly with immersive hardware is critical to ensuring that the web is well equipped to operate as a first-class citizen in this environment. The Immersive Web Working Group has been stabilizing the WebXR Device API while the companion Immersive Web Community Group incubates the next series of features identified as key for the future of the Immersive Web. W3C plans a workshop focused on the needs and benefits at the intersection of VR & Accessibility (Inclusive XR), on 5-6 November 2019 in Seattle, WA, USA, to explore existing and future approaches on making Virtual and Augmented Reality experiences more inclusive. Web & Telecommunications The Web is the Open Platform for Mobile. Telecommunication service providers and network equipment providers have long been critical actors in the deployment of Web technologies. As the Web platform matures, it brings richer and richer capabilities to extend existing services to new users and devices, and propose new and innovative services. Real-Time Communications (WebRTC) All Real-Time Communications specifications WebRTC has reshaped the whole communication landscape by making any connected device a potential communication end-point, bringing audio and video communications anywhere, on any network, vastly expanding the ability of operators to reach their customers. WebRTC serves as the corner-stone of many online communication and collaboration services. The WebRTC Working Group aims to bringing WebRTC 1.0 (and companion specification Media Capture and Streams) to Recommendation by the end of 2019. Intense efforts are focused on testing (supported by a dedicated hackathon at IETF 104) and interoperability. The group is considering pushing features that have not gotten enough traction to separate modules or to a later minor revision of the spec. Beyond WebRTC 1.0, the WebRTC Working Group will focus its efforts on WebRTC NV which the group has started documenting by identifying use cases. Web & Networks Recently launched, in the wake of the May 2018 Web5G workshop, the Web & Networks Interest Group is chaired by representatives from AT&T, China Mobile and Intel, with a goal to explore solutions for web applications to achieve better performance and resource allocation, both on the device and network. The group's first efforts are around use cases, privacy & security requirements and liaisons. Automotive All Automotive specifications To create a rich application ecosystem for vehicles and other devices allowed to connect to the vehicle, the W3C Automotive Working Group is delivering a service specification to expose all common vehicle signals (engine temperature, fuel/charge level, range, tire pressure, speed, etc.) The Vehicle Information Service Specification (VISS), which is a Candidate Recommendation, is seeing more implementations across the industry. It provides the access method to a common data model for all the vehicle signals –presently encapsulating a thousand or so different data elements– and will be growing to accommodate the advances in automotive such as autonomous and driver assist technologies and electrification. The group is already working on a successor to VISS, leveraging the underlying data model and the VIWI submission from Volkswagen, for a more robust means of accessing vehicle signals information and the same paradigm for other automotive needs including location-based services, media, notifications and caching content. The Automotive and Web Platform Business Group acts as an incubator for prospective standards work. One of its task forces is using W3C VISS in performing data sampling and off-boarding the information to the cloud. Access to the wealth of information that W3C's auto signals standard exposes is of interest to regulators, urban planners, insurance companies, auto manufacturers, fleet managers and owners, service providers and others. In addition to components needed for data sampling and edge computing, capturing user and owner consent, information collection methods and handling of data are in scope. The upcoming W3C Workshop on Data Models for Transportation (September 2019) is expected to focus on the need of additional ontologies around transportation space. Web of Things All Web of Things specifications W3C's Web of Things work is designed to bridge disparate technology stacks to allow devices to work together and achieve scale, thus enabling the potential of the Internet of Things by eliminating fragmentation and fostering interoperability. Thing descriptions expressed in JSON-LD cover the behavior, interaction affordances, data schema, security configuration, and protocol bindings. The Web of Things complements existing IoT ecosystems to reduce the cost and risk for suppliers and consumers of applications that create value by combining multiple devices and information services. There are many sectors that will benefit, e.g. smart homes, smart cities, smart industry, smart agriculture, smart healthcare and many more. The Web of Things Working Group is finishing the initial Web of Things standards, with support from the Web of Things Interest Group: Web of Things Architecture Thing Descriptions Strengthening the Core of the Web HTML The HTML Working Group was chartered early June to assist the W3C community in raising issues and proposing solutions for the HTML and DOM specifications, and to produce W3C Recommendations from WHATWG Review Drafts. A few days before, W3C and the WHATWG signed a Memorandum of Understanding outlining the agreement to collaborate on the development of a single version of the HTML and DOM specifications. Issues and proposed solutions for HTML and DOM done via the newly rechartered HTML Working Group in the WHATWG repositories The HTML Working Group is targetting November 2019 to bring HTML and DOM to Candidate Recommendations. CSS All CSS specifications CSS is a critical part of the Open Web Platform. The CSS Working Group gathers requirements from two large groups of CSS users: the publishing industry and application developers. Within W3C, those groups are exemplified by the Publishing groups and the Web Platform Working Group. The former requires things like better pagination support and advanced font handling, the latter needs intelligent (and fast!) scrolling and animations. What we know as CSS is actually a collection of almost a hundred specifications, referred to as ‘modules’. The current state of CSS is defined by a snapshot, updated once a year. The group also publishes an index defining every term defined by CSS specifications. Fonts All Fonts specifications The Web Fonts Working Group develops specifications that allow the interoperable deployment of downloadable fonts on the Web, with a focus on Progressive Font Enrichment as well as maintenance of WOFF Recommendations. Recent and ongoing work includes: Early API experiments by Adobe and Monotype have demonstrated the feasibility of a font enrichment API, where a server delivers a font with minimal glyph repertoire and the client can query the full repertoire and request additional subsets on-the-fly. In other experiments, the Brotli compression used in WOFF 2 was extended to support shared dictionaries and patch update. Metrics to quantify improvement are a current hot discussion topic. The group will meet at ATypi 2019 in Japan, to gather requirements from the international typography community. The group will first produce a report summarizing the strengths and weaknesses of each prototype solution by Q2 2020. SVG All SVG specifications SVG is an important and widely-used part of the Open Web Platform. The SVG Working Group focuses on aligning the SVG 2.0 specification with browser implementations, having split the specification into a currently-implemented 2.0 and a forward-looking 2.1. Current activity is on stabilization, increased integration with the Open Web Platform, and test coverage analysis. The Working Group was rechartered in March 2019. A new work item concerns native (non-Web-browser) uses of SVG as a non-interactive, vector graphics format. Audio The Web Audio Working Group was extended to finish its work on the Web Audio API, expecting to publish it as a Recommendation by year end. The specification enables synthesizing audio in the browser. Audio operations are performed with audio nodes, which are linked together to form a modular audio routing graph. Multiple sources — with different types of channel layout — are supported. This modular design provides the flexibility to create complex audio functions with dynamic effects. The first version of Web Audio API is now feature complete and is implemented in all modern browsers. Work has started on the next version, and new features are being incubated in the Audio Community Group. Performance Web Performance All Web Performance specifications There are currently 18 specifications in development in the Web Performance Working Group aiming to provide methods to observe and improve aspects of application performance of user agent features and APIs. The W3C team is looking at related work incubated in the W3C GPU for the Web (WebGPU) Community Group which is poised to transition to a W3C Working Group. A preliminary draft charter is available. WebAssembly All WebAssembly specifications WebAssembly improves Web performance and power by being a virtual machine and execution environment enabling loaded pages to run native (compiled) code. It is deployed in Firefox, Edge, Safari and Chrome. The specification will soon reach Candidate Recommendation. WebAssembly enables near-native performance, optimized load time, and perhaps most importantly, a compilation target for existing code bases. While it has a small number of native types, much of the performance increase relative to Javascript derives from its use of consistent typing. WebAssembly leverages decades of optimization for compiled languages and the byte code is optimized for compactness and streaming (the web page starts executing while the rest of the code downloads). Network and API access all occurs through accompanying Javascript libraries -- the security model is identical to that of Javascript. Requirements gathering and language development occur in the Community Group while the Working Group manages test development, community review and progression of specifications on the Recommendation Track. Testing Browser testing plays a critical role in the growth of the Web by: Improving the reliability of Web technology definitions; Improving the quality of implementations of these technologies by helping vendors to detect bugs in their products; Improving the data available to Web developers on known bugs and deficiencies of Web technologies by publishing results of these tests. Browser Testing and Tools The Browser Testing and Tools Working Group is developing WebDriver version 2, having published last year the W3C Recommendation of WebDriver. WebDriver acts as a remote control interface that enables introspection and control of user agents, provides a platform- and language-neutral wire protocol as a way for out-of-process programs to remotely instruct the behavior of Web, and emulates the actions of a real person using the browser. WebPlatform Tests The WebPlatform Tests project now provides a mechanism which allows to fully automate tests that previously needed to be run manually: TestDriver. TestDriver enables sending trusted key and mouse events, sending complex series of trusted pointer and key interactions for things like in-content drag-and-drop or pinch zoom, and even file upload. Since 2014 W3C began work on this coordinated open-source effort to build a cross-browser test suite for the Web Platform, which WHATWG, and all major browsers adopted. Web of Data All Data specifications There have been several great success stories around the standardization of data on the web over the past year. Verifiable Claims seems to have significant uptake. It is also significant that the Distributed Identifier WG charter has received numerous favorable reviews, and was just recently launched. JSON-LD has been a major success with the large deployment on Web sites via schema.org. JSON-LD 1.1 completed technical work, about to transition to CR More than 25% of websites today include schema.org data in JSON-LD The Web of Things description is in CR since May, making use of JSON-LD Verifiable Credentials data model is in CR since July, also making use of JSON-LD Continued strong interest in decentralized identifiers Engagement from the TAG with reframing core documents, such as Ethical Web Principles, to include data on the web within their scope Data is increasingly important for all organizations, especially with the rise of IoT and Big Data. W3C has a mature and extensive suite of standards relating to data that were developed over two decades of experience, with plans for further work on making it easier for developers to work with graph data and knowledge graphs. Linked Data is about the use of URIs as names for things, the ability to dereference these URIs to get further information and to include links to other data. There are ever-increasing sources of open Linked Data on the Web, as well as data services that are restricted to the suppliers and consumers of those services. The digital transformation of industry is seeking to exploit advanced digital technologies. This will facilitate businesses to integrate horizontally along the supply and value chains, and vertically from the factory floor to the office floor. W3C is seeking to make it easier to support enterprise-wide data management and governance, reflecting the strategic importance of data to modern businesses. Traditional approaches to data have focused on tabular databases (SQL/RDBMS), Comma Separated Value (CSV) files, and data embedded in PDF documents and spreadsheets. We're now in midst of a major shift to graph data with nodes and labeled directed links between them. Graph data is: Faster than using SQL and associated JOIN operations More favorable to integrating data from heterogeneous sources Better suited to situations where the data model is evolving In the wake of the recent W3C Workshop on Graph Data we are in the process of launching a Graph Standardization Business Group to provide a business perspective with use cases and requirements, to coordinate technical standards work and liaisons with external organizations. Web for All Security, Privacy, Identity All Security specifications, all Privacy specifications Authentication on the Web As the WebAuthn Level 1 W3C Recommendation published last March is seeing wide implementation and adoption of strong cryptographic authentication, work is proceeding on Level 2. The open standard Web API gives native authentication technology built into native platforms, browsers, operating systems (including mobile) and hardware, offering protection against hacking, credential theft, phishing attacks, thus aiming to end the era of passwords as a security construct. You may read more in our March press release. Privacy An increasing number of W3C specifications are benefitting from Privacy and Security review; there are security and privacy aspects to every specification. Early review is essential. Working with the TAG, the Privacy Interest Group has updated the Self-Review Questionnaire: Security and Privacy. Other recent work of the group includes public blogging further to the exploration of anti-patterns in standards and permission prompts. Security The Web Application Security Working Group adopted Feature Policy, aiming to allow developers to selectively enable, disable, or modify the behavior of some of these browser features and APIs within their application; and Fetch Metadata, aiming to provide servers with enough information to make a priori decisions about whether or not to service a request based on the way it was made, and the context in which it will be used. The Web Payment Security Interest Group, launched last April, convenes members from W3C, EMVCo, and the FIDO Alliance to discuss cooperative work to enhance the security and interoperability of Web payments (read more about payments). Internationalization (i18n) All Internationalization specifications, educational articles related to Internationalization, spec developers checklist Only a quarter or so current Web users use English online and that proportion will continue to decrease as the Web reaches more and more communities of limited English proficiency. If the Web is to live up to the "World Wide" portion of its name, and for the Web to truly work for stakeholders all around the world engaging with content in various languages, it must support the needs of worldwide users as they engage with content in the various languages. The growth of epublishing also brings requirements for new features and improved typography on the Web. It is important to ensure the needs of local communities are captured. The W3C Internationalization Initiative was set up to increase in-house resources dedicated to accelerating progress in making the World Wide Web "worldwide" by gathering user requirements, supporting developers, and education & outreach. For an overview of current projects see the i18n radar. W3C's Internationalization efforts progressed on a number of fronts recently: Requirements: New African and European language groups will work on the gap analysis, errata and layout requirements. Gap analysis: Japanese, Devanagari, Bengali, Tamil, Lao, Khmer, Javanese, and Ethiopic updated in the gap-analysis documents. Layout requirements document: notable progress tracked in the Southeast Asian Task Force while work continues on Chinese layout requirements. Developer support: Spec reviews: the i18n WG continues active review of specifications of the WHATWG and other W3C Working Groups. Short review checklist: easy way to begin a self-review to help spec developers understand what aspects of their spec are likely to need attention for internationalization, and points them to more detailed checklists for the relevant topics. It also helps those reviewing specs for i18n issues. Strings on the Web: Language and Direction Metadata lays out issues and discusses potential solutions for passing information about language and direction with strings in JSON or other data formats. The document was rewritten for clarity, and expanded. The group is collaborating with the JSON-LD and Web Publishing groups to develop a plan for updating RDF, JSON-LD and related specifications to handle metadata for base direction of text (bidi). User-friendly test format: a new format was developed for Internationalization Test Suite tests, which displays helpful information about how the test works. This particularly useful because those tests are pointed to by educational materials and gap-analysis documents. Web Platform Tests: a large number of tests in the i18n test suite have been ported to the WPT repository, including: css-counter-styles, css-ruby, css-syntax, css-test, css-text-decor, css-writing-modes, and css-pseudo. Education & outreach: (for all educational materials, see the HTML & CSS Authoring Techniques) Web Accessibility All Accessibility specifications, WAI resources The Web Accessibility Initiative supports W3C's Web for All mission. Recent achievements include: Education and training: Inaccessibility of CAPTCHA updated to bring our analysis and recommendations up to date with CAPTCHA practice today, concluding two years of extensive work and invaluable input from the public (read more on the W3C Blog Learn why your web content and applications should be accessible. The Education and Outreach Working Group has completed revision and updating of the Business Case for Digital Accessibility. Accessibility guidelines: The Accessibility Guidelines Working Group has continued to update WCAG Techniques and Understanding WCAG 2.1; and published a Candidate Recommendation of Accessibility Conformance Testing Rules Format 1.0 to improve inter-rater reliability when evaluating conformance of web content to WCAG An updated charter is being developed to host work on "Silver", the next generation accessibility guidelines (WCAG 2.2) There are accessibility aspects to most specifications. Check your work with the FAST checklist. Outreach to the world W3C Developer Relations To foster the excellent feedback loop between Web Standards development and Web developers, and to grow participation from that diverse community, recent W3C Developer Relations activities include: @w3cdevs tracks the enormous amount of work happening across W3C W3C Track during the Web Conference 2019 in San Francisco Tech videos: W3C published the 2019 Web Games Workshop videos The 16 September 2019 Developer Meetup in Fukuoka, Japan, is open to all and will combine a set of technical demos prepared by W3C groups, and a series of talks on a selected set of W3C technologies and projects W3C is involved with Mozilla, Google, Samsung, Microsoft and Bocoup in the organization of ViewSource 2019 in Amsterdam (read more on the W3C Blog) W3C Training In partnership with EdX, W3C's MOOC training program, W3Cx offers a complete "Front-End Web Developer" (FEWD) professional certificate program that consists of a suite of five courses on the foundational languages that power the Web: HTML5, CSS and JavaScript. We count nearly 900K students from all over the world. Translations Many Web users rely on translations of documents developed at W3C whose official language is English. W3C is extremely grateful to the continuous efforts of its community in ensuring our various deliverables in general, and in our specifications in particular, are made available in other languages, for free, ensuring their exposure to a much more diverse set of readers. Last Spring we developed a more robust system, a new listing of translations of W3C specifications and updated the instructions on how to contribute to our translation efforts. W3C Liaisons Liaisons and coordination with numerous organizations and Standards Development Organizations (SDOs) is crucial for W3C to: make sure standards are interoperable coordinate our respective agenda in Internet governance: W3C participates in ICANN, GIPO, IGF, the I* organizations (ICANN, IETF, ISOC, IAB). ensure at the government liaison level that our standards work is officially recognized when important to our membership so that products based on them (often done by our members) are part of procurement orders. W3C has ARO/PAS status with ISO. W3C participates in the EU MSP and Rolling Plan on Standardization ensure the global set of Web and Internet standards form a compatible stack of technologies, at the technical and policy level (patent regime, fragmentation, use in policy making) promote Standards adoption equally by the industry, the public sector, and the public at large Coralie Mercier, Editor, W3C Marketing & Communications $Id: Overview.html,v 1.60 2019/10/15 12:05:52 coralie Exp $ Copyright © 2019 W3C ® (MIT, ERCIM, Keio, Beihang) Usage policies apply.
liblime / LibLime KohaLibLime Koha is the most mature of the open source ILS applications. Based on the ground-breaking 3.0 platform (derived from the original Koha offering of 1999), LibLime Koha is a completely web-based open source ILS, with library staff, systems librarians, and library users all accessing LibLime Koha through a web browser. Relying on the MySQL relational database, all LibLime Koha data is readily accessible.