81 skills found · Page 1 of 3
sayantann11 / All Classification Templetes For MLClassification - Machine Learning This is ‘Classification’ tutorial which is a part of the Machine Learning course offered by Simplilearn. We will learn Classification algorithms, types of classification algorithms, support vector machines(SVM), Naive Bayes, Decision Tree and Random Forest Classifier in this tutorial. Objectives Let us look at some of the objectives covered under this section of Machine Learning tutorial. Define Classification and list its algorithms Describe Logistic Regression and Sigmoid Probability Explain K-Nearest Neighbors and KNN classification Understand Support Vector Machines, Polynomial Kernel, and Kernel Trick Analyze Kernel Support Vector Machines with an example Implement the Naïve Bayes Classifier Demonstrate Decision Tree Classifier Describe Random Forest Classifier Classification: Meaning Classification is a type of supervised learning. It specifies the class to which data elements belong to and is best used when the output has finite and discrete values. It predicts a class for an input variable as well. There are 2 types of Classification: Binomial Multi-Class Classification: Use Cases Some of the key areas where classification cases are being used: To find whether an email received is a spam or ham To identify customer segments To find if a bank loan is granted To identify if a kid will pass or fail in an examination Classification: Example Social media sentiment analysis has two potential outcomes, positive or negative, as displayed by the chart given below. https://www.simplilearn.com/ice9/free_resources_article_thumb/classification-example-machine-learning.JPG This chart shows the classification of the Iris flower dataset into its three sub-species indicated by codes 0, 1, and 2. https://www.simplilearn.com/ice9/free_resources_article_thumb/iris-flower-dataset-graph.JPG The test set dots represent the assignment of new test data points to one class or the other based on the trained classifier model. Types of Classification Algorithms Let’s have a quick look into the types of Classification Algorithm below. Linear Models Logistic Regression Support Vector Machines Nonlinear models K-nearest Neighbors (KNN) Kernel Support Vector Machines (SVM) Naïve Bayes Decision Tree Classification Random Forest Classification Logistic Regression: Meaning Let us understand the Logistic Regression model below. This refers to a regression model that is used for classification. This method is widely used for binary classification problems. It can also be extended to multi-class classification problems. Here, the dependent variable is categorical: y ϵ {0, 1} A binary dependent variable can have only two values, like 0 or 1, win or lose, pass or fail, healthy or sick, etc In this case, you model the probability distribution of output y as 1 or 0. This is called the sigmoid probability (σ). If σ(θ Tx) > 0.5, set y = 1, else set y = 0 Unlike Linear Regression (and its Normal Equation solution), there is no closed form solution for finding optimal weights of Logistic Regression. Instead, you must solve this with maximum likelihood estimation (a probability model to detect the maximum likelihood of something happening). It can be used to calculate the probability of a given outcome in a binary model, like the probability of being classified as sick or passing an exam. https://www.simplilearn.com/ice9/free_resources_article_thumb/logistic-regression-example-graph.JPG Sigmoid Probability The probability in the logistic regression is often represented by the Sigmoid function (also called the logistic function or the S-curve): https://www.simplilearn.com/ice9/free_resources_article_thumb/sigmoid-function-machine-learning.JPG In this equation, t represents data values * the number of hours studied and S(t) represents the probability of passing the exam. Assume sigmoid function: https://www.simplilearn.com/ice9/free_resources_article_thumb/sigmoid-probability-machine-learning.JPG g(z) tends toward 1 as z -> infinity , and g(z) tends toward 0 as z -> infinity K-nearest Neighbors (KNN) K-nearest Neighbors algorithm is used to assign a data point to clusters based on similarity measurement. It uses a supervised method for classification. The steps to writing a k-means algorithm are as given below: https://www.simplilearn.com/ice9/free_resources_article_thumb/knn-distribution-graph-machine-learning.JPG Choose the number of k and a distance metric. (k = 5 is common) Find k-nearest neighbors of the sample that you want to classify Assign the class label by majority vote. KNN Classification A new input point is classified in the category such that it has the most number of neighbors from that category. For example: https://www.simplilearn.com/ice9/free_resources_article_thumb/knn-classification-machine-learning.JPG Classify a patient as high risk or low risk. Mark email as spam or ham. Keen on learning about Classification Algorithms in Machine Learning? Click here! Support Vector Machine (SVM) Let us understand Support Vector Machine (SVM) in detail below. SVMs are classification algorithms used to assign data to various classes. They involve detecting hyperplanes which segregate data into classes. SVMs are very versatile and are also capable of performing linear or nonlinear classification, regression, and outlier detection. Once ideal hyperplanes are discovered, new data points can be easily classified. https://www.simplilearn.com/ice9/free_resources_article_thumb/support-vector-machines-graph-machine-learning.JPG The optimization objective is to find “maximum margin hyperplane” that is farthest from the closest points in the two classes (these points are called support vectors). In the given figure, the middle line represents the hyperplane. SVM Example Let’s look at this image below and have an idea about SVM in general. Hyperplanes with larger margins have lower generalization error. The positive and negative hyperplanes are represented by: https://www.simplilearn.com/ice9/free_resources_article_thumb/positive-negative-hyperplanes-machine-learning.JPG Classification of any new input sample xtest : If w0 + wTxtest > 1, the sample xtest is said to be in the class toward the right of the positive hyperplane. If w0 + wTxtest < -1, the sample xtest is said to be in the class toward the left of the negative hyperplane. When you subtract the two equations, you get: https://www.simplilearn.com/ice9/free_resources_article_thumb/equation-subtraction-machine-learning.JPG Length of vector w is (L2 norm length): https://www.simplilearn.com/ice9/free_resources_article_thumb/length-of-vector-machine-learning.JPG You normalize with the length of w to arrive at: https://www.simplilearn.com/ice9/free_resources_article_thumb/normalize-equation-machine-learning.JPG SVM: Hard Margin Classification Given below are some points to understand Hard Margin Classification. The left side of equation SVM-1 given above can be interpreted as the distance between the positive (+ve) and negative (-ve) hyperplanes; in other words, it is the margin that can be maximized. Hence the objective of the function is to maximize with the constraint that the samples are classified correctly, which is represented as : https://www.simplilearn.com/ice9/free_resources_article_thumb/hard-margin-classification-machine-learning.JPG This means that you are minimizing ‖w‖. This also means that all positive samples are on one side of the positive hyperplane and all negative samples are on the other side of the negative hyperplane. This can be written concisely as : https://www.simplilearn.com/ice9/free_resources_article_thumb/hard-margin-classification-formula.JPG Minimizing ‖w‖ is the same as minimizing. This figure is better as it is differentiable even at w = 0. The approach listed above is called “hard margin linear SVM classifier.” SVM: Soft Margin Classification Given below are some points to understand Soft Margin Classification. To allow for linear constraints to be relaxed for nonlinearly separable data, a slack variable is introduced. (i) measures how much ith instance is allowed to violate the margin. The slack variable is simply added to the linear constraints. https://www.simplilearn.com/ice9/free_resources_article_thumb/soft-margin-calculation-machine-learning.JPG Subject to the above constraints, the new objective to be minimized becomes: https://www.simplilearn.com/ice9/free_resources_article_thumb/soft-margin-calculation-formula.JPG You have two conflicting objectives now—minimizing slack variable to reduce margin violations and minimizing to increase the margin. The hyperparameter C allows us to define this trade-off. Large values of C correspond to larger error penalties (so smaller margins), whereas smaller values of C allow for higher misclassification errors and larger margins. https://www.simplilearn.com/ice9/free_resources_article_thumb/machine-learning-certification-video-preview.jpg SVM: Regularization The concept of C is the reverse of regularization. Higher C means lower regularization, which increases bias and lowers the variance (causing overfitting). https://www.simplilearn.com/ice9/free_resources_article_thumb/concept-of-c-graph-machine-learning.JPG IRIS Data Set The Iris dataset contains measurements of 150 IRIS flowers from three different species: Setosa Versicolor Viriginica Each row represents one sample. Flower measurements in centimeters are stored as columns. These are called features. IRIS Data Set: SVM Let’s train an SVM model using sci-kit-learn for the Iris dataset: https://www.simplilearn.com/ice9/free_resources_article_thumb/svm-model-graph-machine-learning.JPG Nonlinear SVM Classification There are two ways to solve nonlinear SVMs: by adding polynomial features by adding similarity features Polynomial features can be added to datasets; in some cases, this can create a linearly separable dataset. https://www.simplilearn.com/ice9/free_resources_article_thumb/nonlinear-classification-svm-machine-learning.JPG In the figure on the left, there is only 1 feature x1. This dataset is not linearly separable. If you add x2 = (x1)2 (figure on the right), the data becomes linearly separable. Polynomial Kernel In sci-kit-learn, one can use a Pipeline class for creating polynomial features. Classification results for the Moons dataset are shown in the figure. https://www.simplilearn.com/ice9/free_resources_article_thumb/polynomial-kernel-machine-learning.JPG Polynomial Kernel with Kernel Trick Let us look at the image below and understand Kernel Trick in detail. https://www.simplilearn.com/ice9/free_resources_article_thumb/polynomial-kernel-with-kernel-trick.JPG For large dimensional datasets, adding too many polynomial features can slow down the model. You can apply a kernel trick with the effect of polynomial features without actually adding them. The code is shown (SVC class) below trains an SVM classifier using a 3rd-degree polynomial kernel but with a kernel trick. https://www.simplilearn.com/ice9/free_resources_article_thumb/polynomial-kernel-equation-machine-learning.JPG The hyperparameter coefθ controls the influence of high-degree polynomials. Kernel SVM Let us understand in detail about Kernel SVM. Kernel SVMs are used for classification of nonlinear data. In the chart, nonlinear data is projected into a higher dimensional space via a mapping function where it becomes linearly separable. https://www.simplilearn.com/ice9/free_resources_article_thumb/kernel-svm-machine-learning.JPG In the higher dimension, a linear separating hyperplane can be derived and used for classification. A reverse projection of the higher dimension back to original feature space takes it back to nonlinear shape. As mentioned previously, SVMs can be kernelized to solve nonlinear classification problems. You can create a sample dataset for XOR gate (nonlinear problem) from NumPy. 100 samples will be assigned the class sample 1, and 100 samples will be assigned the class label -1. https://www.simplilearn.com/ice9/free_resources_article_thumb/kernel-svm-graph-machine-learning.JPG As you can see, this data is not linearly separable. https://www.simplilearn.com/ice9/free_resources_article_thumb/kernel-svm-non-separable.JPG You now use the kernel trick to classify XOR dataset created earlier. https://www.simplilearn.com/ice9/free_resources_article_thumb/kernel-svm-xor-machine-learning.JPG Naïve Bayes Classifier What is Naive Bayes Classifier? Have you ever wondered how your mail provider implements spam filtering or how online news channels perform news text classification or even how companies perform sentiment analysis of their audience on social media? All of this and more are done through a machine learning algorithm called Naive Bayes Classifier. Naive Bayes Named after Thomas Bayes from the 1700s who first coined this in the Western literature. Naive Bayes classifier works on the principle of conditional probability as given by the Bayes theorem. Advantages of Naive Bayes Classifier Listed below are six benefits of Naive Bayes Classifier. Very simple and easy to implement Needs less training data Handles both continuous and discrete data Highly scalable with the number of predictors and data points As it is fast, it can be used in real-time predictions Not sensitive to irrelevant features Bayes Theorem We will understand Bayes Theorem in detail from the points mentioned below. According to the Bayes model, the conditional probability P(Y|X) can be calculated as: P(Y|X) = P(X|Y)P(Y) / P(X) This means you have to estimate a very large number of P(X|Y) probabilities for a relatively small vector space X. For example, for a Boolean Y and 30 possible Boolean attributes in the X vector, you will have to estimate 3 billion probabilities P(X|Y). To make it practical, a Naïve Bayes classifier is used, which assumes conditional independence of P(X) to each other, with a given value of Y. This reduces the number of probability estimates to 2*30=60 in the above example. Naïve Bayes Classifier for SMS Spam Detection Consider a labeled SMS database having 5574 messages. It has messages as given below: https://www.simplilearn.com/ice9/free_resources_article_thumb/naive-bayes-spam-machine-learning.JPG Each message is marked as spam or ham in the data set. Let’s train a model with Naïve Bayes algorithm to detect spam from ham. The message lengths and their frequency (in the training dataset) are as shown below: https://www.simplilearn.com/ice9/free_resources_article_thumb/naive-bayes-spam-spam-detection.JPG Analyze the logic you use to train an algorithm to detect spam: Split each message into individual words/tokens (bag of words). Lemmatize the data (each word takes its base form, like “walking” or “walked” is replaced with “walk”). Convert data to vectors using scikit-learn module CountVectorizer. Run TFIDF to remove common words like “is,” “are,” “and.” Now apply scikit-learn module for Naïve Bayes MultinomialNB to get the Spam Detector. This spam detector can then be used to classify a random new message as spam or ham. Next, the accuracy of the spam detector is checked using the Confusion Matrix. For the SMS spam example above, the confusion matrix is shown on the right. Accuracy Rate = Correct / Total = (4827 + 592)/5574 = 97.21% Error Rate = Wrong / Total = (155 + 0)/5574 = 2.78% https://www.simplilearn.com/ice9/free_resources_article_thumb/confusion-matrix-machine-learning.JPG Although confusion Matrix is useful, some more precise metrics are provided by Precision and Recall. https://www.simplilearn.com/ice9/free_resources_article_thumb/precision-recall-matrix-machine-learning.JPG Precision refers to the accuracy of positive predictions. https://www.simplilearn.com/ice9/free_resources_article_thumb/precision-formula-machine-learning.JPG Recall refers to the ratio of positive instances that are correctly detected by the classifier (also known as True positive rate or TPR). https://www.simplilearn.com/ice9/free_resources_article_thumb/recall-formula-machine-learning.JPG Precision/Recall Trade-off To detect age-appropriate videos for kids, you need high precision (low recall) to ensure that only safe videos make the cut (even though a few safe videos may be left out). The high recall is needed (low precision is acceptable) in-store surveillance to catch shoplifters; a few false alarms are acceptable, but all shoplifters must be caught. Learn about Naive Bayes in detail. Click here! Decision Tree Classifier Some aspects of the Decision Tree Classifier mentioned below are. Decision Trees (DT) can be used both for classification and regression. The advantage of decision trees is that they require very little data preparation. They do not require feature scaling or centering at all. They are also the fundamental components of Random Forests, one of the most powerful ML algorithms. Unlike Random Forests and Neural Networks (which do black-box modeling), Decision Trees are white box models, which means that inner workings of these models are clearly understood. In the case of classification, the data is segregated based on a series of questions. Any new data point is assigned to the selected leaf node. https://www.simplilearn.com/ice9/free_resources_article_thumb/decision-tree-classifier-machine-learning.JPG Start at the tree root and split the data on the feature using the decision algorithm, resulting in the largest information gain (IG). This splitting procedure is then repeated in an iterative process at each child node until the leaves are pure. This means that the samples at each node belonging to the same class. In practice, you can set a limit on the depth of the tree to prevent overfitting. The purity is compromised here as the final leaves may still have some impurity. The figure shows the classification of the Iris dataset. https://www.simplilearn.com/ice9/free_resources_article_thumb/decision-tree-classifier-graph.JPG IRIS Decision Tree Let’s build a Decision Tree using scikit-learn for the Iris flower dataset and also visualize it using export_graphviz API. https://www.simplilearn.com/ice9/free_resources_article_thumb/iris-decision-tree-machine-learning.JPG The output of export_graphviz can be converted into png format: https://www.simplilearn.com/ice9/free_resources_article_thumb/iris-decision-tree-output.JPG Sample attribute stands for the number of training instances the node applies to. Value attribute stands for the number of training instances of each class the node applies to. Gini impurity measures the node’s impurity. A node is “pure” (gini=0) if all training instances it applies to belong to the same class. https://www.simplilearn.com/ice9/free_resources_article_thumb/impurity-formula-machine-learning.JPG For example, for Versicolor (green color node), the Gini is 1-(0/54)2 -(49/54)2 -(5/54) 2 ≈ 0.168 https://www.simplilearn.com/ice9/free_resources_article_thumb/iris-decision-tree-sample.JPG Decision Boundaries Let us learn to create decision boundaries below. For the first node (depth 0), the solid line splits the data (Iris-Setosa on left). Gini is 0 for Setosa node, so no further split is possible. The second node (depth 1) splits the data into Versicolor and Virginica. If max_depth were set as 3, a third split would happen (vertical dotted line). https://www.simplilearn.com/ice9/free_resources_article_thumb/decision-tree-boundaries.JPG For a sample with petal length 5 cm and petal width 1.5 cm, the tree traverses to depth 2 left node, so the probability predictions for this sample are 0% for Iris-Setosa (0/54), 90.7% for Iris-Versicolor (49/54), and 9.3% for Iris-Virginica (5/54) CART Training Algorithm Scikit-learn uses Classification and Regression Trees (CART) algorithm to train Decision Trees. CART algorithm: Split the data into two subsets using a single feature k and threshold tk (example, petal length < “2.45 cm”). This is done recursively for each node. k and tk are chosen such that they produce the purest subsets (weighted by their size). The objective is to minimize the cost function as given below: https://www.simplilearn.com/ice9/free_resources_article_thumb/cart-training-algorithm-machine-learning.JPG The algorithm stops executing if one of the following situations occurs: max_depth is reached No further splits are found for each node Other hyperparameters may be used to stop the tree: min_samples_split min_samples_leaf min_weight_fraction_leaf max_leaf_nodes Gini Impurity or Entropy Entropy is one more measure of impurity and can be used in place of Gini. https://www.simplilearn.com/ice9/free_resources_article_thumb/gini-impurity-entrophy.JPG It is a degree of uncertainty, and Information Gain is the reduction that occurs in entropy as one traverses down the tree. Entropy is zero for a DT node when the node contains instances of only one class. Entropy for depth 2 left node in the example given above is: https://www.simplilearn.com/ice9/free_resources_article_thumb/entrophy-for-depth-2.JPG Gini and Entropy both lead to similar trees. DT: Regularization The following figure shows two decision trees on the moons dataset. https://www.simplilearn.com/ice9/free_resources_article_thumb/dt-regularization-machine-learning.JPG The decision tree on the right is restricted by min_samples_leaf = 4. The model on the left is overfitting, while the model on the right generalizes better. Random Forest Classifier Let us have an understanding of Random Forest Classifier below. A random forest can be considered an ensemble of decision trees (Ensemble learning). Random Forest algorithm: Draw a random bootstrap sample of size n (randomly choose n samples from the training set). Grow a decision tree from the bootstrap sample. At each node, randomly select d features. Split the node using the feature that provides the best split according to the objective function, for instance by maximizing the information gain. Repeat the steps 1 to 2 k times. (k is the number of trees you want to create, using a subset of samples) Aggregate the prediction by each tree for a new data point to assign the class label by majority vote (pick the group selected by the most number of trees and assign new data point to that group). Random Forests are opaque, which means it is difficult to visualize their inner workings. https://www.simplilearn.com/ice9/free_resources_article_thumb/random-forest-classifier-graph.JPG However, the advantages outweigh their limitations since you do not have to worry about hyperparameters except k, which stands for the number of decision trees to be created from a subset of samples. RF is quite robust to noise from the individual decision trees. Hence, you need not prune individual decision trees. The larger the number of decision trees, the more accurate the Random Forest prediction is. (This, however, comes with higher computation cost). Key Takeaways Let us quickly run through what we have learned so far in this Classification tutorial. Classification algorithms are supervised learning methods to split data into classes. They can work on Linear Data as well as Nonlinear Data. Logistic Regression can classify data based on weighted parameters and sigmoid conversion to calculate the probability of classes. K-nearest Neighbors (KNN) algorithm uses similar features to classify data. Support Vector Machines (SVMs) classify data by detecting the maximum margin hyperplane between data classes. Naïve Bayes, a simplified Bayes Model, can help classify data using conditional probability models. Decision Trees are powerful classifiers and use tree splitting logic until pure or somewhat pure leaf node classes are attained. Random Forests apply Ensemble Learning to Decision Trees for more accurate classification predictions. Conclusion This completes ‘Classification’ tutorial. In the next tutorial, we will learn 'Unsupervised Learning with Clustering.'
molyswu / Hand Detectionusing Neural Networks (SSD) on Tensorflow. This repo documents steps and scripts used to train a hand detector using Tensorflow (Object Detection API). As with any DNN based task, the most expensive (and riskiest) part of the process has to do with finding or creating the right (annotated) dataset. I was interested mainly in detecting hands on a table (egocentric view point). I experimented first with the [Oxford Hands Dataset](http://www.robots.ox.ac.uk/~vgg/data/hands/) (the results were not good). I then tried the [Egohands Dataset](http://vision.soic.indiana.edu/projects/egohands/) which was a much better fit to my requirements. The goal of this repo/post is to demonstrate how neural networks can be applied to the (hard) problem of tracking hands (egocentric and other views). Better still, provide code that can be adapted to other uses cases. If you use this tutorial or models in your research or project, please cite [this](#citing-this-tutorial). Here is the detector in action. <img src="images/hand1.gif" width="33.3%"><img src="images/hand2.gif" width="33.3%"><img src="images/hand3.gif" width="33.3%"> Realtime detection on video stream from a webcam . <img src="images/chess1.gif" width="33.3%"><img src="images/chess2.gif" width="33.3%"><img src="images/chess3.gif" width="33.3%"> Detection on a Youtube video. Both examples above were run on a macbook pro **CPU** (i7, 2.5GHz, 16GB). Some fps numbers are: | FPS | Image Size | Device| Comments| | ------------- | ------------- | ------------- | ------------- | | 21 | 320 * 240 | Macbook pro (i7, 2.5GHz, 16GB) | Run without visualizing results| | 16 | 320 * 240 | Macbook pro (i7, 2.5GHz, 16GB) | Run while visualizing results (image above) | | 11 | 640 * 480 | Macbook pro (i7, 2.5GHz, 16GB) | Run while visualizing results (image above) | > Note: The code in this repo is written and tested with Tensorflow `1.4.0-rc0`. Using a different version may result in [some errors](https://github.com/tensorflow/models/issues/1581). You may need to [generate your own frozen model](https://pythonprogramming.net/testing-custom-object-detector-tensorflow-object-detection-api-tutorial/?completed=/training-custom-objects-tensorflow-object-detection-api-tutorial/) graph using the [model checkpoints](model-checkpoint) in the repo to fit your TF version. **Content of this document** - Motivation - Why Track/Detect hands with Neural Networks - Data preparation and network training in Tensorflow (Dataset, Import, Training) - Training the hand detection Model - Using the Detector to Detect/Track hands - Thoughts on Optimizations. > P.S if you are using or have used the models provided here, feel free to reach out on twitter ([@vykthur](https://twitter.com/vykthur)) and share your work! ## Motivation - Why Track/Detect hands with Neural Networks? There are several existing approaches to tracking hands in the computer vision domain. Incidentally, many of these approaches are rule based (e.g extracting background based on texture and boundary features, distinguishing between hands and background using color histograms and HOG classifiers,) making them not very robust. For example, these algorithms might get confused if the background is unusual or in situations where sharp changes in lighting conditions cause sharp changes in skin color or the tracked object becomes occluded.(see [here for a review](https://www.cse.unr.edu/~bebis/handposerev.pdf) paper on hand pose estimation from the HCI perspective) With sufficiently large datasets, neural networks provide opportunity to train models that perform well and address challenges of existing object tracking/detection algorithms - varied/poor lighting, noisy environments, diverse viewpoints and even occlusion. The main drawbacks to usage for real-time tracking/detection is that they can be complex, are relatively slow compared to tracking-only algorithms and it can be quite expensive to assemble a good dataset. But things are changing with advances in fast neural networks. Furthermore, this entire area of work has been made more approachable by deep learning frameworks (such as the tensorflow object detection api) that simplify the process of training a model for custom object detection. More importantly, the advent of fast neural network models like ssd, faster r-cnn, rfcn (see [here](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md#coco-trained-models-coco-models) ) etc make neural networks an attractive candidate for real-time detection (and tracking) applications. Hopefully, this repo demonstrates this. > If you are not interested in the process of training the detector, you can skip straight to applying the [pretrained model I provide in detecting hands](#detecting-hands). Training a model is a multi-stage process (assembling dataset, cleaning, splitting into training/test partitions and generating an inference graph). While I lightly touch on the details of these parts, there are a few other tutorials cover training a custom object detector using the tensorflow object detection api in more detail[ see [here](https://pythonprogramming.net/training-custom-objects-tensorflow-object-detection-api-tutorial/) and [here](https://towardsdatascience.com/how-to-train-your-own-object-detector-with-tensorflows-object-detector-api-bec72ecfe1d9) ]. I recommend you walk through those if interested in training a custom object detector from scratch. ## Data preparation and network training in Tensorflow (Dataset, Import, Training) **The Egohands Dataset** The hand detector model is built using data from the [Egohands Dataset](http://vision.soic.indiana.edu/projects/egohands/) dataset. This dataset works well for several reasons. It contains high quality, pixel level annotations (>15000 ground truth labels) where hands are located across 4800 images. All images are captured from an egocentric view (Google glass) across 48 different environments (indoor, outdoor) and activities (playing cards, chess, jenga, solving puzzles etc). <img src="images/egohandstrain.jpg" width="100%"> If you will be using the Egohands dataset, you can cite them as follows: > Bambach, Sven, et al. "Lending a hand: Detecting hands and recognizing activities in complex egocentric interactions." Proceedings of the IEEE International Conference on Computer Vision. 2015. The Egohands dataset (zip file with labelled data) contains 48 folders of locations where video data was collected (100 images per folder). ``` -- LOCATION_X -- frame_1.jpg -- frame_2.jpg ... -- frame_100.jpg -- polygons.mat // contains annotations for all 100 images in current folder -- LOCATION_Y -- frame_1.jpg -- frame_2.jpg ... -- frame_100.jpg -- polygons.mat // contains annotations for all 100 images in current folder ``` **Converting data to Tensorflow Format** Some initial work needs to be done to the Egohands dataset to transform it into the format (`tfrecord`) which Tensorflow needs to train a model. This repo contains `egohands_dataset_clean.py` a script that will help you generate these csv files. - Downloads the egohands datasets - Renames all files to include their directory names to ensure each filename is unique - Splits the dataset into train (80%), test (10%) and eval (10%) folders. - Reads in `polygons.mat` for each folder, generates bounding boxes and visualizes them to ensure correctness (see image above). - Once the script is done running, you should have an images folder containing three folders - train, test and eval. Each of these folders should also contain a csv label document each - `train_labels.csv`, `test_labels.csv` that can be used to generate `tfrecords` Note: While the egohands dataset provides four separate labels for hands (own left, own right, other left, and other right), for my purpose, I am only interested in the general `hand` class and label all training data as `hand`. You can modify the data prep script to generate `tfrecords` that support 4 labels. Next: convert your dataset + csv files to tfrecords. A helpful guide on this can be found [here](https://pythonprogramming.net/creating-tfrecord-files-tensorflow-object-detection-api-tutorial/).For each folder, you should be able to generate `train.record`, `test.record` required in the training process. ## Training the hand detection Model Now that the dataset has been assembled (and your tfrecords), the next task is to train a model based on this. With neural networks, it is possible to use a process called [transfer learning](https://www.tensorflow.org/tutorials/image_retraining) to shorten the amount of time needed to train the entire model. This means we can take an existing model (that has been trained well on a related domain (here image classification) and retrain its final layer(s) to detect hands for us. Sweet!. Given that neural networks sometimes have thousands or millions of parameters that can take weeks or months to train, transfer learning helps shorten training time to possibly hours. Tensorflow does offer a few models (in the tensorflow [model zoo](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md#coco-trained-models-coco-models)) and I chose to use the `ssd_mobilenet_v1_coco` model as my start point given it is currently (one of) the fastest models (read the SSD research [paper here](https://arxiv.org/pdf/1512.02325.pdf)). The training process can be done locally on your CPU machine which may take a while or better on a (cloud) GPU machine (which is what I did). For reference, training on my macbook pro (tensorflow compiled from source to take advantage of the mac's cpu architecture) the maximum speed I got was 5 seconds per step as opposed to the ~0.5 seconds per step I got with a GPU. For reference it would take about 12 days to run 200k steps on my mac (i7, 2.5GHz, 16GB) compared to ~5hrs on a GPU. > **Training on your own images**: Please use the [guide provided by Harrison from pythonprogramming](https://pythonprogramming.net/training-custom-objects-tensorflow-object-detection-api-tutorial/) on how to generate tfrecords given your label csv files and your images. The guide also covers how to start the training process if training locally. [see [here] (https://pythonprogramming.net/training-custom-objects-tensorflow-object-detection-api-tutorial/)]. If training in the cloud using a service like GCP, see the [guide here](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_on_cloud.md). As the training process progresses, the expectation is that total loss (errors) gets reduced to its possible minimum (about a value of 1 or thereabout). By observing the tensorboard graphs for total loss(see image below), it should be possible to get an idea of when the training process is complete (total loss does not decrease with further iterations/steps). I ran my training job for 200k steps (took about 5 hours) and stopped at a total Loss (errors) value of 2.575.(In retrospect, I could have stopped the training at about 50k steps and gotten a similar total loss value). With tensorflow, you can also run an evaluation concurrently that assesses your model to see how well it performs on the test data. A commonly used metric for performance is mean average precision (mAP) which is single number used to summarize the area under the precision-recall curve. mAP is a measure of how well the model generates a bounding box that has at least a 50% overlap with the ground truth bounding box in our test dataset. For the hand detector trained here, the mAP value was **0.9686@0.5IOU**. mAP values range from 0-1, the higher the better. <img src="images/accuracy.jpg" width="100%"> Once training is completed, the trained inference graph (`frozen_inference_graph.pb`) is then exported (see the earlier referenced guides for how to do this) and saved in the `hand_inference_graph` folder. Now its time to do some interesting detection. ## Using the Detector to Detect/Track hands If you have not done this yet, please following the guide on installing [Tensorflow and the Tensorflow object detection api](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/installation.md). This will walk you through setting up the tensorflow framework, cloning the tensorflow github repo and a guide on - Load the `frozen_inference_graph.pb` trained on the hands dataset as well as the corresponding label map. In this repo, this is done in the `utils/detector_utils.py` script by the `load_inference_graph` method. ```python detection_graph = tf.Graph() with detection_graph.as_default(): od_graph_def = tf.GraphDef() with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid: serialized_graph = fid.read() od_graph_def.ParseFromString(serialized_graph) tf.import_graph_def(od_graph_def, name='') sess = tf.Session(graph=detection_graph) print("> ====== Hand Inference graph loaded.") ``` - Detect hands. In this repo, this is done in the `utils/detector_utils.py` script by the `detect_objects` method. ```python (boxes, scores, classes, num) = sess.run( [detection_boxes, detection_scores, detection_classes, num_detections], feed_dict={image_tensor: image_np_expanded}) ``` - Visualize detected bounding detection_boxes. In this repo, this is done in the `utils/detector_utils.py` script by the `draw_box_on_image` method. This repo contains two scripts that tie all these steps together. - detect_multi_threaded.py : A threaded implementation for reading camera video input detection and detecting. Takes a set of command line flags to set parameters such as `--display` (visualize detections), image parameters `--width` and `--height`, videe `--source` (0 for camera) etc. - detect_single_threaded.py : Same as above, but single threaded. This script works for video files by setting the video source parameter videe `--source` (path to a video file). ```cmd # load and run detection on video at path "videos/chess.mov" python detect_single_threaded.py --source videos/chess.mov ``` > Update: If you do have errors loading the frozen inference graph in this repo, feel free to generate a new graph that fits your TF version from the model-checkpoint in this repo. Use the [export_inference_graph.py](https://github.com/tensorflow/models/blob/master/research/object_detection/export_inference_graph.py) script provided in the tensorflow object detection api repo. More guidance on this [here](https://pythonprogramming.net/testing-custom-object-detector-tensorflow-object-detection-api-tutorial/?completed=/training-custom-objects-tensorflow-object-detection-api-tutorial/). ## Thoughts on Optimization. A few things that led to noticeable performance increases. - Threading: Turns out that reading images from a webcam is a heavy I/O event and if run on the main application thread can slow down the program. I implemented some good ideas from [Adrian Rosebuck](https://www.pyimagesearch.com/2017/02/06/faster-video-file-fps-with-cv2-videocapture-and-opencv/) on parrallelizing image capture across multiple worker threads. This mostly led to an FPS increase of about 5 points. - For those new to Opencv, images from the `cv2.read()` method return images in [BGR format](https://www.learnopencv.com/why-does-opencv-use-bgr-color-format/). Ensure you convert to RGB before detection (accuracy will be much reduced if you dont). ```python cv2.cvtColor(image_np, cv2.COLOR_BGR2RGB) ``` - Keeping your input image small will increase fps without any significant accuracy drop.(I used about 320 x 240 compared to the 1280 x 720 which my webcam provides). - Model Quantization. Moving from the current 32 bit to 8 bit can achieve up to 4x reduction in memory required to load and store models. One way to further speed up this model is to explore the use of [8-bit fixed point quantization](https://heartbeat.fritz.ai/8-bit-quantization-and-tensorflow-lite-speeding-up-mobile-inference-with-low-precision-a882dfcafbbd). Performance can also be increased by a clever combination of tracking algorithms with the already decent detection and this is something I am still experimenting with. Have ideas for optimizing better, please share! <img src="images/general.jpg" width="100%"> Note: The detector does reflect some limitations associated with the training set. This includes non-egocentric viewpoints, very noisy backgrounds (e.g in a sea of hands) and sometimes skin tone. There is opportunity to improve these with additional data. ## Integrating Multiple DNNs. One way to make things more interesting is to integrate our new knowledge of where "hands" are with other detectors trained to recognize other objects. Unfortunately, while our hand detector can in fact detect hands, it cannot detect other objects (a factor or how it is trained). To create a detector that classifies multiple different objects would mean a long involved process of assembling datasets for each class and a lengthy training process. > Given the above, a potential strategy is to explore structures that allow us **efficiently** interleave output form multiple pretrained models for various object classes and have them detect multiple objects on a single image. An example of this is with my primary use case where I am interested in understanding the position of objects on a table with respect to hands on same table. I am currently doing some work on a threaded application that loads multiple detectors and outputs bounding boxes on a single image. More on this soon.
ksm26 / LangChain Chat With Your DataExplore LangChain and build powerful chatbots that interact with your own data. Gain insights into document loading, splitting, retrieval, question answering, and more.
GigaVision / PANDA ToolkitPANDA dataset tool kit for data visualization, splitting, merging, and result evaluation.
marieai / Marie AIComplex data extraction and orchestration framework designed for processing unstructured documents. It integrates AI-powered document pipelines (GenAI, LLM, VLLM) into your applications, supporting various tasks such as document cleanup, optical character recognition (OCR), classification, splitting, named entity recognition, and form processing
shenwanxiang / ChemBenchMoleculeNet benchmark dataset & MolMapNet dataset
ZhuPengsen / Method For Splitting The DeepShip DatasetMethod for Splitting the DeepShip Dataset
zhaohuabing / Istio Redis CulsterUse Istio to enable Envoy Redis Cluster support, including data sharding, read/write splitting, and traffic mirroring, all the magics are done by Istio and Envoy proxy, without any awareness at the client side.
WH-Wei / SIAT Lower Limb Motion Dataset CodesThese codes for reading, pre-processing of sEMG, splitting of sEMG into windows of various sizes, extracting of sEMG features, normalization of extracted features, generation of sample data, and making log files are provided for easy handling of the SIAT Lower Limb Motion Dataset (SIAT-LLMD). 代码可以用于SIAT-LLMD,便于读取数据、预处理肌电信号、分窗、肌电特征提取、归一化、生成数据样本并记录日志文件。当然,有些代码(例如:'EMGbox_WWH.py')也可以用于处理数据集之外的肌电信号。
privefl / BigreadrR package to read large text files based on splitting + data.table::fread
sharejing / TakinA Python toolkit for file processing, text cleaning and data splitting. 文件处理,文本清洗和数据划分的python工具包。
twardoch / Split Markdown4gptA Python tool for splitting large Markdown files into smaller sections based on a specified token limit. This is particularly useful for processing large Markdown files with GPT models, as it allows the models to handle the data in manageable chunks.
Wallace-Best / Best<!DOCTYPE html>Wallace-Best <html lang="en-us"> <head> <link rel="node" href="//a.wallace-bestcdn.com/1391808583/img/favicon16-32.ico" type="image/vnd.microsoft.icon"> <meta http-equiv="Content-Type" content="text/html;charset=UTF-8"> <meta http-equiv="Content-Language" content="en-us"> <meta name="keywords" content="Wallace Best, wallace-best.com, comments, blog, blogs, discussion"> <meta name="description" content="Wallace Best's Network is a global comment system that improves discussion on websites and connects conversations across the web."> <meta name="world" value="notranslate" /> <title> WB Admin | Sign-in </title> <script type="text/javascript" charset="utf-8"> document.domain = 'wallace-best.com'; if (window.context === undefined) { var context = {}; } context.wallace-bestUrl = 'https://wallace-best.com'; context.wallace-bestDomain = 'wallace-best.com'; context.mediaUrl = '//a.wallace-bestcdn.com/1391808583/'; context.uploadsUrl = '//a.wallace.bestcdn.com/uploads'; context.sslUploadsUrl = '//a.wallace-bestcdn.com/uploads'; context.loginUrl = 'https://wallace-best.com/profile/login/'; context.signupUrl = 'https://wallace-best.com/profile/signup/'; context.apiUrl = '//wallace-best.com/api/3.0/'; context.apiPublicKey = 'Y1S1wGIzdc63qnZ5rhHfjqEABGA4ZTDncauWFFWWTUBqkmLjdxloTb7ilhGnZ7z1'; context.forum = null; context.adminUrl = 'https://wallace-best.com'; context.switches = { "explore_dashboard_2":false, "partitions:api:posts/countPendin":false, "use_rs_paginator_30m":false, "inline_defaults_css":false, "evm_publisher_reports":true, "postsort":false, "enable_entropy_filtering":false, "exp_newnav":true, "organic_discovery_experiments":false, "realtime_for_oldies":false, "firehose_push":true, "website_addons":true, "addons_ab_test":false, "firehose_gnip_http":true, "community_icon":true, "pub_reporting_v2":true, "pd_thumbnail_settings":true, "algorithm_experiments":false, "discovery_log_to_browser":false, "is_last_modified":true, "embed_category_display":false, "partitions:api:forums/listPosts":false, "shardpost":true, "limit_get_posts_days_30d":true, "next_realtime_anim_disabled":false, "juggler_thread_onReady":true, "firehose_realertime":false, "loginas":true, "juggler_enabled":true, "user_onboarding":true, "website_follow_redirect":true, "raven_js":true, "shardpost:index":true, "filter_ads_by_country":true, "new_sort_paginator":true, "threadident_reads":true, "new_media":true, "enable_link_affiliation":true, "show_unapproved":false, "onboarding_profile_editing":true, "partitions":true, "dotcom_marketing":true, "discovery_analytics":true, "exp_newnav_disable":true, "new_community_nav_embed":true, "discussions_tab":true, "embed_less_refactor":false, "use_rs_paginator_60m":true, "embed_labs":false, "auto_flat_sort":false, "disable_moderate_ascending":true, "disable_realtime":true, "partitions:api":true, "digest_thread_votes":true, "shardpost:paginator":false, "debug_js":false, "exp_mn2":false, "limit_get_posts_days_7d":true, "pinnedcomments":false, "use_queue_b":true, "new_embed_profile":true, "next_track_links":true, "postsort:paginator":true, "simple_signup":true, "static_styles":true, "stats":true, "discovery_next":true, "override_skip_syslog":false, "show_captcha_on_links":true, "exp_mn2_force":false, "next_dragdrop_nag":true, "firehose_gnip":true, "firehose_pubsub":true, "rt_go_backend":false, "dark_jester":true, "next_logging":false, "surveyNotice":false, "tipalti_payments":true, "default_trusted_domain":false, "disqus_trends":false, "log_large_querysets":false, "phoenix":false, "exp_autoonboard":true, "lazy_embed":false, "explore_dashboard":true, "partitions:api:posts/list":true, "support_contact_with_frames":true, "use_rs_paginator_5m":true, "limit_textdigger":true, "embed_redirect":false, "logging":false, "exp_mn2_disable":true, "aggressive_embed_cache":true, "dashboard_client":false, "safety_levels_enabled":true, "partitions:api:categories/listPo":false, "next_show_new_media":true, "next_realtime_cap":false, "next_discard_low_rep":true, "next_streaming_realtime":false, "partitions:api:threads/listPosts":false, "textdigger_crawler":true }; context.urlMap = { 'signup': 'https://wallace-best.com/admin/signup/', 'dashboard': 'http://wallace-best.com/dashboard/', 'admin': 'http://wallace-best.com/admin/', 'logout': '//wallace-best.com/logout/', 'home': 'https://wallace-best.com', 'for_websites': 'http://wallace-best.com/websites/', 'login': 'https://wallace-best.com/profile/login/' }; context.navMap = { 'signup': '', 'dashboard': '', 'admin': '', 'addons': '' }; </script> <script src="//a.wallace-bestcdn.com/1391808583/js/src/auth_context.js" type="text/javascript" charset="utf-8"></script> <link rel="stylesheet" href="//a.wallace-bestdn.com/1391808583/build/css/b31fb2fa3905.css" type="text/css" /> <script type="text/javascript" src="//a.wallace-bestcdn.com/1391808583/build/js/5ee01877d131.js"></script> <script> // // shared/foundation.js // // This file contains the absolute minimum code necessary in order // to create a new application in the WALLACE-BEST namespace. // // You should load this file *before* anything that modifies the WALLACE-BEST global. // /*jshint browser:true, undef:true, strict:true, expr:true, white:true */ /*global wallace-best:true */ var WALLACE-BEST = (function (window, undefined) { "use strict"; var wallace-best = window.wallace-best || {}; // Exception thrown from wallace-best.assert method on failure wallace-best.AssertionError = function (message) { this.message = message; }; wallace-best.AssertionError.prototype.toString = function () { return 'Assertion Error: ' + (this.message || '[no message]'); }; // Raises a wallace-best.AssertionError if value is falsy wallace-best.assert = function (value, message, soft) { if (value) return; if (soft) window.console && window.console.log("DISQUS assertion failed: " + message); else throw new wallace-best.AssertionError(message); }; // Functions to clean attached modules (used by define and cleanup) var cleanFuncs = []; // Attaches a new public interface (module) to the wallace-best namespace. // For example, if wallace-best object is { 'a': { 'b': {} } }: // // wallace-best.define('a.b.c', function () { return { 'd': 'hello' }; }); will transform it into // -> { 'a': { 'b': { 'c': { 'd' : hello' }}}} // // and wallace-best.define('a', function () { return { 'x': 'world' }; }); will transform it into // -> { 'a': { 'b': {}}, 'x': 'world' } // // Attach modules to wallace-best using only this function. wallace-best.define = function (name, fn) { /*jshint loopfunc:true */ if (typeof name === 'function') { fn = name; name = ''; } var parts = name.split('.'); var part = parts.shift(); var cur = wallace-best; var exports = (fn || function () { return {}; }).call({ overwrites: function (obj) { obj.__overwrites__ = true; return obj; } }, window); while (part) { cur = (cur[part] ? cur[part] : cur[part] = {}); part = parts.shift(); } for (var key in exports) { if (!exports.hasOwnProperty(key)) continue; /*jshint eqnull:true */ if (!exports.__overwrites__ && cur[key] !== null) { wallace-best.assert(!cur.hasOwnProperty(key), 'Unsafe attempt to redefine existing module: ' + key, true /* soft assertion */); } cur[key] = exports[key]; cleanFuncs.push(function (cur, key) { return function () { delete cur[key]; }; }(cur, key)); } return cur; }; // Alias for wallace-best.define for the sake of semantics. // You should use it when you need to get a reference to another // wallace-best module before that module is defined: // // var collections = wallace-best.use('lounge.collections'); // // wallace-best.use is a single argument function because we don't // want to encourage people to use it instead of wallace-best.define. wallace-best.use = function (name) { return wallace-best.define(name); }; wallace-best.cleanup = function () { for (var i = 0; i < cleanFuncs.length; i++) { cleanFuncs[i](); } }; return wallace-best; })(window); /*jshint expr:true, undef:true, strict:true, white:true, browser:true */ /*global wallace-best:false*/ // // shared/corefuncs.js // wallace-best.define(function (window, undefined) { "use strict"; var wallace-best = window.wallace-best; var document = window.document; var head = document.getElementsByTagName('head')[0] || document.body; var jobs = { running: false, timer: null, queue: [] }; var uid = 0; // Taken from _.uniqueId wallace-best.getUid = function (prefix) { var id = ++uid + ''; return prefix ? prefix + id : id; }; /* Defers func() execution until cond() is true */ wallace-best.defer = function (cond, func) { function beat() { /*jshint boss:true */ var queue = jobs.queue; if (queue.length === 0) { jobs.running = false; clearInterval(jobs.timer); } for (var i = 0, pair; pair = queue[i]; i++) { if (pair[0]()) { queue.splice(i--, 1); pair[1](); } } } jobs.queue.push([cond, func]); beat(); if (!jobs.running) { jobs.running = true; jobs.timer = setInterval(beat, 100); } }; wallace-best.isOwn = function (obj, key) { // The object.hasOwnProperty method fails when the // property under consideration is named 'hasOwnProperty'. return Object.prototype.hasOwnProperty.call(obj, key); }; wallace-best.isString = function (str) { return Object.prototype.toString.call(str) === "[object String]"; }; /* * Iterates over an object or a collection and calls a callback * function with each item as a parameter. */ wallace-best.each = function (collection, callback) { var length = collection.length, forEach = Array.prototype.forEach; if (!isNaN(length)) { // Treat collection as an array if (forEach) { forEach.call(collection, callback); } else { for (var i = 0; i < length; i++) { callback(collection[i], i, collection); } } } else { // Treat collection as an object for (var key in collection) { if (wallace-best.isOwn(collection, key)) { callback(collection[key], key, collection); } } } }; // Borrowed from underscore wallace-best.extend = function (obj) { wallace-best.each(Array.prototype.slice.call(arguments, 1), function (source) { for (var prop in source) { obj[prop] = source[prop]; } }); return obj; }; wallace-best.serializeArgs = function (params) { var pcs = []; wallace-best.each(params, function (val, key) { if (val !== undefined) { pcs.push(key + (val !== null ? '=' + encodeURIComponent(val) : '')); } }); return pcs.join('&'); }; wallace-best.serialize = function (url, params, nocache) { if (params) { url += (~url.indexOf('?') ? (url.charAt(url.length - 1) == '&' ? '': '&') : '?'); url += wallace-best.serializeArgs(params); } if (nocache) { var ncp = {}; ncp[(new Date()).getTime()] = null; return wallace-best.serialize(url, ncp); } var len = url.length; return (url.charAt(len - 1) == "&" ? url.slice(0, len - 1) : url); }; var TIMEOUT_DURATION = 2e4; // 20 seconds var addEvent, removeEvent; // select the correct event listener function. all of our supported // browsers will use one of these if ('addEventListener' in window) { addEvent = function (node, event, handler) { node.addEventListener(event, handler, false); }; removeEvent = function (node, event, handler) { node.removeEventListener(event, handler, false); }; } else { addEvent = function (node, event, handler) { node.attachEvent('on' + event, handler); }; removeEvent = function (node, event, handler) { node.detachEvent('on' + event, handler); }; } wallace-best.require = function (url, params, nocache, success, failure) { var script = document.createElement('script'); var evName = script.addEventListener ? 'load' : 'readystatechange'; var timeout = null; script.src = wallace-best.serialize(url, params, nocache); script.async = true; script.charset = 'UTF-8'; function handler(ev) { ev = ev || window.event; if (!ev.target) { ev.target = ev.srcElement; } if (ev.type != 'load' && !/^(complete|loaded)$/.test(ev.target.readyState)) { return; // Not ready yet } if (success) { success(); } if (timeout) { clearTimeout(timeout); } removeEvent(ev.target, evName, handler); } if (success || failure) { addEvent(script, evName, handler); } if (failure) { timeout = setTimeout(function () { failure(); }, TIMEOUT_DURATION); } head.appendChild(script); return wallace-best; }; wallace-best.requireStylesheet = function (url, params, nocache) { var link = document.createElement('link'); link.rel = 'stylesheet'; link.type = 'text/css'; link.href = wallace-best.serialize(url, params, nocache); head.appendChild(link); return wallace-best; }; wallace-best.requireSet = function (urls, nocache, callback) { var remaining = urls.length; wallace-best.each(urls, function (url) { wallace-best.require(url, {}, nocache, function () { if (--remaining === 0) { callback(); } }); }); }; wallace-best.injectCss = function (css) { var style = document.createElement('style'); style.setAttribute('type', 'text/css'); // Make inline CSS more readable by splitting each rule onto a separate line css = css.replace(/\}/g, "}\n"); if (window.location.href.match(/^https/)) css = css.replace(/http:\/\//g, 'https://'); if (style.styleSheet) { // Internet Explorer only style.styleSheet.cssText = css; } else { style.appendChild(document.createTextNode(css)); } head.appendChild(style); }; wallace-best.isString = function (val) { return Object.prototype.toString.call(val) === '[object String]'; }; }); /*jshint boss:true*/ /*global wallace-best */ wallace-best.define('Events', function (window, undefined) { "use strict"; // Returns a function that will be executed at most one time, no matter how // often you call it. Useful for lazy initialization. var once = function (func) { var ran = false, memo; return function () { if (ran) return memo; ran = true; memo = func.apply(this, arguments); func = null; return memo; }; }; var has = wallace-best.isOwn; var keys = Object.keys || function (obj) { if (obj !== Object(obj)) throw new TypeError('Invalid object'); var keys = []; for (var key in obj) if (has(obj, key)) keys[keys.length] = key; return keys; }; var slice = [].slice; // Backbone.Events // --------------- // A module that can be mixed in to *any object* in order to provide it with // custom events. You may bind with `on` or remove with `off` callback // functions to an event; `trigger`-ing an event fires all callbacks in // succession. // // var object = {}; // _.extend(object, Backbone.Events); // object.on('expand', function(){ alert('expanded'); }); // object.trigger('expand'); // var Events = { // Bind an event to a `callback` function. Passing `"all"` will bind // the callback to all events fired. on: function (name, callback, context) { if (!eventsApi(this, 'on', name, [callback, context]) || !callback) return this; this._events = this._events || {}; var events = this._events[name] || (this._events[name] = []); events.push({callback: callback, context: context, ctx: context || this}); return this; }, // Bind an event to only be triggered a single time. After the first time // the callback is invoked, it will be removed. once: function (name, callback, context) { if (!eventsApi(this, 'once', name, [callback, context]) || !callback) return this; var self = this; var onced = once(function () { self.off(name, onced); callback.apply(this, arguments); }); onced._callback = callback; return this.on(name, onced, context); }, // Remove one or many callbacks. If `context` is null, removes all // callbacks with that function. If `callback` is null, removes all // callbacks for the event. If `name` is null, removes all bound // callbacks for all events. off: function (name, callback, context) { var retain, ev, events, names, i, l, j, k; if (!this._events || !eventsApi(this, 'off', name, [callback, context])) return this; if (!name && !callback && !context) { this._events = {}; return this; } names = name ? [name] : keys(this._events); for (i = 0, l = names.length; i < l; i++) { name = names[i]; if (events = this._events[name]) { this._events[name] = retain = []; if (callback || context) { for (j = 0, k = events.length; j < k; j++) { ev = events[j]; if ((callback && callback !== ev.callback && callback !== ev.callback._callback) || (context && context !== ev.context)) { retain.push(ev); } } } if (!retain.length) delete this._events[name]; } } return this; }, // Trigger one or many events, firing all bound callbacks. Callbacks are // passed the same arguments as `trigger` is, apart from the event name // (unless you're listening on `"all"`, which will cause your callback to // receive the true name of the event as the first argument). trigger: function (name) { if (!this._events) return this; var args = slice.call(arguments, 1); if (!eventsApi(this, 'trigger', name, args)) return this; var events = this._events[name]; var allEvents = this._events.all; if (events) triggerEvents(events, args); if (allEvents) triggerEvents(allEvents, arguments); return this; }, // Tell this object to stop listening to either specific events ... or // to every object it's currently listening to. stopListening: function (obj, name, callback) { var listeners = this._listeners; if (!listeners) return this; var deleteListener = !name && !callback; if (typeof name === 'object') callback = this; if (obj) (listeners = {})[obj._listenerId] = obj; for (var id in listeners) { listeners[id].off(name, callback, this); if (deleteListener) delete this._listeners[id]; } return this; } }; // Regular expression used to split event strings. var eventSplitter = /\s+/; // Implement fancy features of the Events API such as multiple event // names `"change blur"` and jQuery-style event maps `{change: action}` // in terms of the existing API. var eventsApi = function (obj, action, name, rest) { if (!name) return true; // Handle event maps. if (typeof name === 'object') { for (var key in name) { obj[action].apply(obj, [key, name[key]].concat(rest)); } return false; } // Handle space separated event names. if (eventSplitter.test(name)) { var names = name.split(eventSplitter); for (var i = 0, l = names.length; i < l; i++) { obj[action].apply(obj, [names[i]].concat(rest)); } return false; } return true; }; // A difficult-to-believe, but optimized internal dispatch function for // triggering events. Tries to keep the usual cases speedy (most internal // Backbone events have 3 arguments). var triggerEvents = function (events, args) { var ev, i = -1, l = events.length, a1 = args[0], a2 = args[1], a3 = args[2]; switch (args.length) { case 0: while (++i < l) { (ev = events[i]).callback.call(ev.ctx); } return; case 1: while (++i < l) { (ev = events[i]).callback.call(ev.ctx, a1); } return; case 2: while (++i < l) { (ev = events[i]).callback.call(ev.ctx, a1, a2); } return; case 3: while (++i < l) { (ev = events[i]).callback.call(ev.ctx, a1, a2, a3); } return; default: while (++i < l) { (ev = events[i]).callback.apply(ev.ctx, args); } } }; var listenMethods = {listenTo: 'on', listenToOnce: 'once'}; // Inversion-of-control versions of `on` and `once`. Tell *this* object to // listen to an event in another object ... keeping track of what it's // listening to. wallace-best.each(listenMethods, function (implementation, method) { Events[method] = function (obj, name, callback) { var listeners = this._listeners || (this._listeners = {}); var id = obj._listenerId || (obj._listenerId = wallace-best.getUid('l')); listeners[id] = obj; if (typeof name === 'object') callback = this; obj[implementation](name, callback, this); return this; }; }); // Aliases for backwards compatibility. Events.bind = Events.on; Events.unbind = Events.off; return Events; }); // used for /follow/ /login/ /signup/ social oauth dialogs // faking the bus wallace-best.use('Bus'); _.extend(DISQUS.Bus, wallace-best.Events); </script> <script src="//a.disquscdn.com/1391808583/js/src/global.js" charset="utf-8"></script> <script src="//a.disquscdn.com/1391808583/js/src/ga_events.js" charset="utf-8"></script> <script src="//a.disquscdn.com/1391808583/js/src/messagesx.js"></script> <!-- start Mixpanel --><script type="text/javascript">(function(e,b){if(!b.__SV){var a,f,i,g;window.mixpanel=b;a=e.createElement("script");a.type="text/javascript";a.async=!0;a.src=("https:"===e.location.protocol?"https:":"http:")+'//cdn.mxpnl.com/libs/mixpanel-2.2.min.js';f=e.getElementsByTagName("script")[0];f.parentNode.insertBefore(a,f);b._i=[];b.init=function(a,e,d){function f(b,h){var a=h.split(".");2==a.length&&(b=b[a[0]],h=a[1]);b[h]=function(){b.push([h].concat(Array.prototype.slice.call(arguments,0)))}}var c=b;"undefined"!== typeof d?c=b[d]=[]:d="mixpanel";c.people=c.people||[];c.toString=function(b){var a="mixpanel";"mixpanel"!==d&&(a+="."+d);b||(a+=" (stub)");return a};c.people.toString=function(){return c.toString(1)+".people (stub)"};i="disable track track_pageview track_links track_forms register register_once alias unregister identify name_tag set_config people.set people.set_once people.increment people.append people.track_charge people.clear_charges people.delete_user".split(" ");for(g=0;g<i.length;g++)f(c,i[g]); b._i.push([a,e,d])};b.__SV=1.2}})(document,window.mixpanel||[]); mixpanel.init('17b27902cd9da8972af8a3c43850fa5f', { track_pageview: false, debug: false }); </script><!-- end Mixpanel --> <script src="//a.disquscdn.com/1391808583//js/src/funnelcake.js"></script> <script type="text/javascript"> if (window.AB_TESTS === undefined) { var AB_TESTS = {}; } $(function() { if (context.auth.username !== undefined) { disqus.messagesx.init(context.auth.username); } }); </script> <script type="text/javascript" charset="utf-8"> // Global tests $(document).ready(function() { $('a[rel*=facebox]').facebox(); }); </script> <script type="text/x-underscore-template" data-template-name="global-nav"> <% var has_custom_avatar = data.avatar_url && data.avatar_url.indexOf('noavatar') < 0; %> <% var has_custom_username = data.username && data.username.indexOf('disqus_') < 0; %> <% if (data.username) { %> <li class="<%= data.forWebsitesClasses || '' %>" data-analytics="header for websites"><a href="<%= data.urlMap.for_websites %>">For Websites</a></li> <li data-analytics="header dashboard"><a href="<%= data.urlMap.dashboard %>">Dashboard</a></li> <% if (data.has_forums) { %> <li class="admin<% if (has_custom_avatar || !has_custom_username) { %> avatar-menu-admin<% } %>" data-analytics="header admin"><a href="<%= data.urlMap.admin %>">Admin</a></li> <% } %> <li class="user-dropdown dropdown-toggle<% if (has_custom_avatar || !has_custom_username) { %> avatar-menu<% } else { %> username-menu<% } %>" data-analytics="header username dropdown" data-floater-marker="<% if (has_custom_avatar || !has_custom_username) { %>square<% } %>"> <a href="<%= data.urlMap.home %>/<%= data.username %>/"> <% if (has_custom_avatar) { %> <img src="<%= data.avatar_url %>" class="avatar"> <% } else if (has_custom_username) { %> <%= data.username %> <% } else { %> <img src="<%= data.avatar_url %>" class="avatar"> <% } %> <span class="caret"></span> </a> <ul class="clearfix dropdown"> <li data-analytics="header view profile"><a href="<%= data.urlMap.home %>/<%= data.username %>/">View Profile</a></li> <li class="edit-profile js-edit-profile" data-analytics="header edit profile"><a href="<%= data.urlMap.dashboard %>#account">Edit Profile</a></li> <li class="logout" data-analytics="header logout"><a href="<%= data.urlMap.logout %>">Logout</a></li> </ul> </li> <% } else { %> <li class="<%= data.forWebsitesClasses || '' %>" data-analytics="header for websites"><a href="<%= data.urlMap.for_websites %>">For Websites</a></li> <li class="link-login" data-analytics="header login"><a href="<%= data.urlMap.login %>?next=<%= encodeURIComponent(document.location.href) %>">Log in</a></li> <% } %> </script> <!--[if lte IE 7]> <script src="//a.wallace-bestdn.com/1391808583/js/src/border_box_model.js"></script> <![endif]--> <!--[if lte IE 8]> <script src="//cdnjs.cloudflare.com/ajax/libs/modernizr/2.5.3/modernizr.min.js"></script> <script src="//a.wallace-bestcdn.com/1391808583/js/src/selectivizr.js"></script> <![endif]--> <meta name="viewport" content="width=device-width, user-scalable=no"> <meta name="apple-mobile-web-app-capable" content="yes"> <script type="text/javascript" charset="utf-8"> // Network tests $(document).ready(function() { $('a[rel*=facebox]').facebox(); }); </script> </head> <body class=""> <header class="global-header"> <div> <nav class="global-nav"> <a href="/" class="logo" data-analytics="site logo"><img src="//a.wallace-bestcdn.com/1391808583/img/disqus-logo-alt-hidpi.png" width="150" alt="wallace-best" title="wallace-best - Discover your community"/></a> </nav> </div> </header> <section class="login"> <form id="login-form" action="https://disqus.com/profile/login/?next=http://wallace-best.wallace-best.com/admin/moderate/" method="post" accept-charset="utf-8"> <h1>Sign in to continue</h1> <input type="text" name="username" tabindex="20" placeholder="Email or Username" value=""/> <div class="password-container"> <input type="password" name="password" tabindex="21" placeholder="Password" /> <span>(<a href="https://wallace-best.com/forgot/">forgot?</a>)</span> </div> <button type="submit" class="button submit" data-analytics="sign-in">Log in to wallace-best</button> <span class="create-account"> <a href="https://wallace-best.com/profile/signup/?next=http%3A//wallace-best.wallace-best.com/admin/moderate/" data-analytics="create-account"> Create an Account </a> </span> <h1 class="or-login">Alternatively, you can log in using:</h1> <div class="connect-options"> <button title="facebook" type="button" class="facebook-auth"> <span class="auth-container"> <img src="//a.wallace-bestdn.com/1391808583/img/icons/facebook.svg" alt="Facebook"> <!--[if lte IE 7]> <img src="//a.wallace-bestcdn.com/1391808583/img/icons/facebook.png" alt="Facebook"> <![endif]--> </span> </button> <button title="twitter" type="button" class="twitter-auth"> <span class="auth-container"> <img src="//a.wallace-bestdn.com/1391808583/img/icons/twitter.svg" alt="Twitter"> <!--[if lte IE 7]> <img src="//a.wallace-bestcdn.com/1391808583/img/icons/twitter.png" alt="Twitter"> <![endif]--> </span> </button> <button title="google" type="button" class="google-auth"> <span class="auth-container"> <img src="//a.wallace-bestdn.com/1391808583/img/icons/google.svg" alt="Google"> <!--[if lte IE 7]> <img src="//a.wallace-bestcdn.com/1391808583/img/icons/google.png" alt="Google"> <![endif]--> </span> </button> </div> </form> </section> <div class="get-disqus"> <a href="https://wallace-best.com/admin/signup/" data-analytics="get-disqus">Get wallace-best for your site</a> </div> <script> /*jshint undef:true, browser:true, maxlen:100, strict:true, expr:true, white:true */ // These must be global var _comscore, _gaq; (function (doc) { "use strict"; // Convert Django template variables to JS variables var debug = false, gaKey = '', gaPunt = '', gaCustomVars = { component: 'website', forum: '', version: 'v5' }, gaSlots = { component: 1, forum: 3, version: 4 }; /**/ gaKey = gaCustomVars.component == 'website' ? 'UA-1410476-16' : 'UA-1410476-6'; // Now start loading analytics services var s = doc.getElementsByTagName('script')[0], p = s.parentNode; var isSecure = doc.location.protocol == 'https:'; if (!debug) { _comscore = _comscore || []; // comScore // Load comScore _comscore.push({ c1: '7', c2: '10137436', c3: '1' }); var cs = document.createElement('script'); cs.async = true; cs.src = (isSecure ? 'https://sb' : 'http://b') + '.scorecardresearch.com/beacon.js'; p.insertBefore(cs, s); } // Set up Google Analytics _gaq = _gaq || []; if (!debug) { _gaq.push(['_setAccount', gaKey]); _gaq.push(['_setDomainName', '.wallace-best.com']); } if (!gaPunt) { for (var v in gaCustomVars) { if (!(gaCustomVars.hasOwnProperty(v) && gaCustomVars[v])) continue; _gaq.push(['_setCustomVar', gaSlots[v], gaCustomVars[v]]); } _gaq.push(['_trackPageview']); } // Load Google Analytics var ga = doc.createElement('script'); ga.type = 'text/javascript'; ga.async = true; var prefix = isSecure ? 'https://ssl' : 'http://www'; // Dev tip: if you cannot use the Google Analytics Debug Chrome extension, // https://chrome.google.com/webstore/detail/jnkmfdileelhofjcijamephohjechhna // you can replace /ga.js on the following line with /u/ga_debug.js // But if you do that, PLEASE DON'T COMMIT THE CHANGE! Kthxbai. ga.src = prefix + '.google-analytics.com/ga.js'; p.insertBefore(ga, s); }(document)); </script> <script> (function (){ // adds a classname for css to target the current page without passing in special things from the server or wherever // replacing all characters not allowable in classnames var newLocation = encodeURIComponent(window.location.pathname).replace(/[\.!~*'\(\)]/g, '_'); // cleaning up remaining url-encoded symbols for clarity sake newLocation = newLocation.replace(/%2F/g, '-').replace(/^-/, '').replace(/-$/, ''); if (newLocation === '') { newLocation = 'homepage'; } $('body').addClass('' + newLocation); }()); $(function ($) { // adds 'page-active' class to links matching the page url $('a[href="' + window.location.pathname + '"]').addClass('page-active'); }); $(document).delegate('[data-toggle-selector]', 'click', function (e) { var $this = $(this); $($this.attr('data-toggle-selector')).toggle(); e.preventDefault(); }); </script> <script type="text/javascript"> wallace-best.define('web.urls', function () { return { twitter: 'https://wallace-best.com/_ax/twitter/begin/', google: 'https://wallace-best.com/_ax/google/begin/', facebook: 'https://wallace-best.com/_ax/facebook/begin/', dashboard: 'http://wallace-best.com/dashboard/' } }); $(document).ready(function () { var usernameInput = $("input[name=username]"); if (usernameInput[0].value) { $("input[name=password]").focus(); } else { usernameInput.focus(); } }); </script> <script type="text/javascript" src="//a.wallace-bestcdn.com/1391808583/js/src/social_login.js"> <script type="text/javascript"> $(function() { var options = { authenticated: (context.auth.username !== undefined), moderated_forums: context.auth.moderated_forums, user_id: context.auth.user_id, track_clicks: !!context.switches.website_click_analytics, forum: context.forum }; wallace-best.funnelcake.init(options); }); </script> <!-- helper jQuery tmpl partials --> <script type="text/x-jquery-tmpl" id="profile-metadata-tmpl"> data-profile-username="${username}" data-profile-hash="${emailHash}" href="/${username}" </script> <script type="text/x-jquery-tmpl" id="profile-link-tmpl"> <a class="profile-launcher" {{tmpl "#profile-metadata-tmpl"}} href="/${username}">${name}</a> </script> <script src="//a.wallace-bestcdn.com/1391808583/js/src/templates.js"></script> <script src="//a.wallace-bestcdn.com/1391808583/js/src/modals.js"></script> <script> wallace-best.ui.config({ disqusUrl: 'https://disqus.com', mediaUrl: '//a.wallace-bestcdn.com/1391808583/' }); </script> </body> </html>
monkey0head / SplitLightAn Exploratory Toolkit for Recommender Systems Datasets and Splits
LudvigOlsen / Groupdata2R-package: Methods for dividing data into groups. Create balanced partitions and cross-validation folds. Perform time series windowing and general grouping and splitting of data. Balance existing groups with up- and downsampling or collapse them to fewer groups.
SOHAM-THUMMAR / Brain Tumor Detection Model Using KerasBrain-Tumor-Detection-Model-using-Keras is a Python-based deep learning project that uses the Keras (TensorFlow) framework to detect brain tumors from MRI images. It provides a complete workflow including data preprocessing, dataset splitting, model training, and saving trained models for inference or further experimentation.
svrcekmichal / Universal ReactUniversal react with async data fetching, async routes, code splitting from top route
monkey0head / Time To SplitTime to Split: Exploring Data Splitting Strategies for Offline Evaluation of Sequential Recommenders (RecSys'25)
uvhw / Bitcoin FoundationBitcoin: A Peer-to-Peer Electronic Cash System Satoshi Nakamoto satoshin@gmx.com www.bitcoin.org Abstract. A purely peer-to-peer version of electronic cash would allow online payments to be sent directly from one party to another without going through a financial institution. Digital signatures provide part of the solution, but the main benefits are lost if a trusted third party is still required to prevent double-spending. We propose a solution to the double-spending problem using a peer-to-peer network. The network timestamps transactions by hashing them into an ongoing chain of hash-based proof-of-work, forming a record that cannot be changed without redoing the proof-of-work. The longest chain not only serves as proof of the sequence of events witnessed, but proof that it came from the largest pool of CPU power. As long as a majority of CPU power is controlled by nodes that are not cooperating to attack the network, they'll generate the longest chain and outpace attackers. The network itself requires minimal structure. Messages are broadcast on a best effort basis, and nodes can leave and rejoin the network at will, accepting the longest proof-of-work chain as proof of what happened while they were gone. 1. Introduction Commerce on the Internet has come to rely almost exclusively on financial institutions serving as trusted third parties to process electronic payments. While the system works well enough for most transactions, it still suffers from the inherent weaknesses of the trust based model. Completely non-reversible transactions are not really possible, since financial institutions cannot avoid mediating disputes. The cost of mediation increases transaction costs, limiting the minimum practical transaction size and cutting off the possibility for small casual transactions, and there is a broader cost in the loss of ability to make non-reversible payments for non- reversible services. With the possibility of reversal, the need for trust spreads. Merchants must be wary of their customers, hassling them for more information than they would otherwise need. A certain percentage of fraud is accepted as unavoidable. These costs and payment uncertainties can be avoided in person by using physical currency, but no mechanism exists to make payments over a communications channel without a trusted party. What is needed is an electronic payment system based on cryptographic proof instead of trust, allowing any two willing parties to transact directly with each other without the need for a trusted third party. Transactions that are computationally impractical to reverse would protect sellers from fraud, and routine escrow mechanisms could easily be implemented to protect buyers. In this paper, we propose a solution to the double-spending problem using a peer-to-peer distributed timestamp server to generate computational proof of the chronological order of transactions. The system is secure as long as honest nodes collectively control more CPU power than any cooperating group of attacker nodes. 1 2. Transactions We define an electronic coin as a chain of digital signatures. Each owner transfers the coin to the next by digitally signing a hash of the previous transaction and the public key of the next owner and adding these to the end of the coin. A payee can verify the signatures to verify the chain of ownership. Transaction Hash Transaction Hash Transaction Hash Owner 1's Public Key Owner 2's Public Key Owner 3's Public Key Owner 0's Signature Owner 1's Signature The problem of course is the payee can't verify that one of the owners did not double-spend the coin. A common solution is to introduce a trusted central authority, or mint, that checks every transaction for double spending. After each transaction, the coin must be returned to the mint to issue a new coin, and only coins issued directly from the mint are trusted not to be double-spent. The problem with this solution is that the fate of the entire money system depends on the company running the mint, with every transaction having to go through them, just like a bank. We need a way for the payee to know that the previous owners did not sign any earlier transactions. For our purposes, the earliest transaction is the one that counts, so we don't care about later attempts to double-spend. The only way to confirm the absence of a transaction is to be aware of all transactions. In the mint based model, the mint was aware of all transactions and decided which arrived first. To accomplish this without a trusted party, transactions must be publicly announced [1], and we need a system for participants to agree on a single history of the order in which they were received. The payee needs proof that at the time of each transaction, the majority of nodes agreed it was the first received. 3. Timestamp Server The solution we propose begins with a timestamp server. A timestamp server works by taking a hash of a block of items to be timestamped and widely publishing the hash, such as in a newspaper or Usenet post [2-5]. The timestamp proves that the data must have existed at the time, obviously, in order to get into the hash. Each timestamp includes the previous timestamp in its hash, forming a chain, with each additional timestamp reinforcing the ones before it. Hash Hash Owner 2's Signature Owner 1's Private Key Owner 2's Private Key Owner 3's Private Key Block Item Item ... 2 Block Item Item ... Verify Verify Sign Sign 4. Proof-of-Work To implement a distributed timestamp server on a peer-to-peer basis, we will need to use a proof- of-work system similar to Adam Back's Hashcash [6], rather than newspaper or Usenet posts. The proof-of-work involves scanning for a value that when hashed, such as with SHA-256, the hash begins with a number of zero bits. The average work required is exponential in the number of zero bits required and can be verified by executing a single hash. For our timestamp network, we implement the proof-of-work by incrementing a nonce in the block until a value is found that gives the block's hash the required zero bits. Once the CPU effort has been expended to make it satisfy the proof-of-work, the block cannot be changed without redoing the work. As later blocks are chained after it, the work to change the block would include redoing all the blocks after it. The proof-of-work also solves the problem of determining representation in majority decision making. If the majority were based on one-IP-address-one-vote, it could be subverted by anyone able to allocate many IPs. Proof-of-work is essentially one-CPU-one-vote. The majority decision is represented by the longest chain, which has the greatest proof-of-work effort invested in it. If a majority of CPU power is controlled by honest nodes, the honest chain will grow the fastest and outpace any competing chains. To modify a past block, an attacker would have to redo the proof-of-work of the block and all blocks after it and then catch up with and surpass the work of the honest nodes. We will show later that the probability of a slower attacker catching up diminishes exponentially as subsequent blocks are added. To compensate for increasing hardware speed and varying interest in running nodes over time, the proof-of-work difficulty is determined by a moving average targeting an average number of blocks per hour. If they're generated too fast, the difficulty increases. 5. Network The steps to run the network are as follows: 1) New transactions are broadcast to all nodes. 2) Each node collects new transactions into a block. 3) Each node works on finding a difficult proof-of-work for its block. 4) When a node finds a proof-of-work, it broadcasts the block to all nodes. 5) Nodes accept the block only if all transactions in it are valid and not already spent. 6) Nodes express their acceptance of the block by working on creating the next block in the chain, using the hash of the accepted block as the previous hash. Nodes always consider the longest chain to be the correct one and will keep working on extending it. If two nodes broadcast different versions of the next block simultaneously, some nodes may receive one or the other first. In that case, they work on the first one they received, but save the other branch in case it becomes longer. The tie will be broken when the next proof- of-work is found and one branch becomes longer; the nodes that were working on the other branch will then switch to the longer one. 3 Block Nonce Tx Tx ... Block Nonce Tx Tx ... Prev Hash Prev Hash New transaction broadcasts do not necessarily need to reach all nodes. As long as they reach many nodes, they will get into a block before long. Block broadcasts are also tolerant of dropped messages. If a node does not receive a block, it will request it when it receives the next block and realizes it missed one. 6. Incentive By convention, the first transaction in a block is a special transaction that starts a new coin owned by the creator of the block. This adds an incentive for nodes to support the network, and provides a way to initially distribute coins into circulation, since there is no central authority to issue them. The steady addition of a constant of amount of new coins is analogous to gold miners expending resources to add gold to circulation. In our case, it is CPU time and electricity that is expended. The incentive can also be funded with transaction fees. If the output value of a transaction is less than its input value, the difference is a transaction fee that is added to the incentive value of the block containing the transaction. Once a predetermined number of coins have entered circulation, the incentive can transition entirely to transaction fees and be completely inflation free. The incentive may help encourage nodes to stay honest. If a greedy attacker is able to assemble more CPU power than all the honest nodes, he would have to choose between using it to defraud people by stealing back his payments, or using it to generate new coins. He ought to find it more profitable to play by the rules, such rules that favour him with more new coins than everyone else combined, than to undermine the system and the validity of his own wealth. 7. Reclaiming Disk Space Once the latest transaction in a coin is buried under enough blocks, the spent transactions before it can be discarded to save disk space. To facilitate this without breaking the block's hash, transactions are hashed in a Merkle Tree [7][2][5], with only the root included in the block's hash. Old blocks can then be compacted by stubbing off branches of the tree. The interior hashes do not need to be stored. Block Hash0 Hash1 Hash2 Hash3 Tx0 Tx1 Tx2 Tx3 Block Header (Block Hash) Prev Hash Nonce Root Hash Hash01 Hash23 Block Block Header (Block Hash) Prev Hash Nonce Root Hash Hash01 Hash23 Hash2 Hash3 Tx3 Transactions Hashed in a Merkle Tree After Pruning Tx0-2 from the Block A block header with no transactions would be about 80 bytes. If we suppose blocks are generated every 10 minutes, 80 bytes * 6 * 24 * 365 = 4.2MB per year. With computer systems typically selling with 2GB of RAM as of 2008, and Moore's Law predicting current growth of 1.2GB per year, storage should not be a problem even if the block headers must be kept in memory. 4 8. Simplified Payment Verification It is possible to verify payments without running a full network node. A user only needs to keep a copy of the block headers of the longest proof-of-work chain, which he can get by querying network nodes until he's convinced he has the longest chain, and obtain the Merkle branch linking the transaction to the block it's timestamped in. He can't check the transaction for himself, but by linking it to a place in the chain, he can see that a network node has accepted it, and blocks added after it further confirm the network has accepted it. Longest Proof-of-Work Chain Block Header Block Header Block Header Prev Hash Nonce Prev Hash Nonce Prev Hash Nonce Merkle Root Merkle Root Merkle Root Hash01 Hash23 Merkle Branch for Tx3 Hash2 Hash3 Tx3 As such, the verification is reliable as long as honest nodes control the network, but is more vulnerable if the network is overpowered by an attacker. While network nodes can verify transactions for themselves, the simplified method can be fooled by an attacker's fabricated transactions for as long as the attacker can continue to overpower the network. One strategy to protect against this would be to accept alerts from network nodes when they detect an invalid block, prompting the user's software to download the full block and alerted transactions to confirm the inconsistency. Businesses that receive frequent payments will probably still want to run their own nodes for more independent security and quicker verification. 9. Combining and Splitting Value Although it would be possible to handle coins individually, it would be unwieldy to make a separate transaction for every cent in a transfer. To allow value to be split and combined, transactions contain multiple inputs and outputs. Normally there will be either a single input from a larger previous transaction or multiple inputs combining smaller amounts, and at most two outputs: one for the payment, and one returning the change, if any, back to the sender. It should be noted that fan-out, where a transaction depends on several transactions, and those transactions depend on many more, is not a problem here. There is never the need to extract a complete standalone copy of a transaction's history. 5 Transaction In Out In ... ... 10. Privacy The traditional banking model achieves a level of privacy by limiting access to information to the parties involved and the trusted third party. The necessity to announce all transactions publicly precludes this method, but privacy can still be maintained by breaking the flow of information in another place: by keeping public keys anonymous. The public can see that someone is sending an amount to someone else, but without information linking the transaction to anyone. This is similar to the level of information released by stock exchanges, where the time and size of individual trades, the "tape", is made public, but without telling who the parties were. Traditional Privacy Model Identities Transactions New Privacy Model Identities Transactions As an additional firewall, a new key pair should be used for each transaction to keep them from being linked to a common owner. Some linking is still unavoidable with multi-input transactions, which necessarily reveal that their inputs were owned by the same owner. The risk is that if the owner of a key is revealed, linking could reveal other transactions that belonged to the same owner. 11. Calculations We consider the scenario of an attacker trying to generate an alternate chain faster than the honest chain. Even if this is accomplished, it does not throw the system open to arbitrary changes, such as creating value out of thin air or taking money that never belonged to the attacker. Nodes are not going to accept an invalid transaction as payment, and honest nodes will never accept a block containing them. An attacker can only try to change one of his own transactions to take back money he recently spent. The race between the honest chain and an attacker chain can be characterized as a Binomial Random Walk. The success event is the honest chain being extended by one block, increasing its lead by +1, and the failure event is the attacker's chain being extended by one block, reducing the gap by -1. The probability of an attacker catching up from a given deficit is analogous to a Gambler's Ruin problem. Suppose a gambler with unlimited credit starts at a deficit and plays potentially an infinite number of trials to try to reach breakeven. We can calculate the probability he ever reaches breakeven, or that an attacker ever catches up with the honest chain, as follows [8]: p = probability an honest node finds the next block q = probability the attacker finds the next block qz = probability the attacker will ever catch up from z blocks behind Trusted Third Party q ={ 1 if p≤q} z q/pz if pq 6 Counterparty Public Public Given our assumption that p > q, the probability drops exponentially as the number of blocks the attacker has to catch up with increases. With the odds against him, if he doesn't make a lucky lunge forward early on, his chances become vanishingly small as he falls further behind. We now consider how long the recipient of a new transaction needs to wait before being sufficiently certain the sender can't change the transaction. We assume the sender is an attacker who wants to make the recipient believe he paid him for a while, then switch it to pay back to himself after some time has passed. The receiver will be alerted when that happens, but the sender hopes it will be too late. The receiver generates a new key pair and gives the public key to the sender shortly before signing. This prevents the sender from preparing a chain of blocks ahead of time by working on it continuously until he is lucky enough to get far enough ahead, then executing the transaction at that moment. Once the transaction is sent, the dishonest sender starts working in secret on a parallel chain containing an alternate version of his transaction. The recipient waits until the transaction has been added to a block and z blocks have been linked after it. He doesn't know the exact amount of progress the attacker has made, but assuming the honest blocks took the average expected time per block, the attacker's potential progress will be a Poisson distribution with expected value: = z qp To get the probability the attacker could still catch up now, we multiply the Poisson density for each amount of progress he could have made by the probability he could catch up from that point: ∞ ke−{q/pz−k ifk≤z} ∑k=0 k!⋅ 1 ifkz Rearranging to avoid summing the infinite tail of the distribution... z ke− z−k 1−∑k=0 k! 1−q/p Converting to C code... #include <math.h> double AttackerSuccessProbability(double q, int z) { double p = 1.0 - q; double lambda = z * (q / p); double sum = 1.0; int i, k; for (k = 0; k <= z; k++) { double poisson = exp(-lambda); for (i = 1; i <= k; i++) poisson *= lambda / i; sum -= poisson * (1 - pow(q / p, z - k)); } return sum; } 7 Running some results, we can see the probability drop off exponentially with z. q=0.1 z=0 P=1.0000000 z=1 P=0.2045873 z=2 P=0.0509779 z=3 P=0.0131722 z=4 P=0.0034552 z=5 P=0.0009137 z=6 P=0.0002428 z=7 P=0.0000647 z=8 P=0.0000173 z=9 P=0.0000046 z=10 P=0.0000012 q=0.3 z=0 P=1.0000000 z=5 P=0.1773523 z=10 P=0.0416605 z=15 P=0.0101008 z=20 P=0.0024804 z=25 P=0.0006132 z=30 P=0.0001522 z=35 P=0.0000379 z=40 P=0.0000095 z=45 P=0.0000024 z=50 P=0.0000006 Solving for P less than 0.1%... P < 0.001 q=0.10 z=5 q=0.15 z=8 q=0.20 z=11 q=0.25 z=15 q=0.30 z=24 q=0.35 z=41 q=0.40 z=89 q=0.45 z=340 12. Conclusion We have proposed a system for electronic transactions without relying on trust. We started with the usual framework of coins made from digital signatures, which provides strong control of ownership, but is incomplete without a way to prevent double-spending. To solve this, we proposed a peer-to-peer network using proof-of-work to record a public history of transactions that quickly becomes computationally impractical for an attacker to change if honest nodes control a majority of CPU power. The network is robust in its unstructured simplicity. Nodes work all at once with little coordination. They do not need to be identified, since messages are not routed to any particular place and only need to be delivered on a best effort basis. Nodes can leave and rejoin the network at will, accepting the proof-of-work chain as proof of what happened while they were gone. They vote with their CPU power, expressing their acceptance of valid blocks by working on extending them and rejecting invalid blocks by refusing to work on them. Any needed rules and incentives can be enforced with this consensus mechanism. 8 References [1] W. Dai, "b-money," http://www.weidai.com/bmoney.txt, 1998. [2] H. Massias, X.S. Avila, and J.-J. Quisquater, "Design of a secure timestamping service with minimal trust requirements," In 20th Symposium on Information Theory in the Benelux, May 1999. [3] S. Haber, W.S. Stornetta, "How to time-stamp a digital document," In Journal of Cryptology, vol 3, no 2, pages 99-111, 1991. [4] D. Bayer, S. Haber, W.S. Stornetta, "Improving the efficiency and reliability of digital time-stamping," In Sequences II: Methods in Communication, Security and Computer Science, pages 329-334, 1993. [5] S. Haber, W.S. Stornetta, "Secure names for bit-strings," In Proceedings of the 4th ACM Conference on Computer and Communications Security, pages 28-35, April 1997. [6] A. Back, "Hashcash - a denial of service counter-measure," http://www.hashcash.org/papers/hashcash.pdf, 2002. [7] R.C. Merkle, "Protocols for public key cryptosystems," In Proc. 1980 Symposium on Security and Privacy, IEEE Computer Society, pages 122-133, April 1980. [8] W. Feller, "An introduction to probability theory and its applications," 1957. 9
HlaingPhyoAung / SqlmapUsage: python sqlmap.py [options] Options: -h, --help Show basic help message and exit -hh Show advanced help message and exit --version Show program's version number and exit -v VERBOSE Verbosity level: 0-6 (default 1) Target: At least one of these options has to be provided to define the target(s) -d DIRECT Connection string for direct database connection -u URL, --url=URL Target URL (e.g. "http://www.site.com/vuln.php?id=1") -l LOGFILE Parse target(s) from Burp or WebScarab proxy log file -x SITEMAPURL Parse target(s) from remote sitemap(.xml) file -m BULKFILE Scan multiple targets given in a textual file -r REQUESTFILE Load HTTP request from a file -g GOOGLEDORK Process Google dork results as target URLs -c CONFIGFILE Load options from a configuration INI file Request: These options can be used to specify how to connect to the target URL --method=METHOD Force usage of given HTTP method (e.g. PUT) --data=DATA Data string to be sent through POST --param-del=PARA.. Character used for splitting parameter values --cookie=COOKIE HTTP Cookie header value --cookie-del=COO.. Character used for splitting cookie values --load-cookies=L.. File containing cookies in Netscape/wget format --drop-set-cookie Ignore Set-Cookie header from response --user-agent=AGENT HTTP User-Agent header value --random-agent Use randomly selected HTTP User-Agent header value --host=HOST HTTP Host header value --referer=REFERER HTTP Referer header value -H HEADER, --hea.. Extra header (e.g. "X-Forwarded-For: 127.0.0.1") --headers=HEADERS Extra headers (e.g. "Accept-Language: fr\nETag: 123") --auth-type=AUTH.. HTTP authentication type (Basic, Digest, NTLM or PKI) --auth-cred=AUTH.. HTTP authentication credentials (name:password) --auth-file=AUTH.. HTTP authentication PEM cert/private key file --ignore-401 Ignore HTTP Error 401 (Unauthorized) --proxy=PROXY Use a proxy to connect to the target URL --proxy-cred=PRO.. Proxy authentication credentials (name:password) --proxy-file=PRO.. Load proxy list from a file --ignore-proxy Ignore system default proxy settings --tor Use Tor anonymity network --tor-port=TORPORT Set Tor proxy port other than default --tor-type=TORTYPE Set Tor proxy type (HTTP (default), SOCKS4 or SOCKS5) --check-tor Check to see if Tor is used properly --delay=DELAY Delay in seconds between each HTTP request --timeout=TIMEOUT Seconds to wait before timeout connection (default 30) --retries=RETRIES Retries when the connection timeouts (default 3) --randomize=RPARAM Randomly change value for given parameter(s) --safe-url=SAFEURL URL address to visit frequently during testing --safe-post=SAFE.. POST data to send to a safe URL --safe-req=SAFER.. Load safe HTTP request from a file --safe-freq=SAFE.. Test requests between two visits to a given safe URL --skip-urlencode Skip URL encoding of payload data --csrf-token=CSR.. Parameter used to hold anti-CSRF token --csrf-url=CSRFURL URL address to visit to extract anti-CSRF token --force-ssl Force usage of SSL/HTTPS --hpp Use HTTP parameter pollution method --eval=EVALCODE Evaluate provided Python code before the request (e.g. "import hashlib;id2=hashlib.md5(id).hexdigest()") Optimization: These options can be used to optimize the performance of sqlmap -o Turn on all optimization switches --predict-output Predict common queries output --keep-alive Use persistent HTTP(s) connections --null-connection Retrieve page length without actual HTTP response body --threads=THREADS Max number of concurrent HTTP(s) requests (default 1) Injection: These options can be used to specify which parameters to test for, provide custom injection payloads and optional tampering scripts -p TESTPARAMETER Testable parameter(s) --skip=SKIP Skip testing for given parameter(s) --skip-static Skip testing parameters that not appear dynamic --dbms=DBMS Force back-end DBMS to this value --dbms-cred=DBMS.. DBMS authentication credentials (user:password) --os=OS Force back-end DBMS operating system to this value --invalid-bignum Use big numbers for invalidating values --invalid-logical Use logical operations for invalidating values --invalid-string Use random strings for invalidating values --no-cast Turn off payload casting mechanism --no-escape Turn off string escaping mechanism --prefix=PREFIX Injection payload prefix string --suffix=SUFFIX Injection payload suffix string --tamper=TAMPER Use given script(s) for tampering injection data Detection: These options can be used to customize the detection phase --level=LEVEL Level of tests to perform (1-5, default 1) --risk=RISK Risk of tests to perform (1-3, default 1) --string=STRING String to match when query is evaluated to True --not-string=NOT.. String to match when query is evaluated to False --regexp=REGEXP Regexp to match when query is evaluated to True --code=CODE HTTP code to match when query is evaluated to True --text-only Compare pages based only on the textual content --titles Compare pages based only on their titles Techniques: These options can be used to tweak testing of specific SQL injection techniques --technique=TECH SQL injection techniques to use (default "BEUSTQ") --time-sec=TIMESEC Seconds to delay the DBMS response (default 5) --union-cols=UCOLS Range of columns to test for UNION query SQL injection --union-char=UCHAR Character to use for bruteforcing number of columns --union-from=UFROM Table to use in FROM part of UNION query SQL injection --dns-domain=DNS.. Domain name used for DNS exfiltration attack --second-order=S.. Resulting page URL searched for second-order response Fingerprint: -f, --fingerprint Perform an extensive DBMS version fingerprint Enumeration: These options can be used to enumerate the back-end database management system information, structure and data contained in the tables. Moreover you can run your own SQL statements -a, --all Retrieve everything -b, --banner Retrieve DBMS banner --current-user Retrieve DBMS current user --current-db Retrieve DBMS current database --hostname Retrieve DBMS server hostname --is-dba Detect if the DBMS current user is DBA --users Enumerate DBMS users --passwords Enumerate DBMS users password hashes --privileges Enumerate DBMS users privileges --roles Enumerate DBMS users roles --dbs Enumerate DBMS databases --tables Enumerate DBMS database tables --columns Enumerate DBMS database table columns --schema Enumerate DBMS schema --count Retrieve number of entries for table(s) --dump Dump DBMS database table entries --dump-all Dump all DBMS databases tables entries --search Search column(s), table(s) and/or database name(s) --comments Retrieve DBMS comments -D DB DBMS database to enumerate -T TBL DBMS database table(s) to enumerate -C COL DBMS database table column(s) to enumerate -X EXCLUDECOL DBMS database table column(s) to not enumerate -U USER DBMS user to enumerate --exclude-sysdbs Exclude DBMS system databases when enumerating tables --pivot-column=P.. Pivot column name --where=DUMPWHERE Use WHERE condition while table dumping --start=LIMITSTART First query output entry to retrieve --stop=LIMITSTOP Last query output entry to retrieve --first=FIRSTCHAR First query output word character to retrieve --last=LASTCHAR Last query output word character to retrieve --sql-query=QUERY SQL statement to be executed --sql-shell Prompt for an interactive SQL shell --sql-file=SQLFILE Execute SQL statements from given file(s) Brute force: These options can be used to run brute force checks --common-tables Check existence of common tables --common-columns Check existence of common columns User-defined function injection: These options can be used to create custom user-defined functions --udf-inject Inject custom user-defined functions --shared-lib=SHLIB Local path of the shared library File system access: These options can be used to access the back-end database management system underlying file system --file-read=RFILE Read a file from the back-end DBMS file system --file-write=WFILE Write a local file on the back-end DBMS file system --file-dest=DFILE Back-end DBMS absolute filepath to write to Operating system access: These options can be used to access the back-end database management system underlying operating system --os-cmd=OSCMD Execute an operating system command --os-shell Prompt for an interactive operating system shell --os-pwn Prompt for an OOB shell, Meterpreter or VNC --os-smbrelay One click prompt for an OOB shell, Meterpreter or VNC --os-bof Stored procedure buffer overflow exploitation --priv-esc Database process user privilege escalation --msf-path=MSFPATH Local path where Metasploit Framework is installed --tmp-path=TMPPATH Remote absolute path of temporary files directory Windows registry access: These options can be used to access the back-end database management system Windows registry --reg-read Read a Windows registry key value --reg-add Write a Windows registry key value data --reg-del Delete a Windows registry key value --reg-key=REGKEY Windows registry key --reg-value=REGVAL Windows registry key value --reg-data=REGDATA Windows registry key value data --reg-type=REGTYPE Windows registry key value type General: These options can be used to set some general working parameters -s SESSIONFILE Load session from a stored (.sqlite) file -t TRAFFICFILE Log all HTTP traffic into a textual file --batch Never ask for user input, use the default behaviour --binary-fields=.. Result fields having binary values (e.g. "digest") --charset=CHARSET Force character encoding used for data retrieval --crawl=CRAWLDEPTH Crawl the website starting from the target URL --crawl-exclude=.. Regexp to exclude pages from crawling (e.g. "logout") --csv-del=CSVDEL Delimiting character used in CSV output (default ",") --dump-format=DU.. Format of dumped data (CSV (default), HTML or SQLITE) --eta Display for each output the estimated time of arrival --flush-session Flush session files for current target --forms Parse and test forms on target URL --fresh-queries Ignore query results stored in session file --hex Use DBMS hex function(s) for data retrieval --output-dir=OUT.. Custom output directory path --parse-errors Parse and display DBMS error messages from responses --save=SAVECONFIG Save options to a configuration INI file --scope=SCOPE Regexp to filter targets from provided proxy log --test-filter=TE.. Select tests by payloads and/or titles (e.g. ROW) --test-skip=TEST.. Skip tests by payloads and/or titles (e.g. BENCHMARK) --update Update sqlmap Miscellaneous: -z MNEMONICS Use short mnemonics (e.g. "flu,bat,ban,tec=EU") --alert=ALERT Run host OS command(s) when SQL injection is found --answers=ANSWERS Set question answers (e.g. "quit=N,follow=N") --beep Beep on question and/or when SQL injection is found --cleanup Clean up the DBMS from sqlmap specific UDF and tables --dependencies Check for missing (non-core) sqlmap dependencies --disable-coloring Disable console output coloring --gpage=GOOGLEPAGE Use Google dork results from specified page number --identify-waf Make a thorough testing for a WAF/IPS/IDS protection --skip-waf Skip heuristic detection of WAF/IPS/IDS protection --mobile Imitate smartphone through HTTP User-Agent header --offline Work in offline mode (only use session data) --page-rank Display page rank (PR) for Google dork results --purge-output Safely remove all content from output directory --smart Conduct thorough tests only if positive heuristic(s) --sqlmap-shell Prompt for an interactive sqlmap shell --wizard Simple wizard interface for beginner users