python-windrose / Windrose: A Python library (Matplotlib, NumPy) to manage wind data, draw windroses (also known as polar rose plots), plot probability density functions, and fit Weibull distributions.
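A minimal usage sketch with synthetic data, assuming the documented WindroseAxes API (from_ax, bar, set_legend) of python-windrose:

```python
# Minimal windrose sketch with synthetic wind data; assumes the
# documented WindroseAxes API of python-windrose.
import numpy as np
import matplotlib.pyplot as plt
from windrose import WindroseAxes

rng = np.random.default_rng(0)
wd = rng.uniform(0, 360, 500)      # wind direction in degrees
ws = rng.weibull(2.0, 500) * 8     # wind speed, roughly Weibull-shaped

ax = WindroseAxes.from_ax()
ax.bar(wd, ws, normed=True, opening=0.8, edgecolor="white")
ax.set_legend()
plt.show()
```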
ssebastianmag / Hydrogen Wavefunctions: Hydrogen wavefunction modeling and electron probability density plots.
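The repository's own code is not shown here; the following is a standard-textbook sketch of the hydrogen radial probability density P(r) = r^2 |R_nl(r)|^2 in atomic units (a0 = 1), with radial_wavefunction as an illustrative helper:

```python
# Radial probability density for hydrogen in atomic units (a0 = 1).
# A textbook sketch, not the repository's code.
import numpy as np
import matplotlib.pyplot as plt
from math import factorial
from scipy.special import genlaguerre

def radial_wavefunction(n, l, r):
    rho = 2 * r / n
    norm = np.sqrt((2 / n) ** 3 * factorial(n - l - 1) / (2 * n * factorial(n + l)))
    return norm * np.exp(-rho / 2) * rho ** l * genlaguerre(n - l - 1, 2 * l + 1)(rho)

r = np.linspace(0, 30, 1000)
for n, l in [(1, 0), (2, 0), (2, 1)]:
    P = r ** 2 * radial_wavefunction(n, l, r) ** 2
    plt.plot(r, P, label=f"n={n}, l={l}")
plt.xlabel("r (Bohr radii)"); plt.ylabel("P(r)"); plt.legend(); plt.show()
```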
derrynknife / SurPyval: A Python package for survival analysis, and a highly flexible one. SurPyval can work with arbitrary combinations of observed, censored, and truncated data, and can also fit distributions with 'offsets' with ease, for example the three-parameter Weibull distribution.
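A hedged sketch of a three-parameter Weibull fit with SurPyval; the Weibull.fit interface and the offset=True flag are taken from the package's documentation as remembered, and exact argument names may differ between versions:

```python
# Three-parameter (offset) Weibull fit sketch; offset=True is assumed
# to request the offset parameter, per the SurPyval docs.
import numpy as np
import surpyval as surv

rng = np.random.default_rng(1)
x = 10 + 5 * rng.weibull(1.5, size=200)   # lifetimes shifted by an offset near 10

model = surv.Weibull.fit(x, offset=True)
print(model)
```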
spsanderson / TidyDensity: Create tidy probability/density tibbles and plots of randomly generated and empirical data.
Gagniuc / Markov Chains Prediction Framework: This application makes predictions by multiplying a probability vector by a transition matrix repeatedly (n steps, user defined). At each step, the values from the resulting probability vector are plotted on a chart. The resulting curves indicate the behavior of the system over a number of steps.
Gagniuc / Predictions With Markov Chains: A JavaScript application that multiplies a probability vector by a transition matrix repeatedly (n steps, user defined). At each step, the values from the resulting probability vector are plotted on a chart. The resulting curves indicate the behavior of the system over n steps.
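Both Gagniuc repositories implement the same iteration; here is a minimal Python sketch of it (the originals are JavaScript):

```python
# Repeatedly multiply a probability vector by a transition matrix and
# record each intermediate vector, as both Gagniuc apps do.
import numpy as np

P = np.array([[0.9, 0.1],      # transition matrix: rows sum to 1
              [0.4, 0.6]])
v = np.array([1.0, 0.0])       # initial probability vector

history = [v]
for _ in range(20):            # n = 20 steps
    v = v @ P
    history.append(v)

for step, vec in enumerate(history):
    print(step, vec)           # each component over steps is one plotted curve
```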
Steven-Wright1 / Probability Map Generator SAR: Code written to aid in finding missing persons in search and rescue (SAR) scenarios. The program takes x, y, z inputs from Digital Elevation Model (DEM) data plotted in ArcGIS Pro and retrieved for processing in MATLAB. It produces a probability map of likely locations where a missing person may be found in wilderness SAR, based on elevation, subject category (e.g. hiker, child), and distance. The probability map output format is chosen so that a travelling-salesman path-planning algorithm can be applied and the resulting path uploaded to the CPU and autopilot of a UAV, allowing the UAV to fly the search pattern autonomously.
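A purely hypothetical sketch of the idea, not the repository's MATLAB code: combine a distance factor and an elevation factor over a DEM grid into a normalized probability surface. The scales and weights below are illustrative, not real lost-person statistics:

```python
# Hypothetical probability surface over a DEM grid (illustrative only).
import numpy as np

z = np.random.default_rng(2).normal(500, 50, (100, 100))   # fake DEM, meters
yy, xx = np.mgrid[0:100, 0:100]
d = np.hypot(xx - 50, yy - 50) * 30    # distance from last known point (30 m cells)

dist_factor = np.exp(-d / 1500)                     # nearer cells more likely (assumed scale)
elev_factor = np.exp(-np.abs(z - z[50, 50]) / 100)  # assumed: subjects stay near LKP elevation

prob = dist_factor * elev_factor
prob /= prob.sum()                                  # normalize to a probability map
```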
swairshah / Intensify: Coloring terminal text with intensities (used for plotting probability or entropy alongside tokens).
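A sketch of the idea (not Intensify's actual implementation): shade each token's background with a 24-bit ANSI escape scaled by its probability:

```python
# Map a probability in [0, 1] to a background color via an ANSI escape.
def colorize(token: str, p: float) -> str:
    r = int(255 * p)                     # probability -> red intensity
    return f"\x1b[48;2;{r};0;0m{token}\x1b[0m"

tokens = [("the", 0.91), ("cat", 0.40), ("sat", 0.75)]
print(" ".join(colorize(t, p) for t, p in tokens))
```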
Jai-Agarwal-04 / Sentiment Analysis With Insights: Sentiment analysis of text data using NLP and Dash. The model is trained on an Amazon reviews dataset, and reviews scraped from Etsy.com are used to test it.

Prerequisites: Python 3, the Amazon dataset (3.6 GB), Anaconda.

How this project was made: The project was built with Python 3 to predict sentiments using machine learning, plus an interactive dashboard for testing reviews. First, the dataset was downloaded and the JSON file extracted. Next, 792,000 reviews were taken out, equally distributed into chunks of 24,000 reviews, using pandas. The chunks were then combined into a single CSV file called balanced_reviews.csv, which served as the base for training the model and was filtered to reviews rated greater than 3 or less than 3. The filtered data was vectorized using a TF-IDF vectorizer. After training the model to 90% accuracy, reviews were scraped from Etsy.com to test it. Finally, a dashboard was built in which sentiments can be checked either for input typed by the user or for the reviews scraped from the website.

What is CountVectorizer? CountVectorizer is a tool provided by the scikit-learn library in Python. It transforms a given text into a vector based on the frequency (count) of each word that occurs in the text. This is helpful when we have multiple such texts and wish to convert the words in each text into vectors for further text analysis. CountVectorizer creates a matrix in which each unique word is represented by a column and each text sample from the document is a row; the value of each cell is simply the count of that word in that particular text sample.

What is a TF-IDF vectorizer? TF-IDF stands for Term Frequency - Inverse Document Frequency, a statistic that aims to define how important a word is to a document while also taking into account its relation to other documents in the same corpus. It looks at how many times a word appears in a document while also paying attention to how many times the same word appears in other documents in the corpus. The rationale is the following: a word that frequently appears in a document has more relevance for that document, meaning there is a higher probability that the document is about that specific word; a word that frequently appears in many documents may prevent us from finding the right document in a collection, since it is relevant either to all documents or to none, and either way it will not help us filter out a single document or a small subset from the whole set. TF-IDF is thus a score applied to every word in every document in the dataset: for each word, the TF-IDF value increases with every appearance in a document but gradually decreases with every appearance in other documents.
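To make the two vectorizers concrete, a toy scikit-learn sketch; the example documents are invented, whereas the project uses balanced_reviews.csv:

```python
# CountVectorizer vs. TfidfVectorizer on invented toy reviews.
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = ["great product, great price", "terrible product", "great value"]

counts = CountVectorizer().fit_transform(docs)
print(counts.toarray())          # raw word counts per document

tfidf = TfidfVectorizer().fit_transform(docs)
print(tfidf.toarray().round(2))  # counts reweighted by inverse document frequency
```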
What is Plotly Dash? Dash is a productive Python framework for building web analytics applications. Written on top of Flask, Plotly.js, and React.js, Dash is ideal for building data visualization apps with highly customized user interfaces in pure Python, and it is particularly suited to anyone who works with data in Python. Dash apps are rendered in the web browser; you can deploy them to servers and share them through URLs, which makes Dash inherently cross-platform and mobile ready. Dash is an open source library released under the permissive MIT license. Plotly develops Dash and offers a platform for managing Dash apps in an enterprise environment.

What is web scraping? Web scraping describes the use of a program or algorithm to extract and process large amounts of data from the web.

Running the project:
Step 1: Download the dataset and extract the JSON data into your project folder. Make a folder filtered_chunks and run the data_extraction.py file. This extracts data from the JSON file into equal-sized chunks and then combines them into a single CSV file called balanced_reviews.csv.
Step 2: Run the data_cleaning_preprocessing_and_vectorizing.py file. This cleans and filters the data; the filtered data is then fed to the TF-IDF vectorizer, the model is pickled as trained_model.pkl, and the vocabulary of the trained model is stored as vocab.pkl. Keep these two files in a folder named model_files.
Step 3: Run the etsy_review_scrapper.py file. Adjust the range of pages and products to be scraped, as scraping can take a very long time; a small dataset is sufficient to check the model's accuracy. The scraped data is stored as both a CSV and a db file.
Step 4: Finally, run the app.py file, which starts the Dash server; the model can then be checked either by typing a review or by selecting one of the preloaded scraped reviews. A minimal Dash sketch follows below.
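A minimal Dash sketch of the Step 4 idea; the real app.py, its layout, and its model loading will differ, and the stand-in sentiment rule below is purely illustrative:

```python
# Minimal Dash app sketch; the stand-in rule replaces the real pickled model.
from dash import Dash, html, dcc, Input, Output

app = Dash(__name__)
app.layout = html.Div([
    dcc.Textarea(id="review", placeholder="Type a review..."),
    html.Div(id="sentiment"),
])

@app.callback(Output("sentiment", "children"), Input("review", "value"))
def predict(text):
    if not text:
        return ""
    # In the real project: vectorize with vocab.pkl, predict with trained_model.pkl.
    return "Positive" if "great" in text.lower() else "Negative"  # stand-in rule

if __name__ == "__main__":
    app.run(debug=True)
```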
XsarfrazX / CDF PDF Matlab: Plotting and analysing the cumulative distribution function (CDF) and probability density function (PDF) of uniform and Gaussian distributions.
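The repository is MATLAB; an equivalent Python sketch using scipy.stats looks like this:

```python
# PDF and CDF of Gaussian and uniform distributions.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

x = np.linspace(-4, 4, 400)
for dist, name in [(stats.norm(0, 1), "Gaussian"), (stats.uniform(-2, 4), "Uniform")]:
    plt.plot(x, dist.pdf(x), label=f"{name} PDF")
    plt.plot(x, dist.cdf(x), "--", label=f"{name} CDF")
plt.legend(); plt.show()
```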
RedDoorAnalytics / Msm.stacked: Stacked probability plots and transition probabilities for 'msm' multi-state models.
Thomas-George-T / Social Media Analytics In R: A look at data from 1.6 million Twitter users, drawing useful insights and exploring interesting patterns visualized with concise plots. The techniques used include text mining, sentiment analysis, probability, time series analysis, and hierarchical clustering of text/words in R.
Bribak / SURFY2: This repository constitutes SURFY2 and corresponds to the bioRxiv preprint 'Updating the in silico human surfaceome with meta-ensemble learning and feature engineering' by Daniel Bojar. SURFY2 is a machine learning classifier that predicts whether a human transmembrane protein is located at the cell surface (the plasma membrane) or in one of the intracellular membranes, based on the sequence characteristics of the protein. Making use of the data described in the recent publication by Bausch-Fluck et al. (https://doi.org/10.1073/pnas.1808790115), SURFY2 considerably improves on their reported classifier SURFY in accuracy (95.5%), precision (94.3%), recall (97.6%), and area under the ROC curve (0.954) on a test set never seen by the classifier. SURFY2 consists of a layer of 12 base estimators generating 24 new engineered features (class probabilities for both classes), which are appended to the original 253 features. Then a soft voting classifier with three optimized base estimators (Random Forest, Gradient Boosting, and Logistic Regression) and optimized voting weights is trained on this expanded dataset, producing the final prediction (see the sketch after the workflow below). The motivation of SURFY2 is to provide an updated and better version of the in silico human surfaceome to facilitate research and drug development on human surface-exposed transmembrane proteins. Additionally, SURFY2 enabled insights into the biological properties of these proteins and generated several new hypotheses and ideas for experiments. The workflow is as follows:
1) dataPrep: gets training data from data.xlsx, labels it according to surface class, and outputs train_data.csv.
2) split: takes train_data.csv, splits it into training, validation, and test data, and outputs train.csv, val.csv, and test.csv.
3) main_val: used for optimizing the hyperparameters of the base estimators and the estimators and weights of the voting classifier; stores all estimators and evaluates the meta-ensemble classifier SURFY2 on the validation set.
4) classifier_selection: all base estimators and meta-ensemble approaches are tested on the initial dataset as well as on the expanded dataset including the engineered features, and compared by cross-validation score.
5) main_test: evaluates SURFY2 on the separate test set (trained on training + validation set).
6) testing_SURFY: evaluates the original SURFY through cross-validation and on the validation and test sets.
7) pred_unlabeled: uses SURFY2 to predict the surface label (plus prediction score) for unlabeled proteins in data.xlsx; also gets the feature importances of the voting classifier estimators.
8) getting_discrepancies: compares predictions with those made by SURFY ('surfy.xlsx') and stores mismatches, including the 10 most confident mismatches (by SURFY2 classification score) from each class.
9) feature_importances: plots the 10 most important features for the voting classifier estimators (Random Forest, Gradient Boosting, Logistic Regression) to interpret predictions.
10) base_estimator_importances: plots the 10 most important features for the two most important base estimators (XGBClassifier and Gradient Boosting).
11) comparing_mismatches: separates datasets into shared and discrepant predictions (between SURFY and SURFY2); compares feature means and selects the features with the highest class feature-mean differences between prediction datasets; statistically analyzes differences in feature means between classes in both prediction datasets; plots 9 representative features with their means grouped by class and prediction dataset to rationalize discrepant predictions.
12) tSNE_surfy2: performs nonlinear dimensionality reduction with t-SNE on proteins with predictions from both SURFY and SURFY2; plots the two t-SNE dimensions and labels the proteins by prediction class to see where discrepant predictions reside in the landscape; also plots surface proteins with the most prevalent annotated functional subclasses, labeled by subclass, to enable comparison with the class predictions. Functional annotations came from 'surfy.xlsx'.
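A minimal scikit-learn sketch of the scheme described above, not the original SURFY2 code: base estimators contribute out-of-fold class-probability features, and a weighted soft voting classifier is trained on the expanded matrix. Only two base estimators are used here (SURFY2 uses 12), and the voting weights are illustrative:

```python
# Engineered probability features + weighted soft voting, SURFY2-style.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Layer 1: engineered features (2 base estimators here; SURFY2 uses 12).
base = [RandomForestClassifier(random_state=0),
        GradientBoostingClassifier(random_state=0)]
probas = [cross_val_predict(m, X, y, cv=5, method="predict_proba") for m in base]
X_expanded = np.hstack([X] + probas)

# Layer 2: soft voting with weights (illustrative, not SURFY2's optimized ones).
vote = VotingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=0)),
                ("gb", GradientBoostingClassifier(random_state=0)),
                ("lr", LogisticRegression(max_iter=1000))],
    voting="soft", weights=[2, 2, 1])
vote.fit(X_expanded, y)
print(vote.score(X_expanded, y))
```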
cdrago21 / Spectral Hom: A module for plotting Hong-Ou-Mandel coincidence probabilities for input photons with various spectral properties.
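For pure single photons with spectral amplitudes f1 and f2 at a balanced beamsplitter, the standard zero-delay coincidence probability is (1 - |overlap|^2) / 2; a textbook numpy sketch (not the module's API):

```python
# Hong-Ou-Mandel coincidence probability from the spectral overlap.
import numpy as np

w, dw = np.linspace(-5, 5, 2001, retstep=True)   # detuning grid

def amp(w0, sigma=1.0):
    g = np.exp(-(w - w0) ** 2 / (4 * sigma ** 2))
    return g / np.sqrt(np.sum(np.abs(g) ** 2) * dw)  # normalize the amplitude

f1, f2 = amp(0.0), amp(0.5)                # slightly detuned photons
overlap = np.sum(np.conj(f1) * f2) * dw
P_cc = 0.5 * (1 - abs(overlap) ** 2)
print(P_cc)   # 0 for identical photons, 0.5 for fully distinguishable ones
```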
bhuyanamit986 / Exploratory Data Analysis: EDA on the iris dataset using histograms, scatter plots, the probability density function (PDF), the cumulative distribution function (CDF), and box-and-whisker plots.
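A typical sketch of the kind of iris EDA described (not the notebook itself):

```python
# Histogram with KDE, empirical CDF, and box plot for the iris data.
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

iris = sns.load_dataset("iris")
sns.histplot(data=iris, x="petal_length", hue="species", kde=True)  # histogram + PDF
plt.figure()
x = np.sort(iris["petal_length"])
plt.step(x, np.arange(1, len(x) + 1) / len(x))                      # empirical CDF
plt.figure()
sns.boxplot(data=iris, x="species", y="petal_length")               # box & whisker
plt.show()
```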
drphilmarshall / Pappy: Probability distribution Amplification and Plotting in Python.
yadav-vikas / Big Data And Data Science Interview Questions: probability / statistics / maths / pandas / SQL / plots (data analysis) / machine learning / Python code challenges.
BryceWayne / Dirac: A Python Jupyter notebook that solves the Dirac equation using the leapfrog scheme. The notebook solves the Dirac equation, plots the wave functions over time, and plots the probability density and current over time. There is also a gif-creator function at the end to help create GIFs from the pictures.
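A generic leapfrog (kick-drift-kick) sketch on a harmonic oscillator, to show the staggered time-stepping the notebook applies to the Dirac equation; the Dirac-specific update is not reproduced here:

```python
# Leapfrog integration of x'' = -x with velocity staggered by half a step.
dt, n = 0.01, 1000
x, v = 1.0, 0.0
v -= 0.5 * dt * x            # initial half-step kick (acceleration a = -x)
xs = []
for _ in range(n):
    x += dt * v              # drift: position full step
    v -= dt * x              # kick: velocity full step
    xs.append(x)
print(max(xs), min(xs))      # stays near +/-1: leapfrog conserves energy well
```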
kedarvkunte / Data Visualization And State Prediction Using Bayesian Inference With Markov Chain Monte Carlo: In this project, the state of an object is predicted using a Bayesian inference Markov chain Monte Carlo (MCMC) algorithm. For data visualization, Matplotlib is used to observe convergence to the stationary distribution by plotting histograms and bar charts. A low mean squared error of 0.0003 was achieved for various test cases involving prior-posterior probability.
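The project's exact sampler is not specified; a minimal Metropolis-Hastings sketch, a common MCMC choice, sampling a standard normal posterior:

```python
# Random-walk Metropolis-Hastings targeting a standard normal density.
import numpy as np

rng = np.random.default_rng(0)

def log_post(x):
    return -0.5 * x ** 2          # log-density up to a constant

x, chain = 0.0, []
for _ in range(10_000):
    prop = x + rng.normal(0, 1)   # symmetric random-walk proposal
    if np.log(rng.uniform()) < log_post(prop) - log_post(x):
        x = prop                  # accept
    chain.append(x)

print(np.mean(chain), np.std(chain))   # ~0 and ~1 once converged
```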
zelenkastiot / Brownian: A project funded by the DFG. A Jupyter Book that explores merely a chunk of the field of nonlinear dynamics, specifically diffusion and random search in heterogeneous media. The book has various simulations of the stochastic process known as Brownian motion. The motion dynamics are simulated by solving the Langevin equation numerically for different initial parameters. After an ensemble finishes, the trajectories of the Brownian particles are plotted together with the probability density function and the mean square displacement. The book has chapters on Brownian search, backbone problems, and stochastic resetting.
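A hedged sketch of the kind of simulation described: Euler-Maruyama integration of the overdamped Langevin equation dx = sqrt(2D) dW for an ensemble, followed by the ensemble mean square displacement:

```python
# Ensemble of free Brownian trajectories and their mean square displacement.
import numpy as np

rng = np.random.default_rng(0)
D, dt, steps, n_particles = 1.0, 0.01, 1000, 500

dx = np.sqrt(2 * D * dt) * rng.normal(size=(n_particles, steps))
x = np.cumsum(dx, axis=1)                 # trajectories, all starting at 0

msd = np.mean(x ** 2, axis=0)             # ensemble mean square displacement
t = dt * np.arange(1, steps + 1)
print(msd[-1] / (2 * D * t[-1]))          # ~1: normal diffusion, MSD = 2*D*t
```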