Hyland / DocumentFilters: Document Filters is an SDK for applications like content indexing, e-discovery, data migration, and feeding data into AI/ML models by extracting data from unstructured sources. It gives the ability to perform deep inspection, data extraction, output manipulation, and conversion for virtually any type of document, in any programming language.
periscop / Clan: Chunky Loop Analyzer: A Polyhedral Representation Extraction Tool for High Level Programs
Aironsoft / RimTrans: Program for extracting language files from RimWorld mods for editing and translation.
op200 / Simple Subtitle OCR: A simple OCR program with a GUI for extracting hard (embedded) subtitles.
coteyn / Phone Number Location Tracking Tool: With this program you can get the exact location of a phone just by knowing the phone number. It is based on GPS data extraction.
SirYadav1 / AdwanceSNI 2.0: AdwanceSNI 2.0 is the enhanced version of the original AdwanceSNI program, designed to provide a comprehensive suite for network scanning and subdomain discovery. It retains the core functionalities of finding subdomains and scanning hosts while introducing new tools for IP extraction, IP generation, and a lite scanner.
ComPDFKit / Compdfkit Api Samples: ComPDFKit PDF API is organized around the REST standard and supports various programming languages with rich PDF features, including conversion, document editing, data extraction, and so forth.
RitvikJoshi / Handwritten Mathematical Expression Recognition: Program to recognize online handwritten mathematical expressions. Includes implementations of various feature extraction, segmentation, and classification algorithms, for example geometric features, PCA, HOG, Parzen Shape Context Features, Line of Sight Algorithm, Random Forest, etc.
ediloren / FastImp: FastImp is a wideband impedance extraction program for 3D geometries.
SAP-samples / Btp Cap Dox Invoice Validation: Explore this repository for an extensive invoice validation solution on the SAP Business Technology Platform (SAP BTP). We provide an example showcasing how to leverage the Document Information Extraction Service (DOX) in combination with SAP Cloud Application Programming Model (CAP) to validate PDF-based invoices.
Jai-Agarwal-04 / Sentiment Analysis With Insights: Sentiment analysis with insights using NLP and Dash. This project shows sentiment analysis of text data using NLP and Dash. I used the Amazon reviews dataset to train the model and then scraped reviews from Etsy.com to test it.

Prerequisites: Python3, the Amazon dataset (3.6GB), Anaconda.

How was this project made?
This project was built using Python3 to help predict sentiment with machine learning, with an interactive dashboard for testing reviews. To start, I downloaded the dataset and extracted the JSON file. Next, I took out a portion of 792,000 reviews, equally distributed into chunks of 24,000 reviews, using pandas. The chunks were then combined into a single CSV file called balanced_reviews.csv. This file served as the base for training my model and was filtered on the basis of review ratings greater than 3 and less than 3. The filtered data was then vectorized using the TF-IDF vectorizer. After training the model to 90% accuracy, reviews were scraped from Etsy.com to test the model. Finally, I built a dashboard in which we can check the sentiment of text entered by the user or of the reviews scraped from the website.

What is CountVectorizer?
CountVectorizer is a tool provided by the scikit-learn library in Python. It transforms a given text into a vector on the basis of the frequency (count) of each word that occurs in the entire text. This is helpful when we have multiple such texts and wish to convert each of them into a vector for further text analysis. CountVectorizer creates a matrix in which each unique word is represented by a column, and each text sample from the document is a row. The value of each cell is the count of that word in that particular text sample.

What is TF-IDF Vectorizer?
TF-IDF stands for Term Frequency - Inverse Document Frequency and is a statistic that aims to better define how important a word is for a document, while also taking into account its relation to other documents from the same corpus. This is done by looking at how many times a word appears in a document while also paying attention to how many times the same word appears in other documents in the corpus. The rationale behind this is the following:
- A word that frequently appears in a document has more relevancy for that document, meaning there is a higher probability that the document is about or related to that specific word.
- A word that frequently appears in many documents may prevent us from finding the right document in a collection; the word is relevant either for all documents or for none. Either way, it will not help us filter out a single document or a small subset of documents from the whole set.
So TF-IDF is a score applied to every word in every document in our dataset. For every word, the TF-IDF value increases with every appearance of the word in a document, but gradually decreases with every appearance in other documents.

What is Plotly Dash?
Dash is a productive Python framework for building web analytic applications. Written on top of Flask, Plotly.js, and React.js, Dash is ideal for building data visualization apps with highly custom user interfaces in pure Python. It's particularly suited for anyone who works with data in Python. Dash apps are rendered in the web browser. You can deploy your apps to servers and then share them through URLs. Since Dash apps are viewed in the web browser, Dash is inherently cross-platform and mobile ready. Dash is an open source library, released under the permissive MIT license. Plotly develops Dash and offers a platform for managing Dash apps in an enterprise environment.

What is Web Scraping?
Web scraping is a term used to describe the use of a program or algorithm to extract and process large amounts of data from the web.

Running the project
Step 1: Download the dataset and extract the JSON data into your project folder. Make a folder named filtered_chunks and run the data_extraction.py file. This will extract data from the JSON file into equal-sized chunks and then combine them into a single CSV file called balanced_reviews.csv.
Step 2: Run the data_cleaning_preprocessing_and_vectorizing.py file. This will clean and filter the data. The filtered data will then be fed to the TF-IDF Vectorizer, the trained model will be pickled in a trained_model.pkl file, and the vocabulary of the trained model will be stored as vocab.pkl. Keep these two files in a folder named model_files.
Step 3: Run the etsy_review_scrapper.py file. Adjust the range of pages and the product to be scraped, as it might take a very long time to process. A small amount of data is sufficient to check the accuracy of the model. The scraped data will be stored in a CSV file as well as a db file.
Step 4: Finally, run the app.py file, which starts the Dash server. We can then check how the model works either by typing a review or by selecting one of the preloaded scraped reviews.
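The difference between count vectors and TF-IDF weights described in this entry can be illustrated with a minimal standard-library sketch. It is not the project's code: the helper names `count_vectors` and `tfidf_vectors` and the sample reviews are hypothetical, and it uses the textbook weight `count * log(N / df)`, whereas scikit-learn's TfidfVectorizer applies a smoothed and normalized variant by default.

```python
import math
from collections import Counter


def count_vectors(texts):
    """Return (vocabulary, rows): one row per text, one column per unique word."""
    tokenized = [text.lower().split() for text in texts]
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        rows.append([counts[word] for word in vocabulary])
    return vocabulary, rows


def tfidf_vectors(texts):
    """Return (vocabulary, rows) weighted by the textbook count * log(N / df)."""
    vocabulary, rows = count_vectors(texts)
    n_docs = len(rows)
    # Document frequency: in how many texts does each word occur at least once?
    doc_freq = [sum(1 for row in rows if row[j] > 0) for j in range(len(vocabulary))]
    idf = [math.log(n_docs / df) for df in doc_freq]
    weighted = [[count * idf[j] for j, count in enumerate(row)] for row in rows]
    return vocabulary, weighted


# Hypothetical sample reviews, for illustration only.
reviews = ["good product good price", "bad product", "good service"]
vocab, counts = count_vectors(reviews)
_, weights = tfidf_vectors(reviews)

print(vocab)      # column order of both matrices
print(counts[0])  # raw counts for the first review
print(weights[0]) # TF-IDF weights for the first review
```

In the first review, "price" occurs once but in no other review, so it ends up with a higher weight than "product", which also occurs once but is shared with a second review.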
gccheng / Actiongraph: Action recognition based on action graph, which describes the spatio-temporal relationship between dense trajectory clusters. The program consists of three parts: 1) Dense trajectory extraction based on Wang Heng's CVPR paper; 2) Action graph construction; 3) Video classification based on action recognition.
DARPA-CRITICALMAAS / Uncharted Ta1: This repository contains Uncharted's TA1 contributions for DARPA's CriticalMAAS program. The main goals are automated feature extraction and georeferencing of geologic maps.
post-kerbin-mining-corporation / SpaceDust: Adds atmospheric and exoatmospheric resource discovery and extraction to Kerbal Space Program.
marjanmo / Xsection: Program for profile extraction from digital elevation models.
jdank417 / Deep Learning For Stock Market Predictions: This program provides a comprehensive pipeline for stock price prediction, integrating CNN for feature extraction and LSTM for sequence modeling, demonstrating a hybrid approach to capture both spatial and temporal patterns in stock data.
XiaokangLei / ImageRetrieval: With large-scale image databases now common in science and medicine as well as in advertising and marketing, organizing these databases and retrieving images from them effectively has become very important. This paper introduces the browser/server (B/S) architecture and focuses on Content-Based Image Retrieval technology. It covers the extraction of basic low-level image features and the corresponding retrieval matching algorithms, including color, local texture, and shape characteristics. The main work of this paper can be divided into three parts. The first part focuses on extracting one- and three-dimensional color histogram features in the RGB and HSV color spaces, and uses the Bhattacharyya coefficient method and the Euclidean distance method to calculate the similarity of different images. The second part uses an improved "joint mode" of the Local Binary Pattern (uniform pattern) to extract texture features from each part of the image, and uses Euclidean distance to calculate image similarity. The third part studies a shape feature extraction method based on the image Edge Direction Histogram; the feature vectors obtained by this method are invariant to scaling, translation, and rotation. Based on the study of these three kinds of feature extraction algorithms, the system uses the Struts2 framework on a B/S architecture, implements the different algorithms in the Java programming language, and completes a content-based image retrieval system. The test image library contains 2400 commonly used test images, which can be searched by uploading a local image.
For the image features described above, the article describes the retrieval results on categories such as flowers, beaches, buses, and elephants, and analyzes the advantages and disadvantages of the different search methods along with related improvements.
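The color-histogram comparison from the first part of this entry can be sketched as follows. This is a minimal illustration in Python rather than the project's Java implementation, and it assumes the coefficient-based similarity in the description is the Bhattacharyya coefficient. The function names, the bin count, and the tiny pixel lists are hypothetical.

```python
import math


def color_histogram(pixels, bins_per_channel=4):
    """Normalized 3-D RGB histogram, flattened; pixels are (r, g, b) in 0..255."""
    if not pixels:
        raise ValueError("pixels must not be empty")
    bins = bins_per_channel
    hist = [0.0] * bins ** 3
    for r, g, b in pixels:
        # Map each channel value 0..255 to a bin index 0..bins-1.
        r_bin, g_bin, b_bin = (c * bins // 256 for c in (r, g, b))
        hist[r_bin * bins * bins + g_bin * bins + b_bin] += 1.0
    return [value / len(pixels) for value in hist]


def bhattacharyya_coefficient(p, q):
    """Similarity of two normalized histograms: 1.0 if identical, 0.0 if disjoint."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def euclidean_distance(p, q):
    """Distance between two histograms: 0.0 if identical, larger if more different."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# Hypothetical four-pixel "images", for illustration only.
red = color_histogram([(255, 0, 0)] * 4)
mostly_red = color_histogram([(250, 10, 5)] * 3 + [(0, 0, 255)])
blue = color_histogram([(0, 0, 255)] * 4)

print(bhattacharyya_coefficient(red, mostly_red))
print(bhattacharyya_coefficient(red, blue))
print(euclidean_distance(red, blue))
```

A retrieval system along these lines would rank library images by descending coefficient, or by ascending distance, relative to the uploaded query image.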
sideeffectdk / RT MIR OSC: Audio input -> real-time analysis -> OSC output. Takes in real-time audio, does feature extraction using smart algorithms, then sends out OSC to be used in other programs.
deepthinking-qichao / A Program That Simultaneously Implements Face Detection And Human Skeleton Extraction: A program that simultaneously implements face detection and human skeleton extraction.
gauravjain2 / FaceExtraction: Python program for face extraction from a webcam image.