16 skills found
myscale / MyScaleDB: A @ClickHouse fork that supports high-performance vector search and full-text search.
aryn-ai / Sycamore: 🍁 Sycamore is an LLM-powered search and analytics platform for unstructured data.
qminer / Qminer: Analytic platform for real-time large-scale streams containing structured and unstructured data.
neostrange / Text2graphs: A Python framework for automating the construction of domain-agnostic and domain-specific knowledge graphs from unstructured text. Integrates NLP and Neo4j for entity extraction, relationship mapping, and semantic enrichment. Ideal for text mining and analytics, with support for temporal and event tagging.
TsinghuaDatabaseGroup / Unify: An unstructured data analytics system via LLM
MachineLearning-Nerd / Machine Learning For Healthcare Analytics: Machine Learning (ML) has changed the way organizations and individuals use data to improve the efficiency of a system. ML algorithms allow strategists to deal with a variety of structured, unstructured, and semi-structured data. Machine Learning for Healthcare Analytics Projects is packed with new approaches and methodologies for creating powerful solutions for healthcare analytics.
IBM / Spark On ZOS: In this journey we demonstrate running an analytics application using Spark on z/OS. Apache Spark on z/OS offers in-place, optimized abstraction and real-time analysis of structured and unstructured enterprise data.
databricks-industry-solutions / Ocr Phi Masking: Our joint Solution Accelerator with John Snow Labs automates the detection of sensitive information contained within unstructured data using NLP models for healthcare. Extracted data is stored within the Lakehouse, where teams can use the pre-trained models to easily remove, obfuscate or mask data for downstream analytics at massive scale.
akashdathan / Opencalais Tagging: Open Calais attaches intelligent metadata tags to your unstructured content, enabling powerful text analytics. The Open Calais natural language processing engine automatically analyzes and tags your input files in such a way that your consuming application can both easily pinpoint relevant data, and effectively leverage the invaluable intelligence and insights contained within the text.
bellevue-university / Dsc360: Student code for DSC360 - Data Mining: Text Analytics and Unstructured Data
microsoft / CELA OGC Intelligent Feedback: Demonstrates augmenting unstructured feedback text with the Azure Cognitive Services text analytics service. You can see a live demonstration of this project here: https://aka.ms/IntelligentFeedback.
ibm-cloud-architecture / Refarch Cognitive Analytics: Presents a reference implementation for a business application linking cognitive and analytics services to learn customers' behavior and assess their risk of churn. It is based on structured data, machine learning algorithms, data movement, and cognitive services for classifying unstructured data.
arkavb / Natural Language Processing Of Company Review Data: In this project we classify reviews given by the employers for the employee or the company as positive or negative. The dataset contains 67,529 rows and 15 columns, with information primarily about the company, position, date, pros and cons. This project can help a company analyze the ratio of employees who are satisfied or dissatisfied with their work environment, which can guide future improvements and give future employees a better experience. Using word clouds for positive and negative sentiment, a company can better understand which problems are more pressing than others and focus on them rather than on those that don’t need immediate attention. Rival companies can also use this to understand the competition's problems and avoid them. The positive views can likewise be used to understand why the competition may be prospering and can be incorporated into a company’s work culture. Sentiment analysis on reviews of any kind can help in understanding the deep-seated issues with a product or a workplace, and can also be used to build on the things that are going right. Steps: Data collection: the first step of sentiment analysis consists of collecting data from user-generated content in blogs, forums, and social networks; text analytics and natural language processing are used to extract and classify it. In our case it is collected from Kaggle. Text preparation: consists of cleaning the extracted data before analysis. We will be using techniques such as bag of words and lemmatization. Feature extraction: the extracted sentences of the reviews and opinions are examined. Use vectorization or word embeddings (count vectorizer, TF-IDF transformation, Word2Vec) to transform reviews into numerical representations. Machine learning classifier: fit the numerical representations of reviews to machine learning algorithms. We will be using Naïve Bayes, Logistic Regression, Random Forest and LSTM. Sentiment classification: subjective sentences are classified as positive, negative, good or bad. Presentation of output: the main objective of sentiment analysis is to convert unstructured text into meaningful
IBM / Cognos Analytics Using Unstructured Data: Using Discovery data in Cognos Analytics
bigconnect / Clavin: CLAVIN (*Cartographic Location And Vicinity INdexer*) is an open source software package for document geotagging and geoparsing that employs context-based geographic entity resolution. It combines a variety of open source tools with natural language processing techniques to extract location names from unstructured text documents and resolve them against gazetteer records. Importantly, CLAVIN does not simply "look up" location names; rather, it uses intelligent heuristics-based combinatorial optimization in an attempt to identify precisely which "Springfield" (for example) was intended by the author, based on the context of the document. CLAVIN also employs fuzzy search to handle incorrectly spelled location names, and it recognizes alternative names (e.g., "Ivory Coast" and "Côte d'Ivoire") as referring to the same geographic entity. By enriching text documents with structured geo data, CLAVIN enables hierarchical geospatial search and advanced geospatial analytics on unstructured data.
atifferoz / Data Warehouse And Business Intelligence Project On Tuberculosis WHO: Implemented a Data Warehouse and Business Intelligence project using different structured and unstructured data sources on incidence and mortality rates of tuberculosis for 197 countries around the world. The aim was to analyze the trends in tuberculosis mortality and incidence rates across countries. Data was web scraped, cleansed and loaded using ETL into a star schema, and an OLAP cube was deployed. Non-trivial BI queries were generated. First, the data was extracted, cleaned and transformed using the R language and then loaded into SSMS, where dimension tables were created using an Insert query task. Kimball's bottom-up approach was used to design the star schema in SSIS. Finally the cube was deployed in SSAS. Tableau was used for visual analytics to create dashboards. Technologies used: MS SQL, SQL Server Integration Services, SQL Server Analysis Services, Tableau. Video link of execution with explanation is available.
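
The sentiment-analysis steps listed in the arkavb entry above (text preparation, bag-of-words features, Naïve Bayes classification) can be roughly sketched as follows. This is a minimal illustration in plain Python, not that project's code: the `tokenize` helper, the `NaiveBayes` class, and the four toy reviews are all hypothetical, and simple lowercasing stands in for lemmatization.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split into word tokens (a crude stand-in for lemmatization)."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes over bag-of-words counts, with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # word frequencies per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in tokenize(text):
                if word in self.vocab:             # ignore unseen words
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data; the real project uses a Kaggle dataset.
reviews = [
    ("great team and supportive management", "positive"),
    ("good pay and friendly culture", "positive"),
    ("long hours and poor management", "negative"),
    ("low pay and stressful deadlines", "negative"),
]
model = NaiveBayes().fit([t for t, _ in reviews], [l for _, l in reviews])
print(model.predict("friendly team and good culture"))
```

In practice a library vectorizer and classifier would replace the hand-rolled counting; the sketch only shows how word counts per class turn into a class prediction.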