Xflow
A Real-Time Tweet Streaming Pipeline (Using Flume, Kafka & Spark Streaming) with a Deep Learning Sentiment Analysis Model for instant scoring.
- The Real-Time Tweet Streaming Pipeline is built using Apache Flume, Apache Kafka & Spark Streaming.
- The LSTM-based Sentiment Analysis Model is built using Keras with a TensorFlow backend and uses word embeddings.
- The model is exposed as a RESTful service, so any client that can send an HTTP POST request can obtain predictions.
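Because the model works on word embeddings, each tweet has to become a fixed-length sequence of integer word indices before it reaches the embedding layer. The repo's own preprocessing is not described here, so the following is only a standard-library sketch of the usual approach under stated assumptions: a simple regex tokenizer, a frequency-ranked vocabulary, index 0 reserved for padding and index 1 for unknown words, and front padding/truncation to `max_len` in the style of the defaults of Keras's `pad_sequences`. The names `tokenize`, `build_vocab` and `encode` are hypothetical.

```python
import re
from collections import Counter

PAD, UNK = 0, 1  # reserved indices (an assumption, not taken from the repo)


def tokenize(text):
    """Lowercase and keep runs of letters, digits, @, # and apostrophes."""
    return re.findall(r"[a-z0-9@#']+", text.lower())


def build_vocab(texts, max_words=20000):
    """Map the most frequent tokens to indices starting at 2."""
    counts = Counter(tok for text in texts for tok in tokenize(text))
    return {tok: i + 2 for i, (tok, _) in enumerate(counts.most_common(max_words))}


def encode(text, vocab, max_len=30):
    """Turn a tweet into a fixed-length list of word indices.

    Mirroring Keras's pad_sequences defaults, short sequences are padded at
    the front and long ones keep only their last max_len tokens.
    """
    ids = [vocab.get(tok, UNK) for tok in tokenize(text)]
    if len(ids) > max_len:
        ids = ids[-max_len:]
    return [PAD] * (max_len - len(ids)) + ids
```

For example, after `vocab = build_vocab(["good day", "bad day"])`, calling `encode("Good day!", vocab, max_len=5)` gives `[0, 0, 0, 3, 2]`. Whatever vocabulary is used at training time has to be saved alongside the model so the RESTful service encodes incoming tweets the same way.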
Usage
Clone this repo and make sure Maven is installed, since it is used to build the project. Then, from the project's root directory, run:
mvn clean install
- This trains the LSTM-based Deep Learning Sentiment Analysis Model and exposes it as a RESTful service.
- The training dataset is 'Sentiment Analysis Dataset.csv', downloaded from https://bit.ly/1TVSjsF
- The RESTful service is hosted at http://localhost:5003/
The sentiment prediction for any tweet or sentence can be obtained by sending a POST request like the following:
curl --header "Content-Type: application/json" --request POST --data '{"data":"YOUR TWEET HERE"}' http://localhost:5003/
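The same request can be sent from Python using only the standard library, which is handy for scripting or testing the service. This is a sketch that assumes the service accepts exactly the JSON payload shown in the curl command; because the response format is not documented here, `predict` returns the raw response body instead of parsing it. The function names are hypothetical.

```python
import json
import urllib.request

SCORER_URL = "http://localhost:5003/"  # address given in this README


def build_request(text, url=SCORER_URL):
    """Build the same POST request as the curl command: {"data": <text>} as JSON."""
    body = json.dumps({"data": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(text, url=SCORER_URL, timeout=5.0):
    """Send one tweet to the scoring service and return the raw response body.

    The response format is not documented, so it is returned as text.
    """
    with urllib.request.urlopen(build_request(text, url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

With the service running, `print(predict("I love this phone!"))` prints whatever the service returns for that sentence.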
For real-time sentiment analysis of tweets, the streaming data pipeline is built as follows:
- Download Confluent Open Source from https://www.confluent.io/download/ (tested on v5.0). Extract it and, inside the extracted directory, run the following command:
bin/confluent start
- This starts Kafka, Schema Registry, ZooKeeper and the related Confluent services.
- Download and extract the Flume binary from https://flume.apache.org/download.html
- Clone Cloudera's cdh-twitter-example repo from https://github.com/cloudera/cdh-twitter-example
- The flume-sources directory contains a Maven project with a custom Cloudera Flume source designed to connect to the Twitter Streaming API and ingest tweets in raw JSON format.
- To build the flume-sources JAR, run the following from the root of that repository:
$ cd flume-sources
$ mvn package
$ cd ..
- Add the JAR to the Flume classpath by copying flume-sources-1.0-SNAPSHOT.jar to apache-flume-latest-version-bin/plugins.d/twitter-streaming/lib/
- Tweets are ingested in raw JSON format and pushed to a Kafka sink. The Flume configuration is set in FlumeConfig.conf, and the agent is named TwitterAgent.
- To start Flume, run:
bin/flume-ng agent --conf conf --conf-file FlumeConfig.conf --name TwitterAgent -Dflume.root.logger=INFO,console
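The flume-ng command above loads FlumeConfig.conf, whose contents are not reproduced in this README. As a rough sketch of what a Twitter-source-to-Kafka-sink agent typically looks like, the helper below renders such a file. The property names are taken from the flume.conf in cdh-twitter-example and from Flume's built-in Kafka sink (Flume 1.7+); the source, channel and sink names, the topic `tweets`, the broker `localhost:9092`, the keywords and the environment variable names are all assumptions, so check them against your versions and against the topic Score.scala subscribes to.

```python
import os


def flume_twitter_kafka_conf(topic="tweets", brokers="localhost:9092",
                             keywords=("python", "bigdata"), env=None):
    """Render a Flume agent config: Twitter source -> memory channel -> Kafka sink.

    Twitter credentials come from environment variables (hypothetical names)
    so they never have to be committed alongside the config.
    """
    env = os.environ if env is None else env
    a = "TwitterAgent"  # must match --name TwitterAgent in the flume-ng command
    creds = {
        "consumerKey": "TWITTER_CONSUMER_KEY",
        "consumerSecret": "TWITTER_CONSUMER_SECRET",
        "accessToken": "TWITTER_ACCESS_TOKEN",
        "accessTokenSecret": "TWITTER_ACCESS_TOKEN_SECRET",
    }
    props = [
        (f"{a}.sources", "Twitter"),
        (f"{a}.channels", "MemChannel"),
        (f"{a}.sinks", "KafkaSink"),
        # Cloudera's custom source, built from cdh-twitter-example/flume-sources
        (f"{a}.sources.Twitter.type", "com.cloudera.flume.source.TwitterSource"),
        (f"{a}.sources.Twitter.channels", "MemChannel"),
    ]
    props += [(f"{a}.sources.Twitter.{key}", env.get(var, "<required>"))
              for key, var in creds.items()]
    props += [
        (f"{a}.sources.Twitter.keywords", ", ".join(keywords)),
        # In-memory buffer between the source and the sink
        (f"{a}.channels.MemChannel.type", "memory"),
        (f"{a}.channels.MemChannel.capacity", "10000"),
        (f"{a}.channels.MemChannel.transactionCapacity", "100"),
        # Flume's built-in Kafka sink (property names from Flume 1.7 onward)
        (f"{a}.sinks.KafkaSink.type", "org.apache.flume.sink.kafka.KafkaSink"),
        (f"{a}.sinks.KafkaSink.kafka.topic", topic),
        (f"{a}.sinks.KafkaSink.kafka.bootstrap.servers", brokers),
        (f"{a}.sinks.KafkaSink.channel", "MemChannel"),
    ]
    return "\n".join(f"{key} = {value}" for key, value in props) + "\n"
```

Writing the result with `open("FlumeConfig.conf", "w").write(flume_twitter_kafka_conf())` keeps the credentials in the environment rather than in version control. Note that Flume sources take `channels` (plural) while sinks take a single `channel`.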
- Run Score.scala.
- Spark Streaming is used to consume tweets from the Kafka topic. The sentiment prediction for each tweet is obtained by sending a POST request to the RESTful service, as described above.
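Score.scala itself is Scala; the Python sketch below mirrors only the per-record logic the pipeline is described as performing, not its Spark or Kafka wiring. It assumes each Kafka message is one raw tweet JSON object in Twitter's v1.1 status format, where the text sits in `text` (or in `extended_tweet.full_text` for longer tweets). `score_fn` is a hypothetical stand-in for the POST to the RESTful service, injected so the logic can be exercised without Spark, Kafka or the service.

```python
import json


def tweet_text(raw):
    """Extract the tweet text from one raw Twitter JSON record, or None."""
    try:
        status = json.loads(raw)
    except json.JSONDecodeError:
        return None  # skip malformed messages instead of failing the batch
    if not isinstance(status, dict):
        return None
    extended = status.get("extended_tweet") or {}
    # Prefer the untruncated text when the record carries it
    return extended.get("full_text") or status.get("text")


def score_batch(raw_messages, score_fn):
    """Score every usable tweet in one micro-batch.

    Records without text (for example delete notices) are skipped.
    Returns a list of (text, score) pairs.
    """
    results = []
    for raw in raw_messages:
        text = tweet_text(raw)
        if text:
            results.append((text, score_fn(text)))
    return results
```

Skipping non-tweet records matters because the raw stream can carry control messages (such as delete notices) that have no text to score.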
