[![LinkedIn][linkedin-shield]][linkedin-url]

<p align="center"> <h1 align="center">Predicting Stock Market Trends with Financial News</h1> <p align="center"> An application of BERT to profitable trading. <br /> <a href="https://github.com/altogi/StockPredictionsWithFinancialNews/blob/main/Prediction_of_Stock_Market_Evolutions_with_Financial_News.ipynb">View Demo</a> · <a href="https://github.com/altogi/StockPredictionsWithFinancialNews/issues">Report Bug</a> · <a href="https://github.com/altogi/StockPredictionsWithFinancialNews/issues">Request Feature</a> </p> </p>  <details open="open"> <summary>Table of Contents</summary> <ol> <li> <a href="#about-the-project">About The Project</a> </li> <li> <a href="#getting-started">Getting Started</a> <ul> <li><a href="#prerequisites">Prerequisites</a></li> <li><a href="#installation">Installation</a></li> </ul> </li> <li><a href="#usage">Usage</a> <ul> <li><a href="#1-selecting-a-dataset">1. Selecting a Dataset</a></li> <li><a href="#2-creating-a-financialnewspredictor-object">2. Creating a `FinancialNewsPredictor` Object</a></li> <li><a href="#3-importing-financial-data">3. Importing Financial Data</a></li> <li><a href="#4-labeling-articles-according-to-price-data">4. Labeling Articles According to Price Data</a></li> <li><a href="#5-defining-the-text-classifier">5. Defining the Text Classifier</a></li> <li><a href="#6-training-and-predicting">6. Training and Predicting</a></li> <li><a href="#7-simulating-a-model-managed-portfolio">7. Simulating a Model-Managed Portfolio</a></li> </ul> </li> <li><a href="#project-structure">Project Structure</a></li> </ol> </details>

About The Project

In the stock market, information is money. Receiving information first gives a trader a significant advantage over others, so it makes sense that financial news has a strong influence on the market.

Given the recent growth in available data and the emergence of revolutionary NLP techniques, there have been many attempts to predict market trends from financial news. Most existing solutions rely on sentiment analysis, assuming that a positive document sentiment is directly related to an increase in a security's price, and vice versa. Sentiment is extracted either with a predefined dictionary of tagged words or with deep learning techniques trained on large datasets of labeled news. An advanced example of rule-based sentiment analysis is VADER, a model that is sensitive not only to a document's polarity (positive or negative) but also to its intensity.

With the arrival of the Transformer, which processes a document's tokens in parallel rather than sequentially, sentiment can now be extracted much faster, and pre-trained models such as BERT make it simpler than ever to adapt these models to a specific use case. One example is FinBERT, a version of BERT fine-tuned to classify the sentiment of financial text.

However, sentiment is at best an intermediate factor between the news and the stock's price. When the objective is a profitable trading strategy, it is therefore more efficient to predict price movements directly than to predict sentiment.

This work implements a text classifier based on BERT, fine-tuned on a dataset of financial news to predict whether a stock's price will rise or fall. Unlike FinBERT, it does not take sentiment into account. Instead, the dataset used for fine-tuning is labeled by applying a set of criteria to each stock's price movements around the release date of every news article. Moreover, this work uses a much larger dataset than the one used for FinBERT, and therefore captures more of the market's uncertainty. As a result, it is possible to profitably manage a portfolio relying exclusively on this text classifier, without complementing it with other trading strategies as many sentiment-based trading applications do.
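As a rough illustration of price-based labeling, and not the exact criteria this project applies, the sketch below labels an article as a rise (`1`) or not (`0`) by comparing the mean closing price over the relevant market days with the closing price on the release date. The function `label_article` and its `threshold` parameter are hypothetical.

```python
def label_article(base_close, later_closes, threshold=0.0):
    """Hypothetical price-based label for one news article.

    base_close: closing price on (or right after) the release date
    later_closes: closing prices on the relevant market days after release
    Returns 1 (rise) if the mean later close exceeds base_close by more than
    `threshold` (a relative change, e.g. 0.01 for 1%), else 0.
    """
    if base_close <= 0 or not later_closes:
        raise ValueError("need a positive base price and at least one later close")
    mean_later = sum(later_closes) / len(later_closes)
    relative_change = (mean_later - base_close) / base_close
    return 1 if relative_change > threshold else 0


if __name__ == "__main__":
    print(label_article(100.0, [101.0, 103.0]))  # mean 102.0 -> +2% -> 1
    print(label_article(100.0, [99.0, 98.0]))    # mean 98.5 -> -1.5% -> 0
```

A non-zero `threshold` would let small, noisy moves count as "no rise"; whether and how to do that is a design choice this sketch leaves open.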

Getting Started

To get a local copy up and running, follow these simple steps.

Prerequisites

You will need the following before setting up this application locally:

Memory Requirements: This project was run and tested entirely on Google Colab, using its 12GB NVIDIA Tesla K80 GPU. Although the application can run with lighter models that do not require a GPU (such as DistilBERT), a similar amount of RAM is recommended, especially for large datasets.
yfinance: Install with the following command.
```
pip3 install yfinance
```
TensorFlow 2: Install it if it is not already available.
```
pip3 install tensorflow
```
ktrain: Install with the following command.
```
pip3 install ktrain
```

Installation

Clone the repo

```
git clone https://github.com/altogi/StockPredictionsWithFinancialNews.git
```

Install yfinance and ktrain (and TensorFlow 2, if needed) as described under Prerequisites.
Enjoy!

Usage

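For a hands-on demo of the application, including a step-by-step analysis, see the [demo notebook](https://github.com/altogi/StockPredictionsWithFinancialNews/blob/main/Prediction_of_Stock_Market_Evolutions_with_Financial_News.ipynb).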

1. Selecting a Dataset

To use this application, you need a dataset of financial news. This is the dataset on which the text classification model is trained and later validated. It must contain the following columns:

- An identifier column `id`, identifying each news article.
- A column `content` containing the article's text.
- A column `ticker` with the stock ticker of the company mentioned in the article.
- A column `release_date` with the date on which the article was published.
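Before building the predictor, it can help to check that a CSV file exposes these four columns. The helper `missing_columns` below is a minimal, hypothetical sketch using only Python's standard `csv` module; it reads just the header row and reports any required column that is absent.

```python
import csv

# Columns this README lists as required for the news dataset
REQUIRED_COLUMNS = {"id", "content", "ticker", "release_date"}


def missing_columns(lines):
    """Return the required columns absent from a CSV header, sorted.

    lines: any iterable of CSV text lines, such as an open file object.
    Extra columns (e.g. an article title) are allowed and ignored.
    """
    reader = csv.DictReader(lines)
    header = set(reader.fieldnames or [])  # fieldnames is None for empty input
    return sorted(REQUIRED_COLUMNS - header)
```

Passing an open file handle (opened with `newline=""`) to `missing_columns` should return an empty list when the dataset has all four columns.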

A great dataset for this application is `us_equities_news_dataset.csv` from Kaggle's Historical financial news archive. It is a news archive covering more than 800 American companies over the last 12 years, and it was used in every step of the development of this project.

2. Creating a `FinancialNewsPredictor` Object

This object carries out every step of the application, so it is important to define it correctly. Besides the dataset described above, it takes the following three parameters, which control how the execution proceeds:

- `base_directory`: The root directory in which all files generated during execution are stored. By default, it is the directory in which the application is located.
- `selection`: To tailor the model to a smaller set of companies, a selection can be applied instead of using every company in the financial news dataset. This is a list of strings naming tickers, sectors, or industries, which is used to filter the dataset.
- `selection_mode`: Either `'ticker'`, `'industry'`, or `'sector'`, telling the application what the entries in `selection` refer to.
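The filtering that `selection` and `selection_mode` describe can be sketched as follows. This is a hypothetical stand-in, not the class's actual code: it assumes each article is a dict that already carries `ticker`, `sector`, and `industry` fields, and treats `selection=None` as "keep every company".

```python
VALID_MODES = ("ticker", "industry", "sector")


def apply_selection(rows, selection=None, selection_mode="ticker"):
    """Keep only the rows whose `selection_mode` field appears in `selection`.

    rows: iterable of dicts; each is assumed to contain the field named by
    selection_mode. selection=None (an assumption) disables filtering.
    """
    if selection_mode not in VALID_MODES:
        raise ValueError(f"selection_mode must be one of {VALID_MODES}")
    if selection is None:
        return list(rows)
    wanted = set(selection)
    return [row for row in rows if row.get(selection_mode) in wanted]


if __name__ == "__main__":
    articles = [
        {"id": 1, "ticker": "AAPL", "sector": "Technology"},
        {"id": 2, "ticker": "JPM", "sector": "Financial Services"},
        {"id": 3, "ticker": "XOM", "sector": "Energy"},
    ]
    kept = apply_selection(articles, ["Technology", "Financial Services"], "sector")
    print([a["ticker"] for a in kept])  # ['AAPL', 'JPM']
```

Validating `selection_mode` up front means a typo such as `'sectors'` fails loudly instead of silently filtering out every article.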

The code snippet below shows how one would create a FinancialNewsPredictor object focusing on companies from the technology or financial services sectors.

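```python
import pandas as pd

df_news = pd.read_csv('us_equities_news_dataset.csv')  # dataset from step 1
f = FinancialNewsPredictor(df_news,  # class provided by this repository
                           base_directory='./BERTforMarketPredictions',
                           selection=['Technology', 'Financial Services'],
                           selection_mode='sector')
```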

3. Importing Financial Data

Next, price data must be imported for the dates and stock tickers associated with each news article. The method `import_financial_data()` of `FinancialNewsPredictor` does this by creating an instance of the class `FinancialDataImporter`, which simply retrieves prices from the Yahoo Finance API.

The only parameter of `import_financial_data()` is `deltas`, a list of integers specifying which market days after each article's release date are considered relevant. The closing price on each of these days is extracted for further analysis in the application.
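One way to locate the relevant days is to index into the list of dates on which prices are available, so that weekends and market holidays are skipped automatically. The sketch below is a hypothetical illustration, not `FinancialDataImporter`'s actual code; it assumes that the base day is the first trading day on or after the release date and that prices have already been downloaded into two aligned lists.

```python
from bisect import bisect_left
from datetime import date


def closes_after_release(release_date, trading_days, closes, deltas):
    """Map each delta to (date, close) for the trading day `delta` market days
    after the base day.

    trading_days: sorted list of dates on which the market was open
    closes: closing prices aligned with trading_days
    The base day is assumed to be the first trading day on or after release_date.
    Deltas that fall beyond the available data are left out.
    """
    base = bisect_left(trading_days, release_date)
    result = {}
    for delta in deltas:
        i = base + delta
        if i < len(trading_days):
            result[delta] = (trading_days[i], closes[i])
    return result


if __name__ == "__main__":
    days = [date(2020, 1, 2), date(2020, 1, 3), date(2020, 1, 6), date(2020, 1, 7)]
    prices = [10.0, 11.0, 12.0, 13.0]
    # Released on Saturday 2020-01-04, so the base day is Monday 2020-01-06;
    # delta 2 runs past the available data and is omitted
    print(closes_after_release(date(2020, 1, 4), days, prices, [1, 2]))
```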

This method also stores the resulting market data in a table `market_data.csv`, inside a folder whose name is derived from the selected deltas. If the same deltas are used again later, the stored table is reused instead of repeating the same computations.
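The reuse logic can be sketched with the standard library alone. This is a hypothetical version, not the project's code: the folder name follows the `deltas=1,2` pattern used in this README, and `compute_rows` stands in for the actual price import, returning one dict per article.

```python
import csv
import tempfile
from pathlib import Path


def load_or_compute_market_data(base_directory, deltas, compute_rows):
    """Return market data rows, reusing <base_directory>/deltas=.../market_data.csv if present.

    compute_rows: hypothetical callable taking deltas and returning a list of dicts.
    Rows read back from the cache contain strings, as csv.DictReader yields them.
    """
    folder = Path(base_directory) / ("deltas=" + ",".join(str(d) for d in deltas))
    path = folder / "market_data.csv"
    if path.exists():  # same deltas seen before: skip the expensive import
        with path.open(newline="") as fh:
            return list(csv.DictReader(fh))
    rows = compute_rows(deltas)
    if not rows:  # nothing to cache
        return rows
    folder.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
    return rows


def _fake_import(deltas):
    """Stand-in for a real price import (hypothetical data)."""
    return [{"id": "1", "ticker": "AAPL", "date_base": "2020-01-02"}]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        first = load_or_compute_market_data(tmp, [1, 2], _fake_import)
        second = load_or_compute_market_data(tmp, [1, 2], _fake_import)  # read from cache
        print(first == second)  # True
```

Keying the cache on the deltas alone assumes the news dataset and selection stay the same between runs; a different selection with the same deltas would reuse a stale table.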

For example, with deltas = [1, 2], one would execute the following line of code:

```python
deltas = [1, 2]
f.import_financial_data(deltas)
```

Given the `base_directory` specified earlier, this produces a table `market_data.csv` in `./BERTforMarketPredictions/deltas=1,2/` with the following columns:

- `id`
- `ticker`
- `sector`
- `industry`
- `date_base` (the associated article's release date)
- `date_+1`
- `date_+2`
- `open_base` (opening pr
