
StockPredictionsWithFinancialNews

A dataset of financial news is used to fine-tune BERT in order to extract investment opportunities.

Predicting Stock Market Trends with Financial News

An application of BERT to profitable trading.

View Demo: https://github.com/altogi/StockPredictionsWithFinancialNews/blob/main/Prediction_of_Stock_Market_Evolutions_with_Financial_News.ipynb
Report Bug: https://github.com/altogi/StockPredictionsWithFinancialNews/issues
Request Feature: https://github.com/altogi/StockPredictionsWithFinancialNews/issues

Table of Contents

  1. About The Project
  2. Getting Started
    • Prerequisites
    • Installation
  3. Usage
    1. Selecting a Dataset
    2. Creating a FinancialNewsPredictor Object
    3. Importing Financial Data
    4. Labeling Articles According to Price Data
    5. Defining the Text Classifier
    6. Training and Predicting
    7. Simulating a Model-Managed Portfolio
  4. Project Structure

About The Project

In the stock market, information is money. Receiving information first gives a trader a significant advantage over others, so it is no surprise that financial news has a strong influence on the market.

Given the recent rise in the availability of data and the emergence of revolutionary NLP techniques, there have been many attempts to predict market trends from financial news. Most existing solutions rely on sentiment analysis, assuming that a positive document sentiment is directly related to an increase in a security's price, and vice versa. Sentiment is extracted either with a predefined dictionary of tagged words or with deep learning techniques that rely on large datasets of labeled news. An advanced example of rule-based sentiment analysis is VADER, a model that is sensitive not only to the polarity of a document but also to its intensity.

With the advent of the Transformer, sentiment can now be extracted from a document through a much faster, non-sequential procedure, and pre-trained models such as BERT make it simpler than ever to adapt these models to a particular use case. One example is FinBERT, a text classifier that predicts sentiment with a fine-tuned version of BERT.

Nevertheless, sentiment acts only as an intermediate factor between the news and a stock's price. As a result, when the objective is a profitable trading strategy, a text classifier that predicts sentiment is less efficient than one that directly predicts price evolutions.

This work implements a text classifier based on BERT, fine-tuned on a dataset of financial news to predict whether a stock's price will rise or fall. Unlike FinBERT, it does not take sentiment into account. Instead, the dataset used to fine-tune BERT is labeled by applying a set of criteria to the price evolutions around the release date of each news article. Moreover, this work uses a much more extensive dataset than the one used for FinBERT, thus better capturing the uncertainties of the market. As a result, it is possible to profitably manage a portfolio relying exclusively on this text classifier, without complementing it with other trading strategies as many sentiment-based trading applications do.
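The actual labeling criteria are defined by the project and are not reproduced here. Purely as an illustration of what a price-based label (as opposed to a sentiment label) could look like, the sketch below marks an article as a "rise" when every relevant closing price beats the opening price on the release date by more than a threshold. The function name `label_article`, the `threshold` parameter, and the rule itself are hypothetical assumptions, not the project's criteria.

```python
from typing import Sequence


def label_article(open_base: float, closes: Sequence[float],
                  threshold: float = 0.0) -> int:
    """Hypothetical labeling rule (NOT the project's actual criteria).

    Returns 1 ("rise") if every closing price in `closes` is above the
    opening price on the release date by more than `threshold`, expressed
    as a fractional return; otherwise returns 0.
    """
    if open_base <= 0:
        raise ValueError("open_base must be positive")
    if not closes:
        raise ValueError("at least one closing price is required")
    returns = [(close - open_base) / open_base for close in closes]
    return int(all(r > threshold for r in returns))


# Closes one and two market days after release, relative to an open of 100.
print(label_article(100.0, [101.0, 102.5], threshold=0.005))  # 1
print(label_article(100.0, [101.0, 99.0]))                    # 0
```

Requiring every close to clear the threshold is a deliberately conservative choice for the sketch; a rule based only on the last close, or on a symmetric "fall" threshold, would be equally plausible.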

Getting Started

To get a local copy up and running, follow these steps.

Prerequisites

You will need the following before setting up this application locally:

  • Memory Requirements: This project was run and tested entirely on Google Colab, using its 12GB NVIDIA Tesla K80 GPU. Although the application can also run with lighter models that do not require a GPU (such as DistilBERT), a similar amount of memory is advisable, especially for large datasets.
  • yfinance: Install with the following command.
    pip3 install yfinance
    
  • TensorFlow 2: If not already installed, install with the following command.
    pip3 install tensorflow
    
  • ktrain: Install with the following command.
    pip3 install ktrain
    

Installation

  1. Clone the repo
    git clone https://github.com/altogi/StockPredictionsWithFinancialNews.git
    
  2. Install yfinance, TensorFlow 2, and ktrain as described in Prerequisites.
  3. Enjoy!

Usage

For a hands-on demonstration of the application, including a step-by-step analysis, see the demo notebook.

1. Selecting a Dataset

To use this application, a dataset of financial news is needed. The text classification model is trained and later validated on this dataset, which must contain the following columns:

  • An identifier column id, with a unique value for every news article.
  • A column content containing the article's text.
  • A column ticker with the stock ticker of the company mentioned in the article.
  • A column release_date with the date on which the article was published.
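Before handing a dataset to the application, it can help to check that it has these columns. The sketch below is a small, hypothetical pre-flight check (the names `REQUIRED_COLUMNS`, `missing_columns`, and `ids_are_unique` are illustrative, not part of the project); it takes plain iterables, so it works with a pandas DataFrame's `columns` or with any list of names.

```python
from typing import Iterable, List, Sequence

# Column names taken from the list of required features above.
REQUIRED_COLUMNS = ("id", "content", "ticker", "release_date")


def missing_columns(columns: Iterable[str]) -> List[str]:
    """Return the required columns that are absent, in a fixed order."""
    present = set(columns)
    return [name for name in REQUIRED_COLUMNS if name not in present]


def ids_are_unique(ids: Sequence[object]) -> bool:
    """True if no article id appears more than once."""
    return len(set(ids)) == len(ids)


print(missing_columns(["id", "content", "ticker", "release_date", "title"]))  # []
print(missing_columns(["id", "content"]))  # ['ticker', 'release_date']
print(ids_are_unique([1, 2, 3]), ids_are_unique([1, 2, 2]))  # True False
```

With a pandas DataFrame `df_news`, the same checks would be `missing_columns(df_news.columns)` and `df_news["id"].is_unique`.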

A great dataset for this application is us_equities_news_dataset.csv from Kaggle's Historical financial news archive. It is a news archive covering more than 800 American companies over the last 12 years, and it was used at every step of the development of this project.

2. Creating a FinancialNewsPredictor Object

This object carries out all of the steps of the application, so it is important to define it correctly. Besides the aforementioned dataset, it takes the following three parameters, which determine how the execution proceeds:

  1. base_directory: The root directory in which all of the files generated during the application's execution are stored. By default, it is the directory in which the application is located.
  2. selection: To tailor the model to a reduced number of companies, a selection can be applied instead of using all of the companies in the financial news dataset. Given as a list of strings, it can contain tickers, sectors, or industries, and the dataset is filtered accordingly.
  3. selection_mode: Set this to 'ticker', 'industry', or 'sector' to tell the application what the entries of selection refer to.

The code snippet below shows how one would create a FinancialNewsPredictor object focusing on companies from the technology or financial services sectors.

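import pandas as pd
# FinancialNewsPredictor must be imported from this repository's code.
df_news = pd.read_csv('us_equities_news_dataset.csv')  # Kaggle news dataset
f = FinancialNewsPredictor(df_news,
                           base_directory='./BERTforMarketPredictions',
                           selection=['Technology', 'Financial Services'],
                           selection_mode='sector')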
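The effect of selection and selection_mode amounts to keeping only the articles whose ticker, sector, or industry appears in the selection. The sketch below shows that filtering on toy records; the function `apply_selection` and the record layout are hypothetical, and it assumes sector and industry have already been attached to each article (the news dataset itself only has a ticker column, and how the application looks these up is not shown here).

```python
from typing import Dict, Iterable, List

VALID_MODES = ("ticker", "industry", "sector")


def apply_selection(records: Iterable[Dict[str, str]], selection: List[str],
                    selection_mode: str) -> List[Dict[str, str]]:
    """Keep only the records whose `selection_mode` field is in `selection`.

    Assumes each record already carries 'ticker', 'sector' and 'industry'
    fields.
    """
    if selection_mode not in VALID_MODES:
        raise ValueError(f"selection_mode must be one of {VALID_MODES}")
    wanted = set(selection)
    return [r for r in records if r.get(selection_mode) in wanted]


# Toy records for illustration only.
news = [
    {"id": "1", "ticker": "AAPL", "sector": "Technology", "industry": "Consumer Electronics"},
    {"id": "2", "ticker": "JPM", "sector": "Financial Services", "industry": "Banks"},
    {"id": "3", "ticker": "XOM", "sector": "Energy", "industry": "Oil & Gas"},
]
kept = apply_selection(news, ["Technology", "Financial Services"], "sector")
print([r["id"] for r in kept])  # ['1', '2']
```

On a pandas DataFrame with those columns, the equivalent filter is `df[df[selection_mode].isin(selection)]`.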

3. Importing Financial Data

Next, price data must be imported for the dates and stock tickers associated with each news article. This is done through the method import_financial_data() of FinancialNewsPredictor, which defines an instance of the class FinancialDataImporter. This class simply uses the Yahoo Finance API to retrieve the prices.

The only parameter of import_financial_data() is deltas, a list of integers specifying which market days after each article's release date are considered relevant. The closing price on each of these relevant days is extracted for further analysis.
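Because deltas count market days rather than calendar days, weekends and holidays have to be skipped. The sketch below shows one way to do that, under the assumption that the base day is the first trading day on or after the release date; the function `relevant_market_days` and its sorted `trading_days` input are hypothetical, and in practice the trading calendar would come from the downloaded price history.

```python
from bisect import bisect_left
from datetime import date
from typing import Dict, Optional, Sequence


def relevant_market_days(trading_days: Sequence[date], release: date,
                         deltas: Sequence[int]) -> Dict[int, Optional[date]]:
    """Map each delta to the trading day `delta` sessions after the base day.

    Assumption: the base day is the first trading day on or after `release`,
    so a weekend release counts from the next trading day. `trading_days`
    must be sorted in ascending order. Deltas that fall past the end of the
    available calendar map to None.
    """
    base = bisect_left(trading_days, release)
    result: Dict[int, Optional[date]] = {}
    for delta in deltas:
        i = base + delta
        result[delta] = trading_days[i] if 0 <= i < len(trading_days) else None
    return result


# Toy calendar: Mon 6 to Fri 10 January 2020, then Mon 13 January 2020.
days = [date(2020, 1, d) for d in (6, 7, 8, 9, 10, 13)]
print(relevant_market_days(days, date(2020, 1, 9), [1, 2]))
# {1: datetime.date(2020, 1, 10), 2: datetime.date(2020, 1, 13)}
```

Here an article released on Thursday 9 January gets Friday as its +1 day and the following Monday as its +2 day, since the weekend is not a market day.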

The method import_financial_data() also stores the resulting market data in a table market_data.csv, inside a folder whose name is determined by the selected deltas. This way, if the same deltas are used on a different occasion, the stored table is reused instead of performing the same computations all over again.
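This load-or-build caching can be sketched as follows. The folder naming mirrors the example path given for deltas = [1, 2] (./BERTforMarketPredictions/deltas=1,2/); the helper names `market_data_path` and `load_or_build`, and passing the build, load, and save steps in as functions, are assumptions for illustration rather than the project's code.

```python
import os
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def market_data_path(base_directory: str, deltas: Sequence[int]) -> str:
    """Build <base_directory>/deltas=<d1>,<d2>,.../market_data.csv."""
    folder = "deltas=" + ",".join(str(d) for d in deltas)
    return os.path.join(base_directory, folder, "market_data.csv")


def load_or_build(path: str, build: Callable[[], T],
                  load: Callable[[str], T],
                  save: Callable[[T, str], None]) -> T:
    """Reuse the cached table at `path` if present; otherwise build and store it."""
    if os.path.exists(path):
        return load(path)
    data = build()
    os.makedirs(os.path.dirname(path), exist_ok=True)
    save(data, path)
    return data


print(market_data_path("./BERTforMarketPredictions", [1, 2]))
# ./BERTforMarketPredictions/deltas=1,2/market_data.csv (on POSIX systems)
```

With pandas, `load` could be `pd.read_csv` and `save` could be `lambda df, p: df.to_csv(p, index=False)`. Keying the folder on the deltas means that a run with different deltas never reads a stale table built for another configuration.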

For example, with deltas = [1, 2], one would execute the following lines of code:

deltas = [1, 2]
f.import_financial_data(deltas)

Given the base_directory specified earlier, this produces a table market_data.csv in ./BERTforMarketPredictions/deltas=1,2/ with the following columns:

  • id
  • ticker
  • sector
  • industry
  • date_base (associated article's release date)
  • date_+1
  • date_+2
  • open_base (opening pr