GHOST
Highlights: (1) Hierarchical gating integrates market sentiment extracted from news. (2) A Mamba with parameters shared across stocks models temporal dependencies efficiently. (3) Attention between stock tokens is computed on top of Stock-wise Tokenization. (4) Experiments show that GHOST outperforms existing stock prediction models in both the Chinese and U.S. markets.
News 🎇 This paper has been accepted by ESWA (Expert Systems with Applications).
Weijie Zhu, Liang Xie, Hanyu Fu, Jiangling Zhang,
GHOST: Sentiment-gated mamba and stock-wise tokenization attention for enhanced stock prediction,
Expert Systems with Applications,
Volume 301,
2026,
130474,
ISSN 0957-4174,
https://doi.org/10.1016/j.eswa.2025.130474.
Abstract
Stock trend prediction faces significant challenges due to market sentiment influences, long-term dependencies, and time-varying stock correlations. Specifically, (1) current sentiment analysis methods quantify market sentiment insufficiently and lack mechanisms that adapt dynamically to market fluctuations. (2) The quadratic complexity of Transformer-based models limits long-term stock prediction, and these models lack finance-specific temporal inductive biases. (3) Traditional temporal tokenization paradigms forcibly merge multi-stock features, weakening stock correlation modeling while dramatically increasing computational costs. To address these constraints, we present GHOST (Gated Hybrid Organization with Sentiment-guided Temporal Mamba and Stock-wise Tokenization Attention). In particular, we leverage GDELT (Global Database of Events, Language and Tone) sentiment quantification through a Hierarchical Sentiment-Gated Layer for dynamic fusion of affective features with trading data. Additionally, the Intra-Stock Mamba Selection Layer achieves complexity linear in the sequence length for long-term forecasting by using a dynamically parameterized shared state space model that provides specialized financial inductive biases. Moreover, our Stock-wise Tokenization Layer converts temporal tokens into stock tokens while preserving data integrity, enabling the Inter-Stock Attention Layer to capture stock correlation via attention between stock tokens and reducing the computational complexity of attention. Experimental results on real-world stock datasets demonstrate the effectiveness of our model.
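To make the tokenization idea in the abstract concrete, here is a minimal NumPy sketch. It is not the code from this repository: the function names, the toy sizes, and the projection-free attention are all assumptions chosen for illustration. It contrasts a temporal token (one per time step, all stocks merged) with a stock token (one per stock, full history kept) and runs plain scaled dot-product attention over the stock tokens.

```python
import numpy as np

def temporal_tokens(x):
    """(T, N, F) -> (T, N*F): one token per time step, all stocks merged."""
    T, N, F = x.shape
    return x.reshape(T, N * F)

def stock_tokens(x):
    """(T, N, F) -> (N, T*F): one token per stock, no values dropped."""
    T, N, F = x.shape
    return x.transpose(1, 0, 2).reshape(N, T * F)

def self_attention(tokens):
    """Scaled dot-product self-attention without learned projections (illustrative only)."""
    d = tokens.shape[-1]
    scores = tokens @ tokens.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ tokens, weights

# Toy sizes (hypothetical): 60 time steps, 8 stocks, 5 features per stock.
rng = np.random.default_rng(0)
T, N, F = 60, 8, 5
x = rng.standard_normal((T, N, F))

s_tok = stock_tokens(x)
out, w = self_attention(s_tok)

print(temporal_tokens(x).shape)  # (60, 40)
print(s_tok.shape)               # (8, 300)
print(w.shape)                   # (8, 8)
```

In this sketch the attention map is N x N, so each entry relates one stock to another and its size does not grow with the sequence length T. The actual layers in GHOST may build stock tokens from learned representations rather than a raw reshape.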
Figure 1: Architecture of GHOST: Sentiment-Gated Mamba with Stock-wise Tokenization for Multi-Stock Prediction.
Figure 2: GDELT-based Sentiment Quantification
Figure 3: Backtesting Pipeline
Usage
i. First, configure the environment according to the requirement.txt file
Ensure that the following libraries have aligned versions:
- causal-conv1d==1.1.0
- mamba-ssm==1.1.1
- torch==2.1.1+cu118
- torchvision==0.16.1+cu118
- torchaudio==2.1.1+cu118
ii. Download the stock data to .\dataset\stock_data and the market sentiment data to .\dataset
iii. Finally, run python run.py
💡💡💡 Don't forget to modify the number of input stocks and the number of features. Because mamba-ssm depends on a Linux environment, we used Ubuntu 22.04 with miniconda as the base environment. We have also placed the wheels of these two libraries at the links below, to make reproducing this work quicker and more convenient.
❄ [Update opensource wheel]: Baidu
❄ [Update opensource wheel]: OneDrive
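Before running anything, it can help to confirm that the installed packages match the pins from step i. The helper below is a small sketch using only the standard library. `check_versions` and the `PINNED` dictionary are hypothetical names, not part of this repository, and `PINNED` covers only a subset of the pins, so extend it from your own requirements file as needed.

```python
from importlib import metadata

# Subset of the pins listed in step i; add the remaining ones as needed.
PINNED = {
    "causal-conv1d": "1.1.0",
    "mamba-ssm": "1.1.1",
    "torch": "2.1.1+cu118",
    "torchvision": "0.16.1+cu118",
}

def check_versions(pinned, get_version=metadata.version):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches

if __name__ == "__main__":
    for name, (expected, installed) in check_versions(PINNED).items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```

An empty result means every listed package is installed at exactly the pinned version.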
Dataset
We provide stock data from two markets: CSI300 and NASDAQ100. After data preprocessing, 189 and 64 stocks remain respectively, along with corresponding market sentiment data: CHN_NEWS_sentiment.csv and USA_NEWS_sentiment.csv.
🔥 [Update opensource data]: Baidu
🔥 [Update opensource data]: OneDrive
The market sentiment data for China and the United States are time-step aligned with the CSI300 and NASDAQ100 stock data, respectively. You can use either link to download the data.
You also need to modify sentiment_path="your root" in .\data_provider\data_loader so that it points to the sentiment data file. Note that the feature engineering differs between the NASDAQ100 and CSI300 stock data, so you need to modify feature_columns in .\data_provider\data_loader accordingly.
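One way to keep the per-market settings in one place is sketched below. The file names, stock counts, and directory layout come from this README. The `MARKETS` table and the `market_settings` helper are hypothetical and do not exist in the repository. `feature_columns` is deliberately left out because the actual column names differ per market and are not listed here.

```python
from pathlib import Path

# File names and post-preprocessing stock counts as stated in the Dataset section.
MARKETS = {
    "CSI300": {"sentiment_file": "CHN_NEWS_sentiment.csv", "num_stocks": 189},
    "NASDAQ100": {"sentiment_file": "USA_NEWS_sentiment.csv", "num_stocks": 64},
}

def market_settings(market, dataset_root="dataset"):
    """Build the paths and stock count to use for one market (hypothetical helper)."""
    cfg = MARKETS[market]
    root = Path(dataset_root)
    return {
        "sentiment_path": root / cfg["sentiment_file"],  # value for sentiment_path
        "stock_data_dir": root / "stock_data",           # where the stock data was downloaded
        "num_stocks": cfg["num_stocks"],                 # number of input stocks to configure
    }

settings = market_settings("NASDAQ100")
print(settings["sentiment_path"].name)  # USA_NEWS_sentiment.csv
print(settings["num_stocks"])           # 64
```

The returned values still have to be copied into the data loader and the run configuration by hand, together with the market-specific feature_columns.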
🚀 Finally, the complete code for market sentiment data will also be open-sourced in the future.
Built by Weijie Zhu and Jiangling Zhang
