FinancialDatasets
SmoothNLP public financial text datasets, for NLP research only
Data Overview
Because GitHub storage is limited, only samples are hosted here. For the full datasets, please contact: contact@smoothnlp.com
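| Dataset | Fields | Sample size | Total size | Download link |
| ----- | ------ | ----- | ----- | ----- |
| Company business registration info | name, company name, company description, business registration, address, business registration ID, founding date, legal representative, registered capital, unified social credit code, website | 10,000 | 500,000 (listed companies and small to medium-sized enterprises) | Download |
| Financial news | title (news headline), content (news body), pub_ts (publication date) | 20,000 | 2.1 million | Download |
| Column articles | title (news headline), content (news body), pub_ts (publication date) | 10,000 | 580,000 | Download |
| Investment institution info | institution name, description, industry, size, funding round | 1,000 | 30,000 | Download |
| Investment events | event news, investor, funded company, funding event, funding round, amount | 2,000 | 70,000 | Download |
| 36Kr news | title (news headline), content (news body), url (web address) | 10,000 | 110,000 | Download |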
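As a minimal sketch of how the news fields listed above (`title`, `content`, `pub_ts`) might be read, the snippet below parses a small in-memory CSV. The CSV layout, the `YYYY-MM-DD` date format, the sample rows, and the `load_news` helper are all assumptions made for illustration; the README does not state the actual file format of the downloads.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample rows. Only the column names (title, content, pub_ts)
# come from the README; the CSV format and the date format are assumptions.
SAMPLE_CSV = """title,content,pub_ts
Example headline A,Example body text A,2019-01-02
Example headline B,Example body text B,2019-01-03
"""


def load_news(fileobj):
    """Read news rows into dicts, parsing pub_ts into a date object."""
    rows = []
    for row in csv.DictReader(fileobj):
        rows.append({
            "title": row["title"].strip(),
            "content": row["content"].strip(),
            # Assumed date format: YYYY-MM-DD
            "pub_ts": datetime.strptime(row["pub_ts"].strip(), "%Y-%m-%d").date(),
        })
    return rows


news = load_news(io.StringIO(SAMPLE_CSV))
print(len(news), news[0]["title"], news[0]["pub_ts"])
```

To try this on a downloaded sample, replace the `io.StringIO` object with an opened file and adjust the date format to match the real data.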
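Suggested Research Directions
- Embeddings (Word2Vec, BERT, etc.)
- Named entity recognition (NER)
- Unsupervised clustering: grouping competing products based on company descriptions
- Company industry classification
- Headline generation (text summarization)
- Sequence classification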
Data Samples
Investment institutions

Investment events

Company business registration info

Financial news

Column articles

36Kr news
