FinEval
FinEval is a financial-domain evaluation benchmark grounded in quantitative fundamental methods. Built through long-term research, summarization, and rigorous manual screening, it comprises more than 26,000 questions of diverse types that closely mirror real-world application scenarios.
🌐Website | 🤗Hugging Face | 📃Paper
</div>

Welcome to FinEval
Although Large Language Models (LLMs) perform very well in general domains, their security and their ability to handle complex tasks in the highly specialized, risk-sensitive financial industry remain uncertain. This paper introduces FinEval, a pioneering Chinese benchmark dataset built to comprehensively evaluate the professional capabilities and security of LLMs in the financial domain, providing a solid foundation for addressing this challenge.
FinEval is grounded in quantitative fundamental methods and was developed through long-term research, summarization, and rigorous manual screening. It comprises more than 26,000 questions of diverse types that closely mirror real-world application scenarios, including multiple-choice questions, subjective and objective short-answer questions, reasoning and planning tasks, and retrieval-based question answering. The questions cover six areas: financial academic knowledge, financial industry knowledge, financial security knowledge, financial agents, financial multimodal capabilities, and financial rigor. Together they are intended to examine the overall application capabilities of large models in the financial domain. For its textual capability tests, FinEval combines objective and subjective scoring: Accuracy, ROUGE-L, and detailed expert evaluation criteria. Models are evaluated under four settings: zero-shot, five-shot, zero-shot Chain-of-Thought (CoT), and five-shot CoT.
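The zero-shot, five-shot, and CoT settings differ mainly in how the prompt is assembled. The following is a minimal Python sketch of that idea, not FinEval's actual prompt template: the `ChoiceItem` class, the `build_prompt` helper, and the English cue strings (`Answer:`, `Reasoning:`, `Let's think step by step.`) are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Mapping, Sequence


@dataclass(frozen=True)
class ChoiceItem:
    """A multiple-choice item (hypothetical structure, for illustration only)."""
    question: str
    options: Mapping[str, str]  # e.g. {"A": "...", "B": "..."}
    answer: str = ""            # gold letter; left empty for the item to be answered
    explanation: str = ""       # rationale, shown only in CoT demonstrations


def format_item(item: ChoiceItem, with_answer: bool, cot: bool) -> str:
    """Render one item, either as a solved demonstration or as the open target."""
    lines = [f"Question: {item.question}"]
    lines += [f"{letter}. {text}" for letter, text in item.options.items()]
    if with_answer:
        if cot and item.explanation:
            lines.append(f"Reasoning: {item.explanation}")
        lines.append(f"Answer: {item.answer}")
    else:
        # Assumed cues: ask for reasoning first under CoT, else for the answer directly.
        lines.append("Let's think step by step." if cot else "Answer:")
    return "\n".join(lines)


def build_prompt(target: ChoiceItem, demos: Sequence[ChoiceItem] = (), cot: bool = False) -> str:
    """Zero-shot when demos is empty; five-shot when five demonstrations are passed."""
    parts = [format_item(demo, with_answer=True, cot=cot) for demo in demos]
    parts.append(format_item(target, with_answer=False, cot=cot))
    return "\n\n".join(parts)
```

Under these assumptions, `build_prompt(item)` gives the zero-shot prompt, `build_prompt(item, demos)` with five solved items gives the five-shot prompt, and passing `cot=True` to either call gives the corresponding CoT variant.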
Evaluating state-of-the-art LLMs on FinEval shows that, on the text tasks, Claude 3.5-Sonnet achieved the highest average score across all financial domain tasks: 72.9 under the zero-shot setting. Even the best model therefore leaves substantial room for improvement in financial domain knowledge. On the multimodal tasks, Qwen-VL-max performed best among all evaluated models, with an average score of 76.3 and the top score in ten sub-scenarios, which suggests stable and robust capabilities across multimodal financial business scenarios of varying depth. Our work provides a more comprehensive benchmark for financial knowledge assessment, drawing on common images from financial business scenarios, simulated examination data, and some open-ended questions to cover a broad scope of LLM evaluation.
Content
FinEval
- Financial Academic Knowledge
- Financial Industry Knowledge
- Financial Security Knowledge
- Financial Agent
- Financial Multimodal Capabilities
- Financial Rigor Testing
- Text Performance Leaderboard
- Multimodal Performance Leaderboard
Usage
- Installation
- Evaluation
- Dataset Preparation
- Supporting New Datasets and Models
- How to Submit
- Citation
FinEval
Financial Academic Knowledge
Financial Academic Knowledge is a collection of high-quality multiple-choice questions in four categories: Finance, Economy, Accounting, and Certificate. It consists of 4,661 questions covering 34 academic subjects. FinEval aims to provide a comprehensive benchmark for assessing knowledge in financial academia. The questions are drawn from simulated exam data and span a wide range of topics for evaluating large language models.
<div align="center"> <img src="docs/en/_static/image/subjects.png" width="700px" height="340px"/> <br /> <br /></div>

Here are some examples of data for Financial Academic Knowledge:
Example of Insurance in Finance:
问题:保险合同辅助人不包括____。
Question: Auxiliary parties to an insurance contract do not include ____.
A.保险代理人 B.受益人 C.保险经纪人 D.保险公估人
A. Insurance agent B. Beneficiary C. Insurance broker D. Insurance appraiser
答案:B
Answer: B
Example of International Economics in Economy:
问题:从中间产品市场不完全性角度研究跨国公司对外投资的理论是____。
Question: The theory that studies the foreign investment of multinational corporations from the perspective of incomplete markets for intermediate goods is ______.
A.垄断优势理论 B.内部化理论 C.区位优势理论 D.边际产业转移理论
A. Monopolistic Advantage Theory B. Internalization Theory C. Location Advantage Theory D. Marginal Industry Transfer Theory
答案:B
Answer: B
Example of Auditing in Accounting:
问题:下列不属于公众利益实体的是____。
Question: Among the following, the one that is not a public interest entity is ____.
A.保险公司 B.全国大型医药连锁店 C.上市公司 D.个体工商户
A. Insurance company B. National chain of large pharmacies C. Listed company D. Individual business owner
答案:D
Answer: D
Example of China Actuary in Certificate:
问题:张先生辞去月薪1000元的工作,取出自有存款100000元(月息1%),办一家独资企业,如果不考虑商业风险,则张先生自办企业按月计算的机会成本是____元。
Question: Mr. Zhang resigns from a job with a monthly salary of 1,000 yuan and withdraws 100,000 yuan from his personal savings (with a monthly interest rate of 1%) to start a sole proprietorship. If we don't consider business risks, the opportunity cost of Mr. Zhang starting his own business, calculated on a monthly basis, is ____ yuan.
A.2000 B.10000 C.1000 D.101000
答案:A
Answer: A
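Multiple-choice items like the four above can be scored by accuracy, one of the objective metrics named earlier. The sketch below is illustrative only: the `extract_choice` heuristic (take the first standalone letter A to D in the model's reply), the `GOLD` key names, and the helper names are assumptions, not FinEval's own answer parser. The last function checks the arithmetic behind the actuarial item: the monthly opportunity cost is the forgone salary plus the forgone interest, 1,000 + 100,000 × 1% = 2,000 yuan, which is option A.

```python
import re
from typing import Mapping, Optional

# Gold answers copied from the four examples above (key names are hypothetical).
GOLD = {
    "insurance": "B",
    "international_economics": "B",
    "auditing": "D",
    "china_actuary": "A",
}


def extract_choice(response: str) -> Optional[str]:
    """Return the first standalone option letter A-D in a reply, or None.

    Heuristic only: a reply such as "A is wrong, so B" would be read as "A".
    """
    match = re.search(r"(?<![A-Za-z])[ABCD](?![A-Za-z])", response)
    return match.group(0) if match else None


def accuracy(responses: Mapping[str, str], gold: Mapping[str, str]) -> float:
    """Fraction of gold items whose extracted choice matches the gold letter."""
    correct = sum(extract_choice(responses.get(key, "")) == letter
                  for key, letter in gold.items())
    return correct / len(gold)


def monthly_opportunity_cost(salary: int, savings: int, monthly_rate_percent: int) -> int:
    """Forgone monthly salary plus forgone monthly interest on the withdrawn savings."""
    return salary + savings * monthly_rate_percent // 100
```

For example, `monthly_opportunity_cost(1000, 100000, 1)` evaluates to `2000`, matching the gold answer for the last item.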
Financial Industry Knowledge
Financial Industry Knowledge is a collection of high-quality text-based questions and answers, covering areas such as investment advisory, investment research, and financial operations. It consists of 1,434 questions, encompassing 10 different industry application scenarios. FinEval provides a more comprehensive benchmark for assessing the knowledge capabilities of large language models in the financial industry. The dataset is constructed using a combination of web scraping from financial websites and generation by GPT-4. It aims to evaluate the generalization ability of models in various application scenarios.
<div align="center"> <img src="docs/en/_static/image/v2.png" width="600px" height="250px"/> <br /> <br /></div>

Here are some examples of data for Financial Industry Knowledge:
Example of Financial Investment Advice in Investment Advisory:
问题:我拥有一笔较大的债券投资,但近期市场利率波动较大,应该如何调整债券投资策略?
Question: I have a significant bond investment, but the market interest rates have been fluctuating recently. How should I adjust my bond investment strategy?
答案: 市场利率波动对债券投资有显著影响,建议:
1.债券种类:根据市场利率走势,选择合适种类的债券。在高利率环境下,考虑投资于长期债券以获取更高利息。
2.持有期限:根据个人投资目标和市场预期,调整债券投资的平均持有期限,以适应不同利率环境。
3.调整到期结构:管理债券组合的到期结构,确保在不同期限内有合适的债券分布,以降低重投资风险。
4.关注经济数据:密切关注经济数据和货币政策动向,这些因素会影响市场利率,帮助预测债券市场的走势。
Answer: Market interest rate fluctuations have a significant impact on bond investments. Here are some recommendations:
1.Bond Types: Based on the trend of market interest rates, select appropriate types of bonds. In a high-interest-rate environment, consider investing in long-term bonds to earn higher interest.
2.Holding Period: Adjust the average holding period of bond investments according to personal investment goals and market expectations to adapt to different interest rate environments.
3.Adjust Maturity Structure: Manage the maturity structure of the bond portfolio to ensure a suitable distribution of bonds across different timeframes, reducing reinvestment risk.
4.Monitor Economic Data: Keep a close eye on economic data and monetary policy trends as these factors can influence market interest rates and help predict the direction of the bond market.
Example of Financial Text Summarization in Investment Research:
问题:请根据上下文给出的中文短新闻,生成对应的不超过20个字的摘要。上下文:雷神技术(Raytheon Technologies Corp)周一表示,董事会已授权一项最高达60亿美元的股票回购计划。这家航空航天和国防公司表示,新的授权取代了该公司2021年12月7日批准的前一个计划。截至上周五,雷神技术拥有14.7亿股流通在外股。该公司今年1月曾表示,2021年回购了23亿美元的股票。
Question: Based on the short Chinese news item given in the context, generate a summary of no more than 20 characters. Context: Raytheon Technologies Corp said on Monday that its board had authorized a stock repurchase plan of up to $6 billion. The aerospace and defense company said the new authorization replaces the previous plan approved on December 7, 2021. As of last Friday, Raytheon Technologies had 1.47 billion shares outstanding. In January of this year, the company said it had repurchased $2.3 billion of stock in 2021.
答案: 雷神技术批准60亿美元的股票回购计划
Answer: Raytheon Technologies approves $6 billion stock repurchase plan.
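Generated summaries like this one are the kind of output that ROUGE-L, one of the metrics named earlier, is suited to score. The following is a minimal sketch of ROUGE-L as an F1 score over the longest common subsequence (LCS), computed per character because Chinese text has no whitespace word boundaries. The tokenization, the choice of F1 rather than a recall-weighted variant, and the function names are assumptions; this section does not say how FinEval computes the score.

```python
from typing import Sequence


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """Character-level ROUGE-L F1 between a candidate and a reference summary."""
    if not candidate or not reference:
        return 0.0
    lcs = lcs_length(candidate, reference)
    if lcs == 0:
        return 0.0
    precision = lcs / len(candidate)
    recall = lcs / len(reference)
    return 2 * precision * recall / (precision + recall)


# Reference summary copied from the example above.
REFERENCE = "雷神技术批准60亿美元的股票回购计划"

# The task also caps the summary at 20 characters, which is easy to check.
assert len(REFERENCE) <= 20

# A hypothetical model output, scored against the reference.
score = rouge_l_f1("雷神技术授权60亿美元股票回购", REFERENCE)
print(f"ROUGE-L F1: {score:.3f}")
```

A candidate identical to the reference scores 1.0, and one that shares no characters with it scores 0.0.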
Example of Financial Event Extraction in Financial Operations:
问题:上下文:【北方国际:子公司拟与一机进出口签订4240万元采购合同】财联社11月10日电,北方国际公告,全资子公司中国北方车辆有限公司拟与内蒙古一机集团进出口有限责任公司(简称“一机进出口”)签订三项《采购合同》,从一机进出口采购车辆备件以及钻杆、钻机配件、钻铤等石油勘探开发钻具,合同金额合计4240万元。问题:签定采购合同的事件主体有哪些?请根据此上下文及问题,回答答案。
Question: Context: [North International: Subsidiary intends to sign a 42.4 million yuan procurement contract with Yiji Import and Export] Cailian Press, November 10 - North International announced that its wholly-owned subsidiary, China North Vehicle Co., Ltd., intends to sign three "Procurement Contracts" with Inner Mongolia Yiji Group Import and Export Co., Ltd. ("Yiji Import and Export") to purchase vehicle spare parts, as well as oil exploration and development drilling tools such as drill rods, drilling rig accessories, and drill collars. The total contract amount is 42.4 million yuan. Question: What are the entities involved
