Leaderboard
Zero-shot evaluation, temperature=0, 1% numerical tolerance.
| # | Model | Overall | Numerical | Consistency | Temporal | Params |
|---|---|---|---|---|---|---|
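
To make the "1% numerical tolerance" setting concrete, here is a minimal sketch of one plausible reading: a prediction counts as correct when it lies within 1% of the reference value, measured relative to the reference. This is an assumption for illustration only; the scorer shipped with jfinqa may parse and compare answers differently, and the helper names `parse_number`, `within_tolerance`, and `is_match` are hypothetical.

```python
def parse_number(answer: str) -> float:
    """Parse an answer such as "25.0%" or "1,234" into a float (hypothetical helper)."""
    return float(answer.strip().rstrip("%").replace(",", ""))


def within_tolerance(predicted: float, expected: float, rel_tol: float = 0.01) -> bool:
    """Return True if predicted is within rel_tol of expected, relative to expected."""
    if expected == 0:
        # A relative tolerance is undefined at zero, so require an exact match.
        return predicted == 0
    return abs(predicted - expected) <= rel_tol * abs(expected)


def is_match(predicted: str, expected: str) -> bool:
    """Compare two numeric answer strings under the assumed 1% relative tolerance."""
    return within_tolerance(parse_number(predicted), parse_number(expected))
```

Under this reading, "25.2%" would be accepted against a reference of "25.0%", while "25.3%" would not, because the allowed deviation is 0.25 percentage points.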
Subtask Breakdown
Submit Your Results
How to evaluate and submit
- Install jfinqa:
  pip install jfinqa
- Generate predictions as a JSON file mapping question IDs to answers:
  { "nr_001": "25.0%", "nr_002": "16.0%", "cc_001": "Yes", ... }
- Evaluate locally:
  jfinqa evaluate -p predictions.json -o results.json
- Open an issue on GitHub with:
  - Model name and provider
  - Parameter count (if known)
  - Your results.json file
  - Any additional details (prompt template, few-shot, etc.)
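
The prediction file from the steps above can be produced with a few lines of Python. The sketch below is a rough illustration rather than part of jfinqa: `answer_question` is a hypothetical stand-in for your own model call, and the canned answers simply reuse the sample IDs shown above.

```python
import json
from pathlib import Path


def answer_question(question_id: str) -> str:
    """Hypothetical stand-in for a model call; replace with your own inference code."""
    canned = {"nr_001": "25.0%", "nr_002": "16.0%", "cc_001": "Yes"}
    return canned[question_id]


def write_predictions(question_ids, path):
    """Build the {question ID: answer} mapping and save it as UTF-8 JSON."""
    predictions = {qid: answer_question(qid) for qid in question_ids}
    Path(path).write_text(
        json.dumps(predictions, ensure_ascii=False, indent=2),
        encoding="utf-8",
    )
    return predictions


if __name__ == "__main__":
    write_predictions(["nr_001", "nr_002", "cc_001"], "predictions.json")
```

The resulting `predictions.json` is the file passed to `jfinqa evaluate -p` in the evaluation step.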
About
jfinqa evaluates LLMs on multi-step numerical reasoning over real Japanese corporate financial statements from EDINET. Questions require 2–6 step arithmetic including DuPont decomposition, margin analysis, and YoY growth calculations. The benchmark covers J-GAAP (58%), IFRS (38%), and US-GAAP (4%) accounting standards.
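
As a rough illustration of the multi-step arithmetic involved, the sketch below works through a three-factor DuPont decomposition and a year-over-year growth calculation. The figures are invented for the example and are not taken from the benchmark or from any EDINET filing; the function names are likewise hypothetical.

```python
def dupont_roe(net_income: float, revenue: float, total_assets: float, equity: float) -> dict:
    """Three-factor DuPont decomposition: ROE = net margin x asset turnover x equity multiplier."""
    net_margin = net_income / revenue
    asset_turnover = revenue / total_assets
    equity_multiplier = total_assets / equity
    return {
        "net_margin": net_margin,
        "asset_turnover": asset_turnover,
        "equity_multiplier": equity_multiplier,
        "roe": net_margin * asset_turnover * equity_multiplier,
    }


def yoy_growth(current: float, previous: float) -> float:
    """Year-over-year growth as a fraction of the previous period's value."""
    return (current - previous) / previous


if __name__ == "__main__":
    # Invented figures: net income 50, revenue 1000, total assets 2000, equity 800.
    parts = dupont_roe(net_income=50, revenue=1000, total_assets=2000, equity=800)
    print(f"ROE: {parts['roe']:.2%}")
    # Invented figures: revenue grows from 800 to 1000.
    print(f"YoY revenue growth: {yoy_growth(1000, 800):.1%}")
```

The product of the three factors equals net income divided by equity, which gives a quick consistency check on any intermediate step.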
Citation
@dataset{ogawa2025jfinqa,
title = {jfinqa: Japanese Financial Numerical Reasoning QA Benchmark},
author = {Ogawa, Saichi},
year = {2025},
url = {https://github.com/ajtgjmdjp/jfinqa},
license = {Apache-2.0}
}