jfinqa Leaderboard - Japanese Financial Numerical Reasoning QA

Leaderboard

Zero-shot evaluation, temperature=0, 1% numerical tolerance.
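One plausible reading of the 1% numerical tolerance is a relative comparison against the gold answer. The sketch below illustrates that reading only; it is not taken from the jfinqa source. The helper names `parse_number` and `within_tolerance`, the stripping of `%` and thousands separators, and the case-insensitive fallback for non-numeric answers are all assumptions.

```python
from typing import Optional


def parse_number(text: str) -> Optional[float]:
    """Hypothetical helper: turn an answer such as '25.0%' or '1,234.5' into a float."""
    cleaned = text.replace(",", "").replace("%", "").strip()
    try:
        return float(cleaned)
    except ValueError:
        return None  # not a number, e.g. "Yes"


def within_tolerance(predicted: str, gold: str, rel_tol: float = 0.01) -> bool:
    """Assumed scoring rule: numeric answers match within rel_tol of the gold value."""
    p, g = parse_number(predicted), parse_number(gold)
    if p is None or g is None:
        # Assumption: non-numeric answers are compared as case-insensitive strings.
        return predicted.strip().lower() == gold.strip().lower()
    if g == 0:
        return p == 0
    return abs(p - g) <= rel_tol * abs(g)
```

Under these assumptions a prediction of "25.2%" would match a gold answer of "25.0%", while "25.3%" would not. The scorer shipped with jfinqa may normalize answers differently, so treat this only as an aid for reading the scores.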

#	Model	Overall	Numerical	Consistency	Temporal	Params

Subtask Breakdown

Submit Your Results

How to evaluate and submit

Install jfinqa: pip install jfinqa

Generate predictions as a JSON file mapping question IDs to answers:

{
  "nr_001": "25.0%",
  "nr_002": "16.0%",
  "cc_001": "Yes",
  ...
}
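A predictions file in this shape could be produced with a short script along the following lines. This is a minimal sketch using only the standard library. `answer_question` is a hypothetical stand-in for your own model call, and its canned answers simply repeat the sample values above; only the output format (a JSON object mapping question IDs to answer strings) comes from this page.

```python
import json
from typing import Dict, Iterable


def answer_question(question_id: str) -> str:
    """Hypothetical stand-in for a model call; replace with real inference."""
    canned = {"nr_001": "25.0%", "nr_002": "16.0%", "cc_001": "Yes"}
    return canned[question_id]


def build_predictions(question_ids: Iterable[str]) -> Dict[str, str]:
    """Map every question ID to the model's answer string."""
    return {qid: answer_question(qid) for qid in question_ids}


def save_predictions(predictions: Dict[str, str], path: str = "predictions.json") -> None:
    """Write the predictions as UTF-8 JSON, keeping any Japanese text readable."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(predictions, f, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    save_predictions(build_predictions(["nr_001", "nr_002", "cc_001"]))
```

Answers are stored as strings, as in the sample, so formatting such as the percent sign is preserved exactly as the model produced it.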

Evaluate locally:

jfinqa evaluate -p predictions.json -o results.json

Open an issue on GitHub with:
- Model name and provider
- Parameter count (if known)
- Your results.json file
- Any additional details (prompt template, few-shot, etc.)

About

jfinqa evaluates LLMs on multi-step numerical reasoning over real Japanese corporate financial statements from EDINET. Questions require 2–6 step arithmetic including DuPont decomposition, margin analysis, and YoY growth calculations. The benchmark covers J-GAAP (58%), IFRS (38%), and US-GAAP (4%) accounting standards.
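To make the kinds of calculation concrete, the sketch below works through YoY growth, net margin, and a three-factor DuPont decomposition using the standard textbook formulas. The function names and all input figures are hypothetical and are not drawn from the benchmark or from any EDINET filing.

```python
def yoy_growth(current: float, previous: float) -> float:
    """Year-over-year growth in percent."""
    return (current - previous) / previous * 100


def net_margin(net_income: float, revenue: float) -> float:
    """Net profit margin in percent."""
    return net_income / revenue * 100


def dupont_roe(net_income: float, revenue: float,
               total_assets: float, equity: float) -> float:
    """Three-factor DuPont ROE in percent: margin x asset turnover x leverage."""
    margin = net_income / revenue        # profitability
    turnover = revenue / total_assets    # asset efficiency
    leverage = total_assets / equity     # financial leverage
    return margin * turnover * leverage * 100


# Hypothetical figures (for example, in millions of yen), for illustration only.
revenue, prev_revenue = 1000.0, 800.0
net_income, total_assets, equity = 50.0, 2000.0, 500.0

growth = yoy_growth(revenue, prev_revenue)
margin = net_margin(net_income, revenue)
roe = dupont_roe(net_income, revenue, total_assets, equity)
print(f"YoY growth {growth:.1f}%, net margin {margin:.1f}%, ROE {roe:.1f}%")
```

With these inputs the three DuPont factors are 0.05, 0.5, and 4, so the product agrees with computing ROE directly as net income divided by equity (50 / 500). Chaining several such steps from reported line items is the kind of multi-step arithmetic described above.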

Citation

@dataset{ogawa2025jfinqa,
  title   = {jfinqa: Japanese Financial Numerical Reasoning QA Benchmark},
  author  = {Ogawa, Saichi},
  year    = {2025},
  url     = {https://github.com/ajtgjmdjp/jfinqa},
  license = {Apache-2.0}
}