jfinqa Leaderboard

Japanese Financial Numerical Reasoning QA Benchmark — 1,000 questions from 68 companies across J-GAAP, IFRS, and US-GAAP

1,000 questions · 68 companies · 3 subtasks · v0.3.0

Leaderboard

Zero-shot evaluation, temperature=0, 1% numerical tolerance.
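As a rough illustration of what a 1% numerical tolerance could mean in practice, the sketch below compares a predicted value with a reference value using a relative tolerance. This is an assumption for illustration only: the function names `parse_number` and `within_tolerance` are hypothetical, and the actual jfinqa scorer may normalize and compare answers differently.

```python
def parse_number(text: str) -> float:
    """Parse strings such as '25.0%' or '1,234.5' into a float.

    Assumed answer formats; the real benchmark may accept others.
    """
    cleaned = text.strip().replace(",", "").rstrip("%")
    return float(cleaned)


def within_tolerance(predicted: str, gold: str, rel_tol: float = 0.01) -> bool:
    """Return True when `predicted` is within `rel_tol` of `gold`, relative to `gold`."""
    p = parse_number(predicted)
    g = parse_number(gold)
    if g == 0:
        # A relative tolerance is undefined at zero, so require an exact match.
        return p == 0
    return abs(p - g) <= rel_tol * abs(g)


if __name__ == "__main__":
    print(within_tolerance("25.2%", "25.0%"))  # off by 0.8% of the reference
    print(within_tolerance("25.3%", "25.0%"))  # off by 1.2% of the reference
```

Under this reading, the tolerance is measured against the reference answer, so 25.2% would be accepted for a reference of 25.0% while 25.3% would not.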

# | Model | Overall | Numerical | Consistency | Temporal | Params

Subtask Breakdown

Submit Your Results

How to evaluate and submit

  1. Install jfinqa: pip install jfinqa
  2. Generate predictions as a JSON file mapping question IDs to answers:
    {
      "nr_001": "25.0%",
      "nr_002": "16.0%",
      "cc_001": "Yes",
      ...
    }
  3. Evaluate locally:
    jfinqa evaluate -p predictions.json -o results.json
  4. Open an issue on GitHub with:
    • Model name and provider
    • Parameter count (if known)
    • Your results.json file
    • Any additional details (prompt template, few-shot, etc.)
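The predictions file from step 2 can be produced with the Python standard library alone. The sketch below is a minimal example, assuming your model's answers are already collected in a dictionary. The helper name `write_predictions` is hypothetical, and the sample IDs and answers are the ones shown in step 2.

```python
import json
from pathlib import Path


def write_predictions(predictions: dict[str, str], path: str = "predictions.json") -> None:
    """Write a mapping of question ID -> answer string as a JSON object."""
    for qid, answer in predictions.items():
        # The example file maps string IDs to string answers, so enforce that here.
        if not isinstance(qid, str) or not isinstance(answer, str):
            raise TypeError(f"ID and answer must both be strings: {qid!r} -> {answer!r}")
    # ensure_ascii=False keeps any Japanese text in answers readable in the file.
    Path(path).write_text(
        json.dumps(predictions, ensure_ascii=False, indent=2),
        encoding="utf-8",
    )


if __name__ == "__main__":
    # Sample entries taken from the example above; a real file needs every question ID.
    write_predictions({"nr_001": "25.0%", "nr_002": "16.0%", "cc_001": "Yes"})
```

The resulting file can then be passed to the evaluation command in step 3.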

About

jfinqa evaluates LLMs on multi-step numerical reasoning over real Japanese corporate financial statements from EDINET. Questions require 2 to 6 arithmetic steps, including DuPont decomposition, margin analysis, and year-over-year (YoY) growth calculations. The benchmark covers three accounting standards: J-GAAP (58%), IFRS (38%), and US-GAAP (4%).
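To give a sense of the arithmetic involved, the sketch below works through a three-factor DuPont decomposition and a YoY growth calculation. The figures are hypothetical and not taken from the benchmark, and the function names are illustrative only.

```python
def dupont_roe(net_income: float, revenue: float, total_assets: float, equity: float) -> float:
    """Return ROE as net margin x asset turnover x equity multiplier."""
    net_margin = net_income / revenue          # profitability
    asset_turnover = revenue / total_assets    # efficiency
    equity_multiplier = total_assets / equity  # leverage
    return net_margin * asset_turnover * equity_multiplier


def yoy_growth(current: float, previous: float) -> float:
    """Return year-over-year growth as a fraction of the previous period's value."""
    return (current - previous) / previous


if __name__ == "__main__":
    # Hypothetical figures in millions of yen.
    roe = dupont_roe(net_income=50, revenue=1000, total_assets=2000, equity=800)
    growth = yoy_growth(current=1100, previous=1000)
    print(f"ROE: {roe:.2%}")
    print(f"YoY revenue growth: {growth:.2%}")
```

The three DuPont factors multiply back to net income divided by equity, which is a useful sanity check when a question asks for an intermediate factor instead of the final ratio.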

Citation

@dataset{ogawa2025jfinqa,
  title   = {jfinqa: Japanese Financial Numerical Reasoning QA Benchmark},
  author  = {Ogawa, Saichi},
  year    = {2025},
  url     = {https://github.com/ajtgjmdjp/jfinqa},
  license = {Apache-2.0}
}