MMedBench

A Medical Benchmark for Multilingual Comprehension

About

The task in MMedBench is to answer medical multiple-choice questions in 6 different languages. In addition, each question is paired with a rationale explaining the correct choice.

For more details about MMedBench, please refer to this paper:

Dataset

MMedBench is a medical multiple-choice question-answering dataset covering 6 different languages. It contains 45k samples in the training set and 8,518 samples in the test set. Each question is accompanied by a correct answer and a high-quality rationale.
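For illustration, here is a minimal sketch of loading and inspecting samples. The file name and the field names (question, options, answer_idx, rationale) are assumptions made for this example; see the GitHub repository for the actual schema.

import json

# Assumed file name and field names; check the MMedBench repository
# for the real schema before relying on this sketch.
with open("mmedbench_test.json", encoding="utf-8") as f:
    samples = json.load(f)

for sample in samples[:3]:
    print(sample["question"])
    # Assumed layout: options as a dict like {"A": "...", "B": "..."}.
    for label, text in sample["options"].items():
        print(f"  {label}. {text}")
    print("Answer:", sample["answer_idx"])
    print("Rationale:", sample["rationale"])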

Please visit our GitHub repository to download the dataset:

Submission

To submit your model, please follow the instructions in the GitHub repository.

Citation

If you use MMedBench in your research, please cite our paper as follows:

@misc{qiu2024building,
  title={Towards Building Multilingual Language Model for Medicine}, 
  author={Pengcheng Qiu and Chaoyi Wu and Xiaoman Zhang and Weixiong Lin and Haicheng Wang and Ya Zhang and Yanfeng Wang and Weidi Xie},
  year={2024},
  eprint={2402.13963},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
Leaderboard
Rank  Model           Code    Size  Accuracy (%)  Rationale (BLEU-1)
1     GPT-4           NA      NA    74.27         NA
2     MMedLM 2        GitHub  7B    67.30         48.81
3     Mistral         GitHub  7B    60.73         45.37
4     InternLM 2      GitHub  7B    58.59         46.52
5     Gemini-1.0 pro  NA      NA    55.20         7.28
6     MMedLM          GitHub  7B    55.01         45.05
7     GPT-3.5         NA      NA    51.82         26.01
8     InternLM        GitHub  7B    45.67         42.12
9     BLOOMZ          GitHub  7B    45.10         43.22
10    LLaMA 2         GitHub  7B    42.26         44.24
11    Med-Alpaca      GitHub  7B    41.11         43.49
12    PMC-LLaMA       GitHub  7B    40.04         43.16
13    ChatDoctor      GitHub  7B    39.53         42.21
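The leaderboard reports two metrics: accuracy on the multiple-choice answers and BLEU-1 between generated and reference rationales. The sketch below shows one plausible way to compute them; the sentence-level BLEU-1 here (clipped unigram precision with a brevity penalty) is an assumption about the scoring details, so refer to the paper and repository for the official evaluation script.

from collections import Counter
import math

def bleu1(reference: str, candidate: str) -> float:
    # Sentence-level BLEU-1: clipped unigram precision times a
    # brevity penalty. This is an assumed approximation of the
    # official scoring, not the benchmark's exact script.
    ref_tokens = reference.split()
    cand_tokens = candidate.split()
    if not cand_tokens:
        return 0.0
    ref_counts = Counter(ref_tokens)
    cand_counts = Counter(cand_tokens)
    # Count candidate unigrams that also appear in the reference,
    # clipped by how often they occur in the reference.
    overlap = sum(min(n, ref_counts[tok]) for tok, n in cand_counts.items())
    precision = overlap / len(cand_tokens)
    # Penalize candidates shorter than the reference.
    bp = 1.0 if len(cand_tokens) > len(ref_tokens) else \
        math.exp(1 - len(ref_tokens) / len(cand_tokens))
    return bp * precision

def accuracy(golds, preds):
    # Fraction of questions where the predicted option matches the gold one.
    return sum(g == p for g, p in zip(golds, preds)) / len(golds)

# Hypothetical usage:
print(accuracy(["A", "C"], ["A", "B"]))  # 0.5
print(bleu1("the correct answer is A because of X",
            "the answer is A because of X"))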