MMedBench

A Medical Benchmark for Multilingual Comprehension

About

The task of MMedBench is to answer medical multiple-choice questions in 6 different languages. In addition, each question is paired with a rationale that justifies the correct choice.
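For illustration, a single sample might look like the following Python record. The field names and the French question are hypothetical, shown only to convey the structure (question, options, answer, rationale), not the official schema:

# A hypothetical MMedBench-style sample. Field names are illustrative;
# see the GitHub repository for the actual data format.
sample = {
    "language": "French",
    "question": "Quelle vitamine est synthétisée dans la peau sous l'effet des rayons UV ?",
    "options": {
        "A": "Vitamine A",
        "B": "Vitamine C",
        "C": "Vitamine D",
        "D": "Vitamine K",
    },
    "answer": "C",
    "rationale": "La vitamine D est synthétisée dans la peau à partir du "
                 "7-déhydrocholestérol sous l'action des rayons ultraviolets B.",
}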

For more details about MMedBench, please refer to the paper "Towards Building Multilingual Language Model for Medicine" (arXiv:2402.13963).

Dataset

MMedBench is a medical multiple-choice question-answering dataset covering 6 different languages. It contains 45k samples in the training set and 8,518 samples in the test set. Each question is accompanied by the correct answer and a high-quality rationale.

Please visit our GitHub repository to download the dataset.
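As a minimal sketch of how the data could be loaded, assuming the download ships as one JSONL file per language and split (the file layout and field names below are assumptions, not the documented format; check the repository for the real organization):

import json
from pathlib import Path

def load_split(root, split):
    """Load every per-language JSONL file for one split.

    Assumes a layout like <root>/<split>/<Language>.jsonl, which is an
    assumption for illustration, not the repository's documented layout.
    """
    samples = []
    for path in sorted(Path(root, split).glob("*.jsonl")):
        with path.open(encoding="utf-8") as f:
            for line in f:
                record = json.loads(line)
                record["language"] = path.stem  # e.g. "English", "French"
                samples.append(record)
    return samples

train = load_split("MMedBench", "Train")
test = load_split("MMedBench", "Test")
print(len(train), len(test))  # expected ~45k train / 8,518 test samples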

Submission

To submit your model, please follow the instructions in the GitHub repository.
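The leaderboard below ranks models by answer accuracy and by rationale quality measured with BLEU-1. As a minimal sketch of how those two numbers can be computed locally before submitting, assuming whitespace tokenization and NLTK smoothing (the official scoring script in the repository is authoritative and may tokenize or smooth differently):

from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def accuracy(pred_answers, gold_answers):
    """Fraction of predicted answer choices that exactly match the gold ones."""
    return sum(p == g for p, g in zip(pred_answers, gold_answers)) / len(gold_answers)

def bleu1(pred_rationales, gold_rationales):
    """Average sentence-level BLEU-1 over whitespace-tokenized rationales."""
    smooth = SmoothingFunction().method1
    scores = [
        sentence_bleu([gold.split()], pred.split(),
                      weights=(1, 0, 0, 0), smoothing_function=smooth)
        for pred, gold in zip(pred_rationales, gold_rationales)
    ]
    return sum(scores) / len(scores)

acc = accuracy(["A", "C", "B"], ["A", "B", "B"])
b1 = bleu1(["Vitamin D is synthesized in the skin."],
           ["Vitamin D is produced in the skin under UV light."])
print(f"Accuracy: {acc:.2%}  BLEU-1: {b1:.4f}")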

Citation

If you use MMedBench in your research, please cite our paper as follows:

@misc{qiu2024building,
  title={Towards Building Multilingual Language Model for Medicine}, 
  author={Pengcheng Qiu and Chaoyi Wu and Xiaoman Zhang and Weixiong Lin and Haicheng Wang and Ya Zhang and Yanfeng Wang and Weidi Xie},
  year={2024},
  eprint={2402.13963},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
            
Leaderboard
Rank  Model           Code    Size  Accuracy (%)  Rationale (BLEU-1)
1     GPT-4           NA      NA    74.27         NA
2     MMed-Llama 3    GitHub  8B    67.75         47.21
3     MMedLM 2        GitHub  7B    67.30         48.81
4     Llama 3         GitHub  8B    62.79         46.76
5     Mistral         GitHub  7B    60.73         45.37
6     InternLM 2      GitHub  7B    58.59         46.52
7     BioMistral      GitHub  7B    57.45         45.93
8     Gemini-1.0 Pro  NA      NA    55.20         7.28
9     MMedLM          GitHub  7B    55.01         45.05
10    MEDITRON        GitHub  7B    52.23         45.08
11    GPT-3.5         NA      NA    51.82         26.01
12    InternLM        GitHub  7B    45.67         42.12
13    BLOOMZ          GitHub  7B    45.10         43.22
14    LLaMA 2         GitHub  7B    42.26         44.24
15    Med-Alpaca      GitHub  7B    41.11         43.49
16    PMC-LLaMA       GitHub  7B    40.04         43.16
17    ChatDoctor      GitHub  7B    39.53         42.21