About
The task of MMedBench is to answer medical multiple-choice questions in 6 different languages. In addition, each question is paired with a rationale explaining the correct choice.
For more details about MMedBench, please refer to our paper, Towards Building Multilingual Language Model for Medicine (arXiv:2402.13963).
Dataset
MMedBench is a medical multiple-choice dataset covering 6 different languages. It contains 45k samples in the training set and 8,518 samples in the test set. Each question is accompanied by the correct answer and a high-quality rationale.
Please visit our GitHub repository to download the dataset.
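For reference, the sketch below shows one way to load and inspect samples of this kind (a question, its candidate options, the correct answer, and a rationale). The field names (`question`, `options`, `answer_idx`, `rationale`) and the file layout are illustrative assumptions, not the official schema; please check the GitHub repository for the exact format.

```python
# Minimal sketch of loading MMedBench-style samples.
# NOTE: field names and file layout are assumptions for illustration only;
# consult the official GitHub repository for the actual schema.
import json
from pathlib import Path

def load_samples(path):
    """Load one JSONL file of question/answer/rationale records (assumed format)."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f]

if __name__ == "__main__":
    # Hypothetical layout: one JSONL file per language in a Train/ folder.
    for lang_file in Path("MMedBench/Train").glob("*.jsonl"):
        samples = load_samples(lang_file)
        first = samples[0]
        print(lang_file.stem, len(samples), "samples")
        print("  question :", first["question"][:80])
        print("  options  :", first["options"])        # candidate choices
        print("  answer   :", first["answer_idx"])     # correct choice
        print("  rationale:", first["rationale"][:80]) # explanation of the answer
```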
Submission
To submit your model, please follow the instructions in the GitHub repository.
Citation
If you use MMedBench in your research, please cite our paper as follows:
@misc{qiu2024building,
      title={Towards Building Multilingual Language Model for Medicine},
      author={Pengcheng Qiu and Chaoyi Wu and Xiaoman Zhang and Weixiong Lin and Haicheng Wang and Ya Zhang and Yanfeng Wang and Weidi Xie},
      year={2024},
      eprint={2402.13963},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
Leaderboard
| Rank | Model | Size | Accuracy (%) | Rationale (BLEU-1) |
|---|---|---|---|---|
| 1 | GPT-4 | NA | 74.27 | NA |
| 2 | MMed-Llama 3 | 8B | 67.75 | 47.21 |
| 3 | MMedLM 2 | 7B | 67.30 | 48.81 |
| 4 | Llama 3 | 8B | 62.79 | 46.76 |
| 5 | Mistral | 7B | 60.73 | 45.37 |
| 6 | InternLM 2 | 7B | 58.59 | 46.52 |
| 7 | BioMistral | 7B | 57.45 | 45.93 |
| 8 | Gemini-1.0 pro | NA | 55.20 | 7.28 |
| 9 | MMedLM | 7B | 55.01 | 45.05 |
| 10 | MEDITRON | 7B | 52.23 | 45.08 |
| 11 | GPT-3.5 | NA | 51.82 | 26.01 |
| 12 | InternLM | 7B | 45.67 | 42.12 |
| 13 | BLOOMZ | 7B | 45.10 | 43.22 |
| 14 | LLaMA 2 | 7B | 42.26 | 44.24 |
| 15 | Med-Alpaca | 7B | 41.11 | 43.49 |
| 16 | PMC-LLaMA | 7B | 40.04 | 43.16 |
| 17 | ChatDoctor | 7B | 39.53 | 42.21 |
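The leaderboard reports multiple-choice accuracy and BLEU-1 between each generated rationale and the reference rationale. Below is a minimal sketch of how these two metrics could be computed with NLTK; the tokenization, smoothing, and averaging choices here are assumptions, so treat the official evaluation script in the GitHub repository as the source of truth.

```python
# Hedged sketch of the two leaderboard metrics: choice accuracy and BLEU-1
# on generated rationales. Tokenization, smoothing, and averaging below are
# assumptions; the official evaluation script defines the exact procedure.
# Requires: pip install nltk, plus nltk.download("punkt") for word_tokenize.
from nltk.tokenize import word_tokenize
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def choice_accuracy(pred_choices, gold_choices):
    """Percentage of questions where the predicted option matches the gold option."""
    correct = sum(p == g for p, g in zip(pred_choices, gold_choices))
    return 100.0 * correct / len(gold_choices)

def rationale_bleu1(pred_rationales, gold_rationales):
    """Average sentence-level BLEU-1 between predicted and reference rationales."""
    smooth = SmoothingFunction().method1
    scores = []
    for pred, gold in zip(pred_rationales, gold_rationales):
        hypothesis = word_tokenize(pred)
        references = [word_tokenize(gold)]
        scores.append(sentence_bleu(references, hypothesis,
                                    weights=(1.0, 0, 0, 0),  # unigram precision only
                                    smoothing_function=smooth))
    return 100.0 * sum(scores) / len(scores)
```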