About
The task of MMedBench is to answer medical multiple-choice questions across 6 different languages. In addition, each question is paired with a rationale that justifies the chosen answer.
For more details about MMedBench, please refer to this paper:
Dataset
MMedBench is a medical multiple-choice dataset covering 6 different languages. It contains 45k samples in the training set and 8,518 samples in the test set. Each question is accompanied by a correct answer and a high-quality rationale.
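The leaderboard below reports two metrics: answer accuracy and rationale quality measured by BLEU-1. As a rough illustration of the latter (this is a minimal sketch, not the official evaluation script; the exact tokenization and smoothing used by MMedBench may differ), BLEU-1 is the clipped unigram precision of a generated rationale against the reference, scaled by a brevity penalty:

```python
from collections import Counter
import math

def bleu1(candidate: str, reference: str) -> float:
    """Sentence-level BLEU-1 sketch: clipped unigram precision
    times a brevity penalty. Whitespace tokenization is an
    assumption; the benchmark's tokenizer may differ."""
    cand = candidate.split()
    ref = reference.split()
    if not cand:
        return 0.0
    cand_counts = Counter(cand)
    ref_counts = Counter(ref)
    # Clip each candidate unigram count by its count in the reference,
    # so repeating a correct word cannot inflate the score.
    clipped = sum(min(c, ref_counts[w]) for w, c in cand_counts.items())
    precision = clipped / len(cand)
    # Brevity penalty: penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * precision
```

A perfect match scores 1.0, e.g. `bleu1("the patient has anemia", "the patient has anemia")`, while an unrelated rationale scores near 0.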
Please visit our GitHub repository to download the dataset:
Submission
To submit your model, please follow the instructions in the GitHub repository.
Citation
If you use MMedBench in your research, please cite our paper by:
@misc{qiu2024building,
  title={Towards Building Multilingual Language Model for Medicine},
  author={Pengcheng Qiu and Chaoyi Wu and Xiaoman Zhang and Weixiong Lin and Haicheng Wang and Ya Zhang and Yanfeng Wang and Weidi Xie},
  year={2024},
  eprint={2402.13963},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
Leaderboard
Rank | Model | Size | Accuracy (%) | Rationale (BLEU-1)
---|---|---|---|---
1 | GPT-4 | NA | 74.27 | NA
2 | MMed-Llama 3 | 8B | 67.75 | 47.21
3 | MMedLM 2 | 7B | 67.30 | 48.81
4 | Llama 3 | 8B | 62.79 | 46.76
5 | Mistral | 7B | 60.73 | 45.37
6 | InternLM 2 | 7B | 58.59 | 46.52
7 | BioMistral | 7B | 57.45 | 45.93
8 | Gemini-1.0 pro | NA | 55.20 | 7.28
9 | MMedLM | 7B | 55.01 | 45.05
10 | MEDITRON | 7B | 52.23 | 45.08
11 | GPT-3.5 | NA | 51.82 | 26.01
12 | InternLM | 7B | 45.67 | 42.12
13 | BLOOMZ | 7B | 45.10 | 43.22
14 | LLaMA 2 | 7B | 42.26 | 44.24
15 | Med-Alpaca | 7B | 41.11 | 43.49
16 | PMC-LLaMA | 7B | 40.04 | 43.16
17 | ChatDoctor | 7B | 39.53 | 42.21