What Is MedS-Bench
MedS-Bench is a comprehensive medical benchmark for evaluating large language models (LLMs) in the medical domain beyond multiple-choice questions. It covers 11 task categories and integrates 39 existing datasets. For each dataset, MedS-Bench converts the data into a format suitable for LLMs. To do this, clear task definitions, known as Instructions, are written manually to guide the models in understanding and responding to the medical data. Our leaderboard provides a comparative analysis of how versatile LLMs are and how well they handle complex, domain-specific challenges in medicine.
For more details about MedS-Bench, please refer to this paper: