Quantity Affects Quality: Instruction Fine-Tuning on LLM’s Multiple-Choice Question Abilities

EasyChair Preprint 15242, version 2 • 9 pages • Date: October 23, 2024

Abstract

This paper investigates the potential of instruction fine-tuning to significantly improve the performance of large language models (LLMs) on legal multiple-choice questions (MCQs). By manipulating the volume of training data, we demonstrate a strong correlation between the quantity of fine-tuning data and the resulting quality of the LLM, paving the way for LLMs tailored to specific tasks (e.g., legal knowledge). We compare Breeze-7B (based on Mistral-7B) with two fine-tuned variants: one trained on 5,000 additional MCQ samples (bz5k) and one on 70,000 (bz70k). We benchmark these against general baseline models (GPT-3.5 and GPT-4o) and one Traditional Mandarin LLM (TAME). All models are evaluated on MCQ datasets drawn from MMLU, TMMLU, and the 2023 Taiwanese Bar Examination. We find that fine-tuning may slightly degrade an LLM's original general capabilities; however, once the training data surpasses a certain volume, the model's effectiveness on the target task improves markedly. This trade-off is acceptable because the LLM's proficiency in the specialized legal domain is substantially enhanced. Practically speaking, we developed a legal MCQ-specific LLM that demonstrates the benefits of model customization. For specialized applications, smaller-scale, personalized LLMs can be developed at reduced training cost, making advanced legal tools more accessible and adaptable to specific knowledge areas or unique legal frameworks. This approach also addresses concerns about digital sovereignty by aligning the model's functionality with jurisdiction-specific legal regulations.

Keyphrases: Instruction Fine-Tuning, Legal AI, Legal Multiple-Choice Questions, large language models, model customization
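The abstract does not describe the training pipeline, but its central manipulation — fine-tuning the same base model on 5,000 versus 70,000 MCQ samples — can be pictured with a minimal sketch. The following assumes a Hugging Face stack with LoRA adapters; the model identifier, dataset path, field names (question/options/answer), and hyperparameters are illustrative assumptions, not the authors' actual configuration.

```python
# Minimal sketch: fine-tune Breeze-7B on two training-data volumes (bz5k, bz70k).
# All paths, field names, and hyperparameters are assumptions for illustration.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

BASE = "MediaTek-Research/Breeze-7B-Instruct-v1_0"  # Breeze-7B (Mistral-7B based)
tokenizer = AutoTokenizer.from_pretrained(BASE)
tokenizer.pad_token = tokenizer.eos_token  # Mistral tokenizers ship no pad token

def to_example(row):
    # Render one legal MCQ as an instruction/answer pair (hypothetical schema).
    text = (f"Question: {row['question']}\nOptions: {row['options']}\n"
            f"Answer: {row['answer']}")
    return tokenizer(text, truncation=True, max_length=512)

full = load_dataset("json", data_files="legal_mcq.jsonl")["train"]  # placeholder

# Core manipulation from the abstract: hold everything fixed except data volume.
for n_samples, tag in [(5_000, "bz5k"), (70_000, "bz70k")]:
    model = AutoModelForCausalLM.from_pretrained(BASE, device_map="auto")
    model = get_peft_model(model, LoraConfig(  # low-cost adapter tuning
        r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"],
        task_type="CAUSAL_LM"))
    subset = (full.shuffle(seed=42).select(range(n_samples))
                  .map(to_example, remove_columns=full.column_names))
    Trainer(
        model=model,
        args=TrainingArguments(output_dir=f"breeze-{tag}",
                               per_device_train_batch_size=4,
                               num_train_epochs=3),
        train_dataset=subset,
        # mlm=False makes the collator copy input_ids into labels (causal LM).
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    ).train()
```

Each resulting checkpoint would then be scored on the held-out MCQ benchmarks (MMLU, TMMLU, and the 2023 Taiwanese Bar Examination) to compare against the untuned baseline.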