Instruction data

  • The instruction data is formatted using OpenAI’s ChatML format (OpenAI, 2023a).
    • ChatML documents consist of a series of messages. Each message starts with the special token <|im_start|>, followed by the role of the messenger (i.e., the “user” or the “assistant”), a new line, and then the message itself, and ends with a second special token, <|im_end|>. During training, we only compute the loss with respect to the response tokens (including <|im_start|> and <|im_end|>).
	- Example:
<|im_start|>system
You are a medical doctor answering real-world medical entrance exam questions. Based
on your understanding of basic and clinical science, medical knowledge, and mechanisms
underlying health, disease, patient care, and modes of therapy, answer the following
multiple-choice question. Select one correct answer from A to D. Base your answer on the
current and standard practices referenced in medical guidelines.<|im_end|>
<|im_start|>question
Question: Which of the following ultrasound findings has the highest association with
aneuploidy?
Options:
(A) Choroid plexus cyst
(B) Nuchal translucency
(C) Cystic hygroma
(D) Single umbilical artery<|im_end|>
<|im_start|>answer
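
The ChatML layout and the response-only loss described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' training code: the helper names (`format_message`, `format_chatml`, `toy_tokenize`, `build_example`) are hypothetical, the whitespace tokenizer is a stand-in for a real subword tokenizer, and the set of roles treated as "response" is passed as a parameter because the notes mention "assistant" while the example above uses "answer".

```python
import re
from typing import List, Sequence, Tuple

IM_START = "<|im_start|>"
IM_END = "<|im_end|>"

Message = Tuple[str, str]  # (role, content)


def format_message(role: str, content: str) -> str:
    """Render one message: start token, role, newline, content, end token."""
    return f"{IM_START}{role}\n{content}{IM_END}"


def format_chatml(messages: Sequence[Message]) -> str:
    """Render a whole document as newline-separated ChatML messages."""
    return "\n".join(format_message(role, content) for role, content in messages)


def toy_tokenize(text: str) -> List[str]:
    """Toy tokenizer (assumption): special tokens stay whole,
    everything else is split on whitespace."""
    pattern = re.escape(IM_START) + "|" + re.escape(IM_END) + r"|[^\s<]+|<"
    return re.findall(pattern, text)


def build_example(
    messages: Sequence[Message],
    response_roles: Sequence[str] = ("assistant",),
) -> Tuple[List[str], List[int]]:
    """Return tokens plus a 0/1 loss mask.

    Tokens of response messages get mask 1, including their
    <|im_start|> and <|im_end|> tokens; all other tokens get 0.
    """
    tokens: List[str] = []
    loss_mask: List[int] = []
    for role, content in messages:
        piece = toy_tokenize(format_message(role, content))
        flag = 1 if role in response_roles else 0
        tokens.extend(piece)
        loss_mask.extend([flag] * len(piece))
    return tokens, loss_mask


# Toy conversation (made up for illustration, not taken from the dataset).
demo = [
    ("system", "You are a helpful assistant."),
    ("user", "What is 2 + 2?"),
    ("assistant", "4"),
]
demo_tokens, demo_mask = build_example(demo)
print(format_chatml(demo))
print([tok for tok, m in zip(demo_tokens, demo_mask) if m])
```

In an actual training loop, the mask would typically be applied by multiplying the per-token cross-entropy by it (or by setting the labels of masked positions to the ignore index of the loss function), so that prompt tokens contribute nothing to the gradient.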