ROUGE

  • Recall-Oriented Understudy for Gisting Evaluation
  • used to evaluate automatic summarization and machine translation
  • scores range from 0 to 1; higher scores mean more overlap between the system-generated summary and the reference summary (each variant can be reported as precision, recall, and F-measure)
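  • "rouge1": unigram (1-gram) overlap
  • "rouge2": bigram (2-gram) overlap
  • "rougeL": longest common subsequence (LCS), sentence level: the whole text is treated as one token sequence, so newlines are ignored
  • "rougeLsum": summary-level LCS: splits the text into sentences on "\n", then computes the union LCS of each reference sentence against all candidate sentences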
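A minimal sketch of ROUGE-N ("rouge1" with n=1, "rouge2" with n=2), assuming a simplified tokenizer (lowercase, keep runs of a-z/0-9, no stemming); the names `tokenize`, `ngrams`, and `rouge_n` are illustrative, not a library API. Overlap is clipped so an n-gram is credited at most as often as it appears in both texts, and F-measure here is the balanced F1.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Simplified tokenizer (assumption): lowercase, keep alphanumeric runs, no stemming.
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens: list[str], n: int) -> Counter:
    # Multiset of n-grams, so repeated n-grams keep their counts.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(reference: str, candidate: str, n: int = 1) -> dict[str, float]:
    ref = ngrams(tokenize(reference), n)
    cand = ngrams(tokenize(candidate), n)
    # Clipped overlap: Counter '&' keeps min(ref count, cand count) per n-gram.
    overlap = sum((ref & cand).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"precision": precision, "recall": recall, "fmeasure": f1}


ref = "the cat sat on the mat"
cand = "the cat lay on the mat"
print(rouge_n(ref, cand, 1))  # 5 of 6 unigrams shared
print(rouge_n(ref, cand, 2))  # 3 of 5 bigrams shared
```

For this pair, 5 of the 6 unigrams match ("the" counts twice) but only 3 of the 5 bigrams do, so the bigram score is lower: a single substituted word breaks every bigram it touches.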
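A minimal sketch of sentence-level "rougeL", assuming the same simplified tokenizer as a stand-in for a real implementation's (the function names are illustrative). The LCS length comes from the standard dynamic program. Precision divides it by the candidate length, recall divides it by the reference length, and the F-measure is balanced F1.

```python
import re


def tokenize(text):
    # Simplified tokenizer (assumption): newlines are just whitespace here,
    # so the whole text becomes one token sequence.
    return re.findall(r"[a-z0-9]+", text.lower())


def lcs_length(a, b):
    # Classic O(len(a) * len(b)) dynamic program, keeping one row at a time.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(reference, candidate):
    ref, cand = tokenize(reference), tokenize(candidate)
    lcs = lcs_length(ref, cand)
    precision = lcs / max(len(cand), 1)
    recall = lcs / max(len(ref), 1)
    f1 = 2 * precision * recall / (precision + recall) if lcs else 0.0
    return {"precision": precision, "recall": recall, "fmeasure": f1}


print(rouge_l("police killed the gunman", "the gunman killed police"))
```

Because a subsequence must preserve word order, the reordered pair above gets an LCS of only 2 ("the gunman"), so recall is 0.5. Unigram overlap would be perfect for the same pair.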
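A minimal sketch of summary-level "rougeLsum", assuming the same simplified tokenizer; all names are illustrative. Both texts are split into sentences on "\n". For each reference sentence, the reference positions covered by an LCS with any candidate sentence are collected (the union LCS). The matched tokens are then summed and divided by the total reference length (recall) and the total candidate length (precision). The per-token budget that caps how often a word can be credited is an assumption of this sketch.

```python
import re
from collections import Counter


def split_sentences(text):
    # Sentences are the "\n"-separated lines; empty lines are dropped.
    sents = [re.findall(r"[a-z0-9]+", line.lower()) for line in text.split("\n")]
    return [s for s in sents if s]


def lcs_indices(ref, cand):
    # Full DP table, then backtrack to find which ref positions form one LCS.
    rows, cols = len(ref), len(cand)
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            if ref[i - 1] == cand[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    i, j, picked = rows, cols, set()
    while i > 0 and j > 0:
        if ref[i - 1] == cand[j - 1]:
            picked.add(i - 1)
            i, j = i - 1, j - 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return picked


def rouge_lsum(reference, candidate):
    ref_sents, cand_sents = split_sentences(reference), split_sentences(candidate)
    m = sum(len(s) for s in ref_sents)   # total reference tokens
    n = sum(len(s) for s in cand_sents)  # total candidate tokens
    # Assumed clipping: a token is credited at most as often as it occurs.
    ref_budget = Counter(t for s in ref_sents for t in s)
    cand_budget = Counter(t for s in cand_sents for t in s)
    hits = 0
    for r in ref_sents:
        union = set()
        for c in cand_sents:
            union |= lcs_indices(r, c)  # union LCS across candidate sentences
        for idx in sorted(union):
            tok = r[idx]
            if ref_budget[tok] > 0 and cand_budget[tok] > 0:
                hits += 1
                ref_budget[tok] -= 1
                cand_budget[tok] -= 1
    precision = hits / max(n, 1)
    recall = hits / max(m, 1)
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return {"precision": precision, "recall": recall, "fmeasure": f1}


print(rouge_lsum("w1 w2 w3 w4 w5", "w1 w2 w6 w7 w8\nw1 w3 w8 w9 w5"))
```

In this toy example, the first candidate sentence matches "w1 w2" and the second matches "w1 w3 w5". Their union covers 4 of the 5 reference tokens, giving recall 4/5 and precision 4/10, since the candidate has 10 tokens.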