BLEU has two limitations: it does not assign partial credit, and it can penalize semantically correct hypotheses. Both limit its usefulness for training and evaluating MT systems. This paper proposes a new metric, SIMLE, which improves both discriminative training and alignment with human evaluation.
SIMLE is a joint metric of SIM and LE: it rewards semantic textual similarity and penalizes length discrepancy. Specifically, SIM is an STS model trained on back-translated pseudo-paraphrase data, and it demonstrates very strong performance on out-of-domain data.
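To make the combination of the two components concrete, here is a minimal sketch of how a similarity score and a length penalty might be joined into one metric. The review does not state the formula, so everything below is an assumption for illustration: the exponential form of the length penalty, the exponent `alpha`, and the bag-of-words cosine `toy_sim`, which is only a stand-in for the trained SIM model.

```python
import math
from collections import Counter


def toy_sim(reference: str, hypothesis: str) -> float:
    """Hypothetical stand-in for SIM: cosine similarity over word counts."""
    r, h = Counter(reference.split()), Counter(hypothesis.split())
    dot = sum(r[w] * h[w] for w in r)
    norm = math.sqrt(sum(v * v for v in r.values())) * math.sqrt(
        sum(v * v for v in h.values())
    )
    return dot / norm if norm else 0.0


def length_penalty(reference: str, hypothesis: str) -> float:
    """Assumed LE form: 1.0 for equal lengths, decaying as lengths diverge."""
    r_len, h_len = len(reference.split()), len(hypothesis.split())
    if min(r_len, h_len) == 0:
        return 0.0
    return math.exp(1.0 - max(r_len, h_len) / min(r_len, h_len))


def simle(reference: str, hypothesis: str, alpha: float = 0.25, sim=toy_sim) -> float:
    """Assumed combination: length penalty raised to alpha, times similarity."""
    return length_penalty(reference, hypothesis) ** alpha * sim(reference, hypothesis)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    print(simle(ref, "the cat sat on the mat"))  # identical strings
    print(simle(ref, "the cat sat"))             # shorter hypothesis is penalized
```

Unlike an n-gram match, a score of this shape gives partial credit to a hypothesis that overlaps with the reference in meaning, while the length term discourages overly short or long outputs.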
- The SIM model predates BERT. Is there any evidence that BERT can also benefit from SIM’s margin-based loss and pseudo-paraphrase data?
- It’s good to know that minimum risk training is a good alternative to reinforcement learning.
- 5: Transformative: This paper is likely to change our field. It should be considered for a best paper award.
- 4.5: Exciting: It changed my thinking on this topic. I would fight for it to be accepted.
- 4: Strong: I learned a lot from it. I would like to see it accepted.
- 3.5: Leaning positive: It can be accepted more or less in its current form. However, the work it describes is not particularly exciting and/or inspiring, so it will not be a big loss if people don’t see it in this conference.
- 3: Ambivalent: It has merits (e.g., it reports state-of-the-art results, the idea is nice), but there are key weaknesses (e.g., I didn’t learn much from it, evaluation is not convincing, it describes incremental work). I believe it can significantly benefit from another round of revision, but I won’t object to accepting it if my co-reviewers are willing to champion it.
- 2.5: Leaning negative: I am leaning towards rejection, but I can be persuaded if my co-reviewers think otherwise.
- 2: Mediocre: I would rather not see it in the conference.
- 1.5: Weak: I am pretty confident that it should be rejected.
- 1: Poor: I would fight to have it rejected.