This amazing paper casts style transfer as a paraphrase generation problem and outperforms the state of the art by 2-3x on automatic evaluations and 4-5x on human evaluations. The method is surprisingly simple:
- Train a paraphrase model on back-translated sentence pairs; this model normalizes text by stripping stylistic markers
- Use this paraphrase model to create pseudo-parallel sentence pairs for each style's data
- Train an inverse paraphrase model for each style on its pseudo-parallel data
- At decoding time, paraphrase the source text, then feed it to the inverse paraphrase model of the target style
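The four steps above can be sketched as a tiny pipeline. All "models" below are toy stand-ins (a whitespace/case normalizer and a lookup table), not the paper's fine-tuned paraphrasers; function names are mine, not the authors':

```python
def paraphrase(sentence: str) -> str:
    """Stand-in for the style-normalizing paraphrase model.
    A real model rewrites the sentence in neutral language; this stub
    just collapses whitespace and lowercases as a placeholder."""
    return " ".join(sentence.lower().split())

def build_pseudo_parallel(style_corpus: list[str]) -> list[tuple[str, str]]:
    """Step 2: pair each styled sentence with its normalized paraphrase.
    The (paraphrase -> original) direction is what the inverse model learns."""
    return [(paraphrase(s), s) for s in style_corpus]

def train_inverse_paraphraser(pairs: list[tuple[str, str]]):
    """Step 3 stand-in: 'train' a lookup from normalized text back to the
    styled original. The real system fine-tunes a language model here."""
    table = dict(pairs)
    def inverse(normalized: str) -> str:
        return table.get(normalized, normalized)
    return inverse

def style_transfer(source: str, inverse_paraphraser) -> str:
    """Step 4: normalize the source, then re-style it with the
    target style's inverse paraphraser."""
    return inverse_paraphraser(paraphrase(source))

# Toy usage: one-sentence "Shakespeare" corpus.
shakespeare = ["Thou art most kind."]
to_shakespeare = train_inverse_paraphraser(build_pseudo_parallel(shakespeare))
print(style_transfer("THOU ART   MOST KIND.", to_shakespeare))
# → Thou art most kind.
```

The wiring is the point: style transfer reduces to composing a style remover with a style-specific generator, so no parallel style corpora are ever needed.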
The authors also offer an interesting critique of existing style transfer evaluations. A naive baseline that randomly either copies the source or retrieves a target-style sentence from the training set performs on par with some state-of-the-art systems under poorly designed metrics:
> We further show that only 3 out of 23 prior style transfer papers properly evaluate their models: in fact, a naive baseline that randomly chooses to either copy its input or retrieve a random sentence written in the target style outperforms prior work on poorly-designed metrics.
- 5: Transformative: This paper is likely to change our field. It should be considered for a best paper award.
- 4.5: Exciting: It changed my thinking on this topic. I would fight for it to be accepted.
- 4: Strong: I learned a lot from it. I would like to see it accepted.
- 3.5: Leaning positive: It can be accepted more or less in its current form. However, the work it describes is not particularly exciting and/or inspiring, so it will not be a big loss if people don’t see it in this conference.
- 3: Ambivalent: It has merits (e.g., it reports state-of-the-art results, the idea is nice), but there are key weaknesses (e.g., I didn’t learn much from it, evaluation is not convincing, it describes incremental work). I believe it can significantly benefit from another round of revision, but I won’t object to accepting it if my co-reviewers are willing to champion it.
- 2.5: Leaning negative: I am leaning towards rejection, but I can be persuaded if my co-reviewers think otherwise.
- 2: Mediocre: I would rather not see it in the conference.
- 1.5: Weak: I am pretty confident that it should be rejected.
- 1: Poor: I would fight to have it rejected.