Self-Attention Guided Copy Mechanism for Abstractive Summarization

1114 · June 16, 2021, 9:21pm

This paper incorporates TextRank into the attention map to amplify and polarize the attention scores, yielding a better copy distribution. Given an attention map, a directed graph is built by treating the map as a soft adjacency matrix. Normalizing the adjacency matrix gives a transition probability matrix, which is refined iteratively by repeated multiplication. The final score matrix is added to the key states and normalized using a vanilla dot-product attention head. The resulting copy distribution is matched against the original attention matrix with a KL-divergence loss to encourage consistency with the attention mechanism.
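To make the graph step concrete, here is a minimal NumPy sketch of my reading of it, not the authors' code. It row-normalizes a square attention map into a transition matrix, raises it to a power, and computes a KL term between two row-wise distributions. The function names, the number of steps, the row-wise normalization, and the direction of the KL term are all my assumptions. The step that adds the scores to the key states is left out because the summary does not pin down the shapes.

```python
import numpy as np


def transition_matrix(attn):
    """Row-normalize a non-negative attention map so each row sums to 1.

    Assumption: the attention map is square (tokens x tokens) and is
    normalized row-wise; the paper may normalize differently.
    """
    attn = np.asarray(attn, dtype=float)
    return attn / attn.sum(axis=1, keepdims=True)


def refine(attn, steps=3):
    """Return the transition matrix raised to the power `steps`.

    Assumption: "multiplying it multiple times" means a matrix power of
    the transition matrix; `steps` is a hypothetical hyperparameter.
    """
    return np.linalg.matrix_power(transition_matrix(attn), steps)


def kl_divergence(p, q, eps=1e-12):
    """Mean row-wise KL(p || q) between two matrices of distributions.

    Assumption: the direction KL(p || q) is illustrative only.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    per_row = np.sum(p * np.log((p + eps) / (q + eps)), axis=1)
    return float(per_row.mean())


if __name__ == "__main__":
    # Toy 3-token attention map (made-up numbers).
    attn = np.array([[0.5, 0.3, 0.2],
                     [0.1, 0.8, 0.1],
                     [0.3, 0.3, 0.4]])
    refined = refine(attn, steps=4)
    print(refined)
    print(kl_divergence(refined, transition_matrix(attn)))
```

Each row of the refined matrix is still a probability distribution, so it can be compared directly with the original attention rows.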

Comments

TextRank is indeed an effective method for keyword extraction. It’s good to know it still works in neural models.
The idea is neat, and it seems to improve ROUGE F1.
The KL loss is somewhat surprising. I can’t imagine that teaching an attention head to learn its own amplified attention map actually works. Is the KL loss necessary?

Rating

5: Transformative: This paper is likely to change our field. It should be considered for a best paper award.
4.5: Exciting: It changed my thinking on this topic. I would fight for it to be accepted.
4: Strong: I learned a lot from it. I would like to see it accepted.
3.5: Leaning positive: It can be accepted more or less in its current form. However, the work it describes is not particularly exciting and/or inspiring, so it will not be a big loss if people don’t see it in this conference.
3: Ambivalent: It has merits (e.g., it reports state-of-the-art results, the idea is nice), but there are key weaknesses (e.g., I didn’t learn much from it, evaluation is not convincing, it describes incremental work). I believe it can significantly benefit from another round of revision, but I won’t object to accepting it if my co-reviewers are willing to champion it.
2.5: Leaning negative: I am leaning towards rejection, but I can be persuaded if my co-reviewers think otherwise.
2: Mediocre: I would rather not see it in the conference.
1.5: Weak: I am pretty confident that it should be rejected.
1: Poor: I would fight to have it rejected.

0 voters