NAT: Noise-Aware Training for Robust Neural Sequence Labeling

Taggers are expected to perform reliably not only on clean text but also on noisy real-world text. This paper proposes two training strategies that improve the robustness of popular sequence labeling models while preserving accuracy on the original input.

Data Augmentation Method

The first method treats noisy data as a form of data augmentation and trains the model on it together with the clean text.

\begin{align*}
\mathcal{L}_{augm}(x,\tilde{x},y;\theta) &= \mathcal{L}_0(x,y;\theta) + \alpha\mathcal{L}_0(\tilde{x},y;\theta),
\end{align*}

where \mathcal{L}_0 is the standard training loss on input x with labels y, \tilde{x} is the perturbed sentence, and \alpha weights the noisy loss component.
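To make the augmentation objective concrete, here is a minimal sketch in plain Python. It assumes \mathcal{L}_0 is the mean negative log-likelihood of the gold tags and that the model's per-token tag probabilities are already available as lists; the function names `nll` and `augmentation_loss` and the toy numbers are hypothetical and are not taken from the paper.

```python
import math

def nll(probs, labels):
    """Mean negative log-likelihood of the gold tags (stands in for L_0)."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)

def augmentation_loss(probs_clean, probs_noisy, labels, alpha=1.0):
    """L_augm = L_0(x, y) + alpha * L_0(x~, y): both terms use the gold labels."""
    return nll(probs_clean, labels) + alpha * nll(probs_noisy, labels)

# Toy example (made-up numbers): two tokens, three possible tags.
probs_clean = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]]  # model output on x
probs_noisy = [[0.5, 0.3, 0.2], [0.2, 0.5, 0.3]]  # model output on perturbed x~
labels = [0, 1]                                   # gold tag index per token

loss = augmentation_loss(probs_clean, probs_noisy, labels, alpha=1.0)
```

Setting `alpha=0` recovers the plain clean-text loss, so \alpha directly controls how much the noisy copy contributes.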

Stability Training Method

\begin{align*}
\mathcal{L}_{stabil}(x,\tilde{x},y;\theta) &= \mathcal{L}_0(x,y;\theta) + \alpha\mathcal{L}_{sim}(x,\tilde{x};\theta), \\
\mathcal{L}_{sim}(x,\tilde{x};\theta) &= \mathcal{D}\big(y(x), y(\tilde{x})\big),
\end{align*}

where \mathcal{L}_{sim} encourages the similarity of the model outputs for both x and \tilde{x}, \mathcal{D} is a task-specific feature distance measure (usually \mathcal{D}_{KL}), and \alpha balances the strength of the similarity objective.
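The stability objective can be sketched the same way. This is an illustrative sketch only: it assumes \mathcal{D} is the KL divergence from the clean-input distribution to the noisy-input distribution, averaged over tokens, and that \mathcal{L}_0 is the mean negative log-likelihood. The direction of the KL term, the averaging, and the names `kl_divergence` and `stability_loss` are assumptions, not details confirmed by the text above.

```python
import math

def nll(probs, labels):
    """Mean negative log-likelihood of the gold tags (stands in for L_0)."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)

def kl_divergence(p, q):
    """D_KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def stability_loss(probs_clean, probs_noisy, labels, alpha=1.0):
    """L_stabil = L_0(x, y) + alpha * D(y(x), y(x~)).

    The gold labels enter only through the clean term; the noisy input is
    tied to the clean one through the similarity term alone.
    """
    sim = sum(kl_divergence(p, q)
              for p, q in zip(probs_clean, probs_noisy)) / len(probs_clean)
    return nll(probs_clean, labels) + alpha * sim

# Toy example (made-up numbers): two tokens, three possible tags.
probs_clean = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]]
probs_noisy = [[0.5, 0.3, 0.2], [0.2, 0.5, 0.3]]
labels = [0, 1]

loss_same = stability_loss(probs_clean, probs_clean, labels)  # no penalty
loss_diff = stability_loss(probs_clean, probs_noisy, labels)  # penalized
```

Unlike the augmentation loss, the similarity term never looks at the gold labels for \tilde{x}, so in this sketch it vanishes exactly when the two output distributions coincide.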


Both approaches, especially stability training, achieved significant error reduction across all perturbation levels and all entity types.


  • The motivation is very clear and practical.
  • The idea is simple, but it works surprisingly well.
  • 5: Transformative: This paper is likely to change our field. It should be considered for a best paper award.
  • 4.5: Exciting: It changed my thinking on this topic. I would fight for it to be accepted.
  • 4: Strong: I learned a lot from it. I would like to see it accepted.
  • 3.5: Leaning positive: It can be accepted more or less in its current form. However, the work it describes is not particularly exciting and/or inspiring, so it will not be a big loss if people don’t see it in this conference.
  • 3: Ambivalent: It has merits (e.g., it reports state-of-the-art results, the idea is nice), but there are key weaknesses (e.g., I didn’t learn much from it, evaluation is not convincing, it describes incremental work). I believe it can significantly benefit from another round of revision, but I won’t object to accepting it if my co-reviewers are willing to champion it.
  • 2.5: Leaning negative: I am leaning towards rejection, but I can be persuaded if my co-reviewers think otherwise.
  • 2: Mediocre: I would rather not see it in the conference.
  • 1.5: Weak: I am pretty confident that it should be rejected.
  • 1: Poor: I would fight to have it rejected.
