Learning to Contextually Aggregate Multi-Source Supervision for Sequence...

Learning from multi-domain data is attractive yet non-trivial. This paper augments BiLSTM-CRF with very simple source-wise linear transforms followed by domain attention for multi-source sequence labeling.

Approach

The structure prediction score is defined as:

\begin{equation} s({x},{y}) = \sum_{t=1}^T (U_{t, {y}_{t}} + M_{{y}_{t-1},{y}_t}), \end{equation}

where $U \in \mathbb{R}^{T \times L}$ is the emission score matrix produced by the BiLSTM and $M \in \mathbb{R}^{L \times L}$ is the tag transition matrix.
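Concretely, the score of one tag sequence is just a sum of emission and transition entries. A minimal NumPy sketch (the `start` dummy tag for the $t=1$ transition is my assumption, not something spelled out in the paper):

```python
import numpy as np

def crf_score(U, M, y, start):
    """CRF score s(x, y) = sum_t (U[t, y_t] + M[y_{t-1}, y_t]).

    U: (T, L) emission scores from the BiLSTM for one sentence;
    M: (L, L) tag-transition scores; y: length-T tag index sequence;
    `start` is a dummy previous tag used at t = 1 (an assumption here).
    """
    s, prev = 0.0, start
    for t, tag in enumerate(y):
        s += U[t, tag] + M[prev, tag]
        prev = tag
    return s
```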

Their simple method is to transform the emission matrix $U$ and the transition matrix $M$ with a source-specific matrix $A^{(k)}$:

\begin{equation} s^{(k)}({x},{y}) = \sum_{t=1}^T \left((U A^{(k)})_{t, {y}_t} + (M A^{(k)})_{{y}_{t-1},{y}_t}\right). \end{equation}
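The per-source score then differs from the base score only by a right-multiplication with $A^{(k)}$. A sketch under the same assumptions as above (function and argument names are mine):

```python
import numpy as np

def source_score(U, M, A_k, y, start):
    """Score of tag sequence y under source k: both the emission matrix
    U (T x L) and the transition matrix M (L x L) are right-multiplied
    by the source-specific L x L transform A_k before scoring."""
    Uk, Mk = U @ A_k, M @ A_k
    s, prev = 0.0, start
    for t, tag in enumerate(y):
        s += Uk[t, tag] + Mk[prev, tag]
        prev = tag
    return s
```

With $A^{(k)}$ set to the identity this reduces to the shared score, which makes the transform easy to initialize.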

These linear transforms are trained jointly with the shared BiLSTM-CRF parameters; the joint learning itself is entirely standard.

To produce the final prediction, the authors combine the source-specific transforms via an attention over the sources:

\begin{align} \mathbf{A}_i^* = \sum_{k=1}^K {q}_{i,k} A^{(k)}, \end{align}

where $q_{i,k}$ is an attention score computed as $\mathbf{q}_i = \text{softmax}(\mathbf{Q} \mathbf{h}^{(i)})$, with $\mathbf{Q} \in \mathbb{R}^{K \times 2d}$ and $\mathbf{h}^{(i)} \in \mathbb{R}^{2d}$ the BiLSTM hidden state at position $i$.
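The aggregation step can be sketched as follows (array shapes and names are my reading of the formulas above, not the authors' code):

```python
import numpy as np

def aggregate_transform(A, h_i, Q):
    """Position-wise aggregation A_i^* = sum_k q_{i,k} A^{(k)},
    with q_i = softmax(Q h_i).

    A: (K, L, L) stack of source-specific transforms; h_i: (2d,) BiLSTM
    hidden state at position i; Q: (K, 2d) attention parameters.
    """
    logits = Q @ h_i                     # (K,) one score per source
    q = np.exp(logits - logits.max())
    q /= q.sum()                         # softmax over the K sources
    return np.tensordot(q, A, axes=1)    # (L, L) weighted combination
```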

Comments

Despite its crudity, the authors extend it to a 9-page ACL paper. The title and abstract are very attractive. However, the first three pages made me drowsy, and the rest fell below my expectations. It’s completely fine to propose a simple method, but it’s an affectation to write it up the way they did.

  • 5: Transformative: This paper is likely to change our field. It should be considered for a best paper award.
  • 4.5: Exciting: It changed my thinking on this topic. I would fight for it to be accepted.
  • 4: Strong: I learned a lot from it. I would like to see it accepted.
  • 3.5: Leaning positive: It can be accepted more or less in its current form. However, the work it describes is not particularly exciting and/or inspiring, so it will not be a big loss if people don’t see it in this conference.
  • 3: Ambivalent: It has merits (e.g., it reports state-of-the-art results, the idea is nice), but there are key weaknesses (e.g., I didn’t learn much from it, evaluation is not convincing, it describes incremental work). I believe it can significantly benefit from another round of revision, but I won’t object to accepting it if my co-reviewers are willing to champion it.
  • 2.5: Leaning negative: I am leaning towards rejection, but I can be persuaded if my co-reviewers think otherwise.
  • 2: Mediocre: I would rather not see it in the conference.
  • 1.5: Weak: I am pretty confident that it should be rejected.
  • 1: Poor: I would fight to have it rejected.