Embarrassingly Simple Unsupervised Aspect Extraction

This lovely paper presents an embarrassingly simple yet surprisingly high-performing method for unsupervised aspect extraction. The method needs only word vectors and a single attention head, with no transformers, to surpass recent complex neural models.

They simply use a POS tagger to extract aspect term candidates, denoted as A. The core of their model is called Contrastive Attention, which uses a Radial Basis Function (RBF) kernel to turn an unbounded distance between word vectors into a bounded similarity. The similarities between the embeddings of the words in a sentence S and the candidate aspect embeddings are used as attention scores to further select aspects.
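The candidate-extraction step could be sketched roughly as follows. This is only an illustration under assumptions not stated above: the input is already POS-tagged with Penn Treebank style tags, candidates are restricted to nouns, and only the `top_k` most frequent ones are kept. The function name `extract_candidates`, the parameters `noun_tags` and `top_k`, and the toy corpus are all hypothetical.

```python
from collections import Counter


def extract_candidates(tagged_sentences, noun_tags=("NN", "NNS"), top_k=2):
    """Return the top_k most frequent noun tokens as aspect candidates (the set A).

    Assumption: nouns are the candidates and frequency is the cutoff criterion;
    the note above only says that a POS tagger is used.
    """
    counts = Counter(
        token.lower()
        for sentence in tagged_sentences
        for token, tag in sentence
        if tag in noun_tags
    )
    return [word for word, _ in counts.most_common(top_k)]


# Hypothetical, hand-tagged toy corpus (no real tagger is called here).
tagged_corpus = [
    [("The", "DT"), ("pizza", "NN"), ("was", "VBD"), ("great", "JJ")],
    [("Friendly", "JJ"), ("staff", "NN"), ("and", "CC"), ("good", "JJ"), ("pizza", "NN")],
    [("The", "DT"), ("staff", "NN"), ("forgot", "VBD"), ("our", "PRP$"), ("drinks", "NNS")],
]

candidates = extract_candidates(tagged_corpus)
print(candidates)
```

In practice the tags would come from whatever POS tagger is at hand; the sketch only shows the filtering and counting.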

\DeclareMathOperator{\rbf}{rbf}
\DeclareMathOperator{\att}{att}
\DeclareMathOperator{\softmax}{softmax}
\DeclareMathOperator*{\argmax}{argmax}
\begin{align*}
\rbf(x, y, \gamma) &= \exp(-\gamma \|x - y\|^{2}_{2}) \\
\att(w) &= \frac{\sum_{a \in A} \rbf(w, a, \gamma)}{\sum_{w' \in S} \sum_{a \in A} \rbf(w', a, \gamma)}
\end{align*}

Note that A and S share the same lookup table, which consists of in-domain word2vec embeddings.
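To make the attention formula concrete, here is a minimal pure-Python sketch. It is not the authors' implementation: the 2-dimensional toy vectors, the value `gamma = 1.0`, and the function names are all made up for illustration, and the uniform fallback for a zero denominator is my own addition.

```python
import math


def rbf(x, y, gamma):
    """RBF kernel: exp(-gamma * squared Euclidean distance between x and y)."""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-gamma * sq_dist)


def contrastive_attention(sentence_vecs, aspect_vecs, gamma):
    """Attention weight per word: its summed RBF similarity to all candidate
    aspects, normalised over every word in the sentence."""
    raw = [sum(rbf(w, a, gamma) for a in aspect_vecs) for w in sentence_vecs]
    total = sum(raw)
    if total == 0.0:
        # Assumption: fall back to uniform weights if all similarities underflow.
        return [1.0 / len(raw)] * len(raw)
    return [r / total for r in raw]


# Hypothetical shared lookup table (toy values, not real word2vec vectors).
embeddings = {
    "the": [0.0, 0.0],
    "pizza": [1.0, 0.9],
    "was": [0.1, -0.1],
    "great": [-0.8, 0.7],
    "food": [1.0, 1.0],
    "staff": [-1.0, -1.0],
}

sentence = ["the", "pizza", "was", "great"]  # S
aspects = ["food", "staff"]                  # A

weights = contrastive_attention(
    [embeddings[w] for w in sentence],
    [embeddings[a] for a in aspects],
    gamma=1.0,
)
for word, weight in zip(sentence, weights):
    print(f"{word}: {weight:.3f}")
```

With these toy vectors, "pizza" lies closest to a candidate aspect and therefore receives the largest weight. Both the sentence words and the aspects are looked up in the same dictionary, mirroring the shared lookup table.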

Finally, the most similar aspect label (from the label set C: food, staff, etc.), measured by cosine similarity, is assigned to each aspect term. Again, because the labels and the words share the same lookup table, the comparison is meaningful.

\begin{align*}
\hat y = \argmax_{c \in C}(\cos(d, \vec{c}))
\end{align*}
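The label-assignment step can be sketched as below. The note does not define d, so this sketch assumes d is the attention-weighted sum of the sentence's word vectors; that reading, the toy vectors, the attention weights, and the helper names are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(ui * vi for ui, vi in zip(u, v))
    norm_u = math.sqrt(sum(ui * ui for ui in u))
    norm_v = math.sqrt(sum(vi * vi for vi in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def weighted_sum(vectors, weights):
    """Assumed definition of d: sum of word vectors scaled by attention weights."""
    dim = len(vectors[0])
    return [sum(w * v[i] for v, w in zip(vectors, weights)) for i in range(dim)]


def assign_label(d, label_vecs):
    """Pick the label c in C whose vector has the highest cosine similarity to d."""
    return max(label_vecs, key=lambda c: cosine(d, label_vecs[c]))


# Toy word vectors for "the pizza was great" and made-up attention weights.
sentence_vecs = [[0.0, 0.0], [1.0, 0.9], [0.1, -0.1], [-0.8, 0.7]]
attention = [0.17, 0.61, 0.16, 0.06]

# Label vectors come from the same (hypothetical) lookup table as the words.
label_vecs = {"food": [1.0, 1.0], "staff": [-1.0, -1.0]}

d = weighted_sum(sentence_vecs, attention)
predicted = assign_label(d, label_vecs)
print(predicted)  # food
```

Because the label vectors live in the same embedding space as the words, no extra projection is needed before taking the cosine.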


  • Fantastic: the method needs no training at all beyond the word vectors, which are easy to obtain.
  • Their results outperform many complex models.
  • A minor concern is that the method is evaluated on only one dataset, which does not demonstrate its generality.
