TexSmart: A Text Understanding System for Fine-Grained NER and Enhanced...

reviewer5 · January 28, 2022, 10:43pm

This technical report introduces TexSmart, an NLU system that supports NER/FGNER and clustering. TexSmart is built on top of clustering of tokens. The authors first mined many is-a relations from the web and clustered them into thousands of categories, which were then manually given hierarchical labels. At test time, a mention and its context are taken as input, and their similarity to each cluster is computed to predict the mention’s fine-grained label.
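To make the prediction step concrete, here is a minimal sketch of nearest-cluster labeling as I understand it from the summary above. It is not the authors' implementation: the cosine similarity, the use of cluster centroids, the toy 3-dimensional vectors, and the label names (`loc.city`, `person.actor`) are all assumptions for illustration only.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def predict_label(mention_vec, clusters):
    """Return (label, score) of the cluster whose centroid is most similar to the mention.

    `clusters` maps a (hypothetical) fine-grained label to the vectors of its members.
    """
    best_label, best_score = None, -1.0
    for label, members in clusters.items():
        score = cosine(mention_vec, centroid(members))
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score

# Toy data: made-up embeddings, not taken from the report.
clusters = {
    "loc.city": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "person.actor": [[0.1, 0.9, 0.2], [0.0, 0.8, 0.3]],
}
mention = [0.85, 0.15, 0.05]  # stands in for a mention-plus-context representation
label, score = predict_label(mention, clusters)
print(label, round(score, 3))
```

In a real system the mention vector would have to encode the context as well as the surface form; how the report does this is not covered by the summary, so the sketch simply takes a precomputed vector as input.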

Comments

The authors wrap a vanilla clustering technique in lots of fancy terms such as “semantic expansion”. But once you read that section, it is nothing but clustering.
Their Fine-Grained NER module is interesting to me. However, the most interesting part is not the technique but the hierarchical ontology. I understand their reason for manually labeling clusters instead of reusing an existing ontology such as WordNet, but I wonder whether their clusters are as intuitive as WordNet’s.
They should not sell their clustering as a “knowledge base”, because their clusters provide only “is-a” relations, whereas a KB usually offers a lot more!
The rest of their modules are not interesting to me, as most of them are outdated or underperform transformer models.

Rating

5: Transformative: This paper is likely to change our field. It should be considered for a best paper award.
4.5: Exciting: It changed my thinking on this topic. I would fight for it to be accepted.
4: Strong: I learned a lot from it. I would like to see it accepted.
3.5: Leaning positive: It can be accepted more or less in its current form. However, the work it describes is not particularly exciting and/or inspiring, so it will not be a big loss if people don’t see it in this conference.
3: Ambivalent: It has merits (e.g., it reports state-of-the-art results, the idea is nice), but there are key weaknesses (e.g., I didn’t learn much from it, evaluation is not convincing, it describes incremental work). I believe it can significantly benefit from another round of revision, but I won’t object to accepting it if my co-reviewers are willing to champion it.
2.5: Leaning negative: I am leaning towards rejection, but I can be persuaded if my co-reviewers think otherwise.
2: Mediocre: I would rather not see it in the conference.
1.5: Weak: I am pretty confident that it should be rejected.
1: Poor: I would fight to have it rejected.

0 voters