While using the free API for part-of-speech tagging, I came across an unfamiliar CTB tag, IC, and would like to know what it means.

Code used:

from hanlp_restful import HanLPClient
HanLP = HanLPClient('https://www.hanlp.com/api', auth=None, language='zh')
temp_res = HanLP.parse("继续搅拌一段时间T2至有机溶剂完全挥发;步骤5.经离心、洗涤、收集、真空干燥,制得头孢噻呋微球。", tasks='pos/ctb')
print(temp_res)

Result:

{
  "tok/fine": [
    ["继续", "搅拌", "一", "段", "时间", "T2", "至", "有机", "溶剂", "完全", "挥发", ";", "步骤", "5", ".", "经", "离心", "、", "洗涤", "、", "收集", "、", "真空", "干燥", ",", "制", "得", "头孢噻呋", "微", "球", "。"]
  ],
  "pos/ctb": [
    ["VV", "VV", "CD", "M", "NN", "IC", "P", "JJ", "NN", "AD", "VV", "PU", "NN", "CD", "PU", "P", "VV", "PU", "NN", "PU", "NN", "PU", "AD", "VA", "PU", "VV", "VV", "NN", "JJ", "NN", "PU"]
  ]
}

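As the tagging result above shows, "T2" is tagged as IC. However, the tag set listed on the official site, the ctb page of the HanLP Documentation (hankcs.com), does not include IC. What part of speech is IC?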


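Unfortunately, apart from the CTB authors, probably nobody knows. I checked The Bracketing Guidelines for the Penn Chinese Treebank (3.0) and Extending and Scaling up the Chinese Treebank Annotation, and neither mentions this tag. IC has several possible readings in linguistics: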

  1. independent clause. An IC is used to mark the boundary of an embedded clause in the sentence. E.g. [如何/r 多/a 方面/n 开辟/v 就业/vn 渠道/n]IC 是/v (how to provide more job opportunities is)
  2. immediate constituent. An immediate constituent is any one of the largest grammatical units that constitute a construction. Immediate constituents are often further reducible.
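  3. incomplete component. Looking through CTB9, I found that the vast majority of IC tags occur in broadcast conversation programs, for example 就是/AD 好/VV xin-/IC. So my guess is that IC stands for an incomplete component, in particular errors introduced by ASR.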
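
If you want to check which tokens receive tags outside the documented set, here is a minimal sketch. It works on the output pasted in the question, so it needs no network access. `KNOWN_CTB_TAGS` and `undocumented_tags` are names I made up for illustration, and the tag set is the 33-tag list from the original CTB POS guidelines typed in by hand, so treat it as an assumption and compare it against the HanLP documentation page before relying on it.

```python
# Output copied from the question above; no API call is made here.
result = {
    "tok/fine": [
        ["继续", "搅拌", "一", "段", "时间", "T2", "至", "有机", "溶剂", "完全",
         "挥发", ";", "步骤", "5", ".", "经", "离心", "、", "洗涤", "、", "收集",
         "、", "真空", "干燥", ",", "制", "得", "头孢噻呋", "微", "球", "。"]
    ],
    "pos/ctb": [
        ["VV", "VV", "CD", "M", "NN", "IC", "P", "JJ", "NN", "AD", "VV", "PU",
         "NN", "CD", "PU", "P", "VV", "PU", "NN", "PU", "NN", "PU", "AD", "VA",
         "PU", "VV", "VV", "NN", "JJ", "NN", "PU"]
    ],
}

# Assumption: hand-typed tag set from the original CTB POS guidelines.
KNOWN_CTB_TAGS = {
    "AD", "AS", "BA", "CC", "CD", "CS", "DEC", "DEG", "DER", "DEV", "DT",
    "ETC", "FW", "IJ", "JJ", "LB", "LC", "M", "MSP", "NN", "NR", "NT", "OD",
    "ON", "P", "PN", "PU", "SB", "SP", "VA", "VC", "VE", "VV",
}


def undocumented_tags(doc, known=KNOWN_CTB_TAGS):
    """Return (token, tag) pairs whose tag is not in `known`."""
    pairs = []
    # Each entry in "tok/fine" and "pos/ctb" is one sentence; walk them in parallel.
    for tokens, tags in zip(doc["tok/fine"], doc["pos/ctb"]):
        pairs.extend((tok, tag) for tok, tag in zip(tokens, tags) if tag not in known)
    return pairs


print(undocumented_tags(result))  # [('T2', 'IC')]
```

Running the same function over a larger batch of your own sentences should show whether IC only turns up on fragments like "T2" or "xin-", which would fit the incomplete-component guess.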

OK, thank you very much.