比如sequence labeling的场景就有这种需要,需对齐 ground truth labeling 和 hanlp tokenization。
文档结尾:
1 Like
您好,感谢回复。我想在multi task pipeline里面使用,好像不灵
我的code:
import hanlp
HanLP = hanlp.load(hanlp.pretrained.mtl.UD_ONTONOTES_TOK_POS_LEM_FEA_NER_SRL_DEP_SDP_CON_MMINILMV2L6)
HanLP.config.output_spans = True
result = HanLP(['round trip fares from baltimore to philadelphia less than 1000 dollars round trip',
'give me all flights from new york city to las vegas that arrive on a sunday'])
print(result['tok'][0])
Output: (no offset)
['round', 'trip', 'fares', 'from', 'baltimore', 'to', 'philadelphia', 'less', 'than', '1000', 'dollars', 'round', 'trip']
你得看MTL的教程:
1 Like
感谢!附上正确code
import hanlp
HanLP = hanlp.load(hanlp.pretrained.mtl.UD_ONTONOTES_TOK_POS_LEM_FEA_NER_SRL_DEP_SDP_CON_MMINILMV2L6)
HanLP['tok'].config.output_spans = True
然后直接使用HanLP
就可以看到token offset了
1 Like