Version: v1.7.8 (routine maintenance release)
Code to reproduce:
Segment segment = HanLP.newSegment().enableCustomDictionaryForcing(true).enableOffset(true).enableNumberQuantifierRecognize(true);
segment.seg("活动门做550MM高!");
Output:
粗分词网:
0:[ ]
1:[活, 活动, 活动门]
2:[动]
3:[门]
4:[做]
5:[550]
6:[]
7:[]
8:[MM]
9:[]
10:[高]
11:[!]
12:[ ]
粗分结果[活动门/nz, 做/v, 550/m, MM/nx, 高/a, !/w]
人名角色观察:[ K 1 A 1 ][活动门 A 20833310 ][做 L 180 K 29 ][550 L 11 ][MM A 20833310 ][高 B 3092 C 340 D 135 L 41 K 10 E 9 ][! L 24 K 19 ][ K 1 A 1 ]
人名角色标注:[ /K ,活动门/A ,做/K ,550/L ,MM/A ,高/K ,!/L , /A]
In the output above, the three-character word 活动门 is defined in my custom dictionary (I tried both the CSV and the TXT format) as:
活动门,glassDirection,2000
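However, the part-of-speech tagging result is surprising: 活动门 is tagged as nz. This tag seems to be forced, and I cannot change it through the custom dictionary.
Has anyone else run into this? The JSON-serialized result is: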
{"sentence":"活动门做550MM高!","segment":[{"word":"活动门","nature":{"ordinal":48,"name":"nz"},"offset":0},{"word":"做","nature":{"ordinal":64,"name":"v"},"offset":3},{"word":"550","nature":{"ordinal":94,"name":"m"},"offset":4},{"word":"MM","nature":{"ordinal":3,"name":"nx"},"offset":7},{"word":"高","nature":{"ordinal":74,"name":"a"},"offset":9},{"word":"!","nature":{"ordinal":130,"name":"w"},"offset":10}],"successful":true}
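
One thing that may be worth ruling out is the field separator of the dictionary entry. The sketch below is not HanLP code. It assumes, without having verified it against the v1.7.8 source, that the dictionary loader splits lines on commas for `.csv` files and on whitespace for every other file. The class `DictLineSketch`, its methods `splitterFor` and `fields`, and the file names `custom.csv` and `custom.txt` are all hypothetical and exist only for illustration.

```java
import java.util.Arrays;

// Hypothetical helper, NOT part of HanLP: splits one dictionary line into
// its fields (word, nature, frequency) the way a loader might.
class DictLineSketch {

    // Assumption: ".csv" files are comma-separated, all others whitespace-separated.
    static String splitterFor(String path) {
        return path.endsWith(".csv") ? "," : "\\s";
    }

    // Split a raw dictionary line using the separator implied by the file name.
    static String[] fields(String line, String path) {
        return line.split(splitterFor(path));
    }

    public static void main(String[] args) {
        String commaLine = "活动门,glassDirection,2000";
        String spaceLine = "活动门 glassDirection 2000";

        // Comma-separated entry in a .csv file: three fields.
        System.out.println(Arrays.toString(fields(commaLine, "custom.csv")));
        // The same entry in a .txt file: one field, so no nature or frequency is parsed.
        System.out.println(Arrays.toString(fields(commaLine, "custom.txt")));
        // Whitespace-separated entry in a .txt file: three fields.
        System.out.println(Arrays.toString(fields(spaceLine, "custom.txt")));
    }
}
```

If that assumption holds, the comma-separated entry would only carry the `glassDirection` nature when it sits in a `.csv` file, while a `.txt` dictionary would need the entry written as `活动门 glassDirection 2000`. This does not by itself explain the nz tag, which could also come from another dictionary that already contains 活动门.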