How do I train on a custom corpus in HanLP 2.0?

Hello Dr. He. Using the default corpora, my entity recognition results for a particular domain are not very good, and adding a custom dictionary only helps a little. Does HanLP 2.0 support training on a custom corpus? Searching the forum, all the questions I found were about corpus annotation, but how do I actually train on a custom corpus? Is there an interface that loads it directly, and what does that look like in Python?


Of course it does; the whole framework is modular. For an example, see:

For the corpus format, see: https://hanlp.hankcs.com/docs/api/hanlp/datasets/index.html
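For illustration, a tiny custom NER corpus in the tab-separated `token\ttag` layout (one token and its BIO tag per line, sentences separated by blank lines) might look like the sketch below. The loader is plain Python written just to show the structure, and the commented-out `fit` call at the end is an assumption about the training API, not a verified signature; consult the datasets page above for the exact format and interface your task expects.

```python
import os
import tempfile

# A two-sentence toy corpus: one token and its BIO tag per line,
# blank line between sentences.
SAMPLE = """\
阿	B-ORG
里	I-ORG
巴	I-ORG
巴	I-ORG
总	O
部	O

在	O
杭	B-LOC
州	I-LOC
"""

def load_tsv_ner(path):
    """Parse a token<TAB>tag file into a list of (tokens, tags) sentences."""
    sentences, tokens, tags = [], [], []
    with open(path, encoding='utf-8') as f:
        for line in f:
            line = line.rstrip('\n')
            if not line:  # blank line ends the current sentence
                if tokens:
                    sentences.append((tokens, tags))
                    tokens, tags = [], []
                continue
            token, tag = line.split('\t')
            tokens.append(token)
            tags.append(tag)
    if tokens:  # flush the last sentence if the file has no trailing blank line
        sentences.append((tokens, tags))
    return sentences

path = os.path.join(tempfile.mkdtemp(), 'trn.tsv')
with open(path, 'w', encoding='utf-8') as f:
    f.write(SAMPLE)

sentences = load_tsv_ner(path)
print(len(sentences))  # 2
print(sentences[1])    # (['在', '杭', '州'], ['O', 'B-LOC', 'I-LOC'])

# Hypothetical training call (names are illustrative -- check the docs):
# recognizer = TransformerNamedEntityRecognizer()
# recognizer.fit(trn_data=path, dev_data=path, save_dir='ner_model',
#                transformer='bert-base-chinese')
```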


I have the same scenario. In that demo, the script runs fine when only the NER task is kept, but as soon as I add `crf=True` to the NER task it throws an exception:

```
Traceback (most recent call last):
  File "/Users/onion/Library/Application Support/IntelliJIdea2019.3/python/helpers/pydev/pydevd.py", line 1434, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/Users/onion/Library/Application Support/IntelliJIdea2019.3/python/helpers/pydev/pydev_imps/pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/Users/onion/Work/NLP/HanlpV2/HanLP/plugins/hanlp_demo/hanlp_demo/zh/train/ner_demov4.py", line 33, in <module>
    mtl.fit(
  File "/Users/onion/Work/NLP/HanlpV2/HanLP/hanlp/components/mtl/multi_task_learning.py", line 641, in fit
    return super().fit(**merge_locals_kwargs(locals(), kwargs, excludes=('self', 'kwargs', '__class__', 'tasks')),
  File "/Users/onion/Work/NLP/HanlpV2/HanLP/hanlp/common/torch_component.py", line 276, in fit
    criterion = self.build_criterion(**merge_dict(config, trn=trn))
  File "/Users/onion/Work/NLP/HanlpV2/HanLP/hanlp/components/mtl/multi_task_learning.py", line 265, in build_criterion
    return dict((k, v.build_criterion(decoder=self.model.decoders[k], **kwargs)) for k, v in self.tasks.items())
  File "/Users/onion/Work/NLP/HanlpV2/HanLP/hanlp/components/mtl/multi_task_learning.py", line 265, in <genexpr>
    return dict((k, v.build_criterion(decoder=self.model.decoders[k], **kwargs)) for k, v in self.tasks.items())
  File "/Users/onion/Work/NLP/HanlpV2/HanLP/hanlp/components/taggers/tagger.py", line 33, in build_criterion
    model = self.model
AttributeError: 'TaggingNamedEntityRecognition' object has no attribute 'model'
```

It looks like `TaggingNamedEntityRecognition` reads the `model` attribute before the model has actually been built?
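The ordering problem described above can be reproduced with a plain-Python sketch (made-up class names, not HanLP's actual code): `build_criterion()` reads `self.model`, but `fit()` only assigns that attribute after the criterion is built, so the first access raises the same `AttributeError` as in the traceback.

```python
class Tagger:
    def build_criterion(self, **kwargs):
        # Fails if fit() has not assigned self.model yet.
        model = self.model
        return f'criterion for {model}'

    def fit(self):
        # Bug: the criterion is built BEFORE the model attribute exists.
        criterion = self.build_criterion()
        self.model = 'decoder'
        return criterion

try:
    Tagger().fit()
except AttributeError as e:
    print(e)  # 'Tagger' object has no attribute 'model'
```

Reordering `fit()` so the model is assigned before `build_criterion()` is called (or passing the model in explicitly) makes the sketch run through, which matches the diagnosis above.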

Thanks for the report; this has been fixed:

👍 I was still digging into the exception thrown from `decode` when I saw you had already fixed it here. Very quick turnaround!