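Dr. He, I have a few questions about HanLP 1.7.6 and would appreciate your help.
1. If I want to change how the perceptron's feature vectors are generated, should I subclass com.hankcs.hanlp.model.perceptron.instance.NERInstance and generate different features depending on which named entity type is being recognized?
2. To use my own custom NERInstance, do I need to subclass com.hankcs.hanlp.model.perceptron.PerceptronNERecognizer and override the three methods below? As far as I can tell, these are the only three methods that use Instance.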
import com.hankcs.hanlp.corpus.document.sentence.Sentence;
import com.hankcs.hanlp.model.perceptron.PerceptronNERecognizer;
import com.hankcs.hanlp.model.perceptron.feature.FeatureMap;
import com.hankcs.hanlp.model.perceptron.instance.Instance;
import com.hankcs.hanlp.model.perceptron.model.LinearModel;
import com.hankcs.hanlp.model.perceptron.tagset.NERTagSet;
import com.hankcs.hanlp.model.perceptron.utility.Utility;

import java.io.IOException;
import java.util.List;

// CustomsNERInstanceFactory is my own class; it returns the NERInstance subclass for the given label.
public class CustomsPerceptronNERecognizer extends PerceptronNERecognizer {
    private String nerLabel;

    public CustomsPerceptronNERecognizer(String nerLabel, LinearModel nerModel) {
        super(nerModel);
        this.nerLabel = nerLabel;
    }

    public CustomsPerceptronNERecognizer(String nerLabel, String nerModelPath) throws IOException {
        super(nerModelPath);
        this.nerLabel = nerLabel;
    }

    @Override
    public boolean learn(String segmentedTaggedNERSentence) {
        // Use the custom Instance for online learning
        return super.learn(createInstance(Sentence.create(segmentedTaggedNERSentence), model.featureMap));
    }

    @Override
    protected Instance createInstance(Sentence sentence, FeatureMap featureMap) {
        NERTagSet tagSet = (NERTagSet) featureMap.tagSet;
        List<String[]> collector = Utility.convertSentenceToNER(sentence, tagSet);
        String[] wordArray = new String[collector.size()];
        String[] posArray = new String[collector.size()];
        String[] nerArray = new String[collector.size()];
        Utility.reshapeNER(collector, wordArray, posArray, nerArray);
        // Use the custom Instance for training
        return CustomsNERInstanceFactory.getNERInstance(nerLabel, wordArray, posArray, nerArray, tagSet, featureMap);
    }

    @Override
    public String[] recognize(String[] wordArray, String[] posArray) {
        // Use the custom Instance for prediction
        return super.recognize(CustomsNERInstanceFactory.getNERInstance(nerLabel, wordArray, posArray, model.featureMap));
    }
}
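3. About custom corpora: suppose I need to recognize a named entity type labeled GeographicalLocation, but my corpus for it contains only a few thousand sentences. Would merging this corpus with the PKU corpus and training NER on the combined corpus give better results than training on those few thousand sentences alone?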
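To make question 3 concrete, the merge I have in mind is plain concatenation of the two corpus files before training. Below is a minimal sketch, assuming both corpora are UTF-8 text files with one annotated sentence per line. CorpusMerger, its method names, and the file paths are hypothetical names of mine, not part of HanLP.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a HanLP class: concatenates two line-based corpora.
final class CorpusMerger {

    /** Returns the general corpus followed by the domain corpus, skipping blank lines. */
    static List<String> merge(List<String> general, List<String> domain) {
        List<String> merged = new ArrayList<>();
        appendNonBlank(general, merged);
        appendNonBlank(domain, merged);
        return merged;
    }

    /** Reads both corpus files as UTF-8, merges them, and writes the result to output. */
    static void mergeFiles(Path generalCorpus, Path domainCorpus, Path output) throws IOException {
        List<String> merged = merge(
                Files.readAllLines(generalCorpus, StandardCharsets.UTF_8),
                Files.readAllLines(domainCorpus, StandardCharsets.UTF_8));
        Files.write(output, merged, StandardCharsets.UTF_8);
    }

    // Copies every line that contains something other than whitespace.
    private static void appendNonBlank(List<String> source, List<String> target) {
        for (String line : source) {
            if (!line.trim().isEmpty()) {
                target.add(line);
            }
        }
    }
}
```

For example, CorpusMerger.mergeFiles(Paths.get("pku.txt"), Paths.get("geo.txt"), Paths.get("merged.txt")) would produce the single training file, which I would then pass to the NER trainer in place of the small corpus (the three file names are placeholders).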