HanLP model licensing question

Hello,

My apologies, I don’t speak Chinese, so hopefully someone will be able to answer my question. I understand that HanLP is released under the Apache licence. My question is: does the licence cover only the Python package, or also all of the models (tokeniser and tagger) that I can download?

For example:
import hanlp
tokenizer = hanlp.load('PKU_NAME_MERGED_SIX_MONTHS_CONVSEG')
tagger = hanlp.load(hanlp.pretrained.pos.CTB5_POS_RNN_FASTTEXT_ZH)

Thanks,
Roger

Hi Roger,

Thank you for asking. It’s an open question: I’m not sure whether a model trained on a corpus must inherit the licence of that corpus. Stanford University is in exactly the same situation as us. Their statement on the Stanza models reads:

The copyright and licensing status of machine learning models is not very clear (to us). We list in the table below the Treebank License of the underlying data from which each language pack (set of machine learning models for a treebank) was trained. To the extent that The Trustees of Leland Stanford Junior University have ownership and rights over these language packs, all these Stanza language packs are made available under the Open Data Commons Attribution License v1.0.

We hold a research licence for the corpora, but that licence doesn’t permit commercial use. It’s safer to assume the Apache Licence doesn’t apply to the models.

Thanks for the detailed response. Much appreciated.

I’d like to ask which licence the current fine_electra_small_20220615_231803 model is under, and which corpus it was trained on. Thanks! @hankcs

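The tok page of the HanLP Documentation mentions that fine_electra_small_20220615_231803 was trained on fine-grained CWS corpora, but I couldn’t find any information about those CWS corpora. @hankcs (the mention above probably didn’t go through :joy:)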

Since we are the authors of the corpora mentioned above, we have now set the licence of this model to Apache License 2.0, so there is no copyright dispute.
