Can HanLP load a pretrained model that was downloaded to local disk beforehand?

eshaoliu · December 23, 2020, 8:20am

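Microsoft's (MSRA) NER pretrained model: https://file.hankcs.com/hanlp/ner/ner_bert_base_msra_20200104_185735.zip
Someone already asked about this in an issue, https://github.com/hankcs/HanLP/issues/1401, but no solution was given. My environment is unusual: it has no Internet access, so I want to download the model and load it from local disk, but loading keeps failing with: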
The identifier /media/cfs/liuhongru3/ccks2020-baseline-master/ner_bert_base_msra_20200104_185735.zip resolves to a non-exist meta file /media/cfs/liuhongru3/ccks2020-baseline-master/ner_bert_base_msra_20200104_185735.zip/meta.json.
Does anyone have a working example of loading a model from a local path?

hankcs · December 25, 2020, 5:02pm

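There is an extra "zip" in 185735.zip/meta.json. Do not unzip the file manually; put the zip at the location indicated in the message and it will be unzipped automatically. Alternatively, get everything working on a local machine first, then upload ~/.hanlp to the corresponding location on the server.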
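
The second suggestion (moving ~/.hanlp from a connected machine to the server) could be sketched with the Python standard library alone. This is a minimal sketch, not part of HanLP: `pack_cache` and `unpack_cache` are hypothetical helper names, and the archive name is an arbitrary choice.

```python
import os
import shutil


def pack_cache(cache_dir, archive_base):
    """Zip a whole cache directory and return the path of the archive."""
    cache_dir = os.path.expanduser(cache_dir)
    # make_archive appends ".zip" to archive_base and returns the full path.
    # Keep archive_base outside cache_dir so the archive does not include itself.
    return shutil.make_archive(archive_base, "zip", root_dir=cache_dir)


def unpack_cache(archive_path, target_dir):
    """Extract the archive into target_dir, creating the directory if needed."""
    target_dir = os.path.expanduser(target_dir)
    os.makedirs(target_dir, exist_ok=True)
    shutil.unpack_archive(archive_path, target_dir, "zip")
    return target_dir
```

On the connected machine one would call `pack_cache("~/.hanlp", "hanlp_cache")`, transfer `hanlp_cache.zip` by whatever means the environment allows, and call `unpack_cache("hanlp_cache.zip", "~/.hanlp")` on the server.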

eshaoliu · December 27, 2020, 8:39am

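If I pass hanlp.load a path without the .zip suffix, it tries to download the zip package. My environment is unusual and has no Internet access, so I need to load the model from local disk.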
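
Judging from the error message quoted in the first post, the loader looks for a meta.json file directly under the path it is given. A pre-flight check along those lines might look like the sketch below. `find_meta_json` is a hypothetical helper, not a HanLP function, and it only inspects the file system.

```python
import os


def find_meta_json(path):
    """Return the meta.json path under `path`, or None if it is missing."""
    path = os.path.expanduser(path)
    if not os.path.isdir(path):
        # A .zip file or a non-existent path is not an extracted model directory.
        return None
    meta = os.path.join(path, "meta.json")
    return meta if os.path.isfile(meta) else None
```

If the function returns None for the path being passed to hanlp.load, the path probably points at the zip file itself or at the wrong directory level.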

hankcs · February 4, 2021, 3:23am

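3 posts were merged into an existing topic: Regarding HanLP 2.1 models, is there a smaller model if only tokenization and part-of-speech tagging are needed?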

AliBug · February 4, 2021, 3:06am

Dr. He happens to be online, so let me ask a few more questions.

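I am testing in Docker with /root/.hanlp mounted as a volume. Once the test starts, two more files are still downloaded, and they do not seem to be stored under /root/.hanlp, because the same two files are downloaded again every time the container is restarted.
Where are these files saved?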

hankcs · February 4, 2021, 3:12am

Files downloaded with this kind of progress bar are fetched by Hugging Face transformers. See the "Server without Internet" section of the installation docs.
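
A minimal sketch for checking which cache directories exist on a machine. It assumes the default locations ~/.hanlp for HanLP and ~/.cache/huggingface for transformers; both are assumptions that can differ if the caches were relocated. `cache_status` is a hypothetical helper.

```python
import os

# Assumed default cache locations, relative to the user's home directory.
CACHE_DIRS = {
    "hanlp": (".hanlp",),
    "huggingface": (".cache", "huggingface"),
}


def cache_status(home=None):
    """Map each cache name to a (path, exists) pair."""
    if home is None:
        home = os.path.expanduser("~")
    status = {}
    for name, parts in CACHE_DIRS.items():
        path = os.path.join(home, *parts)
        status[name] = (path, os.path.isdir(path))
    return status
```

In a Docker setup like the one described above, every directory reported here would need to be on a mounted volume for its downloads to survive a container restart.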

AliBug · February 4, 2021, 3:17am

Got it. Looks like I need to read the docs more carefully.

yaven001 · March 12, 2021, 10:29am

Hello, I have a question: can the native API obtain the transformer model without accessing Hugging Face?

I have already downloaded hankcs's close_tok_pos_ner_srl_dep_sdp_con_electra_small_20210111_124159 to my local machine.

Thanks!

Gary · April 9, 2021, 6:48am

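When the network is down, HanLP 2.1 requests a cached_path while loading a model for tokenization. I am not sure what the purpose of this request is. How can I run HanLP offline?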

Failed to load https://file.hankcs.com/hanlp/mtl/ud_ontonotes_tok_pos_lem_fea_ner_srl_dep_sdp_con_mt5_small_20210228_123458.zip. See traceback below:
================================ERROR LOG BEGINS================================
Traceback (most recent call last):
  File "D:\python\env\venv\lib\site-packages\hanlp\utils\component_util.py", line 81, in load_from_meta_file
    obj.load(save_dir, verbose=verbose, **kwargs)
  File "D:\python\env\venv\lib\site-packages\hanlp\common\torch_component.py", line 173, in load
    self.load_config(save_dir, **kwargs)
  File "D:\python\env\venv\lib\site-packages\hanlp\common\torch_component.py", line 125, in load_config
    self.config[k] = Configurable.from_config(v)
  File "D:\python\env\venv\lib\site-packages\hanlp_common\configurable.py", line 30, in from_config
    return cls(**deserialized_config)
  File "D:\python\env\venv\lib\site-packages\hanlp\layers\embeddings\contextual_word_embedding.py", line 143, in __init__
    do_basic_tokenize=do_basic_tokenize)
  File "D:\python\env\venv\lib\site-packages\hanlp\layers\transformers\encoder.py", line 124, in build_transformer_tokenizer
    return AutoTokenizer.from_pretrained(transformer, use_fast=use_fast, do_basic_tokenize=do_basic_tokenize)
  File "D:\python\env\venv\lib\site-packages\transformers\models\auto\tokenization_auto.py", line 379, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "D:\python\env\venv\lib\site-packages\transformers\tokenization_utils_base.py", line 1750, in from_pretrained
    use_auth_token=use_auth_token,
  File "D:\python\env\venv\lib\site-packages\transformers\file_utils.py", line 1086, in cached_path
    local_files_only=local_files_only,
  File "D:\python\env\venv\lib\site-packages\transformers\file_utils.py", line 1265, in get_from_cache
    "Connection error, and we cannot find the requested files in the cached path."
ValueError: Connection error, and we cannot find the requested files in the cached path. Please try again or make sure your Internet connection is on.
=================================ERROR LOG ENDS=================================

YoungXu06 · April 13, 2021, 10:25am

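A question for the experts: I downloaded the model on a machine with Internet access and copied it to an offline machine, but loading still fails even when I specify save_dir. Why is that?
It seems that even after the HanLP model has been downloaded, loading still depends on some files from Hugging Face. Is there a way to solve this? Machines in an intranet production environment are usually offline.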

YoungXu06 · April 14, 2021, 2:13am

Solved! I followed: https://hanlp.hankcs.com/docs/install.html#server-without-internet
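
Once the caches are in place on the offline machine, transformers can also be told not to attempt any network access. The sketch below assumes the installed transformers version honors the TRANSFORMERS_OFFLINE environment variable (recent versions do; the versions used in this thread are not confirmed to). `enable_offline_mode` and `load_offline` are hypothetical helper names.

```python
import os


def enable_offline_mode():
    """Ask transformers to use only local files.

    The variable is read when transformers is imported, so this must run
    before transformers (or hanlp) is imported.
    """
    os.environ["TRANSFORMERS_OFFLINE"] = "1"


def load_offline(identifier):
    """Load a HanLP model after switching transformers to offline mode."""
    enable_offline_mode()
    import hanlp  # imported late so the environment variable is already set

    return hanlp.load(identifier)
```

This does not replace copying the cache directories; it only makes a missing file fail immediately instead of waiting on a connection attempt.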

CaesarL · May 5, 2022, 3:27pm

Where should the transformers files be placed?

JeremyRenner · September 27, 2023, 7:42am

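hanlp.load("tok/sighan2005_pku_bert_base_zh_20201231_141130")
I downloaded the model and put it in a custom path. Loading succeeds on my local machine but fails on a server that has no Internet access. The model path can be read, yet loading still fails, and I do not know why.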
Failed to load https://od.hankcs.com/hanlp/data/tok/sighan2005_pku_bert_base_zh_20201231_141130.zip