Failed to load https://file.hankcs.com/hanlp/mtl/ud_ontonotes_tok_pos_lem_fea_ner_srl_dep_sdp_con_xlm_base_20210602_211620.zip

Ugh, HanLP is driving me crazy.
I cloned HanLP from GitHub and ran two lines of code, and it failed with an error. Here is the code:

import hanlp
HanLP = hanlp.load(hanlp.pretrained.mtl.UD_ONTONOTES_TOK_POS_LEM_FEA_NER_SRL_DEP_SDP_CON_XLMR_BASE)

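The error message suggests that it needs to download a package, but whether I download it manually or let the program download it automatically, it still fails.
I then passed a different model to the load function, with the same result.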

HanLP = hanlp.load(hanlp.pretrained.mtl.CLOSE_TOK_POS_NER_SRL_DEP_SDP_CON_ELECTRA_SMALL_ZH)

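I then searched the issues and found that someone ran into something similar in February: https://github.com/hankcs/HanLP/issues/1618
In that issue the author says to use the latest master. I tried that and it still did not work. Further down, the author mentions revision 171bd2d, so I switched to it. With that revision CLOSE_TOK_POS_NER_SRL_DEP_SDP_CON_ELECTRA_SMALL_ZH runs, but UD_ONTONOTES_TOK_POS_LEM_FEA_NER_SRL_DEP_SDP_CON_XLMR_BASE still fails because its URL is out of date.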

What on earth am I supposed to do? I have been fighting with this for a whole day and night.

PS E:\project\sicp\py> python server.py
Decompressing C:\Users\dashe\AppData\Roaming\hanlp\hanlp\mtl\ud_ontonotes_tok_pos_lem_fea_ner_srl_dep_sdp_con_xlm_base_20210602_211620.zip to C:\Users\dashe\AppData\Roaming\hanlp\hanlp\mtl
Failed to load https://file.hankcs.com/hanlp/mtl/ud_ontonotes_tok_pos_lem_fea_ner_srl_dep_sdp_con_xlm_base_20210602_211620.zip. See traceback below:
================================ERROR LOG BEGINS================================
Traceback (most recent call last):
  File "D:\Python\Python39\lib\site-packages\hanlp-2.1.0a51-py3.9.egg\hanlp\utils\component_util.py", line 81, in load_from_meta_file
    obj.load(save_dir, verbose=verbose, **kwargs)
  File "D:\Python\Python39\lib\site-packages\hanlp-2.1.0a51-py3.9.egg\hanlp\common\torch_component.py", line 173, in load
    self.load_config(save_dir, **kwargs)
  File "D:\Python\Python39\lib\site-packages\hanlp-2.1.0a51-py3.9.egg\hanlp\common\torch_component.py", line 125, in load_config
    self.config[k] = Configurable.from_config(v)
  File "D:\Python\Python39\lib\site-packages\hanlp_common-0.0.9-py3.9.egg\hanlp_common\configurable.py", line 30, in from_config
    return cls(**deserialized_config)
  File "D:\Python\Python39\lib\site-packages\hanlp-2.1.0a51-py3.9.egg\hanlp\layers\embeddings\contextual_word_embedding.py", line 141, in __init__
    self.transformer_tokenizer = AutoTokenizer.from_pretrained(self.transformer,
  File "D:\Python\Python39\lib\site-packages\hanlp-2.1.0a51-py3.9.egg\hanlp\layers\transformers\pt_imports.py", line 65, in from_pretrained
    tokenizer = cls.from_pretrained(get_mirror(transformer), use_fast=use_fast, do_basic_tokenize=do_basic_tokenize,
  File "D:\Python\Python39\lib\site-packages\transformers\models\auto\tokenization_auto.py", line 523, in from_pretrained
    tokenizer_config = get_tokenizer_config(pretrained_model_name_or_path, **kwargs)
  File "D:\Python\Python39\lib\site-packages\transformers\models\auto\tokenization_auto.py", line 416, in get_tokenizer_config
    resolved_config_file = cached_path(
  File "D:\Python\Python39\lib\site-packages\transformers\file_utils.py", line 1347, in cached_path
    raise ValueError(f"unable to parse {url_or_filename} as a URL or as a local path")
ValueError: unable to parse C:\Users\dashe\AppData\Roaming\hanlp\hanlp\transformers\xlm-roberta-base_20210526_112208\tokenizer_config.json as a URL or as a local path
=================================ERROR LOG ENDS=================================
If the problem still persists, please submit an issue to https://github.com/hankcs/HanLP/issues
When reporting an issue, make sure to paste the FULL ERROR LOG above and the system info below.
OS: Windows-10-10.0.19041-SP0
Python: 3.9.2
PyTorch: 1.9.0+cpu
HanLP: 2.1.0-alpha.51

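Using revision 171bd2d and manually renaming the latest package to the old package name, I got it to run. But the latest master does not run at all. Doesn't that count as a bug?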

You have not run the installation.

https://hanlp.hankcs.com/docs/contributing.html#development

I followed the documentation and the installation succeeded, but I still get the same error.

(pure_hanlp) C:\Users\chentao\clone_dir\HanLP>python -m unittest discover ./tests
Failed to load https://file.hankcs.com/hanlp/mtl/close_tok_pos_ner_srl_dep_sdp_con_electra_small_20210111_124159.zip. See traceback below:
================================ERROR LOG BEGINS================================
Traceback (most recent call last):
  File "C:\Users\chentao\clone_dir\HanLP\hanlp\utils\component_util.py", line 81, in load_from_meta_file
    obj.load(save_dir, verbose=verbose, **kwargs)
  File "C:\Users\chentao\clone_dir\HanLP\hanlp\common\torch_component.py", line 173, in load
    self.load_config(save_dir, **kwargs)
  File "C:\Users\chentao\clone_dir\HanLP\hanlp\common\torch_component.py", line 125, in load_config
    self.config[k] = Configurable.from_config(v)
  File "c:\users\chentao\clone_dir\hanlp\plugins\hanlp_common\hanlp_common\configurable.py", line 30, in from_config
    return cls(**deserialized_config)
  File "C:\Users\chentao\clone_dir\HanLP\hanlp\layers\embeddings\contextual_word_embedding.py", line 141, in __init__
    self.transformer_tokenizer = AutoTokenizer.from_pretrained(self.transformer,
  File "C:\Users\chentao\clone_dir\HanLP\hanlp\layers\transformers\pt_imports.py", line 65, in from_pretrained
    tokenizer = cls.from_pretrained(get_mirror(transformer), use_fast=use_fast, do_basic_tokenize=do_basic_tokenize,
  File "C:\Users\chentao\.conda\envs\pure_hanlp\lib\site-packages\transformers\models\auto\tokenization_auto.py", line 523, in from_pretrained
    tokenizer_config = get_tokenizer_config(pretrained_model_name_or_path, **kwargs)
  File "C:\Users\chentao\.conda\envs\pure_hanlp\lib\site-packages\transformers\models\auto\tokenization_auto.py", line 416, in get_tokenizer_config
    resolved_config_file = cached_path(
  File "C:\Users\chentao\.conda\envs\pure_hanlp\lib\site-packages\transformers\file_utils.py", line 1347, in cached_path
    raise ValueError(f"unable to parse {url_or_filename} as a URL or as a local path")
ValueError: unable to parse C:\Users\chentao\AppData\Roaming\hanlp\hanlp\transformers\electra_zh_small_20210520_124451\tokenizer_config.json as a URL or as a local path
=================================ERROR LOG ENDS=================================
If the problem still persists, please submit an issue to https://github.com/hankcs/HanLP/issues
When reporting an issue, make sure to paste the FULL ERROR LOG above and the system info below.
OS: Windows-10-10.0.19042-SP0
Python: 3.8.10
PyTorch: 1.9.0+cpu
HanLP: 2.1.0-alpha.52
.E
======================================================================
ERROR: test_mtl (unittest.loader._FailedTest)
----------------------------------------------------------------------
ImportError: Failed to import test module: test_mtl
Traceback (most recent call last):
  File "C:\Users\chentao\clone_dir\HanLP\hanlp\utils\component_util.py", line 81, in load_from_meta_file
    obj.load(save_dir, verbose=verbose, **kwargs)
  File "C:\Users\chentao\clone_dir\HanLP\hanlp\common\torch_component.py", line 173, in load
    self.load_config(save_dir, **kwargs)
  File "C:\Users\chentao\clone_dir\HanLP\hanlp\common\torch_component.py", line 125, in load_config
    self.config[k] = Configurable.from_config(v)
  File "c:\users\chentao\clone_dir\hanlp\plugins\hanlp_common\hanlp_common\configurable.py", line 30, in from_config
    return cls(**deserialized_config)
  File "C:\Users\chentao\clone_dir\HanLP\hanlp\layers\embeddings\contextual_word_embedding.py", line 141, in __init__
    self.transformer_tokenizer = AutoTokenizer.from_pretrained(self.transformer,
  File "C:\Users\chentao\clone_dir\HanLP\hanlp\layers\transformers\pt_imports.py", line 65, in from_pretrained
    tokenizer = cls.from_pretrained(get_mirror(transformer), use_fast=use_fast, do_basic_tokenize=do_basic_tokenize,
  File "C:\Users\chentao\.conda\envs\pure_hanlp\lib\site-packages\transformers\models\auto\tokenization_auto.py", line 523, in from_pretrained
    tokenizer_config = get_tokenizer_config(pretrained_model_name_or_path, **kwargs)
  File "C:\Users\chentao\.conda\envs\pure_hanlp\lib\site-packages\transformers\models\auto\tokenization_auto.py", line 416, in get_tokenizer_config
    resolved_config_file = cached_path(
  File "C:\Users\chentao\.conda\envs\pure_hanlp\lib\site-packages\transformers\file_utils.py", line 1347, in cached_path
    raise ValueError(f"unable to parse {url_or_filename} as a URL or as a local path")
ValueError: unable to parse C:\Users\chentao\AppData\Roaming\hanlp\hanlp\transformers\electra_zh_small_20210520_124451\tokenizer_config.json as a URL or as a local path

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\chentao\.conda\envs\pure_hanlp\lib\unittest\loader.py", line 436, in _find_test_path
    module = self._get_module_from_name(name)
  File "C:\Users\chentao\.conda\envs\pure_hanlp\lib\unittest\loader.py", line 377, in _get_module_from_name
    __import__(name)
  File "C:\Users\chentao\clone_dir\HanLP\tests\test_mtl.py", line 6, in <module>
    mtl = hanlp.load(hanlp.pretrained.mtl.CLOSE_TOK_POS_NER_SRL_DEP_SDP_CON_ELECTRA_SMALL_ZH, devices=-1)
  File "C:\Users\chentao\clone_dir\HanLP\hanlp\__init__.py", line 43, in load
    return load_from_meta_file(save_dir, 'meta.json', verbose=verbose, **kwargs)
  File "C:\Users\chentao\clone_dir\HanLP\hanlp\utils\component_util.py", line 121, in load_from_meta_file
    exit(1)
SystemExit: 1


----------------------------------------------------------------------
Ran 2 tests in 0.000s

FAILED (errors=1)

Successfully installed alnlp-1.0.0rc27 chardet-4.0.0 click-8.0.1 colorama-0.4.4 filelock-3.0.12 hanlp-2.1.0a52 hanlp-downloader-0.0.22 huggingface-hub-0.0.12 idna-2.10 joblib-1.0.1 numpy-1.21.0 packaging-21.0 pynvml-11.0.0 pyparsing-2.4.7 pyyaml-5.4.1 regex-2021.7.6 requests-2.25.1 sacremoses-0.0.45 sentencepiece-0.1.96 six-1.16.0 termcolor-1.1.0 tokenizers-0.10.3 toposort-1.5 torch-1.9.0 tqdm-4.61.1 transformers-4.8.2 typing-extensions-3.10.0.0 urllib3-1.26.6


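Thanks for the feedback. This problem is caused by a bug in third-party huggingface code and affects some Windows systems. Every HanLP commit is tested on three operating systems: ubuntu-latest, macos-latest, and windows-latest.

The Windows version tested there is Windows Server 2019; consumer Windows 10 is not covered.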
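
One possible explanation for why only Windows is affected, offered as an assumption rather than something confirmed in this thread: a drive-letter path such as C:\... looks like a URL with the scheme "c" to Python's standard URL parser, so code that decides between "URL" and "local file" by looking at the scheme may report a missing file as an unparseable URL. The sketch below uses only the standard library. The helper name url_scheme and the POSIX comparison path are made up for illustration; the Windows path is copied from the ValueError in the log above.

```python
from urllib.parse import urlparse

# Path copied from the ValueError in the log above.
WINDOWS_PATH = (
    r"C:\Users\dashe\AppData\Roaming\hanlp\hanlp\transformers"
    r"\xlm-roberta-base_20210526_112208\tokenizer_config.json"
)
# A POSIX-style path for comparison (hypothetical, not from the thread).
POSIX_PATH = "/home/dashe/.hanlp/transformers/tokenizer_config.json"


def url_scheme(path: str) -> str:
    """Return the scheme that urllib assigns to `path` ('' if none)."""
    return urlparse(path).scheme


if __name__ == "__main__":
    # The drive letter is reported as a one-letter scheme.
    print(repr(url_scheme(WINDOWS_PATH)))  # 'c'
    # A path without a colon has no scheme at all.
    print(repr(url_scheme(POSIX_PATH)))    # ''
```

If that reading is right, the message in the log would mean "this file does not exist" rather than "this path is malformed".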

This third-party bug has now been fixed. Please upgrade with pip install hanlp -U
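
After upgrading, it can help to confirm which versions your Python environment actually has installed. This is a minimal sketch using only the standard library (importlib.metadata, available from Python 3.8). The thread does not say which release contains the fix, so the script only prints the versions and leaves the comparison to the reader; installed_version is a hypothetical helper name.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # The failing logs in this thread report HanLP 2.1.0-alpha.51 and
    # 2.1.0-alpha.52, so a newer version should show up after the upgrade.
    for name in ("hanlp", "transformers"):
        print(name, installed_version(name))
```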

Thank you, thank you, thank you!

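You could create a tokenizer_config.json file under C:\Users\dashe\AppData\Roaming\hanlp\hanlp\transformers\xlm-roberta-base_20210526_112208 containing nothing but an empty dictionary {}. The recent ernie-gram model has the same problem, and I have already solved it.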
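
The workaround above can be sketched in a few lines of Python. This assumes the model directory already exists and only lacks tokenizer_config.json; ensure_tokenizer_config is a hypothetical helper name, and the directory in the example is the one from the error message earlier in the thread, so adjust it to the path shown in your own traceback.

```python
import json
from pathlib import Path


def ensure_tokenizer_config(transformer_dir) -> Path:
    """Create an empty tokenizer_config.json in `transformer_dir` if missing."""
    config_path = Path(transformer_dir) / "tokenizer_config.json"
    if not config_path.exists():
        # An empty JSON object, as suggested in the post above.
        config_path.write_text(json.dumps({}), encoding="utf-8")
    return config_path


if __name__ == "__main__":
    model_dir = Path(
        r"C:\Users\dashe\AppData\Roaming\hanlp\hanlp\transformers"
        r"\xlm-roberta-base_20210526_112208"
    )
    if model_dir.is_dir():
        print(ensure_tokenizer_config(model_dir))
    else:
        print(f"{model_dir} not found; use the path from your own traceback")
```

An existing tokenizer_config.json is left untouched, so running the script more than once is harmless.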
