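Good evening, Teacher He and everyone in the group! Sorry to bother you all. While following the textbook, I ran into errors in the places below and couldn't work out the cause. After a lot of trial and error I still couldn't find a relevant answer online, so I would be grateful for any advice from the teacher and the group. Many thanks! 2020-04-08T16:00:00Z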
1. Chapter 2, p. 87: running demo_stopwords.py as-is raises an error:
https://github.com/hankcs/pyhanlp/blob/master/tests/book/ch02/demo_stopwords.py
Traceback (most recent call last):
  File "", line 58, in <module>
    trie = load_from_file(HanLP.Config.CoreStopWordDictionaryPath)
  File "", line 14, in load_from_file
    for word in src:
UnicodeDecodeError: 'gbk' codec can't decode byte 0x90 in position 2519: illegal multibyte sequence
------------------------------------------------------------
2. Chapter 3, p. 99: running sighan05_statistics.py as-is raises an error:
https://github.com/hankcs/pyhanlp/blob/master/tests/book/ch03/sighan05_statistics.py
Traceback (most recent call last):
  File "", line 38, in <module>
    (data.upper(),) + count_corpus(train_path, test_path)))
  File "", line 10, in count_corpus
    train_counter, train_freq, train_chars = count_word_freq(train_path)
  File "", line 22, in count_word_freq
    for line in src:
UnicodeDecodeError: 'gbk' codec can't decode byte 0x84 in position 26: illegal multibyte sequence
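Both tracebacks fail on the line that iterates over an open file (`for ... in src:`) with the 'gbk' codec. One plausible explanation, an assumption not verified here, is that the scripts call `open()` without an `encoding` argument, so on a Chinese-locale Windows system Python falls back to cp936 (its alias for GBK) while the data files are actually UTF-8. The sketch below reproduces that mismatch on a small temporary file instead of the real HanLP data; `read_words` and the sample words are hypothetical.

```python
import locale
import tempfile
from pathlib import Path

# Hypothetical sample: two stop words, saved as UTF-8 (assumed to match the HanLP data files).
SAMPLE = "的\n了\n"


def read_words(path, encoding):
    """Read one word per line, decoding with an explicitly chosen codec."""
    with open(path, encoding=encoding) as src:
        return [line.strip() for line in src]


# The codec open() uses when no encoding is given (cp936 on Chinese-locale Windows).
print("default encoding:", locale.getpreferredencoding(False))

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "stopwords.txt"
    path.write_text(SAMPLE, encoding="utf-8")

    # Simulate the locale default on a Chinese Windows machine: UTF-8 bytes read as GBK.
    try:
        read_words(path, "gbk")
        gbk_failed = False
    except UnicodeDecodeError as err:
        gbk_failed = True
        print("gbk:", err)

    # Naming the file's real encoding explicitly sidesteps the locale default.
    words = read_words(path, "utf-8")
    print("utf-8:", words)
```

If the assumption holds, passing `encoding='utf-8'` to the relevant `open()` calls, or running Python 3.7+ in UTF-8 mode (`python -X utf8` or `PYTHONUTF8=1`), may avoid the error.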
------------------------------------------------------------
3. Chapter 3, p. 101: running demo_corpus_loader.py as-is, the Chinese text is displayed as garbled characters.
https://github.com/hankcs/pyhanlp/blob/master/tests/book/ch03/demo_corpus_loader.py
------------------------------------------------------------
4. Chapter 3, p. 103: the Chinese text in my_cws_model.txt is garbled:
…\static\data\test\my_cws_model.txt
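Items 3 and 4 may be the same UTF-8/GBK mismatch seen from the display side: if the program output or the model file is UTF-8 but the console or text editor decodes it as GBK, the characters come out garbled. This is an assumption, not something confirmed in this post. The sketch below shows the effect on a short string taken from the textbook output; no HanLP code is involved.

```python
# Hypothetical sample text (taken from the textbook output in item 6).
original = "社会摇摆"

# The bytes a UTF-8 file or program would produce for this text.
raw = original.encode("utf-8")

# What a viewer that assumes GBK would show; undecodable byte pairs become U+FFFD.
garbled = raw.decode("gbk", errors="replace")
print("viewed as gbk:  ", garbled)

# Decoding the same bytes with the codec they were written in restores the text.
restored = raw.decode("utf-8")
print("viewed as utf-8:", restored)
```

Reopening my_cws_model.txt in an editor with the encoding explicitly set to UTF-8 would be a quick way to test this guess.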
------------------------------------------------------------
5. Chapter 3, p. 114: running ngram_segment.py gives output that differs from the textbook:
https://github.com/hankcs/pyhanlp/blob/master/tests/book/ch03/ngram_segment.py
My output:
[' ', '商品和服', '务', ' ']
Textbook output:
[' ', '商品', '和', '服务', ' ']
------------------------------------------------------------
6. Chapter 3, p. 117: the output of demo_custom_dict.py differs from the textbook:
https://github.com/hankcs/pyhanlp/blob/master/tests/book/ch03/demo_custom_dict.py
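My output:
No custom dictionary: [社会摇摆简称社会摇/n]
Low-priority dictionary: [社会摇摆简称社会摇/n]
High-priority dictionary: [社会摇/nz, 摆简称/n, 社会摇/nz]
Textbook output:
No custom dictionary: [社会/n, 摇摆/v, 简称/v, 社会/n, 摇/n]
Low-priority dictionary: [社会/n, 摇摆/v, 简称/v, 社会摇/nz]
High-priority dictionary: [社会摇/nz, 摆简称/n, 社会摇/nz]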
------------------------------------------------------------