Is it possible to train on the CTB9 data?

The results should be better than with CTB6, especially on social media data:
https://catalog.ldc.upenn.edu/docs/LDC2016T13/README.txt
Genres:
Newswire: [0001-0325, 0400-0454, 0500-0540, 0600-0885, 0900-0931, 4000-4050]
Magazine articles: [0590-0596, 1001-1151]
Broadcast news: [2000-3145, 4051-4111]
Broadcast conversations: [4112-4197]
Weblogs: [4198-4411]
Discussion forums: [5000-5558]
SMS/Chat messages: [6000-6700]
conversational speech: [7000-7017]
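
To pick out the social media portion for training or evaluation, the ranges listed above can be turned into a small lookup. This is only a minimal sketch: `GENRE_RANGES` and `genre_of` are hypothetical names, and it assumes the numeric file ID has already been extracted from each file name.

```python
# File-ID ranges copied from the genre list above (bounds are inclusive).
# GENRE_RANGES and genre_of are illustrative names, not part of any library.
GENRE_RANGES = {
    "Newswire": [(1, 325), (400, 454), (500, 540), (600, 885), (900, 931), (4000, 4050)],
    "Magazine articles": [(590, 596), (1001, 1151)],
    "Broadcast news": [(2000, 3145), (4051, 4111)],
    "Broadcast conversations": [(4112, 4197)],
    "Weblogs": [(4198, 4411)],
    "Discussion forums": [(5000, 5558)],
    "SMS/Chat messages": [(6000, 6700)],
    "Conversational speech": [(7000, 7017)],
}


def genre_of(file_id):
    """Return the genre name for a numeric file ID, or None if it is unlisted."""
    for genre, ranges in GENRE_RANGES.items():
        for low, high in ranges:
            if low <= file_id <= high:
                return genre
    return None


# Genres that roughly correspond to "social media" for this request (an assumption).
SOCIAL_MEDIA = {"Weblogs", "Discussion forums", "SMS/Chat messages"}


def is_social_media(file_id):
    """True if the file ID falls in one of the social media genres above."""
    return genre_of(file_id) in SOCIAL_MEDIA
```

For example, `is_social_media(6000)` is true for an SMS/Chat file, while `genre_of(350)` returns `None` because that ID is not in any listed range.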

Hopefully this would improve on the current results. Example:
"武书连英文2019中国大学排行榜 清华浙大北大。" =>
["武书", "连", "英文", "2019", "中国", "大学", "排行榜", " ", "清", "华浙", "大北大", "。"]