| LIMIT-BERT : Linguistics Informed Multi-Task BERT | 0 | 792 | October 18, 2021 |
| Mengzi: Towards Lightweight yet Ingenious Pre-trained Models for Chinese | 0 | 945 | October 18, 2021 |
| What Context Features Can Transformer Language Models Use? | 0 | 880 | September 22, 2021 |
| Question: why do both ltp and hanlp use electra rather than other pre-trained models? | 2 | 1955 | September 11, 2021 |
| KM-BART: Knowledge Enhanced Multimodal BART for Visual Commonsense Generation | 0 | 1130 | September 3, 2021 |
| Are Pretrained Convolutions Better than Pretrained Transformers? | 0 | 808 | August 18, 2021 |
| ERNIE-Gram: Pre-Training with Explicitly N-Gram Masked Language Modeling for... | 0 | 1431 | August 7, 2021 |
| Multi-view Subword Regularization | 0 | 788 | July 19, 2021 |
| Enabling Language Models to Fill in the Blanks | 0 | 696 | June 30, 2021 |
| Stolen Probability: A Structural Weakness of Neural Language Models | 0 | 875 | May 27, 2021 |
| Exploring the Limits of Transfer Learning with a Unified Text-to-Text... | 0 | 1077 | May 14, 2021 |
| Spelling Error Correction with Soft-Masked BERT | 0 | 1555 | May 9, 2021 |
| Structured Pruning of Large Language Models | 0 | 942 | February 25, 2021 |
| Losing Heads in the Lottery: Pruning Transformer Attention in Neural Machine... | 0 | 784 | February 23, 2021 |
| A Mixture of h - 1 Heads is Better than h Heads | 0 | 723 | February 20, 2021 |
| FastBERT: a Self-distilling BERT with Adaptive Inference Time | 0 | 1596 | February 19, 2021 |
| How does BERT’s attention change when you fine-tune? An analysis methodology... | 0 | 1074 | February 19, 2021 |
| What Does BERT Learn about the Structure of Language? | 0 | 1558 | January 25, 2021 |
| How can I train my own model on top of an existing model? | 2 | 1659 | December 11, 2020 |
| BPE-Dropout: Simple and Effective Subword Regularization | 0 | 1959 | October 17, 2020 |
| SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language... | 0 | 1479 | September 21, 2020 |
| Reproduced FastBert from ACL 2020; feel free to try it out and send feedback | 1 | 1139 | August 17, 2020 |