BPE-Dropout: Simple and Effective Subword Regularization

BPE tokenization is the de facto technique for reducing vocabulary size: it splits a word into subwords by following a merge table, which is learned by repeatedly merging the most frequent pair of subwords into a larger one until the desired vocabulary size is reached. This paper introduces a very simple method that improves generalization by randomly dropping some of the merges in that table during segmentation.

As shown in Figure 1, the word “unrelated” can receive several different segmentations during training, while only the original segmentation in subfigure 1 is used during inference. The method yields statistically significant improvements on several MT datasets.
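To make the train/inference difference concrete, here is a minimal sketch of the idea, not the authors' implementation. The function name `bpe_segment`, the `dropout` parameter, and the merge table `TOY_MERGES` are all made up for illustration; a real merge table would be learned from a corpus. At each step, every applicable merge is skipped with probability `dropout`, and the highest-priority surviving merge is applied.

```python
import random

# Hypothetical toy merge table, highest priority first (NOT learned from data).
TOY_MERGES = [
    ("e", "d"), ("u", "n"), ("r", "e"), ("a", "t"),
    ("at", "ed"), ("re", "l"), ("rel", "ated"),
]


def bpe_segment(word, merges, dropout=0.0, rng=None):
    """Segment `word` with a BPE merge table.

    Each applicable merge is dropped with probability `dropout` at every
    step; dropout=0.0 gives the usual deterministic BPE segmentation.
    """
    rng = rng or random
    ranks = {pair: rank for rank, pair in enumerate(merges)}
    symbols = list(word)  # start from single characters
    while len(symbols) > 1:
        # Adjacent pairs that are in the table and survive dropout this step.
        candidates = [
            (ranks[pair], i)
            for i, pair in enumerate(zip(symbols, symbols[1:]))
            if pair in ranks and rng.random() >= dropout
        ]
        if not candidates:
            break  # nothing left to merge (or everything was dropped)
        _, i = min(candidates)  # lowest rank = highest priority
        symbols[i:i + 2] = [symbols[i] + symbols[i + 1]]
    return symbols


# Inference: no dropout, always the same segmentation.
print(bpe_segment("unrelated", TOY_MERGES))  # -> ['un', 'related']

# Training: the segmentation can change from call to call.
rng = random.Random(0)
for _ in range(3):
    print(bpe_segment("unrelated", TOY_MERGES, dropout=0.3, rng=rng))
```

With `dropout=1.0` every merge is dropped and the word falls back to single characters, so the dropout rate controls how far the training segmentations drift from the standard one.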


Personally, I like this kind of “simple” method, since the performance boost is obtained for free, which greatly benefits production systems. My only regret is that the authors didn’t try their dropout on any pretrained Transformers.

