O guia definitivo para roberta pires
Nomes Masculinos A B C D E F G H I J K L M N Este P Q R S T U V W X Y Z TodosThe original BERT uses a subword-level tokenization with the vocabulary size of 30K which is learned after input preprocessing and using several heuristics. RoBERTa uses bytes instead of unicode characters as the base for subwords and expands the vocabulary size up to 50K