Arxiv URL: https://arxiv.org/abs/2103.10385
Authors: Xiao Liu, Yanan Zheng, Zhengxiao Du, Ming Ding, Yujie Qian, Zhilin Yang, Jie Tang
The paper proposes P-tuning, a method that uses trainable continuous prompt embeddings to improve the performance of GPTs on natural language understanding (NLU) tasks. With P-tuning, GPTs perform better than or comparably to similar-sized BERTs on NLU tasks, and the method substantially improves on the previous best result on the knowledge probing (LAMA) benchmark. P-tuning also improves BERTs’ performance in both few-shot and supervised settings while reducing the need for prompt engineering. The results suggest that language models contain much more world knowledge and prior task knowledge than previously assumed.
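The idea of trainable continuous prompt embeddings can be illustrated with a minimal PyTorch sketch. It is a simplified illustration, not the authors' implementation: the class name `ContinuousPrompt`, the sizes, and the randomly initialized stand-in embedding table are all hypothetical, and freezing the embedding table is an assumption made here only to show that the prompt vectors are trained by gradient descent.

```python
import torch
import torch.nn as nn


class ContinuousPrompt(nn.Module):
    """Prepends trainable prompt vectors to a batch of token embeddings (hypothetical helper)."""

    def __init__(self, num_prompt_tokens: int, hidden_size: int):
        super().__init__()
        # One trainable vector per pseudo-token; these are the only new parameters.
        self.prompt = nn.Parameter(torch.randn(num_prompt_tokens, hidden_size) * 0.02)

    def forward(self, token_embeds: torch.Tensor) -> torch.Tensor:
        # token_embeds has shape (batch, seq_len, hidden_size).
        batch_size = token_embeds.size(0)
        prompt = self.prompt.unsqueeze(0).expand(batch_size, -1, -1)
        # Result has shape (batch, num_prompt_tokens + seq_len, hidden_size).
        return torch.cat([prompt, token_embeds], dim=1)


# Stand-in for a pre-trained model's embedding table (assumed sizes, random weights).
vocab_size, hidden_size, num_prompt_tokens = 100, 16, 4
word_embeddings = nn.Embedding(vocab_size, hidden_size)
word_embeddings.weight.requires_grad_(False)  # assumption: keep these weights frozen

prompt_layer = ContinuousPrompt(num_prompt_tokens, hidden_size)

input_ids = torch.randint(0, vocab_size, (2, 5))
inputs_embeds = prompt_layer(word_embeddings(input_ids))

# Backpropagating any loss updates only the prompt vectors in this setup.
inputs_embeds.sum().backward()
```

In practice the combined `inputs_embeds` tensor would be fed to the language model in place of ordinary token embeddings, so the prompt is learned in embedding space rather than searched for as discrete words.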
Key Insights & Learnings:
- With P-tuning, GPTs can be as competitive as BERTs in natural language understanding, and P-tuning can boost the performance of pre-trained language models.
- P-tuning is a general method to improve GPTs and BERTs in both few-shot and fully-supervised settings.
- Language models have grasped more world knowledge and prior-task knowledge during pre-training than previously thought.
- Giant models suffer from poor transferability, and fine-tuning on downstream tasks hardly works for trillion-scale models.
- Searching for handcrafted prompts relies heavily on large validation sets and can result in overfitting.
Terms Mentioned: natural language understanding, pre-training, language models, GPT, BERT, P-tuning, knowledge probing, LAMA, SuperGLUE, few-shot, supervised learning, world knowledge, prior task knowledge, transferability, fine-tuning, downstream tasks, trillion-scale models, handcrafted prompts, overfitting
Technologies / Libraries Mentioned: PyTorch