This paper provides a comprehensive overview of the current state of knowledge on reasoning in Large Language Models (LLMs), including techniques for improving and eliciting reasoning in these models, methods and benchmarks for evaluating reasoning abilities, findings and implications of previous research in this area, and suggestions for future directions.
The paper presents a method to extend the context length of BERT, a Transformer-based model for natural language processing, by combining token-based memory storage with segment-level recurrence in a Recurrent Memory Transformer (RMT). The method enables the model to store task-specific information across inputs of up to 2 million tokens while maintaining high memory retrieval accuracy.
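The following is a minimal sketch of the segment-level recurrence idea, assuming a generic `backbone` encoder that maps embedding sequences to hidden states; the class name, memory size, and initialization are illustrative rather than taken from the paper.

```python
from typing import List

import torch
import torch.nn as nn

class RecurrentMemorySketch(nn.Module):
    """Illustrative sketch of segment-level recurrence with memory tokens,
    in the spirit of RMT; not the authors' implementation."""

    def __init__(self, backbone: nn.Module, hidden_size: int, num_memory_tokens: int = 10):
        super().__init__()
        self.backbone = backbone  # any module mapping (batch, seq, hidden) -> same shape
        self.memory = nn.Parameter(torch.randn(num_memory_tokens, hidden_size) * 0.02)
        self.num_mem = num_memory_tokens

    def forward(self, segment_embeddings: List[torch.Tensor]) -> List[torch.Tensor]:
        batch = segment_embeddings[0].size(0)
        mem = self.memory.unsqueeze(0).expand(batch, -1, -1)  # initial memory state
        outputs = []
        for seg in segment_embeddings:
            # Prepend the running memory to the segment, run the backbone,
            # then split the updated memory off and carry it to the next segment.
            hidden = self.backbone(torch.cat([mem, seg], dim=1))
            mem, seg_out = hidden[:, : self.num_mem], hidden[:, self.num_mem :]
            outputs.append(seg_out)
        return outputs
```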
The paper introduces a new mechanism called Distilling Step-by-Step that trains smaller models to outperform larger language models (LLMs) while using less training data and smaller model sizes. The mechanism extracts LLM-generated rationales and uses them as additional supervision for small models within a multi-task training framework.
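As a rough illustration of that multi-task setup, the sketch below combines a label-prediction loss with a rationale-generation loss; the function name and the `lambda_rationale` weight are illustrative assumptions, not the paper's exact objective.

```python
import torch.nn.functional as F

def distill_step_by_step_loss(label_logits, label_targets,
                              rationale_logits, rationale_targets,
                              lambda_rationale: float = 1.0):
    """Sketch of a multi-task objective: the student predicts the task label
    and also generates the LLM-provided rationale as token-level targets."""
    label_loss = F.cross_entropy(label_logits, label_targets)
    rationale_loss = F.cross_entropy(
        rationale_logits.view(-1, rationale_logits.size(-1)),
        rationale_targets.view(-1),
        ignore_index=-100,  # skip padded rationale positions
    )
    return label_loss + lambda_rationale * rationale_loss
```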
The paper discusses the limitations of large language models (LMs) and proposes a neuro-symbolic architecture called the Modular Reasoning, Knowledge and Language (MRKL) system that combines LMs with external knowledge sources and discrete reasoning modules to overcome these limitations.
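A toy sketch of the routing idea behind such a neuro-symbolic setup is shown below: structured sub-tasks (here, arithmetic) are dispatched to a discrete module, and everything else falls back to the language model. The `calculator` module, the regex, and the `llm` callable are illustrative assumptions, not the MRKL system's actual interface.

```python
import re

def calculator(expression: str) -> str:
    """Toy discrete reasoning module: exact arithmetic outside the LM."""
    if not re.fullmatch(r"[\d\s+\-*/().]+", expression):
        raise ValueError("not a pure arithmetic expression")
    return str(eval(expression))  # acceptable here: input is whitelisted above

def mrkl_route(query: str, llm) -> str:
    """Toy router: send arithmetic sub-tasks to the calculator module,
    everything else to the language model callable."""
    match = re.search(r"([\d\s+\-*/().]{3,})", query)
    if "calculate" in query.lower() and match:
        return calculator(match.group(1).strip())
    return llm(query)

# e.g. mrkl_route("Please calculate 12 * (3 + 4)", llm=lambda q: "LM answer") -> "84"
```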
The paper presents a multilingual pre-trained language model named MiLMo that performs well on minority-language tasks, covering Mongolian, Tibetan, Uyghur, Kazakh, and Korean. The authors also construct a minority multilingual text classification dataset named MiTC and train a word2vec model for each language as a baseline for comparison with the pre-trained representations.
This paper surveys recent advances in Large Language Models (LLMs), Transformer models pre-trained on large-scale corpora. The paper discusses the background, key findings, and mainstream techniques of LLMs, focusing on pre-training, adaptation tuning, utilization, and capacity evaluation.
The paper explores prompt tuning, a mechanism for learning soft prompts that condition frozen language models for specific downstream tasks. The approach outperforms GPT-3's few-shot learning and becomes more competitive with full model tuning as model size grows. Prompt tuning also confers benefits in robustness to domain transfer and enables efficient prompt ensembling.
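A minimal sketch of the idea follows, assuming a frozen language model that accepts input embeddings directly; the wrapper class, prompt length, and initialization are illustrative.

```python
import torch
import torch.nn as nn

class SoftPromptWrapper(nn.Module):
    """Sketch of prompt tuning: trainable prompt vectors are prepended to the
    input embeddings of a frozen language model; only the prompt is updated."""

    def __init__(self, frozen_lm: nn.Module, embed_dim: int, prompt_length: int = 20):
        super().__init__()
        self.frozen_lm = frozen_lm
        for p in self.frozen_lm.parameters():
            p.requires_grad = False  # keep the backbone frozen
        self.soft_prompt = nn.Parameter(torch.randn(prompt_length, embed_dim) * 0.02)

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        batch = input_embeds.size(0)
        prompt = self.soft_prompt.unsqueeze(0).expand(batch, -1, -1)
        # Assumes the frozen model accepts embedding sequences directly.
        return self.frozen_lm(torch.cat([prompt, input_embeds], dim=1))
```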
The paper proposes a novel method called P-tuning, which employs trainable continuous prompt embeddings to improve the performance of GPTs on natural language understanding (NLU) tasks. The method makes GPTs better than or comparable to similar-sized BERTs on NLU tasks and also substantially improves BERTs' performance in both few-shot and fully-supervised settings, while reducing the need for manual prompt engineering.
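The sketch below illustrates the prompt-encoder component commonly described for P-tuning, where pseudo-token embeddings are re-parameterized through a small LSTM and MLP before being placed into the input sequence; the layer sizes and the bidirectional LSTM choice are assumptions for illustration, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class PromptEncoder(nn.Module):
    """Sketch of a P-tuning-style prompt encoder: pseudo-token embeddings are
    re-parameterized through a small LSTM and MLP to produce continuous prompts."""

    def __init__(self, num_prompt_tokens: int, embed_dim: int):
        super().__init__()
        self.pseudo_embeddings = nn.Embedding(num_prompt_tokens, embed_dim)
        self.lstm = nn.LSTM(embed_dim, embed_dim, batch_first=True, bidirectional=True)
        self.mlp = nn.Sequential(
            nn.Linear(2 * embed_dim, embed_dim),
            nn.ReLU(),
            nn.Linear(embed_dim, embed_dim),
        )

    def forward(self) -> torch.Tensor:
        ids = torch.arange(self.pseudo_embeddings.num_embeddings)
        x = self.pseudo_embeddings(ids).unsqueeze(0)  # (1, prompts, dim)
        x, _ = self.lstm(x)                           # let neighbouring prompts interact
        return self.mlp(x).squeeze(0)                 # (prompts, dim) continuous prompts
```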