Sign in Subscribe

BERT

Towards Reasoning in Large Language Models: A Survey - Summary

This paper provides a comprehensive overview of the current state of knowledge on reasoning in Large Language Models (LLMs), including techniques for improving and eliciting reasoning in these models, methods and benchmarks for evaluating reasoning abilities, findings and implications of previous r

Scaling Transformer to 1M tokens and beyond with RMT - Summary

The paper presents a method to extend the context length of BERT, a Transformer-based model in natural language processing, by incorporating token-based memory storage and segment-level recurrence with recurrent memory (RMT). The method enables the model to store task-specific information across up

Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes - Summary

The paper introduces a new mechanism called Distilling step-by-step that trains smaller models to outperform larger language models (LLMs) while using less training data and smaller model sizes. The mechanism extracts LLM rationales as additional supervision for small models within a multi-task tra

MRKL Systems: A modular, neuro-symbolic architecture that combines large language models, external knowledge sources and discrete reasoning - Summary

The paper discusses the limitations of large language models (LMs) and proposes a neuro-symbolic architecture called the Modular Reasoning, Knowledge and Language (MRKL) system that combines LMs with external knowledge sources and discrete reasoning modules to overcome these limitations.

MiLMo:Minority Multilingual Pre-trained Language Model - Summary

The paper presents a multilingual pre-trained language model named MiLMo that performs better on minority language tasks, including Mongolian, Tibetan, Uyghur, Kazakh and Korean. The authors also construct a minority multilingual text classification dataset named MiTC, and train a word2vec model fo

A Survey of Large Language Models - Summary

This paper surveys the recent advances in Large Language Models (LLMs), which are pre-trained Transformer models over large-scale corpora. The paper discusses the background, key findings, and mainstream techniques of LLMs, focusing on pre-training, adaptation tuning, utilization, and capacity eval

The Power of Scale for Parameter-Efficient Prompt Tuning - Summary

The paper explores prompt tuning, a mechanism for learning soft prompts to condition frozen language models for specific downstream tasks. The approach outperforms GPT-3's few-shot learning and becomes more competitive with scale. Prompt tuning confers benefits in robustness to domain transfer and