Language models

Discovering Language Model Behaviors with Model-Written Evaluations - Summary

The paper explores the use of language models (LMs) to automatically generate evaluations for testing LM behaviors. The generated evaluations are diverse and of high quality, and the approach is significantly cheaper, lower effort, and faster than manual data creation. The paper discovers new cases

We're Afraid Language Models Aren't Modeling Ambiguity - Summary

The paper discusses the importance of managing ambiguity in natural language understanding and evaluates the ability of language models (LMs) to recognize and disentangle possible meanings. The authors present AMBIENT, a linguist-annotated benchmark of 1,645 examples with diverse kinds of ambiguity

Just Tell Me: Prompt Engineering in Business Process Management - Summary

The paper discusses how prompt engineering can leverage pre-trained language models for business process management (BPM) tasks. It identifies the potential and the challenges of prompt engineering for BPM research.

Self-Consistency Improves Chain of Thought Reasoning in Language Models - Summary

The paper proposes a new decoding strategy called self-consistency to improve the performance of chain-of-thought prompting in language models for complex reasoning tasks. Self-consistency first samples a diverse set of reasoning paths and then selects the most consistent answer by marginalizing out...
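The sample-then-vote idea in this summary can be sketched as a simple majority vote over final answers. This is a minimal illustration, not the paper's implementation: `sample_path` is a hypothetical stand-in for one stochastic model call that returns a `(reasoning, answer)` pair, and the names `majority_answer` and `self_consistency` are invented here.

```python
from collections import Counter
from typing import Callable, List, Tuple


def majority_answer(answers: List[str]) -> Tuple[str, float]:
    """Return the most frequent answer and the fraction of samples that agree."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    counts = Counter(answers)
    # most_common breaks ties in favor of the answer seen first.
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(answers)


def self_consistency(
    sample_path: Callable[[str], Tuple[str, str]],
    prompt: str,
    n_samples: int = 10,
) -> Tuple[str, float]:
    """Sample several reasoning paths and vote on their final answers.

    `sample_path` is an assumption: any callable that runs the model once
    with sampling enabled and returns (reasoning_text, final_answer).
    """
    # Only the final answers are compared; the reasoning text is discarded.
    answers = [sample_path(prompt)[1] for _ in range(n_samples)]
    return majority_answer(answers)
```

In practice `sample_path` would wrap a model call with a nonzero sampling temperature, so that repeated calls yield different reasoning paths; the agreement fraction gives a rough sense of how consistent the samples were.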

GPT Understands, Too - Summary

The paper proposes a novel method called P-tuning, which employs trainable continuous prompt embeddings to improve the performance of GPTs on natural language understanding (NLU) tasks. The method is shown to be better than or comparable to similar-sized BERTs on NLU tasks and substantially improve

Training language models to follow instructions with human feedback - Summary

The paper presents a method for aligning language models with user intent by fine-tuning with human feedback. The resulting models, called InstructGPT, show improvements in truthfulness and reductions in toxic output generation while having minimal performance regressions on public NLP datasets.

Language Models are Few-Shot Learners - Summary

The paper discusses the limitations of pre-trained language representations in NLP systems and the need for task-specific datasets and fine-tuning. The authors show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior...