The paper introduces Distilling step-by-step, a mechanism that trains smaller models to outperform larger language models (LLMs) while using less training data. The mechanism extracts LLM rationales as additional supervision for small models within a multi-task tra
The paper discusses the use of prompt engineering to leverage pre-trained language models for business process management (BPM) tasks. It identifies the potential and challenges of prompt engineering for BPM research.
Choosing an LLM from 20+ models available today is hard. We explore Elo ratings as a method to objectively rank and pick the best performers for our use case.
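As a rough illustration of how Elo ratings could rank models, the sketch below applies the standard Elo update to a list of pairwise results. The function names, the K-factor of 32, the starting rating of 1000, and the idea that each match is a head-to-head comparison of two models' outputs are all assumptions made for this example, not details from the entry above.

```python
def expected_score(rating_a, rating_b):
    """Standard Elo expected score of A against B (between 0 and 1)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a, rating_b, score_a, k=32.0):
    """Return updated (rating_a, rating_b) after one match.

    score_a is 1.0 if A wins, 0.0 if B wins, 0.5 for a draw.
    k is the K-factor (assumed value; it controls how fast ratings move).
    """
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - exp_a)
    # B's actual and expected scores are the complements of A's.
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b


def rank_models(matches, initial=1000.0, k=32.0):
    """Process (model_a, model_b, score_a) tuples in order and
    return (model, rating) pairs sorted from best to worst."""
    ratings = {}
    for a, b, score_a in matches:
        ra = ratings.setdefault(a, initial)
        rb = ratings.setdefault(b, initial)
        ratings[a], ratings[b] = update_elo(ra, rb, score_a, k)
    return sorted(ratings.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical pairwise results between three placeholder models.
matches = [
    ("model_a", "model_b", 1.0),
    ("model_a", "model_c", 1.0),
    ("model_b", "model_c", 0.5),
]
ranking = rank_models(matches)
```

Because each update depends on the ratings at the time of the match, the final order can vary with the order in which matches are processed; shuffling and averaging over several passes is one common way to reduce that effect.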
The paper proposes a novel method called P-tuning, which employs trainable continuous prompt embeddings to improve the performance of GPTs on natural language understanding (NLU) tasks. With P-tuning, GPTs are shown to perform better than or comparably to similar-sized BERTs on NLU tasks and substantially improve
The paper presents a method for aligning language models with user intent by fine-tuning with human feedback. The resulting models, called InstructGPT, show improvements in truthfulness and reductions in toxic output generation while having minimal performance regressions on public NLP datasets. Th
The paper proposes Low-Rank Adaptation (LoRA) as an approach to reduce the number of trainable parameters for downstream tasks in natural language processing. LoRA injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable
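To make the parameter reduction in the LoRA entry concrete, the sketch below compares the trainable parameters of one full weight matrix of shape d x k with those of a rank-r update B A, where B is d x r and A is r x k. The function name and the example sizes (d = k = 4096, r = 8) are illustrative assumptions, not figures from the summary.

```python
def lora_param_counts(d, k, r):
    """Return (full, low_rank) trainable-parameter counts for one layer.

    full:     fine-tuning the whole d x k weight matrix.
    low_rank: training only B (d x r) and A (r x k), with the
              original weights kept frozen.
    """
    full = d * k
    low_rank = r * (d + k)
    return full, low_rank


# Illustrative sizes only (assumed for this example).
full, low_rank = lora_param_counts(d=4096, k=4096, r=8)
reduction = full / low_rank  # how many times fewer parameters are trained
```

The saving grows as r shrinks relative to d and k, which is why a small rank can cut the trainable parameters of a layer by orders of magnitude.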
The paper discusses the limitations of pre-trained language representations in NLP systems and the need for task-specific datasets and fine-tuning. The authors show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with pri