⭐ Building Reliable LLM Apps: 5 Things To Know In this blog post, we explore a roadmap for building reliable large language model applications. Let’s get started!
⭐️ Decoding OpenAI Evals Learn how to use OpenAI's Evals framework to evaluate models & prompts and optimise LLM systems for the best outputs.
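To give a feel for what a match-style eval does under the hood, here is a minimal Python sketch that scores a model on prompt/ideal pairs with exact-match accuracy. The `complete_fn` callback and the JSONL field names are illustrative placeholders, not the Evals framework's actual API (the real framework registers evals via YAML and runs them with the `oaieval` CLI).

```python
import json

def exact_match_eval(samples_path, complete_fn):
    """Score a model on prompt/ideal pairs with exact-match accuracy.

    `complete_fn` is a placeholder for whatever function calls your model
    (e.g. a thin wrapper around a chat completions API).
    """
    correct, total = 0, 0
    with open(samples_path) as f:
        for line in f:
            sample = json.loads(line)  # e.g. {"input": "...", "ideal": "..."}
            prediction = complete_fn(sample["input"]).strip()
            correct += int(prediction == sample["ideal"].strip())
            total += 1
    return correct / total if total else 0.0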
⭐️ Ranking LLMs with Elo Ratings Choosing an LLM from the 50+ models available today is hard. We explore Elo ratings as a method to objectively rank models and pick the best performers for our use case.
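As a quick illustration of the mechanics, here is a minimal Elo update in Python after one head-to-head comparison between two models. The K-factor of 32 and the starting rating of 1000 are assumptions for the sketch; the post's exact parameters may differ.

```python
def expected_score(rating_a, rating_b):
    """Probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def update_elo(rating_a, rating_b, score_a, k=32):
    """Update both ratings after one comparison.

    `score_a` is 1.0 if A's output was preferred, 0.0 if B's was, 0.5 for a tie.
    """
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - exp_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b

# Example: two models start at 1000; A's answer is judged better.
print(update_elo(1000, 1000, 1.0))  # A gains ~16 points, B loses ~16
```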
Self-Consistency Improves Chain of Thought Reasoning in Language Models - Summary The paper proposes a new decoding strategy called self-consistency to improve the performance of chain-of-thought prompting in language models for complex reasoning tasks. Self-consistency first samples a diverse set of reasoning paths and then selects the most consistent answer by marginalizing out the sampled reasoning paths.
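A rough sketch of the idea in Python, assuming a hypothetical `sample_fn` that calls an LLM at non-zero temperature and an `extract_answer` helper that parses the final answer out of a reasoning path; a simple majority vote stands in for marginalizing over reasoning paths.

```python
from collections import Counter

def self_consistency_answer(question, sample_fn, extract_answer, n=10):
    """Sample several chain-of-thought completions and majority-vote the answer."""
    answers = []
    for _ in range(n):
        reasoning_path = sample_fn(f"{question}\nLet's think step by step.")
        answers.append(extract_answer(reasoning_path))
    # The most frequent final answer is taken as the "most consistent" one.
    return Counter(answers).most_common(1)[0][0]
```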
The Power of Scale for Parameter-Efficient Prompt Tuning - Summary The paper explores prompt tuning, a mechanism for learning soft prompts to condition frozen language models for specific downstream tasks. The approach outperforms GPT-3's few-shot learning and becomes more competitive with scale. Prompt tuning confers benefits in robustness to domain transfer and enables efficient prompt ensembling.
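A minimal PyTorch-style sketch of the idea, assuming a frozen language model whose input embeddings we can intercept before the transformer layers; the prompt length, embedding dimension, and initialization scale are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class SoftPrompt(nn.Module):
    """Learnable prompt embeddings prepended to a frozen model's input embeddings."""

    def __init__(self, prompt_length, embed_dim):
        super().__init__()
        self.prompt = nn.Parameter(torch.randn(prompt_length, embed_dim) * 0.02)

    def forward(self, input_embeds):
        # input_embeds: (batch, seq_len, embed_dim) from the frozen model's embedding layer
        batch = input_embeds.size(0)
        prompt = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([prompt, input_embeds], dim=1)

# Training updates only soft_prompt.parameters(); the language model stays frozen.
soft_prompt = SoftPrompt(prompt_length=20, embed_dim=768)
```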
GPT Understands, Too - Summary The paper proposes a novel method called P-tuning, which employs trainable continuous prompt embeddings to improve the performance of GPTs on natural language understanding (NLU) tasks. The method is shown to be better than or comparable to similar-sized BERTs on NLU tasks and substantially improves performance in both few-shot and fully supervised settings.
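To show how P-tuning differs from plain prompt tuning, here is a rough PyTorch sketch of a P-tuning-style prompt encoder, which reparameterizes pseudo-token embeddings through a small LSTM and MLP before they are spliced into the input sequence; the layer sizes and number of prompt tokens are assumptions, not the paper's exact setup.

```python
import torch
import torch.nn as nn

class PromptEncoder(nn.Module):
    """Continuous prompts produced by an LSTM + MLP over trainable pseudo-token embeddings."""

    def __init__(self, num_prompt_tokens, embed_dim, hidden_dim=256):
        super().__init__()
        self.pseudo_embeds = nn.Parameter(torch.randn(num_prompt_tokens, embed_dim) * 0.02)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, bidirectional=True, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(2 * hidden_dim, embed_dim),
            nn.ReLU(),
            nn.Linear(embed_dim, embed_dim),
        )

    def forward(self):
        out, _ = self.lstm(self.pseudo_embeds.unsqueeze(0))  # (1, n, 2 * hidden_dim)
        return self.mlp(out).squeeze(0)  # (n, embed_dim) continuous prompt embeddings
```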
Large Language Models Are Human-Level Prompt Engineers - Summary The paper proposes Automatic Prompt Engineer (APE), an algorithm that generates and selects natural language instructions for large language models (LLMs) to improve task performance. APE treats the instruction as a program and optimizes it by searching over a pool of instruction candidates proposed by an LLM, selecting the candidate that maximizes a chosen score function.
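A simplified sketch of that generate-and-select loop, assuming hypothetical `propose_fn` and `complete_fn` wrappers around an LLM, and exact-match accuracy on a held-out set in place of the paper's score functions.

```python
def automatic_prompt_engineer(demo_pairs, eval_pairs, propose_fn, complete_fn, n_candidates=20):
    """Ask an LLM to propose candidate instructions from input/output demos,
    then keep the candidate that scores best on a held-out evaluation set."""
    demos = "\n".join(f"Input: {x}\nOutput: {y}" for x, y in demo_pairs)
    candidates = [
        propose_fn(
            "I gave a friend an instruction. Based on these examples,\n"
            f"{demos}\nThe instruction was:"
        )
        for _ in range(n_candidates)
    ]

    def score(instruction):
        hits = sum(
            complete_fn(f"{instruction}\nInput: {x}\nOutput:").strip() == y
            for x, y in eval_pairs
        )
        return hits / len(eval_pairs)

    return max(candidates, key=score)
```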