The paper introduces Mixtral 8x7B, a Sparse Mixture of Experts (SMoE) language model that outperforms existing models like Llama 2 70B and GPT-3.5 on various benchmarks. It uses a routing network to select two experts per token, giving each token access to 47B parameters while actively using only 13B, which keeps inference cost close to that of a much smaller dense model.
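A minimal sketch of the top-2 routing idea may help make this concrete; the layer sizes, expert architecture, and names below are illustrative, not Mixtral's actual configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoELayer(nn.Module):
    def __init__(self, dim=512, num_experts=8, hidden=2048):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)   # gating network
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, hidden), nn.SiLU(), nn.Linear(hidden, dim))
            for _ in range(num_experts)
        ])

    def forward(self, x):                            # x: (num_tokens, dim)
        logits = self.router(x)                      # (num_tokens, num_experts)
        weights, idx = logits.topk(2, dim=-1)        # pick 2 experts per token
        weights = F.softmax(weights, dim=-1)         # renormalize their gates
        out = torch.zeros_like(x)
        for k in range(2):                           # mix the two expert outputs
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return out
```

Because only the two selected experts' feed-forward networks run for any given token, the active parameter count per token stays far below the model's total parameter count.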
This paper presents a method for compressing prompts in large language models (LLMs) to accelerate model inference and reduce cost. The method involves a budget controller, a token-level iterative compression algorithm, and an instruction-tuning-based method for distribution alignment. Experimental results show substantial compression ratios with little loss in downstream task performance.
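To illustrate the general idea (not the paper's full algorithm), here is a minimal sketch of perplexity-guided compression: a small language model scores each token's surprisal, and only the most informative tokens are kept under a budget. The use of GPT-2 and the `keep_ratio` parameter are assumptions for illustration:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
lm = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def compress(prompt: str, keep_ratio: float = 0.5) -> str:
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = lm(input_ids=ids).logits
    # Surprisal of each token given its prefix (the first token has no prefix).
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
    surprisal = -logprobs[torch.arange(ids.size(1) - 1), ids[0, 1:]]
    budget = max(1, int(keep_ratio * surprisal.numel()))
    # Keep the most surprising (informative) tokens, in their original order.
    keep = surprisal.topk(budget).indices.sort().values + 1
    return tok.decode(ids[0, keep])
```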
As a developer or founder, you might find yourself asking how your startup differentiates itself from ChatGPT, and, more importantly, how you convince a customer to try your product over a generic chat client.
This paper introduces the Skeleton-of-Thought (SoT) method to decrease the generation latency of large language models (LLMs). SoT guides LLMs to first generate the skeleton of the answer and then conducts parallel API calls or batched decoding to complete the contents of each skeleton point. The method achieves considerable end-to-end speed-ups while maintaining, and in some cases improving, answer quality.
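A minimal sketch of the parallel-expansion idea, assuming a generic `call_llm(prompt) -> str` client function (hypothetical, supplied by the caller) and simple prompt wording:

```python
from concurrent.futures import ThreadPoolExecutor

def skeleton_of_thought(question, call_llm):
    """`call_llm(prompt) -> str` is a hypothetical wrapper around an LLM API."""
    # Stage 1: ask for a terse numbered outline only.
    skeleton = call_llm(
        f"Answer with only a short numbered outline (3-5 points), no details:\n{question}"
    )
    points = [line for line in skeleton.splitlines() if line.strip()]
    # Stage 2: expand every point concurrently instead of decoding sequentially.
    with ThreadPoolExecutor() as pool:
        bodies = list(pool.map(
            lambda point: call_llm(
                f"Question: {question}\nExpand this outline point in 2-3 sentences: {point}"
            ),
            points,
        ))
    return "\n\n".join(f"{point}\n{body}" for point, body in zip(points, bodies))
```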
This paper provides a comprehensive overview of the current state of knowledge on reasoning in Large Language Models (LLMs), including techniques for improving and eliciting reasoning in these models, methods and benchmarks for evaluating reasoning abilities, findings and implications of previous research in this field, and suggestions for future directions.
This paper reviews and compares methods for single-label and multi-label text classification, categorizing them into bag-of-words, sequence-based, graph-based, and hierarchical methods. The findings reveal that pre-trained language models outperform all recently proposed graph-based and hierarchy-based methods.
The paper proposes a re-ranking approach for explainable recommender systems using knowledge graphs to optimize for recency, popularity, and diversity of explanations. The approach is evaluated on two public datasets and shows an increase in explanation quality while preserving recommendation utility.
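As a rough illustration of such re-ranking, the greedy sketch below scores candidate explanation paths by a weighted mix of recency, popularity, and a diversity bonus for entities not yet selected; the data model, weights, and scoring functions are hypothetical, not the paper's:

```python
from dataclasses import dataclass

@dataclass
class ExplanationPath:
    item: str
    shared_entity: str   # e.g. an actor or genre linking user history to the item
    recency: float       # freshness of the linking interaction, in [0, 1]
    popularity: float    # popularity of the shared entity, in [0, 1]

def rerank(paths, w_rec=0.4, w_pop=0.3, w_div=0.3, k=5):
    chosen, seen_entities = [], set()
    candidates = list(paths)
    while candidates and len(chosen) < k:
        # The diversity term rewards explanation entities not yet selected.
        best = max(candidates, key=lambda p: (
            w_rec * p.recency
            + w_pop * p.popularity
            + w_div * (p.shared_entity not in seen_entities)
        ))
        chosen.append(best)
        seen_entities.add(best.shared_entity)
        candidates.remove(best)
    return chosen
```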
The paper presents a new approach called SLiC-HF that uses Sequence Likelihood Calibration with Human Feedback to improve language models. The approach is shown to be effective on the TL;DR summarization task and is a simpler and more computationally efficient alternative to Reinforcement Learning from Human Feedback (RLHF).
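The core of sequence likelihood calibration is a ranking loss on sequence log-probabilities. A sketch of that idea, with the margin value, the regularizer, and helper names as assumptions rather than the paper's exact formulation:

```python
import torch

def slic_loss(logp_pos, logp_neg, logp_ref, margin=1.0, reg_weight=0.5):
    """logp_pos / logp_neg: summed token log-probs of the human-preferred and
    rejected sequences under the current model; logp_ref: log-prob of the
    original supervised target, used here as a simple regularizer."""
    # Hinge loss: push the preferred sequence's likelihood above the
    # rejected one by at least `margin`.
    calibration = torch.clamp(margin - logp_pos + logp_neg, min=0).mean()
    regularizer = -logp_ref.mean()   # keep the model close to SFT behavior
    return calibration + reg_weight * regularizer
```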
The paper proposes a prompt ensembling method for large language models called 'boosted prompting', which uses a small dataset to construct a set of few-shot prompts that together comprise a boosted prompt ensemble. The few-shot examples for each prompt are chosen in a stepwise fashion to be 'hard' examples on which the current ensemble is uncertain.
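A minimal sketch of the stepwise construction under these assumptions: training examples are dicts with a "question" key, `ask(prompt, question)` is a hypothetical LLM call returning a parsed answer string, and the ensemble answers by majority vote:

```python
from collections import Counter

def ensemble_answer(ensemble, question, ask):
    """Majority vote across the ensemble's prompts; confidence is the
    fraction of prompts agreeing with the winning answer."""
    votes = Counter(ask(prompt, question) for prompt in ensemble)
    answer, count = votes.most_common(1)[0]
    return answer, count / len(ensemble)

def build_boosted_ensemble(train_set, ask, steps=3, shots=4):
    ensemble = [train_set[:shots]]   # seed prompt from the first few examples
    for _ in range(steps - 1):
        # "Hard" examples: those where the current ensemble is least confident.
        hard = sorted(train_set,
                      key=lambda ex: ensemble_answer(ensemble, ex["question"], ask)[1])
        ensemble.append(hard[:shots])
    return ensemble
```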
The paper discusses the cost associated with querying large language models (LLMs) and proposes FrugalGPT, a framework that uses LLM APIs to process natural language queries within a budget constraint. The framework uses prompt adaptation, LLM approximation, and LLM cascade to reduce the inference cost while matching, and sometimes exceeding, the performance of the best individual LLM.
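The cascade component can be sketched in a few lines: try models cheapest-first and stop once a quality scorer is satisfied. The scorer, model ordering, and threshold below are illustrative assumptions; FrugalGPT learns its scorer from data:

```python
def llm_cascade(query, models, score, threshold=0.8):
    """`models` is an ordered list of callables, cheapest first;
    `score(query, answer)` returns a quality estimate in [0, 1]."""
    answer = None
    for call in models:
        answer = call(query)
        if score(query, answer) >= threshold:
            return answer    # good enough: stop before paying for bigger models
    return answer            # fall back to the last (strongest) model's answer
```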