paper summaries - Portkey Blog

Evaluating Long-Context LLMs

This paper proposes a novel way to evaluate large language models (LLMs) that claim to handle long contexts effectively. The researchers introduce a benchmark known as N, enhancing the traditional Needle-in-a-Haystack (NIAH) tests by eliminating literal matches between the search context and the re

Multi-LLM Text Summarization

The paper introduces a novel framework called Multi-LLM for text summarization, which leverages multiple large language models (LLMs) to generate better summaries, especially for long documents. This framework is designed to overcome the limitations of using a single LLM, which might fail to captur

Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference - Summary

The research paper introduces ModernBERT, an updated version of the original BERT model, which is an encoder-only transformer designed to improve retrieval and classification tasks. Despite the original BERT's widespread use, until now, there have been limited improvements in te

Mixtral of Experts - Summary

The paper introduces Mixtral 8x7B, a Sparse Mixture of Experts (SMoE) language model that outperforms existing models like Llama 2 70B and GPT-3.5 on various benchmarks. It uses a routing network to select two experts per token, allowing access to 47B parameters while actively using only 13B, enhan

LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models - Summary

This paper presents a method for compressing prompts in large language models (LLMs) to accelerate model inference and reduce cost. The method involves a budget controller, a token-level iterative compression algorithm, and an instruction tuning based method for distribution alignment. Experimental

Skeleton-of-Thought: Large Language Models Can Do Parallel Decoding - Summary

This paper introduces the Skeleton-of-Thought (SoT) method to decrease the generation latency of large language models (LLMs). SoT guides LLMs to first generate the skeleton of the answer and then conducts parallel API calls or batched decoding to complete the contents of each skeleton point. The m

Towards Reasoning in Large Language Models: A Survey - Summary

This paper provides a comprehensive overview of the current state of knowledge on reasoning in Large Language Models (LLMs), including techniques for improving and eliciting reasoning in these models, methods and benchmarks for evaluating reasoning abilities, findings and implications of previous r