The paper introduces Mixtral 8x7B, a Sparse Mixture of Experts (SMoE) language model that outperforms existing models like Llama 2 70B and GPT-3.5 on various benchmarks. It uses a routing network to select two experts per token, giving each token access to 47B parameters while actively using only 13B, which keeps inference cost close to that of a much smaller dense model.
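A minimal sketch of the top-2 routing idea may help make this concrete; the layer sizes, expert architecture, and names below are illustrative, not Mixtral's actual configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoELayer(nn.Module):
    def __init__(self, dim=512, num_experts=8, hidden=2048):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)   # gating network
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, hidden), nn.SiLU(), nn.Linear(hidden, dim))
            for _ in range(num_experts)
        ])

    def forward(self, x):                            # x: (num_tokens, dim)
        logits = self.router(x)                      # (num_tokens, num_experts)
        weights, idx = logits.topk(2, dim=-1)        # pick 2 experts per token
        weights = F.softmax(weights, dim=-1)         # renormalize their gates
        out = torch.zeros_like(x)
        for k in range(2):                           # mix the two expert outputs
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return out
```

Because only the two selected experts' feed-forward networks run for any given token, the active parameter count per token stays far below the model's total parameter count.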
This paper presents a method for compressing prompts in large language models (LLMs) to accelerate model inference and reduce cost. The method involves a budget controller, a token-level iterative compression algorithm, and an instruction-tuning-based method for distribution alignment. Experimental results show substantial compression ratios with little loss in downstream task performance.
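To illustrate the general idea (not the paper's full algorithm), here is a minimal sketch of perplexity-guided compression: a small language model scores each token's surprisal, and only the most informative tokens are kept under a budget. The use of GPT-2 and the `keep_ratio` parameter are assumptions for illustration:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
lm = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def compress(prompt: str, keep_ratio: float = 0.5) -> str:
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = lm(input_ids=ids).logits
    # Surprisal of each token given its prefix (the first token has no prefix).
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
    surprisal = -logprobs[torch.arange(ids.size(1) - 1), ids[0, 1:]]
    budget = max(1, int(keep_ratio * surprisal.numel()))
    # Keep the most surprising (informative) tokens, in their original order.
    keep = surprisal.topk(budget).indices.sort().values + 1
    return tok.decode(ids[0, keep])
```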
As a developer or founder, you might find yourself asking how your startup differentiates itself from ChatGPT, and, more importantly, how you convince a customer to try your product over a generic chat client.
This paper introduces the Skeleton-of-Thought (SoT) method to decrease the generation latency of large language models (LLMs). SoT guides LLMs to first generate the skeleton of the answer and then conducts parallel API calls or batched decoding to complete the contents of each skeleton point. The method achieves considerable end-to-end speed-ups while maintaining, and in some cases improving, answer quality.
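A minimal sketch of the parallel-expansion idea, assuming a generic `call_llm(prompt) -> str` client function (hypothetical, supplied by the caller) and simple prompt wording:

```python
from concurrent.futures import ThreadPoolExecutor

def skeleton_of_thought(question, call_llm):
    """`call_llm(prompt) -> str` is a hypothetical wrapper around an LLM API."""
    # Stage 1: ask for a terse numbered outline only.
    skeleton = call_llm(
        f"Answer with only a short numbered outline (3-5 points), no details:\n{question}"
    )
    points = [line for line in skeleton.splitlines() if line.strip()]
    # Stage 2: expand every point concurrently instead of decoding sequentially.
    with ThreadPoolExecutor() as pool:
        bodies = list(pool.map(
            lambda point: call_llm(
                f"Question: {question}\nExpand this outline point in 2-3 sentences: {point}"
            ),
            points,
        ))
    return "\n\n".join(f"{point}\n{body}" for point, body in zip(points, bodies))
```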
This paper provides a comprehensive overview of the current state of knowledge on reasoning in Large Language Models (LLMs), including techniques for improving and eliciting reasoning in these models, methods and benchmarks for evaluating reasoning abilities, findings and implications of previous research in this field, and suggestions for future directions.
This paper reviews and compares methods for single-label and multi-label text classification, categorizing them into bag-of-words, sequence-based, graph-based, and hierarchical methods. The findings reveal that pre-trained language models outperform all recently proposed graph-based and hierarchy-based methods.
The paper proposes a re-ranking approach for explainable recommender systems using knowledge graphs to optimize for recency, popularity, and diversity of explanations. The approach is evaluated on two public datasets and shows an increase in explanation quality while preserving recommendation utility.
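As a rough illustration of such re-ranking, the greedy sketch below scores candidate explanation paths by a weighted mix of recency, popularity, and a diversity bonus for entities not yet selected; the data model, weights, and scoring functions are hypothetical, not the paper's:

```python
from dataclasses import dataclass

@dataclass
class ExplanationPath:
    item: str
    shared_entity: str   # e.g. an actor or genre linking user history to the item
    recency: float       # freshness of the linking interaction, in [0, 1]
    popularity: float    # popularity of the shared entity, in [0, 1]

def rerank(paths, w_rec=0.4, w_pop=0.3, w_div=0.3, k=5):
    chosen, seen_entities = [], set()
    candidates = list(paths)
    while candidates and len(chosen) < k:
        # The diversity term rewards explanation entities not yet selected.
        best = max(candidates, key=lambda p: (
            w_rec * p.recency
            + w_pop * p.popularity
            + w_div * (p.shared_entity not in seen_entities)
        ))
        chosen.append(best)
        seen_entities.add(best.shared_entity)
        candidates.remove(best)
    return chosen
```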
The paper presents a new approach called SLiC-HF that uses Sequence Likelihood Calibration with Human Feedback to improve language models. The approach is shown to be effective on the TL;DR summarization task and is a simpler and more computationally efficient alternative to Reinforcement Learning from Human Feedback (RLHF).
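The core of sequence likelihood calibration is a ranking loss on sequence log-probabilities. A sketch of that idea, with the margin value, the regularizer, and helper names as assumptions rather than the paper's exact formulation:

```python
import torch

def slic_loss(logp_pos, logp_neg, logp_ref, margin=1.0, reg_weight=0.5):
    """logp_pos / logp_neg: summed token log-probs of the human-preferred and
    rejected sequences under the current model; logp_ref: log-prob of the
    original supervised target, used here as a simple regularizer."""
    # Hinge loss: push the preferred sequence's likelihood above the
    # rejected one by at least `margin`.
    calibration = torch.clamp(margin - logp_pos + logp_neg, min=0).mean()
    regularizer = -logp_ref.mean()   # keep the model close to SFT behavior
    return calibration + reg_weight * regularizer
```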
The paper proposes a prompt ensembling method for large language models called 'boosted prompting', which uses a small dataset to construct a set of few-shot prompts that together comprise a boosted prompt ensemble. The few-shot examples for each prompt are chosen in a stepwise fashion to be 'hard' examples on which the current ensemble is uncertain.
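A minimal sketch of the stepwise construction under these assumptions: training examples are dicts with a "question" key, `ask(prompt, question)` is a hypothetical LLM call returning a parsed answer string, and the ensemble answers by majority vote:

```python
from collections import Counter

def ensemble_answer(ensemble, question, ask):
    """Majority vote across the ensemble's prompts; confidence is the
    fraction of prompts agreeing with the winning answer."""
    votes = Counter(ask(prompt, question) for prompt in ensemble)
    answer, count = votes.most_common(1)[0]
    return answer, count / len(ensemble)

def build_boosted_ensemble(train_set, ask, steps=3, shots=4):
    ensemble = [train_set[:shots]]   # seed prompt from the first few examples
    for _ in range(steps - 1):
        # "Hard" examples: those where the current ensemble is least confident.
        hard = sorted(train_set,
                      key=lambda ex: ensemble_answer(ensemble, ex["question"], ask)[1])
        ensemble.append(hard[:shots])
    return ensemble
```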
The paper discusses the cost associated with querying large language models (LLMs) and proposes FrugalGPT, a framework that uses LLM APIs to process natural language queries within a budget constraint. The framework uses prompt adaptation, LLM approximation, and LLM cascade to reduce the inference cost while matching, and sometimes exceeding, the performance of the best individual LLM.
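The cascade component can be sketched in a few lines: try models cheapest-first and stop once a quality scorer is satisfied. The scorer, model ordering, and threshold below are illustrative assumptions; FrugalGPT learns its scorer from data:

```python
def llm_cascade(query, models, score, threshold=0.8):
    """`models` is an ordered list of callables, cheapest first;
    `score(query, answer)` returns a quality estimate in [0, 1]."""
    answer = None
    for call in models:
        answer = call(query)
        if score(query, answer) >= threshold:
            return answer    # good enough: stop before paying for bigger models
    return answer            # fall back to the last (strongest) model's answer
```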