Portkey Blog


GPT-4 is Getting Faster 🐇


Over the past few months, we've been closely observing latencies for both GPT-3.5 and GPT-4, and the emerging patterns have been intriguing. The standout observation? GPT-4 is catching up in speed, closing the latency gap with GPT-3.5. Our findings reveal a consistent decline in GPT-4 latency…
Vrushank Vyas Oct 16, 2023

⭐ Reducing LLM Costs & Latency with Semantic Cache

Implementing semantic cache from scratch for production use cases.
Vrushank Vyas Jul 11, 2023
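The semantic-cache idea in the teaser above can be sketched in a few lines: embed each query, and serve a cached response when a new query's embedding is close enough to a stored one. A minimal illustrative sketch follows; the class name, threshold, and especially the toy character-frequency embedding are assumptions for demonstration (a production cache would use a real sentence-embedding model), not the implementation from the article.

```python
import math

def embed(text):
    # Toy embedding: 26-dim letter-frequency vector. A hypothetical
    # stand-in for a real sentence-embedding model.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - 97] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    """Return a cached LLM response when a new query is semantically close."""

    def __init__(self, threshold=0.95):
        self.threshold = threshold
        self.entries = []  # list of (embedding, response) pairs

    def put(self, query, response):
        self.entries.append((embed(query), response))

    def get(self, query):
        q = embed(query)
        best_resp, best_sim = None, 0.0
        for emb, resp in self.entries:
            sim = cosine(q, emb)
            if sim > best_sim:
                best_resp, best_sim = resp, sim
        # Cache hit only above the similarity threshold; miss returns None.
        return best_resp if best_sim >= self.threshold else None
```

A near-duplicate query ("What is the capital of France?" vs. "what is the capital of France") hits the cache and skips the LLM call entirely, which is where the cost and latency savings come from; the threshold trades hit rate against the risk of serving a wrong answer.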

LoRA: Low-Rank Adaptation of Large Language Models - Summary

The paper proposes Low-Rank Adaptation (LoRA) as an approach to reduce the number of trainable parameters for downstream tasks in natural language processing. LoRA injects trainable rank-decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters…
Rohit Agarwal Apr 15, 2023
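The low-rank update LoRA describes can be sketched numerically: a frozen weight matrix W (d × k) is augmented with a learned product B·A, where B is d × r and A is r × k with r ≪ min(d, k), so only r(d + k) parameters train instead of d·k. The helper names and scaling convention below are illustrative assumptions, not the paper's reference code.

```python
# Illustrative LoRA forward pass: the frozen base weight W stays fixed;
# only the low-rank factors A and B are trained.

def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def lora_forward(x, W, A, B, alpha=1.0, r=1):
    """h = W x + (alpha / r) * B (A x), with W: d x k, A: r x k, B: d x r."""
    base = matvec(W, x)               # frozen pretrained path
    delta = matvec(B, matvec(A, x))   # low-rank trainable path
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]

# Parameter comparison for d = k = 4096, r = 8:
# full update trains 4096 * 4096 ~= 16.8M params;
# LoRA trains 8 * (4096 + 4096) ~= 65K params.
```

Because the update is just an additive matrix, B·A can be merged into W after training, so the adapted model adds no extra inference latency.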

Portkey Blog © 2025. Powered by Ghost