Portkey Blog

Skeleton-of-Thought: Large Language Models Can Do Parallel Decoding - Summary

This paper introduces Skeleton-of-Thought (SoT), a method for reducing the generation latency of large language models (LLMs). Instead of decoding an answer token by token from start to finish, SoT first prompts the LLM to generate a skeleton of the answer, then fills in the contents of each skeleton point through parallel API calls or batched decoding (see the sketch below).
The Quill Aug 21, 2023
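
The two-stage flow lends itself to a compact implementation. Below is a minimal sketch of the idea, not the paper's code: `call_llm` is a hypothetical placeholder for whatever chat-completion client you use, and the prompts are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor

def call_llm(prompt: str) -> str:
    """Hypothetical placeholder: wire this to any chat-completion API."""
    raise NotImplementedError

def skeleton_of_thought(question: str, max_points: int = 5) -> str:
    # Stage 1: ask for a terse skeleton -- short numbered points, no details.
    skeleton = call_llm(
        f"Answer with a skeleton of at most {max_points} short numbered "
        f"points (3-5 words each), no elaboration.\nQuestion: {question}"
    )
    points = [line.strip() for line in skeleton.splitlines() if line.strip()]

    # Stage 2: expand every point independently and in parallel, so wall-clock
    # latency is roughly one expansion call rather than the sum of all of them.
    def expand(point: str) -> str:
        return call_llm(
            f"Question: {question}\nSkeleton:\n{skeleton}\n"
            f"Write 1-2 sentences expanding only this point: {point}"
        )

    with ThreadPoolExecutor(max_workers=max(len(points), 1)) as pool:
        expansions = list(pool.map(expand, points))

    return "\n".join(expansions)
```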

⭐ Reducing LLM Costs & Latency with Semantic Cache

Implementing a semantic cache from scratch for production use cases (core lookup sketched below).
Vrushank Vyas Jul 11, 2023
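
As a rough sketch of what a semantic cache does under the hood, assuming cosine similarity over prompt embeddings; `embed` is a hypothetical placeholder for any sentence-embedding model, and the article's actual implementation may differ:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Hypothetical placeholder: wire this to any sentence-embedding model."""
    raise NotImplementedError

class SemanticCache:
    """Serve a cached response when a new prompt is semantically close to an old one."""

    def __init__(self, threshold: float = 0.92):
        self.threshold = threshold          # cosine-similarity cutoff for a cache hit
        self.keys: list[np.ndarray] = []    # unit-normalized prompt embeddings
        self.values: list[str] = []         # cached LLM responses

    def _unit(self, v: np.ndarray) -> np.ndarray:
        return v / np.linalg.norm(v)

    def get(self, prompt: str) -> str | None:
        if not self.keys:
            return None
        q = self._unit(embed(prompt))
        sims = np.stack(self.keys) @ q      # cosine similarity to every cached prompt
        best = int(np.argmax(sims))
        return self.values[best] if sims[best] >= self.threshold else None

    def put(self, prompt: str, response: str) -> None:
        self.keys.append(self._unit(embed(prompt)))
        self.values.append(response)
```

On a hit the LLM call is skipped entirely, which is where both the cost and latency savings come from; production systems typically replace the linear scan with a vector index.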
