Portkey Blog

API calls

Skeleton-of-Thought: Large Language Models Can Do Parallel Decoding - Summary

This paper introduces the Skeleton-of-Thought (SoT) method to decrease the generation latency of large language models (LLMs). SoT guides the LLM to first generate a skeleton of the answer, then completes the content of each skeleton point via parallel API calls or batched decoding.
The Quill 21 Aug 2023
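
For readers who want the shape of the technique, here is a minimal sketch of the two-stage SoT pattern in Python. `call_llm` is a hypothetical helper standing in for any single chat-completion request, and the prompt wording and 3-8 point range are illustrative assumptions, not the paper's exact prompts.

```python
# Minimal sketch of Skeleton-of-Thought: one skeleton call,
# then parallel expansion of each skeleton point.
from concurrent.futures import ThreadPoolExecutor

def call_llm(prompt: str) -> str:
    """Hypothetical single LLM call; replace with your provider's SDK."""
    raise NotImplementedError

def skeleton_of_thought(question: str, max_workers: int = 8) -> str:
    # Stage 1: ask for a short skeleton, one numbered point per line.
    skeleton = call_llm(
        "Give a concise skeleton for answering the question below. "
        "Reply with 3-8 numbered points of 3-5 words each, nothing else.\n\n"
        + question
    )
    points = [line.strip() for line in skeleton.splitlines() if line.strip()]

    # Stage 2: expand every point concurrently instead of sequentially.
    def expand(point: str) -> str:
        return call_llm(
            f"Question: {question}\nSkeleton point: {point}\n"
            "Write 1-2 sentences expanding only this point."
        )

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        expansions = list(pool.map(expand, points))

    return "\n".join(expansions)
```

Because the expansions are independent, end-to-end latency is roughly one skeleton call plus the slowest single expansion, rather than the sum of all sequential calls.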
Our AI overlords

⭐ Semantic Cache for Large Language Models

Learn how semantic caching for large language models reduces cost, improves latency, and stabilizes high-volume AI applications by reusing responses based on intent, not just text.
Vrushank Vyas 11 Jul 2023
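
A minimal sketch of the semantic-cache idea in Python, under stated assumptions: `embed` is a hypothetical helper returning a unit-normalized embedding vector, and the 0.92 similarity threshold is illustrative, not a recommended value.

```python
# Minimal semantic cache: look up by embedding similarity
# rather than exact string match on the prompt text.
from typing import Optional

def embed(text: str) -> list[float]:
    """Hypothetical embedding call; replace with your embedding model."""
    raise NotImplementedError

class SemanticCache:
    def __init__(self, threshold: float = 0.92):
        self.threshold = threshold
        self.entries: list[tuple[list[float], str]] = []  # (vector, response)

    def get(self, prompt: str) -> Optional[str]:
        v = embed(prompt)
        best_score, best_response = -1.0, None
        for vec, response in self.entries:
            # Cosine similarity reduces to a dot product on unit vectors.
            score = sum(a * b for a, b in zip(v, vec))
            if score > best_score:
                best_score, best_response = score, response
        # Hit only when the closest cached prompt is similar enough in intent.
        return best_response if best_score >= self.threshold else None

    def put(self, prompt: str, response: str) -> None:
        self.entries.append((embed(prompt), response))
```

A production cache would use a vector index rather than a linear scan, but the lookup logic, nearest neighbor plus a similarity threshold, is the same.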

