Portkey Blog


Skeleton-of-Thought: Large Language Models Can Do Parallel Decoding - Summary

This paper introduces the Skeleton-of-Thought (SoT) method to decrease the generation latency of large language models (LLMs). SoT guides LLMs to first generate the skeleton of the answer and then conducts parallel API calls or batched decoding to complete the contents of each skeleton point. The m
The Quill Aug 21, 2023
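
The two stages described in the summary (generate a skeleton first, then fill in each point with parallel calls) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the prompt templates, the `parse_skeleton` helper, and the `fake_llm` stand-in are all hypothetical names introduced here, and `llm` is assumed to be any callable that takes a prompt string and returns text. It covers only the parallel-API-call variant, not batched decoding.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# Hypothetical prompt templates; the paper's actual prompts are not reproduced here.
SKELETON_PROMPT = "Give a short numbered skeleton (3-5 words per point) for answering: {question}"
EXPAND_PROMPT = "Question: {question}\nSkeleton:\n{skeleton}\nExpand only point {index}: {point}"


def parse_skeleton(text: str) -> List[str]:
    """Extract points from lines shaped like '1. some point' or '1) some point'."""
    points = []
    for line in text.splitlines():
        match = re.match(r"\s*\d+[.)]\s*(.+)", line)
        if match:
            points.append(match.group(1).strip())
    return points


def skeleton_of_thought(question: str, llm: Callable[[str], str], max_workers: int = 8) -> str:
    # Stage 1: a single sequential call that produces the skeleton.
    skeleton_text = llm(SKELETON_PROMPT.format(question=question))
    points = parse_skeleton(skeleton_text)
    if not points:
        # Assumption: if no points can be parsed, return the raw reply unchanged.
        return skeleton_text

    # Stage 2: expand every skeleton point concurrently, one call per point.
    prompts = [
        EXPAND_PROMPT.format(question=question, skeleton=skeleton_text, index=i, point=p)
        for i, p in enumerate(points, start=1)
    ]
    with ThreadPoolExecutor(max_workers=min(max_workers, len(prompts))) as pool:
        expansions = list(pool.map(llm, prompts))  # map() keeps the input order

    # Stitch the expansions back together in skeleton order.
    return "\n".join(
        f"{i}. {p}: {e}" for i, (p, e) in enumerate(zip(points, expansions), start=1)
    )


def fake_llm(prompt: str) -> str:
    """Offline stand-in for a real API client, so the sketch runs without a network."""
    if prompt.startswith("Give a short numbered skeleton"):
        return "1. Define latency\n2. Explain parallel expansion"
    return "expanded text for " + prompt.rsplit(": ", 1)[-1]


if __name__ == "__main__":
    print(skeleton_of_thought("Why can SoT reduce latency?", fake_llm))
```

In real use, `fake_llm` would be replaced by a function that wraps whichever API client is in play. Threads suit this case because each expansion call spends most of its time waiting on the network.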
