Multimodal Capabilities

This feature is available on all Portkey plans.

The Gateway is your unified interface for multimodal models, along with chat, text, and embedding models. Using the Gateway, you can call vision, audio (text-to-speech & speech-to-text), image generation and other multimodal models from multiple providers (like OpenAI, Anthropic, Stability AI, etc.) — all using the familiar OpenAI signature.

Explore the AI Gateway’s Multimodal capabilities below:

Vision

Image Generation

Function Calling

Speech-to-Text

Text-to-Speech

Last modified on October 16, 2024

Conditional Routing Image Generation

⌘I

​Explore the AI Gateway’s Multimodal capabilities below:

Vision

Image Generation

Function Calling

Speech-to-Text

Text-to-Speech

Explore the AI Gateway’s Multimodal capabilities below: