Multimodal Capabilities
Multimodal Capabilities
This feature is available on all Portkey plans.
The Gateway is your unified interface for multimodal models, along with chat, text, and embedding models.
Using the Gateway, you can call vision
, audio (text-to-speech & speech-to-text)
, image generation
and other multimodal models from multiple providers (like OpenAI
, Anthropic
, Stability AI
, etc.) — all using the familiar OpenAI signature.
Explore the AI Gateway’s Multimodal capabilities below:
Vision
Image Generation
Function Calling
Speech-to-Text
Text-to-Speech
Was this page helpful?