Multimodal Capabilities
The Gateway is your unified interface for multimodal models, along with chat, text, and embedding models.
Using the Gateway, you can call vision
, audio (text-to-speech & speech-to-text)
, image generation
and other multimodal models from multiple providers (like OpenAI
, Anthropic
, Stability AI
, etc.) — all using the familiar OpenAI signature.
Explore the AI Gateway's Multimodal capabilities below:
pageVisionpageImage GenerationpageFunction CallingpageSpeech-to-TextpageText-to-SpeechLast updated