Multimodal Capabilities

The Gateway is your unified interface for multimodal models, along with chat, text, and embedding models.

Using the Gateway, you can call vision, audio (text-to-speech & speech-to-text), image generation and other multimodal models from multiple providers (like OpenAI, Anthropic, Stability AI, etc.) — all using the familiar OpenAI signature.

Explore the AI Gateway's Multimodal capabilities below:

pageVisionpageImage GenerationpageFunction CallingpageSpeech-to-TextpageText-to-Speech

Last updated