Multimodal Capabilities

This feature is available on all Portkey plans.

The Gateway is your unified interface for multimodal models, along with chat, text, and embedding models. Using the Gateway, you can call vision, audio (text-to-speech & speech-to-text), image generation and other multimodal models from multiple providers (like OpenAI, Anthropic, Stability AI, etc.) — all using the familiar OpenAI signature.

Explore the AI Gateway’s Multimodal capabilities below:

Introduction

Product

Self-Hosting

Support

Multimodal Capabilities

Explore the AI Gateway’s Multimodal capabilities below:

Vision

Image Generation

Function Calling

Speech-to-Text

Text-to-Speech

Introduction

Product

Self-Hosting

Support

​Explore the AI Gateway’s Multimodal capabilities below:

Vision

Image Generation

Function Calling

Speech-to-Text

Text-to-Speech

Explore the AI Gateway’s Multimodal capabilities below: