Provider Slug: google
Portkey SDK Integration with Google Gemini Models
Portkey provides a consistent API to interact with models from various providers. To integrate Google Gemini with Portkey:

1. Install the Portkey SDK

Add the Portkey SDK to your application to interact with Google Gemini’s API through Portkey’s gateway.
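A minimal sketch assuming the Python SDK (the package name on PyPI is portkey-ai):

```sh
pip install portkey-ai
```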
2. Initialize Portkey with the Virtual Key

To use Gemini with Portkey, get your Gemini API key from Google AI Studio, then add it to Portkey to create the virtual key.
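A minimal sketch with the Python SDK, using placeholder values for your Portkey API key and the Gemini virtual key:

```python
from portkey_ai import Portkey

# Initialize the Portkey client; the virtual key securely references
# the Gemini API key you stored in Portkey.
portkey = Portkey(
    api_key="PORTKEY_API_KEY",         # your Portkey API key
    virtual_key="GEMINI_VIRTUAL_KEY",  # the virtual key created above
)
```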
3. Invoke Chat Completions with Google Gemini

Use the Portkey instance to send requests to Google Gemini. You can also override the virtual key directly in the API call if needed.
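For example, reusing the client initialized above (the model name is illustrative; use any Gemini model available to your key):

```python
completion = portkey.chat.completions.create(
    model="gemini-1.5-pro",
    messages=[{"role": "user", "content": "Say this is a test"}],
)
print(completion.choices[0].message.content)
```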
Portkey supports the system_instructions parameter for Google Gemini 1.5, allowing you to control the behavior and output of your Gemini-powered applications with ease.

Simply include your Gemini system prompt as part of the {"role":"system"} message within the messages array of your request body. Portkey Gateway will automatically transform your message to ensure seamless compatibility with the Google Gemini API.
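A sketch reusing the client from above (the prompt text is illustrative):

```python
completion = portkey.chat.completions.create(
    model="gemini-1.5-pro",
    messages=[
        # Portkey maps this system message to Gemini's system instructions.
        {"role": "system", "content": "You are a helpful assistant that answers in one sentence."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
)
print(completion.choices[0].message.content)
```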
Function Calling

Portkey supports function calling mode on Google’s Gemini Models. Explore this Cookbook for a deep dive and examples: Function Calling
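Tools are passed in the OpenAI-compatible format. A minimal sketch, where get_weather is a hypothetical function defined only for illustration:

```python
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical function, for illustration only
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

response = portkey.chat.completions.create(
    model="gemini-1.5-pro",
    messages=[{"role": "user", "content": "What's the weather in Paris today?"}],
    tools=tools,
)
print(response.choices[0].message.tool_calls)
```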
Advanced Multimodal Capabilities with Gemini

Gemini models are inherently multimodal, capable of processing and understanding content from a wide array of file types. Portkey streamlines the integration of these powerful features by providing a unified, OpenAI-compatible API.

The Portkey Advantage: A Unified Format for All Media

To simplify development, Portkey uses a consistent format for all multimodal requests. Whether you’re sending an image, audio, video, or document, you use an object with type: 'image_url' within the user message’s content array.

Portkey’s AI Gateway intelligently interprets your request based on the URL or data URI you provide and translates it into the precise format required by the Google Gemini API. This means you only need to learn one structure for all your media processing needs.
Image Processing

Gemini can analyze images to describe their content, answer visual questions, or identify objects. See the Gemini Image Understanding Docs for details.
To upload files and get Google Files URLs, use the Files API. The URL format will be similar to: https://generativelanguage.googleapis.com/v1beta/files/[FILE_ID]

Alternatively, you can pass media inline as a base64 data URI: data:<MIME_TYPE>;base64,<YOUR_BASE64_DATA>
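For example, a sketch that sends an uploaded image by its Files API URL (FILE_ID is a placeholder):

```python
response = portkey.chat.completions.create(
    model="gemini-1.5-flash",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one paragraph."},
                {
                    "type": "image_url",
                    # A Google Files URL or a data:<MIME_TYPE>;base64,... URI both work here.
                    "image_url": {"url": "https://generativelanguage.googleapis.com/v1beta/files/FILE_ID"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```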
Supported Image MIME types: image/png, image/jpeg, image/webp, image/heic, image/heif
Audio Processing
Gemini can transcribe speech, summarize audio content, or answer questions about sounds. See the Gemini Audio Understanding Docs for details.
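Audio uses the same unified format. A sketch that inlines a local WAV file as a base64 data URI (speech.wav is a placeholder file name; for larger files, prefer the Files API as noted further below):

```python
import base64

with open("speech.wav", "rb") as f:
    audio_b64 = base64.b64encode(f.read()).decode()

response = portkey.chat.completions.create(
    model="gemini-1.5-flash",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Transcribe this audio clip."},
                {
                    "type": "image_url",  # Portkey's unified field for all media types
                    "image_url": {"url": f"data:audio/wav;base64,{audio_b64}"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```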
Supported Audio MIME types: audio/wav, audio/mp3, audio/aiff, audio/aac, audio/ogg, audio/flac
Video Processing
Gemini can summarize videos, answer questions about specific events, and describe scenes. See the Gemini Video Understanding Docs for details.
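A sketch for video. Passing a public YouTube URL directly is an assumption based on the file-upload note further below (VIDEO_ID is a placeholder); an uploaded video file referenced by its Files API URL follows the same structure:

```python
response = portkey.chat.completions.create(
    model="gemini-1.5-pro",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Summarize this video and list its main scenes."},
                {
                    "type": "image_url",
                    # Assumed: public YouTube URLs can be passed directly;
                    # uploaded videos use their Files API URL instead.
                    "image_url": {"url": "https://www.youtube.com/watch?v=VIDEO_ID"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```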
Supported Video MIME types: video/mp4, video/mpeg, video/mov, video/avi, video/webm, video/wmv
Document Processing (PDF)
Gemini’s vision capabilities excel at understanding the content of PDF documents, including text, tables, and images. See the Gemini Documents Understanding Docs for details.
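A sketch that asks questions about an uploaded PDF, again via its Files API URL (FILE_ID is a placeholder):

```python
response = portkey.chat.completions.create(
    model="gemini-1.5-pro",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Summarize the key findings and tables in this document."},
                {
                    "type": "image_url",
                    # Points at an uploaded PDF (MIME type application/pdf).
                    "image_url": {"url": "https://generativelanguage.googleapis.com/v1beta/files/FILE_ID"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```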
While you can send other document types like .txt or .html, they will be treated as plain text. Gemini’s native document vision capabilities are optimized for the application/pdf MIME type.

Code Execution Tool
Gemini can use a built-in code interpreter tool to solve complex computational problems, perform calculations, and generate code. To enable this, simply include the code_execution tool in your request. The model will automatically decide when to invoke it.
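A minimal sketch; the exact tool schema below ({"type": "code_execution"}) is an assumption, so verify the canonical shape against Portkey’s current docs or your gateway logs:

```python
response = portkey.chat.completions.create(
    model="gemini-2.0-flash",
    messages=[
        {"role": "user", "content": "What is the sum of the first 50 prime numbers? Write and run code to find out."}
    ],
    # Assumed schema for enabling Gemini's built-in code interpreter.
    tools=[{"type": "code_execution"}],
)
print(response.choices[0].message.content)
```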
Important: For all file uploads (except YouTube videos), it’s recommended to use the Google Files API to upload your files first, then use the returned file URL in your requests. This approach provides better performance and reliability for larger files.
Grounding with Google Search
Gemini supports grounding with Google Search, a feature that allows you to ground your LLM responses with real-time search results. Grounding is invoked by passing the google_search tool (for newer models like gemini-2.0-flash-001) or the google_search_retrieval tool (for older models like gemini-1.5-flash) in the tools array.
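A sketch for a newer model; the exact tool entry shape (passed here as a function tool named google_search) is an assumption, so verify it against your gateway logs:

```python
response = portkey.chat.completions.create(
    model="gemini-2.0-flash-001",
    messages=[{"role": "user", "content": "What were the top technology headlines this week?"}],
    # Assumed shape: for older models such as gemini-1.5-flash,
    # use the name "google_search_retrieval" instead.
    tools=[{"type": "function", "function": {"name": "google_search"}}],
)
print(response.choices[0].message.content)
```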
If you mix regular tools with grounding tools, Gemini may return an error saying only one tool can be used at a time.
Extended Thinking (Reasoning Models) (Beta)
The assistant’s thinking response is returned in the response_chunk.choices[0].delta.content_blocks array, not the response.choices[0].message.content string.

Models like gemini-2.5-flash-preview-04-17 support extended thinking.
This is similar to OpenAI’s reasoning models, but here you also receive the model’s reasoning as it processes the request. Note that you will have to set strict_open_ai_compliance=False in the headers to use this feature.
Single turn conversation
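A sketch of a single-turn streaming request, assuming the Python client accepts strict_open_ai_compliance as a constructor argument and that the thinking parameter takes a {"type": "enabled", "budget_tokens": N} shape:

```python
from portkey_ai import Portkey

# Compliance mode must be relaxed to receive content_blocks.
portkey = Portkey(
    api_key="PORTKEY_API_KEY",
    virtual_key="GEMINI_VIRTUAL_KEY",
    strict_open_ai_compliance=False,
)

stream = portkey.chat.completions.create(
    model="gemini-2.5-flash-preview-04-17",
    max_tokens=3000,
    # Assumed parameter shape for enabling thinking with a token budget.
    thinking={"type": "enabled", "budget_tokens": 2030},
    stream=True,
    messages=[{"role": "user", "content": "When does a function have an inflection point?"}],
)

for chunk in stream:
    # Thinking (and text) deltas arrive in content_blocks, not delta.content.
    blocks = getattr(chunk.choices[0].delta, "content_blocks", None)
    if blocks:
        print(blocks)
```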
To disable thinking for Gemini models like gemini-2.5-flash-preview-04-17, you are required to explicitly set budget_tokens to 0.
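For example, assuming the same thinking parameter shape as above:

```python
response = portkey.chat.completions.create(
    model="gemini-2.5-flash-preview-04-17",
    # A zero budget explicitly turns thinking off for this request.
    thinking={"type": "enabled", "budget_tokens": 0},
    messages=[{"role": "user", "content": "List three everyday uses of graphite."}],
)
print(response.choices[0].message.content)
```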
Gemini grounding mode may not work via the Portkey SDK. Contact [email protected] for assistance.