Portkey provides a robust and secure gateway for integrating various Large Language Models (LLMs), including the Google Gemini APIs, into your applications. With Portkey, you get features like a fast AI gateway, observability, prompt management, and more, while your LLM API keys stay securely managed through a virtual key system.
To use Gemini with Portkey, get your API key from Google AI Studio, then add it to Portkey to create a virtual key.
NodeJS SDK
```js
import Portkey from 'portkey-ai'

const portkey = new Portkey({
  apiKey: "PORTKEY_API_KEY", // defaults to process.env["PORTKEY_API_KEY"]
  provider: "@PROVIDER" // Your Google Virtual Key
})
```
Use the Portkey instance to send requests to Google Gemini. You can also override the virtual key directly in the API call if needed.
NodeJS SDK
```js
const chatCompletion = await portkey.chat.completions.create({
  messages: [
    { role: 'system', content: 'You are not a helpful assistant' },
    { role: 'user', content: 'Say this is a test' }
  ],
  model: 'gemini-1.5-pro',
});

console.log(chatCompletion.choices);
```
Portkey supports the system_instructions parameter for Google Gemini 1.5, letting you control the behavior and output of your Gemini-powered applications with ease. Simply include your Gemini system prompt as a {"role": "system"} message within the messages array of your request body, as in the example above. The Portkey Gateway automatically transforms your message to ensure seamless compatibility with the Google Gemini API.
Gemini models are inherently multimodal, capable of processing and understanding content from a wide array of file types. Portkey streamlines the integration of these powerful features by providing a unified, OpenAI-compatible API.
The Portkey Advantage: A Unified Format for All Media

To simplify development, Portkey uses a consistent format for all multimodal requests. Whether you're sending an image, audio, video, or document, you use an object with type: 'image_url' within the user message's content array. Portkey's AI Gateway interprets your request based on the URL or data URI you provide, and translates it into the precise format required by the Google Gemini API. This means you only need to learn one structure for all your media processing needs.
Method 1: Sending an Image via Google Files URL

Use the Google Files API to upload your image and get a URL. This is the recommended approach for larger files or when you need persistent storage.
To upload files and get Google Files URLs, use the Files API. The URL format will be similar to: https://generativelanguage.googleapis.com/v1beta/files/[FILE_ID]
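A request with an uploaded image then follows the same unified shape used throughout this section. Here is a minimal sketch, where the file ID in the URL is a placeholder for the ID returned by your Files API upload:

```js
// Sketch: 'your-image-file-id' is a placeholder for a real Files API upload ID
const chatCompletion = await portkey.chat.completions.create({
  model: 'gemini-1.5-pro',
  messages: [{
    role: 'user',
    content: [
      { type: 'image_url', image_url: { url: 'https://generativelanguage.googleapis.com/v1beta/files/your-image-file-id' } },
      { type: 'text', text: 'Describe what is shown in this image.' }
    ]
  }],
});

console.log(chatCompletion.choices[0].message.content);
```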
Method 2: Sending a Local Image as Base64 Data

Use this method for local image files. The file is encoded into a Base64 string and sent as a data URI. This is ideal for smaller files when you don't want to use the Files API. The data URI format is: data:<MIME_TYPE>;base64,<YOUR_BASE64_DATA>
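For example, reading a local image and sending it as a data URI might look like this (a minimal sketch; 'photo.png' is a placeholder filename):

```js
import fs from 'fs';

// Sketch: read a local file and encode it as a Base64 data URI
const imageBytes = fs.readFileSync('photo.png');
const base64Image = imageBytes.toString('base64');
const imageUri = `data:image/png;base64,${base64Image}`;

const chatCompletion = await portkey.chat.completions.create({
  model: 'gemini-1.5-pro',
  messages: [{
    role: 'user',
    content: [
      { type: 'image_url', image_url: { url: imageUri } },
      { type: 'text', text: 'What objects appear in this image?' }
    ]
  }],
});

console.log(chatCompletion.choices[0].message.content);
```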
Method 1: Sending a Document via Google Files URL

Upload your PDF using the Files API to get a Google Files URL.
```js
const chatCompletion = await portkey.chat.completions.create({
  model: 'gemini-1.5-pro',
  messages: [{
    role: 'user',
    content: [
      { type: 'image_url', image_url: { url: 'https://generativelanguage.googleapis.com/v1beta/files/your-pdf-file-id' } },
      { type: 'text', text: 'Summarize the key findings of this research paper.' }
    ]
  }],
});

console.log(chatCompletion.choices[0].message.content);
```
Method 2: Sending a Local Document as Base64 Data

This is suitable for smaller, local PDF files.
```js
import fs from 'fs';

const pdfBytes = fs.readFileSync('whitepaper.pdf');
const base64Pdf = pdfBytes.toString('base64');
const pdfUri = `data:application/pdf;base64,${base64Pdf}`;

const chatCompletion = await portkey.chat.completions.create({
  model: 'gemini-1.5-pro',
  messages: [{
    role: 'user',
    content: [
      { type: 'image_url', image_url: { url: pdfUri } },
      { type: 'text', text: 'What is the main conclusion of this document?' }
    ]
  }],
});

console.log(chatCompletion.choices[0].message.content);
```
While you can send other document types like .txt or .html, they will be treated as plain text. Gemini’s native document vision capabilities are optimized for the application/pdf MIME type.
Important: For all file uploads (except YouTube videos), it’s recommended to use the Google Files API to upload your files first, then use the returned file URL in your requests. This approach provides better performance and reliability for larger files.
Gemini can use a built-in code interpreter tool to solve complex computational problems, perform calculations, and generate code. To enable this, simply include the code_execution tool in your request. The model will automatically decide when to invoke it.
```js
const response = await portkey.chat.completions.create({
  model: "gemini-1.5-pro",
  messages: [{
    role: "user",
    content: "Calculate the 20th Fibonacci number. Then find the nearest palindrome to it."
  }],
  tools: [{ "type": "code_execution" }]
});

console.log(response.choices[0].message.content);
```
Gemini supports grounding with Google Search, a feature that grounds your LLM responses in real-time search results.
Grounding is invoked by passing the google_search tool (for newer models like gemini-2.0-flash-001) or the google_search_retrieval tool (for older models like gemini-1.5-flash) in the tools array.
"tools": [ { "type": "function", "function": { "name": "google_search" // or google_search_retrieval for older models } }]
If you mix regular tools with grounding tools, the Gemini API may throw an error saying only one tool can be used at a time.
Models like gemini-2.5-flash-preview-04-17 support extended thinking. This is similar to OpenAI's reasoning models, but you also get the model's reasoning as it processes the request. Note that you will have to set strict_open_ai_compliance=False in the headers to use this feature.

The assistant's thinking response is returned in the response_chunk.choices[0].delta.content_blocks array, not in the response.choices[0].message.content string.
```python
from portkey_ai import Portkey

# Initialize the Portkey client
portkey = Portkey(
    api_key="PORTKEY_API_KEY",  # Replace with your Portkey API key
    provider="@PROVIDER",
    strict_open_ai_compliance=False
)

# Create the request with extended thinking enabled
response = portkey.chat.completions.create(
    model="gemini-2.5-flash-preview-04-17",
    max_tokens=3000,
    thinking={"type": "enabled", "budget_tokens": 2030},
    stream=True,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "when does the flight from new york to bengaluru land tomorrow, what time, what is its flight number, and what is its baggage belt?"
                }
            ]
        }
    ]
)

# With stream=True, the response is an iterator of chunks; the model's thinking
# arrives in the content_blocks array of each chunk's delta.
for chunk in response:
    if chunk.choices[0].delta:
        content_blocks = chunk.choices[0].delta.get("content_blocks")
        if content_blocks is not None:
            for content_block in content_blocks:
                print(content_block)
```
To disable thinking for Gemini models like gemini-2.5-flash-preview-04-17, you must explicitly set budget_tokens to 0.
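For example, mirroring the Python request above (a minimal sketch that reuses the same client):

```python
# Sketch: same request as above, but budget_tokens=0 turns thinking off
response = portkey.chat.completions.create(
    model="gemini-2.5-flash-preview-04-17",
    max_tokens=3000,
    thinking={"type": "enabled", "budget_tokens": 0},  # 0 disables thinking
    messages=[{"role": "user", "content": "Say this is a test"}]
)
print(response)
```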