Complete guide to integrating the OpenAI API with Portkey. Covers gpt-4o, o1, chat completions, vision, and audio APIs, with built-in reliability and monitoring features.
OpenAI’s API offers powerful language, embedding, and multimodal models (gpt-4o, o1, whisper, dall-e, etc.). Portkey makes your OpenAI requests production-ready with observability, fallbacks, guardrails, and more. Portkey also lets you use OpenAI API’s other capabilities like
Just paste your OpenAI API Key from here to Portkey to create your Virtual Key.
Your OpenAI personal or service account API keys can be saved to Portkey. Additionally, your OpenAI Admin API Keys can also be saved to Portkey so that you can route to OpenAI Admin routes through Portkey API.
Optional
Add your OpenAI organization and project ID details: (Docs)
Directly use OpenAI API key without the Virtual Key: (Docs)
Create a short-lived virtual key OR one with usage/rate limits: (Docs)
Note: OpenAI supports budget and rate limits at the project level. Portkey additionally lets you set granular budget and rate limits on each key.
import OpenAI from 'openai';
import { PORTKEY_GATEWAY_URL, createHeaders } from 'portkey-ai';

// Point the OpenAI SDK at Portkey's gateway and attach the Portkey headers.
const openai = new OpenAI({
  apiKey: 'xx',
  baseURL: PORTKEY_GATEWAY_URL,
  defaultHeaders: createHeaders({
    apiKey: "PORTKEY_API_KEY",
    provider: "@OPENAI_PROVIDER"
  })
});

async function main() {
  const chatCompletion = await openai.chat.completions.create({
    messages: [{ role: 'user', content: 'Say this is a test' }],
    model: 'gpt-4o',
  });
  console.log(chatCompletion.choices);
}

main();
…
…
Viewing the Log
Portkey will log your request and give you useful data such as timestamp, request type, LLM used, tokens generated, and cost. For multimodal models, Portkey will also show the image sent with vision/image models, as well as the image generated.
The tool calling feature lets models trigger external tools based on conversation context. You define available functions, the model chooses when to use them, and your application executes them and returns results.
Portkey supports OpenAI Tool Calling and makes it interoperable across multiple providers. With Portkey Prompts, you can templatize your prompts and tool schemas as well.
Get Weather Tool (Node.js)
import Portkey from 'portkey-ai';

const portkey = new Portkey({ apiKey: 'PORTKEY_API_KEY', provider: '@OPENAI_PROVIDER' });

// Tool schema the model may choose to call.
let tools = [{
  type: "function",
  function: {
    name: "getWeather",
    description: "Get the current weather",
    parameters: {
      type: "object",
      properties: {
        location: { type: "string", description: "City and state" },
        unit: { type: "string", enum: ["celsius", "fahrenheit"] }
      },
      required: ["location"]
    }
  }
}];

let response = await portkey.chat.completions.create({
  model: "gpt-4o",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "What's the weather like in Delhi - respond in JSON" }
  ],
  tools,
  tool_choice: "auto",
});

// A finish_reason of "tool_calls" means the model requested a tool.
console.log(response.choices[0].finish_reason);
Get Weather Tool (Python)
from portkey_ai import Portkey

portkey = Portkey(api_key="PORTKEY_API_KEY", provider="@OPENAI_PROVIDER")

# Tool schema the model may choose to call.
tools = [{
    "type": "function",
    "function": {
        "name": "getWeather",
        "description": "Get the current weather",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City and state"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
            },
            "required": ["location"]
        }
    }
}]

response = portkey.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What's the weather like in Delhi - respond in JSON"}
    ],
    tools=tools,
    tool_choice="auto"
)

# A finish_reason of "tool_calls" means the model requested a tool.
print(response.choices[0].finish_reason)
Get Weather Tool (cURL)
curl -X POST "https://api.portkey.ai/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_PORTKEY_API_KEY" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What'\''s the weather like in Delhi - respond in JSON"}
    ],
    "tools": [{
      "type": "function",
      "function": {
        "name": "getWeather",
        "description": "Get the current weather",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {"type": "string", "description": "City and state"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
          },
          "required": ["location"]
        }
      }
    }],
    "tool_choice": "auto"
  }'
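When the model answers with a tool call, the application still has to run the function and hand the result back. The following is a minimal sketch of that dispatch step, assuming the assistant message is available as a plain dict in the OpenAI chat-completions shape. The `get_weather` implementation, the `TOOL_REGISTRY` mapping, and the sample `call_123` ID are hypothetical stand-ins.

```python
import json

def get_weather(location, unit="celsius"):
    # Hypothetical stand-in for a real weather lookup.
    return {"location": location, "unit": unit, "temperature": 30}

# Maps tool names declared in the schema to local callables.
TOOL_REGISTRY = {"getWeather": get_weather}

def run_tool_calls(assistant_message):
    """Execute each requested tool and build the follow-up `tool` messages."""
    tool_messages = []
    for call in assistant_message.get("tool_calls") or []:
        fn = TOOL_REGISTRY[call["function"]["name"]]
        # Arguments arrive as a JSON-encoded string, not a dict.
        args = json.loads(call["function"]["arguments"])
        result = fn(**args)
        tool_messages.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(result),
        })
    return tool_messages

# Sample assistant message, shaped like a response whose finish_reason is "tool_calls".
assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_123",
        "type": "function",
        "function": {"name": "getWeather", "arguments": '{"location": "Delhi"}'},
    }],
}
tool_messages = run_tool_calls(assistant_message)
```

To get the final answer, append the assistant message and the returned `tool_messages` to the original message list and call `chat.completions.create` again.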
Tracing the Request
On Portkey you can easily trace the whole tool call, from defining tool schemas to getting the final LLM output.
OpenAI’s vision models can analyze images alongside text, enabling visual question-answering capabilities. Images can be provided via URLs or base64 encoding in user messages.
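As a rough illustration of the two input styles, the sketch below builds the `messages` payload for a vision request from either a URL or raw bytes. It assumes the OpenAI chat-completions content-part format (`text` and `image_url` parts); the helper name `build_vision_messages` and the example URL are hypothetical.

```python
import base64

def build_vision_messages(question, image_url=None, image_bytes=None, mime_type="image/png"):
    """Build a user message that pairs a text question with one image."""
    if (image_url is None) == (image_bytes is None):
        raise ValueError("Provide exactly one of image_url or image_bytes")
    if image_bytes is not None:
        # Inline images are sent as a base64 data URL.
        encoded = base64.b64encode(image_bytes).decode("ascii")
        image_url = f"data:{mime_type};base64,{encoded}"
    return [{
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }]

messages = build_vision_messages(
    "What is in this image?",
    image_url="https://example.com/cat.png",  # placeholder URL
)
```

The resulting list can be passed as `messages` to a chat completion call with a vision-capable model such as gpt-4o.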
OpenAI’s embedding models (like text-embedding-3-small) transform text inputs into lists of floating point numbers. Smaller distances between vectors indicate higher text similarity. They power use cases like semantic search, content clustering, recommendations, and anomaly detection.
Simply send text to the embeddings API endpoint to generate these vectors for your applications.
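from portkey_ai import Portkey

portkey = Portkey(api_key="PORTKEY_API_KEY", provider="@OPENAI_PROVIDER")

response = portkey.embeddings.create(
    input="Your text string goes here",
    model="text-embedding-3-small"
)

# The embedding is a list of floats.
print(response.data[0].embedding)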
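To make "smaller distance means more similar" concrete, here is a small sketch that ranks documents by cosine similarity to a query vector. The vectors are toy two-dimensional values chosen for illustration, not real model output, and the helper names are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_by_similarity(query_vec, doc_vecs):
    """Return document indices ordered from most to least similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)]
    return [idx for _, idx in sorted(scored, reverse=True)]

# Toy vectors standing in for embeddings returned by the API.
query = [1.0, 0.0]
documents = [[0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
ranking = rank_by_similarity(query, documents)
```

In a semantic search setup, `query` would be the embedding of the search text and `documents` the stored embeddings of your content.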
Prompt caching automatically reuses recently processed prompt prefixes across similar API requests, reducing latency by up to 80% and costs by up to 50%. This feature works by default for all OpenAI API calls, requires no setup, and has no additional fees.
Portkey accurately logs the usage statistics and costs for your cached requests.
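One way to see whether caching kicked in is to look at the `usage` block of a response. The sketch below computes the cached share of prompt tokens, assuming the usage object has been converted to a dict and carries OpenAI's `prompt_tokens_details.cached_tokens` field; the numbers are made up for illustration.

```python
def cached_token_ratio(usage):
    """Fraction of prompt tokens served from the prompt cache (0.0 if unknown)."""
    prompt_tokens = usage.get("prompt_tokens", 0)
    details = usage.get("prompt_tokens_details") or {}
    cached = details.get("cached_tokens", 0)
    return cached / prompt_tokens if prompt_tokens else 0.0

# Illustrative usage payload, not real API output.
usage = {
    "prompt_tokens": 2048,
    "completion_tokens": 120,
    "prompt_tokens_details": {"cached_tokens": 1024},
}
ratio = cached_token_ratio(usage)
```

A ratio that stays at zero for long, repeated prompts usually means the shared prefix differs between requests.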
OpenAI’s Images API enables AI-powered image generation, manipulation, and variation creation for creative and commercial applications. Whether you’re building image generation features, editing tools, or creative applications, the API provides powerful visual AI capabilities through DALL·E models.
The API offers three core capabilities:
Generate new images from text prompts (DALL·E 3, DALL·E 2)
Edit existing images with text-guided replacements (DALL·E 2)
Create variations of existing images (DALL·E 2)
import Portkey from 'portkey-ai';

const client = new Portkey({
  apiKey: 'PORTKEY_API_KEY',
  provider: '@PROVIDER'
});

async function main() {
  // Generate one image from a text prompt with DALL·E 3.
  const image = await client.images.generate({
    model: "dall-e-3",
    prompt: "Lucy in the sky with diamonds"
  });
  console.log(image.data);
}

main();
Tracing Image Generation Requests
Portkey logs the generated image along with your whole request.
OpenAI’s Audio API converts speech to text using the Whisper model. It offers transcription in the original language and translation to English, supporting multiple file formats and languages with high accuracy.
OpenAI’s Text to Speech (TTS) API converts written text into natural-sounding audio using six distinct voices. It supports multiple languages, streaming capabilities, and various audio formats for different use cases.
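from pathlib import Path
from portkey_ai import Portkey

portkey = Portkey(api_key="PORTKEY_API_KEY", provider="@OPENAI_PROVIDER")

# Write the generated audio next to this script.
speech_file_path = Path(__file__).parent / "speech.mp3"

response = portkey.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input="Today is a wonderful day to build something people love!"
)

with open(speech_file_path, "wb") as f:
    f.write(response.content)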
OpenAI’s Realtime API enables dynamic, low-latency conversations combining text, voice, and function calling capabilities. Built on GPT-4o models optimized for realtime interactions, it supports both WebRTC for client-side applications and WebSockets for server-side implementations.
Portkey enhances OpenAI’s Realtime API with production-ready features:
Complete request/response logging for realtime streams
Cost tracking and budget management for streaming sessions
Multi-modal conversation monitoring
Session-based analytics and debugging
The API bridges the gap between traditional request-response patterns and interactive, real-time AI experiences, with Portkey adding the reliability and observability needed for production deployments. Developers can access this functionality through two model variants:
Portkey allows you to track user IDs passed with the user parameter in OpenAI requests, enabling you to monitor user-level costs, requests, and more:
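from portkey_ai import Portkey

portkey = Portkey(api_key="PORTKEY_API_KEY", provider="@OPENAI_PROVIDER")

response = portkey.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Say this is a test"}],
    user="user_123456"  # user ID that Portkey tracks in its logs
)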
When you include the user parameter in your requests, Portkey logs display the associated user ID.
In addition to the user parameter, Portkey allows you to send arbitrary custom metadata with your requests. This lets you associate additional context with each request, which is useful for analysis, debugging, and other custom use cases.
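As a hedged sketch of what custom metadata can look like on the wire, the helper below serializes key-value pairs into a request header. It assumes Portkey's `x-portkey-metadata` header and the reserved `_user` key; treat both names as assumptions to verify against the Portkey metadata documentation. The helper name and the sample keys are hypothetical.

```python
import json

def build_metadata_header(user_id, **extra):
    """Serialize metadata into a header dict (header name is an assumption)."""
    metadata = {"_user": user_id, **extra}
    # Values are kept as strings so they stay easy to filter on.
    metadata = {key: str(value) for key, value in metadata.items()}
    return {"x-portkey-metadata": json.dumps(metadata)}

headers = build_metadata_header("user_123456", environment="staging", feature="summaries")
```

These headers would be merged into the default headers of whichever HTTP client or SDK sends the request through the gateway.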
Here’s a simplified version of how to use Portkey’s Gateway Configuration:
1. Create a Gateway Configuration
You can create a Gateway configuration using the Portkey Config Dashboard or by writing a JSON configuration in your code. In this example, requests are routed based on the user’s subscription plan (paid or free).
When a user makes a request, it will pass through Portkey’s AI Gateway. Based on the configuration, the Gateway routes the request according to the user’s metadata.
3. Set Up the Portkey Client
Pass the Gateway configuration to your Portkey client. You can either use the config object or the Config ID from Portkey’s hosted version.
from portkey_ai import Portkey

# portkey_config is the config object (or a Config ID) created in step 1.
portkey = Portkey(
    api_key="PORTKEY_API_KEY",
    provider="@PROVIDER",
    config=portkey_config
)
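The `portkey_config` object passed to the client is not spelled out in this guide. A minimal sketch of what it could contain for the paid/free routing described in step 1 follows, assuming Portkey's conditional routing schema (`strategy.mode`, `conditions`, `default`, `targets`). The target names, the `user_plan` metadata key, and the model choices are illustrative assumptions. `pick_target` is a local helper that only mimics the routing decision for demonstration; it is not Portkey's implementation.

```python
# Hypothetical config: route on the `user_plan` metadata value.
portkey_config = {
    "strategy": {
        "mode": "conditional",
        "conditions": [
            {"query": {"metadata.user_plan": {"$eq": "paid"}}, "then": "paid-model"},
            {"query": {"metadata.user_plan": {"$eq": "free"}}, "then": "free-model"},
        ],
        "default": "free-model",
    },
    "targets": [
        {"name": "paid-model", "provider": "@OPENAI_PROVIDER",
         "override_params": {"model": "gpt-4o"}},
        {"name": "free-model", "provider": "@OPENAI_PROVIDER",
         "override_params": {"model": "gpt-4o-mini"}},
    ],
}

def pick_target(config, metadata):
    """Local illustration of which target name the conditions would select."""
    for condition in config["strategy"]["conditions"]:
        path, clause = next(iter(condition["query"].items()))
        key = path.split(".", 1)[1]  # "metadata.user_plan" -> "user_plan"
        if metadata.get(key) == clause["$eq"]:
            return condition["then"]
    return config["strategy"]["default"]

selected = pick_target(portkey_config, {"user_plan": "paid"})
```

The dict has to be defined before the client is constructed. Requests then carry the `user_plan` value as metadata so the gateway can evaluate the conditions.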
That’s it! Portkey seamlessly allows you to make your AI app more robust using built-in gateway features. Learn more about advanced gateway features:
Portkey’s AI gateway lets you enforce input/output checks on requests by applying custom hooks before and after processing. Protect your users’ and company’s data with PII guardrails and the many other checks available on Portkey Guardrails:
Organization management is particularly useful if you belong to multiple organizations or are accessing projects through a legacy OpenAI user API key. Specifying the organization and project IDs also helps you maintain better control over your access rules, usage, and costs.
In Portkey, you can add your OpenAI org and project details in three ways: Using Virtual Keys, Using Configs, or While Making a Request.
Using Virtual Keys
When selecting OpenAI from the Virtual Key dropdown menu while creating a virtual key, Portkey displays optional fields for the organization ID and project ID alongside the API key field.
Portkey takes budget management a step further than OpenAI. While OpenAI allows setting budget limits per project, Portkey enables you to set budget limits for each virtual key you create. For more information on budget limits, refer to this documentation.
Using Configs
You can also specify the organization and project details in your request config, either at the root level or within a specific target.
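A rough sketch of both placements follows, written as Python dicts that would be serialized to a JSON config. The key names `openai_organization` and `openai_project` are assumptions based on Portkey's config conventions and should be checked against the Portkey config reference. The IDs, API key placeholders, and the `with_openai_org` helper are hypothetical.

```python
def with_openai_org(target, organization, project):
    """Return a copy of a config (or target) with org/project keys added."""
    return {**target, "openai_organization": organization, "openai_project": project}

base_target = {"provider": "openai", "api_key": "OPENAI_API_KEY"}

# Root level: the whole config uses one organization and project.
root_config = with_openai_org(base_target, "org-xxxxxx", "proj_xxxxxxxx")

# Target level: only the first target carries the org/project details.
target_config = {
    "strategy": {"mode": "fallback"},
    "targets": [
        with_openai_org(base_target, "org-xxxxxx", "proj_xxxxxxxx"),
        {"provider": "openai", "api_key": "BACKUP_OPENAI_API_KEY"},
    ],
}
```

Setting the details per target is useful when different targets bill to different OpenAI projects.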
Medical images: Vision models are not suitable for interpreting specialized medical images like CT scans and shouldn’t be used for medical advice.
Non-English: The models may not perform optimally when handling images with text of non-Latin alphabets, such as Japanese or Korean.
Small text: Enlarge text within the image to improve readability, but avoid cropping important details.
Rotation: The models may misinterpret rotated / upside-down text or images.
Visual elements: The models may struggle to understand graphs or text where colors or styles like solid, dashed, or dotted lines vary.
Spatial reasoning: The models struggle with tasks requiring precise spatial localization, such as identifying chess positions.
Accuracy: The models may generate incorrect descriptions or captions in certain scenarios.
Image shape: The models struggle with panoramic and fisheye images.
Metadata and resizing: The models do not process original file names or metadata, and images are resized before analysis, affecting their original dimensions.
Counting: The models may give approximate counts for objects in images.
CAPTCHAs: For safety reasons, CAPTCHA submissions are blocked by OpenAI.
Can I use gpt-4o or other chat models to generate images?
No, you can use dall-e-3 to generate images and gpt-4o and other chat models to understand images.
What type of files can I upload for vision requests?
OpenAI currently supports PNG (.png), JPEG (.jpeg and .jpg), WEBP (.webp), and non-animated GIF (.gif).
For vision requests, is there a limit to the size of the image I can upload?
OpenAI currently restricts image uploads to 20MB per image.
How do rate limits work for vision requests?
OpenAI processes images at the token level, so each image that’s processed counts towards your tokens per minute (TPM) limit. See how OpenAI calculates costs here for details on the formula used to determine token count per image.
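For budgeting against TPM limits, the per-image token count can be estimated ahead of time. The sketch below follows the tile-based calculation OpenAI has documented for GPT-4o-class models: 85 base tokens plus 170 tokens per 512 px tile after resizing. The constants differ for other models, and the choice to only downscale (never upscale) small images is an assumption, so treat the result as an estimate and confirm it against OpenAI's pricing documentation.

```python
import math

def estimate_image_tokens(width, height, detail="high"):
    """Estimate prompt tokens for one image (GPT-4o-style constants, assumed)."""
    if detail == "low":
        return 85  # low detail is a flat cost regardless of size
    # Step 1: shrink to fit inside a 2048 x 2048 square, keeping aspect ratio.
    scale = min(1.0, 2048 / max(width, height))
    width, height = round(width * scale), round(height * scale)
    # Step 2: shrink so the shortest side is at most 768 px.
    scale = min(1.0, 768 / min(width, height))
    width, height = round(width * scale), round(height * scale)
    # Step 3: count the 512 px tiles needed to cover the resized image.
    tiles = math.ceil(width / 512) * math.ceil(height / 512)
    return 85 + 170 * tiles

square_tokens = estimate_image_tokens(1024, 1024)
tall_tokens = estimate_image_tokens(2048, 4096)
```

Under these assumptions, a 1024 x 1024 image resizes to 768 x 768 and needs four tiles, while a 2048 x 4096 image resizes to 768 x 1536 and needs six.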
The knowledge cutoff for the V3 embedding models (text-embedding-3-large and text-embedding-3-small) is September 2021, so they do not know about more recent events.
OpenAI Prompt caches are not shared between organizations. Only members of the same organization can access caches of identical prompts.
Does Prompt Caching affect output token generation or the final response of the API?
Prompt Caching does not influence the generation of output tokens or the final response provided by the API. Regardless of whether caching is used, the output generated will be identical. This is because only the prompt itself is cached, while the actual response is computed anew each time based on the cached prompt.
Is there a way to manually clear the cache?
Manual cache clearing is not currently available. Prompts that have not been encountered recently are automatically cleared from the cache. Cached prompts are typically evicted after 5-10 minutes of inactivity, though they can persist for up to one hour during off-peak periods.
Will I be expected to pay extra for writing to Prompt Caching?
No. Caching happens automatically, with no explicit action needed or extra cost paid to use the caching feature.
Do cached prompts contribute to TPM rate limits?
Yes, as caching does not affect rate limits.
Is discounting for Prompt Caching available on Scale Tier and the Batch API?
Discounting for Prompt Caching is not available on the Batch API but is available on Scale Tier. With Scale Tier, any tokens that are spilled over to the shared API will also be eligible for caching.
Does Prompt Caching work on Zero Data Retention requests?
Yes, Prompt Caching is compliant with existing Zero Data Retention policies.
What's the difference between DALL·E 2 and DALL·E 3?
DALL·E 3 offers higher quality images and enhanced capabilities, but only supports image generation. DALL·E 2 supports all three capabilities: generation, editing, and variations.
How long do the generated image URLs last?
Generated image URLs expire after one hour. Download or process the images before expiration.
What are the size requirements for uploading images?
Images must be square PNG files under 4MB. For editing features, both the image and mask must have identical dimensions.
Can I disable DALL·E 3's automatic prompt enhancement?
While you can’t completely disable it, you can add “I NEED to test how the tool works with extremely simple prompts. DO NOT add any detail, just use it AS-IS:” to your prompt.
How many images can I generate per request?
DALL·E 3 supports 1 image per request (use parallel requests for more), while DALL·E 2 supports up to 10 images per request.
What image formats are supported?
The API requires PNG format for all image uploads and manipulations. Generated images can be returned as either a URL or Base64 data.
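When images are requested as Base64 data rather than a URL, the payload has to be decoded before it can be saved. A minimal sketch, assuming the image was requested with `response_format="b64_json"` and the string comes from `data[0].b64_json`. The helper name, the output filename, and the stand-in bytes are hypothetical.

```python
import base64
import tempfile
from pathlib import Path

def save_b64_image(b64_json, out_path):
    """Decode a Base64 image payload and write it to disk."""
    path = Path(out_path)
    path.write_bytes(base64.b64decode(b64_json))
    return path

# Stand-in payload: just the PNG signature bytes, not a real generated image.
fake_payload = base64.b64encode(b"\x89PNG\r\n\x1a\n").decode("ascii")
saved = save_b64_image(fake_payload, Path(tempfile.gettempdir()) / "portkey_demo.png")
```

Saving the Base64 payload directly also sidesteps the expiry of generated image URLs.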
How does image editing (inpainting) work?
Available only in DALL·E 2, inpainting requires both an original image and a mask. The transparent areas of the mask indicate where the image should be edited, and your prompt should describe the complete new image, not just the edited area.
The API supports mp3, mp4, mpeg, mpga, m4a, wav, and webm formats, with a maximum file size of 25 MB.
Can I translate audio to languages other than English?
No, currently the translation API only supports output in English, regardless of the input language.
How do I handle audio files longer than 25 MB?
You’ll need to either compress the audio file or split it into smaller chunks. Tools like PyDub can help split audio files while avoiding mid-sentence breaks.
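A pre-upload check can catch unsupported formats and oversized files before a request is made. The sketch below uses the format list and 25 MB limit stated above and returns the minimum number of chunks a file would need. Reading "25 MB" as 25 * 1024 * 1024 bytes is an assumption, and `plan_upload` is a hypothetical helper, not part of any SDK.

```python
import math
from pathlib import Path

SUPPORTED_EXTENSIONS = {".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".wav", ".webm"}
MAX_BYTES = 25 * 1024 * 1024  # assumes "25 MB" means 25 MiB

def plan_upload(filename, size_bytes):
    """Validate the format and return the minimum number of chunks needed."""
    extension = Path(filename).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported audio format: {extension}")
    return max(1, math.ceil(size_bytes / MAX_BYTES))

chunks_needed = plan_upload("interview.wav", 60 * 1024 * 1024)
```

The chunk count is only a lower bound based on size. The actual split points should still be chosen at pauses so sentences are not cut in half.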
Does the API support all languages equally well?
While the model was trained on 98 languages, only languages with less than 50% word error rate are officially supported. Other languages may work but with lower accuracy.
Can I get timestamps in the transcription?
Yes, using the timestamp_granularities parameter, you can get timestamps at the segment level, word level, or both.
How can I improve transcription accuracy for specific terms?
You can use the prompt parameter to provide context or correct spellings of specific terms, or use post-processing with GPT-4 for more extensive corrections.
What's the difference between transcription and translation?
Transcription provides output in the original language, while translation always converts the audio to English text.
What are the differences between TTS-1 and TTS-1-HD models?
TTS-1 offers lower latency for real-time applications but may include more static. TTS-1-HD provides higher quality audio but with increased generation time.
Which audio formats are supported?
The API supports multiple formats: MP3 (default), Opus (for streaming), AAC (for mobile), FLAC (lossless), WAV (uncompressed), and PCM (raw 24kHz samples).
Can I create or clone custom voices?
No, the API only supports the six built-in voices (alloy, echo, fable, onyx, nova, and shimmer). Custom voice creation is not available.
How well does it support non-English languages?
While the voices are optimized for English, the API supports multiple languages with varying effectiveness. Performance quality may vary by language.
Can I control the emotional tone or style of the speech?
There’s no direct mechanism to control emotional output. While capitalization and grammar might influence the output, results are inconsistent.
Is real-time streaming supported?
Yes, the API supports real-time audio streaming using chunked transfer encoding, allowing audio playback to begin before the complete file is generated.
Do I need to disclose that the audio is AI-generated?
Yes, OpenAI’s usage policies require clear disclosure to end users that they are hearing AI-generated voices, not human ones.