Guardrails run at the Gateway layer: provider-agnostic, between your app and any upstream LLM.
## Supported Endpoints
| Endpoint | Input | Output |
|---|---|---|
| /v1/chat/completions | ✅ | ✅ |
| /v1/completions | ✅ | ✅ |
| /v1/embeddings | ✅ | ❌ |
| /v1/messages (Anthropic) | ✅ | ✅ |
| /v1/responses | ✅ | ✅ |
| /v1/prompts/{promptId}/completions | ✅ | ✅ |
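Guardrails are attached to requests on these endpoints through the gateway config. A minimal sketch, assuming the `x-portkey-config` header and `input_guardrails`/`output_guardrails` config keys; the guardrail slugs and API key are placeholders, not real values:

```python
import json

# Sketch only: reference guardrails from the gateway config attached to
# the request. The guardrail slugs below are hypothetical placeholders.
config = {
    "input_guardrails": ["my-input-guardrail"],
    "output_guardrails": ["my-output-guardrail"],
}

headers = {
    "Content-Type": "application/json",
    "x-portkey-api-key": "PORTKEY_API_KEY",  # placeholder, not a real key
    "x-portkey-config": json.dumps(config),
}

body = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}],
}
# POST headers + body to <gateway>/v1/chat/completions;
# guardrails then run per the endpoint table above.
```

The same headers work on any endpoint in the table; the gateway decides per endpoint whether input, output, or both sides are evaluated.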
## Endpoints That Don't Support Guardrails
Guardrails are only evaluated on LLM inference endpoints where Portkey can extract a text input and/or text output.
The following Inference API endpoints do not run guardrails:
- Assistants API (all endpoints): /v1/assistants/*, /v1/threads/*
- Audio (all endpoints): /v1/audio/*
- Images (all endpoints): /v1/images/*
- Files (all endpoints): /v1/files/*
- Batch (all endpoints): /v1/batches/*
- Fine-tuning (all endpoints): /v1/fine_tuning/*
- Moderations: /v1/moderations
- Models: /v1/models
- Prompt rendering (no model call): /v1/prompts/{promptId}/render
- Response retrieval (no model call): GET /v1/responses/* (guardrails apply only to POST /v1/responses)
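The two lists above reduce to a simple rule: guardrails run only on POST requests to text-inference paths. An illustrative helper (not part of Portkey) encoding that rule:

```python
# Illustrative helper: decide whether a method/path pair is one the
# gateway evaluates guardrails on, per the supported/unsupported lists.
GUARDRAIL_PATHS = {
    "/v1/chat/completions",
    "/v1/completions",
    "/v1/embeddings",
    "/v1/messages",
    "/v1/responses",
}

def runs_guardrails(method: str, path: str) -> bool:
    if method.upper() != "POST":
        return False  # e.g. GET /v1/responses/{id} retrieval is skipped
    if path in GUARDRAIL_PATHS:
        return True
    # /v1/prompts/{promptId}/completions is evaluated; .../render is not
    if path.startswith("/v1/prompts/") and path.endswith("/completions"):
        return True
    return False

print(runs_guardrails("POST", "/v1/chat/completions"))  # True
print(runs_guardrails("GET", "/v1/responses/resp_abc"))  # False
print(runs_guardrails("POST", "/v1/prompts/p1/render"))  # False
```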
## Sync vs. Async
| Mode | async | Behavior | Latency |
|---|---|---|---|
| Async (default) | true | Runs in parallel with the LLM call. Results are logged only; no effect on the response. | None |
| Sync | false | Runs before forwarding the request (input) or before returning the response (output). Can deny or modify. | Adds guardrail check latency |
Use async to observe and log without blocking. Use sync when the guardrail result must gate the request.
With async=false, when a check fails Portkey returns 246 (failed, but allowed) or 446 (failed, denied) instead of the standard provider status code. See Guardrail behavior on the gateway.
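A sync-mode client typically branches on these codes; an illustrative helper (not part of any SDK):

```python
# Sketch of client-side handling for the gateway's guardrail status codes
# in sync mode: 246 = checks failed but request allowed,
# 446 = checks failed and request denied.
def classify_guardrail_status(status: int) -> str:
    if status == 246:
        return "failed-allowed"  # response body is still usable; log/flag it
    if status == 446:
        return "failed-denied"   # no completion; surface the denial
    if 200 <= status < 300:
        return "passed"
    return "error"               # ordinary provider/gateway error

print(classify_guardrail_status(200))  # passed
print(classify_guardrail_status(446))  # failed-denied
```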
## Streaming
| Guardrail Type | Streaming |
|---|---|
| input_guardrails | ✅ Supported (evaluated before the stream starts) |
| output_guardrails | ❌ Not supported (requires the full response) |
With input guardrails on a streaming request: if the check passes, the stream proceeds normally. If it fails with deny=true, the stream is blocked with 446 before any tokens are sent.
To receive hook_results inside streaming chunks, set:
x-portkey-strict-open-ai-compliance: false
hook_results appear in the first chunk for /chat/completions, or as a separate hook_results event for /messages.
For Anthropic /messages streaming, hook_results aren't accessible via the Anthropic SDK. Use cURL or raw HTTP requests.
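A sketch of the header setup and of reading guardrail verdicts from a first /chat/completions chunk; the inner shape of hook_results shown here (a before_request_hooks list with verdict fields) is an assumption for illustration, not taken from the API reference:

```python
import json

# Headers for a streaming request that should include hook_results in
# the chunks (per the note above). The API key value is a placeholder.
headers = {
    "x-portkey-api-key": "PORTKEY_API_KEY",
    "x-portkey-strict-open-ai-compliance": "false",
    "Content-Type": "application/json",
}

# Hypothetical first /chat/completions chunk; the hook_results shape
# below is assumed for illustration only.
first_chunk = json.loads(
    '{"choices": [{"delta": {"content": "Hi"}}], '
    '"hook_results": {"before_request_hooks": [{"verdict": true}]}}'
)
passed = all(h["verdict"] for h in first_chunk["hook_results"]["before_request_hooks"])
```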
See Streaming Hook Results for SDK examples.
## What Gets Evaluated
Input guardrails evaluate the last message in the request. Output guardrails evaluate the generated text in the response.
Guardrails are text-only. Image inputs (base64 or URLs in multimodal messages) are not evaluated. Only the text portions of a message are passed to checks.
| Scenario | Evaluated by |
|---|---|
| Model deciding to call a tool | Output guardrails |
| Tool result sent back to the model | Input guardrails (on the follow-up request) |
| Function definitions in the system prompt | Input guardrails |
Tool call arguments (JSON) are treated as text; checks like Regex Match, Contains, and LLM-based detectors (e.g., Prompt Injection) all process this content.
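Because the arguments reach checks as serialized text, a pattern-based check scans them like any other message content. A small illustration (the tool arguments and SSN-style pattern are invented for the example):

```python
import json
import re

# Tool call arguments arrive at guardrail checks as serialized text,
# so a regex-style check can scan them like ordinary message content.
arguments = json.dumps({"query": "lookup 123-45-6789"})

ssn_pattern = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # SSN-like string
violation = bool(ssn_pattern.search(arguments))
print(violation)  # True
```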
## Quick Reference
| Capability | Supported |
|---|---|
| /chat/completions | ✅ |
| /completions | ✅ |
| /embeddings | ✅ (input only) |
| /messages (Anthropic) | ✅ |
| /responses | ✅ |
| /prompts/{promptId}/completions | ✅ |
| All providers & models | ✅ |
| Input guardrails | ✅ |
| Output guardrails | ✅ |
| Async execution | ✅ (default) |
| Sync execution | ✅ |
| Streaming (input) | ✅ |
| Streaming (output) | ❌ |
| Image inputs | ❌ |
| Text in multimodal messages | ✅ |
| Tool call text | ✅ |