v1.75.5-stable - Redis latency improvements

Krrish Dholakia
CEO, LiteLLM
Ishaan Jaffer
CTO, LiteLLM

Deploy this version

docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:v1.75.5-stable

Key Highlights

  • Redis - Latency Improvements - Reduces P99 latency by 50% with Redis enabled.
  • Responses API Session Management - Support for managing Responses API sessions with images.
  • Oracle Cloud Infrastructure - New LLM provider for calling models on Oracle Cloud Infrastructure.
  • Digital Ocean's Gradient AI - New LLM provider for calling models on Digital Ocean's Gradient AI platform.

Risk of Upgrade

If you build the proxy from the pip package, you should hold off on upgrading. This version makes prisma migrate deploy our default for managing the DB. This is safer, as it doesn't reset the DB, but it requires a manual prisma generate step.

Users of our Docker image are not affected by this change.


Redis Latency Improvements


This release adds in-memory caching for Redis requests, enabling faster response times under high traffic. LiteLLM instances now check their in-memory cache for a hit before checking Redis. On cache hits, this reduces caching-related latency for LLM API calls from roughly 100ms to sub-1ms.
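To illustrate the read path: each instance consults a local in-memory tier first and only falls through to Redis on a miss. This is a minimal sketch under assumptions, not LiteLLM's actual internals; the TwoTierCache class and its method names are hypothetical.

import redis

class TwoTierCache:
    # Per-instance in-memory dict in front of a shared Redis tier.
    def __init__(self, redis_client: redis.Redis):
        self.memory: dict = {}        # in-memory tier, local to this instance
        self.redis = redis_client     # shared tier, visible to all instances

    def get(self, key: str):
        if key in self.memory:        # in-memory hit: sub-1ms, no network hop
            return self.memory[key]
        value = self.redis.get(key)   # miss: pay the Redis round trip
        if value is not None:
            self.memory[key] = value  # warm the local tier for the next read
        return value

cache = TwoTierCache(redis.Redis(decode_responses=True))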


Responses API Session Management w/ Images


LiteLLM now supports session management for Responses API requests with images. This is great for use-cases like chatbots that use the Responses API to track the state of a conversation. LiteLLM session management works across ALL LLM APIs (including Anthropic, Bedrock, OpenAI, etc.), and works by storing the request and response content in an S3 bucket you specify.
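For example, a follow-up request can reference an earlier response by ID, and the proxy rebuilds the conversation (text and images) from the stored session. A hedged sketch, assuming a proxy running on localhost:4000, a configured model alias gpt-4o, and a placeholder API key:

from openai import OpenAI

# Point the standard OpenAI SDK at the LiteLLM proxy.
client = OpenAI(base_url="http://localhost:4000", api_key="sk-1234")

first = client.responses.create(
    model="gpt-4o",
    input=[{
        "role": "user",
        "content": [
            {"type": "input_text", "text": "What is in this image?"},
            {"type": "input_image", "image_url": "https://example.com/cat.png"},
        ],
    }],
)

# The proxy restores the prior turns, including the image, from stored state.
follow_up = client.responses.create(
    model="gpt-4o",
    previous_response_id=first.id,
    input="What color is it?",
)
print(follow_up.output_text)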


New Models / Updated Models

New Model Support

| Provider | Model | Context Window | Input ($/1M tokens) | Output ($/1M tokens) |
|---|---|---|---|---|
| Bedrock | bedrock/us.anthropic.claude-opus-4-1-20250805-v1:0 | 200k | $15 | $75 |
| Bedrock | bedrock/openai.gpt-oss-20b-1:0 | 200k | $0.07 | $0.30 |
| Bedrock | bedrock/openai.gpt-oss-120b-1:0 | 200k | $0.15 | $0.60 |
| Fireworks AI | fireworks_ai/accounts/fireworks/models/glm-4p5 | 128k | $0.55 | $2.19 |
| Fireworks AI | fireworks_ai/accounts/fireworks/models/glm-4p5-air | 128k | $0.22 | $0.88 |
| Fireworks AI | fireworks_ai/accounts/fireworks/models/gpt-oss-120b | 131k | $0.15 | $0.60 |
| Fireworks AI | fireworks_ai/accounts/fireworks/models/gpt-oss-20b | 131k | $0.05 | $0.20 |
| Groq | groq/openai/gpt-oss-20b | 131k | $0.10 | $0.50 |
| Groq | groq/openai/gpt-oss-120b | 131k | $0.15 | $0.75 |
| OpenAI | openai/gpt-5 | 400k | $1.25 | $10 |
| OpenAI | openai/gpt-5-2025-08-07 | 400k | $1.25 | $10 |
| OpenAI | openai/gpt-5-mini | 400k | $0.25 | $2 |
| OpenAI | openai/gpt-5-mini-2025-08-07 | 400k | $0.25 | $2 |
| OpenAI | openai/gpt-5-nano | 400k | $0.05 | $0.40 |
| OpenAI | openai/gpt-5-nano-2025-08-07 | 400k | $0.05 | $0.40 |
| OpenAI | openai/gpt-5-chat | 400k | $1.25 | $10 |
| OpenAI | openai/gpt-5-chat-latest | 400k | $1.25 | $10 |
| Azure | azure/gpt-5 | 400k | $1.25 | $10 |
| Azure | azure/gpt-5-2025-08-07 | 400k | $1.25 | $10 |
| Azure | azure/gpt-5-mini | 400k | $0.25 | $2 |
| Azure | azure/gpt-5-mini-2025-08-07 | 400k | $0.25 | $2 |
| Azure | azure/gpt-5-nano-2025-08-07 | 400k | $0.05 | $0.40 |
| Azure | azure/gpt-5-nano | 400k | $0.05 | $0.40 |
| Azure | azure/gpt-5-chat | 400k | $1.25 | $10 |
| Azure | azure/gpt-5-chat-latest | 400k | $1.25 | $10 |
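
Once the updated cost map is loaded, these models are callable through the usual completion interface. A quick sketch with the LiteLLM Python SDK; the model choice and prompt are arbitrary, and OPENAI_API_KEY is assumed to be set:

import litellm

# gpt-5-mini via the OpenAI provider; the pricing above is applied automatically.
response = litellm.completion(
    model="openai/gpt-5-mini",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
print(response._hidden_params.get("response_cost"))  # cost from the cost map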

Features

Bugs

  • OpenAI
    • Add ‘service_tier’ and ‘safety_identifier’ as supported Responses API params - PR #13258
    • Correct pricing for web search on 4o-mini - PR #13269
  • Mistral
    • Handle $id and $schema fields when calling Mistral - PR #13389

LLM API Endpoints

Features

  • /responses
    • Responses API Session Handling w/ support for images - PR #13347
    • Fix failure when the input contains a ResponseReasoningItem - PR #13465
    • Support custom tools - PR #13418

Bugs

  • /chat/completions
    • Fix completion_token_details usage object missing ‘text’ tokens - PR #13234
    • (SDK) handle tool being a pydantic object - PR #13274
    • Include cost in the streaming usage object - PR #13418
    • Exclude none fields on /chat/completion - allows usage with n8n - PR #13320
  • /responses
    • Transform function call in response for non-openai models (gemini/anthropic) - PR #13260
    • Fix unsupported operand error with model groups - PR #13293
    • Responses api session management for streaming responses - PR #13396
  • /v1/messages
    • Add count tokens support for Claude Code - PR #13261
  • /vector_stores
    • Fix create/search vector store errors - PR #13285

MCP Gateway

Features

Bugs

  • Fix auth on UI for bearer token servers - PR #13312
  • Allow access groups on MCP tool retrieval - PR #13425

Management Endpoints / UI

Features

  • Teams
    • Add team deletion check for teams with keys - PR #12953
  • Models
    • Add ability to set model alias per key/team - PR #13276
    • New button to reload model pricing from model cost map - PR #13464, PR #13470
  • Keys
    • Make ‘team’ field required when creating service account keys - PR #13302
    • Gray out key-based logging settings for non-enterprise users - prevents confusion about whether logging overall is supported - PR #13431
  • Navbar
    • Add logo customization for LiteLLM admin UI - PR #12958
  • Logs
    • Add token breakdowns on logs + session page - PR #13357
  • Usage
    • Ensure the Usage page loads when the DB has large entries - PR #13400
  • Test Key Page
    • Allow uploading images for /chat/completions and /responses - PR #13445
  • MCP
    • Add auth tokens to local storage auth - PR #13473

Bugs

  • Custom Root Path
    • Fix login route when SSO is enabled - PR #13267
  • Customers/End-users
    • Allow calling /v1/models when end user over budget - allows model listing to work on OpenWebUI when customer over budget - PR #13320
  • Teams
    • Remove user-team membership when a user is removed from a team - PR #13433
  • Errors
    • Bubble up network errors to user for Logging and Alerts page - PR #13427
  • Model Hub
    • Show pricing for azure models, when base model is set - PR #13418

Logging / Guardrail Integrations

Features

  • Bedrock Guardrails
    • Redact sensitive information in Bedrock Guardrails error messages - PR #13356
  • Standard Logging Payload
    • Fix ‘can’t register atexit’ bug - PR #13436

Bugs

  • Braintrust
    • Allow setting of braintrust callback base url - PR #13368
  • OTEL

Performance / Loadbalancing / Reliability Improvements

Features

  • Team-BYOK models
  • Caching
    • GCP IAM auth support for caching - PR #13275
  • Latency
    • Reduce P99 latency with Redis enabled by 50% - only update model usage when TPM/RPM limits are set (see the sketch below) - PR #13362
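
In practice, the usage-tracking write only fires for deployments that declare limits. A sketch of where those limits live in a Router setup (existing configuration surface; the model name and values are arbitrary):

from litellm import Router

router = Router(
    model_list=[{
        "model_name": "gpt-4o",
        "litellm_params": {
            "model": "openai/gpt-4o",
            "tpm": 100_000,  # with limits set, usage counters are updated in Redis
            "rpm": 1_000,    # omit both and the hot path skips that write
        },
    }],
)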

General Proxy Improvements

Features

  • Models
    • Support /v1/models/{model_id} retrieval - PR #13268
  • Multi-instance
    • Ensure disable_llm_api_endpoints works - PR #13278
  • Logs
  • Helm

Bugs

  • Non-root image
    • Fix non-root image for migration - PR #13379
  • Get Routes
    • Load get routes when using fastapi-offline - PR #13466
  • Health checks
    • Generate unique trace IDs for Langfuse health checks - PR #13468
  • Swagger
    • Allow using Swagger for /chat/completions - PR #13469
  • Auth
    • Fix JWT access not working with model access groups - PR #13474

New Contributors

Full Changelog