v1.75.5-stable - Redis latency improvements

Krrish Dholakia
CEO, LiteLLM
Ishaan Jaffer
CTO, LiteLLM

Deploy this version

docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:v1.75.5-stable

Key Highlights

  • Redis - Latency Improvements - Reduces P99 latency by 50% with Redis enabled.
  • Responses API Session Management - Support for managing Responses API sessions with images.
  • Oracle Cloud Infrastructure - New LLM provider for calling models on Oracle Cloud Infrastructure.
  • Digital Ocean's Gradient AI - New LLM provider for calling models on Digital Ocean's Gradient AI platform.

Risk of Upgrade

If you build the proxy from the pip package, you should hold off on upgrading. This version makes prisma migrate deploy our default for managing the DB. This is safer, as it doesn't reset the DB, but it requires a manual prisma generate step.

Users of our Docker image are not affected by this change.


Redis Latency Improvements


This release adds in-memory caching for Redis requests, enabling faster response times under high traffic. LiteLLM instances now check their in-memory cache for a hit before checking Redis. On cache hits, this reduces caching-related latency for LLM API calls from roughly 100ms to sub-1ms.
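To illustrate the read path: each instance consults a local in-memory tier first and only falls through to Redis on a miss. This is a minimal sketch under assumptions, not LiteLLM's actual internals; the TwoTierCache class and its method names are hypothetical.

import redis

class TwoTierCache:
    # Per-instance in-memory dict in front of a shared Redis tier.
    def __init__(self, redis_client: redis.Redis):
        self.memory: dict = {}        # in-memory tier, local to this instance
        self.redis = redis_client     # shared tier, visible to all instances

    def get(self, key: str):
        if key in self.memory:        # in-memory hit: sub-1ms, no network hop
            return self.memory[key]
        value = self.redis.get(key)   # miss: pay the Redis round trip
        if value is not None:
            self.memory[key] = value  # warm the local tier for the next read
        return value

cache = TwoTierCache(redis.Redis(decode_responses=True))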


Responses API Session Management w/ Images


LiteLLM now supports session management for Responses API requests with images. This is great for use-cases like chatbots that use the Responses API to track the state of a conversation. LiteLLM session management works across ALL LLM APIs (including Anthropic, Bedrock, OpenAI, etc.), and works by storing the request and response content in an S3 bucket you specify.
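For example, a follow-up request can reference an earlier response by ID, and the proxy rebuilds the conversation (text and images) from the stored session. A hedged sketch, assuming a proxy running on localhost:4000, a configured model alias gpt-4o, and a placeholder API key:

from openai import OpenAI

# Point the standard OpenAI SDK at the LiteLLM proxy.
client = OpenAI(base_url="http://localhost:4000", api_key="sk-1234")

first = client.responses.create(
    model="gpt-4o",
    input=[{
        "role": "user",
        "content": [
            {"type": "input_text", "text": "What is in this image?"},
            {"type": "input_image", "image_url": "https://example.com/cat.png"},
        ],
    }],
)

# The proxy restores the prior turns, including the image, from stored state.
follow_up = client.responses.create(
    model="gpt-4o",
    previous_response_id=first.id,
    input="What color is it?",
)
print(follow_up.output_text)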


New Models / Updated Models

New Model Support

| Provider | Model | Context Window | Input ($/1M tokens) | Output ($/1M tokens) |
|---|---|---|---|---|
| Bedrock | bedrock/us.anthropic.claude-opus-4-1-20250805-v1:0 | 200k | $15 | $75 |
| Bedrock | bedrock/openai.gpt-oss-20b-1:0 | 200k | $0.07 | $0.30 |
| Bedrock | bedrock/openai.gpt-oss-120b-1:0 | 200k | $0.15 | $0.60 |
| Fireworks AI | fireworks_ai/accounts/fireworks/models/glm-4p5 | 128k | $0.55 | $2.19 |
| Fireworks AI | fireworks_ai/accounts/fireworks/models/glm-4p5-air | 128k | $0.22 | $0.88 |
| Fireworks AI | fireworks_ai/accounts/fireworks/models/gpt-oss-120b | 131k | $0.15 | $0.60 |
| Fireworks AI | fireworks_ai/accounts/fireworks/models/gpt-oss-20b | 131k | $0.05 | $0.20 |
| Groq | groq/openai/gpt-oss-20b | 131k | $0.10 | $0.50 |
| Groq | groq/openai/gpt-oss-120b | 131k | $0.15 | $0.75 |
| OpenAI | openai/gpt-5 | 400k | $1.25 | $10 |
| OpenAI | openai/gpt-5-2025-08-07 | 400k | $1.25 | $10 |
| OpenAI | openai/gpt-5-mini | 400k | $0.25 | $2 |
| OpenAI | openai/gpt-5-mini-2025-08-07 | 400k | $0.25 | $2 |
| OpenAI | openai/gpt-5-nano | 400k | $0.05 | $0.40 |
| OpenAI | openai/gpt-5-nano-2025-08-07 | 400k | $0.05 | $0.40 |
| OpenAI | openai/gpt-5-chat | 400k | $1.25 | $10 |
| OpenAI | openai/gpt-5-chat-latest | 400k | $1.25 | $10 |
| Azure | azure/gpt-5 | 400k | $1.25 | $10 |
| Azure | azure/gpt-5-2025-08-07 | 400k | $1.25 | $10 |
| Azure | azure/gpt-5-mini | 400k | $0.25 | $2 |
| Azure | azure/gpt-5-mini-2025-08-07 | 400k | $0.25 | $2 |
| Azure | azure/gpt-5-nano-2025-08-07 | 400k | $0.05 | $0.40 |
| Azure | azure/gpt-5-nano | 400k | $0.05 | $0.40 |
| Azure | azure/gpt-5-chat | 400k | $1.25 | $10 |
| Azure | azure/gpt-5-chat-latest | 400k | $1.25 | $10 |
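
Once the updated cost map is loaded, these models are callable through the usual completion interface. A quick sketch with the LiteLLM Python SDK; the model choice and prompt are arbitrary, and OPENAI_API_KEY is assumed to be set:

import litellm

# gpt-5-mini via the OpenAI provider; the pricing above is applied automatically.
response = litellm.completion(
    model="openai/gpt-5-mini",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
print(response._hidden_params.get("response_cost"))  # cost from the cost map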

Features

Bugs

  • OpenAI
    • Add ‘service_tier’ and ‘safety_identifier’ as supported Responses API params - PR #13258
    • Correct pricing for web search on 4o-mini - PR #13269
  • Mistral
    • Handle $id and $schema fields when calling Mistral - PR #13389

LLM API Endpoints

Features

  • /responses
    • Responses API Session Handling w/ support for images - PR #13347
    • Fix failure when the input contains a ResponseReasoningItem - PR #13465
    • Support custom tools - PR #13418

Bugs

  • /chat/completions
    • Fix completion_token_details usage object missing ‘text’ tokens - PR #13234
    • (SDK) handle tool being a pydantic object - PR #13274
    • Include cost in the streaming usage object - PR #13418
    • Exclude none fields on /chat/completion - allows usage with n8n - PR #13320
  • /responses
    • Transform function call in response for non-openai models (gemini/anthropic) - PR #13260
    • Fix unsupported operand error with model groups - PR #13293
    • Responses api session management for streaming responses - PR #13396
  • /v1/messages
    • Add count tokens support for Claude Code - PR #13261
  • /vector_stores
    • Fix create/search vector store errors - PR #13285

MCP Gateway

Features

Bugs

  • Fix auth on UI for bearer token servers - PR #13312
  • Allow access groups on MCP tool retrieval - PR #13425

Management Endpoints / UI

Features

  • Teams
    • Add team deletion check for teams with keys - PR #12953
  • Models
    • Add ability to set model alias per key/team - PR #13276
    • New button to reload model pricing from model cost map - PR #13464, PR #13470
  • Keys
    • Make ‘team’ field required when creating service account keys - PR #13302
    • Gray out key-based logging settings for non-enterprise users - prevents confusion about whether logging overall is supported - PR #13431
  • Navbar
    • Add logo customization for LiteLLM admin UI - PR #12958
  • Logs
    • Add token breakdowns on logs + session page - PR #13357
  • Usage
    • Ensure the Usage page loads when the DB has large entries - PR #13400
  • Test Key Page
    • Allow uploading images for /chat/completions and /responses - PR #13445
  • MCP
    • Add auth tokens to local storage auth - PR #13473

Bugs

  • Custom Root Path
    • Fix login route when SSO is enabled - PR #13267
  • Customers/End-users
    • Allow calling /v1/models when end user over budget - allows model listing to work on OpenWebUI when customer over budget - PR #13320
  • Teams
    • Remove user-team membership when a user is removed from a team - PR #13433
  • Errors
    • Bubble up network errors to user for Logging and Alerts page - PR #13427
  • Model Hub
    • Show pricing for azure models, when base model is set - PR #13418

Logging / Guardrail Integrations

Features

  • Bedrock Guardrails
    • Redact sensitive information in Bedrock Guardrails error messages - PR #13356
  • Standard Logging Payload
    • Fix ‘can’t register atexit’ bug - PR #13436

Bugs

  • Braintrust
    • Allow setting of braintrust callback base url - PR #13368
  • OTEL

Performance / Loadbalancing / Reliability Improvements

Features

  • Team-BYOK models
  • Caching
    • GCP IAM auth support for caching - PR #13275
  • Latency
    • Reduce P99 latency with Redis enabled by 50% - only update model usage when TPM/RPM limits are set (see the sketch below) - PR #13362
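
In practice, the usage-tracking write only fires for deployments that declare limits. A sketch of where those limits live in a Router setup (existing configuration surface; the model name and values are arbitrary):

from litellm import Router

router = Router(
    model_list=[{
        "model_name": "gpt-4o",
        "litellm_params": {
            "model": "openai/gpt-4o",
            "tpm": 100_000,  # with limits set, usage counters are updated in Redis
            "rpm": 1_000,    # omit both and the hot path skips that write
        },
    }],
)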

General Proxy Improvements

Features

  • Models
    • Support /v1/models/{model_id} retrieval - PR #13268
  • Multi-instance
    • Ensure disable_llm_api_endpoints works - PR #13278
  • Logs
  • Helm

Bugs

  • Non-root image
    • Fix non-root image for migration - PR #13379
  • Get Routes
    • Load get routes when using fastapi-offline - PR #13466
  • Health checks
    • Generate unique trace IDs for Langfuse health checks - PR #13468
  • Swagger
    • Allow using Swagger for /chat/completions - PR #13469
  • Auth
    • Fix JWT access not working with model access groups - PR #13474

New Contributors

Full Changelog