Overview
Auto Caching is a performance optimisation feature that stores AI model responses for reuse, eliminating redundant API calls to upstream providers. The cache operates transparently across all 7 supported AI providers, delivering consistent performance improvements regardless of the underlying model or provider.

Automatic Response Caching
Zero-configuration caching for all non-streaming responses (see the example after this feature list).
Automatic Cache Key Generation
Intelligent cache key generation based on request content.
Configurable TTL
Flexible time-to-live settings, with a 60-second default.
Provider Agnostic
Works seamlessly across all 7 AI providers.
Performance Metrics
Built-in cache hit/miss tracking and latency measurements.
Memory-Efficient Storage
Optimised in-memory cache with automatic expiration.
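To see the zero-configuration behaviour end to end, the Go snippet below sends the same non-streaming request twice and times both calls; with Auto Caching enabled, the second call is served from cache almost instantly. The URL and request body are assumptions for illustration only, not part of this documentation.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"time"
)

func main() {
	// Assumed local gateway address and OpenAI-style request shape;
	// adjust both for your own deployment.
	url := "http://localhost:8080/v1/chat/completions"
	body := []byte(`{"model":"gpt-4o","messages":[{"role":"user","content":"What is auto caching?"}]}`)

	for i := 1; i <= 2; i++ {
		start := time.Now()
		resp, err := http.Post(url, "application/json", bytes.NewReader(body))
		if err != nil {
			panic(err)
		}
		io.Copy(io.Discard, resp.Body) // drain so the connection can be reused
		resp.Body.Close()
		// The second, identical request should be a cache hit and return
		// in single-digit milliseconds.
		fmt.Printf("request %d took %v\n", i, time.Since(start))
	}
}
```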
What Gets Cached
- Successful Responses Only: Only 2xx responses are cached
- Non-Streaming Content: Standard JSON responses (streaming excluded)
- Complete Request Context: Cache keys include full request body and path
- Provider-Specific Metadata: Model, provider, and latency information preserved
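The eligibility rules above fit in a few lines. This is a minimal, illustrative sketch (the type and function names are assumptions, not the gateway's actual code):

```go
package main

import "fmt"

// response is an illustrative stand-in for a proxied provider response.
type response struct {
	StatusCode int
	Streaming  bool
}

// cacheable applies the rules above: only successful (2xx),
// non-streaming responses are eligible for storage.
func cacheable(r response) bool {
	if r.StatusCode < 200 || r.StatusCode >= 300 {
		return false // errors and redirects are never cached
	}
	if r.Streaming {
		return false // streaming responses are excluded
	}
	return true
}

func main() {
	fmt.Println(cacheable(response{StatusCode: 200}))                  // true
	fmt.Println(cacheable(response{StatusCode: 500}))                  // false
	fmt.Println(cacheable(response{StatusCode: 200, Streaming: true})) // false
}
```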
How Auto Caching Works
Cache Key Generation
The caching system generates unique cache keys by combining the request path with the validated request body, ensuring that identical requests receive cached responses while maintaining isolation between different queries.
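As a rough sketch of this idea (the hash choice and function name are assumptions, not the project's actual implementation), a deterministic key can be derived by hashing the path and body together:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// cacheKey derives a deterministic key from the request path and the
// validated request body: identical requests map to the same entry,
// while different paths or payloads stay isolated from each other.
func cacheKey(path string, body []byte) string {
	h := sha256.New()
	h.Write([]byte(path))
	h.Write([]byte{0}) // separator so path/body boundaries cannot collide
	h.Write(body)
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	body := []byte(`{"model":"gpt-4o","messages":[{"role":"user","content":"hi"}]}`)
	fmt.Println(cacheKey("/v1/chat/completions", body))
}
```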
Cache Lifecycle Management
Write Path
- Request Reception: Incoming request processed by middleware stack
- Cache Lookup: Check for existing valid cache entry
- Cache Miss: Forward request to AI provider
- Response Storage: Store successful responses with TTL
- Client Return: Send response to client
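A compact sketch of the write path above, with a plain map standing in for the gateway's store (all names here are illustrative):

```go
package main

import (
	"fmt"
	"time"
)

type entry struct {
	body      []byte
	expiresAt time.Time
}

// ttlCache is a minimal stand-in for the gateway's in-memory store.
type ttlCache struct {
	ttl time.Duration
	m   map[string]entry
}

func (c *ttlCache) get(key string) ([]byte, bool) {
	e, ok := c.m[key]
	if !ok || time.Now().After(e.expiresAt) {
		return nil, false
	}
	return e.body, true
}

func (c *ttlCache) set(key string, body []byte) {
	c.m[key] = entry{body: body, expiresAt: time.Now().Add(c.ttl)}
}

// handle walks the steps above: cache lookup, miss, forward to the
// provider, store the successful response with a TTL, return to client.
func handle(c *ttlCache, key string, forward func() (int, []byte)) []byte {
	if body, ok := c.get(key); ok {
		return body // valid entry found: no provider call needed
	}
	status, body := forward() // cache miss: forward to the AI provider
	if status >= 200 && status < 300 {
		c.set(key, body) // store successful responses only
	}
	return body
}

func main() {
	c := &ttlCache{ttl: 60 * time.Second, m: map[string]entry{}}
	provider := func() (int, []byte) { return 200, []byte(`{"answer":42}`) }
	fmt.Printf("%s\n", handle(c, "demo-key", provider))
	fmt.Println("cached entries:", len(c.m)) // 1
}
```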
Read Path
- Request Reception: Incoming request enters pipeline
- Key Generation: Create cache key from validated request
- Cache Hit: Retrieve stored response if valid
- Instant Return: Serve cached response with minimal latency
- Metrics Update: Record cache hit and time saved
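On the read path, a hit also updates the hit/miss counters and the estimated provider time saved. A minimal illustration (the metric names are assumptions, not the gateway's actual metrics):

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// cacheMetrics mirrors the hit/miss tracking described above.
type cacheMetrics struct {
	hits, misses atomic.Int64
	savedNanos   atomic.Int64 // estimated provider latency avoided
}

// lookup checks the cache for key; on a hit it records the hit and the
// provider round trip that was avoided, on a miss it records the miss.
func lookup(cache map[string][]byte, key string, m *cacheMetrics, providerLatency time.Duration) ([]byte, bool) {
	body, ok := cache[key]
	if !ok {
		m.misses.Add(1)
		return nil, false
	}
	m.hits.Add(1)
	m.savedNanos.Add(int64(providerLatency))
	return body, true
}

func main() {
	cache := map[string][]byte{"k": []byte(`{"cached":true}`)}
	var m cacheMetrics
	lookup(cache, "k", &m, 800*time.Millisecond)       // hit
	lookup(cache, "missing", &m, 800*time.Millisecond) // miss
	fmt.Printf("hits=%d misses=%d saved=%v\n",
		m.hits.Load(), m.misses.Load(), time.Duration(m.savedNanos.Load()))
}
```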
Expiration Handling
- TTL Based Expiration: Entries expire after configured duration
- Lazy Cleanup: Expired entries removed on next access attempt
- Memory Protection: Prevents unbounded cache growth
- Graceful Degradation: Expired entries trigger fresh requests
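Lazy cleanup means no background sweeper is needed: the access that discovers an expired entry deletes it and falls through to a fresh provider request. An illustrative sketch:

```go
package main

import (
	"fmt"
	"time"
)

type entry struct {
	body      []byte
	expiresAt time.Time
}

var cache = map[string]entry{}

// get deletes expired entries on the access that discovers them, so
// memory is reclaimed without a background job; the caller then treats
// the lookup as a miss and issues a fresh request.
func get(key string) ([]byte, bool) {
	e, ok := cache[key]
	if !ok {
		return nil, false
	}
	if time.Now().After(e.expiresAt) {
		delete(cache, key) // lazy cleanup on access
		return nil, false  // graceful degradation: triggers a fresh request
	}
	return e.body, true
}

func main() {
	cache["k"] = entry{body: []byte("hi"), expiresAt: time.Now().Add(50 * time.Millisecond)}
	_, hit := get("k")
	fmt.Println("hit before expiry:", hit) // true
	time.Sleep(100 * time.Millisecond)
	_, hit = get("k")
	fmt.Println("hit after expiry:", hit) // false; entry was removed
}
```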
Business Benefits
Cost Optimisation
Direct Cost Savings
- 70-95% Reduction in API Calls: Eliminate redundant requests to AI providers
- Token Usage Optimisation: Reuse responses without consuming additional tokens
- Bandwidth Savings: Reduce network transfer costs
- Provider Cost Reduction: Lower monthly bills from AI service providers
Example Cost Impact
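As a purely illustrative calculation (your request volumes, prices, and hit rates will differ): a workload of 1,000,000 requests per month at $0.002 per call costs $2,000. At a 70% cache hit rate, only 300,000 calls reach the provider, cutting the direct spend to $600; at a 95% hit rate, to $100.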
Indirect Cost Benefits
- Reduced Infrastructure Needs: Lower compute requirements
- Decreased Operational Overhead: Fewer rate limit issues
- Improved Resource Utilisation: Better throughput per dollar spent
Performance Enhancement
Latency Reduction
- 95% Faster Response Times: Cache hits return in less than 5ms vs 500-2000ms
- Consistent Performance: Eliminate provider variability
- Predictable SLAs: Meet strict latency requirements
- Enhanced User Experience: Near-instant responses for common queries
Performance Metrics
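Using the figures above as a purely illustrative calculation: with a 90% hit rate, 5ms cache hits, and 1000ms provider calls, mean latency falls from 1000ms to roughly 0.9 × 5ms + 0.1 × 1000ms ≈ 105ms, an order-of-magnitude improvement that also underpins the throughput gains below.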
Throughput Improvements
- 10x Higher Request Capacity: Handle more concurrent users
- Reduced Provider Dependencies: Less reliance on external services
- Smoother Traffic Patterns: Level out usage spikes
- Better Resource Allocation: CPU cycles for business logic
Operational Excellence
System Reliability
- Provider Outage Protection: Serve cached responses during downtime
- Rate Limit Mitigation: Reduce the risk of hitting provider rate limits
- Graceful Degradation: Fall back to the cache when providers slow down
- Improved Availability: Higher overall system uptime
Development Productivity
- Faster Testing Cycles: Instant responses during development
- Reduced Debugging Time: Consistent responses for testing
- Cost-Effective Development: No API costs for repeated tests
- Improved CI/CD Performance: Faster pipeline execution
Compliance and Governance
- Response Consistency: Identical responses for identical requests
- Audit Trail: Cache metrics for compliance reporting
- Data Residency: Responses stay within your infrastructure
- Security: No additional external data transmission
Scalability and Reliability
Horizontal Scaling
- Stateless Architecture: Each instance maintains its own cache
- Linear Performance: Add nodes for more cache capacity
- Geographic Distribution: Deploy caches close to users
- Load Balancing: Distribute cache hits across instances
Vertical Scaling
- Memory Optimisation: Efficient storage per cache entry
- Configurable Limits: Control maximum cache size
- Automatic Pruning: Remove least recently used entries
- Resource Management: Predictable memory consumption
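A least-recently-used policy like the one described above can be sketched with Go's container/list; this is an illustrative implementation, not the gateway's actual code:

```go
package main

import (
	"container/list"
	"fmt"
)

// lruCache evicts the least recently used entry once maxEntries is
// reached, keeping memory consumption predictable.
type lruCache struct {
	maxEntries int
	order      *list.List // front = most recently used
	items      map[string]*list.Element
}

type lruEntry struct {
	key  string
	body []byte
}

func newLRU(maxEntries int) *lruCache {
	return &lruCache{maxEntries: maxEntries, order: list.New(), items: map[string]*list.Element{}}
}

func (c *lruCache) Set(key string, body []byte) {
	if el, ok := c.items[key]; ok {
		c.order.MoveToFront(el)
		el.Value.(*lruEntry).body = body
		return
	}
	c.items[key] = c.order.PushFront(&lruEntry{key: key, body: body})
	if c.order.Len() > c.maxEntries {
		oldest := c.order.Back() // automatic pruning of the LRU entry
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*lruEntry).key)
	}
}

func (c *lruCache) Get(key string) ([]byte, bool) {
	el, ok := c.items[key]
	if !ok {
		return nil, false
	}
	c.order.MoveToFront(el) // mark as recently used
	return el.Value.(*lruEntry).body, true
}

func main() {
	c := newLRU(2)
	c.Set("a", []byte("1"))
	c.Set("b", []byte("2"))
	c.Get("a")              // touch "a" so "b" becomes least recently used
	c.Set("c", []byte("3")) // evicts "b"
	_, ok := c.Get("b")
	fmt.Println("b present:", ok) // false
}
```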
High Availability
- Zero Single Points of Failure: Cache operates independently
- Automatic Failover: Seamless fallback to providers
- Self-Healing: Automatic cache rebuilding
- No Coordination Overhead: No distributed cache complexity
Use Cases
Enterprise Applications
Customer Support Automation
- Scenario: Chatbot handling repetitive customer queries
- Cache Benefit: 90% cache hit rate for FAQs
- Impact: 10x faster response times, 90% cost reduction
- Configuration: 5-minute TTL for support content
Documentation Assistant
- Scenario: AI-powered documentation search
- Cache Benefit: Consistent answers for documentation queries
- Impact: Instant responses for common questions
- Configuration: 30-minute TTL for stable content
Code Generation Platform
- Scenario: IDE plugin generating boilerplate code
- Cache Benefit: Reuse common code patterns
- Impact: Sub-second code suggestions
- Configuration: 1-hour TTL for code templates
Analytics Dashboard
- Scenario: AI-generated insights and summaries
- Cache Benefit: Cache computed analytics
- Impact: Instant dashboard loading
- Configuration: 5-minute TTL for near real-time data
Development Scenarios
API Testing
- Scenario: Automated testing of AI integrations
- Cache Benefit: Consistent test responses
- Impact: 100x faster test execution
- Configuration: Long TTL for deterministic testing
Load Testing
- Scenario: Performance testing with high request volumes
- Cache Benefit: Test infrastructure without provider limits
- Impact: Accurate performance baselines
- Configuration: Pre-warm cache with test data
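Pre-warming can be as simple as inserting canned request/response pairs before the test run; a hypothetical sketch, with a plain map standing in for the gateway's store:

```go
package main

import "fmt"

// prewarm loads canned fixtures into the cache so a load test exercises
// the gateway itself rather than the upstream provider's rate limits.
func prewarm(cache map[string][]byte, fixtures map[string][]byte) {
	for key, body := range fixtures {
		cache[key] = body
	}
}

func main() {
	cache := map[string][]byte{}
	fixtures := map[string][]byte{
		// Keys would normally be derived from path + body, as in the
		// key-generation sketch earlier in this page.
		"key-for-common-query": []byte(`{"choices":[{"message":{"content":"canned answer"}}]}`),
	}
	prewarm(cache, fixtures)
	fmt.Printf("pre-warmed %d entries\n", len(cache))
}
```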
Development Environment
- Scenario: Local development with AI features
- Cache Benefit: No API costs during development
- Impact: Faster iteration cycles
- Configuration: Extended TTL for development
Demo Environments
- Scenario: Product demonstrations and POCs
- Cache Benefit: Reliable, fast demos
- Impact: Impressive performance showcase
- Configuration: Pre-cached demo scenarios