Overview
The Multi-Model feature provides access to 50+ AI models across 7 major providers. This model library spans a range of model sizes, capabilities, and price points, enabling organisations to select the optimal model for each use case while keeping a single, consistent API.

Key Capabilities
- 50+ Production Models: From lightweight to state-of-the-art models across all supported AI providers.
- Automatic Model Validation: Built-in controls reject unsupported models, keeping AI output quality consistent.
- Model-Specific Optimisations: Automatic parameter adjustments based on each model's capabilities to ensure optimal performance.
- Transparent Pricing: Real-time cost calculation and monitoring for all supported models in AI Gateway.
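The model-validation capability above can be sketched as a simple registry check. The registry contents and function name below are illustrative, not the gateway's actual API; the real list covers 50+ models across 7 providers.

```python
# Illustrative registry of supported model IDs per provider (a small subset).
SUPPORTED_MODELS = {
    "openai": {"gpt-4o", "gpt-4o-mini", "gpt-3.5-turbo", "o1-preview"},
    "anthropic": {"claude-3-5-sonnet-20241022", "claude-3-5-haiku-20241022"},
    "google": {"gemini-1.5-pro", "gemini-1.5-flash"},
}

def validate_model(provider: str, model: str) -> None:
    """Reject requests for unknown providers or unsupported models."""
    models = SUPPORTED_MODELS.get(provider)
    if models is None:
        raise ValueError(f"unknown provider: {provider!r}")
    if model not in models:
        raise ValueError(f"model {model!r} is not supported by {provider!r}")

validate_model("openai", "gpt-4o")  # passes silently for a supported model
```

Because validation happens before the request leaves the gateway, a typo in a model name fails fast instead of producing a confusing provider-side error.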
Business Benefits
1. Best-of-Breed Model Selection
- Task-Optimised Performance: Choose the ideal model for each use case, such as GPT-4o for complex reasoning, Claude for long-context analysis, or Gemini for multimodal tasks.
- Cost-Performance Optimisation: Select cost-effective models for simple tasks (e.g., GPT-4o-mini, Claude Haiku) and premium models for complex operations.
- Competitive Advantage: Leverage the unique capabilities of different models to outperform competitors relying on a single-model approach.
- Innovation Velocity: Access new models as soon as they are released, without infrastructure changes.
2. Risk Mitigation & Reliability
- Model Diversification: Avoid dependency on a single model's availability, performance, or pricing.
- Automatic Failover: Seamlessly switch to alternative models during outages or degraded performance.
- Compliance Flexibility: Use region-specific or compliance-certified models (Azure AI, AWS Bedrock, Google AI) for regulated workloads.
- Quality Assurance: A/B test different models to ensure consistent quality across providers.
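The automatic-failover behaviour above can be sketched as an ordered fallback chain. The `call_model` callable and the `flaky_client` stub below are stand-ins for a real provider client, not the gateway's actual interface.

```python
def complete_with_failover(prompt, candidates, call_model):
    """Try each (provider, model) pair in order; return the first successful
    response, falling through to the next candidate on any provider error."""
    last_error = None
    for provider, model in candidates:
        try:
            return call_model(provider, model, prompt)
        except Exception as exc:  # real code would catch provider-specific errors
            last_error = exc
    raise RuntimeError("all candidate models failed") from last_error

# Example: the primary provider is "down", so the request falls through
# to the equivalent Claude deployment on Bedrock.
def flaky_client(provider, model, prompt):
    if provider == "anthropic":
        raise ConnectionError("provider outage")
    return f"{model}: response to {prompt!r}"

result = complete_with_failover(
    "summarise this report",
    [("anthropic", "claude-3-5-sonnet-20241022"),
     ("bedrock", "anthropic.claude-3-5-sonnet-20241022-v2:0")],
    flaky_client,
)
```

Ordering the candidate list by preference keeps the failover policy declarative: the caller states priorities once and the chain handles outages.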
3. Cost Management & Optimisation
- Dynamic Cost Control: Route requests to cheaper models based on complexity and budget constraints.
- Volume Discounts: Leverage pricing tiers across multiple providers simultaneously.
- Budget Allocation: Set model-specific budgets and switch automatically when limits are reached.
- ROI Maximisation: Use premium models only where their advanced capabilities justify the cost.
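Budget allocation with automatic switching might look like the following sketch; the class name, model IDs, and dollar amounts are illustrative, not the gateway's real configuration.

```python
class BudgetRouter:
    """Route to a preferred model until its budget is exhausted, then fall back."""

    def __init__(self, budgets):
        self.budgets = dict(budgets)            # {model_id: USD spending limit}
        self.spent = {m: 0.0 for m in budgets}  # running spend per model

    def pick(self, preferred, fallback):
        if self.spent[preferred] < self.budgets[preferred]:
            return preferred
        return fallback

    def record(self, model, cost_usd):
        self.spent[model] += cost_usd

router = BudgetRouter({"gpt-4o": 1.00, "gpt-4o-mini": 10.00})
first = router.pick("gpt-4o", "gpt-4o-mini")   # premium model while under budget
router.record("gpt-4o", 1.25)                  # premium budget now exhausted
second = router.pick("gpt-4o", "gpt-4o-mini")  # automatically falls back
```

A production version would persist spend counters and reset them per billing period, but the pick/record split shown here is the core of the mechanism.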
4. Enterprise Scalability
- Load Distribution: Spread high-volume workloads across multiple models to avoid rate limits.
- Geographic Optimisation: Use region-specific models for lower latency and data-residency compliance.
- Capacity Management: Access the combined capacity of all providers during peak demand.
- Performance Benchmarking: Compare model performance in production with real workloads.
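Load distribution across equivalent models can be as simple as a round-robin rotation, sketched below with an assumed pool of fast models from this document; real routing would also weight by observed latency and remaining rate-limit headroom.

```python
import itertools

class RoundRobinRouter:
    """Cycle requests across equivalent models so no single model's
    rate limit becomes the bottleneck."""

    def __init__(self, models):
        self._cycle = itertools.cycle(models)

    def next_model(self):
        return next(self._cycle)

pool = RoundRobinRouter(["gpt-4o-mini",
                         "claude-3-5-haiku-20241022",
                         "gemini-1.5-flash"])
assignments = [pool.next_model() for _ in range(6)]  # each model used twice
```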
Supported Models by Provider
OpenAI Models (15 Models)
| Model Name | Version | Context | Strengths | Use Cases |
|---|---|---|---|---|
| GPT-4o | gpt-4o | 128K | Multimodal understanding, complex reasoning, code generation | Complex analysis, creative tasks, multimodal applications |
| | gpt-4o-2024-11-20 | 128K | Latest November 2024 version | Production workloads |
| | gpt-4o-2024-08-06 | 128K | August 2024 version | Stable deployments |
| | gpt-4o-2024-05-13 | 128K | May 2024 version | Legacy compatibility |
| GPT-4o-mini | gpt-4o-mini | 128K | Fast responses, cost-effective, good for simple tasks | Chatbots, simple queries, high-volume processing |
| | gpt-4o-mini-2024-07-18 | 128K | July 2024 version | Cost-optimised workloads |
| GPT-4 Turbo | gpt-4-turbo | 128K | Enhanced GPT-4 with vision | Document analysis, complex reasoning with vision |
| | gpt-4-turbo-2024-04-09 | 128K | April 2024 version | Stable vision applications |
| | gpt-4-turbo-preview | 128K | Preview version | Early access features |
| GPT-4 Classic | gpt-4 | 8K | Original GPT-4 model | Proven reliability for production workloads |
| | gpt-4-0613 | 8K | June 2023 stable version | Stable production deployments |
| | gpt-4-0314 | 8K | March 2023 version | Legacy compatibility |
| GPT-3.5 Turbo | gpt-3.5-turbo | 16K | Fast, cost-effective model | High-speed responses, cost-sensitive applications |
| | gpt-3.5-turbo-0125 | 16K | January 2024 version | Latest GPT-3.5 features |
| | gpt-3.5-turbo-1106 | 16K | November 2023 version | Stable GPT-3.5 deployment |
| O1 Series | o1-preview | Standard | Complex problem-solving, mathematical reasoning | Scientific research, complex analysis |
| | o1-mini | Standard | Faster reasoning at lower cost | Code debugging, logical problems |
Anthropic Claude Models (8 Models)
| Model Name | Version | Context | Strengths | Use Cases |
|---|---|---|---|---|
| Claude 3.5 Sonnet | claude-3-5-sonnet-20241022 | 200K | Best balance of intelligence and speed, excellent coding | Code generation, complex analysis, creative writing |
| | claude-3-5-sonnet-20240620 | 200K | June 2024 version | Stable Claude 3.5 deployment |
| Claude 3.5 Haiku | claude-3-5-haiku-20241022 | 200K | Lightning-fast responses, cost-effective | Real-time applications, high-volume processing |
| Claude 3 Opus | claude-3-opus-20240229 | 200K | Complex reasoning, nuanced understanding | Research, complex document analysis |
| Claude 3 Sonnet | claude-3-sonnet-20240229 | 200K | Balanced Claude 3 model | General purpose, good price-performance |
| Claude 3 Haiku | claude-3-haiku-20240307 | 200K | Fastest Claude 3 model | High-speed, cost-sensitive applications |
Amazon Bedrock Models (18 Models)
| Model Name | Version | Context | Strengths | Use Cases |
|---|---|---|---|---|
| Anthropic Claude on Bedrock | anthropic.claude-3-5-sonnet-20241022-v2:0 | 200K | AWS integration, enterprise security, compliance | Enterprise Claude deployments |
| | anthropic.claude-3-5-sonnet-20240620-v1:0 | 200K | AWS-native Claude 3.5 | AWS-integrated applications |
| | anthropic.claude-3-5-haiku-20241022-v1:0 | 200K | Fast Claude with AWS benefits | High-speed AWS applications |
| | anthropic.claude-3-opus-20240229-v1:0 | 200K | Most capable Claude with AWS | Complex AWS workloads |
| | anthropic.claude-3-sonnet-20240229-v1:0 | 200K | Balanced Claude with AWS | General AWS applications |
| | anthropic.claude-3-haiku-20240307-v1:0 | 200K | Fast Claude with AWS | Cost-effective AWS applications |
| Meta Llama 3.1 | meta.llama3-1-70b-instruct-v1:0 | 8K | Large Llama 3.1 model, open-source heritage | Custom deployments, fine-tuning base |
| | meta.llama3-1-8b-instruct-v1:0 | 8K | Efficient Llama 3.1 model | Lightweight applications |
| Meta Llama 3 | meta.llama3-70b-instruct-v1:0 | 8K | Large Llama 3 model | High-performance open-source needs |
| | meta.llama3-8b-instruct-v1:0 | 8K | Small Llama 3 model | Cost-effective open-source |
| Amazon Titan | amazon.titan-text-premier-v1:0 | 8K | Premium Titan model, AWS-native | AWS-integrated applications, Amazon-specific tasks |
| | amazon.titan-text-express-v1 | 8K | Fast Titan model | High-speed AWS applications |
| Cohere Command | cohere.command-r-plus-v1:0 | Standard | Advanced Command model, retrieval-augmented generation | RAG applications, document search |
| | cohere.command-r-v1:0 | Standard | Standard Command model | Enterprise search applications |
| Mistral | mistral.mistral-large-2402-v1:0 | 32K | Large Mistral model from a European provider | Multilingual applications, European compliance |
| | mistral.mixtral-8x7b-instruct-v0:1 | 32K | MoE architecture, efficient inference | Efficient multilingual processing |
Azure OpenAI Models
| Feature | Description | Use Cases |
|---|---|---|
| Model Availability | Same models as OpenAI with Azure enterprise capabilities | Enterprise applications, Microsoft ecosystem integration |
| Regional Deployments | Deploy models in specific Azure regions | Data-residency compliance, low-latency applications |
| Private Endpoints | Multi-authentication support for secure enterprise access | Secure enterprise deployments |
| Enterprise SLAs | Guaranteed uptime and support | Mission-critical applications |
| Content Filtering | Built-in content moderation | Compliance and safety requirements |
Azure AI Inference Models
| Model Type | Description | Use Cases |
|---|---|---|
| Custom Model Deployments | Deploy any model from the Azure AI catalog | Custom ML pipelines, specialised applications |
| Fine-tuned Models | Deploy your custom fine-tuned models | Domain-specific applications |
| Open-Source Models | Llama, Mistral, Falcon, and more | Open-source AI development |
| Specialised Models | Domain-specific models for healthcare, finance, etc. | Industry-specific applications |
Google AI Models (6+ Models)
| Model Name | Version | Context | Strengths | Use Cases |
|---|---|---|---|---|
| Gemini 1.5 Pro | gemini-1.5-pro | 2M | Most capable Gemini model: massive context, multimodal, video understanding | Long-document analysis, video processing |
| Gemini 1.5 Flash | gemini-1.5-flash | 1M | Fast, efficient, cost-effective, multimodal | Real-time applications, high-volume processing |
| Gemini 1.0 Pro | gemini-1.0-pro | Standard | Previous-generation Pro model | Stable, proven performance |
| Gemini 1.0 Pro Vision | gemini-1.0-pro-vision | Standard | Vision-enabled version | Image analysis applications |
| Gemini Experimental | gemini-exp-1121 | Standard | November 2024 experimental | Testing cutting-edge capabilities |
| | gemini-exp-1114 | Standard | November 2024 experimental | Early access to new features |
Google Vertex AI Models
| Feature | Description | Use Cases |
|---|---|---|
| Model Availability | Same Gemini models as Google AI with enterprise features | GCP-native applications, enterprise deployments |
| Private Endpoints | VPC Service Controls | Secure enterprise deployments |
| Regional Deployments | Data-residency compliance | Compliance requirements |
| Model Garden | Access to 100+ open-source models | Open-source AI development |
| AutoML Integration | Custom model training | Custom ML model development |
Model Selection Guide
By Use Case
Complex Reasoning & Analysis
Recommended: o1-preview, gpt-4o, or claude-3-opus, which trade speed and cost for the strongest reasoning.
High Volume Processing
Recommended: gpt-4o-mini, claude-3-5-haiku, or gemini-1.5-flash for fast, low-cost throughput.
Long Context Applications
Recommended: gemini-1.5-pro (2M-token context) or the Claude family (200K-token context) for very large inputs.
Multimodal Applications
Recommended: gpt-4o or the Gemini 1.5 models, which accept more than text-only input.
Dynamic Model Selection (Example Script)
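A minimal sketch of dynamic model selection: a routing table maps task categories from the guide above to a (provider, model) pair. The table contents and function name are illustrative assumptions, not a fixed gateway API.

```python
# Illustrative routing table; model choices follow the comparison matrix
# elsewhere in this document.
ROUTES = {
    "complex_reasoning": ("openai", "o1-preview"),
    "high_volume": ("openai", "gpt-4o-mini"),
    "long_context": ("google", "gemini-1.5-pro"),
    "multimodal": ("openai", "gpt-4o"),
}

def select_model(task_type, default=("openai", "gpt-3.5-turbo")):
    """Map a task category to a (provider, model) pair, with a safe default
    for unrecognised categories."""
    return ROUTES.get(task_type, default)

provider, model = select_model("long_context")  # ("google", "gemini-1.5-pro")
```

Keeping the routing table in configuration rather than code lets operators retarget task categories as new models are released, without redeploying callers.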
Cost-Optimised Model Routing (Example Script)
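Cost-optimised routing can be sketched as a pre-flight cost estimate against a per-request budget. The prices, the words-to-tokens ratio, and the budget below are illustrative assumptions; always check your provider's current pricing.

```python
# Illustrative per-1K-input-token prices in USD (assumed, not authoritative).
PRICE_PER_1K_INPUT = {"gpt-4o": 0.0025, "gpt-4o-mini": 0.00015}

def route_by_budget(prompt, max_cost_usd=0.001,
                    premium="gpt-4o", cheap="gpt-4o-mini"):
    """Pick the premium model only when its estimated input cost fits the
    per-request budget; a word count stands in for real tokenisation."""
    est_tokens = len(prompt.split()) * 1.3  # rough words-to-tokens ratio
    est_cost = est_tokens / 1000 * PRICE_PER_1K_INPUT[premium]
    return premium if est_cost <= max_cost_usd else cheap

route_by_budget("summarise this memo")   # short prompt: premium fits budget
route_by_budget("word " * 5000)          # huge prompt: falls back to cheap
```

Prompt length is a crude complexity proxy; a production router might instead use a lightweight classifier or the caller's declared task type.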
A/B Testing Different Models
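One common way to A/B test models is deterministic hash bucketing, sketched below: each user is stably assigned to one arm, so quality and cost can be compared per model over time. The function name, default models, and split are illustrative assumptions.

```python
import hashlib

def ab_assign(user_id,
              model_a="gpt-4o",
              model_b="claude-3-5-sonnet-20241022",
              fraction_a=0.5):
    """Deterministically bucket each user so they always see the same model,
    with roughly `fraction_a` of users on model A."""
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 10_000
    return model_a if bucket < fraction_a * 10_000 else model_b

# The same user always lands in the same arm across requests.
ab_assign("user-42") == ab_assign("user-42")  # True
```

Hashing the user ID avoids storing assignments, and adjusting `fraction_a` ramps traffic between arms without reshuffling existing users at the boundaries.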
Model Comparison Matrix
| Provider | Model | Context | Speed | Cost | Best For |
|---|---|---|---|---|---|
| OpenAI | gpt-4o | 128K | Medium | High | Complex reasoning, multimodal |
| | gpt-4o-mini | 128K | Fast | Low | Simple tasks, high volume |
| | gpt-4-turbo | 128K | Medium | High | Vision tasks, analysis |
| | gpt-3.5-turbo | 16K | Fast | Low | Quick responses, chatbots |
| | o1-preview | Standard | Slow | High | Complex reasoning |
| Anthropic | claude-3-5-sonnet | 200K | Fast | Medium | Coding, analysis |
| | claude-3-5-haiku | 200K | Fastest | Very Low | Real-time apps |
| | claude-3-opus | 200K | Slow | Very High | Complex research |
| Google | gemini-1.5-pro | 2M | Medium | Medium | Massive documents |
| | gemini-1.5-flash | 1M | Fast | Low | High-speed processing |
| Bedrock | llama3-1-70b | 8K | Medium | Low | Open-source needs |
| | titan-premier | 8K | Fast | Low | AWS integration |
| | mistral-large | 32K | Medium | Medium | European compliance |