
Overview

The Multi-Model feature provides access to 55+ AI models across 7 major providers. The model library spans a wide range of sizes, capabilities, and price points, enabling organisations to select the optimal model for each use case while working through a single, consistent API.

Key Capabilities

55+ Production Models

From lightweight to state-of-the-art models across all AI providers.

Automatic Model Validation

Built-in controls ensure that only supported models are used, maintaining consistent AI output quality.
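The validation itself happens inside the gateway, but the idea can be sketched as a simple allow-list check. The model names below are assumed examples, not the gateway's actual configuration:

```python
# Illustrative sketch only: the gateway performs this check server-side;
# the allow-list here uses a few assumed model names.
SUPPORTED_MODELS = {
    "gpt-4o",
    "gpt-4o-mini",
    "claude-3-5-haiku-20241022",
}

def validate_model(model: str) -> str:
    """Reject requests that name a model the gateway does not support."""
    if model not in SUPPORTED_MODELS:
        raise ValueError(f"Unsupported model: {model!r}")
    return model
```

Rejecting unknown model names up front gives a clear client-side error instead of an opaque provider failure.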

Model-Specific Optimisations

Automatic parameter adjustments based on model capabilities to ensure optimal performance.
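As an illustration of the kind of adjustment involved, the sketch below rewrites request parameters for OpenAI's o-series reasoning models, which take max_completion_tokens rather than max_tokens and do not accept sampling parameters such as temperature. This is a simplified assumption about the gateway's behaviour, not its actual code:

```python
def adjust_params(model: str, params: dict) -> dict:
    """Hypothetical sketch: adapt request parameters to the target
    model's constraints before forwarding the request."""
    adjusted = dict(params)
    if model.startswith(("o1", "o3", "o4")):
        # OpenAI reasoning models use max_completion_tokens and
        # do not accept sampling controls such as temperature
        if "max_tokens" in adjusted:
            adjusted["max_completion_tokens"] = adjusted.pop("max_tokens")
        adjusted.pop("temperature", None)
    return adjusted
```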

Transparent Pricing

Real-time cost calculation and monitoring for all supported models in AI Gateway.

Business Benefits

1. Best of Breed Model Selection

  • Task-Optimised Performance
    Choose the ideal model for each specific use case: GPT-4o for complex reasoning, Claude for long-context analysis, Gemini for multimodal tasks, and so on.
  • Cost-Performance Optimisation
    Select cost-effective models for simple tasks (e.g., GPT-4o Mini, Claude Haiku) and premium models for complex operations.
  • Competitive Advantage
    Leverage unique capabilities of different models to outperform competitors using single-model approaches.
  • Innovation Velocity
    Immediately access new models as they are released without infrastructure changes.

2. Risk Mitigation & Reliability

  • Model Diversification
    Avoid dependency on a single model’s availability, performance or pricing changes.
  • Automatic Failover
    Seamlessly switch to alternative models during outages or degraded performance.
  • Compliance Flexibility
    Use region-specific or compliance-certified models (Azure AI, AWS Bedrock, Google AI, etc.) for regulated workloads.
  • Quality Assurance
    A/B test different models to ensure consistent quality across providers.
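The failover pattern above can also be driven from the client side. The sketch below tries each model in preference order and moves on when a call fails; the fallback chain is an illustrative assumption, and `client` stands for any OpenAI-compatible client (for example, one pointed at the gateway):

```python
# Hedged sketch of client-side failover across models/providers.
FALLBACK_CHAIN = [
    "gpt-4o",
    "claude-3-5-sonnet-20241022",
    "gemini-1.5-flash",
]

def complete_with_failover(client, prompt: str) -> str:
    """Try each model in order until one succeeds."""
    last_error = None
    for model in FALLBACK_CHAIN:
        try:
            response = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
            )
            return response.choices[0].message.content
        except Exception as exc:  # outage, rate limit, invalid model, ...
            last_error = exc
    raise RuntimeError("All fallback models failed") from last_error
```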

3. Cost Management & Optimisation

  • Dynamic Cost Control
    Route requests to cheaper models based on complexity and budget constraints.
  • Volume Discounts
    Leverage pricing tiers across multiple providers simultaneously.
  • Budget Allocation
    Set model-specific budgets and automatically switch models when limits are reached.
  • ROI Maximisation
    Use premium models only where their advanced capabilities justify the cost.
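The budget-allocation idea above can be sketched as a small per-model tracker that downgrades to a fallback once a limit is exhausted. The budgets and model names are illustrative assumptions, not gateway settings:

```python
# Hypothetical sketch of per-model budget tracking with automatic fallback.
class BudgetAllocator:
    def __init__(self, budgets: dict, fallback: str):
        self.budgets = dict(budgets)  # model -> remaining budget in USD
        self.fallback = fallback

    def record_spend(self, model: str, cost: float) -> None:
        """Deduct the cost of a completed request from the model's budget."""
        if model in self.budgets:
            self.budgets[model] -= cost

    def choose(self, preferred: str) -> str:
        """Use the preferred model while its budget lasts, then fall back."""
        if self.budgets.get(preferred, 0.0) > 0.0:
            return preferred
        return self.fallback
```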

4. Enterprise Scalability

  • Load Distribution
    Distribute high-volume workloads across multiple models to avoid rate limits.
  • Geographic Optimisation
    Use region-specific models for lower latency and data residency compliance.
  • Capacity Management
    Access combined capacity of all providers during peak demand.
  • Performance Benchmarking
    Compare model performance in production with real workloads.
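Load distribution can be as simple as rotating requests through a pool of interchangeable models so no single provider's rate limit is hit. A minimal round-robin sketch, with an assumed example pool:

```python
import itertools

# Hedged sketch: rotate high-volume traffic across equivalent models.
class RoundRobinPool:
    def __init__(self, models):
        self._cycle = itertools.cycle(models)

    def next_model(self) -> str:
        """Return the next model in the rotation."""
        return next(self._cycle)

pool = RoundRobinPool(
    ["gpt-4o-mini", "claude-3-5-haiku-20241022", "gemini-1.5-flash"]
)
```

A production version would typically also weight models by remaining quota or observed latency rather than rotating uniformly.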

Supported Models by Provider

OpenAI Models (20 Models)

| Model | Description | Release Date | Modalities | Context Window | Max Output Tokens | Knowledge Cut-Off |
|---|---|---|---|---|---|---|
| GPT-5 Chat (gpt-5-chat-latest) | Snapshot used in ChatGPT. Recommended for testing latest improvements in chat use cases. | Latest | Text, Image | 128,000 | 16,384 | Sep 30, 2024 |
| GPT-5 (gpt-5-2025-08-07) | Flagship model for coding, reasoning, and agentic tasks across domains. | 2025-08-07 | Text, Image | 400,000 | 128,000 | Sep 30, 2024 |
| GPT-5 Mini (gpt-5-mini-2025-08-07) | Faster, more cost-efficient GPT-5 variant for well-defined tasks and precise prompts. | 2025-08-07 | Text, Image | 400,000 | 128,000 | May 31, 2024 |
| GPT-5 Nano (gpt-5-nano-2025-08-07) | Cheapest, fastest GPT-5 variant. Ideal for summarization and classification. | 2025-08-07 | Text, Image | 400,000 | 128,000 | May 31, 2024 |
| GPT-4.1 (gpt-4.1-2025-04-14) | Excels at instruction following and tool use. Supports 1M token context with low latency. | 2025-04-14 | Text, Image | 1,047,576 | 32,768 | Jun 01, 2024 |
| GPT-4.1 Mini (gpt-4.1-mini-2025-04-14) | Smaller, faster GPT-4.1 variant. Maintains broad capabilities with 1M token context. | 2025-04-14 | Text, Image | 1,047,576 | 32,768 | Jun 01, 2024 |
| GPT-4.1 Nano (gpt-4.1-nano-2025-04-14) | Ultra-light GPT-4.1 variant for efficiency with 1M token context. | 2025-04-14 | Text, Image | 1,047,576 | 32,768 | Jun 01, 2024 |
| GPT-4 Preview (gpt-4-0125-preview) | Research preview of GPT-4 Turbo, an older high-intelligence model. | 2024-01-25 | Text | 128,000 | 4,096 | Dec 01, 2023 |
| GPT-4 Legacy (gpt-4-0613) | Older GPT-4 model, still available for compatibility. | 2023-06-13 | Text | 8,192 | 8,192 | Dec 01, 2023 |
| GPT-4 Turbo (gpt-4-turbo-2024-04-09) | Cheaper, faster variant of GPT-4. Superseded by GPT-4o. | 2024-04-09 | Text, Image | 128,000 | 4,096 | Dec 01, 2023 |
| GPT-4o (gpt-4o-2024-05-13) | Versatile, high-intelligence flagship model. Multimodal (text + image). | 2024-05-13 | Text, Image | 128,000 | 4,096 | Oct 01, 2023 |
| GPT-4o (gpt-4o-2024-08-06) | Updated GPT-4o snapshot. | 2024-08-06 | Text, Image | 128,000 | 16,384 | Oct 01, 2023 |
| GPT-4o (gpt-4o-2024-11-20) | Updated GPT-4o snapshot. | 2024-11-20 | Text, Image | 128,000 | 16,384 | Oct 01, 2023 |
| GPT-4o Latest (chatgpt-4o-latest) | Points to the GPT-4o snapshot used in ChatGPT. | Rolling | Text, Image | 128,000 | 16,384 | Oct 01, 2023 |
| GPT-4o Mini (gpt-4o-mini-2024-07-18) | Lightweight GPT-4o variant. Fast, affordable, and fine-tuning friendly. | 2024-07-18 | Text, Image | 128,000 | 16,384 | Oct 01, 2023 |
| O1 (o1-2024-12-17) | RL-trained reasoning model. Thinks step-by-step before answering. | 2024-12-17 | Text, Image | 200,000 | 100,000 | Oct 01, 2023 |
| O3 (o3-2025-04-16) | High-performance reasoning model for math, science, coding, and multimodal analysis. | 2025-04-16 | Text, Image | 200,000 | 100,000 | Jun 01, 2024 |
| O3 Mini (o3-mini-2025-01-31) | Small reasoning model. Supports structured outputs, function calling, and batch API. | 2025-01-31 | Text | 200,000 | 100,000 | Oct 01, 2023 |
| O4 Mini (o4-mini-2025-04-16) | Latest small o-series model. Optimized for fast reasoning, coding, and visual tasks. | 2025-04-16 | Text, Image | 200,000 | 100,000 | Jun 01, 2024 |
| GPT-3.5 Turbo (gpt-3.5-turbo-0125) | Legacy GPT-3.5 model. Still supported, but GPT-4o Mini is recommended instead. | 2024-01-25 | Text | 16,385 | 4,096 | Sep 01, 2021 |

Anthropic Claude Models (7 Models)

| Model | Description | Release Date | Modalities | Context Window | Max Output Tokens | Knowledge Cut-Off |
|---|---|---|---|---|---|---|
| Claude Sonnet 4.5 (claude-sonnet-4-5-20250929) | Best model for complex agents and coding. | 2025-09-29 | Text, Image (Vision), Multilingual | 200K / 1M (beta) | 64,000 | Reliable: Jan 2025 · Training data: Jul 2025 |
| Claude Sonnet 4 (claude-sonnet-4-20250514) | High-performance model. | 2025-05-14 | Text, Image (Vision), Multilingual | 200K / 1M (beta) | 64,000 | Reliable: Jan 2025 · Training data: Mar 2025 |
| Claude Sonnet 3.7 (claude-3-7-sonnet-20250219, alias: claude-3-7-sonnet-latest) | High-performance model with early extended thinking. | 2025-02-19 | Text, Image (Vision), Multilingual | 200K | 64,000 | Reliable: Oct 2024 · Training data: Nov 2024 |
| Claude Opus 4.1 (claude-opus-4-1-20250805) | Exceptional model for specialized complex tasks. | 2025-08-05 | Text, Image (Vision), Multilingual | 200K | 32,000 | Reliable: Jan 2025 · Training data: Mar 2025 |
| Claude Opus 4 (claude-opus-4-20250514) | Previous flagship model. | 2025-05-14 | Text, Image (Vision), Multilingual | 200K | 32,000 | Reliable: Jan 2025 · Training data: Mar 2025 |
| Claude Haiku 3.5 (claude-3-5-haiku-20241022, alias: claude-3-5-haiku-latest) | Fastest Claude model. | 2024-10-22 | Text, Image (Vision), Multilingual | 200K | 8,192 | Reliable: Jul 2024 · Training data: Jul 2024 |
| Claude Haiku 3 (claude-3-haiku-20240307) | Compact model for near-instant responsiveness. | 2024-03-07 | Text, Image (Vision), Multilingual | 200K | 4,096 | Reliable: 2023 · Training data: Aug 2023 |

Amazon Bedrock Models (24 Models)

| Model | Description | Release Date | Modalities | Context Window |
|---|---|---|---|---|
| Claude Sonnet 4.5 (anthropic.claude-sonnet-4-5-20250929-v1:0) | Latest Claude Sonnet reasoning/chat model. | 2025-09-29 | Text, Image | 200K |
| Claude Sonnet 4 (anthropic.claude-sonnet-4-20250514-v1:0) | Advanced Claude Sonnet v4. | 2025-05-14 | Text, Image | 200K |
| Claude Sonnet 3.7 (anthropic.claude-3-7-sonnet-20250219-v1:0) | Claude 3.7 Sonnet generation model. | 2025-02-19 | Text, Image | 200K |
| Claude Sonnet 3.5 v2 (anthropic.claude-3-5-sonnet-20241022-v2:0) | Updated Claude 3.5 Sonnet. | 2024-10-22 | Text, Image | 200K |
| Claude Sonnet 3.5 (anthropic.claude-3-5-sonnet-20240620-v1:0) | Standard Claude 3.5 Sonnet. | 2024-06-20 | Text, Image | 200K |
| Claude Haiku 3 (anthropic.claude-3-haiku-20240307-v1:0) | Lightweight Claude model optimized for speed/cost. | 2024-03-07 | Text | 48K |
| Claude Sonnet 3 (anthropic.claude-3-sonnet-20240229-v1:0) | Claude 3 Sonnet general-purpose model. | 2024-02-29 | Text, Image | 28K |
| Nova Lite (amazon.nova-lite-v1:0) | Amazon Nova lightweight model. | 2025 | Text | 300K |
| Nova Micro (amazon.nova-micro-v1:0) | Amazon Nova smallest variant. | 2025 | Text | 128K |
| Nova Pro (amazon.nova-pro-v1:0) | Amazon Nova flagship model. | 2025 | Text | 300K |
| Titan Text G1 – Express (amazon.titan-text-express-v1) | Balanced Titan LLM for text generation. | 2023 | Text | 8K |
| Titan Text G1 – Lite (amazon.titan-text-lite-v1) | Lightweight Titan model. | 2023 | Text | 4K |
| IBM Granite 3.2 Instruct 8B (ibm-granite-3-2-8b-instruct) | General-purpose instruct model. | 2025 | Text | |
| IBM Granite 3.0 Instruct 8B (granite-3-0-8b-instruct) | Earlier instruct model (8B params). | 2024 | Text | |
| IBM Granite 20B Code Instruct (ibm-granite-20b-code-instruct-8k) | Code-focused model (20B params). | 2024 | Text (Code) | 8K |
| IBM Granite 8B Code Instruct (ibm-granite-8b-code-instruct-128k) | Code instruct model with extended context. | 2024 | Text (Code) | 128K |
| IBM Granite 34B Code Instruct (ibm-granite-34b-code-instruct-8k) | Large code instruct model (34B params). | 2024 | Text (Code) | 8K |
| Llama 3 8B Instruct (meta.llama3-8b-instruct-v1:0) | Meta Llama 3 instruct-tuned model. | 2024 | Text | 8K |
| Llama 3 70B Instruct (meta.llama3-70b-instruct-v1:0) | Larger Meta Llama 3 instruct model. | 2024 | Text | 8K |
| DeepSeek-R1 (deepseek-llm-r1) | DeepSeek foundation model. | 2025 | Text | |
| DeepSeek V3.1 (deepseek.v3-v1:0) | Latest DeepSeek v3.1 model. | 2025 | Text | 163,840 |
| Mistral 7B Instruct (mistral.mistral-7b-instruct-v0:2) | Instruction-tuned Mistral 7B. | 2024-03-01 | Text, Code, Classification | 32K |
| Mistral Large 24.02 (mistral.mistral-large-2402-v1:0) | Large Mistral model for reasoning, text, code, RAG, and agents. | 2024-04-02 | Text, Code, RAG, Agents | 32K |
| Mixtral 8x7B Instruct (mistral.mixtral-8x7b-instruct-v0:1) | Mixture-of-experts instruct model. | 2024-03-01 | Text, Code, Reasoning | 32K |

Azure OpenAI Models

| Model | Description | Release Date | Modalities | Context Window | Max Output Tokens | Knowledge Cut-Off |
|---|---|---|---|---|---|---|
| GPT-5 (gpt-5-2025-08-07) | Flagship GPT-5 with reasoning, structured outputs, text + image processing, functions & tools. | 2025-08-07 | Text, Image | 400,000 (272K in / 128K out) | 128,000 | Sep 30, 2024 |
| GPT-5 Mini (gpt-5-mini-2025-08-07) | Smaller, faster GPT-5 variant. | 2025-08-07 | Text, Image | 400,000 (272K in / 128K out) | 128,000 | May 31, 2024 |
| GPT-5 Nano (gpt-5-nano-2025-08-07) | Optimized GPT-5 variant with smaller footprint. | 2025-08-07 | Text, Image | 400,000 (272K in / 128K out) | 128,000 | May 31, 2024 |
| GPT-5 Chat Preview (gpt-5-chat-2025-08-07) | Chat-optimized GPT-5 (preview). | 2025-08-07 | Text, Image | 128,000 | 16,384 | Sep 30, 2024 |
| GPT-5 Chat Preview (gpt-5-chat-2025-10-03) | Updated chat-optimized GPT-5 (preview). | 2025-10-03 | Text, Image | 128,000 | 16,384 | Sep 30, 2024 |
| GPT-5 Codex (gpt-5-codex-2025-09-11) | GPT-5 optimized for coding and structured outputs. | 2025-09-11 | Text, Image | 400,000 (272K in / 128K out) | 128,000 | |
| GPT-5 Pro (gpt-5-pro-2025-10-06) | GPT-5 Pro with advanced reasoning, structured outputs, functions & tools. | 2025-10-06 | Text, Image | 400,000 (272K in / 128K out) | 128,000 | Sep 30, 2024 |
| GPT-OSS 120B (gpt-oss-120b) Preview | Open-source style reasoning model. | 2025 | Text | 131,072 | 131,072 | May 31, 2024 |
| GPT-OSS 20B (gpt-oss-20b) Preview | Smaller GPT-OSS variant. | 2025 | Text | 131,072 | 131,072 | May 31, 2024 |
| GPT-4.1 (gpt-4.1-2025-04-14) | Multimodal model with streaming, function calling, and structured outputs. | 2025-04-14 | Text, Image | 1,047,576 · 128K (managed) · 300K (batch) | 32,768 | May 31, 2024 |
| GPT-4.1 Nano (gpt-4.1-nano-2025-04-14) | Lightweight GPT-4.1 variant. | 2025-04-14 | Text, Image | 1,047,576 · 128K (managed) · 300K (batch) | 32,768 | May 31, 2024 |
| GPT-4.1 Mini (gpt-4.1-mini-2025-04-14) | Smaller GPT-4.1 variant. | 2025-04-14 | Text, Image | 1,047,576 · 128K (managed) · 300K (batch) | 32,768 | May 31, 2024 |
| Codex Mini (codex-mini-2025-05-16) | Fine-tuned o4-mini optimized for code. | 2025-05-16 | Text, Image | 200K in / 100K out | 100,000 | May 31, 2024 |
| O3 Pro (o3-pro-2025-06-10) | Advanced reasoning model with enhanced capabilities. | 2025-06-10 | Text, Image | 200K in / 100K out | 100,000 | May 31, 2024 |
| O4 Mini (o4-mini-2025-04-16) | Reasoning model with efficient performance. | 2025-04-16 | Text, Image | 200K in / 100K out | 100,000 | May 31, 2024 |
| O3 (o3-2025-04-16) | Reasoning model with tool use. | 2025-04-16 | Text, Image | 200K in / 100K out | 100,000 | May 31, 2024 |
| O3 Mini (o3-mini-2025-01-31) | Text-only reasoning model. | 2025-01-31 | Text | 200K in / 100K out | 100,000 | Oct 2023 |
| O1 (o1-2024-12-17) | Reasoning model with structured outputs. | 2024-12-17 | Text, Image | 200K in / 100K out | 100,000 | Oct 2023 |
| O1 Preview (o1-preview-2024-09-12) | Early preview release of O1. | 2024-09-12 | Text | 128K in / 32,768 out | 32,768 | Oct 2023 |
| O1 Mini (o1-mini-2024-09-12) | Cost-efficient O1 variant. | 2024-09-12 | Text | 128K in / 65,536 out | 65,536 | Oct 2023 |
| GPT-4o (gpt-4o-2024-11-20) | Multimodal GPT-4o with JSON mode, function calling, and strong vision support. | 2024-11-20 | Text, Image | 128,000 | 16,384 | Oct 2023 |
| GPT-4o (gpt-4o-2024-08-06) | Updated GPT-4o release. | 2024-08-06 | Text, Image | 128,000 | 16,384 | Oct 2023 |
| GPT-4o (gpt-4o-2024-05-13) | Early GPT-4o release (Turbo Vision parity). | 2024-05-13 | Text, Image | 128,000 | 4,096 | Oct 2023 |
| GPT-4o Mini (gpt-4o-mini-2024-07-18) | Smaller, fast GPT-4o variant. | 2024-07-18 | Text, Image | 128,000 | 16,384 | Oct 2023 |
| GPT-4 Turbo (gpt-4-turbo-2024-04-09) | Multimodal GPT-4 Turbo, successor to preview models. | 2024-04-09 | Text, Image | 128,000 | 4,096 | Dec 2023 |
| GPT-3.5 Turbo (gpt-35-turbo-0125) | JSON mode, function calling, reproducible outputs. | 2024-01-25 | Text | 16,385 in / 4,096 out | 4,096 | Sep 2021 |
| GPT-3.5 Turbo (gpt-35-turbo-1106) | Earlier GPT-3.5 Turbo variant. | 2023-11-06 | Text | 16,385 in / 4,096 out | 4,096 | Sep 2021 |
| GPT-3.5 Turbo Instruct (gpt-35-turbo-instruct-0914) | Replacement for legacy Completions models. | 2023-09-14 | Text | 4,097 | 4,097 | Sep 2021 |

Azure AI Inference Models

| Model | Description | Release Date | Modalities | Context Window | Max Output Tokens |
|---|---|---|---|---|---|
| AI21 Jamba 1.5 Mini (AI21-Jamba-1.5-Mini) | Tool calling: Yes; supports text, JSON, structured outputs. | | Text | 262,144 | 4,096 |
| AI21 Jamba 1.5 Large (AI21-Jamba-1.5-Large) | Tool calling: Yes; supports text, JSON, structured outputs. | | Text | 262,144 | 4,096 |
| O3 Mini (o3-mini) | OpenAI O-series; tool calling: Yes; structured outputs. | | Text, Image | 200,000 | 100,000 |
| O1 (o1) | OpenAI O-series; tool calling: Yes; structured outputs. | | Text, Image | 200,000 | 100,000 |
| O1 Preview (o1-preview) | Early O1 preview; tool calling: Yes. | | Text | 128,000 | 32,768 |
| O1 Mini (o1-mini) | Cost-efficient O1 variant; tool calling: No. | | Text | 128,000 | 65,536 |
| GPT-4o (gpt-4o) | Multimodal GPT-4o; tool calling: Yes; supports structured outputs. | | Text, Image, Audio | 131,072 | 16,384 |
| GPT-4o Mini (gpt-4o-mini) | Smaller GPT-4o variant; tool calling: Yes. | | Text, Image, Audio | 131,072 | 16,384 |
| Cohere Command A (Cohere-command-A) | Cohere instruct model; tool calling: Yes. | | Text | 256,000 | 8,000 |
| Cohere Command R+ (Cohere-command-r-plus-08-2024) | Optimized for reasoning and retrieval; tool calling: Yes. | 2024-08 | Text | 131,072 | 4,096 |
| Cohere Command R (Cohere-command-r-08-2024) | Earlier R-series model; tool calling: Yes. | 2024-08 | Text | 131,072 | 4,096 |
| JAIS 30B (jais-30b-chat) | Multilingual model; tool calling: Yes. | | Text | 8,192 | 4,096 |
| DeepSeek V3 (DeepSeek-V3-0324) | Latest DeepSeek v3; tool calling: No. | 2024-03 | Text | 131,072 | 131,072 |
| DeepSeek V3 (Legacy) (DeepSeek-V3-Legacy) | Earlier DeepSeek v3. | | Text | 131,072 | 131,072 |
| DeepSeek R1 (DeepSeek-R1) | Reasoning-focused model. | | Text | 163,840 | 163,840 |
| Llama 4 Scout (Llama-4-Scout-17B-16E-Instruct) | Meta Llama 4 variant; tool calling: Yes. | | Text, Image | 128,000 | 8,192 |
| Llama 4 Maverick (Llama-4-Maverick-17B-128E-Instruct-FP8) | Meta Llama 4 Maverick; tool calling: Yes. | | Text, Image | 128,000 | 8,192 |
| Llama 3.3 70B (Llama-3.3-70B-Instruct) | Meta Llama 3.3 large model. | | Text | 128,000 | 8,192 |
| Llama 3.2 Vision (Llama-3.2-90B-Vision-Instruct) | Meta Llama 3.2 multimodal vision model. | | Text, Image | 128,000 | 8,192 |
| Llama 3.2 Vision (Llama-3.2-11B-Vision-Instruct) | Smaller Meta Llama 3.2 vision variant. | | Text, Image | 128,000 | 8,192 |
| Llama 3.1 8B (Meta-Llama-3.1-8B-Instruct) | Meta Llama 3.1 instruct variant. | | Text | 131,072 | 8,192 |
| Llama 3.1 405B (Meta-Llama-3.1-405B-Instruct) | Largest Meta Llama 3.1 instruct variant. | | Text | 131,072 | 8,192 |
| MAI DS R1 (MAI-DS-R1) | Reasoning model. | | Text | 163,840 | 163,840 |
| Phi-4 (Phi-4) | Microsoft Phi-4 general-purpose. | | Text | 16,384 | 16,384 |
| Phi-4 Mini (Phi-4-mini-instruct) | Small Phi-4 variant. | | Text | 131,072 | 4,096 |
| Phi-4 Multimodal (Phi-4-multimodal-instruct) | Multimodal Phi-4 (text, image, audio). | | Text, Image, Audio | 131,072 | 4,096 |
| Phi-4 Reasoning (Phi-4-reasoning) | Phi-4 reasoning-focused model. | | Text | 32,768 | 32,768 |
| Phi-4 Mini Reasoning (Phi-4-mini-reasoning) | Lightweight reasoning variant. | | Text | 128,000 | 128,000 |
| Phi-3.5 Mini (Phi-3.5-mini-instruct) | Phi-3.5 small instruct model. | | Text | 131,072 | 4,096 |
| Phi-3.5 MoE (Phi-3.5-MoE-instruct) | Phi-3.5 mixture-of-experts variant. | | Text | 131,072 | 4,096 |
| Phi-3.5 Vision (Phi-3.5-vision-instruct) | Phi-3.5 multimodal variant. | | Text, Image | 131,072 | 4,096 |
| Phi-3 Mini 128K (Phi-3-mini-128k-instruct) | Compact Phi-3 variant with 128K context. | | Text | 131,072 | 4,096 |
| Phi-3 Mini 4K (Phi-3-mini-4k-instruct) | Compact Phi-3 with 4K context. | | Text | 4,096 | 4,096 |
| Phi-3 Small 128K (Phi-3-small-128k-instruct) | Small Phi-3 with 128K context. | | Text | 131,072 | 4,096 |
| Phi-3 Small 8K (Phi-3-small-8k-instruct) | Small Phi-3 with 8K context. | | Text | 131,072 | 4,096 |
| Phi-3 Medium 128K (Phi-3-medium-128k-instruct) | Medium Phi-3 with 128K context. | | Text | 131,072 | 4,096 |
| Phi-3 Medium 4K (Phi-3-medium-4k-instruct) | Medium Phi-3 with 4K context. | | Text | 4,096 | 4,096 |
| Codestral 2501 (Codestral-2501) | Mistral Codestral code-focused model. | | Text | 262,144 | 4,096 |
| Ministral 3B (Ministral-3B) | Lightweight Mistral model; tool calling: Yes. | | Text | 131,072 | 4,096 |
| Mistral Nemo (Mistral-Nemo) | Mistral Nemo model; tool calling: Yes. | | Text | 131,072 | 4,096 |
| Mistral Large 24.11 (Mistral-Large-2411) | Latest Mistral large model; tool calling: Yes. | | Text | 128,000 | 4,096 |
| Mistral Medium 25.05 (Mistral-medium-2505) | Balanced medium model; tool calling: No. | | Text, Image | 128,000 | 128,000 |
| Mistral Small 25.03 (Mistral-small-2503) | Newer small Mistral; tool calling: Yes. | | Text, Image | 131,072 | 4,096 |
| Mistral Small (Mistral-small) | Earlier small Mistral variant. | | Text | 32,768 | 4,096 |
| Tsuzumi 7B (tsuzumi-7b) | Lightweight Tsuzumi 7B model. | | Text | 8,192 | 8,192 |

Google AI Models (7 Models)

| Model | Description | Release Date | Modalities | Max Output Tokens | Knowledge Cut-Off |
|---|---|---|---|---|---|
| Gemini 2.5 Pro (gemini-2.5-pro) | Most advanced model for complex reasoning and multimodal tasks. | 2025 | Text, Image, Audio, Video | 65,536 | Jan 2025 |
| Gemini 2.5 Flash (gemini-2.5-flash) | Balanced model optimized for speed and general use. | 2025 | Text, Image, Audio, Video | 65,536 | Jan 2025 |
| Gemini 2.5 Flash (Preview) (gemini-2.5-flash-preview-09-2025) | Preview release of Gemini 2.5 Flash. | 2025-09 | Text, Image, Audio, Video | 65,536 | Jan 2025 |
| Gemini 2.5 Flash-Lite (gemini-2.5-flash-lite) | Lightweight, cost-efficient variant. | 2025 | Text, Image, Audio, Video | 65,536 | Jan 2025 |
| Gemini 2.5 Flash-Lite (Preview) (gemini-2.5-flash-lite-preview-09-2025) | Preview release of Gemini 2.5 Flash-Lite. | 2025-09 | Text, Image, Audio, Video | 65,536 | Jan 2025 |
| Gemini 2.0 Flash (gemini-2.0-flash) | Earlier generation Flash model. | 2024 | Text, Image, Audio, Video | 8,192 | Aug 2024 |
| Gemini 2.0 Flash-Lite (gemini-2.0-flash-lite) | Lightweight 2.0 Flash variant. | 2024 / 2025 | Text, Image, Audio | 8,192 | Aug 2024 |

Google Vertex AI Models

| Model | Description | Release Date | Modalities | Context Window | Max Output Tokens | Knowledge Cut-Off |
|---|---|---|---|---|---|---|
| Gemini 2.5 Flash (Preview) (gemini-2.5-flash) | Balanced model optimized for speed. | 2025 | Text, Image, Audio, Video | 1M | 65,536 | Jan 2025 |
| Gemini 2.5 Pro (Preview) (gemini-2.5-pro) | Most advanced Gemini model. | 2025 | Text, Image, Audio, Video | 1M | 65,536 | Jan 2025 |
| Gemini 2.0 Flash (gemini-2.0-flash) | Previous Flash generation. | 2024 | Text, Image, Audio, Video | | 8,192 | Aug 2024 |
| Gemini 2.0 Flash-Lite (gemini-2.0-flash-lite) | Lightweight Flash variant. | 2024 | Text, Image, Audio | | 8,192 | Aug 2024 |
| Claude Opus 4.1 (claude-opus-4-1) | Exceptional reasoning model. | 2025 | Text, Image | 200K | 32,000 | Jan 2025 |
| Claude Opus 4 (claude-opus-4) | Previous flagship Claude model. | 2025 | Text, Image | 200K | 32,000 | Jan 2025 |
| Claude Sonnet 4.5 (claude-sonnet-4-5) | Best for complex agents and coding. | 2025 | Text, Image | 200K / 1M (beta) | 64,000 | Jan 2025 |
| Claude Sonnet 4 (claude-sonnet-4) | High-performance Claude Sonnet model. | 2025 | Text, Image | 200K / 1M (beta) | 64,000 | Jan 2025 |
| Claude 3.7 Sonnet (claude-3-7-sonnet) | High-performance with extended thinking. | 2025 | Text, Image | 200K | 64,000 | Oct 2024 |
| Claude 3.5 Sonnet v2 (claude-3-5-sonnet-v2) | Updated Claude 3.5 Sonnet. | 2024 | Text, Image | 200K | 64,000 | 2024 |
| Claude 3.5 Haiku (claude-3-5-haiku) | Fastest Claude model. | 2024 | Text, Image | 200K | 8,192 | Jul 2024 |
| Claude 3 Haiku (claude-3-haiku) | Compact and fast Claude model. | 2024 | Text | 200K | 4,096 | Aug 2023 |
| Claude 3.5 Sonnet (claude-3-5-sonnet) | Standard Claude 3.5 Sonnet. | 2024 | Text, Image | 200K | 64,000 | 2024 |
| Jamba 1.5 Large (Preview) (jamba-1-5-large) | Advanced AI21 Jamba model. | 2025 | Text | | | |
| Jamba 1.5 Mini (Preview) (jamba-1-5-mini) | Smaller AI21 Jamba 1.5 variant. | 2025 | Text | | | |
| Mistral Medium 3 (mistral-medium-3) | Medium-sized Mistral model. | 2025 | Text | | | |
| Mistral Small 3.1 (mistral-small-3-1-25-03) | Smaller, faster Mistral. | 2025-03 | Text | | | |
| Mistral Large (mistral-large-24-11) | Large Mistral model. | 2024-11 | Text | | | |
| Mistral 7B (mistral-7b) | Base 7B model. | 2023 | Text | | | |
| Mixtral (mixtral) | Mixture-of-experts Mistral model. | 2024 | Text | | | |
| Llama 4 Maverick (llama-4-maverick-17b-128e) | Meta Llama 4 Maverick. | 2025 | Text | | | |
| Llama 4 Scout (llama-4-scout-17b-16e) | Meta Llama 4 Scout. | 2025 | Text | | | |
| Llama 4 (llama-4) | Core large Llama 4 model. | 2025 | Text | | | |
| Llama 3.3 (llama-3-3) | Successor to Llama 3.2. | 2025 | Text | | | |
| Llama 3.2 (Preview) (llama-3-2-preview) | Preview release of Llama 3.2. | 2024 | Text | | | |
| Llama 3.2 (llama-3-2) | Stable release of Llama 3.2. | 2024 | Text | | | |
| Llama 3.2 Vision (llama-3-2-vision) | Multimodal Llama 3.2. | 2024 | Text, Image | | | |
| Llama 3.1 (llama-3-1) | Part of Llama 3 family. | 2024 | Text | | | |
| Llama 3 (llama-3) | Base Llama 3 model. | 2023 | Text | | | |
| Qwen3-Next 80B Thinking (qwen3-next-80b-thinking) | Reasoning-focused Qwen3 variant. | 2025 | Text | | | |
| Qwen3-Next 80B Instruct (qwen3-next-80b-instruct) | Instruction-tuned Qwen3 variant. | 2025 | Text | | | |
| Qwen3 Coder (qwen3-coder) | Qwen3 code-focused model. | 2025 | Text (Code) | | | |
| Qwen3 235B (qwen3-235b) | Very large Qwen3 model. | 2025 | Text | | | |
| Qwen2 (qwen2) | Earlier Qwen release. | 2024 | Text | | | |
| DeepSeek V3.1 (deepseek-v3-1) | Advanced DeepSeek model. | 2025 | Text | | | |
| DeepSeek R1 (deepseek-r1-0528) | Reasoning-focused DeepSeek model. | 2025-05-28 | Text | | | |
| GPT-OSS 120B (gpt-oss-120b) | Open-weight GPT-OSS model. | 2025 | Text | | | |
| GPT-OSS 20B (gpt-oss-20b) | Smaller open-weight GPT-OSS model. | 2025 | Text | | | |
| Phi-3 (phi-3) | Microsoft Phi-3 model. | 2024 | Text | | | |
| Gemma 3n (gemma-3n) | Google Gemma series model. | 2025 | Text | | | |
| Gemma 3 (gemma-3) | Member of Gemma family. | 2025 | Text | | | |
| Gemma 2 (gemma-2) | Earlier Gemma generation. | 2024 | Text | | | |
| Gemma (gemma) | First Gemma release. | 2023 | Text | | | |

Model Selection Guide

By Use Case

Complex Reasoning & Analysis

models = {
    "premium": ["gpt-5-2025-08-07", "gpt-4.1-2025-04-14", "claude-3-5-sonnet-20241022", "gemini-1.5-pro"],
    "balanced": ["gpt-4o", "gpt-4.1-mini-2025-04-14", "claude-3-sonnet-20240229"],
    "reasoning": ["o3-2025-04-16", "o1-2024-12-17", "o3-mini-2025-01-31"]
}

High Volume Processing

models = {
    "fastest": ["gpt-5-nano-2025-08-07", "gpt-4.1-nano-2025-04-14", "claude-3-5-haiku-20241022", "gemini-1.5-flash"],
    "cost_optimised": ["gpt-5-mini-2025-08-07", "gpt-4o-mini", "claude-3-haiku-20240307"],
    "open_source": ["meta.llama3-8b-instruct-v1:0"]
}

Long Context Applications

models = {
    "maximum_context": ["gemini-1.5-pro"],  # 2M tokens
    "large_context": ["gpt-4.1-2025-04-14", "gemini-1.5-flash"],  # 1M tokens
    "standard_large": ["gpt-5-2025-08-07", "claude-3-5-sonnet-20241022", "gpt-4o"]  # 400K/200K/128K
}

Multimodal Applications

models = {
    "vision": ["gpt-5-2025-08-07", "gpt-4o", "gemini-1.5-pro", "o3-2025-04-16"],
    "video": ["gemini-1.5-pro", "gemini-1.5-flash"],
    "images": ["gpt-5-2025-08-07", "gpt-4o", "o4-mini-2025-04-16", "gemini-1.0-pro-vision"]
}
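The use-case maps above are plain dictionaries, so a small helper (hypothetical, not part of any gateway SDK) can turn them into a selection policy with tier fallback. The map below reuses names from the "High Volume Processing" example:

```python
# Hedged sketch: pick the first model from a preferred tier, falling
# back to another tier when the preferred one is empty.
def pick_model(models: dict, preferred_tier: str, fallback_tier: str) -> str:
    for tier in (preferred_tier, fallback_tier):
        if models.get(tier):
            return models[tier][0]
    raise ValueError("No models configured for the requested tiers")

models = {
    "fastest": ["gpt-5-nano-2025-08-07", "gpt-4.1-nano-2025-04-14"],
    "cost_optimised": ["gpt-5-mini-2025-08-07", "gpt-4o-mini"],
}
```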

Dynamic Model Selection (Example Script)

from openai import OpenAI
from typing import Dict, List

class ModelSelector:
    """Intelligent model selection based on requirements"""
    
    # Model capabilities and costs
    MODEL_PROFILES = {
        "gpt-5-2025-08-07": {
            "cost": "highest",
            "speed": "medium",
            "capability": "highest",
            "context": 400000
        },
        "gpt-5-mini-2025-08-07": {
            "cost": "medium",
            "speed": "fast",
            "capability": "highest",
            "context": 400000
        },
        "gpt-5-nano-2025-08-07": {
            "cost": "low",
            "speed": "fastest",
            "capability": "good",
            "context": 400000
        },
        "gpt-4.1-2025-04-14": {
            "cost": "high",
            "speed": "medium",
            "capability": "highest",
            "context": 1047576
        },
        "gpt-4o": {
            "cost": "high",
            "speed": "medium",
            "capability": "highest",
            "context": 128000
        },
        "gpt-4o-mini": {
            "cost": "low",
            "speed": "fast",
            "capability": "good",
            "context": 128000
        },
        "o3-2025-04-16": {
            "cost": "highest",
            "speed": "slow",
            "capability": "highest",
            "context": 200000
        },
        "claude-3-5-sonnet-20241022": {
            "cost": "medium",
            "speed": "fast",
            "capability": "highest",
            "context": 200000
        },
        "claude-3-5-haiku-20241022": {
            "cost": "very_low",
            "speed": "fastest",
            "capability": "good",
            "context": 200000
        },
        "gemini-1.5-pro": {
            "cost": "medium",
            "speed": "medium",
            "capability": "highest",
            "context": 2000000
        },
        "gemini-1.5-flash": {
            "cost": "low",
            "speed": "fast",
            "capability": "good",
            "context": 1000000
        }
    }
    
    def select_model(self, 
                     task_complexity: str,
                     context_size: int,
                     budget: str,
                     speed_requirement: str) -> str:
        """Select optimal model based on requirements"""
        
        suitable_models = []
        
        for model, profile in self.MODEL_PROFILES.items():
            # Check context size
            if context_size > profile["context"]:
                continue
                
            # Check budget constraints
            if budget == "low" and profile["cost"] in ["highest", "high", "medium"]:
                continue
                
            # Check speed requirements
            if speed_requirement == "real-time" and profile["speed"] == "slow":
                continue
                
            # Check capability requirements
            if task_complexity == "complex" and profile["capability"] != "highest":
                continue
                
            suitable_models.append(model)
        
        # Return best match or default
        return suitable_models[0] if suitable_models else "gpt-4o-mini"

# Usage example
selector = ModelSelector()

# Select model for different scenarios
model_for_chat = selector.select_model(
    task_complexity="simple",
    context_size=1000,
    budget="low",
    speed_requirement="real-time"
)  # Returns the first model that satisfies all constraints

model_for_analysis = selector.select_model(
    task_complexity="complex",
    context_size=150000,
    budget="high",
    speed_requirement="normal"
)  # Returns the first high-capability model with sufficient context

Cost-Optimised Model Routing (Example Script)

import json
from typing import Dict, Optional
from openai import OpenAI

class CostOptimisedRouter:
    """Route requests to most cost-effective model"""
    
    # Cost per 1K tokens (input/output estimated)
    MODEL_COSTS = {
        "gpt-4o": {"input": 0.00250, "output": 0.01000},
        "gpt-4o-mini": {"input": 0.00015, "output": 0.00060},
        "claude-3-5-sonnet-20241022": {"input": 0.00300, "output": 0.01500},
        "claude-3-5-haiku-20241022": {"input": 0.00025, "output": 0.00125},
        "gemini-1.5-flash": {"input": 0.00010, "output": 0.00040},
        "gpt-3.5-turbo": {"input": 0.00050, "output": 0.00150}
    }
    
    def estimate_cost(self, model: str, input_tokens: int, output_tokens: int) -> float:
        """Estimate cost for a request"""
        costs = self.MODEL_COSTS.get(model)
        if costs is None:
            # Treat unknown pricing as infinitely expensive so an
            # unpriced model can never win the cheapest-model comparison
            return float("inf")
        input_cost = (input_tokens / 1000) * costs["input"]
        output_cost = (output_tokens / 1000) * costs["output"]
        return input_cost + output_cost
    
    def select_cheapest_capable_model(self, 
                                      task_type: str,
                                      estimated_tokens: int) -> str:
        """Select cheapest model capable of the task"""
        
        # Define capable models by task type
        capable_models = {
            "simple": ["gpt-4o-mini", "claude-3-5-haiku-20241022", "gpt-3.5-turbo"],
            "moderate": ["gpt-4o-mini", "claude-3-5-haiku-20241022", "gemini-1.5-flash"],
            "complex": ["gpt-4o", "claude-3-5-sonnet-20241022", "gemini-1.5-pro"]
        }
        
        models = capable_models.get(task_type, capable_models["simple"])
        
        # Calculate costs and select cheapest
        cheapest = min(models, key=lambda m: self.estimate_cost(
            m, estimated_tokens, estimated_tokens // 2
        ))
        
        return cheapest
    
    def route_request(self, prompt: str, complexity: str = "auto") -> Dict:
        """Route request to optimal model"""
        
        # Auto-detect complexity if needed
        if complexity == "auto":
            prompt_length = len(prompt)
            if prompt_length < 100:
                complexity = "simple"
            elif prompt_length < 500:
                complexity = "moderate"
            else:
                complexity = "complex"
        
        # Estimate tokens (rough estimate)
        estimated_tokens = len(prompt) // 4
        
        # Select model
        model = self.select_cheapest_capable_model(complexity, estimated_tokens)
        
        # Get provider for model
        provider_map = {
            "gpt-4o": "openai",
            "gpt-4o-mini": "openai",
            "gpt-3.5-turbo": "openai",
            "claude-3-5-sonnet-20241022": "anthropic",
            "claude-3-5-haiku-20241022": "anthropic",
            "gemini-1.5-pro": "google",
            "gemini-1.5-flash": "google"
        }
        
        return {
            "model": model,
            "provider": provider_map[model],
            "estimated_cost": self.estimate_cost(model, estimated_tokens, estimated_tokens // 2),
            "reasoning": f"Selected {model} as cheapest option for {complexity} task"
        }

# Usage
router = CostOptimisedRouter()

# Simple query - routes to cheapest model
result = router.route_request("What is 2+2?")
print(f"Model: {result['model']}, Cost: ${result['estimated_cost']:.6f}")

# Complex query - routes to capable but cost-effective model
result = router.route_request(
    "Analyse this 10-page legal document and identify key risks...",
    complexity="complex"
)
print(f"Model: {result['model']}, Cost: ${result['estimated_cost']:.4f}")

A/B Testing Different Models (Example Script)

import asyncio
import time
from typing import List, Dict
from openai import OpenAI

class ModelABTester:
    """A/B test different models for quality and performance"""
    
    def __init__(self, gateway_url: str):
        self.gateway_url = gateway_url
        self.results = []
    
    def create_client(self, provider: str, credentials: dict) -> OpenAI:
        """Create client for specific provider"""
        headers = {"x-provider-name": provider}
        headers.update(credentials)
        
        return OpenAI(
            api_key="dummy",
            base_url=self.gateway_url,
            default_headers=headers
        )
    
    async def test_model(self, 
                         model: str, 
                         provider: str,
                         credentials: dict,
                         prompt: str) -> Dict:
        """Test a single model"""
        
        client = self.create_client(provider, credentials)
        
        start_time = time.time()
        try:
            # Run the blocking SDK call in a worker thread so that
            # asyncio.gather actually runs the model tests concurrently
            response = await asyncio.to_thread(
                client.chat.completions.create,
                model=model,
                messages=[{"role": "user", "content": prompt}],
                max_tokens=500
            )

            end_time = time.time()
            
            return {
                "model": model,
                "provider": provider,
                "success": True,
                "latency": end_time - start_time,
                "response": response.choices[0].message.content,
                "tokens_used": response.usage.total_tokens if response.usage else 0
            }
        except Exception as e:
            return {
                "model": model,
                "provider": provider,
                "success": False,
                "error": str(e)
            }
    
    async def run_ab_test(self, 
                         test_configs: List[Dict],
                         prompt: str) -> List[Dict]:
        """Run A/B test across multiple models"""
        
        tasks = []
        for config in test_configs:
            task = self.test_model(
                config["model"],
                config["provider"],
                config["credentials"],
                prompt
            )
            tasks.append(task)
        
        results = await asyncio.gather(*tasks)
        return results
    
    def analyse_results(self, results: List[Dict]) -> Dict:
        """Analyse A/B test results"""
        
        successful = [r for r in results if r.get("success")]
        
        if not successful:
            return {"error": "All models failed"}
        
        # Find best by latency
        fastest = min(successful, key=lambda x: x["latency"])
        
        # Calculate averages
        avg_latency = sum(r["latency"] for r in successful) / len(successful)
        
        return {
            "models_tested": len(results),
            "successful": len(successful),
            "fastest_model": fastest["model"],
            "fastest_latency": fastest["latency"],
            "average_latency": avg_latency,
            "results": results
        }

# Usage example
async def main():
    tester = ModelABTester("https://gateway.altrum.ai/v1")
    
    # Configure models to test
    test_configs = [
        {
            "model": "gpt-4o-mini",
            "provider": "openai",
            "credentials": {"Authorization": "Bearer key"}
        },
        {
            "model": "claude-3-5-haiku-20241022",
            "provider": "anthropic",
            "credentials": {"x-api-key": "key"}
        },
        {
            "model": "gemini-1.5-flash",
            "provider": "google",
            "credentials": {"x-goog-api-key": "key"}
        }
    ]
    
    # Run test
    prompt = "Write a haiku about cloud computing"
    results = await tester.run_ab_test(test_configs, prompt)
    
    # Analyse
    analysis = tester.analyse_results(results)
    print(f"Fastest model: {analysis['fastest_model']}")
    print(f"Latency: {analysis['fastest_latency']:.2f}s")

# Run the test
asyncio.run(main())

Model Comparison Matrix

| Provider | Model | Context | Speed | Cost | Best For |
|-----------|-------|---------|-------|------|----------|
| OpenAI | gpt-5-2025-08-07 | 400K | Medium | Highest | Flagship coding, reasoning, agentic tasks |
| OpenAI | gpt-5-mini-2025-08-07 | 400K | Fast | Medium | Cost-efficient GPT-5 for defined tasks |
| OpenAI | gpt-5-nano-2025-08-07 | 400K | Fastest | Low | Fastest, cheapest GPT-5 variant |
| OpenAI | gpt-4.1-2025-04-14 | 1M | Medium | High | Instruction following, tool use |
| OpenAI | gpt-4o | 128K | Medium | High | Complex reasoning, multimodal |
| OpenAI | gpt-4o-mini | 128K | Fast | Low | Simple tasks, high volume |
| OpenAI | o3-2025-04-16 | 200K | Slow | Highest | Math, science, coding, multimodal analysis |
| OpenAI | o1-2024-12-17 | 200K | Slow | High | Step-by-step reasoning |
| Anthropic | claude-3-5-sonnet | 200K | Fast | Medium | Coding, analysis |
| Anthropic | claude-3-5-haiku | 200K | Fastest | Very Low | Real-time apps |
| Anthropic | claude-3-opus | 200K | Slow | Very High | Complex research |
| Google | gemini-1.5-pro | 2M | Medium | Medium | Massive documents |
| Google | gemini-1.5-flash | 1M | Fast | Low | High-speed processing |
| Bedrock | llama3-1-70b | 8K | Medium | Low | Open-source needs |
| Bedrock | titan-premier | 8K | Fast | Low | AWS integration |
| Bedrock | mistral-large | 32K | Medium | Medium | European compliance |
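
The matrix can also be kept as data so routing code can filter on it programmatically. A minimal sketch, with a handful of entries copied from the table above; treating the cost column as ordinal tiers is an assumption made for illustration, not a gateway feature:

```python
# Ordinal cost tiers (an illustrative encoding of the table's cost column).
COST_TIERS = {"Very Low": 0, "Low": 1, "Medium": 2, "High": 3, "Very High": 4, "Highest": 5}

# Context windows in tokens and cost tiers, taken from the matrix above.
MODELS = {
    "gpt-4o-mini":       {"provider": "openai",    "context": 128_000,   "cost": "Low"},
    "gpt-4o":            {"provider": "openai",    "context": 128_000,   "cost": "High"},
    "claude-3-5-haiku":  {"provider": "anthropic", "context": 200_000,   "cost": "Very Low"},
    "gemini-1.5-pro":    {"provider": "google",    "context": 2_000_000, "cost": "Medium"},
}

def cheapest_with_context(min_context: int) -> str:
    """Return the lowest-cost model whose context window covers min_context tokens."""
    candidates = {m: v for m, v in MODELS.items() if v["context"] >= min_context}
    return min(candidates, key=lambda m: COST_TIERS[candidates[m]["cost"]])
```

For example, a 150K-token document rules out the 128K-context models, so the helper falls through to the cheapest remaining option.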

Best Practices

1. Model Selection Strategy

def select_model_strategy(requirements):
    """Strategic model selection based on requirements"""
    
    strategies = {
        "quality_first": [
            "gpt-5-2025-08-07",
            "gpt-4.1-2025-04-14",
            "claude-3-5-sonnet-20241022",
            "gemini-1.5-pro"
        ],
        "speed_first": [
            "gpt-5-nano-2025-08-07",
            "claude-3-5-haiku-20241022",
            "gpt-4o-mini",
            "gemini-1.5-flash"
        ],
        "cost_first": [
            "gpt-5-nano-2025-08-07",
            "gpt-4o-mini",
            "claude-3-haiku-20240307",
            "meta.llama3-8b-instruct-v1:0"
        ],
        "context_first": [
            "gemini-1.5-pro",  # 2M tokens
            "gpt-4.1-2025-04-14",  # 1M tokens
            "gpt-5-2025-08-07",  # 400K tokens
            "claude-3-5-sonnet-20241022"  # 200K tokens
        ]
    }
    
    return strategies.get(requirements.get("priority"), strategies["quality_first"])

2. Fallback Chains

class ModelFallbackChain:
    """Implement fallback chains for reliability"""
    
    def __init__(self):
        self.fallback_chains = {
            "premium": [
                "gpt-5-2025-08-07",
                "gpt-4.1-2025-04-14",
                "claude-3-5-sonnet-20241022",
                "gemini-1.5-pro",
                "gpt-4o"
            ],
            "efficient": [
                "gpt-5-nano-2025-08-07",
                "gpt-5-mini-2025-08-07",
                "claude-3-5-haiku-20241022",
                "gpt-4o-mini",
                "gemini-1.5-flash"
            ]
        }
    
    def execute_with_fallback(self, chain_type, prompt):
        """Execute with automatic fallback"""
        
        chain = self.fallback_chains[chain_type]
        
        for model in chain:
            try:
                # call_model is expected to issue the gateway request for the model
                return self.call_model(model, prompt)
            except Exception as e:
                print(f"Model {model} failed: {e}")
                continue
        
        raise RuntimeError("All models in fallback chain failed")
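
The failover behaviour can be seen in a self-contained form with a stubbed `call_model` that simulates an outage; the stub is hypothetical and stands in for the real gateway request:

```python
class StubFallbackChain:
    """Demonstrates fallback ordering with a simulated provider outage."""

    def __init__(self, down: set):
        self.down = down  # models currently failing
        self.chain = ["gpt-5-2025-08-07", "claude-3-5-sonnet-20241022", "gpt-4o"]

    def call_model(self, model: str, prompt: str) -> str:
        # Hypothetical stub: a real implementation would call the gateway here.
        if model in self.down:
            raise ConnectionError(f"{model} unavailable")
        return f"{model}: response to {prompt!r}"

    def execute_with_fallback(self, prompt: str) -> str:
        for model in self.chain:
            try:
                return self.call_model(model, prompt)
            except ConnectionError:
                continue  # try the next model in the chain
        raise RuntimeError("All models in fallback chain failed")
```

With the first model marked down, the chain transparently serves the request from the second model; callers never see the outage.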

3. Cost Monitoring

class CostMonitor:
    """Monitor and control model costs"""
    
    def __init__(self, monthly_budget: float):
        self.monthly_budget = monthly_budget
        self.current_spend = 0.0
        self.model_usage = {}
    
    def track_usage(self, model: str, tokens_in: int, tokens_out: int):
        """Track model usage and costs"""
        
        cost = self.calculate_cost(model, tokens_in, tokens_out)
        self.current_spend += cost
        
        if model not in self.model_usage:
            self.model_usage[model] = {"calls": 0, "cost": 0}
        
        self.model_usage[model]["calls"] += 1
        self.model_usage[model]["cost"] += cost
        
        # Alert if approaching budget
        if self.current_spend > self.monthly_budget * 0.8:
            self.send_budget_alert()
    
    def get_usage_report(self):
        """Generate usage report"""
        
        return {
            "total_spend": self.current_spend,
            "budget_remaining": self.monthly_budget - self.current_spend,
            "model_breakdown": self.model_usage
        }
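
`CostMonitor` assumes a `calculate_cost` method. A minimal sketch using a static price table; the prices below are illustrative only, and real gateway pricing should be substituted:

```python
# Illustrative USD prices per million tokens -- not official pricing.
PRICING = {
    "gpt-4o":      {"input": 2.50, "output": 10.00},
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
}

def calculate_cost(model: str, tokens_in: int, tokens_out: int) -> float:
    """Compute request cost from token counts; unknown models default to zero."""
    price = PRICING.get(model)
    if price is None:
        return 0.0
    return (tokens_in * price["input"] + tokens_out * price["output"]) / 1_000_000
```

Defaulting unknown models to zero keeps tracking from crashing, at the cost of silently under-counting; raising instead is equally reasonable.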

Migration Guide

From Single Model to Multi-Model

# Before: Single model deployment
client = OpenAI(api_key="key")
response = client.chat.completions.create(
    model="gpt-4",
    messages=[...]
)

# After: Multi-model with intelligent selection
class MultiModelClient:
    def __init__(self):
        self.gateway = "https://gateway.altrum.ai/v1"
    
    def create_completion(self, messages, requirements=None):
        # Select model based on requirements
        model = self.select_optimal_model(requirements)
        
        # Create client for selected model
        client = self.create_client_for_model(model)
        
        # Execute with automatic fallback
        return self.execute_with_fallback(client, model, messages)
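
The `select_optimal_model` step above is left abstract; one minimal, self-contained interpretation maps a requirements dict to a model name. The mapping below is illustrative, not prescriptive:

```python
def select_optimal_model(requirements: dict = None) -> str:
    """Pick a model from illustrative requirement flags (hypothetical mapping)."""
    requirements = requirements or {}
    if requirements.get("context_tokens", 0) > 400_000:
        return "gemini-1.5-pro"         # largest context window
    if requirements.get("latency_sensitive"):
        return "gpt-5-nano-2025-08-07"  # fastest, cheapest GPT-5 variant
    if requirements.get("budget") == "low":
        return "gpt-4o-mini"
    return "gpt-5-2025-08-07"           # quality-first default
```

Callers that pass no requirements get the quality-first default, so existing single-model call sites migrate without behaviour changes.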

Conclusion

The Multi-Model Support feature transforms AI deployment from a single-model dependency into a flexible, optimised multi-model strategy within your Production AI Stack. With access to 55+ models across 7 providers, organisations can select the best model for each use case, optimise costs, ensure reliability through redundancy, and stay at the forefront of AI innovation.