211 models

Recommended

GPT 5 (OpenAI) [POPULAR] · $0.66/M in · $5.25/M out
Claude Sonnet 4.6 (Anthropic) [POPULAR · HOT] · $3.15/M in · $15.75/M out
Gemini 2.5 Flash (Gemini) [POPULAR · BEST VALUE] · $0.32/M in · $2.63/M out
DeepSeek V3.2 (DeepSeek) [FLAGSHIP · HOT] · $0.29/M in · $0.44/M out
Qwen3.6 Plus (Qwen) [HOT] · $0.30/M in · $1.83/M out
Wan2.7 Image (Qwen) [BEST VALUE] · $0/M in · $0/M out
O4 Mini (OpenAI) [HOT] · $1.16/M in · $4.62/M out
Claude Opus 4.6 (Anthropic) [FLAGSHIP] · $5.25/M in · $26.25/M out

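All prices in this catalog are quoted in USD per million tokens, with input and output billed separately and cached input billed at the discounted cache-read rate where one is listed. A minimal sketch of the arithmetic; the `request_cost` helper is illustrative, not part of any API:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_rate: float, output_rate: float,
                 cached_tokens: int = 0, cache_read_rate: float = 0.0) -> float:
    """Estimate the USD cost of one request from per-million-token rates.

    Rates are the catalog's listed $/M figures; cached_tokens is the share
    of input tokens served from the prompt cache at cache_read_rate.
    """
    uncached = input_tokens - cached_tokens
    return (uncached * input_rate
            + cached_tokens * cache_read_rate
            + output_tokens * output_rate) / 1_000_000

# GPT 5 ($0.66/M in, $5.25/M out): a 12K-token prompt producing a 2K-token answer.
print(f"${request_cost(12_000, 2_000, 0.66, 5.25):.4f}")  # prints $0.0184
```

The same helper shows why cache-read pricing matters on long system prompts: with 40K of a 100K-token prompt cached at a quarter of the input rate, the input side of the bill drops by 30%.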
GPT 5.4 Mini [NEW · HOT] · openai/gpt-5.4-mini · OpenAI (3x failover)
GPT-5.4 Mini — latest generation, fast and affordable
Context 200K · Max output 33K · Input $0.79/M · Output $4.73/M · Cache read $0.08/M · Cache write $0/M
Chat · Vision · Reasoning · Code · PDF · Tools · Cache

O4 Mini [HOT] · openai/o4-mini · OpenAI (3x failover)
Newest compact reasoning model with tool-use support
Context 200K · Max output 100K · Input $1.16/M · Output $4.62/M · Cache read $0.29/M · Cache write $0/M
Chat · Vision · Reasoning · Code · PDF · Tools · Cache

GPT 5.1 [POPULAR] · openai/gpt-5.1 · OpenAI (2x failover)
GPT-5.1 — improved reasoning and a longer context window
Context 272K · Max output 128K · Input $1.31/M · Output $10.50/M · Cache read $0.21/M · Cache write $0/M
Chat · Vision · Reasoning · Code · PDF · Tools · Cache

GPT 5 [POPULAR] · openai/gpt-5 · OpenAI (2x failover)
GPT-5 flagship model with advanced reasoning and a longer context
Context 272K · Max output 128K · Input $0.66/M · Output $5.25/M · Cache read $0.26/M · Cache write $0/M
Chat · Vision · Reasoning · Code · PDF · Tools · Cache

GPT 5.4 Nano · openai/gpt-5.4-nano · OpenAI (2x failover)
GPT-5.4 Nano — ultra-fast, low-latency model for classification, extraction, and sub-agents
Context 400K · Max output 128K · Input $0.21/M · Output $1.31/M · Cache read $0/M · Cache write $0/M
Chat · Vision · Code · Tools

GPT 5.3 Chat · openai/gpt-5.3-chat · OpenAI (2x failover)
GPT-5.3 Chat — previous-generation flagship chat model
Context 128K · Max output 33K · Input $1.84/M · Output $14.70/M · Cache read $0/M · Cache write $0/M
Chat · Vision · Code · PDF · Tools

GPT 5.1 Chat · openai/gpt-5.1-chat · OpenAI (2x failover)
GPT-5.1 chat-optimized variant
Context 272K · Max output 128K · Input $1.31/M · Output $10.50/M · Cache read $0.21/M · Cache write $0/M
Chat · Vision · Code · PDF · Tools · Cache

GPT 5.1 Codex · openai/gpt-5.1-codex · OpenAI
GPT-5.1 Codex — advanced code generation and analysis
Context 272K · Max output 128K · Input $1.31/M · Output $10.50/M · Cache read $0.14/M · Cache write $0/M
Chat · Reasoning · Code · Tools · Cache

GPT 5.1 Codex Mini · openai/gpt-5.1-codex-mini · OpenAI
GPT-5.1 Codex Mini — fast and affordable code generation
Context 200K · Max output 33K · Input $0.26/M · Output $2.10/M · Cache read $0.03/M · Cache write $0/M
Chat · Code · Tools · Cache

GPT 5 Chat · openai/gpt-5-chat · OpenAI (2x failover)
GPT-5 chat-optimized variant
Context 272K · Max output 128K · Input $0.66/M · Output $5.25/M · Cache read $0.26/M · Cache write $0/M
Chat · Vision · Code · PDF · Tools · Cache

GPT 5 Mini · openai/gpt-5-mini · OpenAI (2x failover)
GPT-5 Mini — affordable with strong reasoning
Context 200K · Max output 66K · Input $0.26/M · Output $2.10/M · Cache read $0.03/M · Cache write $0/M
Chat · Vision · Reasoning · Code · PDF · Tools · Cache

GPT 5 Nano [BEST VALUE] · openai/gpt-5-nano · OpenAI (2x failover)
GPT-5 Nano — fastest and cheapest in the GPT-5 family
Context 200K · Max output 33K · Input $0.05/M · Output $0.42/M · Cache read $0.01/M · Cache write $0/M
Chat · Vision · Code · Tools · Cache

GPT 5 Pro · openai/gpt-5-pro · OpenAI
GPT-5 Pro — maximum capability, best for complex tasks
Context 272K · Max output 128K · Input $15.75/M · Output $126.00/M · Cache read $1.58/M · Cache write $0/M
Chat · Vision · Reasoning · Code · PDF · Tools · Cache

GPT 5 Codex · openai/gpt-5-codex · OpenAI
GPT-5 Codex — advanced code generation and analysis
Context 272K · Max output 128K · Input $1.31/M · Output $10.50/M · Cache read $0.19/M · Cache write $0/M
Chat · Reasoning · Code · Tools · Cache

O3 · openai/o3 · OpenAI
Latest reasoning model with improved speed and accuracy
Context 200K · Max output 100K · Input $2.10/M · Output $8.40/M · Cache read $0.53/M · Cache write $0/M
Chat · Vision · Reasoning · Code · PDF · Tools · Cache

GPT 4.1 · openai/gpt-4.1 · OpenAI (3x failover)
Flagship model with 200K context, best for complex reasoning and coding
Context 200K · Max output 33K · Input $2.10/M · Output $8.40/M · Cache read $0.53/M · Cache write $0/M
Chat · Vision · Code · PDF · Tools · Cache

GPT 4.1 Mini [BEST VALUE] · openai/gpt-4.1-mini · OpenAI (3x failover)
Fast and affordable, with a great balance of speed and intelligence
Context 200K · Max output 33K · Input $0.42/M · Output $1.68/M · Cache read $0.10/M · Cache write $0/M
Chat · Vision · Code · PDF · Tools · Cache

GPT 4.1 Nano [FASTEST] · openai/gpt-4.1-nano · OpenAI (2x failover)
Fastest and cheapest, ideal for simple tasks and classification
Context 200K · Max output 33K · Input $0.10/M · Output $0.42/M · Cache read $0.03/M · Cache write $0/M
Chat · Vision · Code · PDF · Tools · Cache

GPT 4o · openai/gpt-4o · OpenAI (3x failover)
Previous flagship with vision and strong all-around performance
Context 128K · Max output 16K · Input $2.63/M · Output $10.50/M · Cache read $1.31/M · Cache write $0/M
Chat · Vision · Code · PDF · Tools · Cache

GPT 4o Mini · openai/gpt-4o-mini · OpenAI (3x failover)
Compact model optimized for speed and cost efficiency
Context 128K · Max output 16K · Input $0.16/M · Output $0.63/M · Cache read $0.08/M · Cache write $0/M
Chat · Vision · Code · PDF · Tools · Cache

GPT 4o Audio Preview · openai/gpt-4o-audio-preview · OpenAI
Multimodal model supporting audio input and output
Context 128K · Max output 16K · Input $2.63/M · Output $10.50/M · Cache read $0/M · Cache write $0/M
Chat · Vision · Audio · Tools

GPT 4o Mini TTS · openai/gpt-4o-mini-tts · OpenAI (2x failover)
GPT-4o Mini TTS — expressive, natural speech with emotion and tone control
Input $0.63/M · Output $2.52/M · Cache read $0/M · Cache write $0/M
TTS

GPT 4o Transcribe · openai/gpt-4o-transcribe · OpenAI (2x failover)
GPT-4o-powered transcription — more accurate than Whisper, supports structured output
Input $6.30/M · Output $10.50/M · Cache read $0/M · Cache write $0/M
STT

GPT 4o Mini Transcribe · openai/gpt-4o-mini-transcribe · OpenAI (2x failover)
Fast and affordable GPT-4o Mini transcription
Input $3.15/M · Output $5.25/M · Cache read $0/M · Cache write $0/M
STT

GPT 4o Transcribe Diarize · openai/gpt-4o-transcribe-diarize · OpenAI
GPT-4o transcription with speaker diarization — identifies who said what
Input $6.30/M · Output $10.50/M · Cache read $0/M · Cache write $0/M
STT

Text Embedding 3 Small · openai/text-embedding-3-small · OpenAI (2x failover)
Compact embedding model, 1536 dimensions
Context 8K · Input $0.02/M · Output $0/M · Cache read $0/M · Cache write $0/M
Embedding

Text Embedding 3 Large · openai/text-embedding-3-large · OpenAI (2x failover)
Large embedding model, up to 3072 dimensions
Context 8K · Input $0.14/M · Output $0/M · Cache read $0/M · Cache write $0/M
Embedding

Text Embedding Ada 002 · openai/text-embedding-ada-002 · OpenAI (2x failover)
Legacy embedding model, 1536 dimensions
Context 8K · Input $0.10/M · Output $0/M · Cache read $0/M · Cache write $0/M
Embedding

TTS 1 · openai/tts-1 · OpenAI
Standard text-to-speech, fast and natural-sounding
Input $15.75/M · Output $0/M · Cache read $0/M · Cache write $0/M
TTS

TTS 1 HD · openai/tts-1-hd · OpenAI
High-definition text-to-speech with premium voice quality
Input $31.50/M · Output $0/M · Cache read $0/M · Cache write $0/M
TTS

Whisper 1 · openai/whisper-1 · OpenAI
Industry-leading speech-to-text transcription
Input $0/M · Output $0/M · Cache read $0/M · Cache write $0/M
STT

GPT OSS 120B · openai/gpt-oss-120b · OpenAI
GPT-OSS 120B — open-source large language model
Context 128K · Max output 16K · Input $0.63/M · Output $1.89/M · Cache read $0.16/M · Cache write $0/M
Chat · Reasoning · Code · Tools · Cache

GPT OSS 20B · openai/gpt-oss-20b · OpenAI
GPT-OSS 20B — compact open-source model
Context 128K · Max output 16K · Input $0.10/M · Output $0.32/M · Cache read $0.03/M · Cache write $0/M
Chat · Reasoning · Code · Tools · Cache

Claude Opus 4.6 [FLAGSHIP] · anthropic/claude-opus-4-6 · Anthropic (3x failover)
Latest Opus with improved coding and reduced cost
Context 200K · Max output 32K · Input $5.25/M · Output $26.25/M · Cache read $0.53/M · Cache write $6.56/M
Chat · Vision · Reasoning · Code · PDF · Tools · Cache

Claude Sonnet 4.6 [POPULAR · HOT] · anthropic/claude-sonnet-4-6 · Anthropic (3x failover)
Latest Sonnet, a top choice for Claude Code and Cursor
Context 200K · Max output 16K · Input $3.15/M · Output $15.75/M · Cache read $0.32/M · Cache write $3.94/M
Chat · Vision · Reasoning · Code · PDF · Tools · Cache

Claude Haiku 4.5 [BEST VALUE] · anthropic/claude-haiku-4-5 · Anthropic (3x failover)
Fast and capable, great for real-time applications
Context 200K · Max output 8K · Input $1.05/M · Output $5.25/M · Cache read $0.10/M · Cache write $1.31/M
Chat · Vision · Reasoning · Code · PDF · Tools · Cache

Claude Sonnet 4 (Thinking) · anthropic/claude-sonnet-4-thinking · Anthropic (3x failover)
Sonnet 4 with extended thinking for deeper reasoning
Context 200K · Max output 16K · Input $3.15/M · Output $15.75/M · Cache read $0.32/M · Cache write $3.94/M
Chat · Vision · Reasoning · Code · PDF · Tools · Cache

Claude Opus 4.5 · anthropic/claude-opus-4.5 · Anthropic (3x failover)
Claude Opus 4.5 via Bedrock, top-tier reasoning and creativity
Context 200K · Max output 32K · Input $5.25/M · Output $26.25/M · Cache read $1.58/M · Cache write $19.69/M
Chat · Vision · Reasoning · Code · PDF · Tools · Cache

Claude Sonnet 4.5 · anthropic/claude-sonnet-4.5 · Anthropic (3x failover)
Claude Sonnet 4.5 via Bedrock, strong balanced performance
Context 200K · Max output 16K · Input $3.15/M · Output $15.75/M · Cache read $0.32/M · Cache write $3.94/M
Chat · Vision · Reasoning · Code · PDF · Tools · Cache

Claude Opus 4.7 · anthropic/claude-opus-4-7 · Anthropic (4x failover)
Most capable Claude yet (2026-04-16): 1M context, 128K output, agentic workflows
Context 1M · Max output 128K · Input $5.25/M · Output $26.25/M · Cache read $0.53/M · Cache write $6.56/M
Chat · Vision · Reasoning · Code · PDF · Tools · Cache

Claude Opus 4.1 · anthropic/claude-opus-4.1 · Anthropic (3x failover)
Claude Opus 4.1 via Bedrock, premium reasoning model
Context 200K · Max output 32K · Input $15.75/M · Output $78.75/M · Cache read $1.58/M · Cache write $19.69/M
Chat · Vision · Reasoning · Code · PDF · Tools · Cache

Claude Sonnet 4 [LEGACY] · anthropic/claude-sonnet-4 · Anthropic (3x failover)
Balanced Claude with strong coding and reasoning abilities
Context 200K · Max output 16K · Input $3.15/M · Output $15.75/M · Cache read $0.32/M · Cache write $3.94/M
Chat · Vision · Reasoning · Code · PDF · Tools · Cache

Claude Haiku 3.5 [LEGACY] · anthropic/claude-haiku-3.5 · Anthropic (3x failover)
Previous Haiku generation, compact and efficient
Context 200K · Max output 8K · Input $0.84/M · Output $4.20/M · Cache read $0.08/M · Cache write $1.05/M
Chat · Vision · Code · PDF · Tools · Cache

Gemini 2.5 Pro [POPULAR] · google/gemini-2.5-pro · Gemini (2x failover)
Google's most capable model, with a 1M context window
Context 1M · Max output 66K · Input $1.31/M · Output $10.50/M · Cache read $0.14/M · Cache write $0/M
Chat · Vision · Reasoning · Code · PDF · Tools · Cache

Gemini 2.5 Flash [POPULAR · BEST VALUE] · google/gemini-2.5-flash · Gemini (2x failover)
Fast Gemini with strong reasoning at low cost
Context 1M · Max output 66K · Input $0.32/M · Output $2.63/M · Cache read $0.03/M · Cache write $0/M
Chat · Vision · Reasoning · Code · PDF · Tools · Cache

Gemini 3 Pro Preview · google/gemini-3-pro-preview · Gemini
Gemini 3 Pro Preview — next-gen reasoning
Context 1M · Max output 66K · Input $2.10/M · Output $12.60/M · Cache read $0.14/M · Cache write $0/M
Chat · Vision · Reasoning · Code · PDF · Tools · Cache

Gemini 3 Pro Image Preview · google/gemini-3-pro-image-preview · Gemini
Gemini 3 Pro Image — highest-quality image generation and editing, up to 4K
Context 1M · Max output 66K · Input $0/M · Output $0/M · Cache read $0/M · Cache write $0/M
Chat · Vision · Chat Image · Image Edit

Gemini 3 Flash Preview · google/gemini-3-flash-preview · Gemini
Gemini 3 Flash Preview — fast next-gen model
Context 1M · Max output 66K · Input $0.53/M · Output $3.15/M · Cache read $0.03/M · Cache write $0/M
Chat · Vision · Reasoning · Code · PDF · Tools · Cache

Gemini 2.5 Flash Lite [BEST VALUE · FASTEST] · google/gemini-2.5-flash-lite · Gemini
Gemini 2.5 Flash Lite — lightweight and fast
Context 1M · Max output 66K · Input $0.10/M · Output $0.42/M · Cache read $0.01/M · Cache write $0/M
Chat · Vision · Reasoning · Code · PDF · Tools · Cache

Gemini 2.5 Flash Image · google/gemini-2.5-flash-image · Gemini
Gemini 2.5 Flash Image — vision and image generation
Context 1M · Max output 66K · Input $0.32/M · Output $2.63/M · Cache read $0/M · Cache write $0/M
Chat · Vision · Chat Image · Image Edit · Code

Gemini 3.1 Pro Preview · google/gemini-3.1-pro-preview · Gemini
Gemini 3.1 Pro Preview — latest capabilities
Context 1M · Max output 66K · Input $2.10/M · Output $12.60/M · Cache read $0.14/M · Cache write $0/M
Chat · Vision · Reasoning · Code · PDF · Tools · Cache

Gemini 3.1 Flash Image Preview · google/gemini-3.1-flash-image-preview · Gemini
Gemini 3.1 Flash Image — vision and image generation
Context 1M · Max output 66K · Input $0.32/M · Output $2.63/M · Cache read $0/M · Cache write $0/M
Chat · Vision · Chat Image · Image Edit · Code

Gemini 3.1 Flash Lite Preview · google/gemini-3.1-flash-lite-preview · Gemini
Gemini 3.1 Flash Lite — ultra-low cost for high-throughput tasks
Context 1M · Max output 66K · Input $0.26/M · Output $1.58/M · Cache read $0/M · Cache write $0/M
Chat · Vision · Code · PDF · Tools

Gemini Embedding 001 · google/gemini-embedding-001 · Gemini
Gemini embedding model for text retrieval
Context 8K · Input $0.01/M · Output $0/M · Cache read $0/M · Cache write $0/M
Embedding

DeepSeek V3.2 [FLAGSHIP · HOT] · deepseek/deepseek-v3.2 · DeepSeek (4x failover)
Latest DeepSeek V3.2 with hybrid thinking mode
Context 131K · Max output 66K · Input $0.29/M · Output $0.44/M · Cache read $0.03/M · Cache write $0/M
Chat · Reasoning · Code · Tools · Cache

DeepSeek R1 0528 [HOT] · deepseek/deepseek-r1-0528 · DeepSeek (3x failover)
DeepSeek R1 snapshot from May 2025
Context 131K · Max output 16K · Input $0.58/M · Output $2.30/M · Cache read $0.06/M · Cache write $0/M
Chat · Reasoning · Code · Tools · Cache

DeepSeek Reasoner [POPULAR] · deepseek/deepseek-reasoner · DeepSeek (3x failover)
Chain-of-thought reasoning model rivaling o1
Context 131K · Max output 16K · Input $0.58/M · Output $2.30/M · Cache read $0.06/M · Cache write $0/M
Chat · Reasoning · Code · Tools · Cache

DeepSeek V3.1 · deepseek/deepseek-v3.1 · DeepSeek (3x failover)
DeepSeek V3.1 with hybrid thinking mode
Context 131K · Max output 66K · Input $0.29/M · Output $0.44/M · Cache read $0.03/M · Cache write $0/M
Chat · Reasoning · Code · Tools · Cache

DeepSeek V3 · deepseek/deepseek-v3 · DeepSeek (3x failover)
DeepSeek V3 text generation model
Context 131K · Max output 16K · Input $0.29/M · Output $0.44/M · Cache read $0.03/M · Cache write $0/M
Chat · Code · Tools · Cache

DeepSeek R1 · deepseek/deepseek-r1 · DeepSeek (3x failover)
DeepSeek R1 reasoning-only model
Context 131K · Max output 16K · Input $0.58/M · Output $2.30/M · Cache read $0.06/M · Cache write $0/M
Chat · Reasoning · Code · Cache

DeepSeek Chat [LEGACY] · deepseek/deepseek-chat · DeepSeek (4x failover)
Open-source powerhouse with strong coding and math skills
Context 131K · Max output 16K · Input $0.29/M · Output $0.44/M · Cache read $0.03/M · Cache write $0/M
Chat · Code · Tools · Cache

Qwen3.6 Plus [HOT] · qwen/qwen3.6-plus · Qwen (2x failover)
Qwen 3.6 Plus — latest flagship, rivals Claude Opus 4.5 on benchmarks
Context 1.0M · Max output 66K · Input $0.30/M · Output $1.83/M · Cache read $0/M · Cache write $0/M
Chat · Reasoning · Code · Tools

Qwen3 Coder Plus · qwen/qwen3-coder-plus · Qwen (3x failover)
Specialized coding model by Qwen
Context 262K · Max output 66K · Input $0.59/M · Output $2.31/M · Cache read $0/M · Cache write $0/M
Chat · Code · Tools

Qwen3 Coder Next · qwen/qwen3-coder-next · Qwen (3x failover)
Next-gen Qwen coding model, top tier for code tasks
Context 262K · Max output 66K · Input $0.29/M · Output $2.94/M · Cache read $0/M · Cache write $0/M
Chat · Code · Tools

Qwen3 Coder 30B · qwen/qwen3-coder-30b · Qwen (4x failover)
Qwen3 Coder 30B, compact coding specialist via Bedrock
Context 131K · Max output 33K · Input $0.21/M · Output $0.63/M · Cache read $0/M · Cache write $0/M
Chat · Code · Tools

Qwen3 Coder Flash · qwen/qwen3-coder-flash · Qwen (3x failover)
Qwen3 Coder Flash — fastest, cheapest coding model
Context 131K · Max output 16K · Input $0.21/M · Output $0.63/M · Cache read $0/M · Cache write $0/M
Chat · Code · Tools

Qwen3 Coder 480B · qwen/qwen3-coder-480b · Qwen (3x failover)
Qwen3 Coder 480B — flagship coding model, largest in the series
Context 262K · Max output 66K · Input $4.20/M · Output $12.60/M · Cache read $0/M · Cache write $0/M
Chat · Code · Reasoning · Tools

QwQ Plus · qwen/qwq-plus · Qwen (2x failover)
Qwen reasoning model, rivals DeepSeek-R1
Context 131K · Max output 16K · Input $0.23/M · Output $0.59/M · Cache read $0/M · Cache write $0/M
Chat · Reasoning · Code · Tools

Qwen3 Max · qwen/qwen3-max · Qwen (3x failover)
Qwen 3 flagship, top reasoning and coding
Context 262K · Max output 66K · Input $0.37/M · Output $1.47/M · Cache read $0/M · Cache write $0/M
Chat · Reasoning · Code · Tools

Qwen3.5 Plus · qwen/qwen3.5-plus · Qwen (3x failover)
Qwen 3.5 Plus with 1M context
Context 1.0M · Max output 66K · Input $0.12/M · Output $0.70/M · Cache read $0/M · Cache write $0/M
Chat · Reasoning · Code · Tools

Qwen3.5 Flash · qwen/qwen3.5-flash · Qwen (3x failover)
Ultra-fast Qwen 3.5 with 1M context
Context 1.0M · Max output 66K · Input $0.03/M · Output $0.22/M · Cache read $0/M · Cache write $0/M
Chat · Code · Tools

Qwen3.5 Omni · qwen/qwen3.5-omni · Qwen (2x failover)
Qwen 3.5 Omni native multimodal — text, image, audio, video
Context 1.0M · Max output 66K · Input $0.27/M · Output $1.64/M · Cache read $0/M · Cache write $0/M
Chat · Vision · Audio · Reasoning · Code · Tools

Qwen3.5 397B · qwen/qwen3.5-397b · Qwen (3x failover)
Qwen 3.5 397B MoE flagship, top-tier reasoning
Context 131K · Max output 33K · Input $0.41/M · Output $2.46/M · Cache read $0/M · Cache write $0/M
Chat · Reasoning · Code · Tools

Qwen3 VL Plus · qwen/qwen3-vl-plus · Qwen (3x failover)
Qwen vision-language model
Context 131K · Max output 8K · Input $0.23/M · Output $0.59/M · Cache read $0/M · Cache write $0/M
Chat · Vision · Tools · PDF

Qwen3 Next 80B · qwen/qwen3-next-80b · Qwen (4x failover)
Qwen3 Next 80B via Bedrock, efficient MoE architecture
Context 131K · Max output 33K · Input $0.37/M · Output $1.47/M · Cache read $0/M · Cache write $0/M
Chat · Reasoning · Code · Tools

Qwen3 VL 235B · qwen/qwen3-vl-235b · Qwen (4x failover)
Qwen3 VL 235B, large vision-language model via Bedrock
Context 131K · Max output 8K · Input $0.59/M · Output $2.31/M · Cache read $0/M · Cache write $0/M
Chat · Vision · Code · Tools · PDF

Qwen3 32B · qwen/qwen3-32b · Qwen (5x failover)
Qwen3 32B dense model via Bedrock, strong all-rounder
Context 131K · Max output 33K · Input $0.21/M · Output $0.63/M · Cache read $0/M · Cache write $0/M
Chat · Reasoning · Code · Tools

Qwen3 TTS Flash · qwen/qwen3-tts-flash · Qwen
Qwen3 TTS Flash — fast text-to-speech with natural voice
Input $0/M · Output $0/M · Cache read $0/M · Cache write $0/M
TTS

Qwen3 TTS Instruct Flash · qwen/qwen3-tts-instruct-flash · Qwen
Qwen3 TTS Instruct — instruction-controlled speech synthesis
Input $0/M · Output $0/M · Cache read $0/M · Cache write $0/M
TTS

Qwen3.5 Omni Flash · qwen/qwen3.5-omni-flash · Qwen (2x failover)
Qwen 3.5 Omni Flash — fast multimodal (text + image + audio)
Context 131K · Max output 8K · Input $0.21/M · Output $0.63/M · Cache read $0/M · Cache write $0/M
Chat · Vision · Audio · Tools

Qwen3.5 Omni Plus · qwen/qwen3.5-omni-plus · Qwen (2x failover)
Qwen 3.5 Omni Plus — premium multimodal understanding
Context 131K · Max output 8K · Input $0.84/M · Output $2.10/M · Cache read $0/M · Cache write $0/M
Chat · Vision · Audio · Tools

Qwen3 VL Flash · qwen/qwen3-vl-flash · Qwen (2x failover)
Qwen3 VL Flash — fast and cheap visual understanding
Context 131K · Max output 8K · Input $0.21/M · Output $0.63/M · Cache read $0/M · Cache write $0/M
Chat · Vision · Tools · PDF

Qwen Plus · qwen/qwen-plus · Qwen (3x failover)
Balanced Qwen model via direct API
Context 128K · Max output 8K · Input $0.42/M · Output $1.26/M · Cache read $0/M · Cache write $0/M
Chat · Code · Tools

Qwen Plus Latest · qwen/qwen-plus-latest · Qwen (2x failover)
Always-latest Qwen Plus snapshot
Context 131K · Max output 16K · Input $0.12/M · Output $0.29/M · Cache read $0/M · Cache write $0/M
Chat · Code · Tools

Qwen Turbo · qwen/qwen-turbo · Qwen (2x failover)
Fast Qwen model, deprecated in favor of Qwen Flash
Context 128K · Max output 8K · Input $0.32/M · Output $0.63/M · Cache read $0/M · Cache write $0/M
Chat · Code · Tools

Qwen Long · qwen/qwen-long · Qwen
10M ultra-long context for massive documents
Context 10M · Max output 6K · Input $0.07/M · Output $0.29/M · Cache read $0/M · Cache write $0/M
Chat · Tools · PDF

Qwen Max · qwen/qwen-max · Qwen (2x failover)
Qwen flagship via direct API
Context 32K · Max output 8K · Input $0.82/M · Output $4.09/M · Cache read $0/M · Cache write $0/M
Chat · Code · Tools

Qwen2.5 Coder 32B · qwen/qwen2.5-coder-32b · Qwen
Specialized coding model with 32B parameters
Context 32K · Max output 8K · Input $0.21/M · Output $0.63/M · Cache read $0/M · Cache write $0/M
Chat · Code · Tools

Qwen VL Max · qwen/qwen-vl-max · Qwen (2x failover)
Qwen vision-language model via direct API
Context 32K · Max output 4K · Input $2.10/M · Output $6.30/M · Cache read $0/M · Cache write $0/M
Chat · Vision · Tools

Qwen Flash · qwen/qwen-flash · Qwen (3x failover)
Ultra-fast Qwen Flash, upgraded to Qwen3.5
Context 1.0M · Max output 66K · Input $0.03/M · Output $0.29/M · Cache read $0/M · Cache write $0/M
Chat · Code · Tools

Qwen Max Latest · qwen/qwen-max-latest · Qwen (2x failover)
Always-latest Qwen Max snapshot
Context 131K · Max output 16K · Input $0.35/M · Output $1.40/M · Cache read $0/M · Cache write $0/M
Chat · Code · Tools

Qwen VL Plus · qwen/qwen-vl-plus · Qwen (2x failover)
Qwen VL Plus for vision-language tasks
Context 131K · Max output 8K · Input $0.12/M · Output $0.59/M · Cache read $0/M · Cache write $0/M
Chat · Vision · Tools · PDF

Wan2.7 Image [BEST VALUE] · qwen/wan2.7-image · Qwen (2x failover)
Wan 2.7 Image — affordable image generation
Input $0/M · Output $0/M · Cache read $0/M · Cache write $0/M
Image Gen

Wan2.7 Image Pro · qwen/wan2.7-image-pro · Qwen (2x failover)
Wan 2.7 Image Pro — high-quality image generation
Input $0/M · Output $0/M · Cache read $0/M · Cache write $0/M
Image Gen

CosyVoice V2 · qwen/cosyvoice-v2 · Qwen
CosyVoice v2 text-to-speech via Bailian
Input $0/M · Output $0/M · Cache read $0/M · Cache write $0/M
TTS

SenseVoice V1 · qwen/sensevoice-v1 · Qwen
SenseVoice speech-to-text via Bailian
Input $0/M · Output $0/M · Cache read $0/M · Cache write $0/M
STT

Paraformer V2 · qwen/paraformer-v2 · Qwen
Paraformer v2 speech recognition via Bailian
Input $0/M · Output $0/M · Cache read $0/M · Cache write $0/M
STT

Wan2.7 T2V · qwen/wan2.7-t2v · Qwen
Wan 2.7 Text-to-Video
Input $0/M · Output $0/M · Cache read $0/M · Cache write $0/M
Video Gen

Wan2.6 T2V · qwen/wan2.6-t2v · Qwen
Wan 2.6 Text-to-Video
Input $0/M · Output $0/M · Cache read $0/M · Cache write $0/M
Video Gen

Wan2.7 I2V · qwen/wan2.7-i2v · Qwen
Wan 2.7 Image-to-Video
Input $0/M · Output $0/M · Cache read $0/M · Cache write $0/M
Video Gen

Wan2.6 I2V · qwen/wan2.6-i2v · Qwen
Wan 2.6 Image-to-Video
Input $0/M · Output $0/M · Cache read $0/M · Cache write $0/M
Video Gen

Wan2.6 I2V Flash · qwen/wan2.6-i2v-flash · Qwen
Wan 2.6 Image-to-Video Flash — fast generation
Input $0/M · Output $0/M · Cache read $0/M · Cache write $0/M
Video Gen

QVQ Max · qwen/qvq-max · Qwen (2x failover)
QVQ-Max — flagship visual reasoning model, deep visual understanding
Context 131K · Max output 33K · Input $2.10/M · Output $8.40/M · Cache read $0/M · Cache write $0/M
Chat · Vision · Reasoning · Tools

QVQ Plus · qwen/qvq-plus · Qwen
QVQ-Plus — balanced visual reasoning, cost-effective
Context 131K · Max output 33K · Input $0.84/M · Output $2.10/M · Cache read $0/M · Cache write $0/M
Chat · Vision · Reasoning · Tools

Qwen VL OCR · qwen/qwen-vl-ocr · Qwen (2x failover)
Qwen VL OCR — specialized document OCR, table/form extraction
Context 32K · Max output 8K · Input $2.10/M · Output $5.25/M · Cache read $0/M · Cache write $0/M
Chat · Vision

Qwen Image 2.0 · qwen/qwen-image-2.0 · Qwen (2x failover)
Qwen Image 2.0 — text-to-image generation
Input $0/M · Output $0/M · Cache read $0/M · Cache write $0/M
Image Gen

Qwen Image 2.0 Pro · qwen/qwen-image-2.0-pro · Qwen (2x failover)
Qwen Image 2.0 Pro — high-quality text-to-image
Input $0/M · Output $0/M · Cache read $0/M · Cache write $0/M
Image Gen

Qwen Image Max · qwen/qwen-image-max · Qwen (2x failover)
Qwen Image Max — best-quality text-to-image
Input $0/M · Output $0/M · Cache read $0/M · Cache write $0/M
Image Gen

Qwen Image Edit Plus · qwen/qwen-image-edit-plus · Qwen
Qwen Image Edit Plus — conversational image editing, Chinese & English prompts
Input $0/M · Output $0/M · Cache read $0/M · Cache write $0/M
Image Gen · Image Edit

Qwen Image Edit Max · qwen/qwen-image-edit-max · Qwen
Qwen Image Edit Max — premium conversational image editing
Input $0/M · Output $0/M · Cache read $0/M · Cache write $0/M
Image Gen · Image Edit

Qwen Math Plus · qwen/qwen-math-plus · Qwen
Qwen Math Plus — specialized mathematical reasoning
Context 32K · Max output 8K · Input $2.10/M · Output $6.30/M · Cache read $0/M · Cache write $0/M
Chat · Reasoning · Tools

Qwen Math Turbo · qwen/qwen-math-turbo · Qwen
Qwen Math Turbo — fast math reasoning at lower cost
Context 32K · Max output 8K · Input $1.05/M · Output $2.10/M · Cache read $0/M · Cache write $0/M
Chat · Reasoning · Tools

Qwen Omni Turbo · qwen/qwen-omni-turbo · Qwen (2x failover)
Qwen Omni Turbo — text, image, and audio understanding
Context 32K · Max output 8K · Input $2.10/M · Output $6.30/M · Cache read $0/M · Cache write $0/M
Chat · Vision · Audio · Tools

Z Image Turbo · qwen/z-image-turbo · Qwen (2x failover)
Z-Image Turbo — lightweight, fast image generation
Input $0/M · Output $0/M · Cache read $0/M · Cache write $0/M
Image Gen

Qwen MT Plus · qwen/qwen-mt-plus · Qwen (3x failover)
Qwen MT Plus — professional machine translation
Context 8K · Max output 4K · Input $2.10/M · Output $6.30/M · Cache read $0/M · Cache write $0/M
Chat

Qwen MT Turbo · qwen/qwen-mt-turbo · Qwen (2x failover)
Qwen MT Turbo — fast translation at lower cost
Context 8K · Max output 4K · Input $1.05/M · Output $2.10/M · Cache read $0/M · Cache write $0/M
Chat

GLM 5.1 · zhipu/glm-5.1 · Zhipu (2x failover)
GLM-5.1, the latest release — improved coding and reasoning, reaching 94% of Claude Opus 4.6 on coding
Context 131K · Max output 16K · Input $1.05/M · Output $3.36/M · Cache read $0/M · Cache write $0/M
Chat · Reasoning · Code · Tools

GLM 5 · zhipu/glm-5 · Zhipu (3x failover)
GLM-5 744B open-source flagship with thinking mode
Context 131K · Max output 16K · Input $0.59/M · Output $2.63/M · Cache read $0/M · Cache write $0/M
Chat · Reasoning · Code · Tools

GLM 5 Turbo · zhipu/glm-5-turbo · Zhipu
GLM-5 Turbo — fast and cost-effective coding model
Context 131K · Max output 16K · Input $0.59/M · Output $2.31/M · Cache read $0/M · Cache write $0/M
Chat · Reasoning · Code · Tools

GLM 5V Turbo · zhipu/glm-5v-turbo · Zhipu
GLM-5V Turbo — vision-language model for design-to-code and image understanding
Context 131K · Max output 16K · Input $1.26/M · Output $4.20/M · Cache read $0/M · Cache write $0/M
Chat · Vision · Code · Tools

GLM 4.7 · zhipu/glm-4.7 · Zhipu (3x failover)
GLM-4.7 with hybrid thinking mode
Context 131K · Max output 16K · Input $0.59/M · Output $2.63/M · Cache read $0/M · Cache write $0/M
Chat · Reasoning · Code · Tools

GLM 4.6 · zhipu/glm-4.6 · Zhipu
GLM-4.6 with hybrid thinking mode
Context 131K · Max output 16K · Input $0.59/M · Output $2.63/M · Cache read $0/M · Cache write $0/M
Chat · Reasoning · Code · Tools

GLM 4.5 · zhipu/glm-4.5 · Zhipu
GLM-4.5 hybrid thinking model
Context 131K · Max output 16K · Input $0.59/M · Output $2.63/M · Cache read $0/M · Cache write $0/M
Chat · Reasoning · Code · Tools

GLM 4.5 Air · zhipu/glm-4.5-air · Zhipu
Lightweight GLM-4.5 for fast inference
Context 131K · Max output 16K · Input $0.29/M · Output $1.16/M · Cache read $0/M · Cache write $0/M
Chat · Code · Tools

GLM 4.7 Flash · zhipu/glm-4.7-flash · Zhipu (2x failover)
GLM-4.7 Flash, fast and affordable via Bedrock
Context 131K · Max output 16K · Input $0.15/M · Output $0.59/M · Cache read $0/M · Cache write $0/M
Chat · Code · Tools

Grok 4.1 Fast · xai/grok-4.1-fast · xAI (2x failover)
Grok 4.1 Fast — latest xAI reasoning model
Context 256K · Max output 33K · Input $0.21/M · Output $0.53/M · Cache read $1.58/M · Cache write $0/M
Chat · Vision · Reasoning · Code · Cache · Tools

Grok 4.1 Fast Non-Reasoning · xai/grok-4.1-fast-non-reasoning · xAI (2x failover)
Grok 4.1 Fast — latest model, quick responses
Context 256K · Max output 33K · Input $0.21/M · Output $0.53/M · Cache read $0/M · Cache write $0/M
Chat · Vision · Code · Tools

Grok 4 Fast · xai/grok-4-fast · xAI (2x failover)
Grok 4 Fast — high-speed reasoning
Context 256K · Max output 33K · Input $0.21/M · Output $0.53/M · Cache read $1.58/M · Cache write $0/M
Chat · Vision · Reasoning · Code · Cache · Tools

Grok 4 Fast Non-Reasoning · xai/grok-4-fast-non-reasoning · xAI (2x failover)
Grok 4 Fast — quick responses without deep reasoning
Context 256K · Max output 33K · Input $0.21/M · Output $0.53/M · Cache read $0/M · Cache write $0/M
Chat · Vision · Code · Tools

Grok 3 · xai/grok-3 · xAI
xAI flagship with deep reasoning and real-time knowledge
Context 128K · Max output 16K · Input $3.15/M · Output $15.75/M · Cache read $1.58/M · Cache write $0/M
Chat · Reasoning · Code · Cache · Tools

Grok 3 Mini · xai/grok-3-mini · xAI
Fast and affordable Grok for everyday tasks
Context 128K · Max output 16K · Input $0.32/M · Output $0.53/M · Cache read $0.16/M · Cache write $0/M
Chat · Reasoning · Code · Cache · Tools

Grok 2 · xai/grok-2 · xAI
Previous-generation Grok model
Context 128K · Max output 8K · Input $2.10/M · Output $10.50/M · Cache read $1.05/M · Cache write $0/M
Chat · Code · Cache · Tools

Kimi K2.5 · moonshot/kimi-k2.5 · Moonshot (4x failover)
Kimi K2.5, multimodal flagship with 262K context
Context 262K · Max output 16K · Input $0.59/M · Output $2.63/M · Cache read $0/M · Cache write $0/M
Chat · Vision · Reasoning · Code · Tools

Kimi K2 Thinking · moonshot/kimi-k2-thinking · Moonshot (2x failover)
Kimi K2 thinking model with deep reasoning
Context 131K · Max output 16K · Input $0.59/M · Output $0.37/M · Cache read $0/M · Cache write $0/M
Chat · Reasoning · Code · Tools

Doubao 1.5 Pro 256K · doubao/doubao-1.5-pro-256k · Doubao
ByteDance Doubao with 256K context
Context 256K · Max output 8K · Input $0.58/M · Output $1.16/M · Cache read $0/M · Cache write $0/M
Chat · Code · Tools

Doubao 1.5 Pro 32K · doubao/doubao-1.5-pro-32k · Doubao
Doubao Pro with standard 32K context
Context 32K · Max output 8K · Input $0.12/M · Output $0.28/M · Cache read $0/M · Cache write $0/M
Chat · Code · Tools

Doubao 1.5 Lite 32K · doubao/doubao-1.5-lite-32k · Doubao
Ultra-affordable Doubao for basic tasks
Context 32K · Max output 8K · Input $0.03/M · Output $0.06/M · Cache read $0/M · Cache write $0/M
Chat

Llama 4 Maverick · meta/llama-4-maverick · Meta (4x failover)
Latest Llama with 1M context and multimodal support
Context 1M · Max output 33K · Input $0.28/M · Output $0.89/M · Cache read $0/M · Cache write $0/M
Chat · Vision · Code · Tools

Llama 4 Scout · meta/llama-4-scout · Meta (3x failover)
Efficient Llama 4 variant with 512K context
Context 512K · Max output 33K · Input $0.62/M · Output $0.92/M · Cache read $0/M · Cache write $0/M
Chat · Vision · Code · Tools

Llama 3.3 70B · meta/llama-3.3-70b · Meta (2x failover)
Strong open-source model for general tasks
Context 128K · Max output 8K · Input $0.76/M · Output $0.76/M · Cache read $0/M · Cache write $0/M
Chat · Code · Tools

Llama 3.1 405B · meta/llama-3.1-405b · Meta (2x failover)
Largest open-source model, near-frontier performance
Context 128K · Max output 8K · Input $0.68/M · Output $0.84/M · Cache read $0/M · Cache write $0/M
Chat · Code · Tools

Llama 3.1 70B · meta/llama-3.1-70b · Meta (2x failover)
Versatile 70B model with a good cost-performance ratio
Context 128K · Max output 8K · Input $0.37/M · Output $0.47/M · Cache read $0/M · Cache write $0/M
Chat · Code · Tools

Llama 3.1 8B · meta/llama-3.1-8b · Meta (2x failover)
Lightweight and fast, ideal for simple tasks
Context 128K · Max output 8K · Input $0.19/M · Output $0.19/M · Cache read $0/M · Cache write $0/M
Chat · Code · Tools

Llama 3.2 90B · meta/llama-3.2-90b · Meta (2x failover)
Llama 3.2 vision model with 90B parameters
Context 128K · Max output 4K · Input $2.10/M · Output $2.10/M · Cache read $0/M · Cache write $0/M
Chat · Vision · Code · Tools

Llama 3.2 11B · meta/llama-3.2-11b · Meta (2x failover)
Llama 3.2 vision model with 11B parameters
Context 128K · Max output 4K · Input $0.37/M · Output $0.37/M · Cache read $0/M · Cache write $0/M
Chat · Vision · Tools

Llama 3.2 3B · meta/llama-3.2-3b · Meta (2x failover)
Compact Llama 3.2 for lightweight tasks
Context 128K · Max output 4K · Input $0.16/M · Output $0.16/M · Cache read $0/M · Cache write $0/M
Chat · Code · Tools

Llama 3.2 1B · meta/llama-3.2-1b · Meta (2x failover)
Smallest Llama 3.2, ultra-fast and ultra-cheap
Context 128K · Max output 4K · Input $0.10/M · Output $0.10/M · Cache read $0/M · Cache write $0/M
Chat

Mistral Large · mistral/mistral-large · Mistral (2x failover)
Mistral flagship, strong multilingual and reasoning performance
Context 262K · Max output 8K · Input $2.10/M · Output $6.30/M · Cache read $0/M · Cache write $0/M
Chat · Code · Tools

Pixtral Large · mistral/pixtral-large · Mistral
Multimodal model with vision capabilities
Context 128K · Max output 8K · Input $2.10/M · Output $6.30/M · Cache read $0/M · Cache write $0/M
Chat · Vision · Code · Tools

Mistral Large 3 · mistral/mistral-large-3 · Mistral (3x failover)
Mistral Large 3 675B, flagship model for complex tasks
Context 131K · Max output 8K · Input $2.10/M · Output $6.30/M · Cache read $0/M · Cache write $0/M
Chat · Vision · Reasoning · Code · Tools

Devstral 2 · mistral/devstral-2 · Mistral (2x failover)
Devstral 2 123B, purpose-built for software engineering
Context 131K · Max output 8K · Input $0.42/M · Output $1.05/M · Cache read $0/M · Cache write $0/M
Chat · Vision · Code · Tools

Magistral Small · mistral/magistral-small · Mistral (2x failover)
Magistral Small, with strong reasoning at low cost
Context 131K · Max output 16K · Input $0.53/M · Output $1.58/M · Cache read $0/M · Cache write $0/M
Chat · Reasoning · Code · Tools

Ministral 14B · mistral/ministral-14b · Mistral (2x failover)
Ministral 14B, balanced small model
Context 128K · Max output 8K · Input $0.21/M · Output $0.21/M · Cache read $0/M · Cache write $0/M
Chat · Code · Tools

Codestral · mistral/codestral · Mistral (2x failover)
Specialized code generation model by Mistral
Context 256K · Max output 16K · Input $0.32/M · Output $0.94/M · Cache read $0/M · Cache write $0/M
Chat · Code · Tools

Voxtral Small · mistral/voxtral-small · Mistral
Voxtral Small — speech-to-text
Context 32K · Max output 4K · Input $2.10/M · Output $6.30/M · Cache read $0/M · Cache write $0/M
Chat · Audio · STT

Voxtral Mini · mistral/voxtral-mini · Mistral
Voxtral Mini — compact speech-to-text
Context 32K · Max output 4K · Input $0.19/M · Output $0.58/M · Cache read $0/M · Cache write $0/M
Chat · Audio · STT

MiniMax (3x failover)

MiniMax M2.5

minimax/minimax-m2.5

MiniMax M2.5, fast output with reasoning

Context: 196K · Max Output: 16K · Input: $0.30/M · Output: $1.26/M · Cache Read: $0/M | Cache Write: $0/M
Chat · Reasoning · Code · Tools
MiniMax (2x failover)

MiniMax M2.1

minimax/minimax-m2.1

MiniMax M2.1 with web search support

Context: 196K · Max Output: 16K · Input: $0.30/M · Output: $1.26/M · Cache Read: $0/M | Cache Write: $0/M
Chat · Code · Tools
MiniMax (3x failover)

MiniMax M2

minimax/minimax-m2

MiniMax M2, solid general-purpose model via Bedrock

Context: 196K · Max Output: 16K · Input: $0.23/M · Output: $0.93/M · Cache Read: $0/M | Cache Write: $0/M
Chat · Code · Tools
MiniMax (3x failover)

MiniMax M2.7

minimax/minimax-m2.7

MiniMax M2.7 — latest reasoning and code capabilities

Context: 196K · Max Output: 16K · Input: $0.30/M · Output: $1.26/M · Cache Read: $0/M | Cache Write: $0/M
Chat · Reasoning · Code · Tools
MiniMax

Speech 2.8 HD

minimax/speech-2.8-hd

MiniMax Speech 2.8 HD — high-quality TTS

Context: n/a · Max Output: n/a · Input: $0/M · Output: $0/M · Cache Read: $0/M | Cache Write: $0/M
TTS
MiniMax

Speech 2.8 Turbo

minimax/speech-2.8-turbo

MiniMax Speech 2.8 Turbo — fast TTS

Context: n/a · Max Output: n/a · Input: $0/M · Output: $0/M · Cache Read: $0/M | Cache Write: $0/M
TTS
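Several rows above carry a failover badge (2x or 3x), meaning the model is served through that many upstream providers. A minimal client-side sketch of the same idea, assuming each provider is a callable that either returns a response or raises, might look like:

```python
from typing import Callable, Sequence

def call_with_failover(providers: Sequence[Callable[[str], str]], prompt: str) -> str:
    """Try each provider in order; return the first successful response."""
    last_error: Exception | None = None
    for call in providers:
        try:
            return call(prompt)
        except Exception as err:  # a real client would narrow this to timeouts/5xx
            last_error = err
    raise RuntimeError("all providers failed") from last_error

# Hypothetical per-provider request functions, for illustration only.
def flaky(prompt: str) -> str:
    raise TimeoutError("upstream timeout")

def healthy(prompt: str) -> str:
    return f"echo: {prompt}"

print(call_with_failover([flaky, healthy], "hi"))  # echo: hi
```

The gateway presumably does this server-side; the sketch only shows why a 3x-failover model can ride out a single upstream outage.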
Nova

Nova Micro

amazon/nova-micro

Amazon's fastest text-only model, at ultra-low cost

Context: 128K · Max Output: 4K · Input: $0.04/M · Output: $0.15/M · Cache Read: $0/M | Cache Write: $0/M
Chat
Nova

Nova Lite

amazon/nova-lite

Multimodal model for image, video and text at low cost

Context: 300K · Max Output: 4K · Input: $0.06/M · Output: $0.25/M · Cache Read: $0/M | Cache Write: $0/M
Chat · Vision
Nova

Nova Pro

amazon/nova-pro

Amazon's most capable Nova, for accuracy and complex tasks

Context: 300K · Max Output: 4K · Input: $0.84/M · Output: $3.36/M · Cache Read: $0/M | Cache Write: $0/M
Chat · Vision · Code
Nova

Nova Premier

amazon/nova-premier

Amazon's flagship model for complex reasoning, with a 1M context

Context: 1M · Max Output: 4K · Input: $2.63/M · Output: $13.13/M · Cache Read: $0/M | Cache Write: $0/M
Chat · Vision · Reasoning · Code
Nova

Nova 2 Lite

amazon/nova-2-lite

Amazon Nova 2 Lite — fast and affordable

Context: 300K · Max Output: 8K · Input: $0.06/M · Output: $0.25/M · Cache Read: $0/M | Cache Write: $0/M
Chat · Vision
Nova

Nova 2 Pro

amazon/nova-2-pro

Amazon Nova 2 Pro — advanced reasoning and vision

Context: 300K · Max Output: 8K · Input: $0.84/M · Output: $3.36/M · Cache Read: $0/M | Cache Write: $0/M
Chat · Vision · Code
Nova

Nova Embed Multimodal

amazon/nova-embed-multimodal

Amazon Nova Embed — text and image embedding

Context: 16K · Max Output: n/a · Input: $0.19/M · Output: $0/M · Cache Read: $0/M | Cache Write: $0/M
Embedding · Vision
Nova

Nova Sonic

amazon/nova-sonic

Amazon Nova Sonic — speech and audio model

Context: n/a · Max Output: n/a · Input: $0/M · Output: $0/M · Cache Read: $0/M | Cache Write: $0/M
TTS · STT · Audio
Nova

Nova Reel 1.1

amazon/nova-reel-1.1

Amazon Nova Reel 1.1 — video generation

Input: $0/M · Output: $0/M · Cache Read: $0/M | Cache Write: $0/M
Video Gen
Nova

Nova Reel 1.0

amazon/nova-reel-1.0

Amazon Nova Reel 1.0 — video generation

Input: $0/M · Output: $0/M · Cache Read: $0/M | Cache Write: $0/M
Video Gen
Cohere (2x failover)

Command R Plus

cohere/command-r-plus

Enterprise-grade RAG and tool use specialist

Context: 128K · Max Output: 4K · Input: $2.63/M · Output: $10.50/M · Cache Read: $0/M | Cache Write: $0/M
Chat · Code · Tools
Cohere (2x failover)

Command R

cohere/command-r

Efficient model optimized for retrieval tasks

Context: 128K · Max Output: 4K · Input: $0.16/M · Output: $0.63/M · Cache Read: $0/M | Cache Write: $0/M
Chat · Code · Tools
Cohere (2x failover)

Command A

cohere/command-a

Latest Command model with improved reasoning

Context: 256K · Max Output: 8K · Input: $2.63/M · Output: $10.50/M · Cache Read: $0/M | Cache Write: $0/M
Chat · Code · Tools
Cohere

Embed V4

cohere/embed-v4

Cohere Embed v4 — state-of-the-art embedding model

Context: 128K · Max Output: n/a · Input: $0.06/M · Output: $0/M · Cache Read: $0/M | Cache Write: $0/M
Embedding
Cohere

Embed Multilingual V3

cohere/embed-multilingual-v3

Cohere multilingual embedding, 100+ languages

Context: 1K · Max Output: n/a · Input: $0.10/M · Output: $0/M · Cache Read: $0/M | Cache Write: $0/M
Embedding
Cohere

Rerank 3.5

cohere/rerank-3.5

Cohere Rerank 3.5 — search result reranking

Context: 4K · Max Output: n/a · Input: $0/M · Output: $0/M · Cache Read: $0/M | Cache Write: $0/M
Rerank
NVIDIA

Nemotron Super 3 120B

nvidia/nemotron-super-3-120b

NVIDIA Nemotron Super 3 120B, top-tier open model

Context: 131K · Max Output: 16K · Input: $0.68/M · Output: $2.63/M · Cache Read: $0/M | Cache Write: $0/M
Chat · Reasoning · Code
NVIDIA

Nemotron Nano 3 30B

nvidia/nemotron-nano-3-30b

NVIDIA Nemotron Nano 3 30B, efficient and fast

Context: 131K · Max Output: 16K · Input: $0.19/M · Output: $0.58/M · Cache Read: $0/M | Cache Write: $0/M
Chat · Code
Gemma

Gemma 3 27B

google/gemma-3-27b

Google Gemma 3 27B, capable open model with vision

Context: 128K · Max Output: 8K · Input: $0.21/M · Output: $0.58/M · Cache Read: $0/M | Cache Write: $0/M
Chat · Vision · Code
Gemma

Gemma 3 12B

google/gemma-3-12b

Google Gemma 3 12B, balanced open model

Context: 128K · Max Output: 8K · Input: $0.10/M · Output: $0.29/M · Cache Read: $0/M | Cache Write: $0/M
Chat · Vision · Code
Gemma

Gemma 3 4B

google/gemma-3-4b

Google Gemma 3 4B, ultra-compact and fast

Context: 128K · Max Output: 8K · Input: $0.05/M · Output: $0.13/M · Cache Read: $0/M | Cache Write: $0/M
Chat · Vision
Jamba

Jamba 1.5 Large

ai21/jamba-1.5-large

AI21's hybrid SSM-Transformer with a 256K context window

Context: 256K · Max Output: 4K · Input: $2.10/M · Output: $8.40/M · Cache Read: $0/M | Cache Write: $0/M
Chat · Code
Jamba

Jamba 1.5 Mini

ai21/jamba-1.5-mini

Compact Jamba model, fast and affordable with 256K context

Context: 256K · Max Output: 4K · Input: $0.21/M · Output: $0.42/M · Cache Read: $0/M | Cache Write: $0/M
Chat · Code
Phi

Phi 4

microsoft/phi-4

Phi-4 — efficient small language model

Context: 16K · Max Output: 4K · Input: $0.08/M · Output: $0.16/M · Cache Read: $0/M | Cache Write: $0/M
Chat · Code · Tools
Phi

Phi 4 Reasoning

microsoft/phi-4-reasoning

Phi-4 Reasoning — enhanced chain-of-thought

Context: 16K · Max Output: 4K · Input: $0.08/M · Output: $0.16/M · Cache Read: $0/M | Cache Write: $0/M
Chat · Reasoning
Phi

Phi 4 Mini

microsoft/phi-4-mini

Phi-4 Mini — compact and efficient

Context: 16K · Max Output: 4K · Input: $0.05/M · Output: $0.10/M · Cache Read: $0/M | Cache Write: $0/M
Chat · Tools
Phi

Phi 4 Multimodal

microsoft/phi-4-multimodal

Phi-4 Multimodal — vision-capable small model

Context: 16K · Max Output: 4K · Input: $0.08/M · Output: $0.16/M · Cache Read: $0/M | Cache Write: $0/M
Chat · Vision · Audio · Tools
Writer

Palmyra X5

writer/palmyra-x5

Palmyra X5 — enterprise AI writing and analysis

Context: 128K · Max Output: 8K · Input: $2.10/M · Output: $6.30/M · Cache Read: $0/M | Cache Write: $0/M
Chat · Code
Writer

Palmyra X4

writer/palmyra-x4

Palmyra X4 — versatile enterprise model

Context: 128K · Max Output: 8K · Input: $1.58/M · Output: $5.25/M · Cache Read: $0/M | Cache Write: $0/M
Chat · Code
Writer

Palmyra Vision

writer/palmyra-vision

Palmyra Vision — multimodal document understanding

Context: 128K · Max Output: 8K · Input: $2.10/M · Output: $6.30/M · Cache Read: $0/M | Cache Write: $0/M
Chat · Vision
TwelveLabs

Marengo Embed 3.0

twelvelabs/marengo-embed-3.0

Marengo Embed 3.0 — multimodal video embedding

Context: n/a · Max Output: n/a · Input: $0/M · Output: $0/M · Cache Read: $0/M | Cache Write: $0/M
Embedding · Vision
Kling

Kling V3 Video

kling/kling-v3-video

Kling V3 — high-quality AI video generation by Kuaishou

Input: $0/M · Output: $0/M · Cache Read: $0/M | Cache Write: $0/M
Video Gen
Kling

Kling V3 Omni Video

kling/kling-v3-omni-video

Kling V3 Omni — multi-modal video generation with enhanced quality

Input: $0/M · Output: $0/M · Cache Read: $0/M | Cache Write: $0/M
Video Gen
PixVerse

PixVerse V6

pixverse/pixverse-v6

PixVerse V6 — general-purpose AI video generation with multi-shot support

Input: $0/M · Output: $0/M · Cache Read: $0/M | Cache Write: $0/M
Video Gen
PixVerse

PixVerse C1

pixverse/pixverse-c1

PixVerse C1 — dynamic scenes, fighting and magic effects

Input: $0/M · Output: $0/M · Cache Read: $0/M | Cache Write: $0/M
Video Gen
Vidu

Vidu Q3 Pro

vidu/viduq3-pro

Vidu Q3 Pro — professional text-to-video generation

Input: $0/M · Output: $0/M · Cache Read: $0/M | Cache Write: $0/M
Video Gen
Vidu

Vidu Q3 Turbo

vidu/viduq3-turbo

Vidu Q3 Turbo — fast text-to-video at lower cost

Input: $0/M · Output: $0/M · Cache Read: $0/M | Cache Write: $0/M
Video Gen
Azure

MAI Image 2

azure/mai-image-2

Microsoft MAI-Image-2 — photorealistic image generation

Input: $0/M · Output: $0/M · Cache Read: $0/M | Cache Write: $0/M
Image Gen
Azure

FLUX Kontext Pro

azure/flux-kontext-pro

FLUX.1 Kontext Pro — context-aware image editing via Azure

Input: $0/M · Output: $0/M · Cache Read: $0/M | Cache Write: $0/M
Image Gen
Google

Lyria 3 Pro Preview

google/lyria-3-pro-preview

Lyria 3 Pro — full-length AI music generation by Google DeepMind

Context: n/a · Max Output: n/a · Input: $0/M · Output: $0/M · Cache Read: $0/M | Cache Write: $0/M
Audio
Google

Lyria 3 Clip Preview

google/lyria-3-clip-preview

Lyria 3 Clip — 30-second AI music clips by Google DeepMind

Context: n/a · Max Output: n/a · Input: $0/M · Output: $0/M · Cache Read: $0/M | Cache Write: $0/M
Audio
Google

Imagen 4.0

google/imagen-4.0

Imagen 4.0 — high-quality image generation

Input: $0/M · Output: $0/M · Cache Read: $0/M | Cache Write: $0/M
Image Gen
Google

Imagen 4.0 Fast

google/imagen-4.0-fast

Imagen 4.0 Fast — quick, affordable image generation

Input: $0/M · Output: $0/M · Cache Read: $0/M | Cache Write: $0/M
Image Gen
Bedrock

Titan Embed Text V2

amazon/titan-embed-text-v2

Amazon Titan Text Embed v2

Context: 8K · Max Output: n/a · Input: $0.21/M · Output: $0/M · Cache Read: $0/M | Cache Write: $0/M
Embedding
Azure

FLUX 2 Pro

azure/flux-2-pro

FLUX.2 Pro — professional image generation

Input: $0/M · Output: $0/M · Cache Read: $0/M | Cache Write: $0/M
Image Gen
Google

Imagen 4.0 Ultra

google/imagen-4.0-ultra

Imagen 4.0 Ultra — highest quality generation

Input: $0/M · Output: $0/M · Cache Read: $0/M | Cache Write: $0/M
Image Gen
Google

Veo 3.1

google/veo-3.1

Veo 3.1 — high-quality video generation

Input: $0/M · Output: $0/M · Cache Read: $0/M | Cache Write: $0/M
Video Gen
Google

Veo 3.1 Fast

google/veo-3.1-fast

Veo 3.1 Fast — quick video generation

Input: $0/M · Output: $0/M · Cache Read: $0/M | Cache Write: $0/M
Video Gen
Google

Veo 3.1 Lite

google/veo-3.1-lite

Veo 3.1 Lite — affordable video generation

Input: $0/M · Output: $0/M · Cache Read: $0/M | Cache Write: $0/M
Video Gen