Product

GCP Vertex AI

Google's first-party (Gemini) plus third-party (Claude / Llama / Mistral) multi-model token entry point — the hyperscaler platform with the most varied model menu, winning enterprise share through the dual Gemini + Anthropic lines.

1. Core Product / Service

Vertex AI Generative AI is GCP's flagship product at L3b. Vertex AI also has an MLOps side, but this page focuses on the Generative AI / token API:

  • Gemini series: Google's own 1P — Gemini 2.5 Pro, Gemini 2.5 Flash, Gemini 2.5 Flash-Lite, Gemini 1.5 Pro, Gemini 1.5 Flash, Imagen 3 / 4 (image gen), Veo 3 (video gen), Lyria (music).
  • Anthropic Claude: Vertex is the second-largest Claude hosting channel (after aws-bedrock) — Claude Sonnet 4 / Opus 4 / Haiku 4.5 are all callable on Vertex, at prices generally at parity with Anthropic direct.
  • Meta Llama 3 / 3.1 / 3.3 + Mistral Large / Mixtral + AI21 Jamba + DeepSeek R1 / V3 and other open-source / third-party models — accessed via Model Garden.
  • Inference modes: Standard pay-per-token, Provisioned Throughput (reserved capacity), Batch (async, ~50% discount).
  • Vertex AI Agents / Agent Builder: an upper-layer agent / RAG framework integrating Google Search / BigQuery / Drive.
  • Enterprise features: VPC Service Controls, CMEK, IAM, data-residency commitments, HIPAA / FedRAMP / SOC 2 / ISO 27001.
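
The two flagship model lines above are reached through different client libraries against the same GCP project. A minimal sketch, assuming the `google-genai` and `anthropic` Python SDKs; the project, region, and model IDs are illustrative placeholders, not verified values:

```python
# Minimal sketch of Vertex's multi-model token entry: Gemini via the
# google-genai SDK in Vertex mode, Claude via Anthropic's SDK pointed at
# Vertex. Project, region, and model IDs below are placeholders.
import os

PROJECT = os.environ.get("GOOGLE_CLOUD_PROJECT", "my-project")  # placeholder
REGION = "us-east5"  # placeholder; model availability varies by region


def ask_gemini(prompt: str) -> str:
    """Call a 1P Gemini model through Vertex AI."""
    from google import genai  # pip install google-genai

    client = genai.Client(vertexai=True, project=PROJECT, location=REGION)
    resp = client.models.generate_content(
        model="gemini-2.5-flash", contents=prompt
    )
    return resp.text


def ask_claude(prompt: str) -> str:
    """Call a 3P Claude model in the same project via Anthropic's SDK."""
    from anthropic import AnthropicVertex  # pip install "anthropic[vertex]"

    client = AnthropicVertex(project_id=PROJECT, region=REGION)
    msg = client.messages.create(
        model="claude-sonnet-4",  # illustrative; Vertex uses versioned IDs
        max_tokens=256,
        messages=[{"role": "user", "content": prompt}],
    )
    return msg.content[0].text
```

Both calls bill to the same GCP project and inherit the same IAM / VPC Service Controls perimeter — the one-stop property the rest of this page keeps returning to.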

2. Target Users & Pain Points

  • GCP large customers: already on BigQuery / GCS — keeping Gemini / Claude calls inside GCP is the path of least friction.
  • Google Workspace integration scenarios: Gemini in Workspace comes from the same source, and pulling Workspace data through IAM is straightforward.
  • Multi-model hedging customers: Vertex is one of the few "all top models except GPT" lineups — having Gemini + Claude together is Vertex's biggest advantage over Bedrock.
  • Pain points: same as Bedrock — data compliance, VPC deployment, and enterprise contracting processes.

3. Competitive Landscape

| Competitor | Positioning | vs. Vertex |
|---|---|---|
| aws-bedrock | Anthropic Claude's main venue + in-house Nova | Bedrock has no Gemini; Vertex has both Gemini + Claude |
| azure-openai-service | exclusive OpenAI GPT host | Vertex has no GPT — its only obvious gap |
| Anthropic direct API | 1P direct | price parity; Vertex adds enterprise compliance |
| Google AI Studio | Gemini direct (developer-first) | AI Studio is simple and free-tier-friendly; Vertex is the enterprise version |
| together-ai / fireworks-ai | 3P open-source token APIs | lower prices, but weak enterprise IAM |

Differentiation: the only hyperscaler entry holding both a top-tier 1P (Gemini) and a top-tier 3P (Claude) — the most "AI-neutral" model menu.

4. Unique Observations

  • Per-token pricing (Standard, 2026-05):
    • Gemini 2.5 Pro: $1.25/M input + $10/M output (≤200K context; 2× surcharge over 200K)
    • Gemini 2.5 Flash: $0.30/M input + $2.50/M output
    • Gemini 2.5 Flash-Lite: $0.10/M input + $0.40/M output (ultra-low-end budget tier)
    • Claude Sonnet 4: $3/M input + $15/M output — complete parity with Anthropic / Bedrock
    • Claude Opus 4: $15/M input + $75/M output — parity
    • Claude Haiku 4.5: $1/M input + $5/M output — parity
    • Llama 3.3 70B: ~$0.72/M (on par with Bedrock, slightly higher than Together / Fireworks)
    • Imagen 4: ~$0.04/image standard, ~$0.08/image high-quality [1]
  • vs 1P price gap (take rate): Claude on Vertex is at strict parity with Anthropic direct, so the nominal take rate is 0; as with Bedrock, the economics are recouped via ecosystem lock-in + investment returns (Google also holds significant Anthropic equity — reportedly $3B+ cumulative).
    • Gemini is Google 1P: against OpenAI / Anthropic direct, Gemini 2.5 Pro's $1.25/M input matches GPT-5's $1.25/M input, and Flash-Lite's $0.10/M sits at the market floor alongside GPT-5 Nano. Google prices the Gemini Flash tier aggressively, aiming to capture the long tail from the DeepSeek / Llama open-source budget tier.
  • vs 3P peers: Vertex's Llama / Mistral / DeepSeek prices typically run 5-15% above Together / Fireworks, but enterprise customers pay the premium for GCP one-stop shopping + IAM integration.
  • Inference engine: not publicly disclosed. Gemini runs on Google's own TPU v5e / v5p / Trillium plus some NVIDIA H100; Claude / Llama / Mistral run mainly on NVIDIA H100 / H200 with limited TPU adaptation. Google is one of the few players amortizing inference costs on proprietary silicon (TPU) — with AWS's Trainium and Microsoft's Maia, it forms the "hyperscaler proprietary chip trio."
  • Compute sourcing: 100% GCP first-party data centers; the TPU is the weapon Google has accumulated through 14 years of continuous capex. Gemini's marginal inference cost on TPU is structurally lower than GPT's / Claude's on NVIDIA hardware — the physical basis for Google pricing Gemini Flash-Lite at $0.10/M and still clearing gross margin.
  • Anthropic relationship: Google and Amazon are both major Anthropic investors — Anthropic's "dual hyperscaler binding" strategy makes Claude a first-class citizen on both Bedrock and Vertex, a different hyperscaler-relationship paradigm from OpenAI's deep binding to Microsoft.
  • Strategic risk: if Gemini falls too far behind GPT-5 / Claude 4 in capability, Vertex degrades into a me-too Bedrock that mainly distributes Claude; and since Vertex lacks GPT, customers committed to GPT cannot be served at all — Google's main weakness in the hyperscaler model-alliance war.
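
As a sanity check on the rate card in this section, a back-of-envelope cost model over the Standard prices listed above; the >200K-context 2× surcharge (applied here to both input and output rates, per the wording of the note) and the flat ~50% Batch discount are simplifying assumptions, not the actual billing engine:

```python
# Back-of-envelope Vertex cost model using the Standard $/1M-token rates
# listed above. Long-context surcharge and Batch discount are modeled as
# flat multipliers -- a simplification of the real rate card.
PRICES = {  # model: (input $/M tokens, output $/M tokens)
    "gemini-2.5-pro": (1.25, 10.00),
    "gemini-2.5-flash": (0.30, 2.50),
    "gemini-2.5-flash-lite": (0.10, 0.40),
    "claude-sonnet-4": (3.00, 15.00),
    "claude-opus-4": (15.00, 75.00),
    "claude-haiku-4.5": (1.00, 5.00),
}


def cost_usd(model: str, in_tok: int, out_tok: int, batch: bool = False) -> float:
    """Estimated request cost in USD under the simplified rate card."""
    cin, cout = PRICES[model]
    if model == "gemini-2.5-pro" and in_tok > 200_000:
        cin, cout = 2 * cin, 2 * cout  # 2x surcharge over 200K context
    usd = (in_tok * cin + out_tok * cout) / 1_000_000
    return usd * 0.5 if batch else usd  # Batch: async at ~50% off


# 1M input / 100K output on Flash: $0.30 + $0.25 = $0.55
print(round(cost_usd("gemini-2.5-flash", 1_000_000, 100_000), 2))
```

It also makes the Batch economics concrete: the same Flash workload drops to roughly $0.275 under the async tier, and a 300K-context Gemini 2.5 Pro call prices its entire input at the doubled rate.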

5. Financials / Business Scale

  • GA date: Vertex AI 2021-05; Generative AI on Vertex 2023-06.
  • Google investment in Anthropic: cumulative ~$3B+ (including cloud compute commitment); together with Amazon forming "dual hyperscaler invest in Anthropic" pattern.
  • Google Cloud AI revenue: Vertex not separately disclosed; GCP overall revenue ran ~$12B/quarter as of 2025-Q4, with AI a major growth contributor.
  • Customers: Wendy's, Wayfair, Mercedes-Benz, Best Buy, Spotify, Shopify, Verizon, Mayo Clinic — public Vertex cases lean retail / media / health.

6. People & Relationships

  • Parent: Google Cloud Platform — Thomas Kurian (CEO GCP), Sundar Pichai (CEO Alphabet).
  • Vertex AI business lead: has gone through multiple reorganizations; Burak Gokturk (former GM of AI products), among others.
  • Google AI / DeepMind: Demis Hassabis (CEO, Google DeepMind) — the Gemini model team; after DeepMind merged with Google Brain, the combined org produced the full Gemini series.
  • Strategic partners: Anthropic (investment + Claude on Vertex), Meta (Llama), Mistral, AI21, DeepSeek.
  • Competes with: aws-bedrock, azure-openai-service, OpenAI direct API, together-ai / fireworks-ai (in open-source tier).
  • Hosts models from: Google (Gemini, Imagen, Veo, Lyria), Anthropic (Claude), Meta (Llama), Mistral, AI21, DeepSeek, Cohere, Stability AI, etc. — 200+ models in Model Garden.

Sources

Last compiled: 2026-05-10