Vertex AI
Configure Vertex AI as an LLM provider in agentgateway.
Before you begin
Set up an agentgateway proxy.
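If you do not have a proxy yet, the following is a minimal sketch of a Gateway resource that the HTTPRoute later in this guide can reference as its parent. It assumes that an `agentgateway` GatewayClass exists in your cluster; adjust the class name, namespace, and port to match your setup.

```yaml
# Hypothetical example: a Gateway for agentgateway. The GatewayClass
# name and listener port are assumptions; align them with your install.
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: agentgateway
  namespace: kgateway-system
spec:
  gatewayClassName: agentgateway
  listeners:
  - name: http
    protocol: HTTP
    port: 8080
    allowedRoutes:
      namespaces:
        from: All
```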
Set up access to Vertex AI
1. Set up authentication for Vertex AI. Make sure that you have your:
   - Google Cloud project ID
   - Project location, such as `us-central1`
   - API key or service account credentials
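   If you use service account credentials rather than a long-lived API key, you can mint a short-lived access token with the gcloud CLI, for example:

   ```sh
   # Prints an OAuth 2.0 access token for the active account.
   # The token expires after a short time, so regenerate it as needed.
   gcloud auth print-access-token
   ```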
2. Save your Vertex AI API key as an environment variable.

   ```sh
   export VERTEX_AI_API_KEY=<insert your API key>
   ```
3. Create a Kubernetes secret to store your Vertex AI API key.

   ```sh
   kubectl apply -f- <<EOF
   apiVersion: v1
   kind: Secret
   metadata:
     name: vertex-ai-secret
     namespace: kgateway-system
   type: Opaque
   stringData:
     Authorization: $VERTEX_AI_API_KEY
   EOF
   ```
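   To confirm that the secret stores your key, you can decode it, for example:

   ```sh
   # Decodes the Authorization field of the secret; expect your API key.
   kubectl get secret vertex-ai-secret -n kgateway-system \
     -o jsonpath='{.data.Authorization}' | base64 -d
   ```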
4. Create an AgentgatewayBackend resource to configure an LLM provider that references the API key secret.
   ```sh
   kubectl apply -f- <<EOF
   apiVersion: agentgateway.dev/v1alpha1
   kind: AgentgatewayBackend
   metadata:
     name: vertex-ai
     namespace: kgateway-system
   spec:
     ai:
       provider:
         vertexai:
           model: gemini-pro
           projectId: "my-gcp-project"
           region: "us-central1"
     policies:
       auth:
         secretRef:
           name: vertex-ai-secret
   EOF
   ```

5. Create an HTTPRoute resource that routes incoming traffic to the AgentgatewayBackend. The following example sets up a route on the `/vertex` path to the AgentgatewayBackend that you previously created.
   ```sh
   kubectl apply -f- <<EOF
   apiVersion: gateway.networking.k8s.io/v1
   kind: HTTPRoute
   metadata:
     name: vertex-ai
     namespace: kgateway-system
   spec:
     parentRefs:
     - name: agentgateway
       namespace: kgateway-system
     rules:
     - matches:
       - path:
           type: PathPrefix
           value: /vertex
       backendRefs:
       - name: vertex-ai
         namespace: kgateway-system
         group: agentgateway.dev
         kind: AgentgatewayBackend
   EOF
   ```
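   To verify that the route is accepted and bound to your gateway, you can check its status, for example:

   ```sh
   # Look for an Accepted condition in the route's status section.
   kubectl get httproute vertex-ai -n kgateway-system -o yaml
   ```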
6. Send a request to the LLM provider API. Verify that the request succeeds and that you get back a response from the API. Because the AgentgatewayBackend configures the model, you can leave the `model` field in the request body empty.
curl "$INGRESS_GW_ADDRESS/vertex" -H content-type:application/json -d '{ "model": "", "messages": [ { "role": "user", "content": "Write me a short poem about Kubernetes and clouds." } ] }' | jqcurl "localhost:8080/vertex" -H content-type:application/json -d '{ "model": "", "messages": [ { "role": "user", "content": "Write me a short poem about Kubernetes and clouds." } ] }' | jqExample output:
{ "id": "chatcmpl-vertex-12345", "object": "chat.completion", "created": 1727967462, "model": "gemini-pro", "choices": [ { "index": 0, "message": { "role": "assistant", "content": "In the cloud, Kubernetes reigns,\nOrchestrating pods with great care,\nContainers float like clouds,\nScaling up and down,\nAutomation everywhere." }, "finish_reason": "stop" } ], "usage": { "prompt_tokens": 12, "completion_tokens": 28, "total_tokens": 40 } }
Next steps
- Want to use endpoints other than chat completions, such as embeddings or models? Check out the multiple endpoints guide.
- Explore other guides for LLM consumption, such as function calling, model failover, and prompt guards.