Authentication

InferenceKey uses two kinds of token, one per plane, so that the credentials your application ships with can only do what that application actually needs. This guide covers the token formats, what each can and cannot do, the scopes on the control token, how a token is resolved from your environment, how tokens are redacted in logs and errors, and the typed errors you get when a token lands on the wrong client.

If you just want to get a key and make a call, start with Tokens. For the API surface, see the Python and TypeScript references.

Two tokens, two planes

InferenceKey separates the control plane (provision and reconcile workloads) from the data plane (call inference). Each plane has its own token type and its own client, and neither token can do the other’s job.

ik_sdk_ — control plane

Held by the ManagementClient. Scoped to one project. Used to declare workloads and reconcile them with ensure(). Cannot call inference.

ik_live_ — data plane

Passed per workload to the DataClient endpoint. Used to call the OpenAI-compatible inference routes. Cannot provision workloads.

One application typically holds one ik_sdk_ token (it manages a project’s workloads) and many ik_live_ tokens — one per workload, so that the credential for your support-bot cannot call your billing embedder. This is the least-privilege model: a leaked data key exposes exactly one workload’s inference, never your provisioning.

Token formats

Both tokens share the same shape: a fixed prefix that names the plane, an 8-hex public identifier, and a 64-hex secret.

| Token | Format | Plane | Example (redacted) |
|---|---|---|---|
| Control | ik_sdk_<8hex>_<64hex> | Control | ik_sdk_… |
| Data | ik_live_<8hex>_<64hex> | Data | ik_live_… |
  • The <8hex> segment is a public token identifier — safe to show in dashboards and logs, useful for telling two keys apart.
  • The <64hex> segment is the secret. Treat the whole string as a secret and never log it; see Redaction.
  • The prefix is load-bearing: the SDK inspects ik_sdk_ vs ik_live_ to route the token to the correct plane and to raise a wrong_credential_type error early if it is on the wrong client.
Token anatomy
```text
ik_sdk_ a1b2c3d4 _ e5f6...<64 hex chars total>...9a0b
└─────┘ └──────┘   └────────────────────────────────┘
 prefix   8 hex              64 hex secret
 plane    id                 never log this
```
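To make the format concrete, here is a minimal sketch that checks a string against the documented `<prefix><8hex>_<64hex>` shape and extracts the plane and public identifier without ever returning the secret. `parse_token` and `TokenParts` are hypothetical helpers, not part of the inferencekey SDK, and accepting both upper- and lowercase hex is an assumption.

```python
import re
from typing import NamedTuple

# Hypothetical helper, not part of the inferencekey SDK. The pattern encodes the
# documented shape <prefix><8hex>_<64hex>; accepting both hex cases is an assumption.
_TOKEN_RE = re.compile(r"(ik_sdk_|ik_live_)([0-9a-fA-F]{8})_([0-9a-fA-F]{64})")

_PLANES = {"ik_sdk_": "control", "ik_live_": "data"}


class TokenParts(NamedTuple):
    plane: str      # "control" for ik_sdk_, "data" for ik_live_
    public_id: str  # the 8-hex identifier; safe to show in dashboards and logs


def parse_token(token: str) -> TokenParts:
    """Return the plane and public id of a token; the secret is never returned."""
    match = _TOKEN_RE.fullmatch(token)
    if match is None:
        # Deliberately do not echo the input: it may be a real secret.
        raise ValueError("malformed InferenceKey token")
    prefix, public_id, _secret = match.groups()
    return TokenParts(plane=_PLANES[prefix], public_id=public_id)


print(parse_token("ik_sdk_a1b2c3d4_" + "e5" * 32))
# TokenParts(plane='control', public_id='a1b2c3d4')
```

Using `fullmatch` rather than `search` means a token with trailing characters or a truncated secret is rejected instead of partially matched.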

Can and cannot

ik_sdk_ — control plane only
```python
from inferencekey import ManagementClient, WorkloadSpec, Backend

mgmt = ManagementClient.from_env(project="acme")  # reads INFERENCEKEY_SDK_TOKEN

# CAN: provision / reconcile workloads in the acme project
ref = mgmt.ensure(WorkloadSpec(
    name="support-bot",
    slug="support-bot",
    model="meta-llama/Llama-3.1-8B-Instruct",
    backend=Backend.VLLM,
    command="vllm serve meta-llama/Llama-3.1-8B-Instruct --max-model-len 8192",
))

# CANNOT: call inference. An ik_sdk_ token has no data-plane access.
```
ik_live_ — data plane only
```python
from inferencekey import DataClient

data = DataClient.from_env(project="acme")

# CAN: call inference on a workload you hold the ik_live_ key for
ep = data.endpoint("support-bot", api_key="ik_live_...")
out = ep.generate_text(prompt="Hola", temperature=0.2, max_tokens=300)
print(out.text, out.model)

# CANNOT: create or patch workloads. An ik_live_ token has no control-plane access.
```
| Capability | ik_sdk_ (control) | ik_live_ (data) |
|---|---|---|
| Create / reconcile workloads (ensure) | Yes | No |
| Patch / list workloads | Yes | No |
| Call inference (generate_text, embed, …) | No | Yes |
| Scope | One project | One workload |
| Held by | ManagementClient | DataClient endpoint (per call) |

Control token scopes

An ik_sdk_ token carries scopes that bound what it may do inside its project. A token minted for the SDK provisioning flow gets the two write scopes by default:

| Scope | Grants | Default on new SDK tokens |
|---|---|---|
| workload:write | Create and patch workloads (POST /workloads, PATCH /workloads/:id). | Yes |
| assignment:write | Assign workloads to workers / GPU resources during reconcile. | Yes |
| workload:read | List and read workloads (GET list). | No (used for read-only tokens) |

The defaults exist so that mgmt.ensure(...) works end to end: it needs workload:write to create or patch the workload and assignment:write to bind it to where it runs. If you mint a read-only token — for an audit job or a dashboard that only lists workloads — give it workload:read alone. Calling a write route with a read-only token raises scope_insufficient.
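The rule above amounts to a set difference: a call is allowed only if the token's scopes cover everything the operation needs. This is a minimal sketch of that check, not an SDK API; the operation-to-scope mapping is an assumption read off this page (ensure needs both write scopes, patching needs workload:write), and `missing_scopes` and `verdict` are hypothetical names.

```python
from __future__ import annotations

from typing import Optional

# Hypothetical model of the scope table above; not an SDK API. The mapping of
# operations to scopes is an assumption read off this page: ensure() needs both
# write scopes, and patching needs workload:write.
REQUIRED_SCOPES: dict[str, frozenset[str]] = {
    "ensure": frozenset({"workload:write", "assignment:write"}),
    "patch": frozenset({"workload:write"}),
}

DEFAULT_SDK_SCOPES = frozenset({"workload:write", "assignment:write"})
READ_ONLY_SCOPES = frozenset({"workload:read"})


def missing_scopes(operation: str, granted: frozenset[str]) -> frozenset[str]:
    """Scopes the token lacks for this operation (empty when it is allowed)."""
    return REQUIRED_SCOPES[operation] - granted


def verdict(operation: str, granted: frozenset[str]) -> Optional[str]:
    """None if the call is allowed, otherwise the 403 reason this page documents."""
    return "scope_insufficient" if missing_scopes(operation, granted) else None


print(verdict("ensure", DEFAULT_SDK_SCOPES))  # None
print(verdict("ensure", READ_ONLY_SCOPES))    # scope_insufficient
```

Listing is left out of the mapping on purpose: this page does not say whether the write scopes also imply workload:read.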

Resolution precedence

Both clients resolve their token and project from two sources, checked in this order:

  1. Explicit: a value you pass in code (api_key="ik_live_...", project="acme"). Highest precedence; always wins.

  2. Environment variable: INFERENCEKEY_SDK_TOKEN, INFERENCEKEY_API_KEY, INFERENCEKEY_PROJECT, INFERENCEKEY_BASE_URL. This is what from_env / fromEnv read.

This is why DataClient.from_env(project="acme") then data.endpoint(slug, api_key="ik_live_...") works cleanly: the per-call api_key is explicit and overrides any INFERENCEKEY_API_KEY in the environment, so one DataClient can fan out to many workloads each with its own key.
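The precedence rule is small enough to state as code. This is a hypothetical `resolve` helper that illustrates the rule, not the SDK's internal implementation; taking the environment as a mapping keeps it easy to test.

```python
from __future__ import annotations

import os
from typing import Mapping, Optional


def resolve(explicit: Optional[str], env_var: str,
            env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Hypothetical helper: an explicit value wins, else the environment variable."""
    if explicit is not None:
        return explicit
    # Fall back to the environment; None if the variable is unset.
    return env.get(env_var)


env = {"INFERENCEKEY_API_KEY": "ik_live_default..."}
print(resolve(None, "INFERENCEKEY_API_KEY", env))                  # ik_live_default...
print(resolve("ik_live_billing...", "INFERENCEKEY_API_KEY", env))  # ik_live_billing...
```

This sketch treats only None as "not passed", so an explicit empty string would still win; whether the real clients ignore empty values is not specified on this page.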

Precedence in practice
```python
import os

from inferencekey import DataClient

os.environ["INFERENCEKEY_API_KEY"] = "ik_live_default..."  # env default
data = DataClient.from_env(project="acme")

# Uses the env default (INFERENCEKEY_API_KEY)
data.endpoint("support-bot").generate_text(prompt="hi")

# Explicit api_key overrides the env default for this workload
data.endpoint("billing", api_key="ik_live_billing...").embed(input=["a", "b"])
```
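Because the per-call api_key always wins, one DataClient can serve every workload as long as your application can look up the right ik_live_ key for each slug. A minimal sketch, assuming a hypothetical convention of one environment variable per workload (IK_WORKLOAD_KEY_<SLUG>); the SDK does not read these variables itself, and `env_name_for` and `workload_keys` are illustrative names.

```python
from __future__ import annotations

import os
from typing import Mapping

# Hypothetical naming convention for this sketch only (the SDK does not read it):
# one variable per workload, e.g. support-bot -> IK_WORKLOAD_KEY_SUPPORT_BOT.
_PREFIX = "IK_WORKLOAD_KEY_"


def env_name_for(slug: str) -> str:
    """Map a workload slug to its hypothetical environment variable name."""
    return _PREFIX + slug.upper().replace("-", "_")


def workload_keys(slugs: list[str],
                  env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Collect one ik_live_ key per workload, failing loudly if any is missing."""
    keys: dict[str, str] = {}
    for slug in slugs:
        name = env_name_for(slug)
        key = env.get(name)
        if key is None:
            raise KeyError(f"no data key set for workload {slug!r} ({name})")
        if not key.startswith("ik_live_"):
            # Report only the variable name, never the value.
            raise ValueError(f"{name} does not hold an ik_live_ data token")
        keys[slug] = key
    return keys
```

You would then pass keys[slug] as the api_key to data.endpoint(slug, ...) for each workload, so a leaked key still exposes only one workload's inference.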

Environment variables

The four environment variables and which client reads each. The control client never reads INFERENCEKEY_API_KEY; the data client never reads INFERENCEKEY_SDK_TOKEN.

| Variable | ManagementClient (control) | DataClient (data) | Purpose |
|---|---|---|---|
| INFERENCEKEY_SDK_TOKEN | Yes | No | The ik_sdk_ control token. |
| INFERENCEKEY_API_KEY | No | Yes | The default ik_live_ data token (overridable per call). |
| INFERENCEKEY_PROJECT | Yes | Yes | Default project slug when project= is not passed. |
| INFERENCEKEY_BASE_URL | Yes | Yes | API base URL (point at staging or self-hosted). |
.env — set once, resolve everywhere
```shell
export INFERENCEKEY_BASE_URL="https://api.inferencekey.com"
export INFERENCEKEY_PROJECT="acme"
export INFERENCEKEY_SDK_TOKEN="ik_sdk_a1b2c3d4_..."  # control client
export INFERENCEKEY_API_KEY="ik_live_e5f6a7b8_..."   # data client default
```
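Swapping the two token variables is an easy mistake to make in a .env file. A startup check like the hypothetical `preflight` below compares each variable's prefix with the plane its client expects and reports problems by variable name and prefix only, never the value. It is a sketch, not an SDK feature.

```python
from __future__ import annotations

import os
from typing import Mapping

# Expected prefix for each token variable, per the environment-variable table.
EXPECTED_PREFIX = {
    "INFERENCEKEY_SDK_TOKEN": "ik_sdk_",  # read by ManagementClient
    "INFERENCEKEY_API_KEY": "ik_live_",   # read by DataClient
}


def preflight(env: Mapping[str, str] = os.environ) -> list[str]:
    """List configuration problems; an empty list means the prefixes look right.

    Messages name the variable and the prefix found, never the token value.
    """
    problems: list[str] = []
    for var, expected in EXPECTED_PREFIX.items():
        value = env.get(var)
        if value is None:
            continue  # unset is acceptable when that client is not used
        if not value.startswith(expected):
            found = next((p for p in EXPECTED_PREFIX.values() if value.startswith(p)),
                         "an unrecognized prefix")
            problems.append(f"{var} should start with {expected}, found {found}")
    return problems


for problem in preflight():
    print("config:", problem)
```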

Redaction

In logs, error messages, and diagnostics, the SDK and the platform redact every token down to its prefix. The 8-hex identifier and the 64-hex secret are never emitted together.

  • A token that appears in an error or log is shown as its prefix — ik_sdk_… or ik_live_… — so you can tell which plane the offending credential belongs to without leaking the secret.
  • This applies to the typed wrong-token errors below: the message tells you the credential type that was rejected, not the credential itself.
  • Follow the same rule in your own code: never log a raw token. If you need to identify a key, log only its prefix and, where necessary, its public <8hex> identifier as shown in the dashboard; never the secret.
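For your own logs, a small regex-based filter can apply the same prefix-only rule. This is a minimal sketch using the standard logging module; `redact` and `RedactingFilter` are hypothetical names, the SDK's own redaction code is not shown on this page, and the pattern assumes the documented token shape.

```python
import logging
import re

# Hypothetical helpers for your own code; the SDK's internal redaction is not shown here.
# Matches a full token in the documented shape (both hex cases accepted, an assumption).
_TOKEN_RE = re.compile(r"\b(ik_sdk_|ik_live_)[0-9a-fA-F]{8}_[0-9a-fA-F]{64}\b")


def redact(text: str) -> str:
    """Replace each full token with its prefix plus an ellipsis, e.g. ik_live_…"""
    return _TOKEN_RE.sub(lambda m: m.group(1) + "…", text)


class RedactingFilter(logging.Filter):
    """Redacts tokens from the fully formatted message before it is emitted."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = redact(record.getMessage())  # format msg % args first
        record.args = None                        # args are now baked into msg
        return True


# Attach to the handler so records propagated from child loggers are covered too.
handler = logging.StreamHandler()
handler.addFilter(RedactingFilter())
logging.getLogger("app").addHandler(handler)
```

Like the platform, this drops the public identifier as well as the secret. It only covers the log message; exception tracebacks attached to a record are not rewritten.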

Wrong-token errors

When a token reaches the wrong client, targets a project it is not scoped to, or lacks the scope for a route, the platform returns a 403 with a typed reason. The three reasons:

| Reason | Meaning | Typical cause |
|---|---|---|
| wrong_credential_type | Right format, wrong plane. | An ik_live_ token on the ManagementClient, or an ik_sdk_ token on a DataClient endpoint. |
| project_scope_mismatch | The token is valid but scoped to a different project than the one the call targets. | ManagementClient.from_env(project="acme") with an ik_sdk_ token minted for another project. |
| scope_insufficient | Right plane and project, but the token's scopes don't cover the route. | A read-only (workload:read) ik_sdk_ token calling ensure(). |

How they surface per language

Both bindings map these 403s onto the typed PermissionDenied exception (a subclass of InferenceKeyError), so you can catch them and branch on the reason. Related auth failures surface as AuthError (bad or expired token) and validation problems as ValidationError.

Catching wrong-token errors
```python
from inferencekey import (
    ManagementClient, WorkloadSpec, Backend,
    PermissionDenied, AuthError,
)

mgmt = ManagementClient.from_env(project="acme")

try:
    mgmt.ensure(WorkloadSpec(
        name="support-bot",
        slug="support-bot",
        model="meta-llama/Llama-3.1-8B-Instruct",
        backend=Backend.VLLM,
        command="vllm serve meta-llama/Llama-3.1-8B-Instruct --max-model-len 8192",
    ))
except PermissionDenied as e:
    # e carries the typed reason: wrong_credential_type,
    # project_scope_mismatch, or scope_insufficient.
    print("rejected:", e)
except AuthError:
    print("token is missing, malformed, or expired")
```
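Once you have the reason string, branching on it is a plain lookup. This sketch maps each of the three documented reasons to a fix-it hint. How the reason is exposed on PermissionDenied is not specified on this page, so check the Python reference for the actual attribute; `REMEDIATION` and `remediation` are hypothetical names.

```python
# Hypothetical remediation hints keyed by the three documented 403 reasons.
# Reading the reason off PermissionDenied is left out on purpose: the attribute
# name is not documented on this page.
REMEDIATION = {
    "wrong_credential_type": (
        "Token is for the other plane: use ik_sdk_ with ManagementClient "
        "and ik_live_ with DataClient endpoints."
    ),
    "project_scope_mismatch": (
        "Token belongs to a different project: check project= and "
        "INFERENCEKEY_PROJECT against the project the token was minted for."
    ),
    "scope_insufficient": (
        "Token lacks a required scope: ensure() needs workload:write and "
        "assignment:write; read-only tokens carry only workload:read."
    ),
}


def remediation(reason: str) -> str:
    """Return a fix-it hint for a 403 reason, with a fallback for unknown reasons."""
    return REMEDIATION.get(reason, f"Unrecognized 403 reason {reason!r}; see the error details.")


print(remediation("scope_insufficient"))
```

Keeping an explicit fallback means a reason added to the platform later produces a readable message instead of a KeyError.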

Where to next


New to InferenceKey? Create an account or open the dashboard · Learn more at inferencekey.com.