InferenceKey uses two kinds of token, one per plane, so that the credentials your application ships with can only do what that application actually needs. This guide covers the token formats, what each can and cannot do, the scopes on the control token, how a token is resolved from your environment, how tokens are redacted in logs and errors, and the typed errors you get when a token lands on the wrong client.
If you just want to get a key and make a call, start with Tokens. For the API surface, see the Python and TypeScript references.
InferenceKey separates the control plane (provision and reconcile workloads) from the data plane (call inference). Each plane has its own token type and its own client, and neither token can do the other’s job.
ik_sdk_ — control plane
Held by the ManagementClient. Scoped to one project. Used to declare workloads and reconcile them with ensure(). Cannot call inference.
ik_live_ — data plane
Passed per workload to the DataClient endpoint. Used to call the OpenAI-compatible inference routes. Cannot provision workloads.
One application typically holds one ik_sdk_ token (it manages a project’s workloads) and many ik_live_ tokens — one per workload, so that the credential for your support-bot cannot call your billing embedder. This is the least-privilege model: a leaked data key exposes exactly one workload’s inference, never your provisioning.
Both tokens share the same shape: a fixed prefix that names the plane, an 8-hex public identifier, and a 64-hex secret.
| Token | Format | Plane | Example (redacted) |
|---|---|---|---|
| Control | ik_sdk_<8hex>_<64hex> | Control | ik_sdk_… |
| Data | ik_live_<8hex>_<64hex> | Data | ik_live_… |
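The shape in the table can be checked with a regular expression. The following is a minimal sketch, not part of the SDK: `parse_token` and `ParsedToken` are hypothetical names, and it assumes lowercase hex digits, which the table does not specify.

```python
import re
from typing import NamedTuple, Optional

# Assumption: hex digits are lowercase; the format table only says "hex".
TOKEN_RE = re.compile(r"ik_(sdk|live)_([0-9a-f]{8})_([0-9a-f]{64})")

# Prefix keyword -> plane, as listed in the format table.
PLANES = {"sdk": "control", "live": "data"}


class ParsedToken(NamedTuple):
    plane: str      # "control" or "data"
    public_id: str  # 8-hex identifier, safe to display


def parse_token(token: str) -> Optional[ParsedToken]:
    """Return the plane and public id, or None if the shape is wrong.

    The 64-hex secret is matched but deliberately not returned.
    """
    match = TOKEN_RE.fullmatch(token)
    if match is None:
        return None
    return ParsedToken(plane=PLANES[match.group(1)], public_id=match.group(2))


# Dummy value for illustration only, not a real credential.
example = "ik_sdk_a1b2c3d4_" + "0" * 64
print(parse_token(example))
```

Returning only the plane and the public identifier keeps the secret out of anything that might later be printed or logged.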
- The <8hex> segment is a public token identifier: safe to show in dashboards and logs, and useful for telling two keys apart.
- The <64hex> segment is the secret. Treat the whole string as a secret and never log it; see Redaction.
- The prefix distinguishes ik_sdk_ from ik_live_. It is used to route the token to the correct plane and to raise a wrong_credential_type error early if the token is on the wrong client.

```
ik_sdk_   a1b2c3d4  _  e5f6...<64 hex chars total>...9a0b
└─prefix─┘└─8 hex─┘    └──────────── 64 hex secret ─────────────┘
  plane      id                     never log this
```

Python:

```python
from inferencekey import ManagementClient, WorkloadSpec, Backend

mgmt = ManagementClient.from_env(project="acme")  # reads INFERENCEKEY_SDK_TOKEN

# CAN: provision / reconcile workloads in the acme project
ref = mgmt.ensure(WorkloadSpec(
    name="support-bot",
    slug="support-bot",
    model="meta-llama/Llama-3.1-8B-Instruct",
    backend=Backend.VLLM,
    command="vllm serve meta-llama/Llama-3.1-8B-Instruct --max-model-len 8192",
))

# CANNOT: call inference. An ik_sdk_ token has no data-plane access.
```

```python
from inferencekey import DataClient

data = DataClient.from_env(project="acme")

# CAN: call inference on a workload you hold the ik_live_ key for
ep = data.endpoint("support-bot", api_key="ik_live_...")
out = ep.generate_text(prompt="Hola", temperature=0.2, max_tokens=300)
print(out.text, out.model)

# CANNOT: create or patch workloads. An ik_live_ token has no control-plane access.
```

TypeScript:

```typescript
import { ManagementClient, Backend } from "@inferencekey/sdk";

const mgmt = ManagementClient.fromEnv({ project: "acme" }); // reads INFERENCEKEY_SDK_TOKEN

// CAN: provision / reconcile workloads in the acme project
const ref = await mgmt.ensure({
  name: "support-bot",
  slug: "support-bot",
  model: "meta-llama/Llama-3.1-8B-Instruct",
  backend: Backend.Vllm,
  command: "vllm serve meta-llama/Llama-3.1-8B-Instruct --max-model-len 8192",
});

// CANNOT: call inference. An ik_sdk_ token has no data-plane access.
```

```typescript
import { DataClient } from "@inferencekey/sdk";

const data = DataClient.fromEnv({ project: "acme" });

// CAN: call inference on a workload you hold the ik_live_ key for
const ep = data.endpoint("support-bot", { apiKey: process.env.SUPPORT_IK_LIVE });
const out = await ep.generateText({ prompt: "Hola", temperature: 0.2, maxTokens: 300 });
console.log(out.text, out.model);

// CANNOT: create or patch workloads. An ik_live_ token has no control-plane access.
```

| Capability | ik_sdk_ (control) | ik_live_ (data) |
|---|---|---|
| Create / reconcile workloads (ensure) | Yes | No |
| Patch / list workloads | Yes | No |
| Call inference (generate_text, embed, …) | No | Yes |
| Scope | One project | One workload |
| Held by | ManagementClient | DataClient endpoint (per call) |
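For tests or pre-flight checks, the plane-level rows of the table can be mirrored as data. This is only a sketch: `CAPABILITIES` and `plane_allows` are hypothetical names, the operation labels are illustrative, and the platform remains the authority on what a token may do.

```python
# Hypothetical client-side mirror of the capability table; not an SDK API.
CAPABILITIES = {
    "ik_sdk_": {"ensure", "patch", "list"},
    "ik_live_": {"generate_text", "embed"},
}


def plane_allows(token: str, operation: str) -> bool:
    """True if the token's prefix grants the operation at the plane level.

    Only the plane is checked here. Scopes and project are separate checks.
    """
    for prefix, operations in CAPABILITIES.items():
        if token.startswith(prefix):
            return operation in operations
    return False  # unknown prefix: no capabilities


print(plane_allows("ik_sdk_...", "ensure"))
print(plane_allows("ik_live_...", "ensure"))
```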
An ik_sdk_ token carries scopes that bound what it may do inside its project. A token minted for the SDK provisioning flow gets the two write scopes by default:
| Scope | Grants | Default on new SDK tokens |
|---|---|---|
| workload:write | Create and patch workloads (POST /workloads, PATCH /workloads/:id). | Yes |
| assignment:write | Assign workloads to workers / GPU resources during reconcile. | Yes |
| workload:read | List and read workloads (GET list). | Read-only tokens |
The defaults exist so that mgmt.ensure(...) works end to end: it needs workload:write to create or patch the workload and assignment:write to bind it to where it runs. If you mint a read-only token — for an audit job or a dashboard that only lists workloads — give it workload:read alone. Calling a write route with a read-only token raises scope_insufficient.
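The relationship between operations and scopes can be sketched as a set difference. The operation labels and the `missing_scopes` helper below are hypothetical, and the sketch treats each scope as independent; whether workload:write also implies workload:read is not stated here, so do not read that into it.

```python
from typing import Dict, Set

# Operation -> scopes it needs, taken from the scopes table and the
# description of ensure(). The operation labels are hypothetical.
REQUIRED_SCOPES: Dict[str, Set[str]] = {
    "ensure": {"workload:write", "assignment:write"},
    "create_workload": {"workload:write"},
    "patch_workload": {"workload:write"},
    "list_workloads": {"workload:read"},
}


def missing_scopes(token_scopes: Set[str], operation: str) -> Set[str]:
    """Return the scopes the operation needs that the token lacks."""
    return REQUIRED_SCOPES[operation] - token_scopes


default_sdk_token = {"workload:write", "assignment:write"}
read_only_token = {"workload:read"}

# The default scopes cover ensure(); a read-only token is missing both
# write scopes, the case reported as scope_insufficient.
print(sorted(missing_scopes(default_sdk_token, "ensure")))
print(sorted(missing_scopes(read_only_token, "ensure")))
```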
Both clients resolve their token and project from two sources, in this order. Explicit argument wins, then environment variable.
1. Explicit: a value you pass in code (api_key="ik_live_...", project="acme"). Highest precedence; always wins.
2. Environment variable: INFERENCEKEY_SDK_TOKEN, INFERENCEKEY_API_KEY, INFERENCEKEY_PROJECT, INFERENCEKEY_BASE_URL. This is what from_env / fromEnv read.
This is why DataClient.from_env(project="acme") then data.endpoint(slug, api_key="ik_live_...") works cleanly: the per-call api_key is explicit and overrides any INFERENCEKEY_API_KEY in the environment, so one DataClient can fan out to many workloads each with its own key.
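The precedence rule itself is small enough to model without the SDK. The `resolve` function below is a hypothetical stand-in for whatever the clients do internally, shown only to make the ordering concrete.

```python
import os
from typing import Mapping, Optional


def resolve(
    explicit: Optional[str],
    env_var: str,
    environ: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Explicit argument wins; otherwise fall back to the environment variable."""
    if explicit is not None:
        return explicit
    return environ.get(env_var)


# A stand-in environment with placeholder values, so the sketch does not
# depend on what is set on the machine running it.
env = {"INFERENCEKEY_API_KEY": "ik_live_default...", "INFERENCEKEY_PROJECT": "acme"}

# No explicit key: the environment default is used.
print(resolve(None, "INFERENCEKEY_API_KEY", env))
# Explicit key: overrides the environment default.
print(resolve("ik_live_billing...", "INFERENCEKEY_API_KEY", env))
```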
Python:

```python
import os
from inferencekey import DataClient

os.environ["INFERENCEKEY_API_KEY"] = "ik_live_default..."  # env default

data = DataClient.from_env(project="acme")

# Uses the env default (INFERENCEKEY_API_KEY)
data.endpoint("support-bot").generate_text(prompt="hi")

# Explicit api_key overrides the env default for this workload
data.endpoint("billing", api_key="ik_live_billing...").embed(input=["a", "b"])
```

TypeScript:

```typescript
import { DataClient } from "@inferencekey/sdk";

process.env.INFERENCEKEY_API_KEY = "ik_live_default..."; // env default

const data = DataClient.fromEnv({ project: "acme" });

// Uses the env default (INFERENCEKEY_API_KEY)
await data.endpoint("support-bot").generateText({ prompt: "hi" });

// Explicit apiKey overrides the env default for this workload
await data.endpoint("billing", { apiKey: "ik_live_billing..." }).embed({ input: ["a", "b"] });
```

The table below lists the four environment variables and which client reads each. The control client never reads INFERENCEKEY_API_KEY; the data client never reads INFERENCEKEY_SDK_TOKEN.
| Variable | ManagementClient (control) | DataClient (data) | Purpose |
|---|---|---|---|
| INFERENCEKEY_SDK_TOKEN | Yes | — | The ik_sdk_ control token. |
| INFERENCEKEY_API_KEY | — | Yes | The default ik_live_ data token (overridable per call). |
| INFERENCEKEY_PROJECT | Yes | Yes | Default project slug when project= is not passed. |
| INFERENCEKEY_BASE_URL | Yes | Yes | API base URL (point at staging or self-hosted). |
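As a rough pre-flight aid, the table can be turned into a lookup that reports which of a client's variables are unset. `CLIENT_ENV` and `unset_variables` are hypothetical names. An unset variable is not necessarily an error: the value may be passed explicitly, and whether INFERENCEKEY_BASE_URL has a built-in default is not stated here.

```python
from typing import Dict, List, Mapping

# Which variables each client reads, per the table above.
CLIENT_ENV: Dict[str, List[str]] = {
    "ManagementClient": [
        "INFERENCEKEY_SDK_TOKEN",
        "INFERENCEKEY_PROJECT",
        "INFERENCEKEY_BASE_URL",
    ],
    "DataClient": [
        "INFERENCEKEY_API_KEY",
        "INFERENCEKEY_PROJECT",
        "INFERENCEKEY_BASE_URL",
    ],
}


def unset_variables(client: str, environ: Mapping[str, str]) -> List[str]:
    """List the variables the given client reads that are unset or empty."""
    return [name for name in CLIENT_ENV[client] if not environ.get(name)]


# Placeholder environment for illustration.
env = {"INFERENCEKEY_PROJECT": "acme", "INFERENCEKEY_SDK_TOKEN": "ik_sdk_..."}
print(unset_variables("ManagementClient", env))
print(unset_variables("DataClient", env))
```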
```shell
export INFERENCEKEY_BASE_URL="https://api.inferencekey.com"
export INFERENCEKEY_PROJECT="acme"
export INFERENCEKEY_SDK_TOKEN="ik_sdk_a1b2c3d4_..."   # control client
export INFERENCEKEY_API_KEY="ik_live_e5f6a7b8_..."    # data client default
```

The SDK and the platform redact tokens to their prefix only in logs, error messages, and diagnostics. The 8-hex identifier and the 64-hex secret are never emitted together.
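If your own application logs strings that might contain a token, the same prefix-only redaction can be approximated before the text is written. This is a sketch of the idea, not the SDK's implementation: `redact` is a hypothetical helper, and the pattern assumes lowercase hex in the documented shape.

```python
import re

# Assumption: lowercase hex, matching the documented <8hex>_<64hex> shape.
TOKEN_IN_TEXT = re.compile(r"(ik_(?:sdk|live)_)[0-9a-f]{8}_[0-9a-f]{64}")


def redact(text: str) -> str:
    """Replace every full token in text with its plane prefix and an ellipsis."""
    return TOKEN_IN_TEXT.sub(lambda m: m.group(1) + "…", text)


# Dummy value for illustration only, not a real credential.
fake_token = "ik_live_e5f6a7b8_" + "ab" * 32
print(redact("request failed for key=" + fake_token))
```

The prefix is kept so that a log reader can still tell which plane the credential belongs to.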
- A redacted token appears as ik_sdk_… or ik_live_…, so you can tell which plane the offending credential belongs to without leaking the secret.
- To refer to a specific key, use the <8hex> id from the dashboard, never the secret.

When a token reaches the wrong client or lacks the scope for a route, the platform returns a 403 with a typed reason. The three reasons:
| Reason | Meaning | Typical cause |
|---|---|---|
wrong_credential_type | Right format, wrong plane. | An ik_live_ token on the ManagementClient, or an ik_sdk_ token on a DataClient endpoint. |
project_scope_mismatch | The token is valid but scoped to a different project than the one the call targets. | ManagementClient.from_env(project="acme") with an ik_sdk_ token minted for another project. |
scope_insufficient | Right plane and project, but the token’s scopes don’t cover the route. | A read-only (workload:read) ik_sdk_ token calling ensure(). |
Both bindings map these 403s onto the typed PermissionDenied exception (a subclass of InferenceKeyError), so you can catch them and branch on the reason. Related auth failures surface as AuthError (bad or expired token) and validation problems as ValidationError.
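One way to branch on the reason is a lookup from reason to a next step. How the reason is exposed on PermissionDenied is not specified here, so this sketch takes it as a plain string; `REMEDIATION` and `remediation_for` are hypothetical names, and the hints simply restate the table above.

```python
# Reason -> suggested next step, derived from the reasons table.
REMEDIATION = {
    "wrong_credential_type": (
        "Use an ik_sdk_ token with ManagementClient and an ik_live_ token "
        "with DataClient endpoints."
    ),
    "project_scope_mismatch": (
        "Use a token minted for the project the call targets."
    ),
    "scope_insufficient": (
        "Use a token whose scopes cover the route, for example "
        "workload:write and assignment:write for ensure()."
    ),
}


def remediation_for(reason: str) -> str:
    """Return a human-readable hint for a typed 403 reason."""
    return REMEDIATION.get(reason, "Unrecognized reason; inspect the full error.")


print(remediation_for("scope_insufficient"))
```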
Python:

```python
from inferencekey import (
    ManagementClient,
    DataClient,
    WorkloadSpec,
    Backend,
    PermissionDenied,
    AuthError,
)

mgmt = ManagementClient.from_env(project="acme")

try:
    mgmt.ensure(WorkloadSpec(
        name="support-bot",
        slug="support-bot",
        model="meta-llama/Llama-3.1-8B-Instruct",
        backend=Backend.VLLM,
        command="vllm serve meta-llama/Llama-3.1-8B-Instruct --max-model-len 8192",
    ))
except PermissionDenied as e:
    # e carries the typed reason: wrong_credential_type,
    # project_scope_mismatch, or scope_insufficient.
    print("rejected:", e)
except AuthError:
    print("token is missing, malformed, or expired")
```

TypeScript:

```typescript
import {
  ManagementClient,
  Backend,
  PermissionDenied,
  AuthError,
} from "@inferencekey/sdk";

const mgmt = ManagementClient.fromEnv({ project: "acme" });

try {
  await mgmt.ensure({
    name: "support-bot",
    slug: "support-bot",
    model: "meta-llama/Llama-3.1-8B-Instruct",
    backend: Backend.Vllm,
    command: "vllm serve meta-llama/Llama-3.1-8B-Instruct --max-model-len 8192",
  });
} catch (e) {
  if (e instanceof PermissionDenied) {
    // e carries the typed reason: wrong_credential_type,
    // project_scope_mismatch, or scope_insufficient.
    console.error("rejected:", e.message);
  } else if (e instanceof AuthError) {
    console.error("token is missing, malformed, or expired");
  } else {
    throw e;
  }
}
```