Tokens


InferenceKey uses two kinds of tokens, one per plane, so that the code that provisions workloads never holds the credential that can call them — and vice versa. This is least privilege by construction: a leaked data token cannot create or delete workloads, and a leaked control token cannot run inference or read prompts.

ik_sdk_ — control plane

Provisions and reconciles workloads. Held by your ManagementClient, scoped to one project. Cannot call inference.

ik_live_ — data plane

Calls inference. Passed per workload to the DataClient. One app can use many ik_live_ keys. Cannot provision.

At a glance

	`ik_live_`	`ik_sdk_`
Purpose	Call inference (generate, embed, …)	Provision & reconcile workloads
Plane	Data	Control
Scope	Per workload, or whole project (`projectSlug/*`)	Per project
Stored in	`INFERENCEKEY_API_KEY` (default) or passed per call	`INFERENCEKEY_SDK_TOKEN`
Can call inference?	Yes	No
Can provision?	No	Yes
SDK client	`DataClient`	`ManagementClient`
SDK placement	`data.endpoint(slug, api_key=…)`	`ManagementClient.from_env(...)`
Env var	`INFERENCEKEY_API_KEY`	`INFERENCEKEY_SDK_TOKEN`
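
Because the prefix encodes the plane, a misplaced token can be caught locally before any request is sent. The helper below is a minimal sketch of that idea. `token_plane` is a hypothetical name, not part of the SDK, and it only inspects the prefix; it does not validate the token against the server.

```python
def token_plane(token: str) -> str:
    """Return "control" or "data" based on the token prefix alone."""
    if token.startswith("ik_sdk_"):
        return "control"
    if token.startswith("ik_live_"):
        return "data"
    # Anything else is not one of the two token kinds described on this page.
    raise ValueError("expected a token starting with ik_sdk_ or ik_live_")
```

A startup check could, for example, assert that the value in `INFERENCEKEY_SDK_TOKEN` maps to `"control"` and fail fast if a data key was pasted there by mistake.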

Where each token goes in the SDK

The control token is read once, from the environment, when you build the ManagementClient. The data token is passed per workload when you resolve an endpoint — it is never baked into the management client.

Python

from inferencekey import ManagementClient, DataClient, WorkloadSpec, Backend

# Control plane: ik_sdk_ comes from INFERENCEKEY_SDK_TOKEN.
mgmt = ManagementClient.from_env(project="acme")
ref = mgmt.ensure(WorkloadSpec(
    name="support-bot", slug="support-bot",
    model="meta-llama/Llama-3.1-8B-Instruct", backend=Backend.VLLM,
    command="vllm serve meta-llama/Llama-3.1-8B-Instruct --max-model-len 8192",
))

# Data plane: ik_live_ is passed per workload.
data = DataClient.from_env(project="acme")
ep = data.endpoint(ref.workload_slug, api_key="ik_live_...")
out = ep.generate_text(prompt="Hola", temperature=0.2, max_tokens=300)
print(out.text)

TypeScript

import { ManagementClient, DataClient, Backend } from "@inferencekey/sdk";

// Control plane: ik_sdk_ comes from INFERENCEKEY_SDK_TOKEN.
const mgmt = ManagementClient.fromEnv({ project: "acme" });
const ref = await mgmt.ensure({
  name: "support-bot", slug: "support-bot",
  model: "meta-llama/Llama-3.1-8B-Instruct", backend: Backend.Vllm,
  command: "vllm serve meta-llama/Llama-3.1-8B-Instruct --max-model-len 8192",
});

// Data plane: ik_live_ is passed per workload.
const data = DataClient.fromEnv({ project: "acme" });
const ep = data.endpoint(ref.workloadSlug, { apiKey: process.env.SUPPORT_IK_LIVE });
const out = await ep.generateText({ prompt: "Hola", temperature: 0.2, maxTokens: 300 });
console.log(out.text);

Environment variables

Resolution order: an explicit argument takes precedence over the environment variable.

export INFERENCEKEY_BASE_URL="https://api.inferencekey.com"
export INFERENCEKEY_PROJECT="acme"
export INFERENCEKEY_SDK_TOKEN="ik_sdk_..."   # control plane (ManagementClient)
export INFERENCEKEY_API_KEY="ik_live_..."    # data plane default (DataClient)

INFERENCEKEY_API_KEY is the default ik_live_ key. When an app calls several workloads with different keys, pass each one explicitly to data.endpoint(slug, api_key=…); the explicit value overrides the env default.
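
The precedence rule can be sketched in a few lines. This is an illustration of the stated order (explicit argument, then `INFERENCEKEY_API_KEY`), not the SDK's actual implementation. `resolve_api_key` and its `env` parameter are hypothetical names introduced here so the logic can be exercised without touching the real environment.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(explicit: Optional[str] = None,
                    env: Optional[Mapping[str, str]] = None) -> str:
    """Pick the ik_live_ key: explicit argument first, then the env default."""
    env = os.environ if env is None else env
    if explicit:
        # A key passed per call or per workload always wins.
        return explicit
    default = env.get("INFERENCEKEY_API_KEY")
    if default:
        return default
    raise LookupError(
        "no ik_live_ key: pass one explicitly or set INFERENCEKEY_API_KEY"
    )
```

Raising when neither source provides a key is a design choice of this sketch; it surfaces a missing credential at resolution time rather than at the first inference call.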

Scopes for `ik_sdk_`

The control token is project-scoped: it can act only inside the single project it was issued for, and only through the control routes. Its capabilities map to the workload lifecycle:

Create workloads — POST /api/projects/:project_id/workloads
Update workloads — PATCH /api/workloads/:id
Delete workloads — DELETE /api/workloads/:id
List / read workloads — GET the workload list

What an ik_sdk_ token cannot do:

Call inference (/endpoint/...) — that is the data plane’s job.
Act outside its project — a token issued for acme cannot touch beta.
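
The capability list can be read as an allow-list of method and route pairs. The sketch below models only the three routes whose paths are spelled out above; the read route is left out because its path is not given on this page. `is_listed_control_route` is a hypothetical helper, not an SDK or server function, and it does not model project scoping.

```python
import re

# Route templates copied from the capability list above.
CONTROL_ROUTES = [
    ("POST", "/api/projects/:project_id/workloads"),
    ("PATCH", "/api/workloads/:id"),
    ("DELETE", "/api/workloads/:id"),
]


def _template_to_regex(template: str) -> "re.Pattern[str]":
    """Turn ':name' segments into groups that match exactly one path segment."""
    parts = []
    for segment in template.strip("/").split("/"):
        if segment.startswith(":"):
            parts.append(f"(?P<{segment[1:]}>[^/]+)")
        else:
            parts.append(re.escape(segment))
    return re.compile("^/" + "/".join(parts) + "$")


_COMPILED = [(method, _template_to_regex(tpl)) for method, tpl in CONTROL_ROUTES]


def is_listed_control_route(method: str, path: str) -> bool:
    """True if (method, path) matches one of the listed control routes."""
    return any(
        m == method.upper() and rx.match(path) is not None
        for m, rx in _COMPILED
    )
```

Under this model an inference path such as `/endpoint/...` matches nothing in the table, which mirrors the rule that a control token cannot call inference.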

Lifecycle

Create in the dashboard. Both ik_sdk_ and ik_live_ tokens are minted in the dashboard. Control tokens are issued for a project; data tokens are issued for a single workload or, with `projectSlug/*`, for a whole project.
Copy it once. The full secret is shown only at creation time. The dashboard stores a hash, not the raw token. If you lose it, you cannot read it back; rotate it instead.
Store it as a secret. Put ik_sdk_ in INFERENCEKEY_SDK_TOKEN and your ik_live_ keys in your secret manager / INFERENCEKEY_API_KEY (or pass them per workload). Never commit a token to source control.
Rotate or revoke. To rotate, generate a replacement, update the environment, then revoke the old token in the dashboard. Revocation takes effect immediately: control calls start returning 403, and data calls start returning AuthError.
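
Storing a hash instead of the raw token is why a lost secret cannot be read back. The sketch below illustrates the general technique only: this page does not say which hash the dashboard uses, so SHA-256 is an assumption, and `token_digest` and `matches` are hypothetical names.

```python
import hashlib
import hmac


def token_digest(token: str) -> str:
    """One-way digest that would be stored in place of the raw token."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


def matches(presented: str, stored_digest: str) -> bool:
    """Check a presented token against the stored digest in constant time."""
    return hmac.compare_digest(token_digest(presented), stored_digest)
```

The stored digest is enough to verify a presented token but not to recover it, so the only remedy for a lost token is to mint a new one and revoke the old.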

Authentication guide. End-to-end setup: which token where, env precedence, and per-workload keys.

Get your tokens. The quickstart path: create a project, mint both tokens, set your env.

Architecture. Why control and data planes are split, and how the two clients map to them.

Common errors. 403 wrong_credential_type, project_scope_mismatch, scope_insufficient, and AuthError.
