Tokens & API keys

InferenceKey draws a hard line between provisioning workloads and calling them. Each side has its own token, and neither can do the other’s job. This is the single most important security property of the SDK: a leaked inference key can never reconfigure your infrastructure, and a management token can never run up an inference bill.

Before you start

You need an InferenceKey account and the SDK installed.

Python

pip install inferencekey

TypeScript

npm install @inferencekey/sdk

Then grab two tokens from the dashboard — that’s what the rest of this page covers.

The two tokens

ik_sdk_ — Control plane

Provisions and reconciles workloads. Held by your ManagementClient and scoped to one project. Use it from deploy scripts, CI, or a backend that declares what should exist. Cannot call inference.

ik_live_ — Data plane

Calls inference against a workload’s OpenAI-compatible endpoint. Passed per workload — one app can hold many of them, one per workload, each with its own scope. Cannot provision.

The prefix tells you which is which: control-plane tokens start with ik_sdk_, data-plane keys start with ik_live_. They are not interchangeable, and using one where the other is expected returns a 403 (see Least privilege below).
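
Because the prefix identifies the plane, a script can check which kind of credential it has before handing it to a client. The following is a minimal sketch; `credential_plane` is a hypothetical helper, not part of the SDK, and it inspects only the prefix. It says nothing about whether the credential is valid or what it is scoped to.

```python
def credential_plane(token: str) -> str:
    """Classify a credential by its documented prefix.

    Hypothetical helper for illustration; the SDK may expose nothing like it.
    """
    if token.startswith("ik_sdk_"):
        return "control"   # provisions and reconciles workloads
    if token.startswith("ik_live_"):
        return "data"      # calls inference
    raise ValueError("unrecognized credential prefix")
```

A deploy script could call this once at startup and exit early if it was handed a data-plane key by mistake, rather than waiting for the API to return a 403.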

Where to get each one

Both live in the dashboard, but on different screens because they have different blast radii.

ik_sdk_ — the Tokens screen. Open your project in the dashboard and go to Tokens. Create an SDK token; it is bound to that project. This is the value your ManagementClient reads.
ik_live_ — the API Keys screen. Go to API Keys and mint a key. Choose its access scope: either specific endpoints (one or more workloads — the narrowest, recommended choice) or the entire project. Hand each app the narrowest key it needs. This is the value you pass to data.endpoint(...).
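
One way to hand each workload its own narrowly scoped key is to keep one environment variable per workload and look it up by name. This is only a sketch: the `IK_LIVE_<WORKLOAD>` naming convention and both helper functions are assumptions made for illustration, not something the SDK defines.

```python
import os


def key_env_name(workload: str) -> str:
    """Hypothetical convention: "support-bot" -> "IK_LIVE_SUPPORT_BOT"."""
    return "IK_LIVE_" + workload.upper().replace("-", "_")


def key_for_workload(workload: str, env=os.environ) -> str:
    """Return the ik_live_ key stored for one workload, or raise."""
    name = key_env_name(workload)
    key = env.get(name)
    if not key:
        raise KeyError(f"{name} is not set")
    if not key.startswith("ik_live_"):
        raise ValueError(f"{name} does not hold an ik_live_ key")
    return key
```

The result would then be passed as the `api_key` argument of `data.endpoint(...)`, for example `data.endpoint("support-bot", api_key=key_for_workload("support-bot"))`.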

Project-wide keys

When you pick “Entire project” for a key’s scope, it is granted a projectSlug/* wildcard and can call every workload in the project, including ones created later. This is handy when you don’t want to re-issue or reassign a key every time you add a workload — for example minting the key before the workload exists, then ensure()-ing the workload from code.
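
The effect of the wildcard can be pictured with a small matching function. This is an illustrative model of the semantics described above, not the server's implementation: `scope_allows` is hypothetical, and the `project/workload` form for an endpoint-specific scope is an assumption (this page only states the `projectSlug/*` form).

```python
def scope_allows(scopes: list[str], project: str, workload: str) -> bool:
    """Illustrative model: a key may call a workload if any scope matches."""
    target = f"{project}/{workload}"
    for scope in scopes:
        if scope == f"{project}/*":
            # Project-wide wildcard: covers every workload, present or future.
            return True
        if scope == target:
            # Assumed form of a scope limited to a single endpoint.
            return True
    return False
```

Under this model a key scoped to `acme/*` matches a workload name that did not exist when the key was minted, which is why the key can be issued first and the workload created afterwards.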

Environment variables

The SDK loads tokens from the environment so secrets never have to appear in your source.

Variable	Token	Used by	Supplies
`INFERENCEKEY_SDK_TOKEN`	`ik_sdk_…`	`ManagementClient`	The control-plane token
`INFERENCEKEY_API_KEY`	`ik_live_…`	`DataClient`	The default `ik_live_` key
`INFERENCEKEY_PROJECT`	—	both clients	The project slug
`INFERENCEKEY_BASE_URL`	—	both clients	The API base URL
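
A process can verify at startup that the variables it depends on are set and carry the expected prefix. The sketch below assumes nothing about the SDK beyond the variable names and prefixes in the table; `check_env` is a hypothetical helper, and the caller decides which variables count as required.

```python
import os

# Prefixes taken from the table above; the other variables have none.
EXPECTED_PREFIX = {
    "INFERENCEKEY_SDK_TOKEN": "ik_sdk_",
    "INFERENCEKEY_API_KEY": "ik_live_",
}


def check_env(required, env=os.environ):
    """Return a list of human-readable problems (empty if all is well)."""
    problems = []
    for name in required:
        value = env.get(name, "")
        if not value:
            problems.append(f"{name} is not set")
            continue
        prefix = EXPECTED_PREFIX.get(name)
        if prefix and not value.startswith(prefix):
            problems.append(f"{name} should start with {prefix}")
    return problems
```

A deploy script might call `check_env(["INFERENCEKEY_SDK_TOKEN", "INFERENCEKEY_PROJECT"])`, while a runtime service would check `INFERENCEKEY_API_KEY` instead.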

Precedence

When a value can come from more than one place, the SDK resolves it in this order:

1. Explicit argument: anything you pass in code (e.g. project="acme", api_key="ik_live_…") always wins.
2. Environment variable: the INFERENCEKEY_* vars above.

So an api_key passed straight to data.endpoint(...) overrides INFERENCEKEY_API_KEY, and a project passed to from_env/fromEnv overrides INFERENCEKEY_PROJECT.
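
The resolution order amounts to a two-step lookup. The function below is a sketch of that rule only; `resolve` is a hypothetical name, and the SDK's internal logic may differ in details such as how empty strings are treated.

```python
import os
from typing import Mapping, Optional


def resolve(explicit: Optional[str], env_name: str,
            env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the explicit argument if given, else the environment value."""
    if explicit is not None:
        return explicit          # 1. explicit argument always wins
    return env.get(env_name)     # 2. fall back to the environment variable
```

With `INFERENCEKEY_PROJECT=acme` in the environment, `resolve(None, "INFERENCEKEY_PROJECT")` yields `acme`, while `resolve("staging", "INFERENCEKEY_PROJECT")` yields `staging`.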

Per-language setup

Export the tokens, then construct each client with from_env / fromEnv. The management client reads INFERENCEKEY_SDK_TOKEN; the data client reads INFERENCEKEY_API_KEY unless you pass an api_key per endpoint.

Python

export INFERENCEKEY_SDK_TOKEN="ik_sdk_your_control_plane_token"
export INFERENCEKEY_API_KEY="ik_live_your_default_inference_key"
export INFERENCEKEY_PROJECT="acme"
# optional: export INFERENCEKEY_BASE_URL="https://api.inferencekey.com"

from inferencekey import ManagementClient, DataClient

# Reads INFERENCEKEY_SDK_TOKEN (control plane).
mgmt = ManagementClient.from_env(project="acme")

# Reads INFERENCEKEY_API_KEY by default (data plane).
data = DataClient.from_env(project="acme")

# Override per workload with an explicit ik_live_ key — this wins over the env var.
ep = data.endpoint("support-bot", api_key="ik_live_...")

TypeScript

export INFERENCEKEY_SDK_TOKEN="ik_sdk_your_control_plane_token"
export INFERENCEKEY_API_KEY="ik_live_your_default_inference_key"
export INFERENCEKEY_PROJECT="acme"
# optional: export INFERENCEKEY_BASE_URL="https://api.inferencekey.com"

import { ManagementClient, DataClient } from "@inferencekey/sdk";

// Reads INFERENCEKEY_SDK_TOKEN (control plane).
const mgmt = ManagementClient.fromEnv({ project: "acme" });

// Reads INFERENCEKEY_API_KEY by default (data plane).
const data = DataClient.fromEnv({ project: "acme" });

// Override per workload with an explicit ik_live_ key — this wins over the env var.
const ep = data.endpoint("support-bot", { apiKey: process.env.SUPPORT_IK_LIVE });

Least privilege: why two tokens

The split is enforced on the wire, not just by convention. Hand a token to the wrong client and the API rejects it with a 403:

Error code	What happened
`wrong_credential_type`	An `ik_live_` key was used to provision, or an `ik_sdk_` token was used to call inference.
`project_scope_mismatch`	The token is valid but scoped to a different project than the one you addressed.
`scope_insufficient`	The token’s scope does not cover the requested action or workload.
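
When logging or surfacing these failures, it can help to translate each code into an actionable hint. This page does not say how the SDK exposes the code (an exception attribute, a JSON field, or something else), so the sketch below takes the HTTP status and the code as plain values; `explain_403` and the hint wording are illustrative assumptions.

```python
HINTS = {
    "wrong_credential_type":
        "Use an ik_sdk_ token to provision and an ik_live_ key for inference.",
    "project_scope_mismatch":
        "The token belongs to a different project than the one addressed.",
    "scope_insufficient":
        "The token's scope does not cover this action or workload.",
}


def explain_403(status: int, code: str) -> str:
    """Map a documented 403 error code to a short remediation hint."""
    if status != 403:
        return f"not a credential rejection (HTTP {status})"
    return HINTS.get(code, f"unrecognized 403 code: {code}")
```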

Keep it simple:

Provisioning code (deploy scripts, CI, reconcilers) holds only ik_sdk_.
Application code that serves users holds only the ik_live_ keys for the workloads it actually calls.

If a piece of code needs to do both, that’s a sign it should be split into a deploy step and a runtime step.
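
A quick way to spot a process that holds credentials for both planes is to scan its environment for the two prefixes. This is a rough heuristic offered as a sketch; `holds_both_planes` is a hypothetical helper, and it only sees secrets that are passed through environment variables.

```python
import os


def holds_both_planes(env=os.environ) -> bool:
    """True if the environment contains both an ik_sdk_ and an ik_live_ value."""
    values = list(env.values())
    has_control = any(v.startswith("ik_sdk_") for v in values)
    has_data = any(v.startswith("ik_live_") for v in values)
    return has_control and has_data
```

Running such a check in CI or at service startup gives an early signal that a deploy step and a runtime step have been merged into one process.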

Next steps

Provision your first workload: use your ik_sdk_ token to ensure a workload exists.

Make your first call: use an ik_live_ key to hit the OpenAI-compatible endpoint.

Tokens reference: scopes, rotation, and the full credential model.

Common errors: decode 403s and the rest of the error surface.
