Tokens

InferenceKey uses two kinds of tokens, one per plane, so that the code that provisions workloads never holds the credential that can call them — and vice versa. This is least privilege by construction: a leaked data token cannot create or delete workloads, and a leaked control token cannot run inference or read prompts.

ik_sdk_ — control plane

Provisions and reconciles workloads. Held by your ManagementClient, scoped to one project. Cannot call inference.

ik_live_ — data plane

Calls inference. Passed per workload to the DataClient. One app can use many ik_live_ keys. Cannot provision.

At a glance

| | ik_live_ | ik_sdk_ |
| --- | --- | --- |
| Purpose | Call inference (generate, embed, …) | Provision & reconcile workloads |
| Plane | Data | Control |
| Scope | Per workload, or whole project (projectSlug/*) | Per project |
| Stored in | INFERENCEKEY_API_KEY (default) or passed per call | INFERENCEKEY_SDK_TOKEN |
| Can call inference? | Yes | No |
| Can provision? | No | Yes |
| SDK client | DataClient | ManagementClient |
| SDK placement | data.endpoint(slug, api_key=…) | ManagementClient.from_env(...) |
| Env var | INFERENCEKEY_API_KEY | INFERENCEKEY_SDK_TOKEN |
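
Because the two token kinds are distinguishable by prefix, an application can catch a miswired credential before any request is sent. The following is a minimal sketch of such a guard; `Plane`, `plane_of`, and `require_plane` are hypothetical helper names for illustration and are not part of the InferenceKey SDK.

```python
from enum import Enum


class Plane(Enum):
    CONTROL = "control"
    DATA = "data"


# Prefixes as documented in the table above; this lookup is a local convenience.
_PREFIX_TO_PLANE = {
    "ik_sdk_": Plane.CONTROL,
    "ik_live_": Plane.DATA,
}


def plane_of(token: str) -> Plane:
    """Return the plane a token belongs to, judging only by its prefix."""
    for prefix, plane in _PREFIX_TO_PLANE.items():
        # Require at least one character after the prefix.
        if token.startswith(prefix) and len(token) > len(prefix):
            return plane
    raise ValueError("expected a token starting with ik_sdk_ or ik_live_")


def require_plane(token: str, expected: Plane) -> str:
    """Fail fast if a token for the wrong plane was wired into a client."""
    actual = plane_of(token)
    if actual is not expected:
        raise ValueError(
            f"expected a {expected.value}-plane token, got a {actual.value}-plane token"
        )
    return token
```

A check like this only inspects the string locally. It does not prove the token is valid or unrevoked; that decision remains with the service.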

Where each token goes in the SDK

The control token is read once, from the environment, when you build the ManagementClient. The data token is passed per workload when you resolve an endpoint — it is never baked into the management client.

provision.py
from inferencekey import ManagementClient, DataClient, WorkloadSpec, Backend

# Control plane: ik_sdk_ comes from INFERENCEKEY_SDK_TOKEN.
mgmt = ManagementClient.from_env(project="acme")
ref = mgmt.ensure(WorkloadSpec(
    name="support-bot", slug="support-bot",
    model="meta-llama/Llama-3.1-8B-Instruct", backend=Backend.VLLM,
    command="vllm serve meta-llama/Llama-3.1-8B-Instruct --max-model-len 8192",
))

# Data plane: ik_live_ is passed per workload.
data = DataClient.from_env(project="acme")
ep = data.endpoint(ref.workload_slug, api_key="ik_live_...")
out = ep.generate_text(prompt="Hola", temperature=0.2, max_tokens=300)
print(out.text)

Environment variables

Resolution order is explicit argument > environment variable.

.env
export INFERENCEKEY_BASE_URL="https://api.inferencekey.com"
export INFERENCEKEY_PROJECT="acme"
export INFERENCEKEY_SDK_TOKEN="ik_sdk_..." # control plane (ManagementClient)
export INFERENCEKEY_API_KEY="ik_live_..." # data plane default (DataClient)

INFERENCEKEY_API_KEY is the default ik_live_ key. When an app calls several workloads with different keys, pass each one explicitly to data.endpoint(slug, api_key=…) and the explicit value wins over the env default.
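
The resolution rule can be modelled in a few lines. This is a sketch of the documented order (explicit argument, then environment variable), not the SDK's actual code; `resolve_api_key` is a hypothetical name and the key values are placeholders.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    explicit: Optional[str] = None,
    environ: Mapping[str, str] = os.environ,
) -> str:
    """Explicit argument first, then INFERENCEKEY_API_KEY, else an error."""
    if explicit:
        return explicit
    from_env = environ.get("INFERENCEKEY_API_KEY")
    if from_env:
        return from_env
    raise LookupError("no ik_live_ key: pass api_key=... or set INFERENCEKEY_API_KEY")


# Hypothetical app calling two workloads: one has its own key,
# the other falls back to the environment default.
env = {"INFERENCEKEY_API_KEY": "ik_live_default"}
per_workload = {"support-bot": "ik_live_support", "search": None}
resolved = {slug: resolve_api_key(key, env) for slug, key in per_workload.items()}
```

Here `resolved["support-bot"]` is the explicit key and `resolved["search"]` is the environment default. Raising when neither source provides a key is a choice made for this sketch; the source does not say how the SDK behaves in that case.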

Scopes for ik_sdk_

The control token is project-scoped: it can act only inside the single project it was issued for, and only through the control routes. Its capabilities map to the workload lifecycle:

  • Create workloads: POST /api/projects/:project_id/workloads
  • Update workloads: PATCH /api/workloads/:id
  • Delete workloads: DELETE /api/workloads/:id
  • List / read workloads: GET the workload list

What an ik_sdk_ token cannot do:

  • Call inference (/endpoint/...) — that is the data plane’s job.
  • Act outside its project — a token issued for acme cannot touch beta.
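
The scope rules above amount to two checks: the request must target the token's own project, and it must hit one of the control routes. The sketch below is an illustrative model of that rule, not the server's implementation. `ControlToken` and `is_allowed` are hypothetical names, the caller supplies `target_project` because the source does not describe how a workload ID is mapped to its project, and the list/read route is left out because its path is not given.

```python
import re
from dataclasses import dataclass

# Control routes listed above, as (method, path pattern) pairs.
# The list/read route is omitted because the text does not give its path.
CONTROL_ROUTES = [
    ("POST", re.compile(r"^/api/projects/[^/]+/workloads$")),
    ("PATCH", re.compile(r"^/api/workloads/[^/]+$")),
    ("DELETE", re.compile(r"^/api/workloads/[^/]+$")),
]


@dataclass(frozen=True)
class ControlToken:
    project: str  # the single project the token was issued for


def is_allowed(token: ControlToken, method: str, path: str, target_project: str) -> bool:
    """True only for a listed control route inside the token's own project."""
    if target_project != token.project:
        return False  # a token issued for acme cannot touch beta
    return any(m == method and p.match(path) for m, p in CONTROL_ROUTES)
```

Under this model a token for acme may `POST /api/projects/acme/workloads`, while a request to `/endpoint/...` or to a workload in beta is rejected.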

Lifecycle

  1. Create in the dashboard. Both ik_sdk_ and ik_live_ tokens are minted in the dashboard. Control tokens are issued for a project; data tokens are issued for a workload or, with the projectSlug/* scope, for a whole project.

  2. Copy it once. The full secret is shown only at creation time. The dashboard stores a hash, not the raw token, so if you lose it you cannot read it back; you must rotate it.

  3. Store it as a secret. Put ik_sdk_ in INFERENCEKEY_SDK_TOKEN and your ik_live_ keys in your secret manager / INFERENCEKEY_API_KEY (or pass them per workload). Never commit a token to source control.

  4. Rotate or revoke. Generate a replacement and update the environment, then revoke the old token in the dashboard. Revoking takes effect immediately: control calls start returning 403, data calls start returning AuthError.
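
The "stores a hash" and "rotate, then revoke" steps above can be illustrated with a toy in-memory store. This is a sketch under stated assumptions: SHA-256, the token length, and the `TokenStore` class are all hypothetical, since the source does not say which hash or token format the dashboard uses.

```python
import hashlib
import hmac
import secrets


class TokenStore:
    """Toy in-memory store that keeps only SHA-256 digests of tokens."""

    def __init__(self) -> None:
        self._digests = set()  # hex digests only, never raw tokens

    @staticmethod
    def _digest(token: str) -> str:
        return hashlib.sha256(token.encode("utf-8")).hexdigest()

    def mint(self, prefix: str) -> str:
        """Create a token, remember its digest, return the secret exactly once."""
        token = prefix + secrets.token_urlsafe(32)
        self._digests.add(self._digest(token))
        return token

    def is_valid(self, token: str) -> bool:
        candidate = self._digest(token)
        # Constant-time comparison against each stored digest.
        return any(hmac.compare_digest(candidate, d) for d in self._digests)

    def revoke(self, token: str) -> None:
        self._digests.discard(self._digest(token))


store = TokenStore()
old = store.mint("ik_sdk_")
new = store.mint("ik_sdk_")  # 1. generate the replacement
active = new                 # 2. point the environment at it
store.revoke(old)            # 3. only then revoke the old token
```

Revoking last means there is never a moment when the running application holds no valid token. Because only digests are kept, the store cannot return a lost secret, which is why losing a token leads to rotation.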

