Glossary
A quick reference for the vocabulary used across the SDK and the dashboard. Terms are grouped so related ideas sit next to each other; jump to the deeper pages linked at the bottom when you need more than a sentence.
Resources
- Workload
A single deployable inference unit: one model, one backend, one configuration. An app can own many workloads (for example a chat model, an embedding model, and a reranker), each with its own slug and its own ik_live_ key.
- Slug
The stable, URL-safe identifier for a workload (and for its project). It is what makes ensure() idempotent and what appears in the endpoint path, so pick it once and keep it: support-bot, not a name that might change.
- Project
The container that groups workloads, members, and keys. An ik_sdk_ token is scoped to exactly one project; both clients take a project slug (ManagementClient.from_env(project="acme")).
- Tenant
The account/organization that owns one or more projects. Tenancy is the platform’s isolation boundary: keys, workers, and billing never cross it.
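A slug has to survive in a URL path and stay stable over time. The sketch below shows one way to check a candidate before using it. The pattern (lowercase letters and digits separated by single hyphens) and both helper names are assumptions made for illustration; the platform's actual slug rules may be stricter or looser.

```python
import re

# Assumed rule for illustration only: lowercase letters and digits,
# separated by single hyphens. The platform's real slug rules may differ.
_SLUG_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def is_url_safe_slug(slug: str) -> bool:
    """Return True if the slug matches the assumed URL-safe pattern."""
    return _SLUG_RE.fullmatch(slug) is not None


def to_slug(name: str) -> str:
    """Derive a candidate slug from a display name (hypothetical helper)."""
    lowered = name.strip().lower()
    # Collapse every run of non-alphanumeric characters into one hyphen,
    # then drop any hyphen left at either end.
    return re.sub(r"[^a-z0-9]+", "-", lowered).strip("-")


if __name__ == "__main__":
    print(is_url_safe_slug("support-bot"))     # True
    print(is_url_safe_slug("Support Bot v2"))  # False
    print(to_slug("Support Bot v2"))           # support-bot-v2
```

Deriving the slug once and then hard-coding it keeps it stable even if the display name changes later.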
Tokens & planes
- ik_live_
A data-plane key. It calls inference and nothing else; it cannot provision. Pass it per workload (data.endpoint(slug, api_key="ik_live_…")), so each workload can carry its own least-privilege key.
- ik_sdk_
A control-plane token. It provisions and reconciles workloads for one project and is held by the ManagementClient; it cannot call inference. Read from INFERENCEKEY_SDK_TOKEN.
- Control plane
The provisioning side: creating, listing, and reconciling workloads via the ManagementClient. Authenticated with ik_sdk_.
- Data plane
The serving side: the OpenAI-compatible endpoints you call for text, embeddings, and other modalities via the DataClient. Authenticated with ik_live_.
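The split between the two credential kinds can be sketched as a prefix check. This is a standalone illustration, not SDK code: Plane, plane_for, and require_plane are hypothetical names, and the credential strings are placeholders rather than real keys.

```python
from enum import Enum


class Plane(Enum):
    CONTROL = "control"  # provisioning: create, list, reconcile
    DATA = "data"        # serving: inference calls


# Prefixes taken from the glossary entries above.
_PREFIX_TO_PLANE = {"ik_sdk_": Plane.CONTROL, "ik_live_": Plane.DATA}


def plane_for(credential: str) -> Plane:
    """Map a credential to its plane by prefix (hypothetical helper)."""
    for prefix, plane in _PREFIX_TO_PLANE.items():
        if credential.startswith(prefix):
            return plane
    raise ValueError("unrecognized credential prefix")


def require_plane(credential: str, expected: Plane) -> None:
    """Raise PermissionError if the credential belongs to the other plane."""
    actual = plane_for(credential)
    if actual is not expected:
        raise PermissionError(
            f"{actual.value}-plane credential used for a {expected.value}-plane call"
        )


if __name__ == "__main__":
    print(plane_for("ik_live_example").value)  # data
    print(plane_for("ik_sdk_example").value)   # control
    try:
        # A data-plane key cannot provision.
        require_plane("ik_live_example", Plane.CONTROL)
    except PermissionError as exc:
        print(exc)
```

A check like this can catch a swapped environment variable early, before a request reaches the platform.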
Configuration
- Execution policy
How the workload is kept running: fixed (always on), scheduled (on a time window), or autoscaling (scales with load). Set via execution_policy on the WorkloadSpec.
- Modality / task_type
What the workload does: one of 12 task types (text2text is the default; others include embedding, text2image, audio2text, reranker, classification, reward). reranker, classification, and reward are async-only (no synchronous OpenAI route).
- Backend
The serving engine that runs the model: Backend.Ollama, Vllm, VllmOmni, or Sglang (wire: ollama / vllm / vllm-omni / sglang). The vLLM backends take a command and an optional vllm_version.
- Worker (cloud / private)
The compute host where a workload actually runs. A cloud worker is platform-managed capacity; a private worker is your own machine attached to the platform. Target a specific one with worker_id, but placement is normally the platform’s job, so you usually leave it unset.
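The backend wire names and the async-only task types lend themselves to a small lookup. The sketch below is a local stand-in and does not import the SDK: BackendName, has_sync_route, and accepts_vllm_options are hypothetical names, and treating vllm and vllm-omni as "the vLLM backends" is an assumption read from the entry above. Only the task types named in this glossary are covered.

```python
from enum import Enum


class BackendName(Enum):
    """Local stand-in for the SDK's Backend enum; values are the wire names."""
    OLLAMA = "ollama"
    VLLM = "vllm"
    VLLM_OMNI = "vllm-omni"
    SGLANG = "sglang"


DEFAULT_TASK_TYPE = "text2text"

# Task types the glossary lists as async-only (no synchronous OpenAI route).
ASYNC_ONLY_TASK_TYPES = frozenset({"reranker", "classification", "reward"})


def has_sync_route(task_type: str = DEFAULT_TASK_TYPE) -> bool:
    """Return False for the task types listed as async-only."""
    return task_type not in ASYNC_ONLY_TASK_TYPES


def accepts_vllm_options(backend: BackendName) -> bool:
    """Assumption: command and vllm_version apply to vllm and vllm-omni."""
    return backend in (BackendName.VLLM, BackendName.VLLM_OMNI)


if __name__ == "__main__":
    print(has_sync_route())                          # True
    print(has_sync_route("reranker"))                # False
    print(BackendName("vllm-omni").name)             # VLLM_OMNI
    print(accepts_vllm_options(BackendName.SGLANG))  # False
```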
SDK behavior
- ensure()
The single control-plane call that declares a workload and makes the platform match it: it creates the workload if it is missing and reconciles it if it has drifted. Idempotent by the explicit slug, so re-running with the same spec is a no-op.
- Drift / OnDrift
Drift is any difference between your declared WorkloadSpec and what exists on the platform. on_drift decides what ensure() does about it: OnDrift.Reconcile (the default: fix it), Fail, DryRun, Warn, or Ignore.
- EndpointRef
The handle returned by data.endpoint(slug, …). You call inference on it, for example generate_text(…) or embed(…), and the result carries the output plus its model (out.text, out.model, emb.embeddings).
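The create-or-reconcile behavior of ensure() can be modeled with a toy in-memory version. Nothing here calls the SDK: the platform is a plain dict keyed by slug, specs are plain dicts, and the function and return strings are hypothetical. Only Reconcile ("fix it") is described in the glossary; what this sketch does for Fail (raise), and for DryRun, Warn, and Ignore (leave the drift in place), is an assumption.

```python
from enum import Enum


class OnDrift(Enum):
    RECONCILE = "reconcile"
    FAIL = "fail"
    DRY_RUN = "dry_run"
    WARN = "warn"
    IGNORE = "ignore"


class DriftError(Exception):
    """Raised by the toy ensure() when drift is found and on_drift is FAIL."""


def ensure(platform: dict, slug: str, spec: dict,
           on_drift: OnDrift = OnDrift.RECONCILE) -> str:
    """Toy model of a declarative ensure(), keyed by slug."""
    current = platform.get(slug)
    if current is None:
        platform[slug] = dict(spec)  # missing: create it
        return "created"
    if current == spec:
        return "unchanged"           # same spec again: a no-op
    # From here on, the stored workload differs from the declared spec.
    if on_drift is OnDrift.RECONCILE:
        platform[slug] = dict(spec)  # default: make the platform match
        return "reconciled"
    if on_drift is OnDrift.FAIL:
        raise DriftError(f"workload {slug!r} has drifted")
    # Assumed behavior for DRY_RUN, WARN, and IGNORE: change nothing.
    return "drift-left-in-place"


if __name__ == "__main__":
    platform = {}
    spec = {"task_type": "text2text", "backend": "vllm"}
    print(ensure(platform, "support-bot", spec))  # created
    print(ensure(platform, "support-bot", spec))  # unchanged
    platform["support-bot"]["backend"] = "ollama"  # simulate drift
    print(ensure(platform, "support-bot", spec))  # reconciled
```

Keying on the slug is what makes the second call a no-op: the same slug with the same spec finds nothing to change.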
Keep reading