
Glossary

A quick reference for the vocabulary used across the SDK and the dashboard. Terms are grouped so related ideas sit next to each other; jump to the deeper pages linked at the bottom when you need more than a sentence.

Resources

Workload

A single deployable inference unit: one model, one backend, one configuration. An app can own many workloads (for example a chat model, an embedding model, and a reranker), each with its own slug and its own ik_live_ key.

Slug

The stable, URL-safe identifier for a workload (and for its project). It is what makes ensure() idempotent and what appears in the endpoint path, so pick it once and keep it: choose something durable such as support-bot rather than a name that might change.

Project

The container that groups workloads, members, and keys. An ik_sdk_ token is scoped to exactly one project; both the ManagementClient and the DataClient take a project slug (ManagementClient.from_env(project="acme")).

Tenant

The account/organization that owns one or more projects. Tenancy is the platform’s isolation boundary — keys, workers, and billing never cross it.

Tokens & planes

ik_live_

A data-plane key. It calls inference and nothing else; it cannot provision. Pass it per workload (data.endpoint(slug, api_key="ik_live_…")), so each workload can carry its own least-privilege key.

ik_sdk_

A control-plane token. It provisions and reconciles workloads for one project and is held by the ManagementClient; it cannot call inference. It is read from the INFERENCEKEY_SDK_TOKEN environment variable.

Control plane

The provisioning side: creating, listing, and reconciling workloads via the ManagementClient. Authenticated with ik_sdk_.

Data plane

The serving side: the OpenAI-compatible endpoints you call for text, embeddings, and other modalities via the DataClient. Authenticated with ik_live_.
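
The split between the two planes can be illustrated with a minimal sketch that classifies a token by its prefix. The Plane enum and plane_for_token helper are hypothetical names invented for this illustration and are not part of the SDK; only the ik_sdk_ and ik_live_ prefixes come from the entries above.

```python
from enum import Enum


class Plane(Enum):
    CONTROL = "control"  # provisioning side, used by the ManagementClient
    DATA = "data"        # serving side, used by the DataClient


# Prefixes as described in this glossary; the mapping itself is illustrative.
_PREFIX_TO_PLANE = {
    "ik_sdk_": Plane.CONTROL,
    "ik_live_": Plane.DATA,
}


def plane_for_token(token: str) -> Plane:
    """Return the plane a token belongs to, judged only by its prefix."""
    for prefix, plane in _PREFIX_TO_PLANE.items():
        if token.startswith(prefix):
            return plane
    raise ValueError("unrecognized token prefix")
```

A check like this can catch a key passed to the wrong client early, before any request is made.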

Configuration

Execution policy

How the workload is kept running: fixed (always on), scheduled (on a time window), or autoscaling (scales with load). Set via execution_policy on the WorkloadSpec.

Modality / task_type

What the workload does — one of 12 task types (text2text is the default; others include embedding, text2image, audio2text, reranker, classification, reward). reranker, classification, and reward are async-only (no synchronous OpenAI route).
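
As a small sketch of the async-only rule, the helper below checks a task type against the three names listed above. The constant and function names are hypothetical, and the check does not validate against the full set of 12 task types, since only some of them are named here.

```python
# Task types this glossary names as async-only (no synchronous OpenAI route).
ASYNC_ONLY_TASK_TYPES = frozenset({"reranker", "classification", "reward"})

# The default task type according to this glossary.
DEFAULT_TASK_TYPE = "text2text"


def is_async_only(task_type: str = DEFAULT_TASK_TYPE) -> bool:
    """True if the task type is one of the three async-only types."""
    return task_type in ASYNC_ONLY_TASK_TYPES
```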

Backend

The serving engine that runs the model: Backend.Ollama, Backend.Vllm, Backend.VllmOmni, or Backend.Sglang (wire names: ollama, vllm, vllm-omni, sglang). The vLLM backends take a command and an optional vllm_version.
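
The relationship between the member names and their wire names can be sketched with a plain enum. BackendSketch is a stand-in written for this illustration, not the SDK's own Backend class, and treating Vllm and VllmOmni as "the vLLM backends" is an assumption.

```python
from enum import Enum


class BackendSketch(Enum):
    """Stand-in enum: member names and wire values as listed in this glossary."""

    Ollama = "ollama"
    Vllm = "vllm"
    VllmOmni = "vllm-omni"
    Sglang = "sglang"


def takes_vllm_options(backend: BackendSketch) -> bool:
    """Assumption: only Vllm and VllmOmni accept command / vllm_version."""
    return backend in (BackendSketch.Vllm, BackendSketch.VllmOmni)


def from_wire(name: str) -> BackendSketch:
    """Look up a member by its wire name; raises ValueError if unknown."""
    return BackendSketch(name)
```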

Worker (cloud / private)

The compute host where a workload actually runs. A cloud worker is platform-managed capacity; a private worker is your own machine attached to the platform. Target a specific one with worker_id — but placement is normally the platform’s job, so you usually leave it unset.

SDK behavior

ensure()

The single control-plane call that declares a workload and makes the platform match it: it creates the workload if it is missing and reconciles it if it has drifted. The call is idempotent, keyed on the explicit slug, so re-running with the same spec is a no-op.
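
The create-or-reconcile behavior can be modeled with an in-memory stand-in. This is a sketch of the idea only: SpecSketch, FakePlatform, the model field, and the returned status strings are all hypothetical and do not reflect the real ManagementClient or WorkloadSpec.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SpecSketch:
    """Hypothetical, simplified stand-in for a workload spec."""

    slug: str
    model: str
    backend: str = "vllm"


class FakePlatform:
    """In-memory stand-in for the platform's workload store, keyed by slug."""

    def __init__(self) -> None:
        self.workloads = {}

    def ensure(self, spec: SpecSketch) -> str:
        desired = asdict(spec)
        current = self.workloads.get(spec.slug)
        if current is None:
            # Missing: create it.
            self.workloads[spec.slug] = desired
            return "created"
        if current != desired:
            # Drifted: bring the stored state back in line with the spec.
            self.workloads[spec.slug] = desired
            return "reconciled"
        # Same slug, same spec: nothing to do.
        return "unchanged"
```

Because the store is keyed by slug, calling ensure() twice with the same spec never produces a second workload.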

Drift / OnDrift

Drift is any difference between your declared WorkloadSpec and what exists on the platform. on_drift decides what ensure() does about it: OnDrift.Reconcile (default — fix it), Fail, DryRun, Warn, or Ignore.
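
One way to picture drift is as a field-by-field diff between the declared spec and the platform's state. The sketch below assumes both can be represented as flat dictionaries; compute_drift and the field names in the usage lines are hypothetical, and how the SDK actually detects drift is not specified here.

```python
def compute_drift(declared: dict, actual: dict) -> dict:
    """Map each differing field to a (declared, actual) pair.

    A field missing on one side is reported with None for that side.
    """
    keys = declared.keys() | actual.keys()
    return {
        key: (declared.get(key), actual.get(key))
        for key in sorted(keys)
        if declared.get(key) != actual.get(key)
    }


declared = {"backend": "vllm", "task_type": "text2text"}
actual = {"backend": "ollama", "task_type": "text2text"}
drift = compute_drift(declared, actual)  # only "backend" differs
```

An empty result means there is no drift, which is the case where ensure() has nothing to reconcile.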

EndpointRef

The handle returned by data.endpoint(slug, …). You call inference on it — generate_text(…), embed(…) — and the result carries the output plus its model (out.text, out.model, emb.embeddings).

Keep reading

