API reference

The InferenceKey SDK is one Rust core (inferencekey-core) exposed through a C ABI, with thin native bindings on top. Every language binds the same core, so the concepts are identical everywhere: two tokens, two clients, and one ensure() reconcile call.

  • Control plane: a ManagementClient authenticated with an ik_sdk_ token and scoped to one project. It provisions and reconciles workloads but cannot make inference calls.
  • Data plane: a DataClient that hands you one endpoint per workload, each authenticated with an ik_live_ token. It cannot provision workloads.
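
As a rough illustration of the two-token split, the sketch below sorts a token into its plane by prefix alone. The helper plane_for_token and its return values are hypothetical and not part of the SDK; only the ik_sdk_ and ik_live_ prefixes come from this page.

```python
# Hypothetical helper, not part of the InferenceKey SDK: it only inspects
# the two token prefixes named on this page.
SDK_PREFIX = "ik_sdk_"    # control plane (ManagementClient)
LIVE_PREFIX = "ik_live_"  # data plane (DataClient endpoints)


def plane_for_token(token: str) -> str:
    """Return "control" or "data" based on the token prefix."""
    if token.startswith(SDK_PREFIX):
        return "control"
    if token.startswith(LIVE_PREFIX):
        return "data"
    raise ValueError("unrecognized token prefix")


print(plane_for_token("ik_sdk_example"))   # control
print(plane_for_token("ik_live_example"))  # data
```

A check like this can catch a swapped environment variable early, before a client is constructed with the wrong kind of token.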

See Tokens for the least-privilege model and Architecture for how the core, ABI, and bindings fit together.

Pick your language

Python and TypeScript ship today. Go and Java bind the same C ABI and are on the way.

The shape of the API

Whatever language you bind, you work with the same handful of types. The snippets below are the canonical entry points; the language pages document every parameter and return type.

quickstart.py
from inferencekey import ManagementClient, DataClient, WorkloadSpec, Backend, OnDrift

# Control plane: reads INFERENCEKEY_SDK_TOKEN (ik_sdk_...), scoped to one project.
mgmt = ManagementClient.from_env(project="acme")
ref = mgmt.ensure(WorkloadSpec(
    name="support-bot",
    slug="support-bot",
    model="meta-llama/Llama-3.1-8B-Instruct",
    backend=Backend.VLLM,
    command="vllm serve meta-llama/Llama-3.1-8B-Instruct --max-model-len 8192",
))  # on_drift defaults to OnDrift.RECONCILE

# Data plane: one client, many workloads, an ik_live_ key passed per workload.
data = DataClient.from_env(project="acme")
ep = data.endpoint(ref.workload_slug, api_key="ik_live_...")
out = ep.generate_text(prompt="Hola", temperature=0.2, max_tokens=300)
print(out.text, out.model)

emb = data.endpoint("billing", api_key="ik_live_...").embed(input=["a", "b"])
print(emb.embeddings)
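
To make "idempotent by slug" concrete, here is a toy in-memory model of an ensure-style call. It is a sketch under stated assumptions, not the SDK's implementation: ToySpec, ToyControlPlane, and the "created" / "unchanged" / "reconciled" return strings are all hypothetical, and the real ensure() returns a workload reference rather than a string. Only the default reconcile behavior is modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToySpec:
    """Hypothetical stand-in for WorkloadSpec with three of its fields."""
    name: str
    slug: str
    model: str


class ToyControlPlane:
    """In-memory stand-in for the control plane; makes no network calls."""

    def __init__(self) -> None:
        self._workloads = {}  # slug -> ToySpec

    def ensure(self, spec: ToySpec) -> str:
        """Make the stored state match spec and report what happened."""
        current = self._workloads.get(spec.slug)
        if current is None:
            self._workloads[spec.slug] = spec
            return "created"
        if current == spec:
            return "unchanged"
        # Drift: the stored state differs from the declared spec, so
        # overwrite it (assumed meaning of the default reconcile mode).
        self._workloads[spec.slug] = spec
        return "reconciled"


plane = ToyControlPlane()
spec = ToySpec("support-bot", "support-bot", "meta-llama/Llama-3.1-8B-Instruct")
print(plane.ensure(spec))  # created
print(plane.ensure(spec))  # unchanged
```

Because the slug is the key, running the same declaration twice leaves one workload in place instead of creating a duplicate.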

Core types at a glance

  • ManagementClient: control-plane client. from_env / fromEnv reads INFERENCEKEY_SDK_TOKEN. The key method is ensure(spec), which is idempotent by the explicit slug and returns a workload reference (project_slug / projectSlug, workload_slug / workloadSlug).
  • DataClient: data-plane client. from_env / fromEnv reads INFERENCEKEY_API_KEY as the default ik_live_ key. Call endpoint(slug, api_key=...) in Python or endpoint(slug, { apiKey }) in TypeScript to get a per-workload endpoint, then call generate_text / generateText or embed.
  • WorkloadSpec: the desired state you declare. Required fields are name, slug, model, and backend; optional fields are project, description, command, vllm_version, task_type, config, execution_policy, execution_policy_config, worker_id, and gpu_resource_id. Placement is the platform’s job: there is no provider and no min_vram_gb.
  • Backend: Ollama, Vllm, VllmOmni, Sglang. See Backends and policies.
  • OnDrift: Reconcile (default), Fail, DryRun, Warn, Ignore. See OnDrift.
  • Errors: PermissionDenied, AuthError, ValidationError, ConfigurationError, ApiError, all subclasses of InferenceKeyError. See Common errors.
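
Because every error derives from InferenceKeyError, handlers can catch specific subclasses first and fall back to the base class. The sketch below defines stand-in classes with the same names so it runs without the SDK installed. The describe function and its messages are hypothetical, and the grouping of errors by likely cause is a guess from the class names, not documented behavior.

```python
# Stand-in classes that mirror the error names listed above. They are
# defined locally so the sketch runs without the SDK installed.
class InferenceKeyError(Exception): ...
class PermissionDenied(InferenceKeyError): ...
class AuthError(InferenceKeyError): ...
class ValidationError(InferenceKeyError): ...
class ConfigurationError(InferenceKeyError): ...
class ApiError(InferenceKeyError): ...


def describe(exc: Exception) -> str:
    """Map an error to a coarse hint; the grouping is an assumption."""
    try:
        raise exc
    except (AuthError, PermissionDenied):
        return "check the token and its scope"
    except (ValidationError, ConfigurationError):
        return "fix the spec or local configuration"
    except InferenceKeyError:
        # Base-class handler goes last so the specific ones match first.
        return "other SDK error"


print(describe(AuthError("bad token")))
print(describe(ApiError("server error")))
```

The order of the except clauses matters: Python picks the first matching handler, so the base class has to come after its subclasses.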

New to the SDK?

  1. Get your tokens and learn the two-token model: Tokens quickstart.
  2. Declare your first workload with ensure(): First ensure.
  3. Call the endpoint it created: First call.

You’ll need a project and an ik_sdk_ token to follow along. Create an account or open the dashboard to provision them.

