API reference
The InferenceKey SDK is one Rust core (`inferencekey-core`) exposed through a C ABI, with thin native bindings on top. Every language binds the same core, so the concepts are identical everywhere: two tokens, two clients, and one `ensure()` reconcile call.
- Control plane: a `ManagementClient` authenticated with an `ik_sdk_` token. Scoped to one project. Provisions and reconciles workloads. It cannot call inference.
- Data plane: a `DataClient` that hands you an endpoint per workload, each authenticated with an `ik_live_` token. It cannot provision.
See Tokens for the least-privilege model and Architecture for how the core, ABI, and bindings fit together.
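Because the two token kinds are distinguishable by prefix, a quick pre-flight check can catch a token pasted into the wrong variable before any client is built. The following is a minimal sketch, not part of the SDK: `token_plane` and `check_env` are hypothetical helpers, and the only facts taken from this page are the `ik_sdk_` and `ik_live_` prefixes and the `INFERENCEKEY_SDK_TOKEN` and `INFERENCEKEY_API_KEY` variable names.

```python
import os

# Prefixes as described on this page: ik_sdk_ (control plane), ik_live_ (data plane).
SDK_PREFIX = "ik_sdk_"
LIVE_PREFIX = "ik_live_"


def token_plane(token: str) -> str:
    """Return "control" or "data" based on the token prefix (hypothetical helper)."""
    if token.startswith(SDK_PREFIX):
        return "control"
    if token.startswith(LIVE_PREFIX):
        return "data"
    raise ValueError("unrecognized token prefix; expected ik_sdk_ or ik_live_")


def check_env(env=os.environ) -> list[str]:
    """Collect readable problems with the two token variables, if any."""
    expected = {
        "INFERENCEKEY_SDK_TOKEN": "control",
        "INFERENCEKEY_API_KEY": "data",
    }
    problems = []
    for name, plane in expected.items():
        value = env.get(name)
        if value is None:
            continue  # unset is not flagged here; a key may be passed per workload
        try:
            actual = token_plane(value)
        except ValueError as exc:
            problems.append(f"{name}: {exc}")
            continue
        if actual != plane:
            problems.append(f"{name}: holds a {actual}-plane token, expected {plane}")
    return problems
```

Calling `check_env()` at startup and failing fast on a non-empty result keeps a misplaced token from surfacing later as a less obvious permission error. The check looks only at the prefix; it says nothing about whether the token is valid or which project it is scoped to.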
Pick your language
Python and TypeScript ship today. Go and Java bind the same C ABI and are on the way.
The shape of the API
Whatever language you bind, you work with the same handful of types. The snippets below are the canonical entry points; the language pages document every parameter and return type.
Python:

```python
from inferencekey import ManagementClient, DataClient, WorkloadSpec, Backend, OnDrift

# Control plane: reads INFERENCEKEY_SDK_TOKEN (ik_sdk_...), scoped to one project.
mgmt = ManagementClient.from_env(project="acme")

ref = mgmt.ensure(WorkloadSpec(
    name="support-bot",
    slug="support-bot",
    model="meta-llama/Llama-3.1-8B-Instruct",
    backend=Backend.VLLM,
    command="vllm serve meta-llama/Llama-3.1-8B-Instruct --max-model-len 8192",
))  # on_drift defaults to OnDrift.RECONCILE

# Data plane: one client, many workloads, an ik_live_ key passed per workload.
data = DataClient.from_env(project="acme")
ep = data.endpoint(ref.workload_slug, api_key="ik_live_...")
out = ep.generate_text(prompt="Hola", temperature=0.2, max_tokens=300)
print(out.text, out.model)

emb = data.endpoint("billing", api_key="ik_live_...").embed(input=["a", "b"])
print(emb.embeddings)
```

TypeScript:

```typescript
import { ManagementClient, DataClient, Backend, OnDrift } from "@inferencekey/sdk";

// Control plane: reads INFERENCEKEY_SDK_TOKEN (ik_sdk_...), scoped to one project.
const mgmt = ManagementClient.fromEnv({ project: "acme" });

const ref = await mgmt.ensure({
  name: "support-bot",
  slug: "support-bot",
  model: "meta-llama/Llama-3.1-8B-Instruct",
  backend: Backend.Vllm,
  command: "vllm serve meta-llama/Llama-3.1-8B-Instruct --max-model-len 8192",
}); // onDrift defaults to OnDrift.Reconcile

// Data plane: one client, many workloads, an ik_live_ key passed per workload.
const data = DataClient.fromEnv({ project: "acme" });
const ep = data.endpoint(ref.workloadSlug, { apiKey: process.env.SUPPORT_IK_LIVE });
const out = await ep.generateText({ prompt: "Hola", temperature: 0.2, maxTokens: 300 });
console.log(out.text, out.model);

const emb = await data.endpoint("billing", { apiKey: "ik_live_..." }).embed({ input: ["a", "b"] });
console.log(emb.embeddings);
```

Core types at a glance
- `ManagementClient`: control-plane client. `from_env` / `fromEnv` reads `INFERENCEKEY_SDK_TOKEN`. The key method is `ensure(spec)`, which is idempotent by the explicit `slug` and returns a workload reference (`project_slug` / `projectSlug`, `workload_slug` / `workloadSlug`).
- `DataClient`: data-plane client. `from_env` / `fromEnv` reads `INFERENCEKEY_API_KEY` as the default `ik_live_` key. Call `endpoint(slug, { apiKey })` to get a per-workload endpoint, then `generate_text` / `generateText` or `embed`.
- `WorkloadSpec`: the desired state you declare. It takes `name`, `slug`, `model`, `backend`, plus optional `project`, `description`, `command`, `vllm_version`, `task_type`, `config`, `execution_policy`, `execution_policy_config`, `worker_id`, `gpu_resource_id`. Placement is the platform's job: there is no `provider` and no `min_vram_gb`.
- `Backend`: `Ollama`, `Vllm`, `VllmOmni`, `Sglang`. See Backends and policies.
- `OnDrift`: `Reconcile` (default), `Fail`, `DryRun`, `Warn`, `Ignore`. See OnDrift.
- Errors: `PermissionDenied`, `AuthError`, `ValidationError`, `ConfigurationError`, `ApiError`, all subclasses of `InferenceKeyError`. See Common errors.
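Since every error listed above derives from `InferenceKeyError`, handlers can go from specific to general, with the base class as a catch-all. The sketch below defines stand-in classes with the same names so it runs on its own; in real code they would come from the SDK (assumed here to be importable from `inferencekey`, which this page does not confirm). `describe_failure` and its labels are hypothetical, and the grouping of errors into labels is an illustration only, not documented SDK behavior.

```python
# Stand-in classes mirroring the names in the list above (not the SDK's own classes).
class InferenceKeyError(Exception): ...
class PermissionDenied(InferenceKeyError): ...
class AuthError(InferenceKeyError): ...
class ValidationError(InferenceKeyError): ...
class ConfigurationError(InferenceKeyError): ...
class ApiError(InferenceKeyError): ...


def describe_failure(call) -> str:
    """Run `call` and map any SDK-style error to a short label (hypothetical helper)."""
    try:
        call()
    except (AuthError, PermissionDenied):
        return "credentials"      # assumed grouping: token or permission problems
    except (ValidationError, ConfigurationError):
        return "input"            # assumed grouping: spec or configuration problems
    except InferenceKeyError:
        return "other-sdk-error"  # base class catches ApiError and anything else
    return "ok"
```

The base-class clause comes last because Python checks `except` clauses in order; placing `InferenceKeyError` first would make the more specific handlers unreachable. Exceptions outside the hierarchy are deliberately left to propagate.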
New to the SDK?
- Get your tokens and learn the two-token model: Tokens quickstart.
- Declare your first workload with `ensure()`: First ensure.
- Call the endpoint it created: First call.
You’ll need a project and an ik_sdk_ token to follow along. Create an account or open the dashboard to provision them.