API reference


The InferenceKey SDK is one Rust core (inferencekey-core) exposed through a C ABI, with thin native bindings on top. Every language binds the same core, so the concepts are identical everywhere: two tokens, two clients, and one ensure() reconcile call.

Control plane: a ManagementClient authenticated with an ik_sdk_ token. It is scoped to one project, and it provisions and reconciles workloads. It cannot make inference calls.
Data plane: a DataClient that hands you an endpoint per workload, each authenticated with an ik_live_ token. It cannot provision workloads.
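
The split between the two planes can be checked before any client is constructed. The following is a minimal sketch and not part of the SDK: token_plane and require_plane are hypothetical helpers, and they assume the ik_sdk_ and ik_live_ prefixes are enough to tell the two token kinds apart.

```python
# Hypothetical helpers, not part of the InferenceKey SDK.
# Assumption: the token prefix alone identifies which plane a token belongs to.
PREFIX_TO_PLANE = {
    "ik_sdk_": "control",   # ManagementClient: provision and reconcile
    "ik_live_": "data",     # DataClient endpoints: inference only
}


def token_plane(token: str) -> str:
    """Return "control" or "data" based on the token prefix."""
    for prefix, plane in PREFIX_TO_PLANE.items():
        if token.startswith(prefix):
            return plane
    raise ValueError("unrecognized token prefix; expected ik_sdk_ or ik_live_")


def require_plane(token: str, expected: str) -> str:
    """Fail fast when a token is about to be handed to the wrong client."""
    actual = token_plane(token)
    if actual != expected:
        raise ValueError(
            f"expected a {expected}-plane token, got a {actual}-plane token"
        )
    return token
```

A check like this only catches mix-ups early on the client side; the platform remains the authority on what each token may do.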

See Tokens for the least-privilege model and Architecture for how the core, ABI, and bindings fit together.

Pick your language

Python and TypeScript ship today. Go and Java bind the same C ABI and are on the way.

Python: pyo3 wheel. Install with pip install inferencekey. Exposes ManagementClient, DataClient, WorkloadSpec, Backend, OnDrift.

TypeScript / Node: napi addon. Install with npm i @inferencekey/sdk. Fully typed, async/Promise-based API.

Go: coming soon, as native bindings over the C ABI.

Java: coming soon, as native bindings over the C ABI.

The shape of the API

Whatever language you bind, you work with the same handful of types. The snippets below are the canonical entry points; the language pages document every parameter and return type.

Python

from inferencekey import ManagementClient, DataClient, WorkloadSpec, Backend, OnDrift

# Control plane: reads INFERENCEKEY_SDK_TOKEN (ik_sdk_...), scoped to one project.
mgmt = ManagementClient.from_env(project="acme")

ref = mgmt.ensure(WorkloadSpec(
    name="support-bot",
    slug="support-bot",
    model="meta-llama/Llama-3.1-8B-Instruct",
    backend=Backend.VLLM,
    command="vllm serve meta-llama/Llama-3.1-8B-Instruct --max-model-len 8192",
))  # on_drift defaults to OnDrift.RECONCILE

# Data plane: one client, many workloads, an ik_live_ key passed per workload.
data = DataClient.from_env(project="acme")
ep   = data.endpoint(ref.workload_slug, api_key="ik_live_...")
out  = ep.generate_text(prompt="Hola", temperature=0.2, max_tokens=300)
print(out.text, out.model)

emb = data.endpoint("billing", api_key="ik_live_...").embed(input=["a", "b"])
print(emb.embeddings)

TypeScript

import { ManagementClient, DataClient, Backend, OnDrift } from "@inferencekey/sdk";

// Control plane: reads INFERENCEKEY_SDK_TOKEN (ik_sdk_...), scoped to one project.
const mgmt = ManagementClient.fromEnv({ project: "acme" });

const ref = await mgmt.ensure({
  name: "support-bot",
  slug: "support-bot",
  model: "meta-llama/Llama-3.1-8B-Instruct",
  backend: Backend.Vllm,
  command: "vllm serve meta-llama/Llama-3.1-8B-Instruct --max-model-len 8192",
}); // onDrift defaults to OnDrift.Reconcile

// Data plane: one client, many workloads, an ik_live_ key passed per workload.
const data = DataClient.fromEnv({ project: "acme" });
const ep   = data.endpoint(ref.workloadSlug, { apiKey: process.env.SUPPORT_IK_LIVE });
const out  = await ep.generateText({ prompt: "Hola", temperature: 0.2, maxTokens: 300 });
console.log(out.text, out.model);

const emb = await data.endpoint("billing", { apiKey: "ik_live_..." }).embed({ input: ["a", "b"] });
console.log(emb.embeddings);
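
The ensure() calls above are described as idempotent on the explicit slug. The toy model below is one way to picture that property. It is an in-memory sketch, not the SDK or the platform: ToySpec and ToyControlPlane are hypothetical stand-ins, and overwriting a drifted spec is only an assumed reading of the default reconcile behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToySpec:
    """Stand-in for WorkloadSpec with only the fields this sketch needs."""
    name: str
    slug: str
    model: str


class ToyControlPlane:
    """In-memory stand-in for the control plane, keyed by slug."""

    def __init__(self) -> None:
        self.workloads = {}  # slug -> ToySpec
        self.writes = 0      # how many times stored state actually changed

    def ensure(self, spec: ToySpec) -> str:
        current = self.workloads.get(spec.slug)
        if current != spec:
            # Missing or drifted: converge stored state to the declared spec.
            self.workloads[spec.slug] = spec
            self.writes += 1
        return spec.slug


plane = ToyControlPlane()
spec = ToySpec(
    name="support-bot",
    slug="support-bot",
    model="meta-llama/Llama-3.1-8B-Instruct",
)
first = plane.ensure(spec)
second = plane.ensure(spec)  # same slug, same spec: no new workload, no write
```

Because the slug is the key, re-running the same declaration is safe in deploy scripts and CI: the second call finds matching state and changes nothing.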

Core types at a glance

ManagementClient: control-plane client. from_env / fromEnv reads INFERENCEKEY_SDK_TOKEN. The key method is ensure(spec), which is idempotent on the explicit slug and returns a workload reference (project_slug / projectSlug, workload_slug / workloadSlug).
DataClient: data-plane client. from_env / fromEnv reads INFERENCEKEY_API_KEY as the default ik_live_ key. Call endpoint(slug, api_key=...) in Python or endpoint(slug, { apiKey }) in TypeScript to get a per-workload endpoint, then call generate_text / generateText or embed on it.
WorkloadSpec: the desired state you declare. Required fields are name, slug, model, and backend. Optional fields are project, description, command, vllm_version, task_type, config, execution_policy, execution_policy_config, worker_id, and gpu_resource_id. Placement is the platform’s job, so the spec has no provider field and no min_vram_gb field.
Backend: Ollama, Vllm, VllmOmni, Sglang. Python spells the members in upper case, as in Backend.VLLM. See Backends and policies.
OnDrift: Reconcile (default), Fail, DryRun, Warn, Ignore. Python spells the members in upper case, as in OnDrift.RECONCILE. See OnDrift.
Errors: PermissionDenied, AuthError, ValidationError, ConfigurationError, ApiError, all subclasses of InferenceKeyError. See Common errors.
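
Because every error in the list above derives from InferenceKeyError, handlers can catch specific classes first and fall back to the base class. The sketch below defines local stand-in classes with the same names so it runs without the SDK; in real code the classes come from the SDK itself (check the language page for the import path). The describe_failure helper and its three categories are hypothetical, and the grouping is an assumption based on the class names alone.

```python
# Stand-ins that mirror the documented names; NOT the SDK's own classes.
class InferenceKeyError(Exception): ...
class PermissionDenied(InferenceKeyError): ...
class AuthError(InferenceKeyError): ...
class ValidationError(InferenceKeyError): ...
class ConfigurationError(InferenceKeyError): ...
class ApiError(InferenceKeyError): ...


def describe_failure(action) -> str:
    """Run `action` and map an SDK-style error to a coarse category."""
    try:
        action()
    except (PermissionDenied, AuthError):
        # Assumed meaning: the token is wrong or lacks the needed scope.
        return "credentials"
    except (ValidationError, ConfigurationError):
        # Assumed meaning: the spec or local configuration needs fixing.
        return "caller"
    except InferenceKeyError:
        # Base class last: catches ApiError and any subclass added later.
        return "other"
    return "ok"
```

Ordering matters here: if the base-class handler came first, it would swallow every subclass and the specific branches would never run.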

New to the SDK?

Get your tokens and learn the two-token model — Tokens quickstart.
Declare your first workload with ensure() — First ensure.
Call the endpoint it created — First call.

You’ll need a project and an ik_sdk_ token to follow along. Create an account or open the dashboard to provision them.
