This tutorial takes you from an empty terminal to your first chat response. You will install the SDK, grab two tokens, provision an inference workload from code with ensure(), and call the resulting OpenAI-compatible endpoint with generate_text().
It should take a few minutes. By the end you will have a support-bot workload running and a script that talks to it.
You need:
Install the package for your language.
pip install inferencekey        # Python
npm i @inferencekey/sdk         # TypeScript / Node.js
Open the dashboard and create two tokens for the same project:
ik_sdk_ (control plane): scoped to one project and used to provision and reconcile workloads. It cannot call inference.
ik_live_ (data plane): used to call inference. You pass it per workload, so one app can hold several ik_live_ keys with different scopes.
The SDK reads configuration from environment variables. An explicit argument takes precedence over the matching environment variable, so anything you pass in code wins.
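As a rough mental model, the precedence rule and the prefix split between the two tokens fit in a few lines of plain Python. This is a minimal sketch under assumptions: resolve_token is a hypothetical helper, not part of the inferencekey SDK, and the SDK may resolve settings differently. It only shows an explicit argument beating the environment variable, and a token from the wrong plane being refused by its prefix.

```python
import os
from typing import Mapping, Optional


def resolve_token(
    explicit: Optional[str],
    env_var: str,
    expected_prefix: str,
    env: Mapping[str, str] = os.environ,
) -> str:
    """Hypothetical helper, NOT the inferencekey SDK.

    Models "explicit argument > environment variable" and rejects a token
    whose prefix belongs to the other plane (ik_sdk_ vs ik_live_).
    """
    token = explicit if explicit is not None else env.get(env_var)
    if not token:
        raise ValueError(f"no token: pass one explicitly or set {env_var}")
    if not token.startswith(expected_prefix):
        # e.g. an ik_sdk_ control-plane token handed to the data plane
        raise ValueError(f"expected a token starting with {expected_prefix!r}")
    return token


if __name__ == "__main__":
    fake_env = {
        "INFERENCEKEY_SDK_TOKEN": "ik_sdk_control",
        "INFERENCEKEY_API_KEY": "ik_live_default",
    }
    # No explicit argument: fall back to the environment variable.
    print(resolve_token(None, "INFERENCEKEY_API_KEY", "ik_live_", fake_env))  # ik_live_default
    # An explicit per-endpoint key wins over the environment default.
    print(resolve_token("ik_live_support", "INFERENCEKEY_API_KEY", "ik_live_", fake_env))  # ik_live_support
    # A control-plane token is refused where a data-plane token is expected.
    try:
        resolve_token("ik_sdk_control", "INFERENCEKEY_API_KEY", "ik_live_", fake_env)
    except ValueError as err:
        print("rejected:", err)
```

Passing a per-endpoint ik_live_ key plays the role of the explicit argument here: it wins over whatever INFERENCEKEY_API_KEY holds.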
export INFERENCEKEY_BASE_URL="https://api.inferencekey.com"
export INFERENCEKEY_PROJECT="acme"
export INFERENCEKEY_SDK_TOKEN="ik_sdk_xxxxxxxxxxxxxxxxxxxxxxxx"   # control plane
export INFERENCEKEY_API_KEY="ik_live_xxxxxxxxxxxxxxxxxxxxxxxx"    # data plane (default ik_live_)
INFERENCEKEY_SDK_TOKEN is read by ManagementClient.from_env().
INFERENCEKEY_API_KEY is the default ik_live_ key for the DataClient. You can still override it per endpoint, which is the recommended pattern when you run many workloads.
Provision the workload with ensure()
ensure() is declarative and idempotent: you describe the workload you want, and the platform either creates it or reconciles it to match. Idempotency is keyed on the explicit slug, so running this twice is safe: the second run reconciles instead of creating a duplicate.
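To make "idempotent by slug" concrete, here is a toy, in-memory model of what ensure() promises. It is an illustration under assumptions, not the platform's implementation: ToyPlatform, Spec, and the action labels created / reconciled / unchanged are hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass, replace
from typing import Dict, Tuple


@dataclass(frozen=True)
class Spec:
    """Hypothetical, trimmed-down stand-in for WorkloadSpec."""
    name: str
    slug: str
    model: str
    command: str


class ToyPlatform:
    """Toy in-memory stand-in for the platform, NOT the inferencekey SDK.

    Workloads are stored by slug, mirroring "idempotency is keyed on the slug".
    """

    def __init__(self) -> None:
        self.workloads: Dict[str, Spec] = {}

    def ensure(self, spec: Spec) -> Tuple[str, Spec]:
        current = self.workloads.get(spec.slug)
        if current is None:
            action = "created"      # new slug: create the workload
        elif current != spec:
            action = "reconciled"   # same slug, drifted spec: overwrite to match
        else:
            action = "unchanged"    # already in spec: nothing to do
        self.workloads[spec.slug] = spec
        return action, spec


if __name__ == "__main__":
    platform = ToyPlatform()
    spec = Spec(
        name="support-bot",
        slug="support-bot",
        model="meta-llama/Llama-3.1-8B-Instruct",
        command="vllm serve meta-llama/Llama-3.1-8B-Instruct --max-model-len 8192",
    )
    print(platform.ensure(spec)[0])  # created
    print(platform.ensure(spec)[0])  # unchanged (second run, no duplicate)

    # Simulate drift: the running workload is edited out of band.
    platform.workloads["support-bot"] = replace(spec, command="vllm serve other-model")
    print(platform.ensure(spec)[0])  # reconciled
    print(len(platform.workloads))   # 1
```

Keying on the slug rather than on the full spec is what lets a changed spec update the existing workload instead of creating a second one.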
# provision.py
from inferencekey import ManagementClient, WorkloadSpec, Backend

mgmt = ManagementClient.from_env(project="acme")  # reads INFERENCEKEY_SDK_TOKEN

ref = mgmt.ensure(WorkloadSpec(
    name="support-bot",
    slug="support-bot",  # idempotency key: re-running reconciles, never duplicates
    model="meta-llama/Llama-3.1-8B-Instruct",
    backend=Backend.VLLM,
    command="vllm serve meta-llama/Llama-3.1-8B-Instruct --max-model-len 8192",
))
print("ready:", ref.project_slug, "/", ref.workload_slug)

// provision.ts
import { ManagementClient, Backend } from "@inferencekey/sdk";
const mgmt = ManagementClient.fromEnv({ project: "acme" }); // reads INFERENCEKEY_SDK_TOKEN

const ref = await mgmt.ensure({
  name: "support-bot",
  slug: "support-bot",
  model: "meta-llama/Llama-3.1-8B-Instruct",
  backend: Backend.Vllm,
  command: "vllm serve meta-llama/Llama-3.1-8B-Instruct --max-model-len 8192",
});
console.log("ready:", ref.projectSlug, "/", ref.workloadSlug);
Call inference with generate_text()
Now switch to the data plane. Build a DataClient, get an endpoint for the workload slug, and pass your ik_live_ token. Then call generate_text() (generateText() in TypeScript).
# chat.py
from inferencekey import DataClient

data = DataClient.from_env(project="acme")
ep = data.endpoint(
    "support-bot",
    api_key="ik_live_xxxxxxxxxxxxxxxxxxxxxxxx",  # per-endpoint key; overrides INFERENCEKEY_API_KEY
)

out = ep.generate_text(
    prompt="Hola, ¿en qué puedes ayudarme?",
    temperature=0.2,
    max_tokens=300,
)
print(out.text)   # the reply
print(out.model)  # which model produced it

// chat.ts
import { DataClient } from "@inferencekey/sdk";
const data = DataClient.fromEnv({ project: "acme" });
const ep = data.endpoint("support-bot", {
  apiKey: process.env.SUPPORT_IK_LIVE, // your ik_live_ token for this workload (SUPPORT_IK_LIVE is not among the exports above; set it yourself)
});

const out = await ep.generateText({
  prompt: "Hola, ¿en qué puedes ayudarme?",
  temperature: 0.2,
  maxTokens: 300,
});
console.log(out.text);  // the reply
console.log(out.model); // which model produced it
Run the two scripts for your language:
python provision.py   # create / reconcile the workload
python chat.py        # get your first reply
node provision.ts     # create / reconcile the workload
node chat.ts          # get your first reply
You should see a chat reply printed to your terminal. That response came from the support-bot workload you just provisioned, running on the OpenAI-compatible endpoint at /endpoint/acme/support-bot/v1/....
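Because the endpoint is OpenAI-compatible, it can help to picture the kind of HTTP request that sits behind a call like generate_text(). The sketch below only builds (never sends) such a request with the Python standard library. Several details are assumptions rather than documented InferenceKey behavior: that the endpoint path is appended to INFERENCEKEY_BASE_URL, that the ik_live_ key travels as a Bearer token, and the route after /v1/, which the docs elide. The "completions" route is a hypothetical placeholder modeled on OpenAI's completions API, and the SDK may use a different route or payload.

```python
import json
import urllib.request

# Sketch only, NOT the inferencekey SDK. Assumptions (not documented API):
#   * the endpoint path is appended to INFERENCEKEY_BASE_URL,
#   * the ik_live_ key is sent as "Authorization: Bearer <key>",
#   * the caller supplies the route after /v1/ (elided in the docs).


def endpoint_root(base_url: str, project: str, workload_slug: str) -> str:
    """Build the /endpoint/<project>/<workload>/v1 prefix shown in the docs."""
    return f"{base_url.rstrip('/')}/endpoint/{project}/{workload_slug}/v1"


def build_request(base_url: str, project: str, workload_slug: str, route: str,
                  api_key: str, payload: dict) -> urllib.request.Request:
    """Return an unsent POST request carrying a JSON body."""
    url = f"{endpoint_root(base_url, project, workload_slug)}/{route.lstrip('/')}"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_request(
        "https://api.inferencekey.com", "acme", "support-bot",
        route="completions",  # hypothetical route, modeled on OpenAI's API
        api_key="ik_live_xxxxxxxxxxxxxxxxxxxxxxxx",
        payload={
            "model": "meta-llama/Llama-3.1-8B-Instruct",
            "prompt": "Hola, ¿en qué puedes ayudarme?",
            "temperature": 0.2,
            "max_tokens": 300,
        },
    )
    # POST https://api.inferencekey.com/endpoint/acme/support-bot/v1/completions
    print(req.get_method(), req.full_url)
```

The payload keys mirror the generate_text() arguments (prompt, temperature, max_tokens), which is why the SDK call reads like a thin wrapper over an OpenAI-style request.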
Provisioned from code
ensure() declared support-bot and reconciled it on the platform — idempotent by slug, with OnDrift.RECONCILE keeping it in spec.
Called inference
generate_text() hit the OpenAI-compatible data plane with your ik_live_ token and returned out.text / out.model.
Kept tokens least-privilege
Control (ik_sdk_) provisions; data (ik_live_) calls. Neither can do the other’s job.
Stayed config-light
Env vars carried base URL, project, and tokens — overridable in code when you need to.
New to InferenceKey? Create an account or open the dashboard · Learn more at inferencekey.com.