DeepInfra Joins Hugging Face Inference Providers: What You Actually Need to Know

Hugging Face just added DeepInfra to its growing list of Inference Providers. If you already run inference through the Hub, this matters: DeepInfra-hosted models are now reachable with the same Hugging Face token and client code you already use.

DeepInfra isn’t new. They’ve been around for a while, quietly offering some of the cheapest per-token pricing in the serverless AI inference space. Now they’re integrated directly into the Hugging Face ecosystem, which means you can call their models without setting up a separate account or API key.

What’s actually different?

DeepInfra brings over 100 models to the table, covering everything from LLMs to text-to-image, text-to-video, and embeddings. The initial integration focuses on conversational and text-generation tasks, so you get access to popular open-weight models like DeepSeek V4, Kimi-K2.6, and GLM-5.1 right out of the gate. They’ve promised support for other tasks like image generation and video soon, but for now, if you’re doing chat or text completion, you’re covered.

How it works in practice

You’ve got two ways to use this:

1. Bring your own DeepInfra API key. Requests go directly to DeepInfra, and you’re billed on your DeepInfra account. This is the straightforward path if you already have an account there.

2. Route through Hugging Face. You don’t need a separate DeepInfra key: use your Hugging Face token, and the request gets forwarded to DeepInfra automatically. You pay standard provider rates with zero markup. Hugging Face says they might add revenue-sharing with providers later, but for now, it’s a straight pass-through.
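To make the two paths concrete, here is a minimal sketch of how the choice comes down to which key you pass. It assumes that Hugging Face user tokens start with "hf_" and that anything else is a DeepInfra key; the helper names billing_route and make_client are hypothetical, and the key strings in the usage note are placeholders.

```python
def billing_route(api_key: str) -> str:
    """Guess which account a request would be billed to, based on the key prefix.

    Assumption: Hugging Face user tokens start with "hf_"; any other key is
    treated as a DeepInfra key that goes to DeepInfra directly.
    """
    return "hugging-face" if api_key.startswith("hf_") else "deepinfra"


def make_client(api_key: str):
    """Build a huggingface_hub client pinned to DeepInfra (needs huggingface_hub installed)."""
    # Imported lazily so billing_route() works without the package.
    from huggingface_hub import InferenceClient

    return InferenceClient(provider="deepinfra", api_key=api_key)
```

Calling billing_route("hf_example") returns "hugging-face", while billing_route("example-deepinfra-key") returns "deepinfra". The client construction is identical either way; only the key changes.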

You can also set your provider preferences in your account settings, so the widget and code snippets on model pages will default to your preferred provider. Handy if you’re juggling multiple providers.

The SDK stuff

Both Python and JavaScript SDKs support DeepInfra now. For Python, you need huggingface_hub >= 1.11.2. For JS, it’s @huggingface/inference. The integration is also baked into agent harnesses like Pi, OpenCode, Hermes Agents, and OpenClaw, so you can plug DeepInfra-hosted models into your existing toolchain without extra glue code.
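If you want to confirm the Python requirement before running anything, a small check like the one below can help. It is only a sketch: the helper names are hypothetical, and it compares just the leading numeric parts of the version, so pre-release tags are ignored.

```python
import re
from importlib import metadata


def version_tuple(version: str) -> tuple:
    """Turn "1.11.2" (or "1.11.2.dev0") into (1, 11, 2) for numeric comparison."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component, e.g. "dev0"
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = "1.11.2") -> bool:
    """Compare versions as integer tuples, so "1.9.0" does not sort above "1.11.2"."""
    return version_tuple(installed) >= version_tuple(minimum)


def installed_hub_version():
    """Return the installed huggingface_hub version string, or None if it is missing."""
    try:
        return metadata.version("huggingface_hub")
    except metadata.PackageNotFoundError:
        return None
```

Comparing integer tuples rather than raw strings matters here, because a plain string comparison would rank "1.9.0" above "1.11.2".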

Here’s a Python example. Note that it uses the OpenAI client pointed at Hugging Face’s OpenAI-compatible router rather than huggingface_hub itself:

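import os
from openai import OpenAI

# Point the OpenAI client at the Hugging Face router and authenticate with an HF token.
client = OpenAI(
    base_url="https://router.huggingface.co/v1",
    api_key=os.environ["HF_TOKEN"],
)

# The ":deepinfra" suffix on the model ID selects DeepInfra as the provider.
completion = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V4-Pro:deepinfra",
    messages=[
        {
            "role": "user",
            "content": "Write a Python function that returns the nth Fibonacci number using memoization."
        }
    ],
)

# Print the assistant's reply message.
print(completion.choices[0].message)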

And the JS equivalent:

import { OpenAI } from "openai";

// Point the OpenAI client at the Hugging Face router and authenticate with an HF token.
const client = new OpenAI({
    baseURL: "https://router.huggingface.co/v1",
    apiKey: process.env.HF_TOKEN,
});

// The ":deepinfra" suffix on the model ID selects DeepInfra as the provider.
const chatCompletion = await client.chat.completions.create({
    model: "deepseek-ai/DeepSeek-V4-Pro:deepinfra",
    messages: [
        {
            role: "user",
            content: "Write a Python function that returns the nth Fibonacci number using memoization.",
        },
    ],
});

// Print the assistant's reply message.
console.log(chatCompletion.choices[0].message);
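Both snippets pick the provider by appending ":deepinfra" to the model ID. If you switch providers often, a tiny helper for building and splitting that string can keep things tidy. This is a sketch with hypothetical function names, and it assumes the model ID itself never contains a colon.

```python
def with_provider(model_id: str, provider: str) -> str:
    """Return "model_id:provider", replacing any provider suffix already present."""
    base, _, _ = model_id.partition(":")
    return f"{base}:{provider}"


def split_provider(model: str) -> tuple:
    """Split "org/name:provider" into (model_id, provider); provider is None if absent."""
    base, _, provider = model.partition(":")
    return base, (provider or None)
```

For example, with_provider("deepseek-ai/DeepSeek-V4-Pro", "deepinfra") produces the exact model string used in the examples above.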

What about cost?

Billing follows the route you picked. With your own DeepInfra key, DeepInfra bills you directly. Routed through Hugging Face, you pay the same rates with no markup. PRO users get $2 worth of inference credits every month, usable across providers. Free users get a small quota too, but if you’re doing anything serious, upgrade to PRO.
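To get a feel for how far the $2 monthly credit goes, here is a rough back-of-the-envelope sketch. The per-million-token prices in the usage note are hypothetical placeholders, not DeepInfra’s actual rates, and the function names are made up for illustration. Plug in the real prices from the provider’s pricing page.

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      input_price_per_m: str, output_price_per_m: str) -> Decimal:
    """Cost of one request, given prices in USD per million tokens (as strings)."""
    total = (input_tokens * Decimal(input_price_per_m)
             + output_tokens * Decimal(output_price_per_m))
    return total / MILLION


def requests_covered(credit_usd: str, cost_per_request: Decimal) -> int:
    """How many whole requests of that size a credit balance would cover."""
    return int(Decimal(credit_usd) // cost_per_request)
```

With hypothetical prices of $0.50 per million input tokens and $1.50 per million output tokens, a request with 1,000 input and 500 output tokens costs $0.00125, so the $2 credit would cover 1,600 such requests.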

My take

This is a solid move. DeepInfra has been under the radar for a while, but their pricing is genuinely competitive. Having them integrated into the Hub means less friction for developers who want to try different models without managing multiple API endpoints. The fact that you can set provider preferences and the widget auto-selects your favorite is a nice touch.

One thing I’d like to see: faster rollout of the additional task types. Text-to-image and embeddings are huge use cases, and leaving them for “soon” feels like a missed opportunity. But for now, if you’re working with LLMs, this integration is worth checking out.

If you want to dive deeper, check out the dedicated documentation and the full list of supported models.