DeepSeek API with SillyTavern: A No-GPU Setup

There are two common routes for using DeepSeek in SillyTavern.

The first route is local deployment: run a model on your own computer with Ollama, KoboldCPP, or LM Studio, then let SillyTavern connect to that local service. This is good for people who have a GPU, want offline use, or enjoy comparing local models.

The second route is the API route: SillyTavern still runs on your machine, but model inference happens through the official DeepSeek API. This does not require a local GPU, and it avoids downloading tens of gigabytes of model files. As long as the network and API account are usable, an ordinary laptop can run the SillyTavern side.

This article covers the second route. If you want the local Ollama setup instead, see DeepSeek R1 with SillyTavern: Local Ollama Setup on Windows.

[Image: A local SillyTavern interface connected to a cloud model service through a secure path]

Who This Is For

The API route fits these situations:

The computer has no dedicated GPU, or the VRAM is not enough.
You do not want to download and manage local model files.
You want model behavior closer to the official online DeepSeek experience.
Token-based billing is acceptable.
You mainly use SillyTavern while connected to the internet.

It is not for fully offline use. Prompts, role cards, chat content, and system messages are sent to the model provider. If a conversation contains private or sensitive information, treat the cloud API boundary as part of the risk model.

Prepare a DeepSeek API Key

Go to the official DeepSeek API platform first:

DeepSeek API Platform

After registration, login, and any required account setup, create an API key. The full key is usually shown only once. Save it in a password manager or another safe local place. Do not commit it to Git and do not paste it into chat logs.
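SillyTavern stores the key in its own settings once you paste it in. If you also want to test the key from a script, one common way to keep it out of source files is an environment variable. The following is a minimal sketch, not part of SillyTavern or DeepSeek's tooling; the variable name DEEPSEEK_API_KEY and both helper functions are assumptions made for illustration.

```python
import os


def load_api_key(env_var: str = "DEEPSEEK_API_KEY") -> str:
    """Read the key from the environment instead of hard-coding it.

    The variable name is a hypothetical convention, not an official one.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key


def mask_key(key: str) -> str:
    """Hide all but the last four characters so the key is safe to print."""
    return "*" * max(len(key) - 4, 0) + key[-4:]
```

Printing mask_key(load_api_key()) confirms that a key is loaded without exposing it in a terminal log or screenshot.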

Model names and pricing should be checked on the official documentation:

DeepSeek Models & Pricing

As of May 5, 2026, DeepSeek’s documentation listed the newer model names:

deepseek-v4-flash
deepseek-v4-pro

The official page also noted that the older model names deepseek-chat and deepseek-reasoner were scheduled to retire on July 24, 2026. For new SillyTavern configurations, prefer deepseek-v4-flash or deepseek-v4-pro.

For everyday role chat, I would start with deepseek-v4-flash. It is a better default for cost and speed. If response quality matters more, compare it with deepseek-v4-pro.

Install and Start SillyTavern

If SillyTavern is not installed yet, install these on Windows:

Git
Node.js (LTS release)

Then run:

git clone https://github.com/SillyTavern/SillyTavern -b release

Enter the SillyTavern folder and double-click Start.bat. Once the browser opens, the SillyTavern application itself is running.

Official installation guide: SillyTavern Windows Installation

Configure the DeepSeek API

In SillyTavern, click the plug icon at the top to open API connection settings. The exact UI labels may change between versions, but the idea is stable: choose Chat Completion, then connect DeepSeek either through the built-in DeepSeek provider or through an OpenAI-compatible custom provider.

[Image: The configuration path made of API key, network endpoint, model choice, and test response]

Option 1: Use the Built-In DeepSeek Provider

If your current SillyTavern version already includes DeepSeek as a Chat Completion source, start there:

API type: Chat Completion
Source / Provider: DeepSeek
API Key: the key created in the DeepSeek platform
Model: start with deepseek-v4-flash, or use deepseek-v4-pro when quality matters more

Save the settings and test the connection. If SillyTavern returns model information or a test reply, the connection is working.

If the model list still only shows older names such as deepseek-chat and deepseek-reasoner, SillyTavern’s built-in list may not have caught up with DeepSeek’s documentation. In that case, use the OpenAI-compatible route and type the model name manually.

Option 2: Use an OpenAI-Compatible Configuration

DeepSeek’s API is compatible with the OpenAI API shape, so SillyTavern can also connect through a Custom or OpenAI-compatible provider.

A common configuration looks like this:

API type: Chat Completion
Source / Provider: Custom or OpenAI-compatible
API Key: sk-...
Base URL: https://api.deepseek.com
Model: deepseek-v4-flash

If the current SillyTavern version expects an OpenAI-style /v1 base URL, use:

https://api.deepseek.com/v1

Do not set the base URL to https://api.deepseek.com/chat/completions. SillyTavern builds the final endpoint path itself. In the configuration field, it usually needs only the base address.
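To see why the full endpoint causes trouble, here is a rough sketch of how an OpenAI-compatible client might turn a base URL into the final request URL. This is not SillyTavern's actual code; the function name chat_completions_url is hypothetical, and the tolerant handling of an already-complete URL is my own addition.

```python
def chat_completions_url(base_url: str) -> str:
    """Build the final endpoint from a base URL (illustrative only)."""
    suffix = "/chat/completions"
    base = base_url.strip().rstrip("/")
    # A client that blindly appends the suffix would produce
    # ".../chat/completions/chat/completions" for a full endpoint,
    # so this sketch strips an existing suffix before appending.
    if base.endswith(suffix):
        base = base[: -len(suffix)]
    return base + suffix
```

A client without that guard would request a doubled path and receive an error, which is why the configuration field should hold only the base address.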

Suggested Starting Parameters

For role chat, there is no single perfect parameter set. The first goal is to make the connection stable, then tune the behavior.

A conservative starting point:

Model: deepseek-v4-flash
Temperature: 0.8 to 1.0
Top P: 0.9
Max response length: start medium, then increase after checking speed and cost
Streaming: enabled, so responses appear while being generated

If the character becomes too unfocused, lower Temperature. If replies are too short, increase max response length. If context cost grows too quickly, keep fewer chat messages or shorten the character card.
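For reference, these settings correspond roughly to fields in an OpenAI-style chat completion request body. The sketch below assumes the standard OpenAI field names (temperature, top_p, max_tokens, stream); the function name and the value 400 for a "medium" response length are placeholders of mine, not recommendations from DeepSeek or SillyTavern.

```python
def build_chat_request(
    messages: list,
    model: str = "deepseek-v4-flash",
    temperature: float = 0.9,   # middle of the 0.8 to 1.0 range above
    top_p: float = 0.9,
    max_tokens: int = 400,      # hypothetical "medium" length; tune to taste
    stream: bool = True,
) -> dict:
    """Assemble an OpenAI-style request body with the starting parameters."""
    return {
        "model": model,
        "messages": messages,
        "temperature": temperature,
        "top_p": top_p,
        "max_tokens": max_tokens,
        "stream": stream,
    }
```

Lowering temperature or raising max_tokens here mirrors moving the matching sliders in SillyTavern.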

Common Questions

Why does the API route not need a GPU?

The model runs on DeepSeek’s servers. Your computer only runs the SillyTavern interface, sends requests, and displays responses. The bottlenecks become network access, API availability, and cost, not local GPU power.
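To make that division of labor concrete, here is a rough sketch of the kind of HTTP request a client sends on your behalf. It is not SillyTavern's code. It assumes the OpenAI-style request and response shape and Bearer-token authentication, and the helper names build_request and send_request are hypothetical.

```python
import json
import urllib.request


def build_request(api_key: str, body: dict,
                  base_url: str = "https://api.deepseek.com") -> urllib.request.Request:
    """Prepare (but do not send) a chat completion request."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_request(request: urllib.request.Request, timeout: float = 30.0) -> str:
    """Send a non-streaming request and return the reply text.

    Assumes an OpenAI-style response: choices[0].message.content.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    return payload["choices"][0]["message"]["content"]
```

Everything computationally heavy happens between sending the request and receiving the reply, which is why the local machine needs no GPU.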

What is the difference between DeepSeek API and local Ollama?

Ollama runs a local model. It gives you more control, can work offline, and does not bill by token. The tradeoff is hardware pressure: larger models need more VRAM and RAM.

DeepSeek API runs a cloud model. It needs no local GPU, and its speed and output quality do not depend on your hardware, so it is usually more stable in day-to-day use. The tradeoffs are network dependency, usage-based billing, and the fact that conversation data is sent to the provider.

What if I entered the API key but it still cannot connect?

Check four things first:

Whether the API key was copied completely, without extra spaces.
Whether the Base URL is only the base address, not the full endpoint path.
Whether the model name is currently available in DeepSeek’s official documentation.
Whether the network can reach the DeepSeek API.

If the built-in DeepSeek provider fails, try the OpenAI-compatible configuration. If https://api.deepseek.com does not work in your SillyTavern version, try https://api.deepseek.com/v1.
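The first three checks can be expressed as a small pre-flight function. This is only a sketch with a hypothetical name, check_config; the set of legacy names comes from the retirement note earlier in this article, and network reachability is left out because it cannot be verified offline.

```python
# Older names that DeepSeek's documentation listed as scheduled to retire.
LEGACY_MODELS = {"deepseek-chat", "deepseek-reasoner"}


def check_config(api_key: str, base_url: str, model: str) -> list:
    """Return a list of likely configuration problems (empty if none found)."""
    problems = []
    if not api_key or api_key != api_key.strip():
        problems.append("API key is empty or has surrounding whitespace")
    if base_url.strip().rstrip("/").endswith("/chat/completions"):
        problems.append("Base URL contains the full endpoint path")
    if model in LEGACY_MODELS:
        problems.append("Model name is an older one scheduled to retire")
    return problems
```

An empty list does not prove the connection will work; it only rules out the most common typing mistakes.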

How can I control cost?

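Role chat consumes more tokens over time because each request resends the accumulated context: the character card, system prompt, and chat history. The longer the conversation, the more input tokens every new message costs.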

[Image: API usage tradeoffs between network availability, cost, and privacy boundaries]

Start with these habits:

Use deepseek-v4-flash first.
Avoid keeping an unnecessarily long chat history.
Keep role cards, world info, and system prompts reasonably short.
Test briefly before long sessions.
Check usage in the DeepSeek console regularly.
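SillyTavern has its own context-size settings, so you normally do not write this yourself. Still, the idea behind "avoid an unnecessarily long chat history" can be sketched in a few lines. The function trim_history is hypothetical, and it uses character count as a crude stand-in for tokens, which is an assumption; real token counts depend on the model's tokenizer.

```python
def trim_history(messages: list, max_chars: int) -> list:
    """Keep system messages plus the most recent messages that fit the budget.

    Each message is a dict with "role" and "content" keys (OpenAI-style).
    """
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    budget = max_chars - sum(len(m["content"]) for m in system)

    kept = []
    # Walk backwards from the newest message and stop at the first
    # one that no longer fits, so the kept history stays contiguous.
    for message in reversed(rest):
        cost = len(message["content"])
        if cost > budget:
            break
        kept.append(message)
        budget -= cost
    return system + kept[::-1]
```

Dropping the oldest messages first keeps the character definition and the latest turns, which usually matter most for a coherent reply.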

Should I still keep a local deployment?

Yes, if it fits the way you use SillyTavern.

The local route is more like a technical playground and a privacy preference. The API route is more like a stable daily-use option. After trying both, I prefer using the API route for regular SillyTavern sessions, and keeping Ollama for testing, offline use, and model comparison.

Summary

Connecting the DeepSeek API to SillyTavern comes down to three things: get an API key, choose Chat Completion with either the built-in DeepSeek provider or an OpenAI-compatible custom provider, and set the Base URL and model name correctly.

If the goal is simply to use DeepSeek in SillyTavern without buying a GPU or downloading local model files, the API route has a much lower entry cost. If offline control becomes important later, it is still easy to return to Ollama, KoboldCPP, or LM Studio.
