DeepSeek API with SillyTavern: A No-GPU Setup
How to connect SillyTavern to DeepSeek through the official API or an OpenAI-compatible configuration, without a local GPU or large model downloads.
There are two common routes for using DeepSeek in SillyTavern.
The first route is local deployment: run a model on your own computer with Ollama, KoboldCPP, or LM Studio, then let SillyTavern connect to that local service. This is good for people who have a GPU, want offline use, or enjoy comparing local models.
The second route is the API route: SillyTavern still runs on your machine, but model inference happens through the official DeepSeek API. This does not require a local GPU, and it avoids downloading tens of gigabytes of model files. As long as the network and API account are usable, an ordinary laptop can run the SillyTavern side.
This article covers the second route. If you want the local Ollama setup instead, see DeepSeek R1 with SillyTavern: Local Ollama Setup on Windows.

Who This Is For
The API route fits these situations:
- The computer has no dedicated GPU, or the VRAM is not enough.
- You do not want to download and manage local model files.
- You want model behavior closer to the official online DeepSeek experience.
- Token-based billing is acceptable.
- You mainly use SillyTavern while connected to the internet.
It is not for fully offline use. Prompts, role cards, chat content, and system messages are sent to the model provider. If a conversation contains private or sensitive information, treat the cloud API boundary as part of the risk model.
Prepare a DeepSeek API Key
Go to the official DeepSeek API platform first:
After registration, login, and any required account setup, create an API key. The full key is usually shown only once. Save it in a password manager or another safe local place. Do not commit it to Git and do not paste it into chat logs.
Model names and pricing should be checked on the official documentation:
As of May 5, 2026, DeepSeek’s documentation listed the newer model names:
deepseek-v4-flashdeepseek-v4-pro
The official page also noted that the older model names deepseek-chat and deepseek-reasoner were scheduled to retire on July 24, 2026. For new SillyTavern configurations, prefer deepseek-v4-flash or deepseek-v4-pro.
For everyday role chat, I would start with deepseek-v4-flash. It is a better default for cost and speed. If response quality matters more, compare it with deepseek-v4-pro.
Install and Start SillyTavern
If SillyTavern is not installed yet, install these on Windows:
Then run:
git clone https://github.com/SillyTavern/SillyTavern -b release
Enter the SillyTavern folder and double-click Start.bat. Once the browser opens, the SillyTavern application itself is running.
Official installation guide: SillyTavern Windows Installation
Configure the DeepSeek API
In SillyTavern, click the plug icon at the top to open API connection settings. The exact UI labels may change between versions, but the idea is stable: choose Chat Completion, then connect DeepSeek either through the built-in DeepSeek provider or through an OpenAI-compatible custom provider.

Option 1: Use the Built-In DeepSeek Provider
If your current SillyTavern version already includes DeepSeek as a Chat Completion source, start there:
- API type: Chat Completion
- Source / Provider: DeepSeek
- API Key: the key created in the DeepSeek platform
- Model: start with
deepseek-v4-flash, or usedeepseek-v4-prowhen quality matters more
Save the settings and test the connection. If SillyTavern returns model information or a test reply, the connection is working.
If the model list still only shows older names such as deepseek-chat and deepseek-reasoner, SillyTavern’s built-in list may not have caught up with DeepSeek’s documentation. In that case, use the OpenAI-compatible route and type the model name manually.
Option 2: Use an OpenAI-Compatible Configuration
DeepSeek’s API is compatible with the OpenAI API shape, so SillyTavern can also connect through a Custom or OpenAI-compatible provider.
A common configuration looks like this:
API type: Chat Completion
Source / Provider: Custom or OpenAI-compatible
API Key: sk-...
Base URL: https://api.deepseek.com
Model: deepseek-v4-flash
If the current SillyTavern version expects an OpenAI-style /v1 base URL, use:
https://api.deepseek.com/v1
Do not set the base URL to https://api.deepseek.com/chat/completions. SillyTavern builds the final endpoint path itself. In the configuration field, it usually needs only the base address.
Suggested Starting Parameters
For role chat, there is no single perfect parameter set. The first goal is to make the connection stable, then tune the behavior.
A conservative starting point:
- Model:
deepseek-v4-flash - Temperature:
0.8to1.0 - Top P:
0.9 - Max response length: start medium, then increase after checking speed and cost
- Streaming: enabled, so responses appear while being generated
If the character becomes too unfocused, lower Temperature. If replies are too short, increase max response length. If context cost grows too quickly, keep fewer chat messages or shorten the character card.
Common Questions
Why does the API route not need a GPU?
The model runs on DeepSeek’s servers. Your computer only runs the SillyTavern interface, sends requests, and displays responses. The bottlenecks become network access, API availability, and cost, not local GPU power.
What is the difference between DeepSeek API and local Ollama?
Ollama runs a local model. It gives you more control, can work offline, and does not bill by token. The tradeoff is hardware pressure: larger models need more VRAM and RAM.
DeepSeek API runs a cloud model. It needs no local GPU and is usually more stable. The tradeoffs are network dependency, usage-based billing, and the fact that conversation data is sent to the provider.
What if I entered the API key but it still cannot connect?
Check four things first:
- Whether the API key was copied completely, without extra spaces.
- Whether the Base URL is only the base address, not the full endpoint path.
- Whether the model name is currently available in DeepSeek’s official documentation.
- Whether the network can reach the DeepSeek API.
If the built-in DeepSeek provider fails, try the OpenAI-compatible configuration. If https://api.deepseek.com does not work in your SillyTavern version, try https://api.deepseek.com/v1.
How can I control cost?
Role chat can consume more tokens over time because the conversation context keeps growing.

Start with these habits:
- Use
deepseek-v4-flashfirst. - Avoid keeping an unnecessarily long chat history.
- Keep role cards, world info, and system prompts reasonably short.
- Test briefly before long sessions.
- Check usage in the DeepSeek console regularly.
Should I still keep a local deployment?
Yes, if it fits the way you use SillyTavern.
The local route is more like a technical playground and a privacy preference. The API route is more like a stable daily-use option. After trying both, I prefer using the API route for regular SillyTavern sessions, and keeping Ollama for testing, offline use, and model comparison.
Summary
Connecting DeepSeek API to SillyTavern comes down to three things: get an API key, choose Chat Completion or OpenAI-compatible mode, and set the Base URL plus model name correctly.
If the goal is simply to use DeepSeek in SillyTavern without buying a GPU or downloading local model files, the API route has a much lower entry cost. If offline control becomes important later, it is still easy to return to Ollama, KoboldCPP, or LM Studio.