How to Run DeepSeek R1 Privately and Access It Anywhere from Your iPhone

On the same day the biggest names in the US tech sector were watching the inauguration of their President from a better vantage point than most of the incoming cabinet picks, a small hedge-fund-turned-AI-research-lab based out of Hangzhou, China released an article that was about to turn their worlds upside down. DeepSeek, a company of fewer than 200 people (compared to OpenAI’s 3,500+ or Anthropic’s just over 1,000), announced their first reasoning model: DeepSeek R1. Not five days later, moments before Wall Street rang its closing bell for the week, DeepSeek released both the model weights and the research paper detailing how they had developed a reasoning model with the capabilities of OpenAI’s own o1 model, but at a fraction of the compute cost. Now, anybody could create, train or run their own version of a model that today costs an end user $200/month to use – a model that OpenAI spent 20 times more to build themselves, and whose methods they kept behind closed doors. The result? The market realized that you don’t need a nuclear-powered datacenter to build cutting-edge AI, and over $1 trillion in tech stocks was sold off.

If you’d like to learn more about the DeepSeek model, including how it came to be and why the way in which this model was announced and released has such massive implications for the AI space, check out this incredible article by AI engineer and venture capitalist Lance Co Ting Keh.

What does this mean for us as AI consumers?

Anticipating the public’s appetite following the mass media hype, DeepSeek released an app on both major app stores. Simply by downloading the app onto your phone (or visiting their website) and creating an account, you too can access the latest and greatest in open-source models.

However, the issue of using someone else’s AI still remains: everything you send is going to someone else’s computer. Regardless of the data collection practices of any LLM provider, the fact remains that the information passed to these services is under their control. Any documents or photos you upload, any questions that may relate to private matters in your life, are likely going to be used in some way to benefit the providing business and/or their connected partners. It’s like money you take into a casino – there’s a statistically significant chance that it will no longer be yours.

This is what makes DeepSeek’s decision to release this model to the public so incredible – we can run the same model, privately, on our own hardware. Traditionally, doing so has involved technical hurdles and an often infuriating amount of trial and error. In this guide, you will learn how to spin up and manage your own instance of DeepSeek R1 – and then access it from any device, anywhere, through a familiar, ChatGPT-style interface.

Let’s get started!

Here’s What You’ll Need

To set up your private deployment of DeepSeek R1, you’ll need the following:

  • A desktop PC, preferably with an NVIDIA graphics card: In this setup, we’re using an RTX 3090 with 24GB of memory (VRAM). The greater the VRAM of a graphics card, the bigger and better the model you can run. DeepSeek R1 models can run on something as small as your MacBook or even a Raspberry Pi, but performance will be severely degraded from what you’ve come to expect from the likes of ChatGPT.
  • Your smartphone, be it an iPhone or an Android.
  • This guide, to walk you through the process.
  • A can-do attitude! This project may seem a bit technical at first glance, but it will serve as a great introduction to the world of containerization and using the command line, if you’re up for it!

Step 1: Install Docker on your PC

Docker is a containerization platform that allows you to run applications on your PC in isolation. Instead of installing software directly onto your PC, potentially affecting or being affected by other software, we run it in an isolated space on the computer called a container. When we are done, we simply stop and/or remove the container. Thus, to keep things modular, isolated and easy to manage, we will deploy our instance of DeepSeek R1 in a container. We do need to install software to manage these containers, and that’s precisely what Docker is for!
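To give you a taste of what that looks like, here are the basic container lifecycle commands (standard Docker; ollama is the container name we’ll assign in Step 2):

docker ps
docker stop ollama
docker start ollama

The first lists your running containers, the second stops our Ollama container, and the third brings it back – no installers or leftover services to hunt down.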

To install Docker on your machine, follow the guide below that corresponds to the operating system on your PC:

Windows

Linux (Ubuntu)*

Why not Mac? Docker does not currently support GPU access on Macs, which is needed to run the model at a speed that doesn’t require you to wait several years for a response. We will use a different method for a Mac installation.

*if you have another Linux distro installed, then I fully trust in your ability to find out how to install Docker on your own.

Step 2: Run an Ollama instance

Remember how I started this guide by mentioning the technical acumen and monk-like patience it once took to deploy and run large language models? That was before the days of frameworks like Ollama – an open-source, lightweight framework for building and running language models. In just a few commands, we can deploy language models that have been specifically optimized for running on everyday machines, and make them accessible through an API endpoint. Cool, right?

Here’s how to deploy DeepSeek R1 using Ollama:

Linux and Windows:

  1. Open Terminal (Linux) or the Command Prompt (Windows)
  2. Run the following command to pull and start the Ollama container:
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama:latest
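A quick breakdown of what those flags do, so the command isn’t a magic incantation: -d runs the container in the background, --gpus=all exposes your NVIDIA GPU(s) to it, -v ollama:/root/.ollama stores downloaded models in a named volume (so they survive if the container is removed), -p 11434:11434 maps Ollama’s API port to your PC, and --name ollama gives the container the name we’ll use in later commands.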

Mac:

  1. Download Ollama for Mac.
  2. Double-click the download to open it, and follow the prompts.

Now, at this point Ollama is running – but it does not yet have any models to serve us. We must first download (pull) models from Ollama’s repository. This will require you to:

  1. Visit Ollama’s Model Library and select a model to download. Given we want to run DeepSeek R1, click deepseek-r1.

Selecting the dropdown menu, we see the size variants of the DeepSeek model.

Wait, variants?

When the world went nuts over DeepSeek R1’s performance compared to OpenAI’s state-of-the-art ‘o1’ model, they were comparing the full-size model, which is 671 billion parameters and requires a whopping 1128GB of VRAM (that’s 35 of the latest NVIDIA RTX 5090 GPUs at $2,000 a pop, or 8 NVIDIA H200 GPUs at $120,000 a pop. The choice truly is yours!). To make the model’s reasoning capabilities more generally accessible, DeepSeek went to the trouble of using R1’s responses to fine-tune smaller, open-source models already available. By distilling the reasoning capabilities of their biggest model into smaller models, DeepSeek drastically increased the performance of those models compared to their original base while maintaining their smaller size. These distilled models include Meta’s Llama series (3.1 8B and 3.3 70B) and Alibaba’s Qwen 2.5 series (1.5B, 7B, 14B and 32B), where the B stands for billions of parameters.

Despite being orders of magnitude smaller, DeepSeek-R1-Distill-Qwen-32B improves on its base model’s benchmark scores by over 20 points (think of it like your child working with a tutor for a few weeks and improving from C’s and D’s to A’s and B’s). It also outperforms both Anthropic’s latest Claude 3.5 Sonnet model and OpenAI’s GPT-4o and o1-mini models. Oh, and we can run them ourselves without paying either of those two companies. Insane.

From the original DeepSeek R1 paper, published by DeepSeek. All you need to know is that a bigger number is better.

Going one step further to make the models more accessible, the Ollama community takes these models and makes them more lightweight by storing each parameter at a lower numerical precision (a technique known as quantization) – the model keeps the same number of parameters, but each one takes up far less memory, making the model smaller and thus able to fit on GPUs with less VRAM. The models available here on Ollama are 4-bit quantized, which marginally impacts their performance compared to the unquantized models*.

*At time of writing, the exact difference has not yet been benchmarked for the R1 series of models, but traditionally quantized models do not perform drastically worse than their original, unquantized counterparts.
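To put rough numbers on why this matters (my own back-of-the-envelope arithmetic, ignoring the extra memory inference itself needs): the 32B model at its original 16 bits per parameter would need about 32 billion × 2 bytes ≈ 64GB of VRAM, while the 4-bit quantized version needs about 32 billion × 0.5 bytes ≈ 16GB – which is exactly why it squeezes onto a 24GB RTX 3090.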

Both distillation and quantization allow us to run DeepSeek R1 on a wide variety of hardware, from a single MacBook to a large datacenter.

Okay, back to the tutorial…

  2. In the dropdown menu, select the model which best suits your hardware. My RTX 3090 has 24GB of VRAM, meaning the biggest and most capable model it can fit is the 32b model. For Macs, refer to this hardware chart to see which model you can run.
  3. Back in Terminal/Command Prompt, run the following command. Replace the number after the colon with your preferred model size.

Linux and Windows:

docker exec ollama ollama pull deepseek-r1:32b

Mac:

ollama pull deepseek-r1:32b
  4. Wait for the model to finish downloading.
  5. On Mac only, run the following command to start the Ollama server. (The server starts on launch for our Docker instances.)
ollama serve
  6. Confirm the Ollama server is running by opening a browser and navigating to http://localhost:11434. If all is well, you should see the words “Ollama is running”!
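If you prefer the command line, you can also ask Ollama’s API which models it has pulled (this endpoint is part of Ollama’s standard API):

curl http://localhost:11434/api/tags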

Step 3: Run an Open WebUI instance

Now that we have Ollama with DeepSeek R1 downloaded and ready to serve, we need a way to interact with it. While you could technically send requests to the model via the command line or API calls, that’s not exactly a ChatGPT-like experience. This is where Open WebUI comes in.
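For the curious, here is roughly what a raw API call looks like (using Ollama’s /api/generate endpoint; the response streams back as a series of JSON chunks):

curl http://localhost:11434/api/generate -d '{"model": "deepseek-r1:32b", "prompt": "Why is the sky blue?"}'

Functional, but hardly the cozy chat experience we’re after.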

Open WebUI is an open-source, web-based chat interface designed to make interacting with local AI models as seamless as using ChatGPT or any other LLM-powered chatbot. It provides a clean, modern UI with features like chat history, multiple model support, and function calling. By setting up Open WebUI, we can create a familiar, user-friendly chat experience – whether on our PC or even remotely from your phone!

Here’s how to deploy your Open WebUI:

Linux and Windows:

  1. Open Terminal (Linux) or the Command Prompt (Windows)
  2. Run the following command to pull and start the Open WebUI container:
docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main
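Two flags here are worth calling out: --restart always means Open WebUI comes back up on its own whenever Docker restarts, and --add-host=host.docker.internal:host-gateway lets the container reach services running on your PC itself – which is exactly how it will find Ollama when we connect the two in Step 4.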

Mac:

  1. Open Terminal.
  2. Check your Python version to ensure you have Python 3.11 or greater by running this command (it may be python3 instead of python – check both):
python --version
  3. Install Open WebUI (use pip if python worked in the previous step, otherwise try pip3) by running the following:
pip install open-webui
  4. Begin the Open WebUI server:
open-webui serve --port 3000

Note: the default port for Open WebUI is 8080, which is also a popular port for other software that serves itself via endpoints. To avoid conflicts, I’ve changed the port to 3000.
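One optional nicety for Mac users: if you’d rather not install Open WebUI into your system-wide Python, you can run the same commands inside a virtual environment (standard Python tooling; the environment name is whatever you like):

python3 -m venv openwebui-env
source openwebui-env/bin/activate
pip install open-webui
open-webui serve --port 3000

Just remember to activate the environment again (the source line) before running the serve command after a restart.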

Confirm the Open WebUI server is running by opening a browser and navigating to http://localhost:3000. If you see the login screen, then the install was a success!

Step 4: Connecting Ollama to Open WebUI

We now have two of the main components of this setup up and running. Let’s get them connected, so that we can use Open WebUI to chat with DeepSeek R1!

  1. Put any email and password combination you wish into the input fields of the Open WebUI login screen. If this is the first instance of Open WebUI on this machine (and/or no prior accounts have been made), this login will create the administrator account.
  2. Next, enter your name and complete the other fields.
  3. You should now see a chat input field. We’ll leave that for the moment. Click the icon in the top left corner of the screen to show the sidebar menu.
  4. Click your name in the bottom right corner, and select “Admin Panel”
  5. Along the top of the Admin Panel menu, you should see Users, Evaluations, Functions and Settings. Click Settings.
  6. In the menu that appears on the left, click Connections.
  7. In the Ollama API submenu, under Manage Ollama API Connections, click the Cog icon on the right side.
  8. Under the URL field, put either:

Mac:

http://localhost:11434

Docker (Windows or Linux):

http://host.docker.internal:11434
  9. Click the Verify Connection icon, which is denoted by the clockwise arrows highlighted in red in the image above.
  10. If you receive the “Server Connection Verified” message pop up in the top right corner of the screen, you have successfully connected Ollama to Open WebUI!
  11. Close the Edit Connection screen.
  12. In the top right corner, click on the pen and paper icon to start a New Chat.
  13. Check to see if “deepseek-r1:xx” is in the top left corner of the chat window. If it is not, then click the little down arrow icon next to either “Arena Model” or “Select a model”. In the dropdown, select “deepseek-r1:xx”.

Now, we should be ready to chat with DeepSeek R1! Send it a message. How did it go?

Congratulations! At this point, you have successfully deployed your own private AI – a large language model and chat interface – on your own computer. You can now open http://localhost:3000 in the browser on your PC and chat with your model anytime.

I’d recommend Googling for guides on using Open WebUI and all of the cool features it carries (including adding new models to your Ollama instance as they get released, via the Settings menu here!), with more being added by the community with each release.

Although it would be nice to have this on your iPhone, so you can chat with it on the go… wouldn’t it?

Step 5: Making Open WebUI accessible from anywhere with ZeroTier

ZeroTier is a private virtual network service, which allows devices across multiple physical networks to interact as if they were on the same local network. In plain language, it is a service that will allow you to connect any device you like (such as your iPhone!) to the PC hosting our Ollama and Open WebUI instances – so that those devices can access them!

There are many services like ZeroTier, but having worked with this one in several other projects, I chose ZeroTier because it’s familiar, often very fast, easy to set up and manage, and free!

First, we install ZeroTier onto the PC that is hosting Ollama and Open WebUI:

  1. Create a ZeroTier account.
  2. Once logged in, click “Create a Network”. A table will appear below with your new network. Click on it, and keep this window/tab open, as we will need it for later.
  3. Download ZeroTier onto your PC and follow the prompts until the installation is complete.
  4. Next, follow these steps to connect your new Network to the newly installed instance of ZeroTier on your PC. You should see instructions specific to your operating system.
  5. Navigate back to your browser (remember the tab I mentioned earlier?) and authenticate the PC on your Network.
  6. Take note of the IP address assigned to your machine, which you will find in the table labelled Members.

Repeat Steps 3-5 for any new device you wish to add to this same network, including your iPhone. Let’s do it!

Step 6: Install ZeroTier onto iPhone

Don’t have an iPhone? Well, simply replace the word “iPhone” with the word “Android device” in the below steps, use the Google Play Store instead of the Apple App Store to download ZeroTier, and it should all work the same.

  1. On your iPhone, download the ZeroTier app from the App Store.
  2. Open the app and log in.
  3. Click the + icon in the top right corner to add a Network.
  4. Enter the Network ID from your ZeroTier dashboard.
  5. You should see your network added to the list, with the switch on the right being toggled to the On position.
  6. Navigate back to your browser (remember the tab I mentioned earlier?) and authenticate your iPhone on your Network. Wait 30 seconds.
  7. Back in the iOS app, click on your Network. Under “Managed IPs”, you should see an IP address. If so, this means your device has successfully connected to your ZeroTier network!
  8. While here, toggle “Enable on Demand” to on, which will allow the device to connect to the other devices at all times.

So… now what? Now, we can access our Open WebUI instance from any device within the network!

How? Well, instead of navigating to http://localhost:3000, you will navigate to the ZeroTier IP address of the machine hosting Ollama and Open WebUI. Recall that we took down this IP address in the previous step.

To test, on your iPhone, open a browser and navigate to

http://{Host PC’s ZeroTier IP Address}:3000

Where {Host PC’s ZeroTier IP Address} is the ZeroTier address of the host machine that we took down earlier.

Did it appear? Great! You can now chat to your DeepSeek R1 instance… from anywhere!

Step 7: Add Open WebUI to your iPhone’s home screen

Similarly to the last step, although the interface will be slightly different, you can also replicate these steps on any Android device.

  1. Open Safari on your iPhone and navigate to http://{Host PC’s ZeroTier IP Address}:3000
  2. Click the share button in the middle of the bottom navigation menu (red square in image below).

  3. Scroll to see the option “Add to Home Screen”.

  4. Give the page a name – I simply went with “Ollama”. It will appear like the name of an app on your Home Screen.

  5. Go to your Home Screen and open the app.

  6. Notice the lack of URL bar or anything related to the browser? It’s as if Open WebUI is its own smartphone app!

Step 8: Wait, am I… done?

Yup.

As long as your PC is turned on and the services are running (on Mac, after each restart, you will need to manually restart Ollama and Open WebUI by running the “serve” commands in Steps 2 and 3), you will be able to access your own private AI chat application from anywhere!

Great work.

Ways to improve this setup

This setup works great, but like all good things, it can be made even better. Here are a few ways to enhance your private AI experience.

Upgrade Your GPU: A higher-end GPU will improve model performance and reduce latency, even with the same model. One with more VRAM will allow you to deploy even larger, more capable models.

Other LLMs, including multimodal LLMs: The DeepSeek R1 model series is a text generation model. Ollama natively supports multimodal models that can take images as input, right from your phone – such as llama3.2-vision; see the example below. You can learn more about adding more models to your Ollama server through your Open WebUI instance (yup, even from your phone!) using this guide.
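As a taste of how easy that is, pulling a new model into the Ollama container we set up earlier is a single command (shown here with llama3.2-vision – swap in any model from the Ollama library):

docker exec ollama ollama pull llama3.2-vision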

Set Up a Dedicated Server: For better reliability, consider hosting your AI deployments on a dedicated server rather than your personal PC.


Conclusion

Deploying DeepSeek R1 on your iPhone is a fantastic way to explore the world of local AI while maintaining control over your data. By combining Docker, ZeroTier, and Ollama, you’ve created a powerful toolset that puts AI in your hands—literally. Whether you’re experimenting with new models or simply enjoying the convenience of private AI, this setup is sure to impress. Happy exploring!*

*This conclusion was written with the DeepSeek R1 32B instance deployed using this guide. How did it do?
