Editor's Picks - Translated Technical Articles · January 24

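Running DeepSeek-R1 on Your Laptop with Ollama

Yesterday, DeepSeek released a series of very powerful language models, including DeepSeek R1 and a number of distilled (smaller) models based on the Qwen and Llama architectures. These models have caused quite a stir in the AI community because of their performance, their reasoning capabilities and, most importantly, the fact that they are open source under the MIT license.

I have been testing these models both through DeepSeek's own API and locally on my MacBook Pro, and I have to say the performance is impressive, even for smaller models such as the 8B and 14B. Here is a benchmark comparing DeepSeek R1 with other state-of-the-art models from OpenAI and Anthropic.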

[Figure: DeepSeek R1 benchmarks]

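In this guide, I will walk you through setting up Ollama and running the latest DeepSeek R1 models locally on your own machine. Before we start, let's take a look at the models themselves.

DeepSeek R1

DeepSeek R1 is a large language model focused on reasoning. It can handle tasks that require multi-step problem solving and logical thinking. The model uses a special training approach that puts more emphasis on reinforcement learning (RL) than on supervised fine-tuning (SFT). This approach helps the model get better at working out problems on its own.

The model is open source, which means its weights are available under the MIT license. This allows people to use it commercially, modify it, and build new versions on top of it. That sets it apart from many other large language models, which are not open source.

Distilled models: smaller but still powerful

DeepSeek AI has also released smaller versions of the model. These distilled models come in different sizes, such as 1.5B, 7B, 8B, 14B, 32B, and 70B parameters. They are based on the Qwen and Llama architectures. These smaller models retain much of the reasoning ability of the larger model but are easier to use on a personal computer.

The smaller models, especially 8B and below, can run on ordinary computers with a CPU, a GPU, or Apple Silicon. This makes it easy for people to experiment at home.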

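What is Ollama?

Ollama is a tool that lets you run and manage large language models (LLMs) on your own computer. It makes downloading, running, and using these models easier, without the need for a powerful server. Ollama supports several operating systems, including macOS, Linux, and Windows. It is designed to be simple to use, with basic commands for pulling, running, and managing models.

Ollama also provides a way to use models through an API, which lets you integrate them into other applications. Importantly, Ollama offers an experimental compatibility layer for the OpenAI API. This means you can often use existing applications and tools designed for OpenAI with a local Ollama server. It can be configured to use a GPU for faster processing, and it offers features such as custom model creation and model sharing. Ollama is a great way to explore and use LLMs without relying on cloud services.

Installing Ollama

Before you can use the DeepSeek models, you need to install Ollama. Here is how to do it on each operating system: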

macOS

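1.  Go to the Ollama website and download the macOS installer.
2.  Open the downloaded file and drag the Ollama application into your Applications folder.
3.  Launch the Ollama application. It runs in the background and shows up in your menu bar.
4.  Open a terminal and type `ollama -v` to check that the installation succeeded.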

Linux

Open a terminal and run the following command to install Ollama:

    curl -fsSL https://ollama.com/install.sh | sh

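If you prefer a manual install, download the correct `.tgz` package from the Ollama website. Then extract the package to `/usr` with the following commands: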

    curl -L https://ollama.com/download/ollama-linux-amd64.tgz -o ollama-linux-amd64.tgz
    sudo tar -C /usr -xzf ollama-linux-amd64.tgz

To start Ollama, run `ollama serve`. You can check that it is working by typing `ollama -v` in another terminal.

For a more robust setup, create a systemd service. First, create a user and group for Ollama:

    sudo useradd -r -s /bin/false -U -m -d /usr/share/ollama ollama
    sudo usermod -a -G ollama $(whoami)

Then create a service file at `/etc/systemd/system/ollama.service` with the following contents:

    [Unit]
    Description=Ollama Service
    After=network-online.target

    [Service]
    ExecStart=/usr/bin/ollama serve
    User=ollama
    Group=ollama
    Restart=always
    RestartSec=3
    Environment="PATH=$PATH"

    [Install]
    WantedBy=default.target

Finally, reload systemd, then enable and start the service and check its status:

    sudo systemctl daemon-reload
    sudo systemctl enable ollama
    sudo systemctl start ollama
    sudo systemctl status ollama

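Windows

1.  Go to the Ollama website and download the Windows installer (`OllamaSetup.exe`).
2.  Run the installer. Ollama is installed in your user profile.
3.  Ollama runs in the background and shows up in your system tray.
4.  Open Command Prompt or PowerShell and type `ollama -v` to check that the installation succeeded.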
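
If you would rather run the same check from a script, the sketch below wraps `ollama -v` in Python. The function name `ollama_version` is my own, and I am assuming only that the binary is named `ollama` and is on your `PATH`, as in the steps above.

```python
import shutil
import subprocess
from typing import Optional


def ollama_version(binary: str = "ollama") -> Optional[str]:
    """Return the output of `<binary> -v`, or None if the binary is not on PATH."""
    path = shutil.which(binary)
    if path is None:
        return None
    result = subprocess.run(
        [path, "-v"], capture_output=True, text=True, timeout=10, check=False
    )
    # The version text is expected on stdout; fall back to stderr just in case.
    return (result.stdout or result.stderr).strip() or None
```

Calling `ollama_version()` returns `None` when Ollama is not installed, so it can double as a quick pre-flight check before the rest of a script runs.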

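Understanding Ollama commands

Ollama uses simple commands to manage models. Here are the key ones you will need:

*   `ollama -v`: Check the installed version of Ollama.
*   `ollama pull <model_name>:<tag>`: Download a model from the Ollama library.
*   `ollama run <model_name>:<tag>`: Run a model and start an interactive chat session.
*   `ollama create <model_name> -f <Modelfile>`: Create a custom model from a Modelfile.
*   `ollama show <model_name>`: Show information about a model.
*   `ollama ps`: List the models that are currently running.
*   `ollama stop <model_name>`: Unload a model from memory.
*   `ollama cp <source_model> <destination_model>`: Copy a model.
*   `ollama rm <model_name>`: Delete a model.
*   `ollama push <model_name>:<tag>`: Upload a model to the model library.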
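
To drive a few of these commands from a script, a thin wrapper around `subprocess` is enough. This is a minimal sketch: `model_ref`, `ollama_argv`, and `run_ollama` are hypothetical helper names of mine, and I have limited the wrapper to the commands that take a single `<model_name>:<tag>` argument.

```python
import subprocess
from typing import List, Optional

# Commands from the list above that take one model reference as their argument.
SINGLE_MODEL_COMMANDS = {"pull", "run", "show", "stop"}


def model_ref(name: str, tag: Optional[str] = None) -> str:
    """Join a model name and an optional tag into `name:tag`."""
    return f"{name}:{tag}" if tag else name


def ollama_argv(command: str, name: str, tag: Optional[str] = None) -> List[str]:
    """Build the argument list for an `ollama <command> <model>` call."""
    if command not in SINGLE_MODEL_COMMANDS:
        raise ValueError(f"unsupported command: {command}")
    return ["ollama", command, model_ref(name, tag)]


def run_ollama(command: str, name: str, tag: Optional[str] = None) -> int:
    """Run the command and return its exit code (requires Ollama to be installed)."""
    return subprocess.run(ollama_argv(command, name, tag), check=False).returncode
```

For example, `run_ollama("pull", "deepseek-r1", "7b")` runs `ollama pull deepseek-r1:7b`.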

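DeepSeek models on Ollama

The Ollama library offers DeepSeek models in different sizes and formats. Here is a breakdown:

*   Model sizes: the models come in several sizes, such as 1.5b, 7b, 8b, 14b, 32b, 70b, and 671b. The "b" stands for billions of parameters. Larger models generally perform better but need more resources.
*   Quantized versions: some models have quantized versions (for example `q4_K_M` and `q8_0`). These use less memory and run faster, at the cost of a possible small drop in quality.
*   Distilled versions: DeepSeek also provides distilled versions (for example `qwen-distill` and `llama-distill`). These are smaller models trained to behave like the larger model, balancing performance against resource use.
*   Tags: each model has a `latest` tag as well as specific tags that indicate size, quantization, and distillation method.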
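
To get a feel for how size and quantization trade off, here is some back-of-the-envelope arithmetic of my own (not a figure from Ollama or DeepSeek): the weights alone take roughly parameters × bits per weight ÷ 8 bytes. The bit widths below are nominal assumptions. Real quantized files store extra scaling data and come out somewhat larger, and the estimate ignores the context cache and runtime overhead.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


# Nominal bit widths (an assumption for illustration, not measured file sizes).
NOMINAL_BITS = {"fp16": 16, "q8_0": 8, "q4_K_M": 4}

for size in (7, 14, 70):
    row = ", ".join(
        f"{quant}: ~{weight_memory_gb(size, bits):.1f} GB"
        for quant, bits in NOMINAL_BITS.items()
    )
    print(f"{size}B -> {row}")
```

By this estimate a 7B model needs about 14 GB at fp16 but only around 3.5 GB at 4 bits, which is why the quantized tags are the practical choice on a laptop.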

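Using the DeepSeek models

Here is how to use the DeepSeek models with Ollama:

Pulling a model

To download a DeepSeek model, use the following command:

    ollama pull deepseek-r1:<model_tag>

Replace `<model_tag>` with the specific tag of the model you want to use. For example:

To download the latest 7B model: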

    ollama pull deepseek-r1:7b

To download the 14B Qwen distilled model with `q4_K_M` quantization:

    ollama pull deepseek-r1:14b-qwen-distill-q4_K_M

To download the 70B Llama distilled model at `fp16` precision:

    ollama pull deepseek-r1:70b-llama-distill-fp16

Here are some of the available tags:

*   `latest`
*   `1.5b`
*   `7b`
*   `8b`
*   `14b`
*   `32b`
*   `70b`
*   `671b`
*   `1.5b-qwen-distill-fp16`
*   `1.5b-qwen-distill-q4_K_M`
*   `1.5b-qwen-distill-q8_0`
*   `14b-qwen-distill-fp16`
*   `14b-qwen-distill-q4_K_M`
*   `14b-qwen-distill-q8_0`
*   `32b-qwen-distill-fp16`
*   `32b-qwen-distill-q4_K_M`
*   `32b-qwen-distill-q8_0`
*   `70b-llama-distill-fp16`
*   `70b-llama-distill-q4_K_M`
*   `70b-llama-distill-q8_0`
*   `7b-qwen-distill-fp16`
*   `7b-qwen-distill-q4_K_M`
*   `7b-qwen-distill-q8_0`
*   `8b-llama-distill-fp16`
*   `8b-llama-distill-q4_K_M`
*   `8b-llama-distill-q8_0`
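
The tags above follow a visible pattern: a size, optionally followed by the base architecture, the word `distill`, and a quantization. The sketch below splits a tag into those parts. The pattern is inferred from this list only, so treat it as an assumption rather than a naming rule guaranteed by Ollama. `TagInfo` and `parse_tag` are names I made up.

```python
import re
from typing import NamedTuple, Optional


class TagInfo(NamedTuple):
    size: Optional[str]   # e.g. "14b"
    base: Optional[str]   # "qwen" or "llama"
    quant: Optional[str]  # e.g. "q4_K_M", "q8_0", "fp16"


# Pattern inferred from the tag list above, not an official grammar.
_TAG = re.compile(
    r"^(?P<size>\d+(?:\.\d+)?b)"
    r"(?:-(?P<base>qwen|llama)-distill-(?P<quant>.+))?$"
)


def parse_tag(tag: str) -> TagInfo:
    """Split a deepseek-r1 tag into size, base architecture, and quantization."""
    match = _TAG.match(tag)
    if match is None:
        return TagInfo(None, None, None)  # e.g. "latest"
    return TagInfo(match["size"], match["base"], match["quant"])


print(parse_tag("14b-qwen-distill-q4_K_M"))
```

This makes it easy to filter a list of tags, for example keeping only the `q4_K_M` variants that fit in your machine's memory.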

Running a model

Once the model is downloaded, you can run it with the following command:

    ollama run deepseek-r1:<model_tag>

For example:

To run the latest 7B model:

    ollama run deepseek-r1:7b

To run the 14B Qwen distilled model with `q4_K_M` quantization:

    ollama run deepseek-r1:14b-qwen-distill-q4_K_M

To run the 70B Llama distilled model at `fp16` precision:

    ollama run deepseek-r1:70b-llama-distill-fp16

This starts an interactive chat session where you can ask the model questions.

Using the API

You can also use the Ollama API with the DeepSeek models. Here is an example using `curl`:

    curl http://localhost:11434/api/generate -d '{
      "model": "deepseek-r1:7b",
      "prompt": "Write a short poem about the stars."
    }'
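
As far as I know, `/api/generate` streams its answer by default as one JSON object per line, each carrying a piece of text in a `response` field and a `done` flag on the last one (you can send `"stream": false` to get a single object instead). Here is a minimal sketch of stitching such a stream back together. The sample lines are hand-written in that assumed shape and are not real model output.

```python
import json
from typing import Iterable


def join_generate_stream(lines: Iterable[str]) -> str:
    """Concatenate the `response` pieces of a newline-delimited JSON stream."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank keep-alive lines
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(parts)


# Hand-written sample in the assumed chunk shape (not real model output).
sample = [
    '{"response": "Twinkle", "done": false}',
    '{"response": ", twinkle", "done": false}',
    '{"response": "", "done": true}',
]
print(join_generate_stream(sample))  # prints: Twinkle, twinkle
```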

For chat completions:

    curl http://localhost:11434/api/chat -d '{
      "model": "deepseek-r1:7b",
      "messages": [
        {
          "role": "user",
          "content": "Write a short poem about the stars."
        }
      ]
    }'
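
The chat endpoint takes the whole conversation on every request, so a multi-turn chat is just a matter of keeping the message list around. Below is a sketch using only the standard library. `build_chat_request` and `chat` are hypothetical helper names, I set `"stream": false` to get one JSON object back, and I am assuming the reply sits under a `message` key in that object.

```python
import json
import urllib.request
from typing import Dict, List

OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"


def build_chat_request(
    model: str, messages: List[Dict[str, str]]
) -> urllib.request.Request:
    """Build a non-streaming POST request for the chat endpoint."""
    body = {"model": model, "messages": messages, "stream": False}
    return urllib.request.Request(
        OLLAMA_CHAT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(model: str, history: List[Dict[str, str]], user_text: str) -> str:
    """Send one user turn and record the reply (needs a running Ollama server)."""
    history.append({"role": "user", "content": user_text})
    with urllib.request.urlopen(build_chat_request(model, history)) as resp:
        reply = json.load(resp)["message"]  # assumed shape: {"role": ..., "content": ...}
    history.append(reply)
    return reply["content"]
```

With a server running, `history = []` followed by repeated calls to `chat("deepseek-r1:7b", history, "...")` carries the context from one turn to the next.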

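Using the OpenAI-compatible API

Ollama provides an experimental compatibility layer that covers parts of the OpenAI API. This lets you use existing applications and tools designed for OpenAI with a local Ollama server.

Key concepts

*   API endpoint: Ollama's OpenAI-compatible API is served at `http://localhost:11434/v1`.
*   Authentication: Ollama's API does not require an API key for local use. You can usually pass a placeholder such as `"ollama"` as the `api_key` parameter in your client.
*   Partial compatibility: Ollama's compatibility is experimental and partial. Not every feature of the OpenAI API is supported, and there may be some differences in behavior.
*   Focus on core functionality: Ollama mainly aims to support the core features of the OpenAI API, such as chat completions, text completions, model listing, and embeddings.

Supported endpoints and features

Here are the supported endpoints and what each one offers: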

/v1/chat/completions

*   **Purpose:** Generate chat-style responses.

*   **Supported Features:**

    *   Chat completions (multi-turn conversations).

    *   Streaming responses (real-time output).

    *   JSON mode (structured JSON output).

    *   Reproducible outputs (using a `seed`).

    *   Vision (multimodal models like `llava` that can process images).

    *   Tools (function calling).

*   **Supported Request Fields:**

    *   `model`: The name of the Ollama model to use.

    *   `messages`: An array of message objects, each with a `role` (`system`, `user`, `assistant`, or `tool`) and `content` (text or image).

    *   `frequency_penalty`, `presence_penalty`: Controls repetition.

    *   `response_format`: Specifies the output format (e.g. `json`).

    *   `seed`: For reproducible outputs.

    *   `stop`: Sequences to stop generation.

    *   `stream`: Enables/disables streaming.

    *   `stream_options`: Additional options for streaming.

        * `include_usage`: Includes usage information in the stream.

    *   `temperature`: Controls randomness.

    *   `top_p`: Controls diversity.

    *   `max_tokens`: Maximum tokens to generate.

    *   `tools`: List of tools the model can access.
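
To show how a few of these fields fit together, here is a sketch that builds a request body for `/v1/chat/completions` and reads the reply out of a response. It works on plain dictionaries so it runs without a server. The helper names are mine, and the sample response is hand-written in the standard OpenAI chat completion shape rather than captured from Ollama.

```python
from typing import Any, Dict, List, Optional


def build_chat_completion_body(
    model: str,
    messages: List[Dict[str, str]],
    *,
    seed: Optional[int] = None,
    temperature: Optional[float] = None,
    max_tokens: Optional[int] = None,
    stream: bool = False,
) -> Dict[str, Any]:
    """Build a JSON body for /v1/chat/completions, omitting unset fields."""
    body: Dict[str, Any] = {"model": model, "messages": messages, "stream": stream}
    optional = {"seed": seed, "temperature": temperature, "max_tokens": max_tokens}
    body.update({key: value for key, value in optional.items() if value is not None})
    return body


def first_reply(response: Dict[str, Any]) -> str:
    """Return the text of the first choice in a chat completion response."""
    return response["choices"][0]["message"]["content"]


body = build_chat_completion_body(
    "deepseek-r1:7b",
    [{"role": "user", "content": "Say this is a test"}],
    seed=42,          # fixed seed for reproducible output
    temperature=0.0,  # minimal randomness
)

# Hand-written sample response (illustrative only).
sample_response = {
    "choices": [{"message": {"role": "assistant", "content": "This is a test."}}]
}
print(first_reply(sample_response))  # prints: This is a test.
```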

/v1/completions

*   **Purpose:** Generate text completions.

*   **Supported Features:**

    *   Text completions (single-turn generation).

    *   Streaming responses.

    *   JSON mode.

    *   Reproducible outputs.

*   **Supported Request Fields:**

    *   `model`: The name of the Ollama model.

    *   `prompt`: The input text.

    *   `frequency_penalty`, `presence_penalty`: Controls repetition.

    *   `seed`: For reproducible outputs.

    *   `stop`: Stop sequences.

    *   `stream`: Enables/disables streaming.

    *   `stream_options`: Additional options for streaming.

        * `include_usage`: Includes usage information in the stream.

    *   `temperature`: Controls randomness.

    *   `top_p`: Controls diversity.

    *   `max_tokens`: Maximum tokens to generate.

    *   `suffix`: Text to append after the model's response.

/v1/models
/v1/models/{model}
/v1/embeddings

How to use Ollama with OpenAI clients

Here is how to configure popular OpenAI clients to work with Ollama:

OpenAI Python library:

    from openai import OpenAI

    client = OpenAI(
        base_url='http://localhost:11434/v1/',
        api_key='ollama',  # Required but ignored
    )

    # Example chat completion
    chat_completion = client.chat.completions.create(
        messages=[
            {'role': 'user', 'content': 'Say this is a test'},
        ],
        model='deepseek-r1:7b',
    )

    # Example text completion
    completion = client.completions.create(
        model="deepseek-r1:7b",
        prompt="Say this is a test",
    )

    # Example list models
    list_completion = client.models.list()

    # Example get model info
    model = client.models.retrieve("deepseek-r1:7b")

OpenAI JavaScript library:

    import OpenAI from 'openai';

    const openai = new OpenAI({
      baseURL: 'http://localhost:11434/v1/',
      apiKey: 'ollama', // Required but ignored
    });

    // Example chat completion
    const chatCompletion = await openai.chat.completions.create({
      messages: [{ role: 'user', content: 'Say this is a test' }],
      model: 'deepseek-r1:7b',
    });

    // Example text completion
    const completion = await openai.completions.create({
      model: "deepseek-r1:7b",
      prompt: "Say this is a test.",
    });

    // Example list models
    const listCompletion = await openai.models.list()

    // Example get model info
    const model = await openai.models.retrieve("deepseek-r1:7b")

curl (direct API calls):

    # Chat completion
    curl http://localhost:11434/v1/chat/completions \
        -H "Content-Type: application/json" \
        -d '{
            "model": "deepseek-r1:7b",
            "messages": [
                {
                    "role": "user",
                    "content": "Hello!"
                }
            ]
        }'

    # Text completion
    curl http://localhost:11434/v1/completions \
        -H "Content-Type: application/json" \
        -d '{
            "model": "deepseek-r1:7b",
            "prompt": "Say this is a test"
        }'

    # List models
    curl http://localhost:11434/v1/models

    # Get model info
    curl http://localhost:11434/v1/models/deepseek-r1:7b

    # Embeddings
    curl http://localhost:11434/v1/embeddings \
        -H "Content-Type: application/json" \
        -d '{
            "model": "all-minilm",
            "input": ["why is the sky blue?", "why is the grass green?"]
        }'
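
The embeddings call returns one vector per input string, and a common next step is to compare them with cosine similarity. The sketch below assumes the standard OpenAI response shape (`data[i].embedding` with an `index` field). The tiny vectors are made up for illustration and are not real `all-minilm` output.

```python
import math
from typing import Any, Dict, List, Sequence


def embeddings_from_response(response: Dict[str, Any]) -> List[List[float]]:
    """Extract the vectors from an embeddings response, in input order."""
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm


# Made-up response with 2-dimensional vectors (illustrative only).
sample_response = {
    "data": [
        {"index": 1, "embedding": [4.0, 3.0]},
        {"index": 0, "embedding": [3.0, 4.0]},
    ]
}
first, second = embeddings_from_response(sample_response)
print(round(cosine_similarity(first, second), 2))  # prints: 0.96
```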

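Choosing the right model

When choosing a DeepSeek model, consider the following factors:

*   Size: larger models generally perform better but need more resources. If your resources are limited, start with a smaller model.
*   Quantization: quantized models use less memory but may be of slightly lower quality.
*   Distillation: distilled models offer a good balance between performance and resource use.

It is best to try different models and see which one works best for you.

Additional tips

*   Be sure to check the Ollama library for the latest models and tags.
*   Use `ollama ps` to monitor the resources your models are using.
*   You can adjust parameters such as `temperature`, `top_p`, and `num_ctx` to fine-tune the model's output.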
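
With the native API, these parameters go in an `options` object inside the request body. Here is a small sketch that builds such a body for `/api/generate`. The helper name and the example values are my own choices, not recommendations.

```python
from typing import Any, Dict, Optional


def build_generate_body(
    model: str,
    prompt: str,
    *,
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
    num_ctx: Optional[int] = None,
) -> Dict[str, Any]:
    """Build a non-streaming /api/generate body with optional sampling options."""
    candidates = {"temperature": temperature, "top_p": top_p, "num_ctx": num_ctx}
    options = {key: value for key, value in candidates.items() if value is not None}
    body: Dict[str, Any] = {"model": model, "prompt": prompt, "stream": False}
    if options:
        body["options"] = options  # only sent when at least one option is set
    return body


body = build_generate_body(
    "deepseek-r1:7b",
    "Write a short poem about the stars.",
    temperature=0.6,  # example value
    num_ctx=4096,     # example context window size
)
print(body["options"])
```

Send the resulting dictionary as JSON to `http://localhost:11434/api/generate`, just like the curl example earlier in this guide.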

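Troubleshooting

If you run into any problems, check the Ollama logs:

*   macOS: `~/.ollama/logs/server.log`
*   Linux: `journalctl -u ollama --no-pager`
*   Windows: `%LOCALAPPDATA%\Ollama\server.log`

You can also set the `OLLAMA_DEBUG=1` environment variable to get more detailed logs.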
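
If you script your troubleshooting, the sketch below picks the right log location for the current platform, using only the three locations listed above. `ollama_log_location` is a hypothetical helper name. On Linux it returns the `journalctl` command to run rather than a file path.

```python
import os
import sys
from pathlib import Path
from typing import List, Union


def ollama_log_location(platform: str = sys.platform) -> Union[Path, List[str]]:
    """Return the server log path (macOS, Windows) or the journalctl command (Linux)."""
    if platform == "darwin":
        return Path.home() / ".ollama" / "logs" / "server.log"
    if platform.startswith("win"):
        return Path(os.environ.get("LOCALAPPDATA", "")) / "Ollama" / "server.log"
    # Linux with the systemd service set up as described in this guide.
    return ["journalctl", "-u", "ollama", "--no-pager"]


print(ollama_log_location())
```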

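Going further with LLMs

Of course, running these models locally is just the beginning. You can use the API to integrate them into your own applications and build custom tools such as chatbots, research tools with retrieval-augmented generation (RAG), and more.

I have written a few guides on exploring these models further, such as:

*   Setting up Postgres and pgvector with Docker for building RAG applications - A step-by-step guide to setting up Postgres and pgvector with Docker for RAG (retrieval-augmented generation).
*   A deep dive into vector similarity search in Postgres with pgvector - Learn how pgvector makes vector similarity search in Postgres easier. Explore the functions for creating indexes, querying vectors, and more.
*   Building AI agents in Node with the AI SDK - Learn how to build AI agents in Node using the AI SDK to automate workflows and tasks.
*   How to enrich customer data with LLMs and web crawling - Learn how to use LLMs and Puppeteer to crawl customer websites and enrich the data in your SaaS product.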