The World of Probability: How LLMs Choose Your Next Word

Introduction

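Large language models (LLMs) such as ChatGPT generate text by repeatedly "predicting the next word" (strictly speaking, the next token).

This prediction is not purely deterministic: the model samples from a probability distribution.

"Why do I get a different answer every time, even when I ask the same question?"

The key lies in Softmax and the sampling strategy.

This article summarizes how an LLM chooses the next word, using formulas, diagrams, and code, and keeps things concise so that beginners can follow along.

A fuller, more practical sample implementation is attached at the end. Because it is long, it is collapsed; expand it to view.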

1. The Basics of Word Generation: Probability Distributions

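For example, consider the following sentence:

Yesterday I went to the ○○.

For the "○○" slot, the model might internally output scores (logits) like these for the candidate words:

school → 2.0
office → 1.0
park → 0.5

These scores are not used as probabilities directly. The Softmax function converts them into a probability distribution.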

2. The Role of Softmax

Softmax is a function that normalizes a set of scores into probabilities.

$$
P(w_i) = \frac{e^{z_i}}{\sum_j e^{z_j}}
$$

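$z_i$ … the score (logit) the model outputs for candidate $i$
$P(w_i)$ … the probability that word $w_i$ is chosen

Because every term is positive and all terms are divided by the same sum, each probability is positive and the probabilities add up to 1.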
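
As a quick check of the formula, the three logits from section 1 (2.0, 1.0, 0.5) can be plugged in by hand. The values below are rounded to three decimals, and the labels are simply the candidate words of the example sentence:

$$
\begin{aligned}
P(\text{school}) &= \frac{e^{2.0}}{e^{2.0} + e^{1.0} + e^{0.5}} \approx \frac{7.389}{7.389 + 2.718 + 1.649} \approx 0.629 \\
P(\text{office}) &= \frac{e^{1.0}}{e^{2.0} + e^{1.0} + e^{0.5}} \approx \frac{2.718}{11.756} \approx 0.231 \\
P(\text{park}) &= \frac{e^{0.5}}{e^{2.0} + e^{1.0} + e^{0.5}} \approx \frac{1.649}{11.756} \approx 0.140
\end{aligned}
$$

The three values sum to 1, and the ranking of the candidates is the same as the ranking of their logits.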

3. Differences Between Sampling Methods

Once the probability distribution is available, there are several ways to actually choose a word.

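3.1 Greedy Decoding (Greedy)

Always picks the word with the highest probability, so no randomness is involved
Output is stable, but the text tends to become monotonous and bland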
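
A minimal sketch of the greedy rule, assuming the toy logits from section 1. The helper name `greedy_pick` is made up for this illustration. Because Softmax preserves the ordering of the scores, taking the largest logit gives the same answer as taking the largest probability:

```python
import numpy as np

words = ["school", "office", "park"]
logits = np.array([2.0, 1.0, 0.5])  # toy scores from the example sentence

def greedy_pick(words, logits):
    """Return the candidate with the largest logit (hypothetical helper, no randomness)."""
    return words[int(np.argmax(logits))]

# Dividing by a positive temperature never changes which logit is largest,
# so the greedy choice is identical at every temperature.
picks = [greedy_pick(words, logits / t) for t in (0.5, 1.0, 2.0)]
print(picks)  # ['school', 'school', 'school']
```

This is why greedy output is repeatable: the same logits always lead to the same word.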

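3.2 Top-k Sampling

Keeps only the k most probable candidates, renormalizes their probabilities, and samples from them
Example: with k=2, only "school" and "office" are considered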
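
One common way to sketch Top-k is to mask every logit outside the top k with negative infinity before applying Softmax, so the discarded words get probability exactly 0. The function name `top_k_probs` is an assumption for this example, and it assumes 1 ≤ k ≤ number of candidates:

```python
import numpy as np

def top_k_probs(logits, k):
    """Softmax restricted to the k largest logits; every other entry gets probability 0."""
    logits = np.asarray(logits, dtype=float)
    keep = np.argsort(logits)[-k:]            # indices of the k largest logits
    masked = np.full_like(logits, -np.inf)    # exp(-inf) == 0, so masked words vanish
    masked[keep] = logits[keep]
    exp = np.exp(masked - masked.max())       # subtract the max for numerical stability
    return exp / exp.sum()

words = ["school", "office", "park"]
probs = top_k_probs([2.0, 1.0, 0.5], k=2)
print(dict(zip(words, probs.round(3).tolist())))
# {'school': 0.731, 'office': 0.269, 'park': 0.0}

rng = np.random.default_rng(0)                # fixed seed only for repeatability
print(rng.choice(words, p=probs))             # either "school" or "office", never "park"
```

Compared with the full distribution, the probability mass that belonged to the removed candidates is redistributed proportionally over the k survivors.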

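3.3 Top-p Sampling (Nucleus)

Adds candidates in descending order of probability until their cumulative probability reaches p, then samples from that set
The number of candidates changes dynamically with the shape of the distribution, which makes it more flexible than a fixed k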
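
To make the "dynamic number of candidates" concrete, here is a small sketch that only computes the size of the nucleus. The function `nucleus` and the two example distributions are invented for illustration:

```python
import numpy as np

def nucleus(probs, p):
    """Indices of the smallest high-probability set whose total mass is at least p."""
    probs = np.asarray(probs, dtype=float)
    order = np.argsort(probs)[::-1]                 # most likely candidate first
    cumulative = np.cumsum(probs[order])
    size = int(np.searchsorted(cumulative, p)) + 1  # first position where the sum reaches p
    return order[:min(size, len(probs))]            # clamp in case of rounding near p = 1

peaked = [0.85, 0.10, 0.03, 0.02]  # the model is confident
flat = [0.30, 0.25, 0.25, 0.20]    # the model is unsure

print(len(nucleus(peaked, p=0.9)))  # 2
print(len(nucleus(flat, p=0.9)))    # 4
```

With the same p, a confident distribution keeps only a couple of candidates while a flat one keeps many; a fixed k cannot adapt in this way. In a full sampler, the kept probabilities would then be renormalized before drawing a word.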

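4. Adjusting "Creativity" with Temperature

Softmax can be given a parameter called temperature: the logits are divided by the temperature before Softmax is applied, which adjusts how "sharp" the probability distribution is.

Low value (e.g., 0.5) → probability concentrates on the likeliest candidates (safe but monotonous)
High value (e.g., 2.0) → probabilities become more uniform, so rare words are chosen more often (more creative, but more likely to go off the rails)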
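
Written out, temperature scaling divides each logit by $T$ before normalizing. The symbols $T$ and $P_T$ are notation introduced here for illustration; the rest follows the Softmax formula from section 2:

$$
P_T(w_i) = \frac{e^{z_i / T}}{\sum_j e^{z_j / T}}
$$

Setting $T = 1$ recovers the ordinary Softmax. The ratio between two candidates is $P_T(w_a) / P_T(w_b) = e^{(z_a - z_b)/T}$, so for the logits 2.0 and 1.0 from section 1:

$$
\frac{P_T(\text{school})}{P_T(\text{office})} = e^{1/T} \approx
\begin{cases}
7.39 & T = 0.5 \\
2.72 & T = 1.0 \\
1.65 & T = 2.0
\end{cases}
$$

As $T$ approaches 0 the top candidate takes almost all of the probability (the behavior approaches greedy selection), and as $T$ grows the distribution approaches uniform.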

5. Experiencing the World of Probability Through Code

Let's implement Softmax and sampling in Python.

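import numpy as np

def softmax(logits, temperature=1.0):
    # Divide by the temperature, then subtract the max for numerical stability
    # (the subtraction does not change the resulting probabilities)
    scaled = np.array(logits, dtype=float) / temperature
    exp = np.exp(scaled - np.max(scaled))
    return exp / np.sum(exp)

words = ["school", "office", "park"]
logits = [2.0, 1.0, 0.5]  # scores output by the model

# Compare the probability distribution at different temperatures
for t in [0.5, 1.0, 2.0]:
    probs = softmax(logits, temperature=t)
    print(f"Temperature={t}: {dict(zip(words, np.round(probs, 3).tolist()))}")

# Sampling: draw one word according to the probabilities
def sample_word(words, logits, temperature=1.0):
    probs = softmax(logits, temperature)
    return np.random.choice(words, p=probs)

print("\nSampling results:")
for _ in range(5):
    print(sample_word(words, logits, temperature=1.0))

Example run (the probabilities are always the same; only the sampled words change from run to run)

Temperature=0.5: {'school': 0.844, 'office': 0.114, 'park': 0.042}
Temperature=1.0: {'school': 0.629, 'office': 0.231, 'park': 0.14}
Temperature=2.0: {'school': 0.481, 'office': 0.292, 'park': 0.227}

Sampling results:
school
school
office
school
park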
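
Five draws are too few to see the distribution itself. As a rough sanity check, the sketch below draws many samples and compares the observed frequencies with the Softmax probabilities. The sample size of 10,000 and the seed are arbitrary choices for this illustration:

```python
import numpy as np

def softmax(logits, temperature=1.0):
    scaled = np.asarray(logits, dtype=float) / temperature
    exp = np.exp(scaled - scaled.max())  # subtract the max for numerical stability
    return exp / exp.sum()

words = ["school", "office", "park"]
probs = softmax([2.0, 1.0, 0.5])

rng = np.random.default_rng(0)  # fixed seed only so the run is repeatable
draws = rng.choice(len(words), size=10_000, p=probs)  # sample word indices
freqs = np.bincount(draws, minlength=len(words)) / draws.size

for word, p, f in zip(words, probs, freqs):
    print(f"{word}: probability={p:.3f}, observed frequency={f:.3f}")
```

Individual draws are unpredictable, but over many draws each word should appear at a rate close to its probability. This is the sense in which an LLM's answer varies between runs while still following the model's distribution.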

<details><summary>Click here for a more detailed, practical implementation (code)</summary>

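import numpy as np
import matplotlib.pyplot as plt
from typing import List

def softmax(logits: List[float], temperature: float = 1.0) -> np.ndarray:
    """
    Convert logits into a probability distribution with Softmax.

    Args:
        logits: scores (logits) output by the model
        temperature: temperature parameter (low = concentrated, high = spread out)

    Returns:
        Probability distribution (sums to 1)
    """
    scaled = np.array(logits, dtype=float) / temperature
    # Subtract the max before exponentiating to avoid overflow; the result is unchanged
    exp_logits = np.exp(scaled - np.max(scaled))
    probabilities = exp_logits / np.sum(exp_logits)
    return probabilities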

def greedy_sampling(words: List[str], logits: List[float]) -> str:
    """Greedy selection: pick the word with the highest probability."""
    probs = softmax(logits)
    return words[np.argmax(probs)]

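def top_k_sampling(words: List[str], logits: List[float], k: int = 5, temperature: float = 1.0) -> str:
    """Top-k sampling: sample from the k most probable candidates."""
    probs = softmax(logits, temperature)

    # Keep k within the valid range (a slice of [-0:] would otherwise keep everything)
    k = max(1, min(k, len(words)))

    # Indices of the k most probable candidates
    top_k_indices = np.argsort(probs)[-k:]
    top_k_probs = probs[top_k_indices]

    # Renormalize the top-k probabilities so they sum to 1
    top_k_probs = top_k_probs / np.sum(top_k_probs)

    # Sample
    chosen_idx = np.random.choice(top_k_indices, p=top_k_probs)
    return words[chosen_idx]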

def top_p_sampling(words: List[str], logits: List[float], p: float = 0.9, temperature: float = 1.0) -> str:
    """Top-p (nucleus) sampling: sample from the candidates whose cumulative probability reaches p."""
    probs = softmax(logits, temperature)

    # Indices sorted by probability in descending order
    sorted_indices = np.argsort(probs)[::-1]
    sorted_probs = probs[sorted_indices]

    # Cumulative probability
    cumulative_probs = np.cumsum(sorted_probs)

    # Number of candidates needed for the cumulative probability to reach p
    # (clamped so that rounding error near p=1.0 cannot run past the end)
    cutoff_idx = min(int(np.searchsorted(cumulative_probs, p)) + 1, len(sorted_probs))

    # Renormalize the probabilities of the selected candidates
    selected_indices = sorted_indices[:cutoff_idx]
    selected_probs = probs[selected_indices]
    selected_probs = selected_probs / np.sum(selected_probs)

    # Sample
    chosen_idx = np.random.choice(selected_indices, p=selected_probs)
    return words[chosen_idx]

def demonstrate_sampling():
    """Demonstrate the sampling methods."""
    # Candidate words and their logits (scores)
    words = ["school", "office", "park", "hospital", "library"]
    logits = [2.0, 1.0, 0.5, -0.5, -1.0]

    print("=== LLM Word Sampling Demo ===\n")

    # 1. Show the probability distribution
    print("1. Computing the probability distribution")
    for temp in [0.5, 1.0, 2.0]:
        probs = softmax(logits, temperature=temp)
        print(f"Temperature={temp}:")
        for word, prob in zip(words, probs):
            print(f"  {word}: {prob:.3f}")
        print()

    # 2. Compare the sampling methods
    print("2. Comparing sampling methods (10 runs each)")
    print("-" * 60)

    methods = [
        ("Greedy", lambda: greedy_sampling(words, logits)),
        ("Top-k (k=3)", lambda: top_k_sampling(words, logits, k=3)),
        ("Top-p (p=0.8)", lambda: top_p_sampling(words, logits, p=0.8)),
    ]

    for method_name, method_func in methods:
        print(f"{method_name:15s}: ", end="")
        results = [method_func() for _ in range(10)]
        print(" | ".join(results))

    print("\n" + "=" * 60)

def visualize_temperature_effect():
    """Visualize the effect of temperature."""
    words = ["school", "office", "park"]
    logits = [2.0, 1.0, 0.5]
    temperatures = [0.1, 0.5, 1.0, 2.0, 5.0]

    plt.figure(figsize=(12, 8))

    for i, temp in enumerate(temperatures):
        probs = softmax(logits, temperature=temp)

        plt.subplot(2, 3, i + 1)
        plt.bar(words, probs, color=['#3498db', '#e74c3c', '#2ecc71'])
        plt.title(f'Temperature = {temp}')
        plt.ylabel('Probability')
        plt.ylim(0, 1)

        # Label each bar with its probability
        for j, prob in enumerate(probs):
            plt.text(j, prob + 0.01, f'{prob:.3f}', ha='center', va='bottom')

    plt.tight_layout()
    plt.suptitle('Effect of temperature on the probability distribution', y=1.02, fontsize=16)
    plt.show()

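def sample_text_generation():
    """Simulate step-by-step text generation."""
    # Candidate words and logits for each context.
    # Toy data adapted to English word order; the logits are illustrative only.
    # The contexts are hard-coded, so each step assumes a fixed previous word
    # regardless of what was actually sampled.
    context_predictions = [
        {
            "context": "Yesterday I went to the",
            "words": ["school", "office", "park", "hospital"],
            "logits": [2.0, 1.5, 0.5, -0.5],
        },
        {
            "context": "Yesterday I went to the school",
            "words": ["with", "by", "and", "for"],
            "logits": [3.0, 1.0, 0.5, -1.0],
        },
        {
            "context": "Yesterday I went to the school with",
            "words": ["my", "a", "some", "two"],
            "logits": [2.5, 1.0, 0.8, 0.2],
        },
    ]

    print("=== Text Generation Simulation ===")
    print("Predict the next word from the context to build up a sentence\n")

    for step in context_predictions:
        print(f"Context: '{step['context']}'")
        print("Candidate word probabilities:")

        probs = softmax(step["logits"])
        for word, prob in zip(step["words"], probs):
            print(f"  {word}: {prob:.3f}")

        # Choose a word with Top-p sampling
        chosen = top_p_sampling(step["words"], step["logits"], p=0.8)
        print(f"Chosen word: '{chosen}'\n")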

if __name__ == "__main__":
    # Fix NumPy's random seed so the results are reproducible
    np.random.seed(42)

    # Run the demo
    demonstrate_sampling()

    # Text generation simulation
    sample_text_generation()

    # Visualization (uncomment to run)
    # visualize_temperature_effect()

</details>