Editor's Picks - Technical Article Translations · February 27

Software 3.1? - AI Functions


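Andrej Karpathy proposed a version numbering scheme for how software gets written. Software 1.0 is code written by humans. Software 2.0 is neural network weights learned through optimization. Software 3.0 is prompting LLMs (large language models) in plain English, which sounds a lot nicer than calling it "vibe coding" (amusingly, a term Karpathy also coined).

Software 3.0 is certainly here. Millions of people use it every day. Tools like Kiro, Cursor, Claude Code, and ChatGPT let you describe what you want and get code back. Karpathy emphasizes the "generation-verification loop" in partial-autonomy tools: the model generates a change, a human verifies it, and the two iterate until the work is done.

But the problem goes well beyond who reviews what. Look at what the LLM actually produces in Software 3.0: text. Code as strings. JSON payloads. Markdown documents. The model generates, you receive text, and then you do everything else: integrate it into your codebase, write tests, run CI, deploy. If you care about verification, you write test cases, but those run before deployment. Once the code ships, the tests never run again. The LLM's involvement ends the moment it hands you its output. The software you run has no connection to the model that helped produce it.

Now consider an alternative. The LLM-generated code actually runs inside your application, executing every time the function is called. It returns native Python objects (DataFrames, Pydantic models, database connections) rather than JSON strings that need parsing. And verification is not a gate before deployment; it is a post-condition that runs on every call and feeds failures back to the model for an automatic retry. This changes three things at once: where AI sits in your software (at runtime, not just during development), what it produces (live objects you can call methods on, not serialized text), and how you trust it (continuous automated verification, not a one-time human review).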

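That is the core experiment behind AI Functions, a new project from Strands Labs built on the Strands Agents SDK. You write Python functions with natural-language specifications instead of implementation code. You add post-conditions: plain Python assertions that define what correct output looks like. When the function is called, the LLM generates code, executes it in your Python process, and returns the result as a native object, which the post-conditions then validate. If validation fails, the system retries with the error as feedback. No human inspects the generated code. The post-conditions do the checking, every time.

If Software 3.0 is "human prompts, LLM generates, human verifies," then I would argue that AI Functions are Software 3.1: human specifies, LLM generates and executes, machine verifies at runtime. The paradigm is the same: the programming interface is natural language. The execution model is different. Instead of producing text for a human to integrate, the LLM produces executable code, returns objects the application uses directly, and is validated by post-conditions on every call. Software 3.1 is a minor version bump, not a major one. What is upgraded is what happens after the code is generated.

This article takes a closer look at what AI Functions are, how they work, and what automated verification makes possible.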

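What Are AI Functions?

AI Functions is built on top of the Strands Agents SDK, an open-source framework for building AI agents. It introduces one core abstraction: the @ai_function decorator. You write a Python function with a natural-language specification instead of implementation code. When the function is called, an LLM generates the implementation, executes it, and returns the result. Optionally (and this is the important part), you can add post-conditions that validate the output and trigger automatic retries when validation fails.

Here is the simplest example: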

from ai_functions import ai_function

@ai_function
def translate_text(text: str, lang: str) -> str:
    """
    Translate the text below to the following language: {lang}.
    {text}
    """

result = translate_text("The quarterly results exceeded expectations.", lang="French")

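You call translate_text like any other Python function. The decorator intercepts the call, builds a prompt from the docstring (substituting the arguments), sends it to the LLM, and returns the result as a typed Python string. From the caller's perspective, it is just a function that takes strings and returns a string. The fact that an LLM executes it is an implementation detail.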
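To make the "builds a prompt from the docstring" step concrete, here is a minimal sketch of how such substitution could work. It is not the library's implementation: the prompt_from_docstring decorator is a hypothetical name, and it only returns the prompt text instead of calling a model.

```python
import inspect
from functools import wraps
from typing import Callable


def prompt_from_docstring(func: Callable) -> Callable[..., str]:
    """Hypothetical decorator: return the prompt a call would produce."""
    signature = inspect.signature(func)
    template = inspect.cleandoc(func.__doc__ or "")

    @wraps(func)
    def wrapper(*args, **kwargs) -> str:
        # Bind positional and keyword arguments to their parameter names.
        bound = signature.bind(*args, **kwargs)
        bound.apply_defaults()
        # Fill the {name} placeholders in the docstring with the call's arguments.
        return template.format(**bound.arguments)

    return wrapper


@prompt_from_docstring
def translate_text(text: str, lang: str) -> str:
    """
    Translate the text below to the following language: {lang}.
    {text}
    """


prompt = translate_text("Good morning.", lang="French")
print(prompt)
```

Because str.format supports attribute access, the same approach would also handle placeholders such as {response.key_decisions}, which appear in later examples.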

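On its own, this still looks a lot like Software 3.0: prompt in, result out. That is nice, but it is not where AI Functions shine. They get interesting once you add structured output, validation, code execution, multi-agent composition, and async workflows. That is where 3.1 begins.

Structured Output with Pydantic

AI Functions can return objects of arbitrary types, not just strings. When you specify a Pydantic model as the return type, the framework enforces schema conformance automatically: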

from ai_functions import ai_function
from pydantic import BaseModel

class MeetingSummary(BaseModel):
    attendees: list[str]
    key_decisions: list[str]
    action_items: list[str]

@ai_function
def summarize_meeting(transcript: str) -> MeetingSummary:
    """
    Summarize the following meeting transcript in less than 50 words.
    <transcript>
    {transcript}
    </transcript>
    """

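Call summarize_meeting(transcript) and you get back a MeetingSummary object with typed fields, IDE autocompletion, and Pydantic's built-in validation. The LLM's output is parsed into the Pydantic model, and if the structure does not match, the framework handles the retry automatically. From the caller's perspective, the function returns a typed Python object.

This is a pattern established by frameworks such as Instructor. The contribution of AI Functions is not structured output itself, but how structured output works together with everything else in the system.

Post-Conditions

Post-conditions are where AI Functions go beyond a simple prompting framework. A post-condition is a Python function that validates the output of an AI function. If validation fails, the error message is fed back to the LLM, which retries. Multiple post-conditions run in parallel, so the LLM receives all failure signals at once and can address them in a single retry.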

from ai_functions import ai_function, PostConditionResult
from pydantic import BaseModel

class MeetingSummary(BaseModel):
    attendees: list[str]
    key_decisions: list[str]
    action_items: list[str]

def check_length(response: MeetingSummary):
    total = sum(len(d.split()) for d in response.key_decisions)
    assert total <= 50, f"Key decisions should total under 50 words, got {total}"

@ai_function
def check_quality(response: MeetingSummary) -> PostConditionResult:
    """
    Check if the meeting summary below satisfies the following criteria:
    - Key decisions must be specific and actionable, not vague
    - Action items must each name a responsible person
    <decisions>{response.key_decisions}</decisions>
    <actions>{response.action_items}</actions>
    """

@ai_function(post_conditions=[check_length, check_quality])
def summarize_meeting(transcript: str) -> MeetingSummary:
    """
    Summarize the following meeting transcript in less than 50 words.
    <transcript>
    {transcript}
    </transcript>
    """

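Two things are worth noting here. First, check_length is a plain Python function that raises an AssertionError on failure. This is a deterministic, inspectable check: no LLM is involved, so there is no ambiguity. Second, check_quality is itself an AI function that returns a PostConditionResult object, a Pydantic model with passed (boolean) and message (string) fields. It uses an LLM to assess whether the summary meets quality criteria that are hard to express as assertions, such as specificity, actionability, and attribution. This is one AI function validating another. The framework treats both the same way: if either fails, the error is returned as feedback to the LLM that generated the output.

This creates a self-correcting loop. The generating LLM does not have to get everything right on the first attempt. It has to be able to improve given specific feedback about what went wrong. In practice, this shifts the developer's effort from crafting the perfect prompt to writing good post-conditions, which is a very different skill.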
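The self-correcting loop can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the framework's code: CheckResult, collect_failures, run_with_retries, and fake_generate are hypothetical names, the "LLM" is a stub that shortens its answer once it receives feedback, and the checks run sequentially here rather than in parallel.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class CheckResult:
    """Hypothetical stand-in for a result object with passed/message fields."""
    passed: bool
    message: str = ""


def collect_failures(output: Any, post_conditions: list[Callable]) -> list[str]:
    """Run every post-condition and gather all failure messages."""
    failures = []
    for check in post_conditions:
        try:
            result = check(output)
        except Exception as exc:  # assertion-style check failed
            failures.append(f"{check.__name__}: {exc}")
            continue
        if isinstance(result, CheckResult) and not result.passed:
            failures.append(f"{check.__name__}: {result.message}")
    return failures


def run_with_retries(generate: Callable, post_conditions: list[Callable], max_attempts: int = 3):
    """Call generate(feedback) until every post-condition passes."""
    feedback: list[str] = []
    for attempt in range(1, max_attempts + 1):
        output = generate(feedback)
        feedback = collect_failures(output, post_conditions)
        if not feedback:
            return output, attempt
    raise RuntimeError(f"Still failing after {max_attempts} attempts: {feedback}")


def fake_generate(feedback: list[str]) -> str:
    # Stub for the model call: the first answer is too long, the retry is short.
    return "short summary" if feedback else "word " * 60


def check_length(text: str):
    count = len(text.split())
    assert count <= 50, f"expected at most 50 words, got {count}"


def check_not_empty(text: str) -> CheckResult:
    return CheckResult(passed=bool(text.strip()), message="summary is empty")


output, attempts = run_with_retries(fake_generate, [check_length, check_not_empty])
print(output, attempts)
```

Collecting every failure before retrying mirrors the point made earlier: the generator sees all problems at once instead of fixing them one retry at a time.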

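Of course, we need to be aware of what is happening here: this can leave "hidden" retry loops in our projects! Before relying too heavily on this approach, we need solid monitoring and observability in place.

Returning Native Python Objects

Most LLM frameworks force output through JSON serialization. AI Functions can return non-serializable Python objects (DataFrames, SymPy expressions, database connections, and so on) because the generated code runs in the same Python interpreter as your application.

This is the feature that sets AI Functions fundamentally apart from other frameworks. For example, it enables format-agnostic data loading that handles purchase records however they happen to be stored: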

from ai_functions import ai_function
from pandas import DataFrame, api

def check_invoice_dataframe(df: DataFrame):
    """Post-condition: validate DataFrame structure."""
    assert {'product_name', 'quantity', 'price', 'purchase_date'}.issubset(df.columns)
    assert api.types.is_integer_dtype(df['quantity']), "quantity must be an integer"
    assert api.types.is_float_dtype(df['price']), "price must be a float"
    assert api.types.is_datetime64_any_dtype(df['purchase_date'])
    assert not df.duplicated(subset=['product_name', 'purchase_date']).any()

@ai_function(
    code_execution_mode="local",
    code_executor_additional_imports=["pandas.*", "sqlite3", "json"],
    post_conditions=[check_invoice_dataframe],
)
def import_invoice(path: str) -> DataFrame:
    """
    The file `{path}` contains purchase logs. Extract them in a DataFrame with columns:
    - product_name (str)
    - quantity (int)
    - price (float)
    - purchase_date (datetime)
    """

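Call import_invoice('data/orders.json') and you get a real Pandas DataFrame: not a JSON representation of one, not a serialized string, but an actual DataFrame object on which you can immediately call .describe(), .groupby(), or .plot(). Pass in a SQLite file instead, and the same function inspects the database schema, writes the appropriate SQL query, and returns the same validated DataFrame structure.

The developer writes no format-specific parsing logic. The natural-language specification defines what the output should contain. The post-condition validates the structural invariants. The LLM turns an opaque file into a validated DataFrame dynamically, at call time.

This works because the framework gives the LLM a Python executor tool that shares the same runtime environment as the calling code. The LLM generates Python code, the code runs inside your process, and the resulting object is returned directly. There is no serialization round trip. The code_execution_mode="local" parameter is an explicit opt-in: by default, the framework does not execute arbitrary generated code, and you have to declare which imports are allowed.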
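The "no serialization round trip" point is easiest to see with an object that cannot be serialized at all. The sketch below is an assumption-laden illustration, not the framework's executor: generated_code is a fixed string standing in for model output, run_in_process is a hypothetical helper, and the convention that the code stores its answer in a variable named result is invented for this example.

```python
import sqlite3

# Stand-in for code an LLM might generate; here it is a fixed string.
generated_code = """
import sqlite3
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (product TEXT, quantity INTEGER)")
conn.execute("INSERT INTO orders VALUES ('widget', 3)")
result = conn
"""


def run_in_process(code: str):
    """Execute code in the current interpreter and return its `result` variable."""
    namespace: dict = {}
    exec(code, namespace)  # no isolation at all: only acceptable for trusted code
    return namespace["result"]


conn = run_in_process(generated_code)
# The caller receives a live connection object, not a description of one.
rows = conn.execute("SELECT product, quantity FROM orders").fetchall()
print(type(conn).__name__, rows)
```

A live database connection has no JSON form, which is why this only works when the generated code and the caller share one interpreter.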

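Code Execution and the Trust Model

The code execution model deserves a closer look, because it shows how deliberately AI Functions approaches trust.

With code_execution_mode="local" enabled, the LLM can generate and execute Python code in your interpreter. That is powerful: it is what makes it possible to return DataFrames, perform computations, and interact with the local environment. It is also a security concern. The framework mitigates it through several mechanisms:

Explicit opt-in required. Code execution is off by default. You must enable it for each function individually.
Import restrictions. code_executor_additional_imports explicitly declares which packages the generated code may use. Anything not listed is unavailable.
Post-condition validation. The output is validated regardless of how it was produced. Even if the generated code takes an unexpected path, post-conditions catch invalid results.

But to be fair, this is a trade-off. You are running LLM-generated code in your process. The framework uses AST-based (abstract syntax tree) validation of the generated code, combined with controlled imports and timeouts, to try to prevent malicious imports and block dangerous operations. That does not amount to a true sandbox, nor does it protect against resource exhaustion (for example, infinite loops or excessive memory allocation). For experimentation, with appropriate constraints, it is a reasonable choice. For production workloads, the project recommends running AI Functions in a container or another isolated environment to get process-level isolation.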
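As a rough idea of what AST-based import checking can look like, here is a minimal sketch using only the standard library. It is not the framework's validator: disallowed_imports is a hypothetical helper, and the rule that a pattern such as "pandas.*" also covers the bare package name is an assumption made for this example.

```python
import ast
from fnmatch import fnmatch


def disallowed_imports(code: str, allowed: list[str]) -> list[str]:
    """Return the imported module names in `code` that match no allowed pattern."""

    def is_allowed(module: str) -> bool:
        # Assumption: "pandas.*" covers both "pandas" and "pandas.anything".
        return any(
            fnmatch(module, pattern) or (pattern.endswith(".*") and module == pattern[:-2])
            for pattern in allowed
        )

    rejected = []
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # Relative imports have no module name and are rejected as "".
            modules = [node.module or ""]
        else:
            continue
        rejected.extend(m for m in modules if not is_allowed(m))
    return rejected


snippet = "import pandas as pd\nimport json\nimport os\nfrom subprocess import run\n"
print(disallowed_imports(snippet, ["pandas.*", "sqlite3", "json"]))
```

A static check like this only sees import statements. It misses dynamic imports such as __import__("os"), which is one reason the paragraph above stops short of calling the approach a sandbox.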

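Multi-Agent Composition

AI Functions compose naturally through regular Python. Because they return typed objects, you can chain them like any other functions, by passing outputs as inputs: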

from ai_functions import ai_function
from pandas import DataFrame

@ai_function(code_execution_mode="local", code_executor_additional_imports=["pandas.*"])
async def analyze_sales_data(path: str) -> DataFrame:
    """
    Load the sales data from `{path}` and compute a summary DataFrame
    with total revenue, average order value, and top 5 products by volume.
    """

@ai_function
def write_executive_summary(company: str, financials: DataFrame) -> str:
    """
    Write a concise executive summary for {company} highlighting key trends
    and recommendations based on the provided financial data.
    """

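# Top-level `await` works in a notebook or an asyncio REPL; in a script, wrap these
# lines in an async function and run it with asyncio.run().
financials = await analyze_sales_data("data/q4_sales.csv")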
summary = write_executive_summary("Acme Corp", financials)
print("Top Products:", financials.head())
print("Summary:", summary)

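This is just ordinary function composition. The first function returns a DataFrame; the second accepts a DataFrame as input. No special state-passing mechanism is needed.

For more complex workflows, AI Functions can serve as tools for other agents, enabling orchestration patterns in which a coordinator delegates tasks to specialized sub-agents: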

from ai_functions import ai_function
from ai_functions.types import PostConditionResult
from pydantic import BaseModel, Field
from typing import Literal

# websearch_tool, check_length, check_citations, ReportPlan, and report are
# assumed to be defined elsewhere; they are not shown in this excerpt.

@ai_function(
    description="Search the web for a topic and return a cited summary",
    tools=[websearch_tool],
    post_conditions=[check_length, check_citations],
)
def search_agent(query: str, max_words: int = 500) -> str:
    """
    Perform a web search on the following topic and return a summary.
    Every claim must be supported by citations to sources.
    <query>{query}</query>
    """

@ai_function(
    description="Suggest the plan and organization of a report",
    tools=[websearch_tool],
)
def report_planner(topic: str) -> ReportPlan:
    """Generate a plan to write a report on: {topic}"""

@ai_function(tools=[report_planner, search_agent, report.add_section])
def report_orchestrator(topic: str) -> Literal["done"]:
    """
    Write a report on the following topic: {topic}
    """

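The orchestrator sees report_planner, search_agent, and report.add_section as callable tools. Each sub-agent runs its own post-conditions, so the orchestrator receives validated results. The search agent's citations are verified before the result ever reaches the orchestrator. This builds a hierarchy of verified agents, with post-conditions chained throughout the multi-agent system.

Async Execution and Parallel Workflows

AI Functions can be defined as async, which lets independent tasks run in parallel: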

from ai_functions import ai_function
import asyncio
import pandas as pd

@ai_function(tools=[websearch_tool])
async def research_market(company: str) -> str:
    """Research and summarize the competitive landscape and recent news for: {company}"""

@ai_function(code_execution_mode="local", code_executor_additional_imports=["pandas.*", "yfinance.*"])
async def load_financial_data(stock: str) -> pd.DataFrame:
    """
    Use the `yfinance` Python package to retrieve the historical prices of {stock}
    in the last 30 days. Return a DataFrame with columns [date, price].
    """

@ai_function(code_execution_mode="local", code_executor_additional_imports=["pandas.*", "plotly.*"])
def write_investment_memo(company: str, research: str, financials: pd.DataFrame) -> str:
    """
    Write an investment memo for {company}. Use the market research and financial data:
    {research}
    """

async def due_diligence_workflow(company: str):
    research, financials = await asyncio.gather(
        research_market(company),
        load_financial_data(company)
    )
    # Return the memo so the caller can actually use it.
    return write_investment_memo(company, research, financials)

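The two tasks run in parallel. Because they are independent (one searches the web, the other loads and transforms data), running them concurrently produces the same results in roughly the wall-clock time of the slower task instead of the sum of both. When the two take similar time, that is about half. The results then feed into a synchronous report generator that uses both.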
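The timing claim is easy to check with stand-in tasks. In this sketch, slow_task is a hypothetical placeholder that just sleeps, on the assumption that a real AI function call spends most of its time waiting on the network:

```python
import asyncio
import time


async def slow_task(name: str, seconds: float) -> str:
    # Stand-in for an I/O-bound AI function call (waiting, not computing).
    await asyncio.sleep(seconds)
    return name


async def main() -> tuple[list[str], float]:
    start = time.perf_counter()
    # Both coroutines wait at the same time; results keep the argument order.
    results = await asyncio.gather(slow_task("research", 0.2), slow_task("financials", 0.3))
    return results, time.perf_counter() - start


results, elapsed = asyncio.run(main())
print(results, f"{elapsed:.2f}s")
```

The elapsed time lands near 0.3 seconds (the slower task), not 0.5 seconds (the sum). The benefit depends on the tasks being I/O-bound; CPU-heavy work inside a single event loop would not overlap this way.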

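Note the tools=[websearch_tool] parameter. AI Functions can use any Strands tool. The framework provides a built-in tool for Python code execution, and you can pass additional tools per function (for example, web search, API clients, or file I/O). The LLM decides when and how to use them during execution.

Sharing Configuration

Different parts of a workflow may need different models. A quick validation check does not need the same model as a complex analysis. AI Functions use AIFunctionConfig objects to share configuration across functions: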

from ai_functions import ai_function, AIFunctionConfig
from pandas import DataFrame

class Configs:
    BIG_MODEL = AIFunctionConfig(model="us.anthropic.claude-sonnet-4-5-20250929-v1:0")
    FAST_MODEL = AIFunctionConfig(model="us.anthropic.claude-haiku-4-5-20251001-v1:0")
    DATA_ANALYSIS = AIFunctionConfig(
        model="us.anthropic.claude-sonnet-4-5-20250929-v1:0",
        code_execution_mode="local",
        code_executor_additional_imports=["pandas.*", "numpy.*"],
    )

@ai_function(config=Configs.DATA_ANALYSIS)
def normalize_dataset(path: str) -> DataFrame:
    """Load, clean, and normalize the dataset at `{path}` into a standard schema."""

@ai_function(config=Configs.FAST_MODEL)
def validate_email(text: str) -> bool:
    """Check if the following string is a valid email address: {text}"""

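Configurations are plain Python objects, so switching an entire pipeline from one model family to another is a one-line change. During development you might route everything to a powerful but expensive model. To optimize cost, you can swap the model referenced in a config and observe which functions are affected. Keyword arguments to @ai_function override individual config values per function, so you can customize a single function without duplicating the whole config.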
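The override behavior described above follows a common pattern: copy a shared config and replace only the named fields. The sketch below shows that pattern with a standard-library dataclass. FunctionConfig and resolve_config are hypothetical names, not the library's AIFunctionConfig, and the model string is a placeholder.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class FunctionConfig:
    """Hypothetical config object used only for this illustration."""
    model: str
    code_execution_mode: Optional[str] = None


def resolve_config(base: FunctionConfig, **overrides) -> FunctionConfig:
    # Copy the shared config, replacing only the fields named in the call.
    return replace(base, **overrides)


SHARED = FunctionConfig(model="big-model")
local = resolve_config(SHARED, code_execution_mode="local")
print(local)
print(SHARED)
```

Making the config frozen means an override can never leak back into the shared object, so other functions that reference SHARED are unaffected.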

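Validation Beyond Output Structure

One of the subtler capabilities of the post-condition system is validating properties of a result that are hard to express as structural checks. With AI-powered post-conditions, you can use one LLM to validate another, assessing semantic qualities such as grounding, citation quality, and logical consistency: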

from ai_functions import ai_function, PostConditionResult

@ai_function
def check_citations(summary: str) -> PostConditionResult:
    """
    Validate if all the claims made in the following summary are supported
    by an inline citation to a credible source.
    <summary>
    {summary}
    </summary>
    """

def check_length(summary: str, max_words: int):
    assert len(summary.split()) <= max_words

@ai_function(
    tools=[websearch_tool],
    post_conditions=[check_length, check_citations],
)
def market_researcher(query: str, max_words: int = 500) -> str:
    """
    Research and provide a well-sourced answer to: {query}
    Every claim must be supported by citations to credible sources.
    """

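The research agent produces a summary. check_length validates the word count deterministically. check_citations uses an LLM to assess whether each claim is actually backed by a cited source. If the agent fabricates an answer without doing real research, the citation check catches it and triggers a retry, with feedback about exactly which claims lack sources.

This is different from checking output structure. It is AI validating AI: checking semantic properties that are hard to express as assertions. It addresses one of the hardest problems in LLM-based systems: how do you know the model did not just make things up? Post-conditions do not fully solve this, but they add a second, independent evaluation that significantly reduces the failure rate.

Test Suites as Post-Conditions

The post-condition model has an interesting application in automated coding: using an existing test suite as the post-condition. If the tests pass, the implementation is considered correct; if they fail, the failures are fed back as the error message.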

from ai_functions import ai_function
from pydantic import BaseModel
from typing import Any, Literal
import pytest, io
from contextlib import redirect_stderr, redirect_stdout

class FeatureRequest(BaseModel):
    description: str
    test_files: list[str]

# Post-conditions can request original input arguments by name.
# Here, `feature` matches the parameter name of `implement_feature`.
def run_tests(_answer: Any, feature: FeatureRequest):
    stdio_capture = io.StringIO()
    with redirect_stdout(stdio_capture), redirect_stderr(stdio_capture):
        retcode = pytest.main(feature.test_files)
    if retcode:
        raise RuntimeError(stdio_capture.getvalue())

@ai_function(post_conditions=[run_tests])
def implement_feature(feature: FeatureRequest) -> Literal["done"]:
    """
    Implement the following feature in the current code base:
    <feature>{feature.description}</feature>
    Once done the code base should pass the following tests: {feature.test_files}
    """

def run_workflow(features: list[FeatureRequest]):
    for feature in features:
        implement_feature(feature)

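The AI function's return value is just the string "done", and it does not matter. What matters is the side effect: the codebase should now pass the specified tests. The post-condition runs pytest and raises an exception if any test fails. The LLM receives the test output as feedback and keeps iterating until all tests pass.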
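The comment in the example notes that a post-condition can request the original input arguments by name. One plausible way to implement that kind of name-based injection is with inspect.signature. This is a sketch under that assumption; call_post_condition is a hypothetical helper, not part of the library.

```python
import inspect
from typing import Any, Callable


def call_post_condition(check: Callable, output: Any, call_args: dict[str, Any]):
    """Pass the output first, then any original arguments the check names."""
    params = list(inspect.signature(check).parameters)
    # The first parameter receives the output, whatever it is called.
    extra = {name: call_args[name] for name in params[1:] if name in call_args}
    return check(output, **extra)


def check_length(summary: str, max_words: int):
    assert len(summary.split()) <= max_words, "summary too long"


# Arguments of the original call; `query` is ignored because the check does not name it.
call_args = {"query": "market size", "max_words": 5}
call_post_condition(check_length, "three short words", call_args)  # passes silently

error = None
try:
    call_post_condition(check_length, "one two three four five six", call_args)
except AssertionError as exc:
    error = str(exc)
print(error)
```

Matching by parameter name keeps post-conditions reusable: check_length works for any AI function that happens to take a max_words argument.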

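The documentation notes that agents pass roughly 10-15% more tests when post-conditions are provided in addition to the prompt instructions. Agents are markedly more effective at responding to concrete validation failures than at following written instructions. This fits a broader pattern: concrete, automated feedback loops beat detailed prompts. That is exactly why 3.1 improves on 3.0.

Try It

AI Functions is an experimental project. The code is open source at strands-labs/ai-functions, part of the Strands Labs GitHub organization, which hosts experimental projects built on the Strands Agents SDK. Alongside AI Functions you will find Robots (physical AI agents running on edge hardware) and Robots Sim (a simulation environment for robotics development). All three are built on the Strands Agents SDK, which has been downloaded more than 14 million times since it was open-sourced in May 2025. All three are explicitly labeled experimental, and that is the point. In this space, the best way to learn what works is to build things and see what breaks.

Install it with pip install strands-ai-functions (or uv add strands-ai-functions), clone the repository for the full set of examples, and start experimenting.

AI Functions is not a production system. It is a conversation starter, and perhaps the next step in Karpathy's version numbering. Give it a try. Write some post-conditions and see whether defining acceptance criteria feels more natural than reviewing LLM output. Then ask yourself: what would 4.0 look like?

We do not know yet. But the experiment has begun :)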

Original source: https://dev.to/aws/software-31-ai-functions-5acn