Building a Hands-Free AI Fitness Applet with the Gemini Live API

This is a submission for the Google AI Studio Multimodal Challenge

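What I Built

AI Personal Trainer is an experimental fitness app built around a voice-first concept that turns your smartphone into an interactive workout partner. The app is controlled mainly through voice commands, so you can focus on your workout rather than the screen.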

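Problems I'm exploring:

Distraction during exercise: having to interact with the phone screen constantly
Lack of personalization: most apps offer a one-size-fits-all solution
Passive interaction: apps act as trackers rather than assistants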

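Implemented features:

🎤 Voice-based program creation: talk to the AI to build a personalized workout program
🧠 Real-time audio interaction: two-way voice communication during workouts
📊 Comprehensive database system: stores programs, sessions, and progress
📈 Analytics dashboard: visual progress tracking and performance insights
📅 Google Calendar integration: automatically adds workouts to your calendar
🎯 Hybrid architecture: combines conversational speed with analytical accuracy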


Demo

⚡ Live applet

📱 View in AI Studio

💻 GitHub repository: ai-personal-trainer

How I Used Google AI Studio

Development started directly in Google AI Studio, where I experimented with different approaches to multimodal interaction.

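Development process:

Prototyping in Google AI Studio: building the user interface and the initial system setup
Export and development: downloading the project for local development
Extended development: integrating complex features with Gemini CLI
Final deployment: uploading the finished project and using Deploy App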

Dual-model architecture:

Primary model: real-time conversation

// Connection to live audio dialog
sessionRef.current = await clientRef.current.live.connect({
  model: 'gemini-2.5-flash-preview-native-audio-dialog',
  callbacks: {
    onopen: () => setConnectionStatus('connected'),
    onmessage: async (message) => {
      // Processing user speech
      if (message.serverContent?.inputTranscription) {
        const userText = message.serverContent.inputTranscription.text;
        onTranscript(userText);
      }

      // Playing AI response (parts may be absent on transcription-only messages)
      const audio = message.serverContent?.modelTurn?.parts?.[0]?.inlineData;
      if (audio && outputAudioContextRef.current) {
        await playAudioResponse(audio);
      }
    }
  },
  config: {
    systemInstruction: createDynamicPrompt(),
    responseModalities: [Modality.AUDIO],
    outputAudioTranscription: {}, // <--- Enable LLM transcription
    inputAudioTranscription: {}, // <--- Enable user transcription
    speechConfig: {
      voiceConfig: { prebuiltVoiceConfig: { voiceName: 'Orus' } }
    }
  }
});
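
The playAudioResponse helper is not shown in the listing above. As a rough sketch of its first step, the function below converts an audio payload into samples that the Web Audio API can play. It assumes that inlineData.data is a base64 string of raw 16-bit little-endian PCM; decodePcm16 is a hypothetical name, not part of the project or the SDK.

```typescript
// Hypothetical helper: decode base64-encoded 16-bit little-endian PCM
// into floating-point samples in the range [-1, 1).
function decodePcm16(base64: string): Float32Array {
  // atob is available in browsers and in recent Node.js versions.
  const binary = atob(base64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }

  const view = new DataView(bytes.buffer);
  const samples = new Float32Array(Math.floor(bytes.length / 2));
  for (let i = 0; i < samples.length; i++) {
    // `true` selects little-endian byte order; divide to normalize.
    samples[i] = view.getInt16(i * 2, true) / 32768;
  }
  return samples;
}
```

The resulting Float32Array could then be copied into an AudioBuffer created on outputAudioContextRef.current, using whatever sample rate the model's audio is delivered at.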

Analysis model: data extraction

// Precise interpretation of user commands
export const interpretWorkoutCommand = async (transcript: string): Promise<{ command: 'log_set' | 'get_form_tip' | 'chat_message', data: { reps?: number, weight?: number, text?: string } | null }> => {
    const prompt = `You are an AI assistant interpreting voice commands from a user during a workout. The user's voice transcript is: "${transcript}".

    Your task is to analyze the transcript and classify it into one of the following commands, extracting relevant data.

    POSSIBLE COMMANDS:
    1. 'log_set': The user is reporting the completion of a set. They might mention repetitions (reps) and/or weight.
       - Keywords: "done", "finished", "log it", "reps", "weight", "kilos", numbers.
       - Example Transcripts: "Okay, 12 reps at 50 kilos", "I'm done", "8 reps", "log 90 pounds".
    2. 'get_form_tip': The user is asking for advice on their exercise form.
       - Keywords: "form", "technique", "how do I do this", "am I doing it right".
       - Example Transcripts: "check my form", "what's the technique for this".
    3. 'chat_message': The user is saying something else, likely a question or comment for the AI coach. This is the default if no other command fits.
       - Example Transcripts: "how many sets left", "I'm feeling tired", "what's the next exercise".

    Respond in JSON format with "command" and optional "data".
    - For 'log_set', 'data' should be an object with optional 'reps' and 'weight' numbers.
    - For 'get_form_tip', 'data' should be null.
    - For 'chat_message', 'data' should be an object with the original transcript as 'text'.

    Return ONLY the JSON object.

    Example Responses:
    - Transcript: "10 reps at 80 kg" -> { "command": "log_set", "data": { "reps": 10, "weight": 80 } }
    - Transcript: "how do I do this right?" -> { "command": "get_form_tip", "data": null }
    - Transcript: "what's the next exercise?" -> { "command": "chat_message", "data": { "text": "what's the next exercise?" } }
    `;

    try {
        const result = await ai.models.generateContent({
            model: "gemini-2.5-flash",
            contents: prompt,
...
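
The listing above is cut off before the model's reply is handled. The sketch below shows one way the returned text could be turned into a typed command, falling back to chat_message when the reply is not usable JSON. The names parseCommandResponse and WorkoutCommand are hypothetical and only illustrate the idea; the project's actual handling may differ.

```typescript
type WorkoutCommand =
  | { command: 'log_set'; data: { reps?: number; weight?: number } }
  | { command: 'get_form_tip'; data: null }
  | { command: 'chat_message'; data: { text: string } };

// Hypothetical helper: validate the model's raw text reply.
function parseCommandResponse(raw: string, transcript: string): WorkoutCommand {
  const fallback: WorkoutCommand = { command: 'chat_message', data: { text: transcript } };

  // Keep only the outermost {...} in case the reply is wrapped in extra text.
  const start = raw.indexOf('{');
  const end = raw.lastIndexOf('}');
  if (start === -1 || end < start) return fallback;

  let parsed: unknown;
  try {
    parsed = JSON.parse(raw.slice(start, end + 1));
  } catch {
    return fallback;
  }
  if (typeof parsed !== 'object' || parsed === null) return fallback;

  const { command, data } = parsed as { command?: unknown; data?: unknown };
  if (command === 'get_form_tip') {
    return { command: 'get_form_tip', data: null };
  }
  if (command === 'log_set') {
    const fields = (typeof data === 'object' && data !== null ? data : {}) as Record<string, unknown>;
    const result: { reps?: number; weight?: number } = {};
    // Accept only finite numbers so bad values never reach the database.
    if (typeof fields.reps === 'number' && Number.isFinite(fields.reps)) result.reps = fields.reps;
    if (typeof fields.weight === 'number' && Number.isFinite(fields.weight)) result.weight = fields.weight;
    return { command: 'log_set', data: result };
  }
  // Unknown or chat_message commands keep the original transcript as text.
  return fallback;
}
```

Defaulting to chat_message mirrors the prompt's own rule that it is the fallback when no other command fits.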

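Multimodal Features

1. Seamless audio interaction

Implementation: continuous bidirectional streaming through client.live.connect in the Gemini Live API SDK

What makes it unique: it works like a phone call, so you can interrupt the AI and get an immediate response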

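2. Hybrid command processing

Architecture:

Dialog model: keeps the conversation natural
Analysis model: extracts precise data from speech

Processing example: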

User: "Did eight reps with sixty kilos, felt pretty easy"

Dialog Model → "Great! Logged 8 reps with 60 kg. Should we increase the weight?"
Analysis Model → 
  {
    "command": "log_set",
    "data": {
      "reps": 8,
      "weight": 60
    }
  }
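
To show how the analysis model's structured output might be consumed while the dialog model keeps talking, here is a minimal sketch of a state update function. SessionState, applyCommand, and the formTipRequests counter are hypothetical names chosen for this example, not taken from the project.

```typescript
type WorkoutCommand =
  | { command: 'log_set'; data: { reps?: number; weight?: number } }
  | { command: 'get_form_tip'; data: null }
  | { command: 'chat_message'; data: { text: string } };

// Hypothetical in-memory state for the current workout.
interface SessionState {
  performedSets: { setNumber: number; reps?: number; weight?: number }[];
  formTipRequests: number;
}

// Returns a new state; the input state is never mutated.
function applyCommand(state: SessionState, cmd: WorkoutCommand): SessionState {
  switch (cmd.command) {
    case 'log_set':
      return {
        ...state,
        performedSets: [
          ...state.performedSets,
          { setNumber: state.performedSets.length + 1, ...cmd.data },
        ],
      };
    case 'get_form_tip':
      return { ...state, formTipRequests: state.formTipRequests + 1 };
    case 'chat_message':
      // Plain conversation is left to the dialog model; nothing to record.
      return state;
  }
}
```

For the example above, applying the log_set command to an empty state yields one performed set with setNumber 1, reps 8, and weight 60.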

3. Full-featured data system

Implemented data structure:

Workout programs (/programs/{programId}):

{
  "name": "Strength program, 12 weeks",
  "createdBy": "userId",
  "workouts": {
    "day1": {
      "dayName": "Chest and triceps",
      "exercises": [
        {
          "exerciseId": "bench_press",
          "name": "Barbell bench press",
          "sets": [{"reps": 8, "weight": 60}],
          "rest": 120
        }
      ]
    }
  }
}

Detailed workout sessions (/sessions/{sessionId}):

{
  "userId": "user123",
  "date": "2024-01-15T10:00:00Z",
  "programId": "strength_program_001",
  "workoutId": "day1_chest",
  "duration": 5400, // seconds
  "voiceTranscript": "Complete log of conversation with AI...",
  "performedSets": {
    "set001": {
      "exerciseId": "bench_press",
      "setNumber": 1,
      "reps": 8,
      "weight": 62.5,
      "timestamp": "2024-01-15T10:15:30Z"
    }
  }
}
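
As an illustration of how the analytics dashboard could read this structure, the sketch below types the session document and totals training volume (reps times weight) per exercise. The interface names and volumeByExercise are assumptions made for this example; only the field names come from the document above.

```typescript
interface PerformedSet {
  exerciseId: string;
  setNumber: number;
  reps: number;
  weight: number;
  timestamp: string; // ISO 8601
}

interface WorkoutSession {
  userId: string;
  date: string; // ISO 8601
  programId: string;
  workoutId: string;
  duration: number; // seconds
  voiceTranscript: string;
  performedSets: Record<string, PerformedSet>;
}

// Hypothetical analytics helper: total volume (reps * weight) per exercise.
function volumeByExercise(session: WorkoutSession): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const set of Object.values(session.performedSets)) {
    totals[set.exerciseId] = (totals[set.exerciseId] ?? 0) + set.reps * set.weight;
  }
  return totals;
}
```

Keying performedSets by ID rather than storing an array matches the document shape shown above, so the helper iterates over Object.values.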

4. Automatic calendar integration

Implemented: direct integration with the Google Calendar API

export const scheduleWorkouts = async (workouts: Workout[], accessToken: string): Promise<void> => {
    if (!workouts || workouts.length === 0) {
        throw new Error("No workouts to schedule.");
    }

    const schedulePromises = workouts.map(workout => {
        const startTime = getNextWorkoutDate(workout.dayOfWeek);
        const endTime = new Date(startTime.getTime() + 60 * 60 * 1000); // Assume 1-hour duration

        const event = {
            'summary': `Workout: ${workout.dayName}`,
            'description': `Your scheduled workout session.\n\nExercises:\n- ${workout.exercises.map(e => e.name).join('\n- ')}`,
            'start': {
                'dateTime': startTime.toISOString(),
                'timeZone': Intl.DateTimeFormat().resolvedOptions().timeZone,
            },
            'end': {
                'dateTime': endTime.toISOString(),
                'timeZone': Intl.DateTimeFormat().resolvedOptions().timeZone,
            },
        };

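        return fetch('https://www.googleapis.com/calendar/v3/calendars/primary/events', {
            method: 'POST',
            headers: {
                'Authorization': `Bearer ${accessToken}`,
                'Content-Type': 'application/json',
            },
            body: JSON.stringify(event),
        }).then(response => {
            // fetch only rejects on network failure, so surface HTTP errors explicitly
            if (!response.ok) {
                throw new Error(`Failed to schedule "${workout.dayName}": HTTP ${response.status}`);
            }
        });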
    });

    await Promise.all(schedulePromises);
};
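
The getNextWorkoutDate helper called above is not shown. A minimal sketch of what it could look like follows, under two assumptions that are not confirmed by the source: dayOfWeek uses the same numbering as Date.getDay() (0 for Sunday through 6 for Saturday), and workouts start at a fixed local hour. The from and hour parameters are additions for this example.

```typescript
// Hypothetical sketch: next occurrence of a weekday at a fixed local hour.
// Assumes dayOfWeek follows Date.getDay(): 0 = Sunday ... 6 = Saturday.
function getNextWorkoutDate(dayOfWeek: number, from: Date = new Date(), hour: number = 18): Date {
  const next = new Date(from);
  next.setHours(hour, 0, 0, 0);

  let daysAhead = (dayOfWeek - from.getDay() + 7) % 7;
  // If today's slot has already passed, schedule for next week instead.
  if (daysAhead === 0 && next.getTime() <= from.getTime()) {
    daysAhead = 7;
  }
  next.setDate(next.getDate() + daysAhead);
  return next;
}
```

Calling it with a single argument, as scheduleWorkouts does, uses the current time and the default start hour.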

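5. Contextual understanding of fitness terminology

The AI understands domain-specific vocabulary:

Recognizing set-logging data (log_set): the AI looks for numbers and keywords (for example "reps", "times", "weight", "pounds") in the user's speech and automatically fills in the data for the completed set.
Handling requests and comments (get_form_tip, chat_message): phrases that contain no data to log are handled as requests to the trainer or as simple comments.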