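This is a submission for the Google AI Studio Multimodal Challenge.
AI Personal Trainer is an experimental fitness app built around a voice-first concept that turns your smartphone into an interactive workout partner. The app is controlled mainly through voice commands, so you can focus on the workout rather than the screen.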
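Problems I'm exploring:
Distraction during exercise: the need to keep interacting with the phone screen
Lack of personalization: most apps offer one-size-fits-all solutions
Passive interaction: apps act as trackers rather than assistants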
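Implemented features:
🎤 Voice program creation: talk with the AI to build a personalized workout program
🧠 Real-time audio interaction: two-way voice communication during the workout
📊 Comprehensive database system: stores programs, sessions, and progress
📈 Analytics dashboard: visual progress tracking and performance insights
📅 Google Calendar integration: automatically adds workouts to your calendar
🎯 Hybrid architecture: combines conversational speed with analytical accuracy
💻 GitHub repository: ai-personal-trainer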
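Development started directly in Google AI Studio, where I experimented with different approaches to multimodal interaction.
Prototyping in Google AI Studio: building the user interface and the initial system setup
Export and development: downloading the project for local development
Extended development: integrating the more complex features with Gemini CLI
Final deployment: uploading the finished project and using the deploy app feature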
// Connection to live audio dialog
sessionRef.current = await clientRef.current.live.connect({
  model: 'gemini-2.5-flash-preview-native-audio-dialog',
  callbacks: {
    onopen: () => setConnectionStatus('connected'),
    onmessage: async (message) => {
      // Processing user speech
      if (message.serverContent?.inputTranscription) {
        const userText = message.serverContent.inputTranscription.text;
        onTranscript(userText);
      }
      // Playing AI response; `parts` may be absent, so the index access is guarded too
      const audio = message.serverContent?.modelTurn?.parts?.[0]?.inlineData;
      if (audio && outputAudioContextRef.current) {
        await playAudioResponse(audio);
      }
    }
  },
  config: {
    systemInstruction: createDynamicPrompt(),
    responseModalities: [Modality.AUDIO],
    outputAudioTranscription: {}, // <--- Enable LLM transcription
    inputAudioTranscription: {}, // <--- Enable user transcription
    speechConfig: {
      voiceConfig: { prebuiltVoiceConfig: { voiceName: 'Orus' } }
    }
  }
});
// Precise interpretation of user commands
export const interpretWorkoutCommand = async (transcript: string): Promise<{ command: 'log_set' | 'get_form_tip' | 'chat_message', data: { reps?: number, weight?: number, text?: string } | null }> => {
const prompt = `You are an AI assistant interpreting voice commands from a user during a workout. The user's voice transcript is: "${transcript}".
Your task is to analyze the transcript and classify it into one of the following commands, extracting relevant data.
POSSIBLE COMMANDS:
1. 'log_set': The user is reporting the completion of a set. They might mention repetitions (reps) and/or weight.
- Keywords: "done", "finished", "log it", "reps", "weight", "kilos", numbers.
- Example Transcripts: "Okay, 12 reps at 50 kilos", "I'm done", "8 reps", "log 90 pounds".
2. 'get_form_tip': The user is asking for advice on their exercise form.
- Keywords: "form", "technique", "how do I do this", "am I doing it right".
- Example Transcripts: "check my form", "what's the technique for this".
3. 'chat_message': The user is saying something else, likely a question or comment for the AI coach. This is the default if no other command fits.
- Example Transcripts: "how many sets left", "I'm feeling tired", "what's the next exercise".
Respond in JSON format with "command" and optional "data".
- For 'log_set', 'data' should be an object with optional 'reps' and 'weight' numbers.
- For 'get_form_tip', 'data' should be null.
- For 'chat_message', 'data' should be an object with the original transcript as 'text'.
Return ONLY the JSON object.
Example Responses:
- Transcript: "10 reps at 80 kg" -> { "command": "log_set", "data": { "reps": 10, "weight": 80 } }
- Transcript: "how do I do this right?" -> { "command": "get_form_tip", "data": null }
- Transcript: "what's the next exercise?" -> { "command": "chat_message", "data": { "text": "what's the next exercise?" } }
`;
try {
const result = await ai.models.generateContent({
model: "gemini-2.5-flash",
contents: prompt,
...
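The call above is truncated in the source, so the way the model's reply is parsed is not shown. As a minimal sketch, assuming the reply is JSON text that may arrive wrapped in a Markdown code fence, a validator could look like the following. `parseWorkoutCommand` and its fallback to `chat_message` are hypothetical and not taken from the author's code.

```typescript
type WorkoutCommand = "log_set" | "get_form_tip" | "chat_message";

interface CommandResult {
  command: WorkoutCommand;
  data: { reps?: number; weight?: number; text?: string } | null;
}

const COMMANDS: WorkoutCommand[] = ["log_set", "get_form_tip", "chat_message"];

// Hypothetical helper: turns raw model text into a validated CommandResult.
// Anything that fails validation falls back to 'chat_message' with the transcript.
export function parseWorkoutCommand(raw: string, transcript: string): CommandResult {
  const fallback: CommandResult = { command: "chat_message", data: { text: transcript } };

  // Strip an optional Markdown fence (three backticks, written as \u0060 here).
  const cleaned = raw
    .trim()
    .replace(/^\u0060{3}(?:json)?\s*/i, "")
    .replace(/\s*\u0060{3}$/, "");

  let parsed: unknown;
  try {
    parsed = JSON.parse(cleaned);
  } catch {
    return fallback;
  }
  if (typeof parsed !== "object" || parsed === null) return fallback;

  const obj = parsed as { command?: unknown; data?: unknown };
  const command = obj.command as WorkoutCommand;
  if (COMMANDS.indexOf(command) === -1) return fallback;

  if (command === "get_form_tip") return { command, data: null };
  if (command === "chat_message") return fallback;

  // 'log_set': keep only finite numeric fields, ignore everything else.
  const src = (typeof obj.data === "object" && obj.data !== null ? obj.data : {}) as Record<string, unknown>;
  const data: { reps?: number; weight?: number } = {};
  const reps = src.reps;
  const weight = src.weight;
  if (typeof reps === "number" && isFinite(reps)) data.reps = reps;
  if (typeof weight === "number" && isFinite(weight)) data.weight = weight;
  return { command, data };
}
```

Falling back to `chat_message` mirrors the prompt's own rule that this command is the default when nothing else fits.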
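Implementation: client.live.connect from the Gemini Live API SDK provides continuous two-way streaming
What makes it distinctive: it works like a phone call, so you can interrupt and get an immediate response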
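The `playAudioResponse` helper called in the connection code is not shown in the source. One step it plausibly needs is decoding the audio payload. The sketch below assumes `inlineData.data` is a base64 string of raw 16-bit little-endian PCM, which is how the Live API documentation describes audio output (at 24 kHz); `pcm16Base64ToFloat32` is a hypothetical name.

```typescript
// Hypothetical helper: decode base64-encoded 16-bit little-endian PCM
// into float samples in the range [-1, 1), the form Web Audio buffers expect.
export function pcm16Base64ToFloat32(base64: string): Float32Array {
  const binary = atob(base64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }

  const view = new DataView(bytes.buffer);
  const sampleCount = Math.floor(bytes.length / 2); // two bytes per sample
  const samples = new Float32Array(sampleCount);
  for (let i = 0; i < sampleCount; i++) {
    // `true` selects little-endian byte order
    samples[i] = view.getInt16(i * 2, true) / 32768;
  }
  return samples;
}
```

In a browser, the result could then be copied into an `AudioBuffer` created with `audioContext.createBuffer(1, samples.length, 24000)` via `copyToChannel`, and played through an `AudioBufferSourceNode`.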
Architecture:
Dialog model: keeps the conversation natural
Analysis model: extracts precise data from speech
Processing example:
User: "Did eight reps with sixty kilos, felt pretty easy"
Dialog Model → "Great! Logged 8 reps with 60 kg. Should we increase the weight?"
Analysis Model →
{
  "command": "log_set",
  "data": {
    "reps": 8,
    "weight": 60
  }
}
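In this split, the dialog model's audio is simply played back, while the analysis model's JSON is what changes app state. A minimal sketch of that second path follows; the `CommandHandlers` callbacks are hypothetical stand-ins for whatever the app does with each command.

```typescript
type CommandResult =
  | { command: "log_set"; data: { reps?: number; weight?: number } }
  | { command: "get_form_tip"; data: null }
  | { command: "chat_message"; data: { text: string } };

// Hypothetical callbacks supplied by the UI layer.
interface CommandHandlers {
  logSet: (reps?: number, weight?: number) => void;
  showFormTip: () => void;
  noteChat: (text: string) => void;
}

// Route one analysis result to the matching handler.
export function dispatchCommand(result: CommandResult, handlers: CommandHandlers): void {
  switch (result.command) {
    case "log_set":
      handlers.logSet(result.data.reps, result.data.weight);
      break;
    case "get_form_tip":
      handlers.showFormTip();
      break;
    case "chat_message":
      handlers.noteChat(result.data.text);
      break;
  }
}
```

Because the command is a discriminated union, TypeScript narrows `result.data` in each branch, so a handler cannot receive the wrong payload shape.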
Implemented schema:
Workout programs (/programs/{programId}):
{
  "name": "Strength program, 12 weeks",
  "createdBy": "userId",
  "workouts": {
    "day1": {
      "dayName": "Chest and triceps",
      "exercises": [
        {
          "exerciseId": "bench_press",
          "name": "Barbell bench press",
          "sets": [{"reps": 8, "weight": 60}],
          "rest": 120
        }
      ]
    }
  }
}
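The program document above maps naturally onto TypeScript interfaces. The sketch below mirrors its fields and flattens one day into the ordered list of sets a voice session would walk through. `plannedStepsForDay` is a hypothetical helper, and treating `rest` as seconds is an assumption.

```typescript
interface PlannedSet { reps: number; weight: number; }
interface Exercise {
  exerciseId: string;
  name: string;
  sets: PlannedSet[];
  rest: number; // assumed to be seconds between sets
}
interface WorkoutDay { dayName: string; exercises: Exercise[]; }
interface Program {
  name: string;
  createdBy: string;
  workouts: { [dayKey: string]: WorkoutDay };
}

interface PlannedStep {
  exerciseId: string;
  name: string;
  setNumber: number;
  reps: number;
  weight: number;
  rest: number;
}

// Flatten one workout day into the sequence of sets to perform.
export function plannedStepsForDay(program: Program, dayKey: string): PlannedStep[] {
  const day = program.workouts[dayKey];
  if (!day) return []; // unknown day key
  const steps: PlannedStep[] = [];
  for (const exercise of day.exercises) {
    exercise.sets.forEach((set, index) => {
      steps.push({
        exerciseId: exercise.exerciseId,
        name: exercise.name,
        setNumber: index + 1,
        reps: set.reps,
        weight: set.weight,
        rest: exercise.rest,
      });
    });
  }
  return steps;
}
```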
Detailed workout sessions (/sessions/{sessionId}):
{
  "userId": "user123",
  "date": "2024-01-15T10:00:00Z",
  "programId": "strength_program_001",
  "workoutId": "day1_chest",
  "duration": 5400, // seconds
  "voiceTranscript": "Complete log of conversation with AI...",
  "performedSets": {
    "set001": {
      "exerciseId": "bench_press",
      "setNumber": 1,
      "reps": 8,
      "weight": 62.5,
      "timestamp": "2024-01-15T10:15:30Z"
    }
  }
}
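When a `log_set` command arrives, a new entry has to be added under `performedSets`. The sketch below shows one way to do that without mutating the existing map. The `setNNN` key pattern is inferred from the single `set001` example above, and `addPerformedSet` is a hypothetical helper.

```typescript
interface PerformedSet {
  exerciseId: string;
  setNumber: number;
  reps: number;
  weight: number;
  timestamp: string; // ISO 8601
}
type PerformedSets = { [setKey: string]: PerformedSet };

// Return a new map with one more set appended; the input is left untouched.
export function addPerformedSet(
  sets: PerformedSets,
  exerciseId: string,
  reps: number,
  weight: number,
  now: Date = new Date()
): PerformedSets {
  const keys = Object.keys(sets);

  // Key pattern "set001", "set002", ... (assumed from the example document).
  let suffix = String(keys.length + 1);
  while (suffix.length < 3) suffix = "0" + suffix;
  const key = "set" + suffix;

  // setNumber counts sets of the same exercise within this session.
  let setNumber = 1;
  for (const k of keys) {
    if (sets[k].exerciseId === exerciseId) setNumber++;
  }

  return {
    ...sets,
    [key]: { exerciseId, setNumber, reps, weight, timestamp: now.toISOString() },
  };
}
```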
Implemented: direct integration with the Google Calendar API
export const scheduleWorkouts = async (workouts: Workout[], accessToken: string): Promise<void> => {
  if (!workouts || workouts.length === 0) {
    throw new Error("No workouts to schedule.");
  }
  const schedulePromises = workouts.map(async workout => {
    const startTime = getNextWorkoutDate(workout.dayOfWeek);
    const endTime = new Date(startTime.getTime() + 60 * 60 * 1000); // Assume 1-hour duration
    const timeZone = Intl.DateTimeFormat().resolvedOptions().timeZone;
    const event = {
      'summary': `Workout: ${workout.dayName}`,
      'description': `Your scheduled workout session.\n\nExercises:\n- ${workout.exercises.map(e => e.name).join('\n- ')}`,
      'start': {
        'dateTime': startTime.toISOString(),
        'timeZone': timeZone,
      },
      'end': {
        'dateTime': endTime.toISOString(),
        'timeZone': timeZone,
      },
    };
    const response = await fetch('https://www.googleapis.com/calendar/v3/calendars/primary/events', {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${accessToken}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify(event),
    });
    // fetch resolves on HTTP errors too, so surface a failed request explicitly
    if (!response.ok) {
      throw new Error(`Calendar API request failed with status ${response.status}`);
    }
  });
  await Promise.all(schedulePromises);
};
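`getNextWorkoutDate` is called above but not shown in the source. A minimal sketch of what it might do follows, assuming `dayOfWeek` uses the same numbering as `Date.prototype.getDay` (0 for Sunday through 6 for Saturday). The `from` and `hour` parameters and the 18:00 default are illustrative additions, and the author's version may differ.

```typescript
// Return the next occurrence of the given weekday, strictly after `from`'s date.
// If `from` already falls on that weekday, the result is one week later.
export function getNextWorkoutDate(
  dayOfWeek: number,
  from: Date = new Date(),
  hour: number = 18
): Date {
  if (Math.floor(dayOfWeek) !== dayOfWeek || dayOfWeek < 0 || dayOfWeek > 6) {
    throw new RangeError("dayOfWeek must be an integer from 0 (Sunday) to 6 (Saturday)");
  }
  // Days until the target weekday, in the range 1..7.
  const diff = ((dayOfWeek - from.getDay() + 7) % 7) || 7;

  const result = new Date(from.getTime());
  result.setDate(from.getDate() + diff); // rolls over month and year boundaries
  result.setHours(hour, 0, 0, 0);        // local time
  return result;
}
```

Scheduling a same-day request a week ahead avoids creating an event in the past; whether that matches the app's intent is a design choice.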
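The AI understands workout-specific vocabulary:
Set logging (log_set): the AI looks for numbers and keywords in the user's speech (for example "reps", "times", "weight", "pounds") to fill in the data for the completed set automatically.
Requests and comments (get_form_tip, chat_message): phrases that contain no data to log are handled as a request to the trainer or as a simple comment.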
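In the app this classification is delegated to Gemini. To make the idea concrete, here is a rule-based approximation that uses the keywords listed in the prompt. It is not how the model works, it ignores spelled-out numbers such as "eight", and it does not convert between kilograms and pounds. `interpretByKeywords` is a hypothetical name.

```typescript
type Command = "log_set" | "get_form_tip" | "chat_message";

interface Interpreted {
  command: Command;
  data: { reps?: number; weight?: number; text?: string } | null;
}

// Keyword and digit matching only; a rough stand-in for the model's classification.
export function interpretByKeywords(transcript: string): Interpreted {
  const text = transcript.toLowerCase();

  const repsMatch = text.match(/(\d+)\s*(?:reps?|times)\b/);
  const weightMatch = text.match(/(\d+(?:\.\d+)?)\s*(?:kg|kilos?|kilograms?|lbs?|pounds?)\b/);
  const doneKeyword = /\b(?:done|finished|log it)\b/.test(text);

  if (repsMatch || weightMatch || doneKeyword) {
    const data: { reps?: number; weight?: number } = {};
    if (repsMatch) data.reps = parseInt(repsMatch[1], 10);
    if (weightMatch) data.weight = parseFloat(weightMatch[1]); // unit is not converted
    return { command: "log_set", data };
  }

  if (/\b(?:form|technique)\b|how do i do this|am i doing it right/.test(text)) {
    return { command: "get_form_tip", data: null };
  }

  // Default when nothing else fits, as in the prompt.
  return { command: "chat_message", data: { text: transcript } };
}
```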
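Implemented: a dedicated analytics section that provides detailed insight into workout performance and progress tracking.
Features include:
Visual progress charts and graphs
Visualization of historical workout data
Performance metrics and trend analysis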
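The source does not say which metrics the dashboard computes. One common choice is training volume (reps × weight) per session, plotted over time. The sketch below derives that series from session documents shaped like the example earlier; `sessionVolumeTrend` is a hypothetical helper, and the choice of metric is an assumption.

```typescript
interface LoggedSet { reps: number; weight: number; }
interface SessionRecord {
  date: string; // ISO 8601 timestamp
  performedSets: { [setKey: string]: LoggedSet };
}
interface VolumePoint { date: string; volume: number; }

// Total volume per session, ordered from oldest to newest for charting.
export function sessionVolumeTrend(sessions: SessionRecord[]): VolumePoint[] {
  return sessions
    .map((session) => {
      let volume = 0;
      for (const key of Object.keys(session.performedSets)) {
        const set = session.performedSets[key];
        volume += set.reps * set.weight;
      }
      return { date: session.date, volume };
    })
    .sort((a, b) => Date.parse(a.date) - Date.parse(b.date));
}
```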
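Speech recognition accuracy: the AI does not always interpret commands correctly in live conversation, especially with background noise
Command execution: the model sometimes "forgets" to perform a specific action in the app after it has responded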
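Traditional fitness apps force a choice: either track your data or focus on your training. A multimodal approach resolves that dilemma:
A voice interface keeps your attention on the workout
Intelligent speech analysis structures the data automatically
Real-time feedback creates the feel of a personal trainer
Automatic workout scheduling fits fitness into daily life
The result is a fitness companion that understands natural speech and adapts to each user's own style.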
**Note:** Although my AI trainer is smart and motivating, I strongly recommend using common sense, especially where your health is concerned. 💪
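My sincere thanks to the organizers of the Google AI Studio Multimodal Challenge for the unique opportunity to experiment with cutting-edge AI technology.
Special thanks to:
The Google AI Studio team, for an intuitive platform that makes complex technology accessible
The Gemini Live Audio API developers, for the real-time voice interaction technology
The Dev.to community, for an inspiring platform for innovative projects
This project was made possible by the ecosystem of tools built by Google AI and by a supportive developer community.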
An MVP built with React, TypeScript, Firebase, the Google Calendar API, and Google Gemini's multimodal capabilities
Made with love by Premananda
Original source: https://dev.to/prema_ananda/building-a-hands-free-ai-fitness-applet-with-gemini-live-api-3fg1