This is a submission for the Google AI Studio Multimodal Challenge

What I Built

I built an interactive web app version of the classic draw-and-guess game. The game gives the player a word to draw, and the Gemini model tries to guess the drawing in near real time.
This makes it a solo-player game in which the user's drawing skills are pitted against the model's image recognition. It removes the need for multiple players in a game of Pictionary and offers a fun, hands-on way to experience multimodal AI.
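
The post does not say how a round is won or how often the model is asked for a guess. The sketch below is one possible approach, assuming the app compares each guess to the target word case-insensitively and throttles requests so that only one is in flight and they are at least `minIntervalMs` apart. `isCorrectGuess`, `shouldRequestGuess`, `GuessThrottle`, and the 1500 ms default are hypothetical names and values, not taken from the app.

```typescript
// Hypothetical round logic: names and throttling policy are illustrative
// assumptions, not code from the app.

// Normalize a word for comparison: trim, lowercase, and drop anything
// that is not an ASCII letter.
function normalizeWord(word: string): string {
  return word.trim().toLowerCase().replace(/[^a-z]/g, "");
}

// A guess wins the round if it matches the target word after normalization.
function isCorrectGuess(guess: string, target: string): boolean {
  const g = normalizeWord(guess);
  return g.length > 0 && g === normalizeWord(target);
}

// State used to throttle "near real-time" guesses while the player draws.
interface GuessThrottle {
  lastRequestAt: number | null; // timestamp (ms) of the last request sent
  inFlight: boolean; // true while a request is awaiting a reply
}

// Send a new request only if none is pending and enough time has passed.
function shouldRequestGuess(
  state: GuessThrottle,
  nowMs: number,
  minIntervalMs = 1500,
): boolean {
  if (state.inFlight) return false;
  if (state.lastRequestAt === null) return true;
  return nowMs - state.lastRequestAt >= minIntervalMs;
}
```

A drawing handler (for example, on each pointer-up event) could check `shouldRequestGuess` before sending the canvas, set `inFlight` while the request is pending, and end the round once `isCorrectGuess` returns true.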

Demo

Live app: https://ai-pictionary-598974168521.us-west1.run.app
{% embed https://youtu.be/imZn1DYhdu4 %}

How I Used Google AI Studio

I used the Gemini API, through the @google/genai SDK, to power the game's core guessing mechanic. Specifically, I chose the gemini-2.5-flash model for its speed and its support for combined image and text input.

The app captures the user's drawing from the HTML canvas as a PNG image, converts it to a base64 string, and sends it to the Gemini model together with a carefully crafted text prompt: "What is this a drawing of? Look at the image carefully and provide your best guess in a single word." The model processes this combined visual and textual input and returns its guess as a single word of text. This is an image-to-text, or visual understanding, use case.
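
A minimal TypeScript sketch of this flow follows. The request shape (an `inlineData` part with `mimeType` and base64 `data`, then a `text` part) and reading the reply from `response.text` follow the @google/genai `generateContent` documentation. The helper names, the `GeminiLike` and `CanvasLike` interfaces (structural stand-ins so the logic can be tested without a browser or network access), and the one-word cleanup are assumptions, not the app's actual code.

```typescript
// Sketch: send a canvas drawing plus the text prompt to Gemini and return
// a one-word guess. Helper and interface names are hypothetical.

const PROMPT =
  "What is this a drawing of? Look at the image carefully and provide your best guess in a single word.";

// Minimal structural view of a client such as `new GoogleGenAI({ apiKey })`
// from @google/genai; only the one method used here (assumed compatible).
interface GeminiLike {
  models: {
    generateContent(params: {
      model: string;
      contents: unknown;
    }): Promise<{ text?: string }>;
  };
}

// Anything with toDataURL, e.g. an HTMLCanvasElement.
interface CanvasLike {
  toDataURL(type?: string): string;
}

// toDataURL("image/png") returns "data:image/png;base64,<payload>";
// the inline image part needs only the base64 payload after the comma.
function stripDataUrlPrefix(dataUrl: string): string {
  const comma = dataUrl.indexOf(",");
  if (comma === -1) throw new Error("Not a data URL");
  return dataUrl.slice(comma + 1);
}

// Build the multimodal request: one inline PNG part and one text part.
function buildGuessRequest(base64Png: string) {
  return {
    model: "gemini-2.5-flash",
    contents: [
      { inlineData: { mimeType: "image/png", data: base64Png } },
      { text: PROMPT },
    ],
  };
}

// Keep the first word of the reply, lowercase it, and strip anything other
// than ASCII letters, digits, and hyphens (e.g. a trailing period).
function normalizeGuess(text: string | undefined): string {
  const first = (text ?? "").trim().split(/\s+/)[0] ?? "";
  return first.toLowerCase().replace(/[^a-z0-9-]/g, "");
}

// Capture the canvas, ask the model, and return the cleaned-up guess.
async function guessDrawing(ai: GeminiLike, canvas: CanvasLike): Promise<string> {
  const base64Png = stripDataUrlPrefix(canvas.toDataURL("image/png"));
  const response = await ai.models.generateContent(buildGuessRequest(base64Png));
  return normalizeGuess(response.text);
}
```

In the browser, the real SDK client would presumably be passed in, e.g. `guessDrawing(new GoogleGenAI({ apiKey }), canvasElement)`, with `GoogleGenAI` imported from @google/genai.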

Multimodal Features

The central multimodal feature of this application is visual reasoning and description. The app combines two distinct modalities:

Image Input: The user's free-form drawing on the canvas serves as the primary visual input.

Text Output: The Gemini model analyzes this visual information and generates a textual guess.


