A second AI cursor 🖱️ for your desktop that can see your screen, hear you speak, and talk to you.
Powered by Google's Gemini 2.0 Flash (Experimental) model via the Multimodal Live API, using its pointing and function-calling capabilities.
Created by @13point5.
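Function calling works by registering tool declarations that the model can invoke. As a minimal sketch, a cursor-moving tool declaration in the Gemini function-calling format might look like the following (the `move_cursor` name, description, and parameters are illustrative assumptions, not the app's actual schema):

```typescript
// Hypothetical tool declaration in the Gemini function-calling format.
// The name and parameter shape are assumptions for illustration;
// gemini-cursor's real schema may differ.
const moveCursorTool = {
  name: "move_cursor",
  description: "Move the on-screen AI cursor to a point on the shared screen",
  parameters: {
    type: "object",
    properties: {
      x: { type: "number", description: "Horizontal position on the screen" },
      y: { type: "number", description: "Vertical position on the screen" },
    },
    required: ["x", "y"],
  },
};
```

When the model decides to point at something, it responds with a call to this tool, and the app moves the second cursor accordingly.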
- 🖱️ Second AI cursor on your desktop
- 🚀 Multimodality: The model can see 📸, hear 🎤, and speak 🔊
- ⚡️ Real-time with low latency
- 📚 Understanding complex diagrams in research papers, architecture diagrams, etc.
- 🌐 Navigating complex websites to perform tasks like adding a payment method on Amazon
- 📝 Real-time AI tutoring with whiteboards
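Driving a desktop cursor from pointing output comes down to a coordinate conversion: Gemini's pointing responses typically use coordinates normalized to a 0–1000 grid, which must be mapped onto the actual display. A minimal sketch, assuming that normalization convention (the app itself may handle this differently):

```typescript
// Convert a point normalized to a 0-1000 grid (the convention Gemini
// pointing/bounding-box responses use) into screen-pixel coordinates.
// The 0-1000 assumption is about the model output, not this app's code.
function toScreenPixels(
  normX: number,
  normY: number,
  screenWidth: number,
  screenHeight: number
): { x: number; y: number } {
  return {
    x: Math.round((normX / 1000) * screenWidth),
    y: Math.round((normY / 1000) * screenHeight),
  };
}

// e.g. the center of a 1920x1080 display:
const center = toScreenPixels(500, 500, 1920, 1080);
// center is { x: 960, y: 540 }
```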
- Frontend: Electron, React, TypeScript, Vite
- AI: Google Gemini API
- Built on Google's Multimodal Live API, with much of the code adapted from the Gemini Multimodal Live API Web Console
- Node.js (v16 or higher)
- npm
- Gemini API key
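To confirm the Node.js requirement before installing, a quick check in a POSIX shell (the v16 floor comes from the prerequisites above; `node_major` is a small helper defined here, not part of the repo):

```shell
# Extract the major version from a `node --version` string like "v18.19.0"
# and compare it against the required minimum (v16).
node_major() {
  printf '%s\n' "$1" | sed 's/^v\([0-9]*\).*/\1/'
}

required=16
installed=$(node --version 2>/dev/null || echo "v0")
if [ "$(node_major "$installed")" -ge "$required" ]; then
  echo "Node.js $installed OK"
else
  echo "Node.js $installed is below the required v$required"
fi
```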
- Clone the repository

  ```shell
  git clone https://github.com/13point5/gemini-cursor.git
  cd gemini-cursor
  ```

- Install dependencies

  ```shell
  npm install
  ```

- Run the app

  ```shell
  npm run start
  ```
- Enter the Gemini API key in the app
- Click the Play button and the Share Screen button
- Minimize the app and enjoy!