Gemini Multimodal Tool
Use the ai-gem CLI tool for multimodal AI processing and image generation via Google's Gemini API.
Usage
```bash
# Text queries
ai-gem "Write a haiku about Python programming"
# Analyze documents
ai-gem "Summarize this document" document.pdf
# Analyze images
ai-gem "What's in this image?" photo.jpg
# Process YouTube videos
ai-gem "Create a 5-point summary" "https://youtu.be/VIDEO_ID"
# Compare multiple files
ai-gem "Compare these files" file1.pdf file2.png
# Web search
ai-gem "Current AI news" --search
# Generate images (uses Nano Banana Pro by default)
ai-gem --image "A cute robot reading a book in a cozy library"
ai-gem --image "A landscape at sunset" --aspect-ratio 16:9
ai-gem --image "A cat wearing a hat" -o cat.png
ai-gem --image "Edit this to add sunglasses" reference.jpg
# Use alternative image model
ai-gem --image "A blue triangle" -m gemini-2.5-flash-image
```
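The single-document pattern (`ai-gem "Summarize this document" document.pdf`) extends to batches with a plain shell loop. The sketch below is a hypothetical helper, not part of ai-gem: it assumes ai-gem prints its text answer to stdout, and the `summarize_pdfs` name, the `.summary.txt` naming scheme, and the `AI_GEM` override variable are all illustrative choices.

```bash
# Hypothetical helper (not part of ai-gem): summarize every PDF in a directory,
# one ai-gem call per file, writing each answer next to its source PDF.
# Assumes ai-gem prints the text response to stdout.
# AI_GEM may be overridden (e.g. AI_GEM=echo) for a dry run.
summarize_pdfs() {
  local dir=${1:-.}
  local f
  for f in "$dir"/*.pdf; do
    [ -e "$f" ] || continue   # glob matched nothing: skip the literal pattern
    "${AI_GEM:-ai-gem}" "Summarize this document" "$f" > "${f%.pdf}.summary.txt" || return 1
  done
}
```

For example, `summarize_pdfs ./reports` would produce `./reports/q1.summary.txt` for `./reports/q1.pdf`. Running one call per file keeps each prompt focused on a single document; use the multi-file form (`ai-gem "Compare these files" ...`) when you want the model to see them together.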
Image Generation Options
- `--image`/`-i`: Generate an image instead of text
- `--output`/`-o`: Output file path (auto-generated if omitted)
- `--aspect-ratio`/`-a`: Aspect ratio (1:1, 9:16, 16:9, etc.)
- `--model`/`-m`: Override model (default: nano-banana-pro-preview)
- Attachments serve as reference images for editing
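Because attachments act as reference images and `-o` sets the output path, multi-step edits can be chained by feeding each result back in as the next reference. This is a hypothetical sketch, not an ai-gem feature: it assumes ai-gem accepts an attachment and `-o` in the same call (the examples above show them only separately), and the `chain_edits` name, the `edit_stepN.png` file names, and the `AI_GEM` override are illustrative.

```bash
# Hypothetical helper (not part of ai-gem): apply a sequence of edit prompts,
# passing each output image back in as the reference for the next step.
# Assumes an attachment and -o can be combined in one ai-gem call.
# AI_GEM may be overridden (e.g. AI_GEM=echo) for a dry run.
chain_edits() {
  local current=$1   # starting reference image
  shift
  local step=1
  local prompt out
  for prompt in "$@"; do
    out="edit_step${step}.png"
    "${AI_GEM:-ai-gem}" --image "$prompt" "$current" -o "$out" || return 1
    current=$out     # the next edit starts from this result
    step=$((step + 1))
  done
  printf '%s\n' "$current"   # path of the final image
}
```

Usage: `chain_edits reference.jpg "Add sunglasses" "Make the background blue"`. Writing every intermediate step to its own file keeps earlier results available if a later edit goes wrong.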
Requirements
- `GEMINI_API_KEY` environment variable must be set
- The `hamel` package must be installed: `pip install hamel`
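A script that calls ai-gem can verify both requirements up front and fail with a clear message. The function below is a hypothetical pre-flight sketch; it assumes that installing `hamel` is what places the `ai-gem` command on `PATH`.

```bash
# Hypothetical pre-flight check (not part of ai-gem): confirm the API key is
# set and the ai-gem command is reachable before making any calls.
check_ai_gem_ready() {
  if [ -z "${GEMINI_API_KEY:-}" ]; then
    echo "error: GEMINI_API_KEY is not set" >&2
    return 1
  fi
  # Assumes the hamel package installs the ai-gem command onto PATH
  if ! command -v ai-gem >/dev/null 2>&1; then
    echo "error: ai-gem not found on PATH (try: pip install hamel)" >&2
    return 1
  fi
  return 0
}
```

A script might start with `check_ai_gem_ready || exit 1` so a missing key is reported before any request is attempted.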
Supported Input Types
- PDFs
- Images (PNG, JPEG, GIF, WebP)
- Videos (MP4, etc.)
- YouTube URLs
- Plain text files
- Multiple files for comparison
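When passing arbitrary arguments through a script, it can help to reject inputs that fall outside the categories above before spending an API call. This is a hypothetical sketch based only on URL prefixes and file extensions; it is not how ai-gem itself detects input types. It recognizes only MP4 among video formats (the list above says "MP4, etc.") and assumes `.txt` and `.md` count as plain text.

```bash
# Hypothetical pre-check (not part of ai-gem): map an argument to one of the
# supported input categories using its URL prefix or file extension.
classify_input() {
  local lower
  # Lower-case a copy so PHOTO.JPG and photo.jpg are treated the same
  lower=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case $lower in
    https://youtu.be/*|https://www.youtube.com/*|https://youtube.com/*) echo "youtube" ;;
    *.pdf) echo "pdf" ;;
    *.png|*.jpg|*.jpeg|*.gif|*.webp) echo "image" ;;
    *.mp4) echo "video" ;;
    *.txt|*.md) echo "text" ;;
    *) echo "unknown"; return 1 ;;
  esac
}
```

For example, `for f in "$@"; do classify_input "$f" >/dev/null || echo "skipping unsupported input: $f"; done` filters a file list before handing it to ai-gem.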