## Prerequisites
- Python 3.9-3.12
- pip package manager
- An image file to test with (any common format: JPG, PNG, etc.)
## Step 1: Install Mozo

Install via pip:
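Assuming the package is published on PyPI under the name `mozo` (an assumption; check the project's install docs), installation is a single command:

```shell
# Install Mozo from PyPI (package name assumed)
pip install mozo
```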
Verify installation:
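A quick check might look like the following; the `--version` flag is an assumption, though most CLIs support it:

```shell
# Print the installed version to confirm the CLI is on your PATH
mozo --version
```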
## Step 2: Start the Server

Start the Mozo server:
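Starting the server is one command (`mozo start` is the command referenced later on this page):

```shell
# Launch the Mozo HTTP server on the default port (8000)
mozo start
```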
You should see the server running at http://localhost:8000, with all 35+ models ready to use.
Models load on first access (lazy loading), not at startup. The server starts in seconds regardless of how many models are available.
## Step 3: Make Your First Prediction

Choose your preferred method:

- cURL
- Python
- JavaScript
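With cURL, a request might look like the sketch below. The model id (`detectron2/mask_rcnn_R_50_FPN_3x`) is the one this quickstart uses, but the endpoint path and the form field name are assumptions; check the REST API Reference for the exact route:

```shell
# Hypothetical prediction request: the /predict/... path and the
# "image" form field are assumptions, not the documented route
curl -X POST "http://localhost:8000/predict/detectron2/mask_rcnn_R_50_FPN_3x" \
  -F "image=@path/to/image.jpg"
```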
Success! You’ve made your first prediction. The model detected objects in your image and returned bounding boxes, class names, confidence scores, and segmentation masks.
## Alternative: Python SDK (No Server)

You can also use Mozo directly in Python without starting the HTTP server. This is ideal for embedding in applications or when you only need Python access.

- Object Detection
- Text Recognition
- Depth Estimation
- Visual Q&A
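A sketch of direct SDK use: `ModelManager` is the class named below, but the import path and the method names (`get_model`, `predict`) are assumptions, so treat this as the shape of the API rather than its exact surface:

```python
from mozo import ModelManager  # import path is an assumption

# Create a manager; models load lazily on first use, as with the server
manager = ModelManager()

# Object detection with the same model family used in Step 3
model = manager.get_model("detectron2/mask_rcnn_R_50_FPN_3x")  # method name assumed
result = model.predict("path/to/image.jpg")                    # method name assumed
print(result)
```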
**REST API vs Python SDK:**

- **REST API** (`mozo start`): Use when you need HTTP access, multi-language support, or want all models instantly available
- **Python SDK** (`ModelManager`): Use when embedding in Python apps, avoiding HTTP overhead, or needing direct memory access
## What Just Happened?
- Mozo received your request and identified which model to use (`detectron2/mask_rcnn_R_50_FPN_3x`)
- The model loaded (takes a few seconds on first access, then cached)
- Inference ran on your image
- Results returned in a unified JSON format
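Because every model answers in one unified JSON shape, client code stays model-agnostic. A sketch of consuming such a response in Python; the field names here are assumptions for illustration, not the documented schema:

```python
import json

# A hypothetical unified response; field names are assumptions
raw = """
{
  "model": "detectron2/mask_rcnn_R_50_FPN_3x",
  "predictions": [
    {"class_name": "dog", "confidence": 0.97, "bbox": [12.0, 34.0, 200.0, 180.0]}
  ]
}
"""

response = json.loads(raw)
for pred in response["predictions"]:
    # Each prediction carries its class, score, and box in one place
    print(pred["class_name"], pred["confidence"], pred["bbox"])
```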
## Try Other Models

Now that the server is running, try different models:

- Object Detection (Faster R-CNN)
- Text Recognition (EasyOCR)
- Depth Estimation
- Visual Question Answering
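As a sketch, these might follow the same request pattern as Step 3. Every path, model identifier, and field name below is an assumption for illustration; consult the REST API Reference and Model Families docs for the real ones:

```shell
# Object detection (Faster R-CNN); identifier assumed
curl -X POST "http://localhost:8000/predict/detectron2/faster_rcnn_R_50_FPN_3x" \
  -F "image=@path/to/image.jpg"

# Text recognition (EasyOCR); identifier assumed
curl -X POST "http://localhost:8000/predict/easyocr/en" \
  -F "image=@path/to/image.jpg"

# Visual question answering (Qwen2.5-VL); identifier and "question" field assumed
curl -X POST "http://localhost:8000/predict/qwen/qwen2.5-vl-7b" \
  -F "image=@path/to/image.jpg" \
  -F "question=What is in this image?"
```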
## Explore Available Models

List all model families to see everything the server can run.

## Server Options

Customize server behavior (host, port, and similar settings) when starting it.

## Next Steps
- **Choosing a Model**: Learn which model to use for different tasks
- **REST API Reference**: Explore all API endpoints and options
- **Model Families**: See detailed documentation for each model family
## Troubleshooting
**Model loading is slow on first request**
This is expected behavior. Models load on first access and can take several seconds depending on model size. Subsequent requests to the same model are fast (cached in memory).
**Out of memory error**
Some models (like Qwen2.5-VL 7B) require significant RAM (16GB+). Either:
- Use a smaller variant
- Manually unload other models first
- Enable aggressive cleanup:

  ```shell
  curl -X POST "http://localhost:8000/models/cleanup?inactive_seconds=60"
  ```
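Manually unloading a single model might look like the following; only the cleanup endpoint above comes from this page, and this DELETE route is an assumption, so verify it in the REST API Reference:

```shell
# Hypothetical unload call; route is an assumption
curl -X DELETE "http://localhost:8000/models/detectron2/mask_rcnn_R_50_FPN_3x"
```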
**`ImportError: detectron2 not installed`**
Detectron2 requires platform-specific installation. See Detectron2 installation guide.
**MPS/GPU issues on macOS**
Some models have limited MPS support. If you encounter errors, the server automatically falls back to CPU (set via `PYTORCH_ENABLE_MPS_FALLBACK=1`).