
The Problem

Deploying machine learning models traditionally requires complex infrastructure:
  • Docker containers for consistent environments
  • Kubernetes for orchestration and scaling
  • Cloud services for hosting and model management
  • API gateways for routing and authentication
  • Monitoring systems for health checks and performance
Setting up this infrastructure takes days or weeks. Managing it requires specialized DevOps expertise. Running it costs money even when models sit idle. Most developers just want to test if a model works for their use case. They don’t want to become infrastructure experts first.

The Solution

Mozo is a universal model server that eliminates deployment complexity:
# Install Mozo
pip install mozo

# Start server with 35+ models ready
mozo start

# Make predictions via HTTP
curl -X POST "http://localhost:8000/predict/detectron2/mask_rcnn_R_50_FPN_3x" \
  -F "[email protected]"
That’s it. No Docker, no Kubernetes, no cloud setup. Just install, start, and use.

Key Features

35+ Pre-Configured Models

Object detection, OCR, depth estimation, vision-language models, and more. All variants tested and ready to use.

Zero Deployment Overhead

No Docker, Kubernetes, or cloud services needed. Install with pip, start with one command.

Automatic Memory Management

Models load on first access and automatically unload when inactive. Efficient memory usage without manual intervention.
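Conceptually, the lifecycle behind this looks like the sketch below: each model lives in a slot that loads it on first use and releases it after a period of inactivity. This is an illustrative pattern only; the class, the loader callable, and the five-minute timeout are assumptions, not Mozo's actual internals.
import threading
import time

class LazyModelSlot:
    """Illustrative only: load a model on first use, drop it after idle_seconds."""

    def __init__(self, loader, idle_seconds=300):
        self.loader = loader            # callable that builds/loads the model
        self.idle_seconds = idle_seconds
        self.model = None
        self.last_used = 0.0
        self.lock = threading.Lock()

    def predict(self, image):
        with self.lock:
            if self.model is None:      # first access: load the model
                self.model = self.loader()
            self.last_used = time.time()
            model = self.model
        return model.predict(image)

    def maybe_unload(self):
        # Called periodically (e.g. by a background thread) to free idle models
        with self.lock:
            if self.model is not None and time.time() - self.last_used > self.idle_seconds:
                self.model = None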

Unified Output Format

All detection models return results in the PixelFlow Detections format, so detection code written once works with any model family.
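Because the output shape is the same everywhere, the downstream code does not change when you swap models. Here is a minimal sketch using the Python SDK shown further down this page; the second variant name is assumed for illustration, and only len() is used on the result since the exact PixelFlow attributes are documented separately.
from mozo import ModelManager
import cv2

manager = ModelManager()
image = cv2.imread('image.jpg')

# Both variants return PixelFlow Detections, so the loop body is identical.
# The second variant name here is an assumption, not a verified identifier.
for family, variant in [('detectron2', 'mask_rcnn_R_50_FPN_3x'),
                        ('detectron2', 'faster_rcnn_R_50_FPN_3x')]:
    model = manager.get_model(family, variant)
    detections = model.predict(image)
    print(f"{family}/{variant}: {len(detections)} objects")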

Available Model Families

Mozo includes pre-configured variants for:
  • Detectron2 (27 variants) - Object detection, instance segmentation, keypoint detection
  • EasyOCR (5 variants) - Text recognition with support for 80+ languages
  • PaddleOCR (5 variants) - PP-OCRv5 for high-accuracy OCR
  • PPStructure (4 variants) - Document analysis with layout and table recognition
  • Florence-2 (8 task variants) - Multi-task vision model
  • Depth Anything (3 variants) - Monocular depth estimation
  • Qwen2.5-VL (1 variant) - Vision-language understanding and VQA
  • Qwen3-VL (1 variant) - VLM with chain-of-thought reasoning
  • BLIP VQA (2 variants) - Visual question answering
  • Stability Inpainting (1 variant) - Image inpainting with Stable Diffusion 2
  • Datamarkin (dynamic variants) - Cloud-based custom model inference
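Every family sits behind the same /predict/&lt;family&gt;/&lt;variant&gt; endpoint shown in the quick-start above, so switching families is just a change of path segments. A minimal helper sketch follows; the helper name is ours, and the easyocr variant identifier at the end is a placeholder, not a verified name.
import requests

def predict(family, variant, image_path, host="http://localhost:8000"):
    """POST an image to Mozo's /predict/<family>/<variant> endpoint and return the JSON result."""
    with open(image_path, "rb") as f:
        response = requests.post(f"{host}/predict/{family}/{variant}", files={"file": f})
    response.raise_for_status()
    return response.json()

detections = predict("detectron2", "mask_rcnn_R_50_FPN_3x", "image.jpg")
ocr_result = predict("easyocr", "english", "scan.jpg")  # placeholder variant name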

Two Ways to Use Mozo

HTTP API

Start the server and make HTTP requests from any language:
# Start the server in a terminal
mozo start

# Then call it from any language, for example from Python:
import requests

response = requests.post(
    "http://localhost:8000/predict/detectron2/mask_rcnn_R_50_FPN_3x",
    files={"file": open("image.jpg", "rb")}
)
detections = response.json()

Python SDK (Advanced)

Embed Mozo directly in Python applications:
from mozo import ModelManager
import cv2

# ModelManager loads models lazily on first access and unloads them when idle
manager = ModelManager()
model = manager.get_model('detectron2', 'mask_rcnn_R_50_FPN_3x')

image = cv2.imread('image.jpg')
detections = model.predict(image)  # PixelFlow Detections

print(f"Found {len(detections)} objects")

Why Mozo?

  • For quick prototyping: Test whether a model works for your use case in minutes, not days.
  • For production deployments: Mozo handles model lifecycle, memory management, and concurrent access automatically.
  • For multi-model applications: Switch between models or use multiple models simultaneously without infrastructure changes.
  • For team collaboration: A consistent model-serving API means everyone uses the same interface, regardless of the underlying framework.
Mozo is designed for development and moderate production workloads. For high-throughput production deployments requiring GPU scaling, consider dedicated model serving solutions like TorchServe or Triton.