POST /vision/analyze
- Request payload:
  - `model_id` (string, optional): which model to run (e.g. “yolo-tiny-v4”).
  - `threshold` (float, optional): minimum confidence for detections.
  - `image` (binary file, multipart/form-data) OR `image_base64` (string).
- Response payload (JSON):

```json
{
  "model_id": "yolo-tiny-v4",
  "detections": [
    { "label": "person", "confidence": 0.82, "bbox": [x, y, width, height] },
    { "label": "dog", "confidence": 0.64, "bbox": [x, y, width, height] }
  ],
  "processing_time_ms": 125
}
```
- Notes:
  - Don’t bake in how the model runs; just define the contract.
  - If no `model_id` is provided, fall back to a default “demo” model (e.g. basic motion detection or edge detection).
  - An example request sketch follows these notes.
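For illustration, here is a minimal request sketch with Python `requests`, assuming the demo server from the Docker recipe below is listening on `localhost:8000` (the `image_base64` variant would send a base64-encoded string field instead of the file part):

```python
# Sketch: calling POST /vision/analyze with the multipart `image` variant.
import requests

with open("test.jpg", "rb") as f:
    resp = requests.post(
        "http://localhost:8000/vision/analyze",
        files={"image": ("test.jpg", f, "application/octet-stream")},
        data={"model_id": "yolo-tiny-v4", "threshold": 0.6},
    )
resp.raise_for_status()
for det in resp.json()["detections"]:
    print(det["label"], det["confidence"], det["bbox"])
```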
GET /vision/models
- Returns a list of registered models, e.g.:

```json
[
  { "model_id": "yolo-tiny-v4", "description": "YOLOv4 Tiny (party-mode)" },
  { "model_id": "mobilenet_ssd_v2", "description": "MobileNet-SSD v2 (lightweight)" },
  { "model_id": "face-detect-v1", "description": "Simple Haar Cascade face detector" }
]
```
- This lets devs know which models are available in the OTR environment.
POST /vision/models (optional, if you want to let developers upload their own models)
- Request (multipart/form-data):
  - `model_file` (binary blob, e.g. a `.tflite` or `.onnx` file)
  - `model_id` (string)
  - `meta` (JSON, e.g. input size, framework, description)
- Response (JSON): confirmation that `model_id` is now registered (see the Python sketch below).
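If you do enable uploads, registering a model from Python could look roughly like this sketch, which mirrors the curl example later in this document (host, file name, and metadata are placeholders):

```python
# Sketch: registering a custom model via POST /vision/models (upload support enabled).
import json
import requests

with open("my_custom_model.tflite", "rb") as f:
    resp = requests.post(
        "http://otr-vision.local:8000/vision/models",
        files={"model_file": ("my_custom_model.tflite", f, "application/octet-stream")},
        data={
            "model_id": "my-custom-v1",
            "meta": json.dumps({"framework": "tflite", "input_size": 300}),
        },
    )
resp.raise_for_status()
print(resp.json())  # confirmation that my-custom-v1 is now registered
```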
DELETE /vision/models/{model_id}
- Removes a model from the registry (only if you choose to host user‐uploaded models).
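Removing a model could then look like this (sketch; the host, API key, and `my-custom-v1` id are placeholders reused from the examples in this document):

```python
# Sketch: deleting a previously uploaded model from the registry.
import requests

resp = requests.delete(
    "http://otr-vision.local:8000/vision/models/my-custom-v1",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
)
resp.raise_for_status()
print("Deleted, status:", resp.status_code)
```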
2. “Reference Implementation” in Python and/or Node.js that:
- Captures or loads an image/frame.
- Knows how to call `/vision/analyze`.
- Parses the JSON results and exposes a simple API (e.g. `detect_objects(image_path)`).
Below is a Python helper example; modify it for Raspberry Pi (e.g. use `picamera` to grab a frame) or for desktop (OpenCV; see the sketch after this helper):
```python
# otr_vision.py
import requests


class OTRVisionClient:
    def __init__(self, base_url, api_key=None):
        self.base_url = base_url.rstrip('/')
        self.headers = {}
        if api_key:
            self.headers['Authorization'] = f"Bearer {api_key}"

    def list_models(self):
        resp = requests.get(f"{self.base_url}/vision/models", headers=self.headers)
        resp.raise_for_status()
        return resp.json()

    def analyze_image(self, image_path=None, image_bytes=None, model_id=None, threshold=0.5):
        """
        Either image_path or image_bytes must be provided.
        Returns: dict with detections.
        """
        if image_path:
            with open(image_path, 'rb') as f:
                img_data = f.read()
        elif image_bytes:
            img_data = image_bytes
        else:
            raise ValueError("Provide either image_path or image_bytes.")

        # Build multipart form
        files = {
            'image': ('frame.jpg', img_data, 'application/octet-stream')
        }
        data = {
            'threshold': threshold
        }
        if model_id:
            data['model_id'] = model_id

        resp = requests.post(f"{self.base_url}/vision/analyze",
                             files=files,
                             data=data,
                             headers=self.headers)
        resp.raise_for_status()
        return resp.json()


# Example usage:
if __name__ == "__main__":
    client = OTRVisionClient("http://localhost:8000", api_key="YOUR_API_KEY")
    print("Available models:", client.list_models())

    # Analyze a local JPEG
    results = client.analyze_image(image_path="test.jpg", model_id="yolo-tiny-v4", threshold=0.6)
    for det in results["detections"]:
        print(f"Detected {det['label']} ({det['confidence']:.2f}) at {det['bbox']}")
```
3. Example CV Models & Docker Compose Recipe
Since hardware varies (Raspberry Pi vs. x86 Linux vs. cloud), provide at least one “demo” model out of the box:
- Model A: MobileNet-SSD v2 (TFLite file, ~4 MB)
  - Good for Pi Zero/3/4; CPU-only.
  - Can detect person, dog, cat, etc.
- Model B: YOLOv5 Nano (ONNX + a tiny runtime)
  - For slightly beefier CPUs (e.g. Pi 4 with 4 GB or x86), can give better multi-class detection.
```yaml
# docker-compose.yml (reference)
version: '3'
services:
  vision:
    image: yourregistry/otr-vision-demo:latest
    ports:
      - "8000:8000"
    volumes:
      - ./models:/app/models
    environment:
      - API_KEY=supersecretapikey
```
Inside that container, run a simple FastAPI app (example in `/app/main.py`):
```python
# main.py (inside Docker image)
import time

import uvicorn
from fastapi import FastAPI, UploadFile, File, Form, HTTPException
import numpy as np
import cv2
import tflite_runtime.interpreter as tflite  # or use onnxruntime

app = FastAPI()
MODELS = {}


def load_tflite_model(model_path):
    interpreter = tflite.Interpreter(model_path=model_path)
    interpreter.allocate_tensors()
    return interpreter


# On startup, load the demo model(s)
@app.on_event("startup")
async def startup_event():
    MODELS['mobilenet_ssd_v2'] = load_tflite_model("/app/models/mobilenet_ssd_v2.tflite")
    # If you want to support ONNX:
    # import onnxruntime
    # MODELS['yolo_nano'] = onnxruntime.InferenceSession("/app/models/yolo_nano.onnx")


def run_tflite_detection(interpreter, img_bytes, threshold):
    # Convert bytes → NumPy → resize, run inference → parse boxes.
    nparr = np.frombuffer(img_bytes, np.uint8)
    img = cv2.imdecode(nparr, cv2.IMREAD_COLOR)
    h, w = img.shape[:2]

    # Preprocess to 300x300 for MobileNet-SSD:
    blob = cv2.resize(img, (300, 300))
    blob = np.expand_dims(blob, axis=0)
    blob = blob.astype(np.float32) / 127.5 - 1.0

    input_index = interpreter.get_input_details()[0]["index"]
    interpreter.set_tensor(input_index, blob)
    interpreter.invoke()

    # Postprocess: get boxes, classes and scores
    output_details = interpreter.get_output_details()
    boxes = interpreter.get_tensor(output_details[0]["index"])[0]    # shape: [N, 4]
    classes = interpreter.get_tensor(output_details[1]["index"])[0]  # shape: [N]
    scores = interpreter.get_tensor(output_details[2]["index"])[0]   # shape: [N]

    detections = []
    for i in range(len(scores)):
        if scores[i] >= threshold:
            ymin, xmin, ymax, xmax = boxes[i]
            detections.append({
                "label": str(int(classes[i])),
                "confidence": float(scores[i]),
                # Convert normalized coords → pixel coords [x, y, width, height]
                "bbox": [
                    int(xmin * w),
                    int(ymin * h),
                    int((xmax - xmin) * w),
                    int((ymax - ymin) * h)
                ]
            })
    return detections


@app.post("/vision/analyze")
async def analyze(
    model_id: str = Form(None),
    threshold: float = Form(0.5),
    image: UploadFile = File(...)
):
    img_bytes = await image.read()
    if not model_id:
        model_id = 'mobilenet_ssd_v2'
    if model_id not in MODELS:
        raise HTTPException(status_code=404, detail="Model not found")
    interpreter = MODELS[model_id]

    # Measure elapsed inference time for the response payload
    start = time.perf_counter()
    detections = run_tflite_detection(interpreter, img_bytes, threshold)
    elapsed_ms = int((time.perf_counter() - start) * 1000)

    return {
        "model_id": model_id,
        "detections": detections,
        "processing_time_ms": elapsed_ms
    }


@app.get("/vision/models")
async def list_models():
    return [{"model_id": mid, "description": "demo model"} for mid in MODELS.keys()]


if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
```
- Out-of-the-Box
  - A running `/vision/analyze` server with two demo models.
  - Can `docker-compose up` and immediately `curl -F image=@test.jpg localhost:8000/vision/analyze`.
  - They get back a JSON list of bounding boxes and class IDs.
- Extend It
  - Upload your own `.tflite` or `.onnx` via `POST /vision/models` (if you enable that).
  - Swap in a PyTorch model in their own build.
  - Point the Python helper (`otr_vision.py`) at your hosted OTR Vision endpoint.
4. Sample Client Code for Pi/Embedded Use
Since many developers will run on a Raspberry Pi (no GPU by default), here is a snippet that shows how to grab a frame from a Pi camera, send it to OTR’s vision API, and interpret the response (a continuous-capture variant follows it):
```python
# pi_cv_sample.py
import io
import time

from picamera import PiCamera

from otr_vision import OTRVisionClient


def main():
    camera = PiCamera()
    camera.resolution = (640, 480)
    camera.framerate = 24
    stream = io.BytesIO()

    client = OTRVisionClient("http://otr-vision.local:8000", api_key="dev-key")
    print("Available models:", client.list_models())

    # Warm up camera
    time.sleep(2)

    try:
        # Capture a single frame
        camera.capture(stream, format='jpeg')
        stream.seek(0)
        img_bytes = stream.read()

        # Send it to the server
        res = client.analyze_image(image_bytes=img_bytes, model_id="mobilenet_ssd_v2", threshold=0.6)
        print("Detections:")
        for d in res["detections"]:
            label = d["label"]
            conf = d["confidence"]
            bbox = d["bbox"]  # [x, y, w, h]
            print(f"  • {label} @ {bbox} with {conf:.2f}")
    finally:
        camera.close()


if __name__ == "__main__":
    main()
```
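To analyze frames continuously rather than taking a single shot, one possible loop sketch uses picamera’s `capture_continuous` (same endpoint, model, and threshold as above; the capture settings are arbitrary):

```python
# Sketch: continuous capture → analyze loop (Ctrl-C to stop).
import io
import time

from picamera import PiCamera

from otr_vision import OTRVisionClient

camera = PiCamera(resolution=(640, 480), framerate=24)
client = OTRVisionClient("http://otr-vision.local:8000", api_key="dev-key")
stream = io.BytesIO()
time.sleep(2)  # camera warm-up

try:
    for _ in camera.capture_continuous(stream, format='jpeg', use_video_port=True):
        stream.seek(0)
        res = client.analyze_image(image_bytes=stream.read(),
                                   model_id="mobilenet_ssd_v2", threshold=0.6)
        print([d["label"] for d in res["detections"]])
        # Reset the in-memory stream before the next frame is written
        stream.seek(0)
        stream.truncate()
except KeyboardInterrupt:
    pass
finally:
    camera.close()
```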
- Dependencies:
  - `pip install picamera otr-vision-client` (if you package `otr_vision.py` as a pip module)
  - Or simply drop the `.py` files into the Pi’s filesystem and `python3 pi_cv_sample.py`.
- Outcome:
  - Immediately see how to integrate OTR’s Vision API in a “field” environment.
  - They can swap out the model (e.g. use a “vehicle detection” model), tweak thresholds, and route their own alerts (e.g. “if bounding-box area > X, trigger payload”), as sketched below.
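As a concrete illustration of that last point, here is a small sketch that fires an action when a detection’s bounding box covers a large share of the frame (the 25% ratio and the `trigger_payload` hook are hypothetical):

```python
# Sketch: trigger an action when a detected object's bounding box exceeds
# a fraction of the frame area. trigger_payload() is a hypothetical hook.
FRAME_W, FRAME_H = 640, 480      # matches the Pi sample's resolution
AREA_RATIO_THRESHOLD = 0.25      # arbitrary: "object fills >25% of the frame"


def trigger_payload(detection):
    print("ALERT:", detection["label"], detection["bbox"])


def route_alerts(detections):
    frame_area = FRAME_W * FRAME_H
    for det in detections:
        x, y, w, h = det["bbox"]  # pixel coords, per the /vision/analyze contract
        if (w * h) / frame_area > AREA_RATIO_THRESHOLD:
            trigger_payload(det)


# Usage with the Pi sample above:
# route_alerts(res["detections"])
```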
5. Example Use Case
- “Sample Models Gallery”
  - Link to the MobileNet-SSD v2 TFLite file (host on IPFS or S3).
  - Link to a couple of ONNX demos (e.g. YOLO-Nano, tiny object detectors).
  - Provide a short description of each: “Good for Pi Zero, detects people/dogs/cats” vs. “Tiny YOLO, detects more classes at ~10 FPS on Pi 4”.
- “How to Plug Your Own Model”
  - If you support `POST /vision/models`, include a multipart curl example:

    ```bash
    curl -X POST http://otr-vision.local:8000/vision/models \
      -F "model_id=my-custom-v1" \
      -F "meta={\"framework\":\"tflite\",\"input_size\":300}" \
      -F "model_file=@./my_custom_model.tflite"
    ```

  - Then `GET /vision/models` to confirm.
  - Finally, `POST /vision/analyze -F image=@test.jpg -F model_id=my-custom-v1`.
- “Best Practices & Hardware Tips”
  - Note that large CV models (>20 MB) may be sluggish on a Pi 3; recommend smaller quantized TFLite models.
  - Suggest an optional Coral USB Edge TPU or NCS2 accelerator if they need >10 FPS.
  - Show environment variables for batch sizes, concurrency limits, and how to scale if they put CV in your cloud (one way to wire these into the demo server is sketched below).
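For instance, the demo FastAPI server could read such knobs from the environment. The variable names below (`OTR_MAX_CONCURRENCY`, `OTR_DEFAULT_THRESHOLD`) are hypothetical, not part of the current image:

```python
# Sketch: environment-driven tuning for main.py. Variable names are hypothetical.
import os
import asyncio

MAX_CONCURRENCY = int(os.environ.get("OTR_MAX_CONCURRENCY", "2"))
DEFAULT_THRESHOLD = float(os.environ.get("OTR_DEFAULT_THRESHOLD", "0.5"))

# Limit how many inference calls run at once on a small CPU (e.g. a Pi).
inference_semaphore = asyncio.Semaphore(MAX_CONCURRENCY)


async def analyze_with_limit(interpreter, img_bytes, threshold=DEFAULT_THRESHOLD):
    # run_tflite_detection is the blocking helper from main.py above;
    # run it in a worker thread so the event loop stays responsive.
    async with inference_semaphore:
        return await asyncio.to_thread(run_tflite_detection, interpreter, img_bytes, threshold)
```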