1. POST /vision/analyze
    • Request payload:
      • model_id (string, optional): which model to run (e.g. “yolo-tiny-v4”).
      • threshold (float, optional): minimum confidence for detections.
      • image (binary file, multipart/form-data) OR image_base64 (string).
    • Response payload (JSON):

```json
{
  "model_id": "yolo-tiny-v4",
  "detections": [
    { "label": "person", "confidence": 0.82, "bbox": [x, y, width, height] },
    { "label": "dog", "confidence": 0.64, "bbox": [x, y, width, height] }
  ],
  "processing_time_ms": 125
}
```
    • Notes:
      • Don’t bake in how the model runs—just define the contract.
      • If no model_id is provided, fall back to a default “demo” model (e.g. basic motion detection or edge detection). A request sketch using the image_base64 variant follows this endpoint list.
  2. GET /vision/models
    • Returns a list of registered models, e.g.:

```json
[
  { "model_id": "yolo-tiny-v4", "description": "YOLOv4 Tiny (party-mode)" },
  { "model_id": "mobilenet_ssd_v2", "description": "MobileNet-SSD v2 (lightweight)" },
  { "model_id": "face-detect-v1", "description": "Simple Haar Cascade face detector" }
]
```
    • This lets devs know which models are available in the OTR environment.
  3. POST /vision/models (optional, if you want to let developers upload their own models)
    • Request (multipart/form-data):
      • model_file (binary blob, e.g. a .tflite or .onnx)
      • model_id (string)
      • meta (JSON, e.g. input size, framework, description)
    • Response (JSON): confirmation that model_id is now registered.
  4. DELETE /vision/models/{model_id}
    • Removes a model from the registry (only if you choose to host user‐uploaded models).
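
To make the contract concrete before the reference implementation below, here is a minimal request sketch that lists the registered models and then calls /vision/analyze using the image_base64 variant from endpoint 1. The base URL, API key, and the choice to send image_base64 as a JSON field are assumptions (the reference server later in this document only implements the multipart path).

```python
# contract_sketch.py -- exercises the endpoint contract above.
# The URL, API key, and JSON transport for image_base64 are assumptions, not part of the spec.
import base64
import requests

BASE_URL = "http://localhost:8000"
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

# Endpoint 2: discover which models are registered
models = requests.get(f"{BASE_URL}/vision/models", headers=HEADERS)
models.raise_for_status()
print("Registered models:", models.json())

# Endpoint 1: analyze a frame using the image_base64 alternative to multipart upload
with open("test.jpg", "rb") as f:
    payload = {
        "model_id": "yolo-tiny-v4",   # omit to fall back to the default "demo" model
        "threshold": 0.5,
        "image_base64": base64.b64encode(f.read()).decode("ascii"),
    }

resp = requests.post(f"{BASE_URL}/vision/analyze", json=payload, headers=HEADERS)
resp.raise_for_status()
for det in resp.json()["detections"]:
    print(det["label"], det["confidence"], det["bbox"])
```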

2. “Reference Implementation” in Python and/or Node.js that:

  1. Captures or loads an image/frame.
  2. Knows how to call /vision/analyze.
  3. Parses the JSON results and exposes a simple API (e.g. detect_objects(image_path)).

Below is a Python helper example. Modify it for Raspberry Pi (e.g. use picamera to grab a frame, as in section 4) or for desktop use (OpenCV; a desktop sketch follows the helper):

```python
# otr_vision.py
import requests


class OTRVisionClient:
    def __init__(self, base_url, api_key=None):
        self.base_url = base_url.rstrip('/')
        self.headers = {}
        if api_key:
            self.headers['Authorization'] = f"Bearer {api_key}"

    def list_models(self):
        resp = requests.get(f"{self.base_url}/vision/models", headers=self.headers)
        resp.raise_for_status()
        return resp.json()

    def analyze_image(self, image_path=None, image_bytes=None, model_id=None, threshold=0.5):
        """
        Either image_path or image_bytes must be provided.
        Returns: dict with detections.
        """
        if image_path:
            with open(image_path, 'rb') as f:
                img_data = f.read()
        elif image_bytes is not None:
            img_data = image_bytes
        else:
            raise ValueError("Provide either image_path or image_bytes.")

        # Build multipart form
        files = {
            'image': ('frame.jpg', img_data, 'application/octet-stream')
        }
        data = {
            'threshold': threshold
        }
        if model_id:
            data['model_id'] = model_id

        resp = requests.post(f"{self.base_url}/vision/analyze",
                             files=files,
                             data=data,
                             headers=self.headers)
        resp.raise_for_status()
        return resp.json()


# Example usage:
if __name__ == "__main__":
    client = OTRVisionClient("http://localhost:8000", api_key="YOUR_API_KEY")
    print("Available models:", client.list_models())

    # Analyze a local JPEG
    results = client.analyze_image(image_path="test.jpg", model_id="yolo-tiny-v4", threshold=0.6)
    for det in results["detections"]:
        print(f"Detected {det['label']} ({det['confidence']:.2f}) at {det['bbox']}")
```

3. Example CV Models & Docker Compose Recipe

Since hardware varies (Raspberry Pi vs. x86 Linux vs. cloud), provide at least one “demo” model out of the box:

  • Model A: MobileNet‐SSD v2 (TFLite file, ~4 MB)
    • Good for Pi Zero/3/4; CPU‐only.
    • Can detect person, dog, cat, etc.
  • Model B: YOLOv5 Nano (ONNX + a tiny runtime)
    • For slightly beefier CPUs (e.g. a 4 GB Pi 4 or x86); can give better multi-class detection.

```yaml
# docker-compose.yml (reference)
version: '3'
services:
  vision:
    image: yourregistry/otr-vision-demo:latest
    ports:
      - "8000:8000"
    volumes:
      - ./models:/app/models
    environment:
      - API_KEY=supersecretapikey
```

Inside that container, run a simple FastAPI app (example in /app/main.py):

```python
# main.py (inside Docker image)
import time

import uvicorn
from fastapi import FastAPI, UploadFile, File, Form, HTTPException
import numpy as np
import cv2
import tflite_runtime.interpreter as tflite  # or use onnxruntime

app = FastAPI()
MODELS = {}


def load_tflite_model(model_path):
    interpreter = tflite.Interpreter(model_path=model_path)
    interpreter.allocate_tensors()
    return interpreter


# On startup, load the demo model(s)
@app.on_event("startup")
async def startup_event():
    MODELS['mobilenet_ssd_v2'] = load_tflite_model("/app/models/mobilenet_ssd_v2.tflite")
    # If you want to support ONNX:
    # import onnxruntime
    # MODELS['yolo_nano'] = onnxruntime.InferenceSession("/app/models/yolo_nano.onnx")


def run_tflite_detection(interpreter, img_bytes, threshold):
    # Convert bytes → NumPy → resize, run inference → parse boxes.
    nparr = np.frombuffer(img_bytes, np.uint8)
    img = cv2.imdecode(nparr, cv2.IMREAD_COLOR)
    h, w = img.shape[:2]

    # Preprocess to 300x300 RGB for MobileNet-SSD (OpenCV decodes to BGR):
    blob = cv2.cvtColor(cv2.resize(img, (300, 300)), cv2.COLOR_BGR2RGB)
    blob = np.expand_dims(blob, axis=0)
    blob = blob.astype(np.float32) / 127.5 - 1.0

    input_index = interpreter.get_input_details()[0]["index"]
    interpreter.set_tensor(input_index, blob)
    interpreter.invoke()

    # Postprocess: get boxes, classes, and scores
    boxes = interpreter.get_tensor(interpreter.get_output_details()[0]["index"])[0]    # shape: [N, 4]
    classes = interpreter.get_tensor(interpreter.get_output_details()[1]["index"])[0]  # shape: [N]
    scores = interpreter.get_tensor(interpreter.get_output_details()[2]["index"])[0]   # shape: [N]

    detections = []
    for i in range(len(scores)):
        if scores[i] >= threshold:
            ymin, xmin, ymax, xmax = boxes[i]
            detections.append({
                "label": str(int(classes[i])),
                "confidence": float(scores[i]),
                # Convert normalized coords → pixel coords ([x, y, width, height])
                "bbox": [
                    int(xmin * w),
                    int(ymin * h),
                    int((xmax - xmin) * w),
                    int((ymax - ymin) * h)
                ]
            })
    return detections


@app.post("/vision/analyze")
async def analyze(
    model_id: str = Form(None),
    threshold: float = Form(0.5),
    image: UploadFile = File(...)
):
    img_bytes = await image.read()
    if not model_id:
        model_id = 'mobilenet_ssd_v2'
    if model_id not in MODELS:
        raise HTTPException(status_code=404, detail="Model not found")

    interpreter = MODELS[model_id]
    start = time.perf_counter()
    detections = run_tflite_detection(interpreter, img_bytes, threshold)
    elapsed_ms = int((time.perf_counter() - start) * 1000)
    return {
        "model_id": model_id,
        "detections": detections,
        "processing_time_ms": elapsed_ms
    }


@app.get("/vision/models")
async def list_models():
    return [{"model_id": mid, "description": "demo model"} for mid in MODELS.keys()]


if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
```
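
The commented-out startup lines above hint at ONNX support; one possible shape for that path is sketched below. It assumes an onnxruntime CPU session and a YOLOv5-style export whose single output is a [1, N, 5 + num_classes] tensor with center-x/center-y/width/height boxes in input-pixel units; non-maximum suppression is omitted and the helper names are illustrative.

```python
# onnx_detection_sketch.py -- illustrative only; output layout and names are assumptions
import cv2
import numpy as np
import onnxruntime as ort


def load_onnx_model(model_path):
    # CPU-only session keeps the demo portable (no GPU/TPU assumed)
    return ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])


def run_yolo_onnx_detection(session, img_bytes, threshold, input_size=640):
    nparr = np.frombuffer(img_bytes, np.uint8)
    img = cv2.imdecode(nparr, cv2.IMREAD_COLOR)
    h, w = img.shape[:2]

    # Resize to the export's input size, BGR -> RGB, NCHW, 0..1 floats
    blob = cv2.cvtColor(cv2.resize(img, (input_size, input_size)), cv2.COLOR_BGR2RGB)
    blob = blob.transpose(2, 0, 1)[np.newaxis].astype(np.float32) / 255.0

    input_name = session.get_inputs()[0].name
    preds = session.run(None, {input_name: blob})[0][0]  # assumed shape: [N, 5 + num_classes]

    detections = []
    scale_x, scale_y = w / input_size, h / input_size
    for row in preds:
        objectness = float(row[4])
        if objectness < threshold:
            continue
        class_id = int(np.argmax(row[5:]))
        confidence = objectness * float(row[5 + class_id])
        if confidence < threshold:
            continue
        cx, cy, bw, bh = row[:4]  # center-x, center-y, width, height in input pixels
        detections.append({
            "label": str(class_id),
            "confidence": confidence,
            "bbox": [int((cx - bw / 2) * scale_x), int((cy - bh / 2) * scale_y),
                     int(bw * scale_x), int(bh * scale_y)],
        })
    return detections  # note: no non-maximum suppression applied here
```

Wiring it in would mirror the TFLite path: register the session in MODELS during startup and dispatch on model_id inside /vision/analyze.
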
  1. Out-of-the-Box
    • A running /vision/analyze server with two demo models.
    • Developers can docker-compose up and immediately curl -F image=@test.jpg localhost:8000/vision/analyze.
    • They get back a JSON list of bounding boxes and class IDs.
  2. Extend It
    • Upload their own .tflite or .onnx via POST /vision/models, if you enable that endpoint (a server-side sketch follows this list).
    • Swap in a PyTorch model in their own build.
    • Point the Python helper (otr_vision.py) at your hosted OTR Vision endpoint.
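
If you do enable uploads, the sketch below shows one possible shape for endpoints 3 and 4 as additional routes in the main.py above (it reuses app, MODELS, and load_tflite_model from that file). Field names follow the contract; storing uploads under /app/models, rejecting non-TFLite files, and the specific status codes are assumptions of this sketch.

```python
# Illustrative sketch of the optional model-registry routes; add to main.py above.
# Assumes app, MODELS, and load_tflite_model are already defined there.
import json
import os

from fastapi import UploadFile, File, Form, HTTPException

MODEL_DIR = "/app/models"  # matches the docker-compose volume mount above


@app.post("/vision/models")
async def register_model(
    model_id: str = Form(...),
    meta: str = Form("{}"),            # JSON string, e.g. {"framework": "tflite", "input_size": 300}
    model_file: UploadFile = File(...)
):
    if model_id in MODELS:
        raise HTTPException(status_code=409, detail="model_id already registered")

    metadata = json.loads(meta)
    if metadata.get("framework", "tflite") != "tflite":
        # Only the TFLite loader is wired up in this demo
        raise HTTPException(status_code=400, detail="Only tflite uploads are supported in this demo")

    dest = os.path.join(MODEL_DIR, f"{model_id}.tflite")
    with open(dest, "wb") as f:
        f.write(await model_file.read())

    MODELS[model_id] = load_tflite_model(dest)
    return {"model_id": model_id, "status": "registered", "meta": metadata}


@app.delete("/vision/models/{model_id}")
async def delete_model(model_id: str):
    if model_id not in MODELS:
        raise HTTPException(status_code=404, detail="Model not found")
    MODELS.pop(model_id)
    return {"model_id": model_id, "status": "removed"}
```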

4. Sample Client Code for Pi/Embedded Use

Since many developers will run on a Raspberry Pi (no GPU by default), here is a snippet that shows how to grab a frame from a Pi camera, send it to OTR’s vision API, and interpret the response:

```python
# pi_cv_sample.py
import time
import io

from picamera import PiCamera

from otr_vision import OTRVisionClient


def main():
    camera = PiCamera()
    camera.resolution = (640, 480)
    camera.framerate = 24
    stream = io.BytesIO()

    client = OTRVisionClient("http://otr-vision.local:8000", api_key="dev-key")
    print("Available models:", client.list_models())

    # Warm up camera
    time.sleep(2)

    try:
        # Capture a single frame
        camera.capture(stream, format='jpeg')
        stream.seek(0)
        img_bytes = stream.read()

        # Send it to the server
        res = client.analyze_image(image_bytes=img_bytes, model_id="mobilenet_ssd_v2", threshold=0.6)

        print("Detections:")
        for d in res["detections"]:
            label = d["label"]
            conf = d["confidence"]
            bbox = d["bbox"]  # [x, y, w, h]
            print(f" • {label} @ {bbox} with {conf:.2f}")

    finally:
        camera.close()


if __name__ == "__main__":
    main()
```
  1. Dependencies:
    • pip install picamera otr-vision-client (if you package otr_vision.py as a pip module).
    • Or simply drop the .py files onto the Pi’s filesystem and run python3 pi_cv_sample.py.
  2. Outcome:
    • Developers immediately see how to integrate OTR’s Vision API in a “field” environment.
    • They can swap out the model (e.g. use a “vehicle detection” model), tweak thresholds, and route their own alerts (e.g. “if bounding-box area > X, trigger payload”); a trigger-loop sketch follows this list.
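
As a concrete illustration of that alert routing, here is a sketch of a continuous capture loop that fires a callback whenever a detection’s bounding-box area crosses a threshold. The area threshold, poll interval, and trigger_payload function are placeholders to replace with real logic.

```python
# pi_trigger_loop.py -- illustrative alert loop; thresholds and trigger_payload are placeholders
import io
import time

from picamera import PiCamera

from otr_vision import OTRVisionClient

AREA_THRESHOLD = 20000   # pixels^2; tune for your camera resolution and use case
POLL_INTERVAL = 1.0      # seconds between frames


def trigger_payload(detection):
    # Placeholder: replace with your own action (MQTT publish, webhook, GPIO, ...)
    print("TRIGGER:", detection)


def main():
    client = OTRVisionClient("http://otr-vision.local:8000", api_key="dev-key")
    camera = PiCamera(resolution=(640, 480))
    time.sleep(2)  # camera warm-up

    try:
        while True:
            stream = io.BytesIO()
            camera.capture(stream, format='jpeg')
            res = client.analyze_image(image_bytes=stream.getvalue(),
                                       model_id="mobilenet_ssd_v2",
                                       threshold=0.6)
            for det in res["detections"]:
                x, y, w, h = det["bbox"]
                if w * h > AREA_THRESHOLD:
                    trigger_payload(det)
            time.sleep(POLL_INTERVAL)
    finally:
        camera.close()


if __name__ == "__main__":
    main()
```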

5. Example Use Case

  1. “Sample Models Gallery”
    • Link to the MobileNet‐SSD v2 TFLite file (host on IPFS or S3).
    • Link to a couple of ONNX demos (e.g. YOLO‐Nano, tiny object detectors).
    • Provide a short description of each: “Good for Pi Zero, detects people/dogs/cats” vs. “Tiny YOLO, detects more classes at ~10 FPS on Pi 4”.
  2. “How to Plug Your Own Model”
    • If you support POST /vision/models, include a multipart curl example:

```bash
curl -X POST http://otr-vision.local:8000/vision/models \
  -F "model_id=my-custom-v1" \
  -F "meta={\"framework\":\"tflite\",\"input_size\":300}" \
  -F "model_file=@./my_custom_model.tflite"
```
    • Then GET /vision/models to confirm.
    • Finally, POST /vision/analyze -F image=@test.jpg -F model_id=my-custom-v1.
  3. “Best Practices & Hardware Tips”
    • Note that large CV models (>20 MB) may be sluggish on a Pi 3; recommend smaller quantized TFLite models.
    • Suggest an optional Coral USB Edge TPU or Intel NCS2 accelerator if developers need >10 FPS.
    • Document the environment variables for batch sizes and concurrency limits, and explain how to scale if CV runs in your cloud; a configuration sketch follows.
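
As a starting point, here is a hedged sketch of how the reference server could read such settings from environment variables. API_KEY matches the docker-compose example above; the other variable names and defaults are illustrative assumptions.

```python
# config.py -- illustrative settings loader for the reference server.
# API_KEY matches the docker-compose example; the other variable names are assumptions.
import os


class Settings:
    def __init__(self):
        self.api_key = os.environ.get("API_KEY")                             # bearer token clients must send
        self.default_model = os.environ.get("DEFAULT_MODEL", "mobilenet_ssd_v2")
        self.max_concurrency = int(os.environ.get("MAX_CONCURRENCY", "2"))   # parallel inference requests
        self.batch_size = int(os.environ.get("BATCH_SIZE", "1"))             # frames per inference call


settings = Settings()
```

In main.py, the analyze route could then compare the incoming Authorization header against settings.api_key and wrap inference in an asyncio.Semaphore(settings.max_concurrency) to cap concurrent requests.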