1. POST /vision/analyze
    • Request payload:
      • model_id (string, optional): which model to run (e.g. “yolo-tiny-v4”).
      • threshold (float, optional): minimum confidence for detections.
      • image (binary file, multipart/form-data) OR image_base64 (string).
    • Response payload (JSON):

```json
{
  "model_id": "yolo-tiny-v4",
  "detections": [
    { "label": "person", "confidence": 0.82, "bbox": [x, y, width, height] },
    { "label": "dog", "confidence": 0.64, "bbox": [x, y, width, height] }
  ],
  "processing_time_ms": 125
}
```
    • Notes:
      • Don’t bake in how the model runs—just define the contract.
      • If no model_id is provided, fall back to a default “demo” model (e.g. basic motion detection or edge detection). A request sketch using the image_base64 variant follows this endpoint list.
  2. GET /vision/models
    • Returns a list of registered models, e.g.:

```json
[
  { "model_id": "yolo-tiny-v4", "description": "YOLOv4 Tiny (party-mode)" },
  { "model_id": "mobilenet_ssd_v2", "description": "MobileNet-SSD v2 (lightweight)" },
  { "model_id": "face-detect-v1", "description": "Simple Haar Cascade face detector" }
]
```
    • This lets devs know which models are available in the OTR environment.
  3. POST /vision/models (optional, if you want to let developers upload their own models)
    • Request (multipart/form-data):
      • model_file (binary blob, e.g. a .tflite or .onnx)
      • model_id (string)
      • meta (JSON, e.g. input size, framework, description)
    • Response (JSON): confirmation that model_id is now registered.
  4. DELETE /vision/models/{model_id}
    • Removes a model from the registry (only if you choose to host user‐uploaded models).
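
To make the contract concrete before the reference implementation below, here is a minimal request sketch that lists the registered models and then calls /vision/analyze using the image_base64 variant from endpoint 1. The base URL, API key, and the choice to send image_base64 as a JSON field are assumptions (the reference server later in this document only implements the multipart path).

```python
# contract_sketch.py -- exercises the endpoint contract above.
# The URL, API key, and JSON transport for image_base64 are assumptions, not part of the spec.
import base64
import requests

BASE_URL = "http://localhost:8000"
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

# Endpoint 2: discover which models are registered
models = requests.get(f"{BASE_URL}/vision/models", headers=HEADERS)
models.raise_for_status()
print("Registered models:", models.json())

# Endpoint 1: analyze a frame using the image_base64 alternative to multipart upload
with open("test.jpg", "rb") as f:
    payload = {
        "model_id": "yolo-tiny-v4",   # omit to fall back to the default "demo" model
        "threshold": 0.5,
        "image_base64": base64.b64encode(f.read()).decode("ascii"),
    }

resp = requests.post(f"{BASE_URL}/vision/analyze", json=payload, headers=HEADERS)
resp.raise_for_status()
for det in resp.json()["detections"]:
    print(det["label"], det["confidence"], det["bbox"])
```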

2. “Reference Implementation” in Python and/or Node.js that:

  1. Captures or loads an image/frame.
  2. Knows how to call /vision/analyze.
  3. Parses the JSON results and exposes a simple API (e.g. detect_objects(image_path)).

Below is a Python helper example. Modify it for Raspberry Pi (e.g. use picamera to grab a frame, as in section 4) or for desktop use (OpenCV; a desktop sketch follows the helper):

```python
# otr_vision.py
import requests


class OTRVisionClient:
    def __init__(self, base_url, api_key=None):
        self.base_url = base_url.rstrip('/')
        self.headers = {}
        if api_key:
            self.headers['Authorization'] = f"Bearer {api_key}"

    def list_models(self):
        resp = requests.get(f"{self.base_url}/vision/models", headers=self.headers)
        resp.raise_for_status()
        return resp.json()

    def analyze_image(self, image_path=None, image_bytes=None, model_id=None, threshold=0.5):
        """
        Either image_path or image_bytes must be provided.
        Returns: dict with detections.
        """
        if image_path:
            with open(image_path, 'rb') as f:
                img_data = f.read()
        elif image_bytes is not None:
            img_data = image_bytes
        else:
            raise ValueError("Provide either image_path or image_bytes.")

        # Build multipart form
        files = {
            'image': ('frame.jpg', img_data, 'application/octet-stream')
        }
        data = {
            'threshold': threshold
        }
        if model_id:
            data['model_id'] = model_id

        resp = requests.post(f"{self.base_url}/vision/analyze",
                             files=files,
                             data=data,
                             headers=self.headers)
        resp.raise_for_status()
        return resp.json()


# Example usage:
if __name__ == "__main__":
    client = OTRVisionClient("http://localhost:8000", api_key="YOUR_API_KEY")
    print("Available models:", client.list_models())

    # Analyze a local JPEG
    results = client.analyze_image(image_path="test.jpg", model_id="yolo-tiny-v4", threshold=0.6)
    for det in results["detections"]:
        print(f"Detected {det['label']} ({det['confidence']:.2f}) at {det['bbox']}")
```

3. Example CV Models & Docker Compose Recipe

Since hardware varies (Raspberry Pi vs. x86 Linux vs. cloud), provide at least one “demo” model out of the box:

  • Model A: MobileNet‐SSD v2 (TFLite file, ~4 MB)
    • Good for Pi Zero/3/4; CPU‐only.
    • Can detect person, dog, cat, etc.
  • Model B: YOLOv5 Nano (ONNX + a tiny runtime)
    • For slightly beefier CPUs (e.g. a 4 GB Pi 4 or x86); can give better multi-class detection.

```yaml
# docker-compose.yml (reference)
version: '3'
services:
  vision:
    image: yourregistry/otr-vision-demo:latest
    ports:
      - "8000:8000"
    volumes:
      - ./models:/app/models
    environment:
      - API_KEY=supersecretapikey
```

Inside that container, run a simple FastAPI app (example in /app/main.py):

```python
# main.py (inside Docker image)
import time

import uvicorn
from fastapi import FastAPI, UploadFile, File, Form, HTTPException
import numpy as np
import cv2
import tflite_runtime.interpreter as tflite  # or use onnxruntime

app = FastAPI()
MODELS = {}


def load_tflite_model(model_path):
    interpreter = tflite.Interpreter(model_path=model_path)
    interpreter.allocate_tensors()
    return interpreter


# On startup, load the demo model(s)
@app.on_event("startup")
async def startup_event():
    MODELS['mobilenet_ssd_v2'] = load_tflite_model("/app/models/mobilenet_ssd_v2.tflite")
    # If you want to support ONNX:
    # import onnxruntime
    # MODELS['yolo_nano'] = onnxruntime.InferenceSession("/app/models/yolo_nano.onnx")


def run_tflite_detection(interpreter, img_bytes, threshold):
    # Convert bytes → NumPy → resize, run inference → parse boxes.
    nparr = np.frombuffer(img_bytes, np.uint8)
    img = cv2.imdecode(nparr, cv2.IMREAD_COLOR)
    h, w = img.shape[:2]

    # Preprocess to 300x300 RGB for MobileNet-SSD (OpenCV decodes to BGR):
    blob = cv2.cvtColor(cv2.resize(img, (300, 300)), cv2.COLOR_BGR2RGB)
    blob = np.expand_dims(blob, axis=0)
    blob = blob.astype(np.float32) / 127.5 - 1.0

    input_index = interpreter.get_input_details()[0]["index"]
    interpreter.set_tensor(input_index, blob)
    interpreter.invoke()

    # Postprocess: get boxes, classes, and scores
    boxes = interpreter.get_tensor(interpreter.get_output_details()[0]["index"])[0]    # shape: [N, 4]
    classes = interpreter.get_tensor(interpreter.get_output_details()[1]["index"])[0]  # shape: [N]
    scores = interpreter.get_tensor(interpreter.get_output_details()[2]["index"])[0]   # shape: [N]

    detections = []
    for i in range(len(scores)):
        if scores[i] >= threshold:
            ymin, xmin, ymax, xmax = boxes[i]
            detections.append({
                "label": str(int(classes[i])),
                "confidence": float(scores[i]),
                # Convert normalized coords → pixel coords ([x, y, width, height])
                "bbox": [
                    int(xmin * w),
                    int(ymin * h),
                    int((xmax - xmin) * w),
                    int((ymax - ymin) * h)
                ]
            })
    return detections


@app.post("/vision/analyze")
async def analyze(
    model_id: str = Form(None),
    threshold: float = Form(0.5),
    image: UploadFile = File(...)
):
    img_bytes = await image.read()
    if not model_id:
        model_id = 'mobilenet_ssd_v2'
    if model_id not in MODELS:
        raise HTTPException(status_code=404, detail="Model not found")

    interpreter = MODELS[model_id]
    start = time.perf_counter()
    detections = run_tflite_detection(interpreter, img_bytes, threshold)
    elapsed_ms = int((time.perf_counter() - start) * 1000)
    return {
        "model_id": model_id,
        "detections": detections,
        "processing_time_ms": elapsed_ms
    }


@app.get("/vision/models")
async def list_models():
    return [{"model_id": mid, "description": "demo model"} for mid in MODELS.keys()]


if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
```
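
The commented-out startup lines above hint at ONNX support; one possible shape for that path is sketched below. It assumes an onnxruntime CPU session and a YOLOv5-style export whose single output is a [1, N, 5 + num_classes] tensor with center-x/center-y/width/height boxes in input-pixel units; non-maximum suppression is omitted and the helper names are illustrative.

```python
# onnx_detection_sketch.py -- illustrative only; output layout and names are assumptions
import cv2
import numpy as np
import onnxruntime as ort


def load_onnx_model(model_path):
    # CPU-only session keeps the demo portable (no GPU/TPU assumed)
    return ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])


def run_yolo_onnx_detection(session, img_bytes, threshold, input_size=640):
    nparr = np.frombuffer(img_bytes, np.uint8)
    img = cv2.imdecode(nparr, cv2.IMREAD_COLOR)
    h, w = img.shape[:2]

    # Resize to the export's input size, BGR -> RGB, NCHW, 0..1 floats
    blob = cv2.cvtColor(cv2.resize(img, (input_size, input_size)), cv2.COLOR_BGR2RGB)
    blob = blob.transpose(2, 0, 1)[np.newaxis].astype(np.float32) / 255.0

    input_name = session.get_inputs()[0].name
    preds = session.run(None, {input_name: blob})[0][0]  # assumed shape: [N, 5 + num_classes]

    detections = []
    scale_x, scale_y = w / input_size, h / input_size
    for row in preds:
        objectness = float(row[4])
        if objectness < threshold:
            continue
        class_id = int(np.argmax(row[5:]))
        confidence = objectness * float(row[5 + class_id])
        if confidence < threshold:
            continue
        cx, cy, bw, bh = row[:4]  # center-x, center-y, width, height in input pixels
        detections.append({
            "label": str(class_id),
            "confidence": confidence,
            "bbox": [int((cx - bw / 2) * scale_x), int((cy - bh / 2) * scale_y),
                     int(bw * scale_x), int(bh * scale_y)],
        })
    return detections  # note: no non-maximum suppression applied here
```

Wiring it in would mirror the TFLite path: register the session in MODELS during startup and dispatch on model_id inside /vision/analyze.
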
  1. Out-of-the-Box
    • A running /vision/analyze server with two demo models.
    • Developers can docker-compose up and immediately curl -F image=@test.jpg localhost:8000/vision/analyze.
    • They get back a JSON list of bounding boxes and class IDs.
  2. Extend It
    • Upload their own .tflite or .onnx via POST /vision/models, if you enable that endpoint (a server-side sketch follows this list).
    • Swap in a PyTorch model in their own build.
    • Point the Python helper (otr_vision.py) at your hosted OTR Vision endpoint.
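
If you do enable uploads, the sketch below shows one possible shape for endpoints 3 and 4 as additional routes in the main.py above (it reuses app, MODELS, and load_tflite_model from that file). Field names follow the contract; storing uploads under /app/models, rejecting non-TFLite files, and the specific status codes are assumptions of this sketch.

```python
# Illustrative sketch of the optional model-registry routes; add to main.py above.
# Assumes app, MODELS, and load_tflite_model are already defined there.
import json
import os

from fastapi import UploadFile, File, Form, HTTPException

MODEL_DIR = "/app/models"  # matches the docker-compose volume mount above


@app.post("/vision/models")
async def register_model(
    model_id: str = Form(...),
    meta: str = Form("{}"),            # JSON string, e.g. {"framework": "tflite", "input_size": 300}
    model_file: UploadFile = File(...)
):
    if model_id in MODELS:
        raise HTTPException(status_code=409, detail="model_id already registered")

    metadata = json.loads(meta)
    if metadata.get("framework", "tflite") != "tflite":
        # Only the TFLite loader is wired up in this demo
        raise HTTPException(status_code=400, detail="Only tflite uploads are supported in this demo")

    dest = os.path.join(MODEL_DIR, f"{model_id}.tflite")
    with open(dest, "wb") as f:
        f.write(await model_file.read())

    MODELS[model_id] = load_tflite_model(dest)
    return {"model_id": model_id, "status": "registered", "meta": metadata}


@app.delete("/vision/models/{model_id}")
async def delete_model(model_id: str):
    if model_id not in MODELS:
        raise HTTPException(status_code=404, detail="Model not found")
    MODELS.pop(model_id)
    return {"model_id": model_id, "status": "removed"}
```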

4. Sample Client Code for Pi/Embedded Use

Since many developers will run on a Raspberry Pi (no GPU by default), here is a snippet that shows how to grab a frame from a Pi camera, send it to OTR’s vision API, and interpret the response:

```python
# pi_cv_sample.py
import time
import io

from picamera import PiCamera

from otr_vision import OTRVisionClient


def main():
    camera = PiCamera()
    camera.resolution = (640, 480)
    camera.framerate = 24
    stream = io.BytesIO()

    client = OTRVisionClient("http://otr-vision.local:8000", api_key="dev-key")
    print("Available models:", client.list_models())

    # Warm up camera
    time.sleep(2)

    try:
        # Capture a single frame
        camera.capture(stream, format='jpeg')
        stream.seek(0)
        img_bytes = stream.read()

        # Send it to the server
        res = client.analyze_image(image_bytes=img_bytes, model_id="mobilenet_ssd_v2", threshold=0.6)

        print("Detections:")
        for d in res["detections"]:
            label = d["label"]
            conf = d["confidence"]
            bbox = d["bbox"]  # [x, y, w, h]
            print(f" • {label} @ {bbox} with {conf:.2f}")

    finally:
        camera.close()


if __name__ == "__main__":
    main()
```
  1. Dependencies:
    • pip install picamera otr-vision-client (if you package otr_vision.py as a pip module).
    • Or simply drop the .py files onto the Pi’s filesystem and run python3 pi_cv_sample.py.
  2. Outcome:
    • Developers immediately see how to integrate OTR’s Vision API in a “field” environment.
    • They can swap out the model (e.g. use a “vehicle detection” model), tweak thresholds, and route their own alerts (e.g. “if bounding-box area > X, trigger payload”); a trigger-loop sketch follows this list.
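
As a concrete illustration of that alert routing, here is a sketch of a continuous capture loop that fires a callback whenever a detection’s bounding-box area crosses a threshold. The area threshold, poll interval, and trigger_payload function are placeholders to replace with real logic.

```python
# pi_trigger_loop.py -- illustrative alert loop; thresholds and trigger_payload are placeholders
import io
import time

from picamera import PiCamera

from otr_vision import OTRVisionClient

AREA_THRESHOLD = 20000   # pixels^2; tune for your camera resolution and use case
POLL_INTERVAL = 1.0      # seconds between frames


def trigger_payload(detection):
    # Placeholder: replace with your own action (MQTT publish, webhook, GPIO, ...)
    print("TRIGGER:", detection)


def main():
    client = OTRVisionClient("http://otr-vision.local:8000", api_key="dev-key")
    camera = PiCamera(resolution=(640, 480))
    time.sleep(2)  # camera warm-up

    try:
        while True:
            stream = io.BytesIO()
            camera.capture(stream, format='jpeg')
            res = client.analyze_image(image_bytes=stream.getvalue(),
                                       model_id="mobilenet_ssd_v2",
                                       threshold=0.6)
            for det in res["detections"]:
                x, y, w, h = det["bbox"]
                if w * h > AREA_THRESHOLD:
                    trigger_payload(det)
            time.sleep(POLL_INTERVAL)
    finally:
        camera.close()


if __name__ == "__main__":
    main()
```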

5. Example Use Case

  1. “Sample Models Gallery”
    • Link to the MobileNet‐SSD v2 TFLite file (host on IPFS or S3).
    • Link to a couple of ONNX demos (e.g. YOLO‐Nano, tiny object detectors).
    • Provide a short description of each: “Good for Pi Zero, detects people/dogs/cats” vs. “Tiny YOLO, detects more classes at ~10 FPS on Pi 4”.
  2. “How to Plug Your Own Model”
    • If you support POST /vision/models, include a multipart curl example:

```bash
curl -X POST http://otr-vision.local:8000/vision/models \
  -F "model_id=my-custom-v1" \
  -F "meta={\"framework\":\"tflite\",\"input_size\":300}" \
  -F "model_file=@./my_custom_model.tflite"
```
    • Then GET /vision/models to confirm.
    • Finally, POST /vision/analyze -F image=@test.jpg -F model_id=my-custom-v1.
  3. “Best Practices & Hardware Tips”
    • Note that large CV models (>20 MB) may be sluggish on a Pi 3; recommend smaller quantized TFLite models.
    • Suggest an optional Coral USB Edge TPU or Intel NCS2 accelerator if developers need >10 FPS.
    • Document the environment variables for batch sizes and concurrency limits, and explain how to scale if CV runs in your cloud; a configuration sketch follows.
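
As a starting point, here is a hedged sketch of how the reference server could read such settings from environment variables. API_KEY matches the docker-compose example above; the other variable names and defaults are illustrative assumptions.

```python
# config.py -- illustrative settings loader for the reference server.
# API_KEY matches the docker-compose example; the other variable names are assumptions.
import os


class Settings:
    def __init__(self):
        self.api_key = os.environ.get("API_KEY")                             # bearer token clients must send
        self.default_model = os.environ.get("DEFAULT_MODEL", "mobilenet_ssd_v2")
        self.max_concurrency = int(os.environ.get("MAX_CONCURRENCY", "2"))   # parallel inference requests
        self.batch_size = int(os.environ.get("BATCH_SIZE", "1"))             # frames per inference call


settings = Settings()
```

In main.py, the analyze route could then compare the incoming Authorization header against settings.api_key and wrap inference in an asyncio.Semaphore(settings.max_concurrency) to cap concurrent requests.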