Using Flask is not a good fit here because it runs on WSGI (Web Server Gateway Interface), which is synchronous: each worker handles one request at a time and blocks until that request is finished. A long-running request therefore prevents that worker from serving any new requests until the previous one has been served completely.
FastAPI improves on this because it runs on ASGI (Asynchronous Server Gateway Interface): an async endpoint yields control at every await point, so the server can accept and interleave new requests while earlier ones are still waiting on I/O.
However, this non-blocking behaviour only applies to I/O-bound work. Machine learning inference is CPU-intensive, so it holds the event loop and blocks everything else while it runs.
So simply using a FastAPI endpoint would not cut it either.
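The point above can be demonstrated with a small stdlib-only sketch (the workloads are stand-ins, not the actual ML inference): a CPU-bound coroutine never awaits, so a concurrently scheduled I/O task cannot even start until the CPU work finishes, and the total time is the sum of both rather than their overlap.

```python
import asyncio
import time

async def cpu_task(n):
    # CPU-bound loop: it never awaits, so it holds the event loop
    # and starves every other coroutine until it finishes.
    total = 0
    for i in range(n):
        total += i * i
    return total

async def io_task():
    # I/O-style task: awaiting yields control back to the event loop.
    await asyncio.sleep(0.1)
    return "io done"

async def main():
    start = time.perf_counter()
    # Although both are scheduled "concurrently", io_task only runs
    # after cpu_task completes -- exactly why an async endpoint alone
    # does not help with CPU-intensive inference.
    await asyncio.gather(cpu_task(5_000_000), io_task())
    return time.perf_counter() - start

elapsed = asyncio.run(main())
# elapsed ~= cpu time + 0.1s sleep, i.e. strictly more than 0.1s,
# instead of the two overlapping.
```

This is the motivation for moving the CPU-bound work out of the web process entirely.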
Celery + Redis + FastAPI for Asynchronous Processing: To handle the CPU-intensive machine learning tasks without blocking incoming requests, I offload image processing to background tasks using Celery with Redis as the message broker.
Scalability Considerations:
Client Testing: I developed a concurrent client script that sends multiple batches of images (each batch consisting of 100 images) to simulate high load (e.g. setting the number of batches to 10 sends 10 batches of 100 images, i.e. 1,000 images in total) and measures processing times. This allows me to compare the performance of the simple FastAPI endpoint versus the Celery + Redis based approach.
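The client's structure can be sketched as follows. Here `send_batch` is a stub that sleeps in place of the real HTTP POST to the FastAPI endpoint, so the sketch stays self-contained; the batch size and worker count mirror the numbers above but are otherwise illustrative:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def send_batch(batch_id, n_images=100):
    # Stand-in for posting a batch of 100 images to the endpoint;
    # a real client would issue an HTTP request here and time it.
    time.sleep(0.05)
    return batch_id, n_images

def run_load_test(num_batches=10):
    start = time.perf_counter()
    # Fire all batches concurrently, one thread per batch, to simulate
    # many clients hitting the endpoint at the same time.
    with ThreadPoolExecutor(max_workers=num_batches) as pool:
        results = list(pool.map(send_batch, range(num_batches)))
    elapsed = time.perf_counter() - start
    total_images = sum(n for _, n in results)
    return total_images, elapsed

total, elapsed = run_load_test(10)
# 10 batches x 100 images = 1000 images submitted concurrently.
```

Running the same script against both endpoints gives directly comparable wall-clock timings.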
Image processing workflow (similar to the one in the assignment PDF, altered to match my implementation):
Image Input (Batch of 100)
↓
Client sends batch images to FastAPI endpoint
↓
Model Inference (yolov8n-seg model for instance segmentation)
↓
Segmentation & Food Detection (model identifies food items)
↓
Area Calculation (Each segmentation mask’s pixel count is computed to derive the item’s area.)
↓
Results Output (Aggregated JSON response: Food Item + Area)
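The area-calculation step above reduces to counting foreground pixels in each segmentation mask. A minimal sketch, assuming a binary mask and an optional per-pixel calibration factor (both names are illustrative):

```python
def mask_area(mask, pixel_area=1.0):
    """Area of one segmentation mask.

    mask: 2D iterable of 0/1 (or bool) values from the segmentation model.
    pixel_area: assumed real-world area represented by one pixel
                (depends on camera calibration; 1.0 keeps units in pixels).
    """
    foreground = sum(sum(1 for px in row if px) for row in mask)
    return foreground * pixel_area

# Toy 3x4 mask with 6 foreground pixels:
mask = [
    [0, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 1, 0],
]
area = mask_area(mask)  # 6 pixels -> area 6.0 with pixel_area=1.0
```

The per-item results would then be aggregated into the JSON response (food item + area) shown in the final step.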