A comprehensive benchmark comparing REST, gRPC, and GraphQL for serving machine learning models
When deploying machine learning models in production, choosing the right API protocol can significantly impact performance, scalability, and developer experience. This project aims to provide data-driven insights to help teams make informed decisions.
Current Status: This benchmark compares three popular API protocols (REST, gRPC, and GraphQL) for serving machine learning models, specifically text embedding generation with a SentenceTransformer model.
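To make the comparison concrete, here is a sketch of what an embedding request might look like in each protocol. The endpoint name (`/embed`), the GraphQL `embed` field, and the protobuf message are illustrative assumptions, not the project's actual schema:

```python
import json

text = "machine learning in production"

# REST: a hypothetical JSON POST body to an /embed endpoint.
rest_body = json.dumps({"text": text})

# GraphQL: a hypothetical query; a single POST carries the query
# string plus its variables as JSON.
graphql_body = json.dumps({
    "query": "query Embed($text: String!) { embed(text: $text) { vector } }",
    "variables": {"text": text},
})

# gRPC: the request would instead be a protobuf message, e.g.
#   message EmbedRequest { string text = 1; }
# serialized to a compact binary payload rather than JSON, which is
# one reason gRPC requests tend to be smaller on the wire.

print(rest_body)
print(graphql_body)
```

The same input text is sent in all three cases; only the envelope around it differs.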
Note: All implementations currently use HTTP/2. HTTP/1.1 comparison is planned for future work.
Key Finding: gRPC achieves roughly 30x lower latency and about 3x higher throughput than REST and GraphQL for this ML serving workload.
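Latency and throughput figures like these are typically derived from per-request timings collected during a benchmark run. A minimal sketch of that aggregation, using only the standard library (the function and field names are illustrative, not the project's actual reporting code):

```python
import statistics

def summarize(latencies_s, wall_time_s):
    """Aggregate per-request latencies (seconds) into benchmark metrics."""
    # quantiles(n=100) returns 99 cut points: q[0] is the 1st
    # percentile, q[49] the 50th, q[98] the 99th.
    q = statistics.quantiles(latencies_s, n=100)
    return {
        "p50_ms": q[49] * 1000,
        "p99_ms": q[98] * 1000,
        "mean_ms": statistics.fmean(latencies_s) * 1000,
        "throughput_rps": len(latencies_s) / wall_time_s,
    }

# Example: 1000 simulated request timings clustered around 5 ms.
samples = [0.005 + (i % 10) * 0.0001 for i in range(1000)]
print(summarize(samples, wall_time_s=2.0))
```

Comparing p50 against p99 is useful here: protocols with heavier per-request serialization overhead often show the gap most clearly in the tail percentiles.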
All three implementations serve the same underlying ML model (SentenceTransformer all-MiniLM-L6-v2) to ensure a fair comparison.
To reproduce the benchmark:

```shell
git clone https://github.com/ranjanarajendran/ml-serving-comparison.git
docker-compose up -d
./run_tests.sh
```

Generated charts are written to `results/charts/`.