About TensorFlow Serving
TensorFlow Serving is a flexible, high-performance serving system for machine-learning models, designed for production environments. It provides versioned model serving, gRPC and HTTP (REST) endpoints, and request batching for TensorFlow models, and it is extensible to other model types.
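For orientation, one common way to launch the server is to point the tensorflow_model_server binary at a directory of versioned SavedModels; the model name and paths here are placeholders for your own deployment:

```
# gRPC on 8500, REST on 8501; numbered subdirectories (1/, 2/, ...)
# under the base path are treated as model versions and hot-swapped
# as new ones appear.
tensorflow_model_server \
  --port=8500 \
  --rest_api_port=8501 \
  --model_name=my_model \
  --model_base_path=/models/my_model
```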
Key Features
- High-performance model server (tensorflow_model_server) with gRPC and REST APIs (request sketch after this list).
- Versioned model management with hot-swapping and A/B/canary deployment support (config sketch below).
- Configurable batching and low-latency inference optimizations (batching parameters below).
- Extensible to serve non-TensorFlow models via custom servables.
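To illustrate the REST API, here is a minimal Python sketch of a predict request against a locally running server; the host, port, model name, and input shape are assumptions, not part of any specific deployment:

```python
import requests  # third-party HTTP client: pip install requests

# TF Serving's REST predict endpoint follows the
# /v1/models/<model_name>:predict convention.
URL = "http://localhost:8501/v1/models/my_model:predict"

# "instances" is the row-oriented request format accepted by the
# REST API; each element is one input example.
payload = {"instances": [[1.0, 2.0, 3.0, 4.0]]}

resp = requests.post(URL, json=payload, timeout=10.0)
resp.raise_for_status()
print(resp.json()["predictions"])
```

The gRPC endpoint (port 8500 above) exposes the same PredictionService with lower per-request overhead, which tends to suit high-throughput clients.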
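Version management is configured with a model-config file in protobuf text format, passed to the server via --model_config_file. A sketch, with illustrative names and version numbers, that keeps two versions loaded and labels them for stable/canary routing:

```
model_config_list {
  config {
    name: "my_model"
    base_path: "/models/my_model"
    model_platform: "tensorflow"
    # Keep both versions loaded rather than only the newest.
    model_version_policy {
      specific {
        versions: 1
        versions: 2
      }
    }
    # Labels let clients address "stable" or "canary" instead of
    # hard-coding version numbers.
    version_labels { key: "stable" value: 1 }
    version_labels { key: "canary" value: 2 }
  }
}
```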
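Batching is switched on with --enable_batching and tuned through a parameters file passed as --batching_parameters_file; the values below are illustrative starting points rather than recommendations:

```
# Trade a little latency for throughput: wait up to 2 ms for a
# batch to fill before running inference.
max_batch_size { value: 32 }
batch_timeout_micros { value: 2000 }
num_batch_threads { value: 4 }
max_enqueued_batches { value: 100 }
```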
Use Cases & Best For
- Deploying and serving ML models in production behind gRPC or REST endpoints.
- Rolling out new model versions safely (hot-swap, A/B, canary) without server restarts.
- High-throughput inference workloads that benefit from server-side request batching.