About Ray Serve
Ray Serve is a scalable, framework-agnostic model serving library built on Ray. It is designed for building online inference APIs, with features tailored to large-model serving and model composition.
Key Features
- Framework-agnostic serving — serve models from PyTorch, TensorFlow, scikit-learn, or arbitrary Python code.
- Response streaming and dynamic request batching for LLM workloads.
- Multi-node / multi-GPU serving with flexible scheduling (fractional GPUs, autoscaling).
- Built for model composition — combine multiple models and business logic in a single Python app.
Use Cases & Best For
- Model serving & APIs
- Deploy and serve ML models