Billy Zhao

Personal website and portfolio

Musings

Longer-form thoughts and reflections.

vLLM Workshop at MLOps Community Agent World: Hands-On Learning with Cast.ai

Deep dive into hosting open-source LLMs with vLLM on Kubernetes. Personal takeaways from the Cast.ai workshop covering GPU optimization, KV cache mechanics, and production deployment strategies for enterprise AI applications....

vLLM, AI Agents, Kubernetes, LLMOps, Cast.ai, Open Source LLM, MLOps, GPU Optimization