AI engineer based in London who loves playing and watching basketball, enjoying a good sauna, and hunting for cafés and pastries around the city.
Longer-form thoughts and reflections.
Deep dive into hosting open-source LLMs with vLLM on Kubernetes. Personal takeaways from the Cast.ai workshop covering GPU optimization, KV cache mechanics, and production deployment strategies for enterprise AI applications....