Tag: Model Optimization
The Evolution of LLM Serving: Modern Architectures and Framework Selection
Explore the latest LLM-serving frameworks, including vLLM, Triton, SGLang, LangChain, Haystack, and more. Learn how PagedAttention, quantization, and orchestration optimize AI inference, and discover the best framework for your use case with performance benchmarks and trade-offs.
How DeepSeek-R1 Was Built: Architecture and Training Explained
Explore DeepSeek-R1's architecture and training process, from its Mixture of Experts (MoE) design to its reinforcement-learning-based training. Learn how its expert routing, parallelization strategy, and optimization techniques enable high-performance AI at reduced computational cost.