What is Speculative Sampling? | Boosting LLM inference speed
Speculative Sampling is a decoding strategy that yields 2-3x speedups in LLM inference by generating multiple tokens per pass of the large model and, most importantly, without changing the final output distribution. Learn what speculative sampling is and how it works in this video explainer.
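The draft-then-verify loop described in the video can be sketched in a few lines. This is a minimal toy sketch, not the papers' implementation: `draft_probs` and `target_probs` are hypothetical stand-ins for a small draft model and the large target model, and the vocabulary is tiny. The accept/reject rule (accept a drafted token with probability min(1, p/q), else resample from the residual max(0, p - q)) is what guarantees samples match the target model exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB = 8  # toy vocabulary size (assumption for illustration)

def draft_probs(prefix):
    # Hypothetical cheap "draft" model: any fast next-token distribution.
    r = np.random.default_rng(len(prefix))
    p = r.random(VOCAB)
    return p / p.sum()

def target_probs(prefix):
    # Hypothetical expensive "target" model whose outputs must be preserved.
    r = np.random.default_rng(len(prefix) + 1000)
    p = r.random(VOCAB)
    return p / p.sum()

def speculative_step(prefix, k=4):
    """Draft k tokens with the cheap model, then accept/reject them so the
    resulting sequence is distributed exactly as the target model's samples."""
    drafted, qs, ctx = [], [], list(prefix)
    for _ in range(k):
        q = draft_probs(ctx)
        t = int(rng.choice(VOCAB, p=q))
        drafted.append(t)
        qs.append(q)
        ctx.append(t)
    out = list(prefix)
    for i, t in enumerate(drafted):
        p = target_probs(out)  # in practice: one batched target-model pass
        if rng.random() < min(1.0, p[t] / qs[i][t]):
            out.append(t)      # accept the drafted token
        else:
            # reject: resample from the residual distribution max(0, p - q)
            resid = np.maximum(p - qs[i], 0.0)
            resid = resid / resid.sum() if resid.sum() > 0 else p
            out.append(int(rng.choice(VOCAB, p=resid)))
            return out         # tokens after a rejection are discarded
    # all k drafted tokens accepted: sample one bonus token from the target
    out.append(int(rng.choice(VOCAB, p=target_probs(out))))
    return out
```

Each call extends the prefix by between 1 and k+1 tokens at the cost of a single target-model verification pass, which is where the speedup comes from.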
References:
[1] Create Your Own AI Agent (tutorial) https://youtu.be/Q7KhrSbEnSQ
[2] Typical sampling (video) https://youtu.be/a-6hVvU1WMk?t=423
[3] Google Research (paper) https://arxiv.org/pdf/2211.17192
[4] DeepMind (paper) https://arxiv.org/pdf/2302.01318
[5] What is rejection sampling https://en.wikipedia.org/wiki/Rejection_sampling
Video sections:
00:00 Speeding up LLM inference
01:27 What is speculative sampling
02:37 How speculative sampling works
03:52 Inference speed analysis
05:14 Preserving output quality
▬▬▬▬▬▬▬▬▬▬▬▬ CONNECT ▬▬▬▬▬▬▬▬▬▬▬▬
🖥️ Website: https://www.assemblyai.com
🐦 Twitter: https://twitter.com/AssemblyAI
🦾 Discord: https://discord.gg/Cd8MyVJAXd
▶️ Subscribe: https://www.youtube.com/c/AssemblyAI?sub_confirmation=1
🔥 We're hiring! Check our open roles: https://www.assemblyai.com/careers
🔑 Get your AssemblyAI API key here: https://www.assemblyai.com/?utm_source=youtube&utm_medium=referral&utm_campaign=yt_marco_4
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
#MachineLearning #DeepLearning #chatgpt