Coding Challenge 188: Voice Chatbot
Video description
In this coding challenge, I build a conversational voice chatbot entirely in the browser with p5.js. I combine three pieces: speech-to-text with OpenAI's Whisper model, text-to-speech with Kokoro TTS, and a "brain" for the bot. I also explore the transformers.js pipeline API and the Web Audio API. For the bot's brain, I start with a simple ELIZA-style therapist, then incorporate a RiveScript number-guessing game, and finally a local LLM. Code: https://thecodingtrain.com/challenges/188-voice-chatbot
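As a rough illustration of the simplest "brain" mentioned above, here is a minimal ELIZA-style sketch in TypeScript. It is not the code from the video: the rule table, the `respond` function name, and the fallback line are all hypothetical, chosen only to show the pattern-match-and-reflect idea.

```typescript
// Hypothetical rule table (not from the video): each rule pairs a
// pattern with a reply template; "$1" is filled with the captured text.
type Rule = { pattern: RegExp; reply: string };

const rules: Rule[] = [
  { pattern: /\bi feel (.+)/i, reply: "Why do you feel $1?" },
  { pattern: /\bi am (.+)/i, reply: "How long have you been $1?" },
  { pattern: /\b(?:mother|father|family)\b/i, reply: "Tell me more about your family." },
];

// Assumed fallback used when no rule matches.
const fallback = "Please, go on.";

// Return the reply for the first matching rule, or the fallback.
function respond(input: string): string {
  // Transcribed speech often ends with punctuation; strip it first.
  const text = input.trim().replace(/[.!?]+$/, "");
  for (const { pattern, reply } of rules) {
    const match = text.match(pattern);
    if (match) {
      // A function replacement avoids special "$" handling in the captured text.
      return reply.replace("$1", () => match[1] ?? "");
    }
  }
  return fallback;
}
```

In a voice chatbot like the one described, a function of this shape would sit between the two speech models: the transcribed text goes in, and the returned string is handed to text-to-speech.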
🚀 Watch this video ad-free on Nebula https://nebula.tv/videos/codingtrain-coding-challenge-188-voice-chatbot
p5.js Web Editor Sketches:
🕹️ LLM Chatbot: https://editor.p5js.org/codingtrain/sketches/RHhT9I4Nm
🕹️ Number Guessing Bot: https://editor.p5js.org/codingtrain/sketches/AJw7zMN9q
🕹️ Therapy Bot: https://editor.p5js.org/codingtrain/sketches/37LFEPUVV
🕹️ Model Loading Bars: https://editor.p5js.org/codingtrain/sketches/E9Ob3x8eJ
🕹️ Waveform of Recording: https://editor.p5js.org/codingtrain/sketches/cck49wDub
🕹️ Real Time Waveform: https://editor.p5js.org/codingtrain/sketches/aaRIT-x6a
🎥 Previous: https://youtu.be/g3-PXyF8U70?list=PLRqwX-V7Uu6ZiZxtDDRCi6uhfTH4FilpH
🎥 All: https://www.youtube.com/playlist?list=PLRqwX-V7Uu6ZiZxtDDRCi6uhfTH4FilpH
References:
📓 p5.2 Reference: https://beta.p5js.org
📓 Introducing Whisper: https://cdn.openai.com/papers/whisper.pdf
📓 Model Cards for Model Reporting: https://arxiv.org/abs/1810.03993
📓 Open Neural Network Exchange: https://onnx.ai
📓 Onnx-community Whisper-tiny.en model: https://huggingface.co/onnx-community/whisper-tiny.en
📓 Xenova: https://github.com/xenova
📓 Transformers.js: https://huggingface.co/docs/transformers.js/installation
📓 Announcing the new p5.sound.js library!: https://medium.com/processing-foundation/announcing-the-new-p5-sound-js-library-42efc154bed0
📓 getUserMedia() documentation: https://developer.mozilla.org/en-US/docs/Web/API/MediaDevices/getUserMedia
📓 MediaRecorder() documentation: https://developer.mozilla.org/en-US/docs/Web/API/MediaRecorder
📓 Kokoro Repo: https://github.com/hexgrad/kokoro
📓 KokoroTTS Model: https://huggingface.co/hexgrad/Kokoro-82M
📓 ELIZA: https://en.wikipedia.org/wiki/ELIZA
📓 RiveScript: https://www.rivescript.com
📓 SmolLM3: https://huggingface.co/HuggingFaceTB/SmolLM3-3B
📓 Running models on WebGPU: https://huggingface.co/docs/transformers.js/guides/webgpu
📓 Using quantized models (dtypes): https://huggingface.co/docs/transformers.js/v3.8.1/guides/dtypes
Videos:
🚂 https://youtu.be/0Ad5Frf8NBM
🚂 https://youtu.be/KR61bXsPlLU
Live Stream Archives:
🔴 https://www.youtube.com/watch?v=KRDJAHArqaw
Related Coding Challenges:
🚂 https://youtu.be/eGFJ8vugIWA
🚂 https://youtu.be/8Z9FRiW2Jlc
🚂 https://youtu.be/iFTgphKCP9U
Timestamps:
0:00:00 Hello!
0:00:35 Mapping out the pieces: speech-to-text, text-to-speech, and the brain
0:01:07 Thoughts on AI and creative exploration
0:02:44 Choosing the tools: Whisper and Kokoro TTS
0:04:06 Building a push-to-talk UI in p5.js
0:04:51 Finding models on Hugging Face with Transformers.js
0:05:36 About the Whisper model and model cards
0:06:55 Loading the Whisper pipeline in p5.js
0:09:04 Accessing the microphone with getUserMedia
0:10:44 Capturing audio with MediaRecorder
0:12:05 Processing audio chunks into a waveform
0:15:55 Speech-to-text working!
0:16:36 Building the chatbot brain (ELIZA-style therapist)
0:18:50 Setting up Kokoro TTS for text-to-speech
0:21:07 Playing synthesized audio with AudioBufferSource
0:23:41 Text-to-speech working!
0:25:32 Handling playback events
0:26:56 Swapping in a RiveScript number-guessing brain
0:31:22 Adding a language model (SmolLM2) as the brain
0:38:33 Final demo: the random number chatbot
0:39:03 Goodbye!
Editing by Mathieu Blanchette
Animations by Jason Heglund
Music from Epidemic Sound
🚂 Website: https://thecodingtrain.com/
👾 Share Your Creation! https://thecodingtrain.com/guides/passenger-showcase-guide
🚩 Suggest Topics: https://github.com/CodingTrain/Suggestion-Box
💡 GitHub: https://github.com/CodingTrain
💬 Discord: https://thecodingtrain.com/discord
💖 Membership: http://youtube.com/thecodingtrain/join
🛒 Store: https://standard.tv/codingtrain
🖋️ Twitter: https://twitter.com/thecodingtrain
📸 Instagram: https://www.instagram.com/the.coding.train/
🎥 https://www.youtube.com/playlist?list=PLRqwX-V7Uu6Zy51Q-x9tMWIv9cueOFTFA
🔗 p5.js: https://p5js.org
🔗 p5.js Web Editor: https://editor.p5js.org/
🔗 Processing: https://processing.org
📄 Code of Conduct: https://github.com/CodingTrain/Code-of-Conduct
This description was auto-generated. If you see a problem, please open an issue: https://github.com/CodingTrain/thecodingtrain.com/issues/new
#texttospeech #speechtotext #chatbot #rivescript #llms #agents #ai #transformersjs #webaudioapi #javascript #p5js