#sponsored HeyGen Here: https://bit.ly/42cySEn
Use Code "AIMASTER20" to get a 20% discount on creator or team plan for 3 months!
🚀 Become an AI Master – All-in-one AI Learning https://aimaster.me/pro
📹Get a Custom Promo Video From AI Master https://collab.aimaster.me/
Sponsored by HeyGen
In this video I show my complete AI content pipeline: from researching and writing a YouTube script, to cloning my voice, creating an AI avatar in @heygen_official, and finally translating the finished video into 175+ languages, all without touching a camera or microphone. If you’re a solo creator, marketer, or business that needs fast, studio-quality videos, you’ll see every tool, prompt, and workflow step I use to publish content at scale with HeyGen.com. Ready to find out whether I’m real, or just pixels? Let’s dive in.
#HeyCreator
Chapters:
0:00 – Why my team “retired” & the 2-tool setup
0:54 – Writing the script with ChatGPT-o3 + Deep Research
4:00 – Building a hyper-realistic avatar & voice in HeyGen
8:56 – Instant translation, Avatar 4 style swaps & UGC clips
Guys, I'm retiring from YouTube. AI has taken over, and here I am with all my expensive cameras, lighting setups, and home studio. And it's not just me: my whole team is retiring, except for the editors, because I can now create a full video from scratch with a proper script, voice-over, and even my face. Are you even sure I'm real right now? In this video, I'm running this whole workflow with only two tools, ChatGPT and HeyGen. Sure, ElevenLabs sneaks in to handle the voice tracks, but only because it lives entirely inside HeyGen. You've already seen HeyGen on the channel a couple of times, usually when my throat decided to imitate a rusty gate, or when we had to slot in a last-second clarification after the main video was already exported. HeyGen is my always-awake stunt double. It never gets tired, never coughs, never forgets its lines, and it definitely doesn't complain about retakes at 3:00 in the morning. But
Writing the script with ChatGPT-o3 + Deep Research
before HeyGen can conjure anything, it needs a script. And that's where ChatGPT steps in. The setup is dead simple: I switch my model to o3, then flip the Deep Research toggle. That second step is absolutely mandatory. Deep Research adds a pinch of extra tech jargon, yes, but in exchange I get hard facts instead of internet folklore, which is a trade I'll take every time. Next comes my monster prompt. I paste it in wholesale, no edits, because over time I've packed it with everything ChatGPT needs to mimic my style. The prompt opens with a short backstory so the model knows who's talking and why. Then it specifies the style and tone: casual, friendly, smart enough to trust, but easy enough that my niece and my granddad can both nod along. I even include one or two of my previous scripts right there in the prompt. Inside that same prompt, I type the real assignment word for word: "Do the research and write a script about the new features and capabilities of ChatGPT 4.1. For each one, add a prompt example and a detailed explanation of what's new." Those are the exact words, punctuation and all. I don't tinker with them, because the more precise I am, the better the model behaves. It's like training a puppy: give clear commands and you spend less time cleaning up surprises on the carpet. ChatGPT then fires back with a handful of clarifying questions. They are never the same twice. Sometimes it wants to know my target audience; other times it asks how deeply to dive into the changes. I answer whatever it needs, because skipping that step is like giving directions with half the street names missing. You can't blame the driver if they end up at the wrong coffee shop. Once I've fed ChatGPT the last crumb of context, I hit enter and go refill my mug. Deep Research can wrap up in a single minute, or it can wander off on a 30-minute fact-finding expedition. Either way, by the time I'm back at the keyboard, the script is usually 95% camera-ready.
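If you'd rather automate the script-writing step than click through the ChatGPT UI, the same prompt structure can be assembled for the OpenAI API. This is a minimal sketch, not the workflow from the video: the model name "o3" and the system-prompt wording are assumptions, and the Deep Research toggle has no direct equivalent in a plain chat request.

```python
# Sketch: the script-writing prompt assembled for the OpenAI chat API.
# Assumptions: `openai` package installed, OPENAI_API_KEY set, and access to
# a model named "o3" (swap in whatever model you actually have).

SYSTEM_PROMPT = (
    "You are a YouTube scriptwriter. Tone: casual, friendly, smart enough "
    "to trust, but simple enough that anyone can nod along."
)

def build_script_request(assignment: str, previous_scripts: list[str]) -> dict:
    """Pack backstory/tone, style samples, and the exact assignment into one request."""
    samples = "\n\n---\n\n".join(previous_scripts)
    return {
        "model": "o3",  # assumed model name
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user",
             "content": f"Style samples:\n{samples}\n\nAssignment:\n{assignment}"},
        ],
    }

request = build_script_request(
    "Do the research and write a script about the new features and capabilities "
    "of ChatGPT 4.1. For each one, add a prompt example and a detailed "
    "explanation of what's new.",
    previous_scripts=["(paste one or two earlier scripts here)"],
)
# To actually send it:
#   from openai import OpenAI
#   response = OpenAI().chat.completions.create(**request)
```

The point of the helper is that the prompt pieces (backstory, tone, samples, assignment) stay in one place, so you can reuse the exact wording every time instead of retyping it.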
From there, it's a light polish: I skim the text, tell ChatGPT to strip out the source links (nobody wants to read a URL aloud), tweak a sentence if it sounds like a legal disclaimer, maybe swap a $10 word for something that won't trip me mid-take. And boom, final script locked. All right, the script is in the bag, so now we need to give it a heartbeat: voice. You've really got just two roads to pick from. The first is where you let HeyGen handle everything inside its own dashboard. The second is a slightly more hands-on detour: you swing over to ElevenLabs, crank out the voice-over there ("Hey folks, big news. ChatGPT just got a serious upgrade."), and then drag the finished audio track back into HeyGen. And sure, there's always the DIY route where you grab a mic and record yourself the old-fashioned way. Totally your call. Personally, I stay inside HeyGen because it's quicker and more convenient. Out of every platform I've tested, and I've test-driven a whole lot of them, HeyGen is still the smoothest ride for making AI avatars. And in case you're wondering, our legendary second host on the channel was never a digital double. She's beautifully human, I promise; I stood right next to her at every shoot. So far, the only avatar we've built is a clone of me. But let's
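For the hands-on detour, ElevenLabs also exposes its text-to-speech over a REST API, so the voice-over step can be scripted. A minimal sketch of building that request, under assumptions: the voice ID is a clone you already trained, and `eleven_multilingual_v2` is one currently available model (check ElevenLabs' docs for what your plan offers).

```python
# Sketch: building an ElevenLabs text-to-speech request for a cloned voice.
# The voice ID below is a hypothetical placeholder.

def build_tts_request(voice_id: str, text: str) -> tuple[str, dict]:
    """Return the endpoint URL and JSON body for a text-to-speech call."""
    url = f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"
    body = {
        "text": text,
        "model_id": "eleven_multilingual_v2",  # assumed model choice
        "voice_settings": {"stability": 0.5, "similarity_boost": 0.75},
    }
    return url, body

url, body = build_tts_request(
    "YOUR_CLONED_VOICE_ID",  # hypothetical placeholder
    "Hey folks, big news. ChatGPT just got a serious upgrade.",
)
# Send with: requests.post(url, json=body, headers={"xi-api-key": "..."})
# then write response.content out as an .mp3 and drag it into HeyGen.
```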
Building a hyper-realistic avatar & voice in HeyGen
imagine you're starting from scratch and you don't have an avatar yet. No sweat. Click the Avatars tab, smack the Create New Avatar button, and HeyGen pops up three options: Hyper-Realistic, Photo Avatar, and Generate From Prompt. We need that first one, marked Hyper-Realistic. HeyGen even rolls a brief how-to video for you, but the gist couldn't be easier: upload a clip of yourself that runs somewhere between two and five minutes, with good lighting and steady framing. Basically, avoid filming in a broom closet lit by a single birthday candle. A phone video is fine; you don't need a Hollywood rig. Once the main clip is up, HeyGen asks for a quick consent video. You stand there, say on camera that you're cool with your face being used to train the model, and that's that. I uploaded my consent clip more than a year ago; the menu has had a little makeover since then, but the idea is exactly the same. Now, because this is the hyper-realistic avatar, the training isn't instant. It can take a few hours, and the bigger your source video, the longer the wait. Trust me though, the payoff is worth every minute. My advice here is to start the training early in the day. When the avatar is finally ready, I just pick it from my list, choose the orientation, and HeyGen drops me into the editor. Here's where I paste my script chunk by chunk, one paragraph per scene, so each idea lands on its own spot on the timeline. Speaking of voices: HeyGen automatically grabs the sound from your source video and builds a perfect clone. If the original audio is crisp, the AI version will be indistinguishable. I can also give HeyGen a fresh audio sample, just a few minutes of clean speech, and it'll spin that into a brand-new voice model without breaking a sweat. My favorite party trick, though, is the baked-in API integration with ElevenLabs. I've cloned my voice over there with a mountain of reference clips and dialed in every slider until it sounds like me on my best day.
And HeyGen lets me choose that voice right from the dropdown. Each render comes out silky and spot-on. Just make sure to label the voice something obvious so you don't scroll forever trying to find it. You've probably noticed I don't keep my digital twin stuck in one pose. Even though I have just a single avatar in HeyGen, that same avatar shows up in a few looks, because HeyGen lets me add fresh ones. In my case, the looks are simply camera angles. In real life, I run two cameras, one straight on, one off to the side, so I taught my avatar those angles too. It takes maybe 30 seconds: I click Add a New Look, hit Use My Own Video, and toss in a clip from that side camera. I have a mountain of footage lying around, but I pick takes from the same shooting day so the lighting and wardrobe stay consistent. I'm tempted to push it further soon. Maybe different outfits, maybe a new studio backdrop. Who knows? With the avatar angle selected, I paste my script right into the editor, one paragraph at a time, so every paragraph becomes its own scene on the timeline. That lets me swap looks whenever I feel like it and still keep the flow natural. I double-check the voice setting, tap the little play button, and listen through. If a line comes out too breathless, I click into the text, drop in a pause, and play it again. HeyGen even gives me a translate switch: one click and the whole thing comes back in Spanish or Japanese, my own voice, lip sync and all. There's a bunch of extra toys in the editor, text overlays, still images, automatic subtitles, so you can assemble the entire video without leaving HeyGen. At our studio, we like to edit videos in-house, so I skip those tools in HeyGen, but they are handy if you want the all-in-one experience. Once every scene sounds right, I slap Generate, lean back, and let HeyGen render. Longer scripts take longer to cook. That's just how it is.
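The paragraph-per-scene flow above also maps onto HeyGen's video generation API, if you want to skip the editor for bulk jobs. A sketch of the payload: the field names follow HeyGen's v2 API as I understand it, and the avatar and voice IDs are hypothetical placeholders, so verify everything against the current docs.

```python
# Sketch: one HeyGen scene per script paragraph, same avatar and cloned
# voice throughout. IDs below are hypothetical placeholders.

def build_heygen_payload(avatar_id: str, voice_id: str,
                         paragraphs: list[str]) -> dict:
    """Each paragraph becomes its own entry in video_inputs (its own scene)."""
    return {
        "video_inputs": [
            {
                "character": {"type": "avatar", "avatar_id": avatar_id},
                "voice": {"type": "text", "voice_id": voice_id,
                          "input_text": text},
            }
            for text in paragraphs
        ],
        "dimension": {"width": 1920, "height": 1080},  # landscape orientation
    }

payload = build_heygen_payload(
    "MY_AVATAR_ID", "MY_CLONED_VOICE_ID",
    ["Scene one of the script.", "Scene two of the script."],
)
# Send with: requests.post("https://api.heygen.com/v2/video/generate",
#                          json=payload, headers={"X-Api-Key": "..."})
```

Keeping one paragraph per `video_inputs` entry mirrors the editor workflow: you can swap the look or voice on any single scene without touching the rest.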
When it's done, the finished files will be in Projects, ready for download, and off they go to the edit bay. That's my pipeline nine times out of ten. But lately, I've been poking at some new tricks. Let me just flick through a few looks here. Nope. Oh, there's the one. All of this shape-shifting runs on the brand-new Avatar 4 engine. It starts with a photo, any photo, and turns that picture into a full talking head. Here's how I pulled off the effect you just saw. Okay, quick reset; we're back to normal. I opened ChatGPT, dropped in a studio photo of myself, asked it for a handful of style tweaks, saved the versions I liked, then uploaded those images into HeyGen. Same script, same voice, brand-new vibe each time. Because Avatar 4 is driven by the audio track, every eyebrow wiggle and lip movement stays synced to my words. It happily animates anime heroes, cartoon critters, sock puppets, even a house plant if the leaves look enough like a face. Printed artwork and side profiles work, and yes, it can keep up while singing. That's a really fun and cool feature, but for me, a YouTuber, translation and multilingual
Instant translation, Avatar 4 style swaps & UGC clips
lip sync are a bit more useful. We shoot every episode in English, but the minute we do that, we wave goodbye to mountains of viewers. Think of all of France, most of Germany, heaps of South America, you name it. With HeyGen's translation and multilingual lip sync, I can turn a single upload into more than 175 languages and dialects, complete with matched tone and lips that actually line up instead of flapping like a bad puppet. You already saw the fast way inside the editing window, but there is a second route for videos that are already done and dusted. I just click Upload Video and toss in any file. Could be something I shot on my phone. Could be an avatar clip full of jump cuts and zoom-ins. HeyGen doesn't care. Then I pick a target language and hit go. It can auto-detect the original language, but I still fill in every box: source language, target language, the exact number of speakers on screen. And I always flip the Dynamic Duration switch so the runtime stretches or shrinks to fit the new dialogue. If I'm feeling picky, I add a brand voice profile that tells HeyGen which words never to translate, which ones need special phrasing, and whether I want my voice to lean formal or sound like I'm chatting at a barbecue. How do you like that? Just when I thought nothing could top that, HeyGen's UGC video creation comes along and floors me. It lets me crank out clips that look exactly like user-generated content on TikTok: unpolished lighting, handheld framing, totally human vibes. Yet I'm still controlling everything from the same familiar editor. I pick from a bunch of ready-made UGC avatars, type my lines, choose my voice, and the result feels so organic you'd swear someone filmed it on their lunch break. ("Hello. I just wanted to say that I am tired of people not knowing where and how to learn AI. There's only one place for it. Geek Academy. That's the only way. Peace out.") At this point, I've got my script, voice-over, avatar footage, UGC snippets. Basically, the whole toolbox.
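The upload-and-translate route can also be driven programmatically. This sketch only assembles the request body; the field names are assumptions modelled on HeyGen's video-translate API and the video URL is a placeholder, so check the current documentation before relying on them.

```python
# Sketch: request body for translating an already-finished video.
# Field names are assumptions; verify against HeyGen's current API docs.

def build_translate_request(video_url: str, target_language: str,
                            speaker_num: int = 1) -> dict:
    """Mirror the dialog boxes: source file, target language, speaker count."""
    return {
        "video_url": video_url,            # any finished video, phone footage included
        "output_language": target_language,
        "speaker_num": speaker_num,        # how many people talk on screen
    }

req = build_translate_request("https://example.com/my_video.mp4", "Spanish")
```

One request per target language turns a single English upload into a batch job over a whole list of languages.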
All that's left is to stitch it together in post. On long-form videos, you can still spot an avatar if you squint, but for quick cuts, intros, or any stretch where I'm just narrating, the HeyGen clips slide right in and nobody blinks. Add the easy style swaps, lightning-fast translations, and those stealthy UGC shots, and I've built myself an endless content machine. Plenty of channels already run entirely on AI avatars; you just haven't noticed yet. For the record, I'm still flesh and blood, sitting right here making videos with way too much coffee and a whole lot of love. Huge thanks to HeyGen for sponsoring this video. Give it a try yourself. The link is down in the description. Cheers, and see you in the next video.