How To Make an AI Influencer?

Vaibhav Sisinty · 09.02.2026 · 38,487 views · 1,077 likes · updated 18.02.2026


Video description
🔗 Join our WhatsApp Community — get the latest AI updates, tips, and insights straight to your inbox: https://dub.sh/ai-updates-vs

https://higgsfield.ai/

I made a full cinematic ad using AI. No actors. No crew. No studio. Just prompts. In this video, I break down exactly how I created fake street interviews and a complete GrowthSchool ad using Kling 3.0 on Higgsfield — and I read every single prompt out loud so you can copy them.

Here's what you'll learn:
→ How to write one prompt that generates video + dialogue + sound effects
→ How to keep the same face across every shot (Elements Library)
→ How to generate videos in any language (Hindi, Spanish, Japanese…)
→ How to direct action scenes with slow motion and speed ramps
→ How to storyboard multi-shot sequences — up to 6 camera cuts in one prompt
→ The 6 camera words that work in every Kling prompt (handheld, dolly, crane, arc, lateral pass, crash push)
→ How to use negative prompts to fix AI glitches
→ Full shot-by-shot breakdown of an 8-shot cinematic ad

0:00 — None of these people are real
0:49 — What's new in Kling 3.0 on Higgsfield
1:46 — Clip 1: Cafe girl — full prompt breakdown
4:39 — Clip 2: Character consistency with Elements
7:17 — Dolly tracking shot explained
8:07 — Clip 3: Multilingual prompts (Hindi)
9:30 — Camera vocabulary cheat sheet (6 words)
9:51 — Clip 4: Fight scene & speed ramps
11:46 — Clip 5: Multi-shot storyboarding
13:28 — Full GrowthSchool ad — just watch
14:22 — Ad breakdown: 8 shots explained
17:33 — Writing for the edit (Dutch angle, split dialogue)
19:02 — Cinematic details (lens flare, golden light)
20:12 — Two-word delivery note pattern
21:36 — SFX world-building
22:06 — Final thoughts + links

To know more, follow Vaibhav Sisinty on:
Instagram — https://www.instagram.com/vaibhavsisinty
Twitter — https://twitter.com/VaibhavSisinty
Facebook — https://www.facebook.com/vaibhavsisinty/
LinkedIn — https://www.linkedin.com/in/vaibhavsisinty

Table of contents (16 segments)

None of these people are real

None of these people are real. Every face you just saw, every voice, every background was generated by AI in about 3 minutes each. I've built three startups, raised $5 million from Sequoia, and I use AI 8 to 10 hours a day to run my business. I make these videos to teach you exactly what I'm learning, the stuff that's actually helping me build faster. Here's exactly what I'm going to show you. First, the exact prompt behind each of those five people. I'm reading every line out loud and explaining what it does so you can write your own. Second, a full cinematic ad I built for my own company, GrowthSchool, using all those techniques stacked together. You'll watch it, then I'll break down every single shot.

What's new in Kling 3.0 on Higgsfield

All of this was made with Kling 3.0, a brand-new AI video model. I've been running it through Higgsfield; that's where every generation in this video was done. Here's what's new: it generates up to 15 seconds of video. Native audio: dialogue, sound effects, music, all in one generation. Character consistency across shots. And multi-shot storyboarding: multiple camera angles in a single prompt. Here's the thing. I'm not just going to show you cool clips. I'm going to read you the actual prompt for each one and break down what every single line does, so you can literally copy-paste these and get similar results. Sound good? Let's start with clip number one. One thing before we start: every single prompt I'm about to show you, all five interview prompts, all eight ad shots, is in our WhatsApp community, copy-paste ready. I also share new prompts and AI updates there before they go on YouTube. Links in the description. Go join. I'll wait. Let's start with

Clip 1: Cafe girl — full prompt breakdown

this one. Woman at a cafe talking about an AI tool she heard about. — You heard about this AI tool? Which AI tool was that? This AI tool that makes, uh, presentations using AI. — One prompt. Everything you see and hear came from one prompt. Let me show you exactly what I typed. A young Indian woman, late 20s, sitting at a modern cafe table with a laptop open. Warm afternoon light through window. She leans forward and says to someone off camera, "You heard about this AI tool? Which AI tool was that? This AI tool that makes presentations using AI." Casual, curious delivery. Handheld medium close-up, shallow depth of field. SFX: coffee shop ambiance, espresso machine hissing softly. Okay, let me walk you through this line by line. That first line is the character description. You're painting the scene. Who is this person? What do they look like? Where are they sitting? Be specific. Late 20s gives the model an age range. Modern cafe sets the vibe. Laptop open adds a prop. The more specific you are here, the better the output. Lighting: most people skip this. Don't. One line about lighting changes the entire mood of your shot. Warm afternoon light gives you that golden-hour feel. If I'd said harsh overhead fluorescent, you'd get a completely different video from the same prompt. She leans forward and says to someone off camera, "You heard about this AI tool?" Two things are happening here. Leans forward is a body action; it tells the model how she moves. And then the dialogue. See how the words she speaks are inside quotes? That's the rule: any words inside quotes in your prompt, the character actually says out loud in the video. That's native audio generation. No separate voiceover tool needed. Handheld medium close-up, shallow depth of field. Camera direction. Handheld means slightly shaky, like someone filming with a phone. That's what makes it feel like a real street interview. Medium close-up means framed from the chest up.
Shallow depth of field means the background is blurry. Three phrases and you just told the AI exactly how to shoot this. SFX: coffee shop ambiance, espresso machine hissing softly. This is the sound design. — This AI tool that makes, uh, presentations using AI. — SFX, colon, then describe what you hear: the cafe chatter, the espresso machine in the background. This generates with the video. Not after, not separately, together. One prompt, one generation. You get video with dialogue with sound effects. That's the shift. Watch it again now that you know what went into it. One prompt: character, lighting, dialogue, delivery, camera, sound, all in one box. Now, this one

Clip 2: Character consistency with Elements

— saw this AI tool that keeps the same face in every video. Like the exact same person, same place every time. — Woman walking down a busy street, camera tracking her. But the lesson here isn't the camera. Watch this. Street, office, same person, same jawline, same eyes. Two completely different generations, identical face every time. See the difference? Left side, old AI video: generate someone twice, you get two different people. Right side, Kling 3.0 with Elements: same person every time. Let me show you how to set it up. Inside Higgsfield, there's something called the Elements library. I already have this character ready in my Elements library. Her name is Arya Kling. But let me show you how I set her up, so you can do the same thing. You come to the Elements library and upload reference images: front face, side angle, smiling, neutral, three to five photos. Then you label it. I called her Arya Kling. And here's the key step that most people miss. Every time you generate a video with this character, you have to tag the character in the prompt box from the Elements library. That is what tells the model which face to use. Skip it and you'll get a random face. Now, let me read you the actual prompt for this walking shot. A young woman walks along a busy city sidewalk. She glances at camera and says, "I saw this AI tool that keeps the same face in every video. Like the exact same person, different places, same face every time." Casual, amazed delivery. Steady dolly tracking shot from slightly ahead, shallow depth of field. Urban environment, natural afternoon light. SFX: traffic, distant chatter, footsteps on pavement. Let me walk through it. Arya, reference image from Elements: first line. Before the prompt even starts, you tag your character from the Elements library. The model looks at this image and says, okay, this is the person I'm generating. That's how it keeps the face consistent. No reference image from Elements, no consistency.
Notice something different? In the cafe prompt, I described her in detail: age, hair, what she's wearing. Here, I don't need to. The reference image from Elements already tells the model what she looks like. So the character description can be shorter. You're just setting the scene now. What is she doing? Where is she? Same dialogue framework as before: words in quotes, she says them out loud. But I also added a body action, glances at camera. That little head turn toward the lens is what makes it feel like a candid street-interview moment instead of a scripted take. Steady dolly tracking shot from

Dolly tracking shot explained

slightly ahead. New camera move. Dolly tracking means the camera moves alongside her as she walks, and from slightly ahead means the camera is in front of her looking back. That's what gives you this walking-interview feel, like a crew is walking backwards filming her. Two words in the prompt: dolly tracking. That's all it takes. SFX: traffic, distant chatter, footsteps on pavement. Same SFX framework as the cafe, but notice I got more specific. Not just street sounds, but three distinct elements: traffic, chatter, footsteps. The more specific you are with your sound effects, the more real the video feels. Watch it again. Same prompting framework as the cafe girl: character, dialogue, delivery, camera, SFX, but now with two new tools, Elements for character consistency and dolly tracking for the camera.
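That recurring anatomy (character, lighting, body action plus quoted dialogue, delivery note, camera, SFX) can be sketched as a small template builder. The `build_prompt` helper below is my own illustration of the structure, not part of Higgsfield's or Kling's interface:

```python
# Hypothetical helper illustrating the prompt anatomy from the video:
# character, lighting, action + quoted dialogue, delivery, camera, SFX.
# The function and its field names are illustrative, not a Higgsfield API.

def build_prompt(character, lighting, action, dialogue, delivery, camera, sfx):
    """Assemble a one-box prompt in the order used for the interview clips."""
    return " ".join([
        f"{character}.",
        f"{lighting}.",
        f'{action} and says, "{dialogue}"',  # quoted words are spoken aloud
        f"{delivery} delivery.",
        f"{camera}.",
        f"SFX: {sfx}.",
    ])

prompt = build_prompt(
    character="A young woman walks along a busy city sidewalk",
    lighting="Natural afternoon light",
    action="She glances at camera",
    dialogue="I saw this AI tool that keeps the same face in every video.",
    delivery="Casual, amazed",
    camera="Steady dolly tracking shot from slightly ahead, shallow depth of field",
    sfx="traffic, distant chatter, footsteps on pavement",
)
print(prompt)
```

Swapping any one field (the lighting line, the camera phrase, the SFX list) changes just that dimension of the output while the rest of the structure stays intact.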

Clip 3: Multilingual prompts (Hindi)

Everything builds. Now, Hindi AI. Let me read you the Hindi version prompt. Two things here. First, new camera move. Crane down means the camera starts high at ceiling level and comes down to the subject. It's a reveal: you see the room first, then you see the person. And second, notice the word revealing. That's a storytelling cue. It tells the model this camera move has a purpose: to reveal the character. That one word changes the pacing of the shot. Here's how multilingual works. You literally write the dialogue in whatever language you want: Hindi, Japanese, Spanish, Chinese, Korean. Kling 3.0 supports all of them. Just write it directly in the prompt. The model generates the voice and syncs the lip movements to that language automatically. No separate translation tool. No dubbing. One prompt. SFX sets the room: keyboard and office hum. Simple, but it tells your brain this is a real office. — Without it, the shot feels like it's happening in a void. Same prompt structure, same framework. Just swap the language in the dialogue line and the model handles everything else. I know.
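Under that framework, switching language really is a one-field change. A minimal sketch: the template loosely paraphrases the office scene described above (the exact wording and the Spanish line are my own examples, not from the video):

```python
# Only the dialogue field changes per language; the rest of the prompt
# stays identical. Template text and sample lines are illustrative only.

TEMPLATE = (
    "A young man at an office desk. Camera cranes down, revealing him. "
    'He looks up and says, "{dialogue}" Thoughtful delivery. '
    "SFX: keyboard clicks, soft office hum."
)

for dialogue in [
    "You heard about this AI tool?",         # English
    "¿Has oído de esta herramienta de IA?",  # Spanish (my own example)
]:
    print(TEMPLATE.format(dialogue=dialogue))
```

The model reads whatever sits inside the quotes, generates the voice in that language, and matches the lip movements to it.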

Camera vocabulary cheat sheet (6 words)

Crane down, dolly, handheld. This sounds like film school. It really isn't. Here's your whole vocabulary, six words: handheld for real, dolly for smooth, crane for up-down, arc for orbit, lateral for fast energy, crash push for dramatic zoom. That's your whole camera vocabulary. These words work directly in Kling prompts. This next one, this is where it gets

Clip 4: Fight scene & speed ramps

interesting. Part four is a fight scene, and it was generated from one text prompt. Let me break this down. A martial artist in a black training outfit on a rooftop at golden hour, city skyline behind. Same as every prompt before this. But notice something: the setting is doing the work. Rooftop at golden hour with a skyline. That's not just a location, that's a mood. That's cinematic production value from one sentence. The light, the backdrop, the scale, all from describing where the person is. He throws a fast jab, dodges left, then lands a spinning roundhouse kick. This is action choreography. You're writing a sequence of physical moves: jab, dodge, kick. You're saying exactly which moves in exactly which order. The more specific you are, the more the output looks like it was actually choreographed. Slow motion as the kick connects, speed resumes on the follow-through. This is a speed modifier. The model slows the frame rate right at the moment of impact, then speed resumes on the follow-through. That's a speed ramp, the same technique you see in every action movie. Two lines in your prompt. That's it. Dynamic tracking shot that follows the action, camera swings with each move. We're not using a specific camera move from the vocabulary table here. Instead, we're telling the camera to match the action: follows the action, swings with each move. It's not on a fixed path; it's responding to what the character does, the way a camera operator follows the choreography. Same principle here. And each move is tied to a specific sound effect: whoosh on the jab, snap on the dodge, thud on the kick, wind on the rooftop. That's not just adding sound, that's sound design. Every sound matches a physical movement. Without these, a fight scene looks like a silent film. And for a fight scene, these are absolutely critical. Last one, and this

Clip 5: Multi-shot storyboarding

is the biggest one: multi-shot storyboarding. This is the single biggest upgrade in Kling 3.0. Before this, every AI video was one continuous shot. Now you can direct up to six camera cuts in one prompt. Shot one, 0 to 4 seconds. Shot two, 4 to 9 seconds. Shot three, 9 to 13 seconds. See the new format? Each one has its own time range: 0 to 4 seconds, 4 to 9 seconds, 9 to 13 seconds. In Higgsfield, each shot gets its own prompt box. You set the duration: minimum 2 seconds per shot, maximum six shots total, up to 15 seconds for the whole thing. Shot one, wide establishing shot. Shot two, medium shot inside. Shot three, close-up of her face. Each shot has a different camera angle and a different location: wide exterior of the station, medium inside the station, close-up of her face by the window. You're not just changing the angle, you're changing the entire scene. That's shot progression: wide to set the scale, medium to meet the person, close-up for the emotional beat. And because each shot has its own prompt, you can put the camera literally anywhere. Music: slow ambient synth, building warmth. SFX: faint heartbeat, quiet breathing. Shot one gets space ambiance. Shot two adds interior sounds. Shot three brings in music for the first time, plus intimate sounds. — I'm 400 km above Earth, and even up here, I can't escape people talking about AI. — Each shot has its own soundscape. Three completely different audio worlds. The AI is your camera crew, your sound department, and your editor, all from one prompt. Okay, so now you've got the

Full GrowthSchool ad — just watch

complete toolkit. Sound good? Now, let me show you what happens when you put all of that together. I took every single technique we just covered: camera vocabulary, dialogue, sound effects, negative prompts, multi-shot storyboarding, and I stacked them all into one thing: a real ad for my company, GrowthSchool. No actors, no crew, no studio, no stock footage, just prompts. Let me just play it. No commentary, just watch. You heard about this AI tool? — Which AI tool was that? — This AI tool, presentation using AI. — I saw some AI tool which can do that. — Change the clothes using some AI tool. — Looking into what's happening in the AI world.
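An ad like this is assembled from the multi-shot format covered in the previous section, which comes with hard limits: each shot gets its own time range, minimum 2 seconds per shot, at most six shots, 15 seconds total. Those constraints can be sketched as a small validator; the `Shot` type and `validate_storyboard` function are my own illustration, not a Higgsfield API:

```python
from dataclasses import dataclass

@dataclass
class Shot:
    start: float  # seconds
    end: float
    prompt: str

def validate_storyboard(shots):
    """Check the limits stated in the video: 2s min/shot, 6 shots, 15s total."""
    assert 1 <= len(shots) <= 6, "up to six shots per generation"
    for a, b in zip(shots, shots[1:]):
        assert a.end == b.start, "shots must be back-to-back"
    for s in shots:
        assert s.end - s.start >= 2, "minimum 2 seconds per shot"
    assert shots[-1].end - shots[0].start <= 15, "15 seconds max overall"
    return True

# The space-station sequence from the previous section, as a storyboard.
storyboard = [
    Shot(0, 4, "Wide establishing shot, exterior of the station"),
    Shot(4, 9, "Medium shot inside the station"),
    Shot(9, 13, "Close-up of her face by the window"),
]
print(validate_storyboard(storyboard))
```

Planning a sequence against these limits before generating saves wasted generations: a 1-second cut or a seventh shot simply isn't expressible in one prompt.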

Ad breakdown: 8 shots explained

Look at that. Eight different scenes: a bee, a garden, a tennis court, a FIFA match, a film set, and a full cinematic car sequence with layered sound design. 29 seconds, one AI model. And here's the thing I mentioned at the beginning: that car sequence at the end. She speaks, she sighs, she turns off the car, opens the door, walks away into the sunset. Four shots, four sound effects, zero dialogue. A complete cinematic story told entirely through multi-shot storyboarding and SFX. That's what makes this usable for real business. It's not just talking heads reading lines. You can direct actual scenes: emotions, actions, transitions, sound design that tells a story, all from prompts. So, let me break down every single shot so you can see exactly how each one was made. Extreme close-up of a honeybee landing on a bright orange flower. No character here, no dialogue. This is pure visual generation: you're just describing a scene, not a person. Extreme close-up and macro lens feel tell the model to get tight on the subject, like you're shooting with a macro photography setup. This is how you create cinematic B-roll with AI. SFX: bee buzzing, soft wind through leaves. Sound design for a nature shot. The buzzing sells the close-up. Without it, this just feels like a stock-photo slideshow. With it, it feels alive. Same SFX framework as every other prompt, just applied to nature instead of a person. A gardener in a white shirt and sun hat tends to green leafy plants in a bright garden. Pure text-to-video: no reference image, no start frame. You just describe what you want and the model generates it from scratch. Same as the cafe girl in part one: character description, setting, action, all in one sentence. He looks up from the plants and says, "You heard about this AI tool?" Body action plus dialogue, same framework as every person in part two. Looks up is the action; words in quotes are what he says. And notice this is the same line the cafe girl said in the opening.
Same words, completely different person, completely different setting. One prompt framework, infinite variations. Handheld, medium shot. Handheld again: same shaky, real feel from the cafe, but medium shot instead of medium close-up. Medium shot is wider, so you see more of his body and the garden. That one-word shift changes the framing. A woman in a green cardigan sits on a tennis court tying her shoelaces. Props and actions: tying her shoelaces gives the character something to do before she speaks. That's what makes it feel candid. She's mid-action when she looks up. — A character doing nothing feels posed. A character interrupted feels real. Low angle handheld. New angle. Low angle means the camera is below eye level looking up at her. That's a subtle power shift: when you look up at someone, it makes them feel confident, grounded. Combined with handheld, it still feels casual. Camera angle is just another tool in your vocabulary. Two separate clips here. A teacher in a

Writing for the edit (Dutch angle, split dialogue)

classroom, then two football players walking onto a pitch. Each is its own generation. He gestures as he speaks: "This AI tool." Notice the dialogue is incomplete, just this AI tool with the sentence trailing off. That's intentional. This clip cuts mid-sentence to the next one. You're writing for the edit. The AI generates whatever words you put in quotes, so you control exactly where the cut happens by controlling where the sentence ends. Slight Dutch angle. New camera technique. Dutch angle means the camera is slightly tilted, not level with the horizon. It adds energy, this feeling that something is off or dynamic. Film directors use it for tension or style. One phrase in your prompt and the AI tilts the camera for you. Two football players in national team jerseys walk onto a large stadium pitch. Two characters in one generation. You can generate scenes with multiple people; you just describe both in the prompt. National team jerseys gives the wardrobe. Walk onto a large stadium pitch sets the scale and action. The model handles two bodies in frame. "Presentation using AI," casual, matter-of-fact delivery. The other half of the sentence from the classroom. Starting the quoted dialogue mid-sentence tells you this is a continuation. The edit will make it feel like one thought split across two completely different worlds. Classroom to stadium in one cut. That contrast is what gives the ad its energy. A couple

Cinematic details (lens flare, golden light)

sitting on the roof of a vintage car at sunset. Warm golden light. Scene setting with mood. Roof of a vintage car is extremely specific; that one detail transforms the shot from generic to cinematic. And warm golden light does the same thing the cafe prompt's warm afternoon light did: controls the entire color palette with one phrase. Same technique, bigger visual. Lens flare from sun. Cinematography detail. Lens flare is when sunlight hits the camera lens and creates those streaks and glows. Adding this one line makes the shot feel like it was filmed with a real camera by a real DP. These tiny visual cues are the difference between looks AI and looks cinematic. Stable identity for both characters. A large professional camera is visible filming him in the foreground. This is meta: you're including a camera inside the AI-generated video, a character being filmed on a film set inside a video that was itself generated by AI. It's self-referential, and it works because you describe the camera as a prop in the scene. You can put anything in frame just by describing it. Amused,

Two-word delivery note pattern

laid-back delivery. New delivery combo. Amused is doing specific work: slight smile, relaxed tone, like he's in on the joke. Every delivery note you've seen has been just two words: casual, curious; casual, amazed; thoughtful, serious; amused, laid-back. That's the pattern: two words, one for mood, one for energy. That's how you control a voice performance. Palm trees visible through the window. Golden sunset light streaming in. Environment through details. You're not describing a sunset scene; you're describing exactly what the camera sees. Palm trees through glass, light streaming in. These specific visual anchors give the model something concrete to render. Vague prompts get vague results. Specific details get cinematic shots. Reflective, slightly tired delivery. Most nuanced delivery note so far. — Looking into what's happening in the AI world. [sighs] — Reflective and slightly tired together create a very specific emotional texture: end of a long day, processing thoughts. The model adjusts speaking pace, voice energy, even facial micro-expressions based on these words. This is why delivery notes matter. They're your emotional direction. SFX: idling

SFX world-building

engine, muffled radio, distant traffic. Three layers of sound that tell your brain she's in a car without the camera ever showing the full vehicle. — Looking into what's happening in the AI world. — Idling engine underneath, radio from the dashboard, traffic outside. These sound details are doing world-building. The viewer's brain fills in everything else.

Final thoughts + links

Look, not everything worked on the first try. Some shots took two or three generations, but that's a few minutes each. Total time from first prompt to that finished ad: maybe an hour, and most of that was me just experimenting and having fun with the prompts. Everything in this video was made on Higgsfield using Kling 3.0. They're running unlimited generations right now on the annual plan, so you can experiment without watching your credits. Links in the description. Every prompt from every shot, the interviews and the ad, is in our WhatsApp community, copy-paste ready, also in the description. And if you want to push this even further, that video is on screen right now.
