The NEXT BIG paradigm shift in AI: Continuous Autoregressive Language Models (CALM)

Universe of AI · 05.11.2025 · 2,024 views · 74 likes · updated 18.02.2026
Video description
We’ve built massive language models like ChatGPT and Gemini, but they’re still limited by one thing: tokens. Every AI today generates text one piece at a time, slowing things down and eating up compute power. A new model called CALM (Continuous Autoregressive Language Models) might change that forever. Instead of predicting words, CALM predicts continuous meaning vectors, unlocking a faster, cheaper, and smarter way for AI to think. In this video, we break down what CALM is, how it works, and why it could be the next big shift in artificial intelligence.

📘 Research Paper: https://arxiv.org/abs/2510.27688
🧠 Project Page: https://shaochenze.github.io/blog/2025/CALM

🔗 My Links:
📩 Sponsor a Video or Feature Your Product: intheuniverseofaiz@gmail.com
🔥 Become a Patron (Private Discord): /worldofai
🧠 Follow me on Twitter: /intheworldofai
🌐 Website: https://www.worldzofai.com
🚨 Subscribe To The FREE AI Newsletter: https://intheworldofai.com/

Contents (11 segments)

  1. 0:00 Intro (126 words)
  2. 0:44 The Problem (136 words)
  3. 1:33 The idea behind CALM (141 words)
  4. 2:25 How it works (129 words)
  5. 3:18 Learning Without Probabilities (131 words)
  6. 4:06 Measuring Performance Without “Perplexity” (97 words)
  7. 4:42 Controlling Outputs (99 words)
  8. 5:22 Results (86 words)
  9. 5:58 Why it Matters (139 words)
  10. 6:47 What's Next (117 words)
  11. 7:26 Outro (216 words)
0:00

Intro

Have you ever wondered why AI tools like ChatGPT sometimes take a few seconds to finish a sentence? Almost like they're typing one word at a time. That's not just for effect. It's actually how these models think. They generate text piece by piece, predicting the next tiny bit of text over and over again. It works, but it's painfully slow and super expensive to run at scale. Now, a team from Tencent's WeChat AI and Tsinghua University has proposed something completely different: a new way for AI to generate language that might finally break that limit. It's called CALM, short for Continuous Autoregressive Language Models, and it could completely change how large language models work today. Let's start with what's wrong with the
0:44

The Problem

way today's models talk. When ChatGPT or Gemini answers you, it doesn't actually know the whole sentence in advance. It's guessing one chunk at a time based on what came before. Every word, every punctuation mark is one prediction after another. That means even the biggest, smartest models are still crawling through text one tiny step at a time. Each of those steps only carries a little bit of meaning, about 15 bits of information according to the researchers. To double that meaning, you'd have to make the model's vocabulary massive, which would make it too slow and expensive to train. So, no matter how powerful the hardware gets, we're still stuck with this built-in speed limit. The CALM team calls this the token bottleneck, and it's the reason today's AIs are both expensive and inefficient.
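
The "~15 bits" figure is just the base-2 log of the vocabulary size, and a quick back-of-the-envelope sketch shows why doubling it is impractical (the 32,768-entry vocabulary here is a hypothetical round number chosen so the math lands exactly on 15 bits):

```python
import math

# Hypothetical vocabulary size chosen so that log2(vocab) = 15,
# matching the "~15 bits per token" figure quoted in the video.
vocab = 32_768
bits_per_token = math.log2(vocab)

# Doubling the information per prediction means SQUARING the vocabulary:
vocab_for_double = 2 ** (2 * bits_per_token)

print(bits_per_token)        # 15.0
print(int(vocab_for_double)) # 1073741824 -- about a billion entries
```

A softmax over a billion entries is why "just use bigger tokens" doesn't scale, and why CALM sidesteps the vocabulary entirely.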
1:33

The idea behind CALM

So, here's the big idea behind CALM. If predicting one word at a time is the bottleneck, why not predict bigger chunks of meaning all at once? Instead of generating discrete tokens, those little bits of text, CALM predicts continuous vectors. You can think of a vector like a smooth, high-resolution picture of meaning. Imagine taking a simple phrase like "the cat sat on the mat" and squeezing it into a single mathematical signal that captures the entire idea. Then, instead of predicting word by word, the model predicts that single signal, that vector. It's like upgrading from dial-up to fiber internet for AI language. Each step carries way more meaning, which means the model can think and write faster using less compute power. The researchers call this increasing the semantic bandwidth: basically, packing more meaning into each prediction. To
2:25

How it works

make this happen, CALM uses something called an autoencoder. Think of it as a translator between words and these new meaning vectors. Here's what it does. It takes a small chunk of text, maybe four tokens, and compresses it into a single vector. Then it learns how to reverse that process and rebuild the original text from that same vector. And surprisingly, it's pretty accurate. The reconstructions hit over 99.9% accuracy. Early on, those vectors were fragile. Even a tiny error could turn "the cat sat on the mat" into nonsense. So the team added a few clever tricks, like variational encoding and dropout, to make the system more forgiving of small changes. The result is a smooth, stable representation of language that's perfect for generating continuous text.
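
As a rough sketch of the compress-then-reconstruct idea, here is a toy linear autoencoder over random token embeddings. All dimensions and weights are hypothetical stand-ins; the real model is a trained neural network, so these random weights only illustrate the shapes, not a faithful reconstruction:

```python
import numpy as np

rng = np.random.default_rng(0)

K, d_tok, d_vec = 4, 64, 128  # chunk size, token-embedding dim, latent dim (all hypothetical)

# Random linear encoder/decoder weights stand in for the trained network.
W_enc = rng.normal(size=(K * d_tok, d_vec)) / np.sqrt(K * d_tok)
W_dec = rng.normal(size=(d_vec, K * d_tok)) / np.sqrt(d_vec)

chunk = rng.normal(size=(K, d_tok))      # embeddings of 4 consecutive tokens

z = chunk.reshape(-1) @ W_enc            # compress the whole chunk into ONE vector
recon = (z @ W_dec).reshape(K, d_tok)    # decode the vector back into a chunk

print(z.shape)      # (128,) -- one continuous vector now stands in for four tokens
print(recon.shape)  # (4, 64)
```

The generator then only has to predict `z` once per step instead of making four separate token predictions, which is where the speedup comes from.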
3:18

Learning Without Probabilities

Now here's where CALM really sets itself apart. Normal language models learn by predicting probabilities. They use something called softmax, which decides which word is most likely to come next. But CALM doesn't deal with words anymore. It deals with continuous numbers. There are no fixed categories, no list of possible next words. So instead of calculating probabilities, it learns by measuring how close its prediction is to the real answer. This method is called the energy score. The model gets rewarded when its predicted vector is close to the real one. That's accuracy. And when it produces a healthy variety of results, that's creativity. It's a new kind of learning, one that doesn't need to calculate explicit likelihoods, which makes it faster and far more flexible. If you've heard of
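
Concretely, a Monte Carlo estimate of an energy-score style loss can be sketched in a few lines. This is a simplified illustration of the accuracy-plus-diversity trade-off described above, not the paper's exact formulation:

```python
import numpy as np

def energy_score_loss(samples, target):
    """Monte Carlo energy-score loss (lower is better).

    samples: (n, d) array of vectors drawn from the model's generative head
    target:  (d,)   the ground-truth vector for this step

    The first term rewards predictions close to the target (accuracy);
    the second rewards spread among the model's own samples (diversity).
    No probability or likelihood is ever computed.
    """
    n = len(samples)
    closeness = np.mean(np.linalg.norm(samples - target, axis=1))
    pairwise = np.linalg.norm(samples[:, None, :] - samples[None, :, :], axis=-1)
    diversity = pairwise.sum() / (n * (n - 1))
    return closeness - 0.5 * diversity

rng = np.random.default_rng(0)
target = rng.normal(size=8)
good = target + 0.01 * rng.normal(size=(16, 8))   # samples near the truth
bad = rng.normal(size=(16, 8))                    # unrelated samples
print(energy_score_loss(good, target) < energy_score_loss(bad, target))  # True
```

Because the loss only needs samples and distances, the model can be trained without a softmax over any vocabulary.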
4:06

Measuring Performance Without “Perplexity”

perplexity, that's the classic way to judge how good a language model is. But perplexity depends on probabilities, and CALM doesn't use those. So, the team invented a new score called BrierLM, based on something called the Brier score from statistics. In plain English, it measures how close the model's outputs are to what humans would actually write, without needing probabilities. This new metric works not just for CALM, but also for comparing all kinds of models on equal ground. It's like a universal test for how confident and accurate an AI really is. Every AI
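
The statistic underneath BrierLM is the classic Brier score. The paper's metric estimates it from model samples, but the core idea fits in a few lines (a simplified binary-outcome sketch):

```python
def brier_score(probs, outcomes):
    """Classic Brier score: mean squared gap between a predicted
    probability and the actual 0/1 outcome. Lower is better, and
    unlike perplexity it can be estimated purely from samples,
    with no explicit likelihoods required."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)

print(round(brier_score([1.0, 0.9], [1, 1]), 3))  # 0.005 -- confident and right
print(round(brier_score([0.1, 0.0], [1, 1]), 3))  # 0.905 -- confident and wrong
```

Because it never divides by a likelihood, the same score can be applied to a model that only exposes samples, which is exactly CALM's situation.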
4:42

Controlling Outputs

model has that little temperature setting that controls how creative it gets. Lower temperature equals safe, predictable answers. Higher temperature equals more creative, sometimes weird ones. But CALM can't use the usual math for that because, again, no probabilities. So the researchers built a workaround using something called rejection sampling. Basically, the model generates a bunch of possible answers and only keeps the ones that match the desired creativity level. It's a surprisingly elegant solution that gives CALM the same control over tone and style that today's LLMs have, without needing logits or probabilities at all. Does this actually
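
One way such a likelihood-free scheme can work, sketched on a toy discrete sampler (this is a simplified illustration, not the paper's exact algorithm): to sharpen a distribution toward temperature 1/2, draw pairs of independent samples and accept only when they agree, which keeps outcomes in proportion to p² without ever computing p.

```python
import random

def sharpened_sample(draw, n=2, max_tries=10_000):
    """Likelihood-free sampling at temperature 1/n: accept a draw only
    when n independent samples agree, so outcome x is kept with
    probability proportional to p(x)**n -- no logits needed."""
    for _ in range(max_tries):
        xs = [draw() for _ in range(n)]
        if all(x == xs[0] for x in xs):
            return xs[0]
    return xs[0]  # fallback if acceptance is unlucky

random.seed(0)
draw = lambda: "A" if random.random() < 0.9 else "B"  # toy 90/10 black-box sampler

hits = sum(sharpened_sample(draw) == "A" for _ in range(2000))
# Sharpening pushes P(A) from 0.90 toward 0.81 / (0.81 + 0.01), about 0.99
print(hits / 2000)
```

The only thing the procedure ever touches is the sampler itself, which is why the same trick applies to a model that outputs continuous vectors instead of token probabilities.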
5:22

Results

work in practice? Pretty much, yes. The researchers compared CALM with regular transformers on standard benchmarks. At the same quality level, CALM used about 40% less compute for both training and inference. For example, a CALM model with roughly 370 million parameters matched a standard 280 million parameter transformer, but with almost half the training cost. That's huge when you think about how expensive it is to run modern LLMs. It means the next wave of models could be faster, cheaper, and greener without losing performance.
5:58

Why it Matters

Okay, so this all sounds cool, but why does it actually matter? Right now, the biggest problem with large language models isn't just how smart they are. It's how expensive they are to run. Every response costs money, every token burns compute, and scaling them takes massive data centers. If something like CALM can cut that cost by even 30 or 40%, that's not just an upgrade, that's a revolution. It means faster models on smaller devices. It means AI that's not locked behind expensive APIs. And it could open the door to models that understand not just text but sound, video, and even real-world signals, all using the same continuous, idea-based framework. In short, CALM isn't just another cool paper. It's a step toward making AI more accessible and more efficient for everyone. Now, we're not
6:47

What's Next

there yet. CALM is still early. Think of it as a proof of concept, but it gives us a glimpse into what the next few years could look like. Imagine OpenAI or Google taking this idea and building models that don't rely on tokens at all. They'd generate meaning directly, which could make responses instant, more human-like, and less robotic. And it might also change how we train these systems. Instead of collecting massive text datasets, we could teach them directly on continuous data like video or sensor inputs, the way humans experience the world. That's a huge shift: from AI that predicts text to AI that understands and expresses ideas. So what CALM really
7:26

Outro

represents is a mindset shift. For years, we've been trying to make language models bigger, faster, smarter: stacking parameters, training on more data, scaling compute through the roof. But CALM reminds us that maybe the next leap isn't about adding more. It's about rethinking the foundation. If models like this take off, AI won't need to predict every single token anymore. It could jump entire ideas at a time, like thinking in concepts instead of letters. Imagine AI that can summarize a movie scene or understand emotions in a conversation, not by analyzing words, but by processing pure meaning. That's where the future is heading, and CALM might be one of the first real steps toward it. And honestly, that's what makes AI so exciting right now. We're past the "bigger is better" phase. Now, it's about being smarter with how we design intelligence itself. If this kind of breakdown helped you understand what's coming next in AI, make sure to hit that like button, subscribe to the Universe of AI, and let me know in the comments what you think. Are continuous models like CALM the future, or just another experimental idea that will fade away? Either way, the next generation of AI is being built right now, and it's getting a lot more CALM.
