Why hook strength beats generic virality scoring

May 20, 2026 · 8 min read

ranking, engineering

Pick any AI clipping tool and read its landing page. You will find a single number attached to each clip — usually labeled virality score, magic score, or AI ranking — that promises to tell you which moments are about to go viral. The number is comforting. It compresses a complex decision into something a creator can sort by. But the number is doing a lot of work, and most of that work is fictional.

Inside that single score, AI clippers typically bundle five or six unrelated signals: sentiment intensity, transcript keyword matches against a trending-words list, audio energy peaks, face count, motion intensity, and, increasingly, an LLM-generated “how interesting is this” rating. Each of these signals captures something real, but they capture different things, and the platforms reward them differently across formats. Bundling them into one number is what makes the score brittle.

When we started building AutoAIClips, we tried the bundled-score approach. It worked in the demo but broke in production. Two months later, after watching what our users actually shipped and what landed on the For You page, we tore the composite down and rebuilt around a single signal: hook strength in the first three seconds. Here is the case for why that approach has held up while the composites rotted.

What the composite score is actually doing

The mechanical setup for most clippers looks like this. Transcribe the source video. Walk the transcript in overlapping windows of 30–60 seconds. For each window, compute a feature vector: average sentiment, peak audio energy, presence of trending keywords, density of speaker turns, number of detected faces, motion intensity from frame diffs. Pass the feature vector through a small linear model trained on a dataset of “clips that went viral.” Output a 0–100 score. Sort descending. Return the top ten.
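The pipeline described above can be sketched in a few lines of Python. This is a minimal illustration, not any particular tool's implementation: the feature names, the `WEIGHTS` values, the 45-second window, and the 15-second stride are all assumptions chosen for the example, and the features are assumed to be pre-normalized to the range 0 to 1.

```python
from dataclasses import dataclass


@dataclass
class Window:
    start: float                # seconds into the source video
    end: float
    features: dict[str, float]  # assumed pre-normalized to [0, 1]


# Hypothetical weights standing in for a small trained linear model.
WEIGHTS = {
    "sentiment": 0.25,
    "audio_energy": 0.20,
    "trending_keywords": 0.20,
    "speaker_turns": 0.10,
    "face_count": 0.10,
    "motion": 0.15,
}


def overlapping_windows(duration: float, length: float = 45.0,
                        stride: float = 15.0) -> list[tuple[float, float]]:
    """Return (start, end) spans that cover the video with overlap."""
    spans = []
    start = 0.0
    while start < duration:
        spans.append((start, min(start + length, duration)))
        if start + length >= duration:
            break  # the last span already reaches the end of the video
        start += stride
    return spans


def composite_score(features: dict[str, float]) -> float:
    """Linear combination of the features, scaled to 0-100."""
    raw = sum(weight * features.get(name, 0.0) for name, weight in WEIGHTS.items())
    return round(100.0 * raw, 1)


def top_clips(windows: list[Window], k: int = 10) -> list[Window]:
    """Sort by composite score, descending, and keep the top k."""
    return sorted(windows, key=lambda w: composite_score(w.features), reverse=True)[:k]
```

Note that nothing in `composite_score` knows what kind of content it is scoring. The weights carry that assumption silently, which is where the brittleness comes from.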

This works as long as the model’s training distribution matches your use case. A clipper trained on dance challenges has learned that motion intensity and face count predict virality. A clipper trained on podcast clips has learned that peak sentiment and trending-keyword density predict virality. The moment your content distribution diverges from the training distribution — say, you upload an interview podcast to a tool trained on dance challenges — the composite score becomes randomness wearing a confidence interval.

Why hook strength survives

Across every short-form platform we’ve tested — TikTok, Reels, Shorts, LinkedIn video — the single dominant predictor of feed performance is whether the first three seconds compel a viewer to keep watching. The platforms do not all use the same algorithm, but they all use some version of three-second completion or swipe-away rate at second three as a primary input to the feed ranker.

This is not a guess. It is the most consistently surfaced public signal in the platforms’ own documentation, in researcher reverse-engineering papers, and in the leaks. TikTok’s 2023 leaked algorithm spec made three-second retention a heavily weighted feature. Meta’s 2024 Reels ranking documentation lists “completion to three seconds” as an input to the candidate retrieval stage. YouTube Shorts engineers have publicly described the question the shelf’s ranking model answers as “Will the viewer swipe before second three?”

If you accept that premise — and the platforms are remarkably consistent on it — then ranking clips by anything other than hook strength is leaving money on the table. Sentiment intensity, trending keywords, face count: these matter only insofar as they influence the three-second hook. If a clip has a great hook but no trending keywords, it will still perform. If a clip is loaded with trending keywords but opens with a slow setup, it will get swiped.

How we actually measure hook strength

The AutoAIClips ranker scores each candidate window by sending the first three seconds of transcript — and only the first three seconds — to an LLM with a specific prompt: given this opening, would a TikTok viewer keep watching, or would they swipe? The model returns a 1–10 score and a one-line justification. We use the score; the justification is for our own debugging.
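As a rough sketch of that flow in Python: the prompt wording, the `SCORE: ... | REASON: ...` reply format, and the `ask_llm` callable are all hypothetical stand-ins for illustration, not the production prompt or any specific vendor API. The transcript is assumed to be a list of word-level `(timestamp, word)` pairs.

```python
import re
from typing import Callable

HOOK_SECONDS = 3.0


def opening_text(words: list[tuple[float, str]], window_start: float) -> str:
    """Join the words spoken in the first HOOK_SECONDS of a window."""
    cutoff = window_start + HOOK_SECONDS
    return " ".join(word for t, word in words if window_start <= t < cutoff)


def build_prompt(opening: str) -> str:
    # Illustrative wording only; it asks the concrete "swipe" question.
    return (
        "Here are the first three seconds of a short video:\n"
        f'"{opening}"\n'
        "Would a TikTok viewer keep watching, or would they swipe?\n"
        "Answer as: SCORE: <1-10> | REASON: <one line>"
    )


def parse_reply(reply: str) -> tuple[int, str]:
    """Extract the score and one-line justification from the reply."""
    match = re.search(r"SCORE:\s*(\d+)\s*\|\s*REASON:\s*(.+)", reply)
    if not match:
        raise ValueError(f"unparseable reply: {reply!r}")
    score = max(1, min(10, int(match.group(1))))  # clamp to the 1-10 scale
    return score, match.group(2).strip()


def hook_strength(words: list[tuple[float, str]], window_start: float,
                  ask_llm: Callable[[str], str]) -> tuple[int, str]:
    """Score a window's hook; ask_llm is any function mapping prompt to reply."""
    return parse_reply(ask_llm(build_prompt(opening_text(words, window_start))))
```

Passing the model call in as `ask_llm` keeps the sketch independent of any client library and makes it easy to stub out in tests.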

That sounds suspiciously simple. The trick is the prompt. We do not ask the model “is this viral.” That is too abstract; you get reasoning that hallucinates confidence. We ask “would a viewer swipe.” That is concrete; the model can reason about it the same way a human can. Three years ago this would have required fine-tuning; with GPT-4o or Claude 4.5 it works zero-shot.

What about retention beyond second three?

Hook strength gets you in the door. After second three, retention is driven by storytelling closure — does the clip pay off the hook? We score this as a separate dimension, with a different prompt, against the full transcript window. The final rank is a weighted sum: 0.65 × hook_strength + 0.35 × closure. We tried other weightings; this one produced the highest creator-reported satisfaction in the user research.
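The weighted sum is small enough to show directly. In this sketch the weights come from the text above; the assumption that `closure` is on the same 1–10 scale as `hook_strength`, and the dictionary shape of a candidate, are illustrative choices rather than details of the production ranker.

```python
HOOK_WEIGHT = 0.65
CLOSURE_WEIGHT = 0.35


def final_rank_score(hook_strength: float, closure: float) -> float:
    """Weighted sum of the two dimensions (both assumed to be 1-10)."""
    return HOOK_WEIGHT * hook_strength + CLOSURE_WEIGHT * closure


def rank(candidates: list[dict]) -> list[dict]:
    """Order candidates best-first by the weighted sum."""
    return sorted(
        candidates,
        key=lambda c: final_rank_score(c["hook_strength"], c["closure"]),
        reverse=True,
    )
```

For example, a clip with a hook of 9 and a closure of 4 scores 0.65 × 9 + 0.35 × 4 = 7.25, which edges out a clip with a hook of 6 and a closure of 9 at 0.65 × 6 + 0.35 × 9 = 7.05. A strong hook with a weak payoff still outranks the reverse.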

Why we don’t expose the score to users

You will notice that the AutoAIClips dashboard does not show a number next to each clip. We tested it; users hated it. The score creates a false anchor — creators look at the “87” next to a clip and start second-guessing whether it’s worth posting if a different clip scores 92. In reality, the difference between clip-rank-1 and clip-rank-3 is usually within the noise of our model.

What matters is the ordering: our top 10 are reliably better than the next 10. The specific rank within the top 10 is mostly noise. Showing the ordering (clip #1 first, clip #2 second, etc.) without showing the underlying scores gives creators the signal they actually need without the false precision.
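One way to enforce this at the API boundary is to drop the score before anything reaches the dashboard. The sketch below assumes each scored clip is a dictionary with hypothetical `clip_id` and `score` keys; the real payload shape may differ.

```python
def to_dashboard(scored_clips: list[dict], top_n: int = 10) -> list[dict]:
    """Return the top clips as position + id only, with no numeric score."""
    ordered = sorted(scored_clips, key=lambda c: c["score"], reverse=True)
    return [
        {"position": position, "clip_id": clip["clip_id"]}
        for position, clip in enumerate(ordered[:top_n], start=1)
    ]
```

Because the score never leaves the ranker, there is no number on the client for a creator to anchor on.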

The takeaway for creators

When you evaluate AI clippers, ignore the magic score. Watch the top three clips the tool produces. If those three clips have strong three-second openers — a question, a contrarian claim, a hooky one-liner, a visible reaction — the ranker is working. If they meander into the topic, the ranker is doing something else, and that something else is probably brittle.

Better clipping does not come from more features. It comes from picking the right signal and committing to it. We picked hook strength. The platforms’ rankers will keep changing, but as long as their primary input is three-second retention, our ranker stays aligned. That is the bet behind the product.
