Ask nylo for your top 10 keywords and the agent routes the question to Performance Rankings. The ranking itself isn't an opinion. It's fixed math: same input, same output, every time. Beta-Binomial smoothing, a composite score, credible intervals, a disjoint Top-N and Worst-N. No LLM in the analysis path. You could recompute it by hand and get the same list.

That's the point. The agent is the interface. The statistics are the answer.

Ranked, not guessed

Most "AI" ranking tools hand your data to a language model and ask it to pick winners. The output looks confident and changes every time you run it. You can't audit it, you can't reproduce it, and you can't explain to a client why keyword #4 dropped to #9 overnight.

Performance Rankings runs a fixed statistical procedure with a fixed seed.

Reproducible. The same data always produces the same ranking, whether you run it today or next month.
Auditable. Every score decomposes into its inputs: rate, volume, value. No black box. You can trace exactly why something ranks where it does.
Deterministic by design. This isn't a setting we turned on. The method is built to be reproducible: that's why it's the analysis layer and the LLM is only the messenger.

An agency stopped sending clients LLM-generated "top performer" lists after two clients noticed the rankings changed between meetings with no new data. They switched to Performance Rankings and the question disappeared: the same week always ranks the same way.

Low-volume rows can't win by luck

A keyword with one impression and one click has a 100% click-through rate. Sort naively and it tops your list, and you scale budget into noise.

Performance Rankings applies Beta-Binomial shrinkage. Every rate is pulled toward the pool average in proportion to how little data backs it. A keyword with thousands of impressions keeps its measured rate. A keyword with five impressions stays close to the pool average until it earns the volume to prove itself.

Small samples regress. A lucky 1/1 keyword lands near the pool mean, not at the top.
An impressions floor. Rows below a minimum volume are held out of both lists entirely, neither a winner nor a waster until there's enough signal.
Real winners survive. Genuine top performers have the volume to hold their rate after shrinkage. The noise washes out; the signal stays.
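The shrinkage and the floor fit in a few lines. This is a minimal sketch, not nylo's implementation: it assumes the prior mean is the pooled CTR and that the prior strength and impressions floor are fixed constants (`PRIOR_STRENGTH` and `MIN_IMPRESSIONS` are hypothetical values chosen for illustration).

```python
from dataclasses import dataclass


@dataclass
class Row:
    name: str
    impressions: int
    clicks: int


# Hypothetical parameters for illustration; nylo's actual values are not stated here.
PRIOR_STRENGTH = 100.0   # prior weight, in pseudo-impressions
MIN_IMPRESSIONS = 50     # rows below this are held out of both lists


def pool_rate(rows: list[Row]) -> float:
    """Pooled CTR across all rows, used as the prior mean."""
    total_impressions = sum(r.impressions for r in rows)
    if total_impressions == 0:
        return 0.0
    return sum(r.clicks for r in rows) / total_impressions


def shrunk_rate(row: Row, prior_mean: float, strength: float = PRIOR_STRENGTH) -> float:
    """Posterior mean of a Beta(strength*mu, strength*(1-mu)) prior updated with
    the row's clicks (successes) out of its impressions (trials)."""
    alpha = strength * prior_mean + row.clicks
    beta = strength * (1 - prior_mean) + (row.impressions - row.clicks)
    return alpha / (alpha + beta)


def eligible(rows: list[Row], floor: int = MIN_IMPRESSIONS) -> list[Row]:
    """Apply the impressions floor: thin rows are neither winners nor wasters."""
    return [r for r in rows if r.impressions >= floor]


rows = [
    Row("lucky", impressions=1, clicks=1),            # raw CTR 100%
    Row("steady", impressions=5000, clicks=400),      # raw CTR 8%
    Row("pool-filler", impressions=20000, clicks=400),
]
mu = pool_rate(rows)                 # about 0.032
print(shrunk_rate(rows[0], mu))      # about 0.042: the lucky 1/1 lands near the pool mean
print(shrunk_rate(rows[1], mu))      # about 0.079: real volume keeps its measured rate
print([r.name for r in eligible(rows)])  # ['steady', 'pool-filler']
```

Because the prior adds `PRIOR_STRENGTH` pseudo-impressions to every row, its pull is large for a row with one impression and negligible for a row with thousands, which is the "in proportion to how little data backs it" behaviour described above.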

One score, three signals

A winner isn't just a high rate. It's a high rate, on real volume, driving real value. Performance Rankings blends all three into one composite score with weights you can see.

Rate: smoothed CTR and conversion rate, after shrinkage.
Volume: how many conversions it actually drove. A great rate on three conversions isn't a top performer.
Value: revenue, when you have it, so a high-ROAS line beats a high-volume-but-cheap one.

The weights are explicit and interpretable: each one is a visible share of the score, not a hidden coefficient. Tune them to how your business defines a winner.
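One way to build such a score is to normalise each signal across the pool and take a weighted sum whose weights are shares that add up to 1. The sketch below assumes min-max normalisation and the hypothetical weights 0.4 / 0.35 / 0.25; neither is nylo's published configuration. When revenue is missing, it drops the value signal and renormalises the remaining weights so they still sum to 1.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Signals:
    name: str
    rate: float               # smoothed rate, already shrunk
    conversions: float        # volume
    revenue: Optional[float]  # value; None when revenue isn't tracked


# Hypothetical weights: each is a visible share of the final score.
WEIGHTS = {"rate": 0.4, "volume": 0.35, "value": 0.25}


def _minmax(values: list[float]) -> list[float]:
    """Rescale values to [0, 1] across the pool; a constant column maps to 0.5."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def composite_scores(rows: list[Signals], weights: dict[str, float] = WEIGHTS) -> dict[str, float]:
    has_value = all(r.revenue is not None for r in rows)
    active = {k: w for k, w in weights.items() if k != "value" or has_value}
    total = sum(active.values())
    shares = {k: w / total for k, w in active.items()}  # renormalised to sum to 1

    normalised = {
        "rate": _minmax([r.rate for r in rows]),
        "volume": _minmax([r.conversions for r in rows]),
    }
    if has_value:
        normalised["value"] = _minmax([r.revenue for r in rows if r.revenue is not None])

    return {
        r.name: sum(shares[k] * normalised[k][i] for k in shares)
        for i, r in enumerate(rows)
    }


scores = composite_scores([
    Signals("niche", rate=0.09, conversions=3, revenue=150.0),
    Signals("workhorse", rate=0.06, conversions=120, revenue=4800.0),
    Signals("cheap", rate=0.05, conversions=200, revenue=1000.0),
])
print(max(scores, key=scores.get))  # workhorse
```

Every score is a plain sum of `share * normalised signal`, so it decomposes back into its rate, volume, and value contributions.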

Definitely top vs. probably top

Two keywords can have the same score for very different reasons: one on a year of stable data, one on a noisy fortnight. Performance Rankings tells them apart.

Credible intervals. Each top score comes with a range, not just a point. A tight interval means the rank is solid; a wide one means it could move.
Stability flag. Every ranked row is marked stable or "may flip", based on hundreds of resamples that test whether its position holds.
Act with confidence. Scale the stable winners. Keep watching the ones that might flip next week before you commit budget.
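Both the interval and the flag can come from the same resampling loop. This sketch makes several assumptions that are not from the source: it ranks on a single shrunk rate rather than the full composite score, it resamples by drawing from each row's Beta posterior with `random.Random.betavariate`, and `PRIOR_MEAN`, `N_DRAWS`, `STABLE_SHARE`, and `SEED` are hypothetical values. The fixed seed is what makes repeated runs identical.

```python
import random
from dataclasses import dataclass


@dataclass
class Row:
    name: str
    trials: int     # e.g. impressions
    successes: int  # e.g. clicks


# Hypothetical parameters for illustration only.
PRIOR_MEAN, PRIOR_STRENGTH = 0.03, 100.0
N_DRAWS = 500        # "hundreds" of resamples
STABLE_SHARE = 0.8   # a row must hold its rank in at least 80% of draws
SEED = 42            # fixed seed: identical output on every run


def posterior(row: Row) -> tuple[float, float]:
    """Beta posterior parameters after shrinking toward the prior."""
    a = PRIOR_STRENGTH * PRIOR_MEAN + row.successes
    b = PRIOR_STRENGTH * (1 - PRIOR_MEAN) + row.trials - row.successes
    return a, b


def rank_with_uncertainty(rows: list[Row]) -> list[dict]:
    rng = random.Random(SEED)
    params = [posterior(r) for r in rows]
    point = [a / (a + b) for a, b in params]
    base_order = sorted(range(len(rows)), key=lambda i: -point[i])
    base_rank = {i: pos for pos, i in enumerate(base_order)}

    samples: list[list[float]] = [[] for _ in rows]
    held = [0] * len(rows)
    for _ in range(N_DRAWS):
        draw = [rng.betavariate(a, b) for a, b in params]
        order = sorted(range(len(rows)), key=lambda i: -draw[i])
        for pos, i in enumerate(order):
            samples[i].append(draw[i])
            if pos == base_rank[i]:
                held[i] += 1  # this row kept its point-estimate rank in this draw

    results = []
    for i in base_order:
        s = sorted(samples[i])
        interval = (s[int(0.05 * N_DRAWS)], s[int(0.95 * N_DRAWS) - 1])  # ~90% interval
        results.append({
            "name": rows[i].name,
            "rank": base_rank[i] + 1,
            "score": point[i],
            "interval_90": interval,
            "stability": "stable" if held[i] / N_DRAWS >= STABLE_SHARE else "may flip",
        })
    return results


rows = [
    Row("year-of-data", trials=50000, successes=4000),
    Row("noisy-fortnight", trials=60, successes=5),
    Row("steady-mid", trials=20000, successes=1000),
]
for r in rank_with_uncertainty(rows):
    print(r["rank"], r["name"], r["stability"])
```

Here "noisy-fortnight" edges out "steady-mid" on its point estimate, but its wide interval means the two swap places in many draws, so both are marked "may flip".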

A brand noticed that its #2 creative carried a "may flip" stability flag. They waited one more week instead of doubling spend, and the creative dropped to #7 as the early numbers regressed. The flag headed off a five-figure budget shift.

Every row gets an action

A ranking you have to interpret is a report. A ranking that tells you what to do is a decision. Every row carries one of five labels:

Scale: a stable top performer with the volume to back it. Put more budget here.
Test more: ranks well but under-tested or unstable. Give it room before you commit.
Pause: a confirmed waster with enough data to be sure. Stop spending.
Monitor: looks like a waster but the data is thin. Watch, don't cut yet.
Keep: eligible, but neither a winner nor a waster. Leave it running.
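The five labels can be read as a small decision table over list membership, stability, and volume. This is a sketch under stated assumptions: the cut-offs `MIN_SCALE_CONVERSIONS` and `MIN_PAUSE_IMPRESSIONS` and the exact inputs are hypothetical and only show how the labels separate.

```python
from enum import Enum


class Action(Enum):
    SCALE = "Scale"
    TEST_MORE = "Test more"
    PAUSE = "Pause"
    MONITOR = "Monitor"
    KEEP = "Keep"


# Hypothetical thresholds for illustration only.
MIN_SCALE_CONVERSIONS = 30    # volume needed to back a "Scale" call
MIN_PAUSE_IMPRESSIONS = 2000  # data needed to be sure a waster is a waster


def action_for(in_top: bool, in_worst: bool, stable: bool,
               conversions: int, impressions: int) -> Action:
    """Map an eligible row's ranking outcome to one of the five labels."""
    if in_top and in_worst:
        raise ValueError("Top-N and Worst-N must be disjoint")
    if in_top:
        backed = stable and conversions >= MIN_SCALE_CONVERSIONS
        return Action.SCALE if backed else Action.TEST_MORE
    if in_worst:
        confirmed = stable and impressions >= MIN_PAUSE_IMPRESSIONS
        return Action.PAUSE if confirmed else Action.MONITOR
    return Action.KEEP


print(action_for(in_top=True, in_worst=False, stable=False,
                 conversions=80, impressions=9000).value)  # Test more
```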

Winners and losers never overlap

Top-N and Worst-N are mathematically disjoint: by construction, nothing can appear on both lists. A row is a winner, a waster, or neither. No contradictory signals.
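Disjointness follows from the construction order: pick the Top-N first, then draw the Worst-N only from what remains. A minimal sketch, assuming each eligible row already has a single score (the function name `top_and_worst` is illustrative):

```python
def top_and_worst(scores: dict[str, float], n: int) -> tuple[list[str], list[str]]:
    """Return (Top-N, Worst-N); the second list is drawn only from rows outside the first."""
    # Sort best-first; ties are broken by name so the order is deterministic.
    ranked = sorted(scores, key=lambda k: (-scores[k], k))
    top = ranked[:n]
    remaining = ranked[n:]
    worst = remaining[::-1][:n]  # lowest scores first
    return top, worst


top, worst = top_and_worst({"a": 0.9, "b": 0.8, "c": 0.5, "d": 0.2, "e": 0.1}, n=3)
print(top, worst)  # ['a', 'b', 'c'] ['e', 'd']
```

With a small pool the Worst-N list simply comes up short (two rows here instead of three) rather than reusing a winner.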

And the method checks its own inputs. If your pool mixes things that shouldn't be ranked together (branded and generic keywords, campaigns with cost but zero conversions), Performance Rankings emits a pool-health warning instead of handing you a confident ranking built on incoherent data.
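A pool-health check runs before any ranking. The sketch below covers the two cases named above. It assumes each row already carries a `branded` flag and uses a hypothetical rule for the second case: warn when at least `ZERO_CONVERSION_SPEND_SHARE` of the pool's spend sits on rows with zero conversions. Both the flag and the threshold are illustrative, not nylo's actual checks.

```python
from dataclasses import dataclass


@dataclass
class PoolRow:
    name: str
    branded: bool  # hypothetical flag; how branded terms are identified is out of scope
    cost: float
    conversions: int


# Hypothetical threshold for illustration only.
ZERO_CONVERSION_SPEND_SHARE = 0.5


def pool_health_warnings(rows: list[PoolRow]) -> list[str]:
    """Return human-readable warnings; an empty list means the pool looks coherent."""
    warnings = []
    if len({r.branded for r in rows}) > 1:
        warnings.append("pool mixes branded and generic rows; rank them separately")

    total_cost = sum(r.cost for r in rows)
    zero_conv_cost = sum(r.cost for r in rows if r.cost > 0 and r.conversions == 0)
    if total_cost > 0 and zero_conv_cost / total_cost >= ZERO_CONVERSION_SPEND_SHARE:
        warnings.append(
            f"{zero_conv_cost / total_cost:.0%} of spend has zero conversions; "
            "conversion-based scores may not be comparable"
        )
    return warnings


mixed = [
    PoolRow("brand term", branded=True, cost=500.0, conversions=40),
    PoolRow("generic a", branded=False, cost=800.0, conversions=12),
    PoolRow("generic b", branded=False, cost=300.0, conversions=0),
]
print(pool_health_warnings(mixed))
```

A caller would check this list first and surface the warnings instead of a ranking when it is non-empty.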

For agencies

Give clients rankings you can defend. When a client asks why a keyword moved, you have an answer (the score, its inputs, the credible interval), not "the AI decided." The same week always ranks the same way, so your Monday report and your Thursday call tell the same story.

For brands

Prioritize budget on evidence, not noise. Scale the stable winners, hold the ones that might flip, pause the confirmed wasters, with the math to back every call. No lucky low-volume keywords pulling spend, no rankings that quietly change between reviews.

The same data always ranks the same way