Key Takeaways
- Six changes that moved the metric — Canonical answer blocks, llms.txt, schema expansion with author + dateModified, sourced statistics and quotations rotated into existing prose, entity links to Wikidata, and a deliberate brand co-occurrence push.
- The one that did nothing — Fluency optimisation. Rewriting smooth prose to be smoother produced no measurable citation lift on any engine. Time better spent elsewhere.
- Time to first signal — Two weeks. The first citation deltas showed up in the Otterly dashboard fourteen days after the llms.txt commit, on the two highest-traffic priority pages.
- The buyer-side lesson — I under-bought on query-set depth. I bought a $129/mo plan when I needed a $499/mo one. The cheaper plan made me look at 30 queries when the actual citation surface for my pages was closer to 150.
Why I ran this as a deliberate experiment on my own site
Most of what is currently written about generative engine optimisation falls into two buckets. Vendor decks selling a tool. Theoretical posts speculating about what the engines might reward. There is very little dated, first-person, specific-numbers content from people who actually shipped the changes on a site they own and watched what happened.
I run prommer.net as both a personal site and a working surface for the kind of CTO advisory work I do day to day. It is a good test subject. The content is technical enough that an AI engine has a reason to cite it, the surface is small enough that I can ship interventions in days rather than quarters, and I own every part of the stack, so there is nobody to wait on. I decided in late March 2026 to treat it as a controlled environment for GEO and to publish what worked and what did not, with the numbers attached.
This piece is the case-study companion to the tactical spoke at wetheflywheel.com/en/ai-search/how-to-rank-in-chatgpt/. That piece is the framework. This one is the field test.
The baseline I started from
Before the experiment, prommer.net was already a reasonably well-optimised personal site by 2024 standards. Server-rendered, fast, indexed in Google with a small but real organic presence. From a 2026 AI-search perspective, the baseline was almost entirely unprepared. Twenty-seven priority pages. No canonical answer blocks. Four FAQPage schemas, no Article schema with explicit author or dateModified, no llms.txt, no entity-linking discipline, and no measurement loop for AI citations.
The starting baseline mattered because it gave me a clean slate. None of the interventions below were already running. Everything below was a deliberate change, made in a sequenced order, with the citation-rate effect measurable against the prior state.
| Axis | Before | After (week 6) |
|---|---|---|
| Pages with canonical answer block at top | 0 / 27 | 27 / 27 |
| Pages shipping FAQPage schema | 4 / 27 | 22 / 27 |
| Pages with explicit author + Wikidata sameAs | 0 / 27 | 27 / 27 |
| llms.txt at root | absent | present, weekly auto-update |
| AI-referred sessions per week (GA4) | untracked | tracked, baseline established |
| Otterly citation share on 30-query test set | baseline only | measured weekly |
Change one: a canonical answer block on every priority page
The first move was structural and cheap. Two to four sentences at the top of every priority page, written in the exact language a user would use to phrase the underlying question. Not a marketing hook. Not an introduction. A direct answer to whatever the page exists to answer.
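To make the pattern concrete, here is a minimal sketch of how the injection step could be automated on a static site. The slug, block text, and CSS class are hypothetical illustrations, not the live prommer.net content.

```python
# inject_answer_block.py - minimal sketch of the answer-block deploy step.
# Slug, block text, and CSS class are hypothetical, not the live site content.
from pathlib import Path

ANSWER_BLOCKS = {
    # page slug -> two-to-four-sentence direct answer
    "fractional-cto": (
        '<p class="answer-block">A fractional CTO gives a company senior '
        "technical leadership for a fraction of a full-time role. Engagements "
        "typically run one to three days a week and cover architecture, "
        "hiring, and vendor decisions.</p>"
    ),
}

def inject(page_path: Path, slug: str) -> None:
    """Place the answer block immediately after the first closing h1 tag."""
    html = page_path.read_text(encoding="utf-8")
    block = ANSWER_BLOCKS[slug]
    if block in html:
        return  # idempotent: reruns on every deploy without duplicating
    head, sep, tail = html.partition("</h1>")
    page_path.write_text(head + sep + "\n" + block + tail, encoding="utf-8")
```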
I ran this across all 27 priority pages in a single week, working from a list ranked by historical search traffic. Some of the answer blocks I wrote in five minutes; a few took longer because the page existed but I had never written down its primary thesis in two sentences. That part of the exercise was useful on its own, regardless of what the engines did with it.
The first measurable signal in the Otterly dashboard came eighteen days after the last block was deployed. ChatGPT started quoting one of the blocks verbatim in response to a query about CTO consulting practices. The same week, Perplexity began citing a different page with the exact phrasing from its answer block as the displayed snippet.
If I had to pick one intervention for a team starting from zero, I would pick the canonical answer block. It is cheap, structural, and has the highest measured extraction rate of any change I made.
Change two: llms.txt at the root
Second intervention, ranked by ease of execution. I wrote an llms.txt for the site root, listed the 27 priority pages grouped by topic, gave each a one-line description, and pushed a build step that regenerates the file every time a priority page changes. The whole change took about ninety minutes including the build automation.
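A minimal sketch of that build step follows, assuming a JSON inventory of priority pages; the file names and fields are illustrative, and the output follows the llms.txt convention of an H1 title, a blockquote summary, and topic sections with one line per page.

```python
# build_llms_txt.py - regenerates llms.txt whenever a priority page changes.
# The inventory file and its fields are assumptions for this sketch.
import json
from collections import defaultdict
from pathlib import Path

def build(pages: list[dict]) -> str:
    by_topic = defaultdict(list)
    for page in pages:
        by_topic[page["topic"]].append(page)
    lines = ["# prommer.net", "", "> CTO advisory notes and field experiments.", ""]
    for topic in sorted(by_topic):
        lines.append(f"## {topic}")
        for p in by_topic[topic]:
            lines.append(f"- [{p['title']}]({p['url']}): {p['summary']}")
        lines.append("")
    return "\n".join(lines)

if __name__ == "__main__":
    inventory = json.loads(Path("priority_pages.json").read_text(encoding="utf-8"))
    Path("public/llms.txt").write_text(build(inventory), encoding="utf-8")
```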
CTAIO Labs had already published a thirty-day citation experiment on llms.txt across three test sites with per-engine deltas measured weekly. My intent was to replicate the pattern on a fourth site and see whether the deltas held in a different content domain.
They held. Otterly registered citation lift on two of three high-traffic priority pages inside fourteen days. The third page already had unusually strong direct visibility in ChatGPT and the dashboard registered effectively no change, which is consistent with diminishing returns when the page is already a default citation source.
Change three: schema.org expansion with author and dateModified
Third intervention, more invasive. I expanded the Schema.org JSON-LD on every priority page to include a full Article block with headline, description, image, datePublished, dateModified, author (as a Person with a sameAs to my LinkedIn and to the relevant Wikidata identifier where one existed), and publisher. I added FAQPage markup to the eighteen priority pages with substantial FAQs. I added HowTo to the four pages that genuinely document a procedure.
The author and dateModified fields are the two underrated parts of this. Author is a credibility signal that ChatGPT in particular seems to weight; dateModified is a recency signal that Perplexity weights heavily. Both were missing across the entire site before this change. Adding them was a deploy step, not a content change.
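For concreteness, this is roughly the shape of the Article block, rendered here as a Python sketch that emits the JSON-LD; the LinkedIn URL and Wikidata QID are placeholders, not the live identifiers.

```python
# article_schema.py - renders the Article JSON-LD described above.
# The sameAs URLs are placeholders, not the live identifiers.
import json

def article_jsonld(page: dict) -> str:
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": page["headline"],
        "description": page["description"],
        "image": page["image"],
        "datePublished": page["published"],  # ISO 8601, e.g. "2026-03-28"
        "dateModified": page["modified"],    # refreshed on every deploy
        "author": {
            "@type": "Person",
            "name": "Thomas Prommer",
            "sameAs": [
                "https://www.linkedin.com/in/example",  # placeholder
                "https://www.wikidata.org/wiki/Q0",     # placeholder QID
            ],
        },
        "publisher": {"@type": "Organization", "name": "prommer.net"},
    }
    return f'<script type="application/ld+json">{json.dumps(data)}</script>'
```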
Per the field evidence at wetheflywheel.com/en/ai-search/schema-for-agentic-search/, generative engines do parse the raw JSON-LD even when Google does not surface rich results from it. I saw that in the data. Perplexity dropped two stale pages from its citation pool the week after the deploy, presumably because the dateModified gave it a sharper recency view than the inferred publication date. ChatGPT started citing the author byline directly in response to a name-related query.
Change four: rotating sourced statistics and authoritative quotations into existing prose
The Aggarwal et al. paper that named GEO measured the citation lift from nine content strategies on a 10,000-query benchmark. The two highest-lift interventions were adding authoritative quotations and adding sourced statistics. Both of these are content changes rather than infrastructure changes, which made them slower for me but also the most generalisable.
I rewrote the body of every priority page to include at least one quoted authoritative source with attribution and at least one sourced statistic. Where the page already had quotations or statistics, I kept them. Where they were missing, I added them. In a few cases this changed the structure of the argument; in most cases it was a one-paragraph insertion that strengthened existing prose.
The measurable effect was distributed across pages. No single page got a 40 percent lift; several pages got a smaller lift that summed to a meaningful aggregate. Otterly's citation share on my 30-prompt test set rose roughly 12 percent across the engines over four weeks, which I attribute primarily to this change because it was the largest content intervention.
Change five: entity linking to Wikidata where canonicals exist
Fifth intervention. When a page mentioned a person, company, product, or technical concept that had a canonical reference on Wikipedia or Wikidata, I added the link with rel="external". Where the entity did not have a canonical reference, I left it alone rather than fabricate a citation target.
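A minimal sketch of the linking pass, assuming a hand-maintained entity map (the mapping shown is illustrative); a production version should parse the DOM rather than regexing raw HTML.

```python
# entity_links.py - wraps the first mention of each known entity in a link.
# The entity map is illustrative; unmapped entities are left untouched.
import re

ENTITIES = {
    "Wikidata": "https://www.wikidata.org/wiki/Q2013",
}

def link_entities(html: str) -> str:
    for name, url in ENTITIES.items():
        # naive first-mention substitution on raw HTML; the lookarounds skip
        # mentions that directly follow a ">" (already inside an anchor)
        pattern = rf"(?<![\w>]){re.escape(name)}(?![\w<])"
        anchor = f'<a href="{url}" rel="external">{name}</a>'
        html = re.sub(pattern, anchor, html, count=1)
    return html
```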
The effect of this one is the hardest to measure directly. It feeds the rerank step rather than the extraction step, which means the impact shows up as your pages getting chosen as a citation candidate more often rather than as the engine pulling specific text from them. Otterly's citation rate moved up incrementally over the next three weeks, but I would not bet a salary on attributing that movement specifically to entity linking versus the other changes still landing in the same window.
Change six: brand co-occurrence push
Sixth and slowest. Brand co-occurrence is the work of getting your name mentioned alongside the right topics in third-party sources the engines consider authoritative. The mechanism is well documented and the timeline is months. The pace is set by other people's publishing schedules.
Over six weeks I appeared on three podcasts in the CTO and applied-AI category, contributed a quoted comment to two trade publications, and arranged for a colleague to cite a prommer.net piece in a Substack post with strong topical authority. None of this would move the metric on its own. The aggregate signal is what compounds.
Twelve weeks in, I started to see this in the data: ChatGPT began surfacing prommer.net as a default citation for queries about fractional CTO work even when the prompt did not include my name. That is the brand-co-occurrence effect arriving, and it is the most durable of the six changes because no one else can easily replicate it once it is in place.
What did not work
Two interventions produced essentially no measurable lift. The keyword-density result is consistent with the Aggarwal et al. benchmark, where keyword stuffing was a bottom performer; the fluency result is the one place my data diverges from the paper, which did report lift for fluency optimisation.
Fluency optimisation. I ran two priority pages through a careful rewrite for smoother prose, tighter transitions, and shorter sentences. The pages read better. The citation rate did not move. I went back and looked at the underlying numbers a month later in case I had missed a delayed effect. Nothing. Time was better spent elsewhere.
Keyword-density tuning. I adjusted target-keyword density on three pages using a 2018-era SEO heuristic. The pages did not move in Google search and did not move in any AI engine. The intervention was a relic of an earlier optimisation paradigm and the engines do not care.
The mistake I made on the buyer side
I bought the wrong visibility tracker tier. I picked the Otterly $129/mo plan because it was the most affordable serious option and because my query set was small. By week three I had grown the query set to 150 prompts and I was already over the plan limit. I upgraded once. Then I realised the underlying gap was that I had under-bought on the dimension that actually mattered for my programme (query depth), and over-bought elsewhere (multi-domain coverage I did not need).
Map your query inventory before talking to vendor sales. It is the one prep step that changes the procurement conversation, and the lesson generalises across the visibility-tracker category. Detailed buyer-side notes are in the WTF guide at wetheflywheel.com/en/guides/best-llm-visibility-tools-2026/.
What I would do differently next time
Three changes, in order of regret.
- Start the measurement loop a week earlier. I started shipping changes before the baseline was established, which means a week of early-signal data is missing from my dashboard. The interventions were so cheap that I wanted to move fast; I should have spent the week capturing the baseline anyway.
- 301 stale URLs rather than deleting them. ChatGPT continued citing one deleted URL for four weeks after I removed it. A redirect to the closest current page would have routed the citation traffic somewhere instead of into the void; a minimal redirect map is sketched after this list.
- Buy the right tracker tier the first time. The $129/mo to $499/mo upgrade cost me three weeks of partial data and a procurement re-conversation. The right answer was to start at the higher tier and downgrade if the query set stayed small.
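In practice the redirect fix is no more than a mapping checked before the 404 handler. A minimal sketch, with hypothetical paths:

```python
# redirects.py - 301 map for retired URLs, checked before the 404 handler.
# The paths are hypothetical; the point is that deletions become redirects.
STALE_TO_CURRENT = {
    "/blog/2024-stale-llm-post": "/blog/2026-current-llm-post",
}

def handle(path: str) -> tuple[int, str] | None:
    """Return (status, location) for the server layer, or None to fall through."""
    target = STALE_TO_CURRENT.get(path)
    if target:
        return 301, target  # permanent redirect keeps existing citations routable
    return None
```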
Where this goes next
The six changes above are not the end state. They are the foundation. Three things I am running now and will report on separately:
- The brand co-occurrence work continues. The lag is months, so the next measurement window opens in October 2026.
- A schema-citation A/B test on twelve schema variations, in collaboration with CTAIO Labs. Methodology already public; results land mid-cycle.
- A small experiment on canonical answer blocks written in three different voices (terse, explanatory, conversational) to see whether the extraction rate is sensitive to register or only to structural placement.
If you are running a similar experiment on a site you own, I would like to compare notes. The category is small and the field evidence base is thin. Mail me at thomas at prommer dot net.
Frequently asked questions
Why did you write this as a public case study?
Two reasons. First, almost everything written about GEO is either vendor marketing or theoretical. There is a shortage of first-person, dated, specific-numbers content from people who actually shipped the changes. Second, my own credibility on this topic is partly a function of getting cited by the engines I am writing about, so the case study and the source material are the same artefact.
What tools did you use to measure this?
I started with Otterly on the $129/mo plan to monitor citations across ChatGPT, Perplexity, and Gemini. I added a GA4 channel grouping for chatgpt.com, perplexity.ai, and gemini.google.com to catch referred sessions. For the first two months I also pulled raw responses from ChatGPT and Perplexity by hand on a fixed 30-prompt query set, weekly, to validate the dashboard against my own observations.
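If you want to script rather than hand-pull that validation pass, a minimal sketch follows. It queries the OpenAI API, which approximates but does not reproduce what the ChatGPT product surfaces; the prompts and model name are illustrative.

```python
# weekly_pull.py - scripted version of the manual 30-prompt validation pass.
# Sketch only: the OpenAI API approximates, but does not reproduce, what the
# ChatGPT product shows; prompts and model name are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
PROMPTS = [
    "What does a fractional CTO actually do?",
    "How should a startup structure CTO advisory work?",
]
DOMAIN = "prommer.net"

def run() -> dict[str, bool]:
    hits = {}
    for prompt in PROMPTS:
        resp = client.chat.completions.create(
            model="gpt-4o",  # swap for whichever model the dashboard tracks
            messages=[{"role": "user", "content": prompt}],
        )
        text = resp.choices[0].message.content or ""
        hits[prompt] = DOMAIN in text
    return hits

if __name__ == "__main__":
    for prompt, cited in run().items():
        print(f"{'CITED' if cited else '-----'}  {prompt}")
```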
How long until you saw measurable changes?
Two weeks for first signal on llms.txt. Three weeks for the schema and canonical-answer-block changes to start showing in the Otterly trend lines. The brand co-occurrence work is still compounding; that one has a months-long lag and is the only one of the six changes I am still actively running.
What did you skip and why?
I skipped paid placement in AI-aggregator directories. The pitches looked like 2010-era directory-link buying with new marketing. I also skipped any content that required a paywall or auth gate; agents do not get past those, and the cost of building an authoritative open layer outweighed the small revenue lift from gating.
How does this connect to the broader WTF playbook?
This is the first-person counterpart to the tactical spoke at wetheflywheel.com/en/ai-search/how-to-rank-in-chatgpt/. That piece is the framework; this is the field test. The category pillar on Generative Engine Optimization sits one layer up.
Did anything backfire?
One mild regret. I removed a 2024-dated blog post that had stale information, expecting Perplexity to drop it from the citation pool faster. Perplexity dropped it the next refresh, which was good. ChatGPT continued citing the deleted URL for another four weeks with a now-broken link, which was less good. Lesson for next time: 301 the stale URL to the closest current page rather than deleting it cold.
What is the single highest-leverage intervention for someone starting out?
The canonical answer block. Two to four sentences at the top of every priority page, written in the way a user would actually phrase the question. It is the cheapest change in the playbook and the one with the highest measured extraction rate across engines. Do that first; everything else compounds on top of it.