What technical changes does an agency need to make on a client's website for GEO?

Q: What technical changes does GEO require on a client website?

Six: valid JSON-LD schema on every page, an llms.txt index at the domain root, clean canonical URLs, server-rendered or pre-rendered body content, AI crawlers explicitly allowed in robots.txt, and meta descriptions in the 150-160 character band.

Q: What schema.org types matter most for GEO?

Article or WebPage and Organization on every page. Layer FAQPage on Q&A sections, HowTo on step guides, Product with Offer on pricing pages, and BreadcrumbList on navigation. Validate at validator.schema.org before shipping.

Q: Do I need to allow GPTBot and PerplexityBot in robots.txt?

Yes, if you want those engines to cite the site. Many client sites block AI crawlers by default. Explicit Allow rules for GPTBot, PerplexityBot, ClaudeBot, Google-Extended, and CCBot are the standard setup.
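One way to audit an inherited robots.txt is to run it through Python's standard-library parser and list which of these bots end up blocked. This is a minimal sketch: the function name and the default URL are illustrative, and the standard-library parser may not match every crawler's own rule-matching exactly.

```python
from urllib.robotparser import RobotFileParser

# The user-agent tokens named in the answer above.
AI_CRAWLERS = ["GPTBot", "PerplexityBot", "ClaudeBot", "Google-Extended", "CCBot"]


def blocked_ai_crawlers(robots_txt, url="https://example.com/"):
    """Return the AI crawler tokens that robots_txt forbids from fetching url.

    robots_txt is the file's text; url is a placeholder page to test against.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in AI_CRAWLERS if not parser.can_fetch(bot, url)]


if __name__ == "__main__":
    sample = "User-agent: *\nDisallow: /\n\nUser-agent: GPTBot\nAllow: /\n"
    print(blocked_ai_crawlers(sample))
```

Feed it the text of the client's live robots.txt; an empty list means none of the five tokens is blocked for that URL.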

Q: Does client-side JavaScript hurt GEO?

Yes, if the main content only appears after JS runs. Most AI crawlers do not reliably execute JavaScript. Use server-side rendering or static pre-rendering for any page meant to be cited.
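A rough way to check this on a saved response body is to strip scripts and styles and look for a sentence you know belongs to the article. The sketch below assumes you already have the raw HTML as a string (for example from a curl request); the class and function names are illustrative, and the whitespace handling is deliberately crude.

```python
from html.parser import HTMLParser


class _VisibleText(HTMLParser):
    """Collect text nodes, ignoring anything inside <script> or <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def phrase_in_initial_html(html, phrase):
    """True if phrase appears in the markup's text without running any JS."""
    parser = _VisibleText()
    parser.feed(html)
    parser.close()
    # Collapse whitespace so line breaks in the markup do not hide a match.
    text = " ".join(" ".join(parser.chunks).split()).lower()
    return " ".join(phrase.split()).lower() in text
```

If this returns False for a sentence from the middle of the article, the page is probably shipping an empty shell and rendering the body client-side.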

Question

What technical changes does an agency need to make on a client's website for GEO?

Maciej Grabek · Accepted Answer

Six technical changes do most of the work: add valid JSON-LD schema to every page, publish an llms.txt index, fix canonical URLs, make sure content is server-rendered (or pre-rendered) rather than client-only JavaScript, allow AI crawlers in robots.txt, and write meta descriptions in the 150-160 character band. None of these are GEO-specific tricks. They are standard technical-SEO hygiene that AI engines happen to reward heavily.

1. Add valid JSON-LD schema to every page

AI engines parse schema.org structured data to understand what a page is and what entity it describes. The minimum every page needs is an Article or WebPage object plus an Organization object identifying the publisher.

Layer additional types where they fit: FAQPage on Q&A sections, HowTo on step-by-step guides, Product with Offer on pricing pages, BreadcrumbList on navigation. Validate every page with the Schema Markup Validator at validator.schema.org before shipping.

2. Publish an llms.txt at the domain root

llms.txt is an emerging convention (proposed by Jeremy Howard, adopted by Anthropic and several others) that gives AI crawlers a clean index of your site's most important content. It lives at https://example.com/llms.txt and looks like:

    # Example Co

    > B2B email platform for ecommerce brands.

    ## Docs

    - [Getting started](https://example.com/docs/start): How to set up your first campaign
    - [API reference](https://example.com/docs/api): REST API endpoints and auth

    ## Guides

    - [Email deliverability](https://example.com/guides/deliverability): Improving inbox placement

Not every engine reads it yet. It costs an hour to ship and is a leading-indicator signal: if engines start using it heavily, sites that already have one are positioned to benefit first.

3. Fix canonical URLs

Every page needs a single canonical URL declared with a <link rel="canonical"> tag in the page head. Trailing-slash variants, query-parameter variants, and tracking-parameter variants (utm_ and the like) should all resolve to the same canonical.
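A minimal sketch of that normalisation rule, useful for spotting URL variants in a crawl export that should share one canonical. The list of tracking parameters and the "no trailing slash" policy are assumptions for illustration; use whatever convention the client's site already follows.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed tracking parameters; extend to match the client's analytics setup.
TRACKING_PREFIXES = ("utm_",)
TRACKING_KEYS = {"gclid", "fbclid"}


def canonicalize(url):
    """Map URL variants onto one canonical form.

    Lowercases scheme and host, drops tracking parameters and fragments,
    and removes the trailing slash (except on the site root).
    """
    parts = urlsplit(url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(TRACKING_PREFIXES)
        and key.lower() not in TRACKING_KEYS
    ]
    path = parts.path.rstrip("/") or "/"
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, urlencode(query), "")
    )
```

Grouping crawled URLs by `canonicalize(url)` shows which pages currently exist under several addresses and therefore need a canonical tag or a redirect.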
AI engines deduplicate citations by canonical, so a page with three URL variants splits its citation share three ways.

4. Server-render or pre-render the body content

If the page's main text only appears after JavaScript runs, most AI crawlers will not see it. ChatGPT's browse tool and Perplexity's crawler execute some JS but not reliably. The fix is server-side rendering (Next.js, Remix, Rails, Django, ASP.NET MVC) or static pre-rendering (Astro, Hugo, Eleventy) for any page meant to be cited.

Test it:

    curl -A "Mozilla/5.0 (compatible; PerplexityBot/1.0)" https://example.com/page

and check whether the article text is in the response body. If you see an empty shell with a loading spinner, fix it.

5. Allow AI crawlers in robots.txt

Many sites quietly block AI crawlers by default. The fix is explicit allow rules for the bots you want citing you:

    User-agent: GPTBot
    Allow: /

    User-agent: PerplexityBot
    Allow: /

    User-agent: ClaudeBot
    Allow: /

    User-agent: Google-Extended
    Allow: /

    User-agent: CCBot
    Allow: /

Some clients have legal reasons to block specific bots (training-data concerns). Have that conversation explicitly rather than blocking by default. GPTBot is OpenAI's training-data crawler; ChatGPT search fetches pages with a separate agent, OAI-SearchBot, which needs its own Allow rule if ChatGPT citations are the goal. Google-Extended controls Gemini training. Each can be allowed independently of the others.

6. Write meta descriptions in the 150-160 character band

AI engines often lift the meta description as the snippet they cite. Too short (under 120 characters) and engines fall back to extracting an arbitrary sentence; too long (over 165) and it gets truncated mid-thought. Aim for one complete sentence, 150-160 characters, that answers the page's core question and ends with a specific hook.

Anti-patterns to remove on inherited client sites

Lazy-loaded article body. The first 500 words of a blog post should be in the initial HTML, not loaded on scroll.

Author bylines that link to nowhere. AI engines treat anonymous content as lower-trust. Real Person schema with sameAs linking to a LinkedIn or X profile lifts citation rates noticeably in our scans.

Pop-ups blocking content. Cookie banners and email modals that hide the article body during initial render confuse crawlers. Keep them out of the critical content area.

Duplicate pages. A page reachable at both /product and /products/main splits authority. Pick one canonical and 301 the other.
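
To make the Person-with-sameAs point concrete, together with the Article plus Organization minimum from step 1, here is a hedged sketch that builds the JSON-LD script tag for a blog post. The function name, its parameters, and all sample values are hypothetical; the property names (headline, datePublished, author, publisher, sameAs) are standard schema.org vocabulary, but the output should still be run through the validator before shipping.

```python
import json


def article_jsonld(headline, url, date_published, author_name,
                   author_profiles, publisher_name, publisher_url):
    """Return a <script> tag holding Article JSON-LD with a linked author."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "url": url,
        "datePublished": date_published,
        "author": {
            "@type": "Person",
            "name": author_name,
            # Profile URLs that tie the byline to a real, checkable person.
            "sameAs": list(author_profiles),
        },
        "publisher": {
            "@type": "Organization",
            "name": publisher_name,
            "url": publisher_url,
        },
    }
    # Escape "</" so a value can never close the script element early.
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


if __name__ == "__main__":
    print(article_jsonld(
        headline="Improving inbox placement",
        url="https://example.com/guides/deliverability",
        date_published="2024-01-15",
        author_name="Jane Doe",  # placeholder author
        author_profiles=["https://www.linkedin.com/in/janedoe-example"],
        publisher_name="Example Co",
        publisher_url="https://example.com",
    ))
```

Rendering this into the page head from the CMS template, rather than pasting it by hand, keeps the author and publisher data consistent across every post.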