Webvy.co

Claude Code for Programmatic SEO: Build a Controlled Engine, Not Just Pages

Claude Code can help teams scale programmatic SEO, but scale without control is a liability. Google’s March 2026 spam update reinforced the need to take spam policies seriously. For programmatic SEO, the relevant risk is scaled content abuse: publishing many pages mainly to manipulate rankings rather than help people.

Some third-party analyses after the 2026 updates reported severe losses for sites built around large volumes of weak AI-generated content. The safer takeaway is simpler: Google’s policy risk is volume without value, not AI use by itself. The production method was never the violation.

A controlled engine is built for that reality. It validates page patterns before keyword research, drafts only inside approved constraints, runs deterministic QA before AI grading, treats indexation as a publishing governor, and manages pages as a portfolio. One worked example runs through every stage below.

What separates a pSEO engine from a page generator

Page generation creates URLs from a keyword list and a template. A pSEO engine decides which URLs deserve to exist, when they are allowed to publish, and what happens to them afterward. That difference is the whole article, so it gets stated once, here, and then demonstrated.

The page lifecycle

Every page moves through a state machine:

page lifecycle
IDEA → RESEARCHED → SLOT_APPROVED → DRAFTED → QA_PASSED → GRADED
  → PR_OPEN → MERGED → SUBMITTED → INDEXED → PERFORMING
                              ↘ NOT_INDEXED (diagnose, don't republish)
PERFORMING ↘ WEAK → { REFRESH | CONSOLIDATE | NOINDEX | RETIRE }

Every transition writes an event row: page_id, from_state, to_state, evidence reference, actor, timestamp. That table earns its keep twice:

  • Audit trail: every page in production can show how it got there and who approved each gate.
  • Tuning dataset: after a few months it tells you which template families clear grading on the first pass, which fail QA on similarity, and which states pages get stuck in.
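A minimal sketch of that state machine and its event log in Python. The transition table is transcribed from the diagram above. `Page`, `TransitionEvent`, and the evidence paths are illustrative names, not a prescribed schema, and a real engine would insert each event into a database table instead of keeping an in-memory list.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Legal transitions, transcribed from the lifecycle diagram. A page that fails QA
# stays in DRAFTED while it is revised; NOT_INDEXED has no exit because it routes
# to diagnosis rather than republishing.
TRANSITIONS: dict[str, set[str]] = {
    "IDEA": {"RESEARCHED"},
    "RESEARCHED": {"SLOT_APPROVED"},
    "SLOT_APPROVED": {"DRAFTED"},
    "DRAFTED": {"QA_PASSED"},
    "QA_PASSED": {"GRADED"},
    "GRADED": {"PR_OPEN"},
    "PR_OPEN": {"MERGED"},
    "MERGED": {"SUBMITTED"},
    "SUBMITTED": {"INDEXED", "NOT_INDEXED"},
    "INDEXED": {"PERFORMING"},
    "PERFORMING": {"WEAK"},
    "WEAK": {"REFRESH", "CONSOLIDATE", "NOINDEX", "RETIRE"},
}


@dataclass(frozen=True)
class TransitionEvent:
    page_id: str
    from_state: str
    to_state: str
    evidence_ref: str  # pointer to a QA report, scorecard, or API response
    actor: str         # human reviewer or job name
    timestamp: datetime


@dataclass
class Page:
    page_id: str
    state: str = "IDEA"
    events: list[TransitionEvent] = field(default_factory=list)

    def transition(self, to_state: str, evidence_ref: str, actor: str) -> TransitionEvent:
        """Move to to_state if the diagram allows it, recording one event row."""
        if to_state not in TRANSITIONS.get(self.state, set()):
            raise ValueError(f"{self.page_id}: illegal transition {self.state} -> {to_state}")
        event = TransitionEvent(self.page_id, self.state, to_state, evidence_ref,
                                actor, datetime.now(timezone.utc))
        self.events.append(event)  # stand-in for an INSERT into the event table
        self.state = to_state
        return event
```

Rejecting illegal transitions in code, rather than by convention, is what makes the event table trustworthy as an audit trail: a page cannot reach MERGED without the intermediate rows existing.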

Where Claude Code fits

Claude Code fits this system because it is an agentic tool, not a writing interface. It reads the repository, edits files, runs commands, reads the error output, and revises against it. That makes it the right execution layer for templates, schema logic, internal-link rules, QA scripts, and PR workflows.

But Claude Code is the execution layer. The strategy is the control system around it.

The policy backdrop

The policy context shapes every gate that follows:

  • Scaled content abuse entered Google’s spam policies in March 2024, defined as creating many pages primarily to manipulate rankings with little or no value for users, whatever the production method.
  • The March 2026 spam update added no new policies; it was a SpamBrain enforcement improvement, and the fastest confirmed spam-update rollout in the Search Status Dashboard’s history, completed in under a day.

The lesson for pSEO teams: enforcement of existing policy is tightening, so the existing policy is the framework to build against.

Stage 0: the Data Moat Gate (before any keyword research)

Most pSEO failures are decided before the first keyword is pulled. If the template family has no injectable, hard-to-replicate data, every downstream gate is just slowing down the production of pages that should not exist.

Before research, each proposed family files a one-page Differentiation Charter. Here is the worked example we will carry through the article:

Family: AI SEO tool comparison pages (/compare/{tool-a}-vs-{tool-b})

Reader decision: a content lead choosing between two tools for a specific team size and workflow.

Unique data source: a maintained internal dataset: verified pricing pulled monthly, feature attributes tested hands-on, publishing and API limits confirmed against vendor docs, and a short editorial verdict written per tool by someone who used it.

Injection schema (required fields per page): pricing_tiers[], feature_matrix{}, api_limits{}, integrations[], editorial_verdict_a, editorial_verdict_b, last_verified_date.

Kill criterion: if T+28 indexation for the family drops below 50%, or two consecutive cohorts show flat impressions, the family is frozen and the weakest third is consolidated.
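The kill criterion is simple enough to evaluate mechanically. A sketch, assuming each cohort's impression growth is already computed as a relative change versus the previous cohort. The charter does not define “flat,” so the ±5 percent tolerance below is a hypothetical placeholder to tune.

```python
FLAT_TOLERANCE = 0.05  # hypothetical: a cohort within ±5% of the previous one is "flat"


def family_should_freeze(
    t28_indexation: float,
    cohort_impression_growth: list[float],
    min_indexation: float = 0.50,
    flat_tolerance: float = FLAT_TOLERANCE,
) -> bool:
    """Apply the comparison family's charter kill criterion.

    t28_indexation: share of the family's URLs indexed at T+28 (0.0-1.0).
    cohort_impression_growth: each cohort's 28-day impression change versus the
    previous cohort (0.12 means +12%), oldest first.
    """
    if t28_indexation < min_indexation:
        return True
    last_two = cohort_impression_growth[-2:]
    # Two consecutive flat cohorts also trigger the freeze.
    return len(last_two) == 2 and all(abs(g) <= flat_tolerance for g in last_two)
```

A `True` result freezes the family and hands its weakest third to consolidation.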

The gate question is not “can we rank for this?” It is the keyword removal test: delete the target keyword from the page. Is it still obviously useful to that content lead? A page that only works because the keyword sits in the title, URL, and H1 is a doorway with good manners.

One adversarial trick worth stealing: before approving a charter, run a separate Claude session whose only job is to argue the family violates Google’s doorway and scaled-content definitions, citing the policy text. If the charter survives a motivated attack, approve it. If it does not, you just saved an entire family’s worth of build cost.

The charter is not documentation; it becomes code. The injection schema is enforced by a validator that blocks drafting when fields are missing, and a downstream QA check fails any rendered page where injected data falls below a token-share threshold. Template-only pages become mechanically unbuildable.
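A sketch of the drafting-time validator, assuming the data payload arrives as a parsed JSON object. The field list is the comparison family's injection schema, and the 60-day freshness limit comes from the slot record in Stage 2. The function name and error strings are illustrative. The token-share check runs later, in QA.

```python
from datetime import date

# Injection schema from the comparison-family charter: field -> expected type.
REQUIRED_FIELDS: dict[str, type] = {
    "pricing_tiers": list,
    "feature_matrix": dict,
    "api_limits": dict,
    "integrations": list,
    "editorial_verdict_a": str,
    "editorial_verdict_b": str,
    "last_verified_date": str,  # ISO date, e.g. "2026-05-20"
}


def validate_payload(payload: dict, today: date, max_age_days: int = 60) -> list[str]:
    """Return blocking errors; an empty list means drafting may start."""
    errors = []
    for name, expected in REQUIRED_FIELDS.items():
        value = payload.get(name)
        # Missing, wrongly typed, and empty values all block drafting.
        if not isinstance(value, expected) or not value:
            errors.append(f"missing or empty: {name}")
    raw_date = payload.get("last_verified_date")
    if isinstance(raw_date, str) and raw_date:
        try:
            verified = date.fromisoformat(raw_date)
        except ValueError:
            errors.append(f"unparseable last_verified_date: {raw_date!r}")
        else:
            if (today - verified).days > max_age_days:
                errors.append(f"stale: last verified {verified}, limit {max_age_days} days")
    return errors
```

Because the validator returns reasons rather than a bare boolean, a blocked job can write them straight into the slot's event row as evidence.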

Stage 1: research that produces evidence, not drafts

Research turns an idea into an evidence-backed candidate. For the comparison family:

  1. Seed and expand. Seeds like “ToolA vs ToolB,” “ToolA alternatives,” “best AI SEO tools for agencies.” Expand via keyword-ideas, related-keywords, and autocomplete endpoints (DataForSEO or equivalent; never scrape Google directly, since machine-generated queries to Search are themselves a spam-policy violation).
  2. Size and classify. Batch volume lookups, then intent classification. The comparison family only accepts commercial-investigation intent above a probability threshold. An informational query like “what is programmatic SEO” matching the family’s keywords still gets rejected: wrong intent, wrong template.
  3. Sample the SERP; don’t pull it for everything. SERP API calls are the expensive ones. Pull full SERPs for cluster representatives (the top 10 to 20 percent by priority), not every variant. Record result types: if the dominant results are listicles, a head-to-head comparison page is fighting the format. If an AI Overview plus answer box fully satisfies the query, deprioritize it, because there is no click left to win.
  4. Cluster with a hard rule: one cluster, one canonical page. “ToolA vs ToolB,” “ToolB vs ToolA,” and “ToolA or ToolB” are one page. Collisions with existing live URLs get auto-flagged as cannibalization risks before anyone drafts anything.

The output is a candidate record (cluster, intent, SERP features, proposed canonical URL, priority score, cannibalization flags), not a draft. IDEA → RESEARCHED.
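Step 4's one-cluster-one-page rule can be enforced with an order-independent key. A sketch, assuming comparison queries follow an “X vs Y” or “X or Y” pattern and that live URLs use the /compare/{a}-vs-{b} scheme. Real query sets need a richer parser (tool-name aliases, misspellings), and the intent filter from step 2 would run before this.

```python
import re
from collections import defaultdict
from typing import Optional

PAIR = re.compile(r"\s*(.+?)\s+(?:vs\.?|versus|or)\s+(.+?)\s*")


def comparison_key(query: str) -> Optional[str]:
    """'ToolA vs ToolB', 'toolb vs toola', and 'ToolA or ToolB' share one key."""
    match = PAIR.fullmatch(query.lower())
    if not match:
        return None  # not a head-to-head query; belongs to another family
    a, b = sorted((match.group(1), match.group(2)))
    return f"{a}-vs-{b}".replace(" ", "-")


def build_candidates(queries: list[str], live_urls: set[str]) -> list[dict]:
    """Group queries into clusters, one proposed canonical URL per cluster."""
    clusters: dict[str, list[str]] = defaultdict(list)
    for query in queries:
        key = comparison_key(query)
        if key:
            clusters[key].append(query)
    return [
        {
            "proposed_canonical_url": f"/compare/{key}",
            "cluster": members,
            # An existing live URL for the same pair is a cannibalization risk.
            "cannibalization_flag": f"/compare/{key}" in live_urls,
        }
        for key, members in clusters.items()
    ]
```

Sorting the two tool names is the whole trick: word order in the query no longer matters, so reversed and “or” variants cannot spawn their own pages.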

Stage 2: slot approval, where the capacity check lives

A slot is an approved publishing position: canonical URL, target cluster, template family, parent path, internal-link plan, indexation rule, required data fields, approver. A worked slot record:

slot record
canonical_url:   /compare/toolname-a-vs-toolname-b
cluster:         [toola vs toolb, toolb vs toola, toola or toolb] (vol 880/mo)
family:          ai-seo-comparison
parent:          /compare/
inbound_links:   /best-ai-seo-tools/ , /compare/ , /reviews/toolname-a/   (≥3 required)
outbound_links:  2 sibling comparisons + parent category               (≥2 required)
indexation_rule: index
required_data:   all 7 injection-schema fields, last_verified ≤ 60 days
approved_by:     [reviewer], 2026-06-02

The slot gate blocks: duplicate clusters, parameterized or faceted URLs in the indexable set, orphans (no inbound link plan), pages whose data payload is incomplete or stale, and, critically, any slot that exceeds this week’s publishing capacity (covered in Stage 7). Slots queue; they do not overflow.

The rule is absolute: no approved slot, no draft. Slot approval is also the cheapest human gate in the whole engine. A reviewer can clear a batch of 20 slot records in minutes, versus reviewing 20 finished pages.
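A sketch of the slot gate as a pure function that returns blocking reasons. It assumes the payload validator described in Stage 0 has already run and that weekly capacity comes from the Stage 7 governor. Treating any query string as a parameterized or faceted URL is a simplification. The field names loosely mirror the slot record above.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass
class SlotRequest:
    canonical_url: str
    cluster_key: str
    inbound_links: list[str]
    outbound_links: list[str]
    data_errors: list[str]  # from the payload validator; empty means complete and fresh


def slot_gate(slot: SlotRequest, approved_cluster_keys: set[str],
              slots_used_this_week: int, weekly_capacity: int) -> list[str]:
    """Return reasons to block; an empty list means the slot may be approved."""
    reasons = []
    if slot.cluster_key in approved_cluster_keys:
        reasons.append("duplicate cluster")
    if urlsplit(slot.canonical_url).query:
        reasons.append("parameterized or faceted URL in the indexable set")
    if len(slot.inbound_links) < 3:
        reasons.append("orphan risk: fewer than 3 planned inbound links")
    if len(slot.outbound_links) < 2:
        reasons.append("fewer than 2 planned outbound links")
    if slot.data_errors:
        reasons.append("data payload incomplete or stale")
    if slots_used_this_week >= weekly_capacity:
        reasons.append("over capacity: queue for a later week")
    return reasons
```

Returning every failing reason, not just the first, lets a reviewer clear or reject a batch of slot records in one pass.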

Stage 3: drafting with Claude Code, under constraints that actually bind

Claude Code receives a bounded job, not an ambition:

  • one approved slot
  • a pre-validated data payload
  • template rules
  • repository write boundaries
  • a QA command
  • an exit condition

Three mechanics matter more than the prompt.

Evidence-surfacing completion conditions

Claude Code’s /goal runs the agent across turns until a separate small evaluator model judges the condition met. But that evaluator only sees the conversation transcript; it cannot run commands or read files itself. So a condition like “the page is publishable” is unverifiable and will either clear prematurely or loop forever. The condition must require evidence to be printed:

completion condition
/goal `npm run qa -- --page toolname-a-vs-toolname-b` has been executed and
its full JSON report, printed to the terminal, shows "status":"PASS" with
zero failed checks, and the file exists at
content/compare/toolname-a-vs-toolname-b.mdx

The QA script is the oracle. The goal condition just forces its output into the transcript where the evaluator can see it.

Allowed-diff enforcement via hooks

Hooks run at fixed lifecycle points, which is exactly what you want: rules that fire every time, not rules the model chooses to remember.

  • PreToolUse hook: blocks any file write outside content/compare/, public/data/compare/, and the sitemap manifest. Auth, infra, deployment, and config paths are untouchable, deterministically, not because the model was asked nicely.
  • PostToolUse hook: auto-runs the formatter and frontmatter validation after every content write.
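A sketch of the PreToolUse guard as a Python script. It assumes Claude Code passes the hook a JSON object on stdin whose `tool_input.file_path` holds the target of Write and Edit calls, that the object's `cwd` is the repository root, and that exiting with code 2 blocks the call and shows stderr to Claude. Verify each assumption against the current hooks documentation. The sitemap manifest filename and the script's own `--hook` flag are hypothetical. The script would be registered under PreToolUse with a matcher for the file-writing tools.

```python
#!/usr/bin/env python3
"""PreToolUse guard sketch: block file writes outside the pSEO content allowlist."""
import json
import os
import sys

ALLOWED_PREFIXES = ("content/compare/", "public/data/compare/")
ALLOWED_FILES = {"public/sitemap-manifest.json"}  # hypothetical manifest path


def is_allowed(file_path: str, repo_root: str) -> bool:
    """True if file_path resolves to an allowlisted location inside repo_root."""
    absolute = os.path.abspath(os.path.join(repo_root, file_path))
    rel = os.path.relpath(absolute, repo_root).replace(os.sep, "/")
    if rel == ".." or rel.startswith("../"):
        return False  # paths escaping the repository are always blocked
    return rel in ALLOWED_FILES or rel.startswith(ALLOWED_PREFIXES)


def run_hook(raw_event: str) -> int:
    """Return the exit code for one hook invocation."""
    event = json.loads(raw_event)
    file_path = event.get("tool_input", {}).get("file_path")
    if not file_path:
        return 0  # no file target; nothing to check
    repo_root = event.get("cwd") or os.getcwd()
    if is_allowed(file_path, repo_root):
        return 0
    print(f"Blocked: {file_path} is outside the allowed write paths.", file=sys.stderr)
    return 2


# Only act as a hook when invoked with the (hypothetical) --hook flag.
if __name__ == "__main__" and "--hook" in sys.argv:
    sys.exit(run_hook(sys.stdin.read()))
```

An allowlist rather than a blocklist is the safer default: a new infra directory is blocked on day one without anyone remembering to add it.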

Worktree isolation for parallel batches

Each batch runs in its own git worktree, so concurrent jobs never collide on files. One caveat: worktrees in the same repo share Claude’s project auto-memory. That is useful for shared conventions, but batch-specific state belongs in your metrics store, not in memory.

Model routing and unit economics

Route work by what it actually demands:

  • Premium frontier model: orchestrates the batch (plans, delegates, triages failures, manages the multi-hour run) and handles escalations. It should not draft page #847.
  • Mid-tier model: drafts standard pages.
  • Small model: bulk classification and routine grading.

Two cost rules follow:

  1. Hold model and effort constant within a batch. Prompt-cache reads bill at roughly 10 percent of standard input rates, and switching model or effort mid-run invalidates the cache.
  2. Track cost per published-and-indexed page, not cost per draft. Drafts that never index are pure loss, which is also why the capacity governor in Stage 7 feeds back into economics.
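The arithmetic behind both rules, as a sketch. Prices are hypothetical parameters. The 0.10 cache-read multiplier reflects the roughly-10-percent figure above, and cache-write premiums are ignored for brevity.

```python
def batch_cost_usd(
    uncached_input_tokens: int,
    cache_read_tokens: int,
    output_tokens: int,
    input_usd_per_mtok: float,
    output_usd_per_mtok: float,
    cache_read_multiplier: float = 0.10,
) -> float:
    """Approximate model spend for one batch (cache writes ignored)."""
    effective_input = uncached_input_tokens + cache_read_tokens * cache_read_multiplier
    return (effective_input * input_usd_per_mtok + output_tokens * output_usd_per_mtok) / 1_000_000


def cost_per_indexed_page(batch_cost: float, pages_indexed: int) -> float:
    """Spend divided by pages that actually indexed; infinite if none did."""
    return batch_cost / pages_indexed if pages_indexed else float("inf")
```

With hypothetical prices of $3 input and $15 output per million tokens, a batch with 2M uncached input, 8M cache-read, and 0.5M output tokens costs $15.90 (2.8M effective input tokens). Across 20 drafts that is $0.795 per draft. If only 12 pages index, the number that matters is $1.325 per indexed page.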

Stage 4: deterministic QA before any AI opinion

The first serious quality gate should be one nobody can argue with. One command, one machine-readable report:

Check | Rule
Build / types / lint | zero errors
Frontmatter + injected data | valid against the family's schema; last_verified within the freshness window
Injection ratio | at or above a threshold share of rendered tokens derived from injected data; the mechanical anti-doorway gate
Similarity | MinHash/SimHash similarity vs. all live pages and same-batch siblings at or below the family maximum
Structured data | only family-approved types, valid JSON-LD (for generic pSEO that means Article or Product where truly applicable, plus BreadcrumbList)
Links | at least 3 planned inbound, at least 2 outbound, zero broken
Render + performance | preview renders clean; Lighthouse assertions on a per-batch sample
Diff policy | re-verified in CI, independent of the hook layer
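One way to compute the injection ratio, sketched under an assumption the article does not spell out: the renderer emits its output as segments tagged by origin (“template” or “data”). Tokens here are word-like runs matched by a regex, not model tokens, and the 0.40 floor matches the sample report that follows.

```python
import re
from typing import Iterable, Optional

WORD = re.compile(r"\w+")


def injection_ratio(segments: Iterable[tuple[str, str]]) -> float:
    """Share of rendered word tokens whose origin is injected data.

    segments: (origin, text) pairs, where origin is "template" or "data".
    """
    data_tokens = total_tokens = 0
    for origin, text in segments:
        count = len(WORD.findall(text))
        total_tokens += count
        if origin == "data":
            data_tokens += count
    return data_tokens / total_tokens if total_tokens else 0.0


def injection_check(segments: Iterable[tuple[str, str]], minimum: float = 0.40) -> Optional[dict]:
    """Return a failed-check entry shaped like the QA report, or None on pass."""
    value = injection_ratio(segments)
    if value >= minimum:
        return None
    return {"check": "injection_ratio", "value": round(value, 2), "min": minimum}
```

Tagging segments at render time is what makes the ratio cheap to measure. Reverse-engineering which words came from data after rendering would be far less reliable.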

A sample failing report. This is what Claude Code revises against:

json
{
  "page": "toolname-a-vs-toolname-b",
  "status": "FAIL",
  "checks_failed": [
    {"check": "injection_ratio", "value": 0.31, "min": 0.40},
    {"check": "similarity", "vs": "/compare/toolname-a-vs-toolname-c", "score": 0.87, "max": 0.80}
  ],
  "checks_passed": 11
}

That second failure is the one to respect. When two sibling comparisons are 87 percent similar, the fix is usually not “rewrite one.” It is evidence the data moat is too thin to support both pages, and the right move may be consolidating the cluster. Deterministic QA does not just catch bad pages; it surfaces bad patterns early.
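A compact MinHash sketch of the similarity check in pure Python. It estimates the Jaccard similarity between the 5-word shingle sets of two pages. The shingle size, 128 hash functions, and 0.80 maximum are illustrative. At scale you would index signatures with locality-sensitive hashing (for example, with the datasketch library) instead of comparing every pair.

```python
import hashlib
import re


def shingles(text: str, k: int = 5) -> set[str]:
    """Overlapping k-word windows over the page's lowercased words."""
    words = re.findall(r"\w+", text.lower())
    return {" ".join(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}


def minhash(items: set[str], num_perm: int = 128) -> list[int]:
    """One minimum per seeded hash function; each seed is a BLAKE2b salt."""
    signature = []
    for seed in range(num_perm):
        salt = seed.to_bytes(8, "little")
        signature.append(min(
            int.from_bytes(hashlib.blake2b(s.encode(), digest_size=8, salt=salt).digest(), "big")
            for s in items
        ))
    return signature


def estimated_similarity(a: str, b: str, num_perm: int = 128) -> float:
    """Fraction of matching signature slots, an estimate of Jaccard similarity."""
    sig_a, sig_b = minhash(shingles(a), num_perm), minhash(shingles(b), num_perm)
    return sum(x == y for x, y in zip(sig_a, sig_b)) / num_perm


def similarity_failures(page: str, others: dict[str, str], maximum: float = 0.80) -> list[dict]:
    """Failed-check entries shaped like the QA report, one per too-similar page."""
    failures = []
    for url, text in others.items():
        score = estimated_similarity(page, text)
        if score > maximum:
            failures.append({"check": "similarity", "vs": url, "score": round(score, 2), "max": maximum})
    return failures
```

Running this against same-batch siblings as well as live pages is what lets a thin data moat show up as a cluster of near-duplicates before anything ships.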

Three consecutive QA failures on a slot escalate it to the stronger model. Failure again parks the slot and logs the pattern. The rule: scripts check what can be measured; AI graders judge only what requires interpretation, and only after the scripts pass.
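The escalation rule fits in a small per-slot state holder. This sketch adopts one reading of “failure again”: the first QA failure after escalation parks the slot. The action names it returns are illustrative.

```python
from dataclasses import dataclass


@dataclass
class SlotQAHistory:
    consecutive_failures: int = 0
    escalated: bool = False
    parked: bool = False

    def record(self, qa_passed: bool) -> str:
        """Update after one QA run and return the next action for the slot."""
        if self.parked:
            return "parked"
        if qa_passed:
            self.consecutive_failures = 0
            return "advance"  # DRAFTED -> QA_PASSED
        self.consecutive_failures += 1
        if self.escalated:
            self.parked = True
            return "park"  # stop retrying and log the failure pattern
        if self.consecutive_failures >= 3:
            self.escalated = True
            return "escalate"  # retry on the stronger model
        return "revise"  # same model revises against the QA report
```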

Stage 5: a verifier that never meets the maker

The page builder must not be the page judge. The grader runs in a separate context and receives only artifacts: the rendered page, the target cluster and intent label, the SERP snapshot from research, the QA report, and three sibling pages from the same family. It never sees the drafting prompt, the revision history, or the maker’s reasoning. Self-approval is the quietest failure mode in agentic pipelines, and context isolation is the cure.

The rubric, scored 1 to 5, pass requires every item at 4 or above:

  1. Originality. Does the page contain information absent from the SERP snapshot?
  2. Keyword removal test. Is the value self-evident without the target phrase?
  3. Intent fit. Does it serve the dominant intent the SERP sample showed?
  4. Decision value. Can the charter’s named reader decide faster after reading?
  5. Sibling distinctiveness. Sampled against 3 family siblings, does a reader learn something new here?

A worked scorecard for our example page: originality 4 (verified pricing plus tested API limits not present in any SERP result), keyword removal 5, intent fit 4, decision value 4, sibling distinctiveness 3. Fail. The verdict notes the feature matrix is near-identical to the ToolA-vs-ToolC page. Routed back with instructions to lead with the two genuinely differentiating attributes, not the shared matrix. That is the rubric doing portfolio work, not copyediting.
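The context isolation and the pass rule are both easy to make mechanical. A sketch with hypothetical artifact names: the grader receives only an allowlisted subset of the batch artifacts, and the verdict requires every rubric item at 4 or above.

```python
RUBRIC = ("originality", "keyword_removal", "intent_fit", "decision_value", "sibling_distinctiveness")

# Only these artifacts may reach the grader; drafting prompts and revision history never do.
GRADER_INPUTS = {"rendered_page", "cluster", "intent_label", "serp_snapshot", "qa_report", "siblings"}


def grader_context(artifacts: dict) -> dict:
    """Drop everything the maker produced except the allowlisted artifacts."""
    return {name: value for name, value in artifacts.items() if name in GRADER_INPUTS}


def verdict(scores: dict[str, int], pass_mark: int = 4) -> tuple[bool, list[str]]:
    """Pass only if every rubric item scores at least pass_mark (1-5 scale)."""
    missing = [c for c in RUBRIC if c not in scores]
    if missing:
        raise ValueError(f"unscored criteria: {missing}")
    below = [c for c in RUBRIC if scores[c] < pass_mark]
    return (not below, below)
```

Applied to the worked scorecard, `verdict({"originality": 4, "keyword_removal": 5, "intent_fit": 4, "decision_value": 4, "sibling_distinctiveness": 3})` returns `(False, ["sibling_distinctiveness"])`. One item below the bar fails the page, however strong the others are.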

One discipline that keeps graders honest over months: maintain a golden set of roughly 20 hand-labeled pages (clear passes and clear fails) and re-grade it weekly. If grader-human agreement drifts below your threshold, fix the rubric or the grader model before any new pages pass. An uncalibrated grader degrades into a rubber stamp without anyone noticing.
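Grader-human agreement on the golden set reduces to a pass/fail comparison. In this sketch, the 0.90 threshold is a placeholder for “your threshold.”

```python
def golden_agreement(human: dict[str, bool], grader: dict[str, bool]) -> float:
    """Share of golden pages where the grader's pass/fail matches the human label."""
    if not human:
        raise ValueError("golden set is empty")
    missing = human.keys() - grader.keys()
    if missing:
        raise ValueError(f"grader skipped golden pages: {sorted(missing)}")
    return sum(grader[page] == label for page, label in human.items()) / len(human)


def grader_calibrated(human: dict[str, bool], grader: dict[str, bool], threshold: float = 0.90) -> bool:
    """Gate new passes on the grader still agreeing with humans."""
    return golden_agreement(human, grader) >= threshold
```

Requiring the grader to label every golden page matters: a grader that silently skips hard cases would otherwise look better calibrated than it is.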

Stage 6: PR and human review, concentrated where it pays

Pages reach production through pull requests only. Each batch PR carries the diff, the QA JSON reports, the grader scorecards, preview URLs, and the batch’s capacity accounting (slots consumed versus available). Deployment runs through a protected environment with required reviewers, so nothing ships on model judgment alone.

Human attention is deliberately concentrated rather than spread thin:

  • 100 percent review for a new template family’s first three batches
  • every page flagged YMYL, legal, or brand-sensitive
  • a random sample of routine pages (10 to 20 percent works as a starting rate)
  • every consolidation, redirect, or retirement proposed by the portfolio loop

The reviewer’s job is not copyediting. The deterministic gates and the grader already did that work. The reviewer checks what machines cannot: brand judgment, legal exposure, and whether the batch as a whole still matches the approved charter. GRADED → PR_OPEN → MERGED.
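A sketch of the review-routing rule. The page fields and flag names are assumptions, and the 15 percent sample rate sits inside the suggested 10 to 20 percent band. Seeding the random generator per batch keeps the sample reproducible for audit.

```python
import random

SENSITIVE_FLAGS = {"ymyl", "legal", "brand_sensitive"}
PORTFOLIO_ACTIONS = {"consolidate", "redirect", "retire"}


def needs_human_review(page: dict, family_batch_number: int, rng: random.Random,
                       sample_rate: float = 0.15) -> bool:
    """family_batch_number is 1-based: batches 1-3 of a new family get full review."""
    if family_batch_number <= 3:
        return True
    if SENSITIVE_FLAGS & set(page.get("flags", ())):
        return True
    if page.get("proposed_action") in PORTFOLIO_ACTIONS:
        return True
    return rng.random() < sample_rate  # random sample of routine pages
```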

Stage 7: indexation as the governor

Publishing is not the finish line; it is where Google starts answering back. The post-publish loop:

  1. Regenerate the sitemap with canonical URLs only. Update <lastmod> only on substantive change. Google uses lastmod only when it is consistently and verifiably accurate, and an engine that lies with it teaches Google to ignore it.
  2. Submit via the Sitemaps API; poll acceptance.
  3. Poll the URL Inspection API at T+3, T+7, and T+14 per new URL. Know the quota cold: 2,000 inspections per day per property, 600 per minute. That is enough for cohort sampling, not full-site monitoring at scale, so sample new cohorts rather than inspecting everything. And remember the API returns indexed state, not a live test.
  4. Route on the result:
    • Indexed, Google-selected canonical is yours: INDEXED, into performance monitoring.
    • “Crawled, currently not indexed” persisting across a cohort at T+14: treat it as a quality or demand signal first, not merely a submission problem. Common causes include duplication, canonicalization conflicts, weak internal linking, and crawl prioritization, so diagnose before resubmitting. Either way, the engine’s response is the same: route to diagnosis and decrement family capacity.
    • “Discovered, currently not indexed” persisting: a crawl-demand signal pointing to weak internal links, weak site authority for the section, or crawl waste elsewhere. Strengthen links and freeze new slots for the family until it clears. Google’s own guidance says crawl budget is a real constraint mainly for sites with 1M+ pages that change about weekly, 10k+ pages that change daily, or a large share of URLs stuck in Discovered. For everyone else, persistent non-indexing points to demand and quality.
    • Canonicalized to a different URL: Google overrode you because duplicate-consolidation evidence pointed elsewhere. Auto-open a consolidation ticket; you cannot force a canonical onto substantively duplicate pages.
  5. Do not route generic pSEO pages through the Indexing API. Google restricts it to pages with JobPosting or BroadcastEvent (embedded in a VideoObject) structured data, and case studies recommending it for bulk submission are recommending a policy violation.
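A sketch of step 4's routing for a single URL. It reads `verdict`, `coverageState`, and `googleCanonical` from the `inspectionResult.indexStatusResult` object in a URL Inspection API response. Matching coverage states by substring is an assumption about their exact wording, and the action names are hypothetical. The cohort-level “persisting at T+14” judgment would aggregate these per-URL results.

```python
def route_inspection(response: dict, submitted_url: str) -> str:
    """Map one URL Inspection API response to the engine's next action."""
    status = response.get("inspectionResult", {}).get("indexStatusResult", {})
    coverage = status.get("coverageState", "").lower()
    google_canonical = status.get("googleCanonical")

    # Checked first: Google picked another URL, so consolidate instead of fighting it.
    if google_canonical and google_canonical.rstrip("/") != submitted_url.rstrip("/"):
        return "OPEN_CONSOLIDATION_TICKET"
    if status.get("verdict") == "PASS":
        return "INDEXED"
    if "crawled" in coverage and "not indexed" in coverage:
        return "DIAGNOSE_AND_DECREMENT_FAMILY_CAPACITY"
    if "discovered" in coverage:
        return "STRENGTHEN_LINKS_AND_FREEZE_FAMILY_SLOTS"
    return "MANUAL_REVIEW"
```

Anything the rules do not recognize falls through to manual review rather than a resubmission, which matches the “diagnose, don't republish” branch of the lifecycle.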

Now the part most pSEO systems skip: indexation results set next week’s publishing capacity.

formula
capacity(week) = base_quota × indexation_factor × performance_factor + pruning_credit

Starting policy that has the right shape:

Rolling condition | Effect
T+14 indexation at or above 80% (last 100 URLs) and newest-cohort impressions rising | capacity ×1.25, capped at +25% per week
Indexation 50 to 80% | hold
Indexation below 50% | capacity ×0.5, auto-open diagnosis
Indexation below 30%, or any manual action, or site-wide impressions down 25% in 14 days | kill switch: capacity = 0; generation jobs blocked at the hook layer until a human re-enables

Start base_quota small: 10 to 25 pages per week on an established domain, 5 to 10 on a young one. An engine that runs this loop inverted (thousands of pages first, quality signals never) is exactly the pattern spam enforcement is built to catch. The governor makes “has this domain earned more pages?” an empirical question the engine answers weekly instead of a guess made once.
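The table translates into a small week-over-week function. In this sketch, rounding down is a conservative choice the table does not specify. A T+14 rate of 80 percent or more without rising impressions is treated as a hold, and the diagnosis tickets the table mentions are left out.

```python
import math


def next_capacity(
    current: int,
    t14_indexation: float,
    impressions_rising: bool,
    manual_action: bool = False,
    sitewide_impressions_change_14d: float = 0.0,
) -> int:
    """Apply the Stage 7 capacity table to last week's capacity.

    t14_indexation: rolling T+14 indexation over the last 100 URLs (0.0-1.0).
    sitewide_impressions_change_14d: relative change, e.g. -0.25 for a 25% drop.
    """
    if t14_indexation < 0.30 or manual_action or sitewide_impressions_change_14d <= -0.25:
        return 0  # kill switch: stays at 0 until a human re-enables generation
    if t14_indexation < 0.50:
        return math.floor(current * 0.5)  # halve; a diagnosis ticket is opened elsewhere
    if t14_indexation >= 0.80 and impressions_rising:
        return math.floor(current * 1.25)  # +25%, the weekly growth cap
    return current  # hold band
```

Once the kill switch fires, capacity stays at zero (half or 125 percent of zero is still zero) until a human restores it. Fed the worked example's 60 percent reading later in the article, the function returns last week's capacity unchanged.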

Stage 8: the portfolio loop, where pruning is a quota

Pages are monitored by template family and launch cohort, not just URL. A weak page in isolation is noise; a weak cohort is a verdict on the family’s data source, template, or internal linking.

Weak-page triggers

Initial heuristics; tune against your own event data after a quarter:

  • not indexed at T+14
  • CTR materially below family peers at similar average position over 28 days
  • stuck at position 8 to 20 on a meaningful cluster for 2 to 3 weeks with no movement
  • clicks and impressions decaying over 6 to 8 weeks alongside softening rank
  • near-zero impressions for 60 to 90 days while duplicating another canonical asset
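The triggers can be evaluated per page from rolling-window metrics. In this sketch, every numeric cut-off is a hypothetical starting value within or near the ranges above: CTR under half the peer rate, three flat weeks, a 30 percent click decline, and 60 days of near-zero impressions. The field names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class PageWindowStats:
    indexed_at_t14: bool
    ctr_28d: float
    peer_ctr_at_position_28d: float  # family average CTR at a similar average position
    avg_position_28d: float
    weeks_without_rank_movement: int
    clicks_change_8w: float          # relative change, e.g. -0.40
    position_change_8w: float        # positive means rank got worse
    days_near_zero_impressions: int
    duplicates_canonical_asset: bool


def weak_page_triggers(s: PageWindowStats) -> list[str]:
    """Return the names of every weak-page trigger this page fires."""
    fired = []
    if not s.indexed_at_t14:
        fired.append("not indexed at T+14")
    if s.peer_ctr_at_position_28d > 0 and s.ctr_28d < 0.5 * s.peer_ctr_at_position_28d:
        fired.append("CTR materially below family peers")
    if 8 <= s.avg_position_28d <= 20 and s.weeks_without_rank_movement >= 3:
        fired.append("stuck at positions 8-20")
    if s.clicks_change_8w <= -0.30 and s.position_change_8w > 0:
        fired.append("decaying clicks with softening rank")
    if s.days_near_zero_impressions >= 60 and s.duplicates_canonical_asset:
        fired.append("near-zero impressions while duplicating a canonical asset")
    return fired
```

A page firing any trigger moves from PERFORMING to WEAK and enters the remediation hierarchy. Aggregating the same flags by cohort is what turns page-level noise into a family-level verdict.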

Measurement windows

Always use rolling windows, never day-zero reactions. Search Analytics data arrives with a 2 to 3 day lag, and broad API queries return top rows rather than every row, so over-precise long-tail reads are a trap too.

  • 7 days: early signal.
  • 28 days: trend confidence.
  • 60 to 90 days: retirement decisions.

Remediation hierarchy

Work down this list in order. Each action is logged as an experiment with before/after windows, so over time you learn which fix actually works per family.

  1. Consolidate or redirect. The first choice when pages overlap or split signals.
  2. Refresh with new injected data. Updated pricing, new benchmark rows, a revised verdict. Never a cosmetic rewrite of the same information.
  3. Noindex. The page must stay crawlable for the directive to be seen.
  4. Retire. Remove the page and update the redirect map.

The pruning quota

This is our recommendation rather than a Google rule: in a controlled pSEO system, pruning should sit beside publishing, because weak pages create crawl, indexation, and portfolio-quality drag. Every weekly cycle carries a pruning-review quota alongside the publish quota. Net page delta (published minus consolidated or retired) belongs on the dashboard; it should trend small and positive, never large.

Worked example: closing the loop

The first comparison-family cohort of twenty pages reports:

  • T+14: 60 percent indexation; twelve pages indexed and gaining impressions.
  • T+28: five pages still in “Crawled, currently not indexed”; three canonicalized into near-siblings.

The engine’s response:

  1. Consolidate the three canonicalized pages. The similarity warnings from Stage 4 predicted exactly this.
  2. Refresh the five non-indexed pages with newly verified pricing data and stronger inbound links from the indexed winners.
  3. Hold capacity flat. The 60 percent T+14 reading sits in the hold band of the Stage 7 table.
  4. Run the kill-criterion check. The Stage 0 charter set the bar at 50 percent T+28 indexation; the family passes at 60 percent, so it survives but earns no capacity increase until rolling T+14 indexation reads at or above 80 percent with rising impressions.

That is the difference between an engine and a page factory: the system metabolized the feedback.

The five numbers on the weekly dashboard

  1. T+14 indexation rate (rolling last 100 URLs): the governor input and the earliest risk signal.
  2. Cost per published-and-indexed page, by model tier.
  3. Newest-cohort 28-day impression trend vs. the previous cohort.
  4. Grader-human agreement on the golden set.
  5. Net page delta.

If any of the five is red, the engine’s next action is diagnosis, not generation.

FAQs

How is Claude Code different from an AI writing tool for pSEO?

It operates inside the repository: reading files, running your QA command, parsing the failure report, and revising against evidence. That makes it suitable for the parts of pSEO that are actually engineering: templates, schema logic, link rules, and diff-controlled PRs, with hooks and completion conditions providing enforcement a chat interface cannot.

What's the single highest-leverage gate?

The injection-ratio check backed by a real data source. It converts 'don't publish thin pages' from an editorial aspiration into a build failure. Every other gate filters bad pages; this one makes a whole class of them impossible to produce.

How many pages should an engine publish per week?

Start at 10 to 25 on an established domain (5 to 10 on a young one) and let observed T+14 indexation move the number. Growth is capped at +25% per week; capacity is halved below 50% indexation and zeroed below 30%. Any fixed number that ignores indexation feedback is a guess wearing a spreadsheet.

Is "Crawled, currently not indexed" a technical problem?

Sometimes, but when it persists across a cohort, treat it as a quality or demand signal first. The causes Google and practitioners document include duplication, canonicalization conflicts, weak internal linking, and crawl prioritization. Diagnose the pattern; resubmitting an unchanged page asks Google the same question and usually earns the same answer.

Should I use the Indexing API to speed up pSEO indexation?

No. Google restricts it to pages with JobPosting or BroadcastEvent (embedded in a VideoObject) structured data. For everything else, indexation pressure comes from sitemaps, internal links from already-indexed pages, and pages worth indexing. Use the URL Inspection API (2,000 per day per property) to measure index state on sampled cohorts, not to force it.

What counts as a real data moat for programmatic SEO?

Data that costs effort to produce and maintain: verified pricing pulled on a schedule, hands-on feature testing, benchmarks you ran, first-party usage data, or expert verdicts. Public API data everyone has, paraphrased SERP content, and 'AI will research it' do not qualify, because competitors can replicate them in an afternoon.

Why separate the maker from the verifier?

The drafting model carries context about its own trade-offs and fixes, which biases it toward approval. A grader in a separate context, seeing only the rendered page, QA report, SERP snapshot, and sibling pages, judges evidence instead of intentions. Calibrate it weekly against a hand-labeled golden set so it never drifts into rubber-stamping.

How do I apply this engine to an existing pSEO site with thin pages?

Run the portfolio loop first, not the generation loop. Inventory existing pages by template family, pull 28-day performance and index states, and route weak pages through consolidate, refresh, noindex, or retire before publishing anything new. Pruning debt first usually improves the indexation rate that governs how fast you can publish later.

How long before I can judge whether a template family works?

Expect a verdict in roughly one to two months per cohort. T+14 indexation gives the first reliable read, 28-day windows confirm impression trends, and 60 to 90 days supports retirement decisions. Judge the cohort, not individual pages, and check it against the kill criterion written in the family's charter.

When does it make sense to bring in outside help?

When you need the control system, not just the content: charter design, capacity governance, QA and grading architecture, and Claude Code workflow engineering. That systems layer is what Webvy builds for teams running programmatic SEO at scale.

Written by
Webvy
GEO & AI Visibility

Webvy helps brands become the default source AI cites. We combine technical strategy, content engineering, and entity optimization to drive visibility across every generative search platform.

Build a controlled pSEO engine

We design the control system: charter design, capacity governance, QA architecture, and Claude Code workflow engineering for programmatic SEO at scale.
