Workflow Structured Breakdown


STAGE 1: QUERY-DRIVEN DATA ACQUISITION

Process:

  • Uses a pre-defined set of structured search queries targeting specific themes (e.g. "Unholy Carnival and Twisted Clowns").
  • Queries are passed to the Google Custom Search API to fetch top-ranked URLs.
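
The acquisition step can be sketched as follows. This is a minimal sketch, assuming the Custom Search JSON API endpoint and its `key`, `cx`, `q`, and `num` parameters. The function names, placeholder credentials, and hand-written sample response are illustrative and are not taken from the workflow's own code.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint of the Google Custom Search JSON API.
SEARCH_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_search_url(query, api_key, engine_id, num=10):
    """Build the request URL for one structured query (at most 10 results per call)."""
    params = {"key": api_key, "cx": engine_id, "q": query, "num": min(num, 10)}
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


def extract_links(response):
    """Return result URLs in ranked order from a parsed API response."""
    return [item["link"] for item in response.get("items", [])]


def fetch_top_urls(query, api_key, engine_id):
    """Run one live query; needs real credentials and network access."""
    with urlopen(build_search_url(query, api_key, engine_id)) as resp:
        return extract_links(json.load(resp))


# Offline illustration: placeholder credentials and a made-up response
# that only mimics the shape of the API's "items" list.
sample_response = {
    "items": [{"link": "https://example.com/a"}, {"link": "https://example.com/b"}]
}
sample_url = build_search_url("Unholy Carnival and Twisted Clowns", "API_KEY", "ENGINE_ID")
sample_links = extract_links(sample_response)
```

Keeping URL construction and response parsing separate from the network call lets the query set be checked offline before any API quota is spent.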

Type of Analysis:

  • SERP sampling
  • Thematic surfacing via controlled query design

SEO Relevance:

  • Bypasses the speculative phase of keyword ideation by sourcing real-world, high-ranking content.
  • Anchors all subsequent analysis in content deemed relevant by Google's ranking algorithm.

Compared to Best Practice:

  • Tools like SEMrush and Ahrefs rely on historical keyword databases and user-supplied seed terms, often tied to legacy trends or generic assumptions.
  • This workflow instead begins with live Google search results, so it is directly aligned with how search engines currently evaluate content relevance. The intended gains are freshness, context alignment, and resistance to seed-term bias.

STAGE 2: CONTENT HARVESTING AND STRUCTURAL EXTRACTION

Process:

  • Scrapes allowed URLs, extracting structured data: title tags, meta descriptions, body text, header tags, image alt attributes.
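
The element-level extraction can be sketched with the standard library alone. This is a minimal sketch under stated assumptions: the class name, the output keys, and the sample HTML are hypothetical, and the robots.txt check and HTTP fetch that precede parsing are left out.

```python
from html.parser import HTMLParser


class PageElements(HTMLParser):
    """Collect title, meta description, headings, body text, and image alt text."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.data = {"title": "", "meta_description": "", "headers": [],
                     "alt_texts": [], "body_text": []}
        self._stack = []  # open tags whose text needs special routing

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.data["meta_description"] = attrs.get("content") or ""
        elif tag == "img" and attrs.get("alt"):
            self.data["alt_texts"].append(attrs["alt"])
        elif tag == "title" or tag in self.HEADINGS or tag in self.SKIPPED:
            self._stack.append(tag)

    def handle_endtag(self, tag):
        if self._stack and self._stack[-1] == tag:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        current = self._stack[-1] if self._stack else None
        if current == "title":
            self.data["title"] += text
        elif current in self.HEADINGS:
            self.data["headers"].append(text)
        elif current not in self.SKIPPED:
            self.data["body_text"].append(text)


def extract_elements(html):
    """Parse one HTML document into the structured fields listed above."""
    parser = PageElements()
    parser.feed(html)
    parser.close()
    return parser.data


# Made-up page used only to show the output shape.
sample_html = """<html><head><title>Twisted Clowns</title>
<meta name="description" content="A guide to carnival horror."></head>
<body><h1>Unholy Carnival</h1><p>Clowns haunt the midway.</p>
<img src="c.png" alt="A grinning clown"></body></html>"""
page = extract_elements(sample_html)
```

Storing each element type in its own field is what allows later stages to compare term usage in titles, body text, and metadata separately.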

Type of Analysis:

  • Content structure analysis
  • Element-level segmentation

SEO Relevance:

  • Mirrors how Google interprets page structure.
  • Enables the analysis of how key terms are deployed across different semantic fields (titles vs. body vs. metadata).

Compared to Best Practice:

  • SEMrush's Site Audit or Content Analyzer modules surface technical issues and general SEO markers, but lack context-aware parsing across layers of content.
  • This workflow captures the full on-page language environment, offering a deeper content signal that better reflects ranking context.

STAGE 3: SUMMARISATION AND NOISE REDUCTION

Process:

  • Applies both extractive (MiniLM) and abstractive (T5) summarisation models to condense raw text into thematic cores.
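
The extractive half of this step can be illustrated with a centroid-based sentence ranker. This is a sketch under a loud assumption: a bag-of-words vector stands in for MiniLM sentence embeddings so that the example runs without any model download. The function names and the sample text are hypothetical, and the abstractive T5 step is not covered.

```python
import math
import re
from collections import Counter


def bag_of_words(sentence):
    """Stand-in embedding: a sparse term-count vector (NOT MiniLM)."""
    return Counter(re.findall(r"[a-z']+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def extractive_summary(text, top_n=2):
    """Keep the top_n sentences closest to the document centroid, in original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    vectors = [bag_of_words(s) for s in sentences]
    centroid = Counter()
    for vec in vectors:
        centroid.update(vec)
    ranked = sorted(range(len(sentences)),
                    key=lambda i: cosine(vectors[i], centroid), reverse=True)
    return [sentences[i] for i in sorted(ranked[:top_n])]


# Made-up text: three on-topic sentences and one boilerplate line.
sample_text = (
    "The carnival opens at dusk. Twisted clowns roam the carnival midway. "
    "Subscribe to our newsletter for updates. Clowns at the carnival frighten visitors."
)
summary = extractive_summary(sample_text, top_n=2)
```

Sentences that share little vocabulary with the rest of the page, such as the newsletter prompt, score lowest against the centroid, which is the noise-reduction effect this stage relies on. Swapping in dense sentence embeddings would change `bag_of_words` and `cosine` but not the ranking logic.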

Type of Analysis:

  • Thematic distillation
  • Dimensionality reduction

SEO Relevance:

  • Filters out boilerplate and irrelevant filler, ensuring that subsequent keyword extraction focuses on substantive content.
  • Abstractive summaries can simulate meta description candidates or content overviews.

Compared to Best Practice:

  • Conventional platforms do not offer NLP-based summarisation. At best, tools like Clearscope and SurferSEO provide keyword density feedback.
  • This step improves signal clarity before keyword extraction, allowing higher precision. It is a preprocessing step that the conventional tools named above do not provide.

STAGE 4: KEYWORD AND KEYPHRASE EXTRACTION

Process:

  • Applies KeyBERT (embedding-based) and T5-based keyphrase models to extract ranked terms.
  • Uses POS tagging to retain only noun results, for semantic precision.
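
The noun filter can be sketched independently of the extraction models. This sketch assumes candidates arrive as (phrase, score) pairs, which is the shape KeyBERT's `extract_keywords` returns, and that the tagger is a callable returning (token, tag) pairs with Penn Treebank tags, as `nltk.pos_tag` does. Treating "noun-only" as "every token is a noun" is one interpretation. The toy tagger, the function names, and the scores are made up for illustration.

```python
def keep_noun_phrases(candidates, tag_tokens):
    """Keep candidates whose every token carries a noun tag (NN, NNS, NNP, NNPS)."""
    kept = []
    for phrase, score in candidates:
        tags = [tag for _, tag in tag_tokens(phrase.split())]
        if tags and all(tag.startswith("NN") for tag in tags):
            kept.append((phrase, score))
    return kept


def toy_tagger(tokens):
    """Hypothetical lookup tagger standing in for a real POS tagger."""
    lookup = {"carnival": "NN", "clowns": "NNS", "haunt": "VB", "creepy": "JJ"}
    return [(tok, lookup.get(tok, "NN")) for tok in tokens]


# Illustrative candidates; the scores are invented, not model output.
candidates = [("carnival clowns", 0.71), ("carnival", 0.64), ("haunt", 0.40), ("creepy", 0.35)]
noun_keywords = keep_noun_phrases(candidates, toy_tagger)
```

In a full pipeline, `candidates` would come from the keyphrase model and `tag_tokens` from a trained tagger. Passing the tagger in as an argument keeps the filter testable without downloading either.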

Type of Analysis:

  • Contextual keyword extraction
  • Semantic ranking
  • Linguistic validation

SEO Relevance:

  • Captures both frequent and semantically salient terms.
  • Focusing on nouns preserves topical integrity by excluding verbs and adjectives, which carry less search-intent signal.

Compared to Best Practice:

  • SEMrush, Ahrefs, and Ubersuggest identify high-volume terms and keyword difficulty scores, but they cannot detect context-specific or semantically emergent language.
  • This workflow identifies keywords as they actually appear in authoritative content, without depending on historical keyword lists or databases, offering a clearer view of real-world usage.

STAGE 5: VISUALISATION AND RELATIONAL ANALYSIS

Process:

  • Bar charts of word frequencies.
  • Network graphs built from word co-occurrences, with community detection (Louvain) and centrality scoring.
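
The co-occurrence graph and its centrality scores can be sketched with the standard library. This is a minimal sketch, assuming each document has already been reduced to a list of keywords and that co-occurrence means "appears in the same document". The function names and the sample keyword lists are hypothetical, and Louvain community detection is deliberately left out.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(documents):
    """Count how many documents each unordered keyword pair shares."""
    edges = Counter()
    for terms in documents:
        for a, b in combinations(sorted(set(terms)), 2):
            edges[(a, b)] += 1
    return edges


def degree_centrality(edges):
    """Fraction of the other nodes that each term is directly linked to."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    n = len(neighbours)
    if n < 2:
        return {}
    return {term: len(linked) / (n - 1) for term, linked in neighbours.items()}


# Made-up keyword lists, one per page.
documents = [
    ["carnival", "clown", "mask"],
    ["carnival", "clown", "circus"],
    ["carnival", "ghost", "haunted"],
    ["ghost", "haunted", "house"],
]
edges = cooccurrence_edges(documents)
centrality = degree_centrality(edges)
hub = max(centrality, key=centrality.get)
```

In this toy data "carnival" links the clown-related and ghost-related groups, so it surfaces as the hub. The weighted edge counts could be loaded into a graph library such as NetworkX, which provides a `louvain_communities` function, to obtain the community structure.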

Type of Analysis:

  • Lexical distribution
  • Co-occurrence mapping
  • Community structure analysis

SEO Relevance:

  • Reveals keyword clusters and latent topic groupings.
  • Identifies hubs, i.e. high-centrality terms that connect multiple topics, which are useful for internal linking or pillar content strategy.

Compared to Best Practice:

  • No traditional SEO platform offers network-based keyword visualisation.
  • This approach exposes semantic architecture, revealing not just which terms are used but how they interact, a dimension missing from current SEO tooling.

STAGE 6: ASSOCIATION RULE MINING (APRIORI)

Process:

  • Applies Apriori algorithm to identify co-occurring keywords.
  • Filters results by confidence and lift, ensuring statistically meaningful associations.
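
The support, confidence, and lift filtering can be shown with a two-level Apriori pass restricted to keyword pairs. This is a sketch under stated assumptions: each page is one transaction of keywords, and the thresholds, function name, and sample transactions are made up for illustration. For a rule X -> Y, confidence is support(X and Y) / support(X), and lift is confidence / support(Y).

```python
from itertools import combinations


def pair_rules(transactions, min_support=0.4, min_confidence=0.6, min_lift=1.2):
    """Mine X -> Y rules between single keywords, Apriori-style."""
    sets = [set(t) for t in transactions]
    n = len(sets)

    def support(items):
        return sum(1 for s in sets if items <= s) / n

    # Level 1: keep only keywords that are frequent on their own.
    frequent = {}
    for item in {i for s in sets for i in s}:
        sup = support({item})
        if sup >= min_support:
            frequent[item] = sup

    rules = []
    # Level 2: candidate pairs are built only from frequent keywords,
    # which is the Apriori pruning step.
    for a, b in combinations(sorted(frequent), 2):
        sup_ab = support({a, b})
        if sup_ab < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = sup_ab / frequent[x]
            lift = confidence / frequent[y]
            if confidence >= min_confidence and lift >= min_lift:
                rules.append((x, y, sup_ab, confidence, lift))
    return rules


# Made-up keyword sets, one per page.
transactions = [
    ["carnival", "clown", "mask"],
    ["carnival", "clown"],
    ["carnival", "clown", "circus"],
    ["ghost", "haunted"],
    ["carnival", "ghost", "haunted"],
]
rules = pair_rules(transactions)
rule_pairs = {(x, y) for x, y, *_ in rules}
```

Here "ghost" and "haunted" always appear together, giving a lift of about 2.5, while "carnival" with "ghost" falls below the support threshold and is never scored. A lift above 1 means the pair co-occurs more often than independence would predict.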

Type of Analysis:

  • Pattern mining
  • Keyword set logic inference

SEO Relevance:

  • Identifies term bundles that appear together in authoritative content.
  • These associations can be used to inform sectioning, topic hierarchy, or semantic HTML structuring.

Compared to Best Practice:

  • Tools like SEMrush or Moz offer no support for mining associative keyword rules.
  • This workflow introduces quantitative logic-based relationships between keywords, producing actionable keyword groupings backed by statistical strength rather than assumed topical clusters.

STAGE 7: INTERACTIVE ANALYSIS AND MANUAL OVERRIDE

Process:

  • Dropdown widget allows user to select a column for focused analysis.
  • Visuals and Apriori results adjust dynamically based on user input.
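
The dropdown-driven refresh can be sketched as a pure analysis function plus a small widget wrapper. This sketch assumes a notebook environment with ipywidgets installed. The function names, the row structure, and the sample rows are hypothetical, and a simple term count stands in for the charts and Apriori results.

```python
from collections import Counter


def top_terms(rows, column, n=3):
    """Recompute term frequencies for the selected column only."""
    counts = Counter()
    for row in rows:
        counts.update(row.get(column, "").lower().split())
    return counts.most_common(n)


def build_dropdown(rows, columns):
    """Wire a dropdown to top_terms; requires ipywidgets and a notebook."""
    import ipywidgets as widgets  # imported lazily so top_terms runs without it

    dropdown = widgets.Dropdown(options=columns, description="Column:")
    output = widgets.Output()

    def on_change(change):
        # Re-run the analysis whenever the selected column changes.
        output.clear_output()
        with output:
            print(top_terms(rows, change["new"]))

    dropdown.observe(on_change, names="value")
    return widgets.VBox([dropdown, output])


# Made-up rows, one per scraped page.
rows = [
    {"title": "Twisted clowns", "body": "clowns haunt the carnival"},
    {"title": "Carnival nights", "body": "clowns scare visitors"},
]
body_top = top_terms(rows, "body", n=1)
```

Keeping the analysis in a plain function means the same code path can be called from the widget callback or from a batch script.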

Type of Analysis:

  • User-led exploratory filtering

SEO Relevance:

  • Supports targeted, iterative refinement, which is critical for strategy development across different content types (e.g. meta vs. body vs. summary).

Compared to Best Practice:

  • Most keyword research workflows are linear and predefined.
  • This workflow is exploratory by design, enabling deep dives and fluid pivoting; it is closer to investigative research than checklist SEO.

CRITICAL ASSESSMENT: ROBUSTNESS & STRATEGIC VALUE

Dimension                  This Workflow                               Standard Best Practice (e.g. SEMrush, Ahrefs, SurferSEO)
-------------------------  ------------------------------------------  ---------------------------------------------------------
Seed Generation            Empirical (via live SERPs)                  Speculative (via brainstorming/trends)
Bias Resistance            High (Google-ranked + model filtering)      Low to Moderate (human heuristics)
Granularity of Input       High (multi-element extraction)             Medium (titles/headings/body skims)
Thematic Resolution        High (summarisation + NLP filtering)        Low (manual or volume-based sorting)
Keyword Validity Controls  Strong (POS + rule mining + co-occurrence)  Moderate (search volume, CPC)
Relational Awareness       Present (graphs, association rules)         Absent
Automation & Scale         High                                        Medium
Strategic Integration      Requires alignment with broader tools       Native to SEMrush-type platforms

CONCLUSION

This workflow is not just a support mechanism; it is a methodological alternative to standard seed keyword generation. By grounding itself in observed discourse and empirical content mining, it:

  • Avoids input bias and trend lag
  • Surfaces contextually grounded, semantically valid keywords
  • Reveals non-obvious relationships between concepts

Compared to conventional tools like SEMrush, Ahrefs, or SurferSEO, which rely on historical search data and top-down seed expansion, this workflow is bottom-up, evidence-driven, and model-mediated. Its output is cleaner, more adaptable, and more semantically precise.

Its limitation, the lack of direct search volume and CPC data, is a conscious trade-off, not a flaw. It is best deployed as a front-end research engine that feeds enriched, high-integrity keywords into tools that specialise in competitive metrics and market modelling.

In that capacity, it doesn't just enhance current SEO practice; it restructures its foundation.