Technical Implementation · December 23, 2025 · 6 min read

How to Architect a Headless Discovery Engine (Code-First)

58% of searches now happen outside Google. Learn how to architect a headless discovery engine using PostGIS, Python adapters, and Go workers to synchronize local data across fragmented platforms.


The Fragmentation of Discovery (And Why HTML Is Not Enough)

For the last decade, "discovery engineering" was synonymous with Search Engine Optimization (SEO). The architectural pattern was simple: render performant HTML, inject Schema.org JSON-LD, and wait for the Googlebot to crawl, index, and rank. Latency was measured in days (crawl budget), not milliseconds.

That architecture is now failing. With 58% of "Best Restaurant" searches originating outside of Google—fragmented across TikTok, Instagram, Apple Maps, and vertical-specific apps like OpenTable or Yelp—the passive "wait-for-crawl" model creates a massive consistency gap. If a restaurant changes its opening hours or menu pricing, relying on Google's crawler means over half your potential traffic sees stale data on other platforms.

We cannot solve this with meta tags. We need to shift from a passive pull architecture (crawlers reading HTML) to an active push architecture (systems broadcasting state changes). We need to treat local business data not as static content, but as a distributed state synchronization problem.

This post outlines the architecture of a Headless Discovery Engine—a system designed to maintain strict data consistency across a fragmented ecosystem of third-party APIs, treating external platforms (like Maps or Social) as read replicas of our internal source of truth.

Architectural Pattern: The Fan-Out Sync Engine

To reach the 58% of searches that happen outside Google, we must treat external platforms as downstream consumers of our data events.

The system consists of three core components:

1. The Source of Truth (SoT): A normalized relational database storing the "golden record" of location data (hours, menus, lat/long).
2. The Event Bus: A message broker (Kafka or RabbitMQ) that decouples the data mutation from the syndication logic.
3. The Adapter Layer: A set of isolated workers that consume events, transform the payload into platform-specific schemas (e.g., Google My Business API, Meta Graph API, Apple Maps Server-to-Server), and handle the "push" logic.
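The flow through these three components can be sketched in a few lines. This is a minimal in-memory illustration under stated assumptions, not the production design: a dict stands in for the relational SoT, `queue.Queue` stands in for Kafka or RabbitMQ, and `update_location`, `run_adapters`, `maps_adapter`, and `jsonld_adapter` are hypothetical names introduced only for this example.

```python
import queue
from typing import Callable, Dict, List

# In-memory stand-ins (assumptions): a dict for the SoT, a Queue for the event bus.
source_of_truth: Dict[str, dict] = {}
event_bus: "queue.Queue[dict]" = queue.Queue()


def update_location(location_id: str, fields: dict) -> None:
    """Write to the SoT first, then publish an event describing the change."""
    record = source_of_truth.setdefault(location_id, {})
    record.update(fields)
    event_bus.put(
        {"location_id": location_id, "event_type": "UPDATE", "payload": dict(record)}
    )


# Hypothetical adapters: each maps the golden record to a platform-specific shape.
def maps_adapter(payload: dict) -> dict:
    return {"title": payload["name"]}


def jsonld_adapter(payload: dict) -> dict:
    return {"@type": "Restaurant", "name": payload["name"]}


def run_adapters(adapters: Dict[str, Callable[[dict], dict]]) -> List[tuple]:
    """Drain the bus and fan each event out to every adapter."""
    pushed = []
    while not event_bus.empty():
        event = event_bus.get()
        for platform, transform in adapters.items():
            pushed.append((platform, transform(event["payload"])))
    return pushed


update_location("loc-1", {"name": "Luigi's"})
results = run_adapters({"maps": maps_adapter, "jsonld": jsonld_adapter})
print(results)
```

The point of the sketch is the ordering: the write to the SoT happens first, and nothing downstream is called directly by the code that performs the mutation.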

1. Defining the Golden Record (PostGIS)

The foundation is a rigid schema. While NoSQL offers flexibility, location data requires strict referential integrity and spatial indexing. We use PostgreSQL with the PostGIS extension to handle geospatial queries efficiently.

Here is the DDL for a location entity that supports granular operating hours, including multiple opening periods per day:

-- Enable PostGIS for spatial queries
CREATE EXTENSION IF NOT EXISTS postgis;

CREATE TABLE locations (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    internal_id VARCHAR(50) UNIQUE NOT NULL,
    name VARCHAR(255) NOT NULL,
    description TEXT,

    -- Spatial index for "near me" internal lookups
    geom GEOMETRY(Point, 4326) NOT NULL,
    address JSONB NOT NULL, -- { "street": "...", "zip": "..." }

    -- Versioning for conflict resolution
    version INT DEFAULT 1,
    last_updated_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE TABLE operating_hours (
    location_id UUID REFERENCES locations(id),
    day_of_week INT CHECK (day_of_week BETWEEN 0 AND 6),
    open_time TIME NOT NULL,
    close_time TIME NOT NULL,

    -- Handling "Split hours" (e.g., closed for lunch break)
    period_index INT DEFAULT 1,
    PRIMARY KEY (location_id, day_of_week, period_index)
);

-- Index for geospatial lookups
CREATE INDEX idx_locations_geom ON locations USING GIST (geom);

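This schema enforces data integrity before a record ever reaches the API layer. The version column supports optimistic concurrency control: if multiple internal services try to update the same location at once, a writer holding a stale version can be detected and rejected instead of silently overwriting a newer change.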
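One way to use the version column is a compare-and-set update. The sketch below is an assumption about how that could look, not the article's actual data-access code: it uses the standard-library `sqlite3` module and a cut-down `locations` table purely so the example runs anywhere, and `rename_location` is a hypothetical helper. The same `WHERE id = ... AND version = ...` pattern applies in PostgreSQL.

```python
import sqlite3

# sqlite3 is used only so the sketch is runnable; the table is a simplified stand-in.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE locations (id TEXT PRIMARY KEY, name TEXT, version INTEGER DEFAULT 1)"
)
conn.execute("INSERT INTO locations (id, name) VALUES ('loc-1', 'Luigi''s')")


def rename_location(conn, location_id, new_name, expected_version):
    """Apply the update only if nobody else bumped the version first."""
    cur = conn.execute(
        "UPDATE locations SET name = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_name, location_id, expected_version),
    )
    return cur.rowcount == 1  # False means a concurrent writer got there first


# Two writers both read version 1; only the first update is applied.
first = rename_location(conn, "loc-1", "Luigi's Trattoria", expected_version=1)
stale = rename_location(conn, "loc-1", "Luigi's Pizza", expected_version=1)
print(first, stale)
```

The losing writer gets `False` back and can re-read the row and retry, which keeps conflict handling in application code rather than relying on row locks.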

2. The Transformation Layer (Adapter Pattern)

The core complexity lies in translation. Google expects Schema.org/Restaurant. Apple Maps expects a specific JSON format via their Business Connect API. TikTok and Instagram don't have direct "location update" APIs in the same sense, but they consume data via Profile updates or linked landing pages that must render Open Graph tags dynamically.

We implement an Adapter Pattern in Python (using Pydantic for validation) to transform our internal Location model into external payloads.

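from datetime import time
from typing import List

from pydantic import BaseModel, Field


# One opening period; mirrors a row of the operating_hours table
class OperatingPeriod(BaseModel):
    day_of_week: int  # 0-6; assumed here to start on Sunday
    open_time: time
    close_time: time


# Internal Model
class InternalLocation(BaseModel):
    internal_id: str
    name: str
    latitude: float
    longitude: float
    description: str
    cuisine_type: List[str]
    hours: List[OperatingPeriod] = Field(default_factory=list)


# Abstract Base Class for Adapters
class PlatformAdapter:
    def transform(self, location: InternalLocation) -> dict:
        raise NotImplementedError


# Google My Business (GMB) Adapter
class GoogleMapsAdapter(PlatformAdapter):
    DAYS = ["SUNDAY", "MONDAY", "TUESDAY", "WEDNESDAY", "THURSDAY", "FRIDAY", "SATURDAY"]

    def transform(self, location: InternalLocation) -> dict:
        # Guard against an empty cuisine list; fall back to the generic category
        primary = location.cuisine_type[0] if location.cuisine_type else ""
        return {
            "languageCode": "en",
            "storeCode": location.internal_id,
            "title": location.name,
            "latlng": {
                "latitude": location.latitude,
                "longitude": location.longitude,
            },
            "category": {"primaryCategoryId": self._map_category(primary)},
            "regularHours": self._format_gmb_hours(location.hours),
        }

    def _map_category(self, internal_cat: str) -> str:
        # Map internal taxonomy to GMB Category IDs (gcid)
        taxonomy_map = {"pizza": "gcid:pizza_restaurant", "cafe": "gcid:coffee_shop"}
        return taxonomy_map.get(internal_cat, "gcid:restaurant")

    def _format_gmb_hours(self, hours: List[OperatingPeriod]) -> dict:
        # Payload shape is an assumption; check it against the current Google API reference
        return {
            "periods": [
                {
                    "openDay": self.DAYS[p.day_of_week],
                    "openTime": p.open_time.strftime("%H:%M"),
                    "closeDay": self.DAYS[p.day_of_week],
                    "closeTime": p.close_time.strftime("%H:%M"),
                }
                for p in hours
            ]
        }


# Schema.org Adapter (for the "Headless" Landing Page)
class JSONLDAdapter(PlatformAdapter):
    def transform(self, location: InternalLocation) -> dict:
        return {
            "@context": "https://schema.org",
            "@type": "Restaurant",
            "name": location.name,
            "geo": {
                "@type": "GeoCoordinates",
                "latitude": location.latitude,
                "longitude": location.longitude,
            },
            "servesCuisine": location.cuisine_type,
        }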

This decoupling allows us to add new "endpoints" (e.g., a new vertical delivery app) without modifying the core domain logic or the database schema.
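To illustrate that claim, here is a minimal sketch of adding a new platform. Everything in it is hypothetical: `DeliveryAppAdapter` and its payload keys are invented for illustration and do not correspond to any real delivery API, `ADAPTERS` and `build_payloads` are assumed names, and a plain dataclass stands in for the Pydantic `InternalLocation` model so the example has no third-party dependencies.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Location:  # simplified stand-in for InternalLocation
    internal_id: str
    name: str
    cuisine_type: List[str]


class PlatformAdapter:
    def transform(self, location: Location) -> dict:
        raise NotImplementedError


class DeliveryAppAdapter(PlatformAdapter):
    """Hypothetical adapter; the payload keys are invented for illustration."""

    def transform(self, location: Location) -> dict:
        return {
            "merchant_ref": location.internal_id,
            "display_name": location.name,
            "tags": [c.lower() for c in location.cuisine_type],
        }


# Registering the adapter is the only change needed to support the new platform.
ADAPTERS: Dict[str, PlatformAdapter] = {"delivery_app": DeliveryAppAdapter()}


def build_payloads(location: Location) -> Dict[str, dict]:
    """Run every registered adapter against the same internal record."""
    return {name: adapter.transform(location) for name, adapter in ADAPTERS.items()}


payloads = build_payloads(Location("LOC-001", "Luigi's", ["Pizza", "Italian"]))
print(payloads["delivery_app"])
```

The domain model and the database schema are untouched; the new platform is one class plus one registry entry.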

3. Asynchronous Sync Workers (Go)

Pushing updates to third-party APIs is I/O bound and prone to failures (rate limits, timeouts). We cannot perform these writes synchronously within the HTTP request that updates the database.

Instead, we use Go for the worker layer. Go’s concurrency model (Goroutines) is ideal for fanning out HTTP requests to multiple downstream APIs simultaneously while managing retries independently.

The following Go code demonstrates a worker consuming a "LocationUpdated" event and broadcasting it.

package main

import (
	"context"
	"log"
	"sync"
	"time"
)

type LocationEvent struct {
	LocationID string                 `json:"location_id"`
	EventType  string                 `json:"event_type"` // e.g., "UPDATE", "DELETE"
	Payload    map[string]interface{} `json:"payload"`
}

// PlatformPublisher defines the interface for 3rd party APIs
type PlatformPublisher interface {
	Publish(ctx context.Context, data map[string]interface{}) error
	Name() string
}

func Worker(event LocationEvent, publishers []PlatformPublisher) {
	var wg sync.WaitGroup
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	for _, p := range publishers {
		wg.Add(1)
		go func(pub PlatformPublisher) {
			defer wg.Done()

			// Simple exponential backoff retry logic
			const maxAttempts = 3
			for attempt := 0; attempt < maxAttempts; attempt++ {
				err := pub.Publish(ctx, event.Payload)
				if err == nil {
					log.Printf("Successfully synced to %s", pub.Name())
					return
				}
				log.Printf("Failed to sync to %s: %v", pub.Name(), err)
				if attempt == maxAttempts-1 {
					break // no point sleeping after the final attempt
				}
				// Waits 1s, then 2s. In Go, ^ is bitwise XOR, so a shift
				// is used to compute powers of two.
				backoff := time.Duration(1<<attempt) * time.Second
				select {
				case <-time.After(backoff):
				case <-ctx.Done():
					log.Printf("Giving up on %s: %v", pub.Name(), ctx.Err())
					return
				}
			}

			// Dead Letter Queue logic would go here
			log.Printf("Permanent failure syncing to %s", pub.Name())
		}(p)
	}

	wg.Wait()
}

// In production, Worker is called by the event bus consumer; this stub
// only makes the file compile on its own.
func main() {
	Worker(LocationEvent{LocationID: "loc-1", EventType: "UPDATE"}, nil)
}

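This pattern ensures that a slow response from the Yelp API does not block the update to Apple Maps or Google. On its own it does not guarantee delivery, since an update is dropped after the final retry; paired with a dead letter queue and a later re-sync, it gives us eventual consistency across the distributed ecosystem.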
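The dead letter queue step is left as a comment in the worker above. A minimal sketch of what it could look like follows, written in Python for brevity rather than Go. It is an assumption, not the production worker: a plain list stands in for a real queue, `publish_with_retry` and `always_fails` are hypothetical names, and the `sleep` parameter is injectable so the backoff schedule can be observed without actually waiting.

```python
import time
from typing import Callable, List


def publish_with_retry(
    publish: Callable[[dict], None],
    payload: dict,
    dead_letters: List[dict],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Try to publish; park the payload in dead_letters after the last failure."""
    last_error = ""
    for attempt in range(max_attempts):
        try:
            publish(payload)
            return True
        except Exception as exc:  # broad on purpose: every failure is retried here
            last_error = str(exc)
            if attempt < max_attempts - 1:
                sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
    dead_letters.append(
        {"payload": payload, "error": last_error, "attempts": max_attempts}
    )
    return False


def always_fails(payload: dict) -> None:
    raise TimeoutError("upstream timed out")


dlq: List[dict] = []
delays: List[float] = []  # records the requested sleeps instead of waiting
ok = publish_with_retry(always_fails, {"location_id": "loc-1"}, dlq, sleep=delays.append)
print(ok, delays, dlq)
```

Parked entries keep the original payload and the last error, so a separate job can replay them once the downstream API recovers.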

Handling Consistency and "Stale Reads"

In distributed systems, we often trade consistency for availability. However, in discovery commerce (e.g., ordering food), stale data causes transaction failures. If a user on Instagram sees an "Open Now" badge but the restaurant is actually closed, you lose a customer.

To mitigate this, we implement a Read-Repair mechanism.

The Feedback Loop

We cannot blindly trust that the API accepted our write and displayed it correctly. APIs like Google My Business often have a "pending review" state where changes aren't public immediately.

We implement a scheduled scraper/verifier that queries the public-facing side of these platforms (or uses their "Get" API) to verify that the published state matches our SoT.

  • SoT: Closes at 9:00 PM.
  • Google Maps (Public): Closes at 10:00 PM (Mismatch).
  • Action: Alerting system triggers a high-priority re-sync event.
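The comparison step in that example can be sketched as a small diff function. This is an illustrative assumption rather than the actual verifier: fetching the public value is stubbed out with a literal dict, and `find_mismatches`, `verify`, and the `RESYNC` event shape are hypothetical names chosen for this example.

```python
from typing import Dict, List, Optional

Hours = Dict[str, str]  # e.g. {"open": "11:00", "close": "21:00"}


def find_mismatches(sot: Hours, observed: Hours) -> List[str]:
    """Return the fields whose public value differs from the source of truth."""
    return sorted(k for k in sot if observed.get(k) != sot[k])


def verify(location_id: str, platform: str, sot: Hours, observed: Hours) -> Optional[dict]:
    """Build a high-priority re-sync event if anything drifted, else None."""
    fields = find_mismatches(sot, observed)
    if not fields:
        return None
    return {
        "location_id": location_id,
        "platform": platform,
        "event_type": "RESYNC",
        "priority": "high",
        "fields": fields,
    }


sot_hours = {"open": "11:00", "close": "21:00"}
google_public = {"open": "11:00", "close": "22:00"}  # stubbed verifier result
event = verify("loc-1", "google_maps", sot_hours, google_public)
print(event)
```

Reporting only the drifted fields keeps the re-sync payload small and makes the alert readable for whoever is on call.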

Performance and Trade-offs

Latency vs. Rate Limits

The "Fan-Out" architecture introduces massive API overhead. If you manage 1,000 locations and update menus daily, you might hit API quotas on Yelp or Tripadvisor.

  • Mitigation: Implement debouncing. If a location manager changes the description, then fixes a typo 10 seconds later, we should only send one event to the downstream APIs. We buffer events for 60 seconds before flushing to the workers.
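A minimal sketch of that buffering step follows, under a few assumptions: `Debouncer` is a hypothetical class, the buffer lives in process memory, and the clock is passed in as `now` so the behavior is deterministic. It uses a fixed window measured from the first buffered edit; a trailing debounce that restarts the timer on every edit would be an equally valid choice.

```python
from typing import Dict, List, Tuple


class Debouncer:
    """Coalesce rapid edits to one location into a single outbound event."""

    def __init__(self, window_seconds: float = 60.0):
        self.window = window_seconds
        # location_id -> (time of first buffered edit, merged payload)
        self.pending: Dict[str, Tuple[float, dict]] = {}

    def add(self, location_id: str, changes: dict, now: float) -> None:
        first_seen, merged = self.pending.get(location_id, (now, {}))
        merged.update(changes)  # later edits overwrite earlier ones
        self.pending[location_id] = (first_seen, merged)

    def flush(self, now: float) -> List[dict]:
        """Return events whose window has elapsed and drop them from the buffer."""
        ready = [lid for lid, (t0, _) in self.pending.items() if now - t0 >= self.window]
        return [{"location_id": lid, "payload": self.pending.pop(lid)[1]} for lid in ready]


d = Debouncer(window_seconds=60)
d.add("loc-1", {"description": "Best piza in town"}, now=0)
d.add("loc-1", {"description": "Best pizza in town"}, now=10)  # typo fix 10s later
early = d.flush(now=30)   # window still open, nothing is sent
events = d.flush(now=60)  # one merged event for both edits
print(early, events)
```

Both edits collapse into a single event carrying the corrected description, so the downstream APIs see one call instead of two.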

Source Reliability

The 58% of non-Google traffic often comes from platforms with less mature APIs. Some vertical apps may lack webhooks or reliable PUT endpoints.

  • Mitigation: For platforms without write APIs, we generate a high-fidelity "Micro-Landing Page" (hosted on our infrastructure) and link to it from the profile bio. This page becomes the dynamic endpoint, rendering the JSONLDAdapter output from the code above.
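Rendering the adapter output into the landing page could look like the sketch below. It assumes the page is assembled as a string on the server; `render_jsonld_script` is a hypothetical helper, and the sample record is made up to show why the escaping step matters.

```python
import json


def render_jsonld_script(data: dict) -> str:
    """Serialize adapter output into a JSON-LD script tag."""
    body = json.dumps(data, ensure_ascii=False)
    # Escape "</" so a value such as "</script>" cannot close the tag early.
    body = body.replace("</", "<\\/")
    return f'<script type="application/ld+json">{body}</script>'


jsonld = {
    "@context": "https://schema.org",
    "@type": "Restaurant",
    "name": "Luigi's </script> Trattoria",  # hostile input, for illustration
    "servesCuisine": ["pizza"],
}
tag = render_jsonld_script(jsonld)
print(tag)
```

Because `\/` is a valid JSON escape for `/`, parsers reading the embedded JSON still recover the original string while the HTML parser never sees a premature closing tag.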

Retrospective

Building a headless discovery engine acknowledges that "Search" is no longer a single destination. It is a fragmented layer of the internet. By moving the complexity of synchronization to the backend—using strict PostGIS schemas, adapter-based transformation, and concurrent Go workers—we ensure that our data is accurate whether the user is searching on a map, scrolling a feed, or asking an AI assistant.

The future of SEO isn't about keywords in HTML tags; it's about API uptime and structured data propagation.
