How Analysis Works

From a domain name to a structured privacy profile — here's what happens under the hood.

The Analysis Pipeline

When a company is analyzed (either by our automated crawl queue or on-demand), the process follows a structured pipeline. Each stage builds on the previous one.

1

Policy Discovery

We start with the company's domain and look for privacy-related pages. We check sitemaps, scan the homepage footer and header for links, match URL patterns (/privacy, /terms, /cookie-policy), and use AI to detect hub pages or language redirects. The goal is to find every relevant legal document: privacy policy, terms of service, cookie policy, DPA, security page, and subprocessor list.
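The URL pattern step can be pictured with a small sketch. Everything here is illustrative: the function name, the pattern table, and the same-host filter are assumptions, not the production discovery logic, and only the /privacy, /terms, and /cookie-policy patterns come from the description above.

```python
import re
from urllib.parse import urljoin, urlparse

# Hypothetical pattern table, loosely based on the URL patterns named above.
POLICY_PATTERNS = {
    "privacy_policy": re.compile(r"/privacy", re.I),
    "terms_of_service": re.compile(r"/terms", re.I),
    "cookie_policy": re.compile(r"/cookie", re.I),
}

def classify_links(base_url, hrefs):
    """Return {document_type: absolute_url} for same-host links matching a pattern."""
    base_host = urlparse(base_url).netloc
    found = {}
    for href in hrefs:
        url = urljoin(base_url, href)  # resolve relative links like "/privacy"
        parsed = urlparse(url)
        if parsed.netloc != base_host:
            continue  # simplification: ignore links that leave the site
        for doc_type, pattern in POLICY_PATTERNS.items():
            # Keep the first match per document type.
            if pattern.search(parsed.path) and doc_type not in found:
                found[doc_type] = url
    return found
```

Calling `classify_links("https://example.com", footer_hrefs)` with the links collected from a footer would return one candidate URL per document type. Sitemap parsing and hub-page detection would sit alongside a matcher like this.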

2

Document Retrieval

Each discovered URL is fetched and converted to clean, structured markdown. We handle JavaScript-rendered pages, multi-language redirects, and cookie consent walls. Content is deduplicated by hash — if the policy hasn't changed since the last crawl, we skip re-processing and keep the existing version.

3

Tracker Detection

We scan the company's live website for third-party scripts, tracking pixels, analytics services, session recording tools, and advertising networks. This happens independently of the policy text — it's what the website actually does, not what it claims to do.
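As a rough illustration of the idea, the sketch below lists third-party hosts referenced by script, image, and iframe tags in a page's HTML. It is a simplification under stated assumptions: the tracker lookup table uses made-up example domains, and a static HTML pass like this would miss scripts injected at runtime, which a scan of the live site would need to catch.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical lookup table with placeholder domains, for illustration only.
KNOWN_TRACKERS = {
    "analytics.example": "analytics",
    "pixel.example": "tracking pixel",
    "replay.example": "session recording",
}

class ThirdPartyScanner(HTMLParser):
    """Collects (host, category) pairs for resources loaded from other hosts."""

    def __init__(self, site_host):
        super().__init__()
        self.site_host = site_host
        self.hits = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("script", "img", "iframe"):
            return
        src = dict(attrs).get("src")
        if not src:
            return
        host = urlparse(src).netloc  # empty for relative URLs
        if host and host != self.site_host:
            self.hits.append((host, KNOWN_TRACKERS.get(host, "unknown")))

def scan(html, site_host):
    scanner = ThirdPartyScanner(site_host)
    scanner.feed(html)
    return scanner.hits
```

The output depends only on what the page loads, not on what the policy says, which is the point of running this stage separately.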

4

Breach Check

We check the company's domain against public breach databases for reported data breaches. Breaches from the past 24 months are flagged as a signal.
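The 24-month window could be checked along these lines. This is a minimal sketch: the function names are hypothetical, and treating a breach exactly 24 months old as no longer recent is an assumed boundary choice, not something the description specifies.

```python
from datetime import date

RECENT_WINDOW_MONTHS = 24  # window size taken from the description above

def months_between(earlier, later):
    """Whole calendar months elapsed from `earlier` to `later`."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    if later.day < earlier.day:
        months -= 1  # the last month is not complete yet
    return months

def is_recent_breach(breach_date, today):
    # Assumption: fewer than 24 whole months counts as recent;
    # breach dates in the future are rejected.
    return 0 <= months_between(breach_date, today) < RECENT_WINDOW_MONTHS
```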

5

AI Analysis

The policy documents are processed by 17 specialized AI extraction tasks running in parallel. Each task focuses on a specific aspect of the policy (see below). The AI extracts structured data and provides evidence quotes — direct excerpts from the policy text that back up each finding.
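The fan-out pattern might look roughly like the sketch below. The two task functions are trivial keyword-based stand-ins for the real AI extraction calls, and the `Finding` shape is an assumption. What the sketch is meant to show is that tasks run in parallel and every result carries quotes copied from the source text.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

@dataclass
class Finding:
    task: str
    data: dict
    evidence: list  # direct excerpts from the policy text

# Stand-in tasks: real tasks would call a model, these only split sentences.
def summarize(policy_text):
    first = policy_text.split(".")[0].strip() + "."
    return Finding("policy_summary", {"summary": first}, [first])

def find_retention(policy_text):
    quotes = [s.strip() + "." for s in policy_text.split(".") if "retain" in s.lower()]
    return Finding("retention_policies", {"mentions_retention": bool(quotes)}, quotes)

TASKS = [summarize, find_retention]

def run_tasks(policy_text, tasks=TASKS):
    """Run every task against the same text in parallel; results keep task order."""
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        return list(pool.map(lambda task: task(policy_text), tasks))
```

Threads suit this shape because each task would mostly wait on a network call. The same structure scales from two stand-ins to all 17 tasks.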

6

Score Calculation

All extracted signals are fed into the scoring engine, which calculates the composite privacy score across 5 weighted dimensions. Red flags are identified. The final score and breakdown are stored and displayed on the company profile.
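A weighted composite can be sketched as a weighted average. The dimension names, the weights, and the 0 to 100 scale below are all hypothetical, since the description only states that there are five weighted dimensions.

```python
# Hypothetical dimensions and integer weights (they sum to 100).
WEIGHTS = {
    "data_collection": 25,
    "data_sharing": 25,
    "user_rights": 20,
    "transparency": 15,
    "security": 15,
}

def composite_score(dimension_scores, weights=WEIGHTS):
    """Weighted average of per-dimension scores, rounded to one decimal."""
    if set(dimension_scores) != set(weights):
        raise ValueError("scores must cover exactly the weighted dimensions")
    total_weight = sum(weights.values())
    weighted = sum(dimension_scores[name] * w for name, w in weights.items())
    return round(weighted / total_weight, 1)
```

Dividing by the total weight keeps the result on the same scale as the inputs even if the weights are later retuned and no longer sum to 100.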

AI Extraction Tasks

During step 5, the following specialized extraction tasks run in parallel. Each one reads the relevant policy documents and returns structured data.

Task | What It Extracts
Data Partners | Identifies third parties the company shares data with: advertisers, analytics providers, data brokers, affiliates, and service providers.
Tracking Cookies | Extracts cookie names, providers, categories (marketing, analytics, functional), purposes, and retention periods from cookie policies.
Collected Data Types | Lists every type of personal data the company collects: identity, contact, behavioral, biometric, financial, location, and more.
Policy Summary | Generates a concise 2–3 sentence summary of the privacy policy highlighting the most important points.
Claims & Signals | Cross-references what the policy claims against what the website actually does. Detects contradictions and consistency issues.
Data Broker Indicators | Assesses whether the company exhibits data broker characteristics: selling data, aggregating from multiple sources, offering opt-out registries.
Deletion Difficulty | Scores how hard it is for a user to delete their data (0–4 scale) and identifies specific barriers such as verification requirements, waiting periods, or partial-deletion limitations.
Dark Patterns | Detects manipulative UX patterns in policy language: misleading consent flows, hidden opt-outs, confusing toggles.
Data Purposes | Categorizes why data is processed: core service, security, marketing/advertising, analytics, and legal/compliance.
International Transfers | Identifies whether data is transferred outside the EU/EEA, to which countries, and under what legal mechanisms (SCCs, adequacy decisions, etc.).
Retention Policies | Extracts general and per-data-type retention periods: how long data is kept, what triggers deletion, and whether retention is specific or vague.
Subprocessors | Lists third-party processors/vendors used by the company, their purpose, country, and what data types they handle.
Company Info | Extracts the legal entity name, headquarters country, DPO email, contact information, and GDPR/CCPA compliance indicators.
Industry Classification | Classifies the company into one of 36 industry categories based on what the company does.
AI Practices | Analyzes whether the company uses AI/ML, trains on user data, shares data for third-party AI, discloses AI usage, and offers opt-outs.
Terms (Core) | Extracts key terms-of-service clauses: termination rights, content ownership, license scope, unilateral change policies.
Terms (Financial) | Extracts billing practices: auto-renewal, free trial terms, refund availability, cancellation methods.
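To make "structured data" concrete, here is one possible shape for a single task's output, using Deletion Difficulty as the example. The class and field names are hypothetical. Only the 0–4 range and the notion of barriers come from the table, and the table does not say which end of the scale means "hardest".

```python
from dataclasses import dataclass, field

@dataclass
class DeletionDifficulty:
    score: int                                     # 0-4 scale from the table
    barriers: list = field(default_factory=list)   # e.g. "waiting period"
    evidence: list = field(default_factory=list)   # supporting policy quotes

    def __post_init__(self):
        # Reject out-of-range scores early instead of passing them to scoring.
        if not 0 <= self.score <= 4:
            raise ValueError("score must be between 0 and 4")
```

Validating at construction time means a malformed extraction fails loudly before it can influence a score.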

Policy Versioning

Every time we fetch a policy document, we compute a SHA-256 hash of its content. If the hash differs from the previously stored version, we save a new version automatically. This means:

  • You can see the full version history of a company's privacy policy.
  • Changes between versions are tracked and can be compared.
  • Our analysis always reflects the most recent version available.
  • Repeat fetches with identical content (same hash, different date) do not create a new version, so the history stays free of noise.
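The hash-and-compare rule can be sketched in a few lines. The in-memory list used as storage, the record fields, and the function names are assumptions for illustration; only the use of SHA-256 on the document content comes from the description above.

```python
import difflib
import hashlib

def content_hash(markdown_text):
    """SHA-256 hex digest of the policy content."""
    return hashlib.sha256(markdown_text.encode("utf-8")).hexdigest()

def record_fetch(history, markdown_text, fetched_on):
    """Append a version only if the content changed. Returns True if one was saved."""
    digest = content_hash(markdown_text)
    if history and history[-1]["sha256"] == digest:
        return False  # same content as the latest version: nothing to store
    history.append({"sha256": digest, "fetched_on": fetched_on, "content": markdown_text})
    return True

def diff_versions(old_text, new_text):
    """Line-based unified diff between two stored versions."""
    return list(difflib.unified_diff(old_text.splitlines(), new_text.splitlines(), lineterm=""))
```

Comparing only against the latest stored version keeps the history as a sequence of actual changes, and `diff_versions` gives a simple way to compare any two entries.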

Evidence-Based Analysis

Every signal we extract includes an evidence quote — a direct excerpt from the source policy. This means:

  • You can verify any finding by reading the original text yourself.
  • Companies can see exactly why they received a particular signal or score factor.
  • If the AI misinterpreted something, the evidence makes it obvious — and companies can submit a correction.

Evidence quotes are shown on company profile pages, in the score breakdown, and in the dashboard for claimed companies.
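Because evidence quotes are direct excerpts, they can in principle be checked mechanically against the source. The sketch below is one hypothetical way to do that, assuming whitespace and letter case may differ between the quote and the stored document. It is not a description of the actual verification step.

```python
import re

def normalize(text):
    """Collapse runs of whitespace and lowercase, so line breaks do not matter."""
    return re.sub(r"\s+", " ", text).strip().lower()

def quote_is_verbatim(quote, source_text):
    """True if the non-empty quote appears in the source after normalization."""
    needle = normalize(quote)
    return bool(needle) and needle in normalize(source_text)
```

A quote that fails this check is a strong hint that a finding was paraphrased or misattributed, and it is the same comparison a reader makes when checking a finding by hand.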