Signals & Data Sources
Evidence-based analysis built on publicly observable data. Every signal is traceable to its source.
Data Sources
PrivacyFetch draws from four independent data sources to build each company assessment. No single source is relied on exclusively. When sources conflict, the analysis notes the tension.
Policy Scraping
Web scraping: fetches privacy policies, terms of service, cookie policies, DPAs, subprocessor lists, and other legal documents from the company website. Raw HTML is converted to structured markdown for analysis.
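The HTML-to-markdown step can be sketched with only the standard library. This is a minimal illustration, not the actual converter; the class and function names are invented for this example, and a production converter would handle nested markup, links, and tables.

```python
from html.parser import HTMLParser

class PolicyMarkdownParser(HTMLParser):
    """Illustrative HTML-to-markdown converter for policy pages."""
    def __init__(self):
        super().__init__()
        self.out = []
        self._tag = None

    def handle_starttag(self, tag, attrs):
        self._tag = tag

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._tag in ("h1", "h2", "h3"):
            # Map heading depth to markdown '#' level
            self.out.append("#" * int(self._tag[1]) + " " + text)
        elif self._tag == "li":
            self.out.append("- " + text)
        else:
            self.out.append(text)

def html_to_markdown(html: str) -> str:
    parser = PolicyMarkdownParser()
    parser.feed(html)
    return "\n\n".join(parser.out)
```

For example, `html_to_markdown("<h2>Data We Collect</h2><p>We collect email addresses.</p>")` yields a `##` heading followed by the paragraph text.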
Breach Monitoring
Public breach databases: checks for publicly reported data breaches within the last 24 months. Breach recency and severity are factored into the overall assessment.
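The 24-month window is stated above; how recency translates into weight is not, so the linear decay below is an assumed shape for illustration only.

```python
from datetime import date

BREACH_WINDOW_MONTHS = 24  # only breaches this recent affect the score

def months_between(earlier: date, later: date) -> int:
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)

def breach_in_scope(breach_date: date, today: date) -> bool:
    """True if a publicly reported breach falls within the 24-month window."""
    return 0 <= months_between(breach_date, today) <= BREACH_WINDOW_MONTHS

def recency_weight(breach_date: date, today: date) -> float:
    """Newer breaches weigh more; linear decay to 0 at the window edge (assumed shape)."""
    if not breach_in_scope(breach_date, today):
        return 0.0
    return 1.0 - months_between(breach_date, today) / BREACH_WINDOW_MONTHS
```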
Tracker Detection
Live page scan: scans the company website for advertising trackers, analytics services, session recording tools, social media pixels, and cookies. Results feed directly into the Tracking dimension.
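A tracker scan of this kind typically matches known third-party domains against the page source. The signature table below is a tiny illustrative subset; the real detector's domain list and matching logic are not published.

```python
# Small illustrative signature table; a real detector would use a much larger list.
TRACKER_SIGNATURES = {
    "advertising": ["doubleclick.net", "connect.facebook.net"],
    "analytics": ["google-analytics.com", "googletagmanager.com"],
    "session_recording": ["hotjar.com", "fullstory.com"],
}

def scan_page(html: str) -> dict:
    """Return tracker domains found per category by substring match on the page source."""
    found = {}
    for category, domains in TRACKER_SIGNATURES.items():
        hits = [d for d in domains if d in html]
        if hits:
            found[category] = hits
    return found
```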
AI Analysis
17 parallel extraction tasks read the actual policy text and extract structured data: what data is collected, who it is shared with, what rights are offered, retention periods, AI training practices, deletion difficulty, dark patterns, and more.
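Fanning one policy text out to many independent extraction tasks can be sketched with a thread pool. The two stub extractors below stand in for the 17 real tasks; their names and return shapes are invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Stub extractors standing in for the 17 structured-extraction tasks (names assumed).
def extract_data_collected(policy: str) -> dict:
    return {"signal": "data_collected", "found": "collect" in policy.lower()}

def extract_retention(policy: str) -> dict:
    return {"signal": "retention", "found": "retain" in policy.lower()}

EXTRACTORS = [extract_data_collected, extract_retention]  # ...15 more in the real pipeline

def run_extractions(policy_text: str) -> list[dict]:
    """Submit every extraction task in parallel and collect the structured signals."""
    with ThreadPoolExecutor(max_workers=len(EXTRACTORS)) as pool:
        futures = [pool.submit(fn, policy_text) for fn in EXTRACTORS]
        return [f.result() for f in futures]
```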
What We Extract
Signals are organized by the scoring dimension they feed into. Each signal represents a specific, verifiable claim about the company's data practices.
Data Collection
| Signal | Description |
|---|---|
| Data types collected | Biometric, health, behavioral, browsing history, location, financial, and other categories |
| Collection methods | How data is gathered (direct input, automatic tracking, third-party sources) |
| Sensitive data evidence | Direct quotes from the policy confirming collection of sensitive categories |
| Data type count | Total number of distinct data types identified, penalized when exceeding 10 |
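The data-type-count signal above penalizes collection beyond 10 distinct types. The threshold comes from the table; the per-type penalty weight below is an assumed value for illustration.

```python
DATA_TYPE_THRESHOLD = 10  # stated threshold; counts above this are penalized

def data_type_penalty(data_types: list[str], per_type: float = 2.0) -> float:
    """Penalty for the Data Collection score: one unit per distinct type past
    the threshold. The per-type weight is an illustrative assumption."""
    excess = max(0, len(set(data_types)) - DATA_TYPE_THRESHOLD)
    return excess * per_type
```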
Data Sharing
| Signal | Description |
|---|---|
| Sells personal data | Whether the company sells personal information to third parties |
| Data broker indicators | Evidence of data broker relationships or broker-like practices |
| Advertiser sharing | Whether data is shared with advertising networks or ad partners |
| Partner count and names | Number and identity of data sharing partners and advertising partners |
| Subprocessor list presence | Whether a public list of data sub-processors is maintained |
| Sharing categories | Business partners, affiliates, vendors, and other sharing recipient types |
| Processing purposes | Targeted advertising, profiling, and remarketing as stated data processing purposes |
Tracking
| Signal | Description |
|---|---|
| Advertising tracker count and names | Specific ad trackers found on the website (Google Ads, Facebook Pixel, etc.) |
| Analytics tracker count | Number of analytics services detected beyond the 3-tracker threshold |
| Session recording tools | Tools like Hotjar, FullStory, or similar session replay services |
| Cookie types | Marketing, analytics, and essential cookies detected on the site |
| DNT/GPC support | Whether the site honors Do Not Track or Global Privacy Control signals |
| Social trackers | Social media tracking pixels and share button integrations |
| Cross-device tracking | Evidence of tracking users across multiple devices |
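The scan results above feed the Tracking dimension. A plausible shape for that hand-off, using the 3-tracker analytics threshold from the table (field names are assumptions):

```python
def tracking_signals(ad_trackers: list[str],
                     analytics_trackers: list[str],
                     honors_gpc: bool) -> dict:
    """Summarize scan results into signals the Tracking dimension consumes (shape assumed)."""
    return {
        "ad_tracker_count": len(ad_trackers),
        "analytics_excess": max(0, len(analytics_trackers) - 3),  # 3-tracker threshold
        "gpc_supported": honors_gpc,
    }
```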
Transparency
| Signal | Description |
|---|---|
| Policy presence | Whether a publicly accessible privacy policy exists |
| Sections found | Number of standard policy sections identified (data collection, sharing, retention, rights, etc.) |
| Word count | Total policy length; readable policies are under 6,000 words, excessive policies exceed 10,000 |
| Retention specificity | Whether specific data retention periods are stated vs. vague language |
| Contradictions | Inconsistencies found within the policy text, counted and detailed with evidence |
| DPA published | Whether a Data Processing Agreement is publicly available |
| Purposes stated | Whether the company explicitly states its data processing purposes |
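The word-count signal uses the two thresholds stated in the table. The label for the middle band is an assumption; only "readable" (under 6,000) and "excessive" (over 10,000) are named above.

```python
READABLE_MAX = 6_000    # policies under this are considered readable
EXCESSIVE_MIN = 10_000  # policies over this are considered excessive

def length_band(word_count: int) -> str:
    """Classify total policy length; the middle-band label is assumed."""
    if word_count < READABLE_MAX:
        return "readable"
    if word_count > EXCESSIVE_MIN:
        return "excessive"
    return "long"
```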
User Rights
| Signal | Description |
|---|---|
| Rights listed | 8 recognized rights: access, deletion, correction, portability, opt-out, withdraw consent, restrict processing, object to processing |
| Request channels | Number and types of channels for exercising rights (web form, email, in-app, postal) |
| Deletion difficulty | Scored 0–4 based on barriers, requirements, and friction in the deletion process |
| Data request form | Whether a structured form exists for submitting data access or deletion requests |
| Privacy email | Whether a dedicated privacy contact email address is published |
| Appeals process | Whether users can appeal denied requests |
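The 0-4 deletion-difficulty rubric is not published; one plausible reading is a point per observed friction factor, capped at 4. Both the factor names and the one-point-each rule below are assumptions.

```python
# Illustrative friction factors; the actual rubric is internal to the extraction step.
FRICTION_FACTORS = {
    "requires_identity_document",
    "login_required",
    "postal_mail_only",
    "manual_review_delay",
}

def deletion_difficulty(observed: set[str]) -> int:
    """One point per recognized friction factor, capped at the 0-4 scale (assumed rule)."""
    return min(4, len(observed & FRICTION_FACTORS))
```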
AI Practices
| Signal | Description |
|---|---|
| Usage disclosure | Whether AI usage is fully disclosed, partially disclosed, or not disclosed (yes/partial/no) |
| Training on user data | Whether personal user data is used to train AI models |
| Training on interactions | Whether user interactions (clicks, queries, behavior) are used for AI training |
| Training on public content | Whether publicly posted user content is used for AI training |
| Third-party AI sharing | Whether data is shared with external AI providers, and whether this is disclosed |
| Automated decisions | Whether AI is used for automated decision-making that affects users |
| Opt-out availability | Whether users can opt out of AI training on their data |
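The yes/partial/no disclosure signal and the boolean signals above might be assembled like this; the enum values match the table, but the field names are assumptions.

```python
from enum import Enum

class Disclosure(Enum):
    YES = "yes"          # AI usage explicitly disclosed
    PARTIAL = "partial"  # mentioned, but incompletely
    NO = "no"            # no disclosure found

def ai_signals(disclosure: Disclosure,
               trains_on_user_data: bool,
               opt_out: bool) -> dict:
    """Assemble the AI Practices signals (field names assumed)."""
    return {
        "usage_disclosure": disclosure.value,
        "trains_on_user_data": trains_on_user_data,
        "opt_out_available": opt_out,
    }
```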
Evidence-Based
Every signal extracted during analysis includes evidence: direct quotes from the policy text that support the finding. This means users can verify each assessment themselves by reading the original source material.
When the AI extraction identifies a practice (for example, that a company sells personal data), the specific passage from the privacy policy is preserved alongside the structured signal. Evidence is displayed on company profiles so that the assessment is never a black box.
Scoring factors also include evidence. When a penalty or bonus is applied to a dimension score, the factor entry records both a human-readable description and, where available, the supporting details or policy quote that triggered it.
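A signal-with-evidence record of the kind described above can be modeled as a small dataclass. The field names and shape here are assumptions for illustration; the quote is an invented example of the kind of passage that would be preserved.

```python
from dataclasses import dataclass, field

@dataclass
class Signal:
    """A structured finding plus the policy quotes that support it (shape assumed)."""
    name: str
    value: bool
    evidence: list[str] = field(default_factory=list)  # direct quotes from the policy

# Hypothetical example: a data-selling finding kept alongside its supporting passage
sells = Signal(
    name="sells_personal_data",
    value=True,
    evidence=["We may sell your personal information to third parties."],
)
```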
Analysis Pipeline
Each company assessment follows a five-stage pipeline. The entire process runs asynchronously via background jobs.
For a deeper look at each stage, see How Analysis Works.
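A skeletal view of a five-stage async pipeline. The stage names here are an assumption inferred from the four data sources plus a final scoring step; the real stage breakdown is described in How Analysis Works.

```python
import asyncio

async def run_stage(name: str, company: str) -> str:
    await asyncio.sleep(0)  # stand-in for real async work (HTTP fetches, model calls)
    return f"{name}({company})"

async def run_pipeline(company: str) -> list[str]:
    """Run five stages in sequence; each stage is itself async, as in a
    background-job runner. Stage names are assumed for illustration."""
    stages = ["scrape_policies", "check_breaches", "scan_trackers",
              "extract_signals", "score"]
    return [await run_stage(s, company) for s in stages]

log = asyncio.run(run_pipeline("example.com"))
```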
Limitations
Signals are derived from publicly available documents and infrastructure. Internal data handling practices, employee training, and unpublished policies are not captured. Companies without a published privacy policy receive zeroed scores for the Data Collection, Data Sharing, and Tracking dimensions to avoid false positives from missing data.
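The missing-policy rule above is simple enough to state directly in code; the dimension keys below are illustrative names for the three affected dimensions.

```python
# Dimensions zeroed when no privacy policy is published (key names assumed)
ZEROED_ON_MISSING_POLICY = ("data_collection", "data_sharing", "tracking")

def apply_missing_policy_rule(scores: dict, policy_found: bool) -> dict:
    """Zero the three policy-dependent dimensions when no policy was published,
    so missing data cannot produce false-positive findings."""
    if policy_found:
        return scores
    return {dim: (0 if dim in ZEROED_ON_MISSING_POLICY else s)
            for dim, s in scores.items()}
```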
AI extraction is automated and conservative. When the system is uncertain about a finding, it records the uncertainty rather than making an unsupported claim. All signals should be considered one input among many when evaluating a company's privacy practices.