Signals & Data Sources
Evidence-based analysis built on publicly observable data. Every signal is traceable to its source.
Data Sources
PrivacyFetch draws from four independent data sources to build each company assessment. No single source is relied on exclusively. When sources conflict, the analysis notes the tension.
Policy Scraping
Web scraping: fetches privacy policies, terms of service, cookie policies, DPAs, subprocessor lists, and other legal documents from the company website. Raw HTML is converted to structured markdown for analysis.
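The HTML-to-markdown step can be sketched with only the standard library. This is a minimal illustration, not the actual converter; the class and function names are invented for this example, and a production converter would handle nested markup, links, and tables.

```python
from html.parser import HTMLParser

class PolicyMarkdownParser(HTMLParser):
    """Illustrative HTML-to-markdown converter for policy pages."""
    def __init__(self):
        super().__init__()
        self.out = []
        self._tag = None

    def handle_starttag(self, tag, attrs):
        self._tag = tag

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._tag in ("h1", "h2", "h3"):
            # Map heading depth to markdown '#' level
            self.out.append("#" * int(self._tag[1]) + " " + text)
        elif self._tag == "li":
            self.out.append("- " + text)
        else:
            self.out.append(text)

def html_to_markdown(html: str) -> str:
    parser = PolicyMarkdownParser()
    parser.feed(html)
    return "\n\n".join(parser.out)
```

For example, `html_to_markdown("<h2>Data We Collect</h2><p>We collect email addresses.</p>")` yields a `##` heading followed by the paragraph text.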
Breach Monitoring
Public breach databases: checks for publicly reported data breaches within the last 24 months. Breach recency and severity are factored into the overall assessment.
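The 24-month window is stated above; how recency translates into weight is not, so the linear decay below is an assumed shape for illustration only.

```python
from datetime import date

BREACH_WINDOW_MONTHS = 24  # only breaches this recent affect the score

def months_between(earlier: date, later: date) -> int:
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)

def breach_in_scope(breach_date: date, today: date) -> bool:
    """True if a publicly reported breach falls within the 24-month window."""
    return 0 <= months_between(breach_date, today) <= BREACH_WINDOW_MONTHS

def recency_weight(breach_date: date, today: date) -> float:
    """Newer breaches weigh more; linear decay to 0 at the window edge (assumed shape)."""
    if not breach_in_scope(breach_date, today):
        return 0.0
    return 1.0 - months_between(breach_date, today) / BREACH_WINDOW_MONTHS
```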
Tracker Detection
Live page scan: scans the company website for advertising trackers, analytics services, session recording tools, social media pixels, and cookies. Results feed directly into the Tracking dimension.
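A tracker scan of this kind typically matches known third-party domains against the page source. The signature table below is a tiny illustrative subset; the real detector's domain list and matching logic are not published.

```python
# Small illustrative signature table; a real detector would use a much larger list.
TRACKER_SIGNATURES = {
    "advertising": ["doubleclick.net", "connect.facebook.net"],
    "analytics": ["google-analytics.com", "googletagmanager.com"],
    "session_recording": ["hotjar.com", "fullstory.com"],
}

def scan_page(html: str) -> dict:
    """Return tracker domains found per category by substring match on the page source."""
    found = {}
    for category, domains in TRACKER_SIGNATURES.items():
        hits = [d for d in domains if d in html]
        if hits:
            found[category] = hits
    return found
```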
AI Analysis
17 parallel extraction tasks read the actual policy text and extract structured data: what data is collected, who it is shared with, what rights are offered, retention periods, AI training practices, deletion difficulty, dark patterns, and more.
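Fanning one policy text out to many independent extraction tasks can be sketched with a thread pool. The two stub extractors below stand in for the 17 real tasks; their names and return shapes are invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Stub extractors standing in for the 17 structured-extraction tasks (names assumed).
def extract_data_collected(policy: str) -> dict:
    return {"signal": "data_collected", "found": "collect" in policy.lower()}

def extract_retention(policy: str) -> dict:
    return {"signal": "retention", "found": "retain" in policy.lower()}

EXTRACTORS = [extract_data_collected, extract_retention]  # ...15 more in the real pipeline

def run_extractions(policy_text: str) -> list[dict]:
    """Submit every extraction task in parallel and collect the structured signals."""
    with ThreadPoolExecutor(max_workers=len(EXTRACTORS)) as pool:
        futures = [pool.submit(fn, policy_text) for fn in EXTRACTORS]
        return [f.result() for f in futures]
```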
What We Extract
Signals are organized by the scoring dimension they feed into. Each signal represents a specific, verifiable claim about the company's data practices.
Data Collection
| Signal | Description |
|---|---|
| Data types collected | Biometric, health, behavioral, browsing history, location, financial, and other categories |
| Collection methods | How data is gathered (direct input, automatic tracking, third-party sources) |
| Sensitive data evidence | Direct quotes from the policy confirming collection of sensitive categories |
| Data type count | Total number of distinct data types identified, penalized when exceeding 10 |
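The data-type-count signal above penalizes collection beyond 10 distinct types. The threshold comes from the table; the per-type penalty weight below is an assumed value for illustration.

```python
DATA_TYPE_THRESHOLD = 10  # stated threshold; counts above this are penalized

def data_type_penalty(data_types: list[str], per_type: float = 2.0) -> float:
    """Penalty for the Data Collection score: one unit per distinct type past
    the threshold. The per-type weight is an illustrative assumption."""
    excess = max(0, len(set(data_types)) - DATA_TYPE_THRESHOLD)
    return excess * per_type
```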
Data Sharing
| Signal | Description |
|---|---|
| Sells personal data | Whether the company sells personal information to third parties |
| Data broker indicators | Evidence of data broker relationships or broker-like practices |
| Advertiser sharing | Whether data is shared with advertising networks or ad partners |
| Partner count and names | Number and identity of data sharing partners and advertising partners |
| Subprocessor list presence | Whether a public list of data sub-processors is maintained |
| Sharing categories | Business partners, affiliates, vendors, and other sharing recipient types |
| Processing purposes | Targeted advertising, profiling, and remarketing as stated data processing purposes |
Tracking
| Signal | Description |
|---|---|
| Advertising tracker count and names | Specific ad trackers found on the website (Google Ads, Facebook Pixel, etc.) |
| Analytics tracker count | Number of analytics services detected beyond the 3-tracker threshold |
| Session recording tools | Tools like Hotjar, FullStory, or similar session replay services |
| Cookie types | Marketing, analytics, and essential cookies detected on the site |
| DNT/GPC support | Whether the site honors Do Not Track or Global Privacy Control signals |
| Social trackers | Social media tracking pixels and share button integrations |
| Cross-device tracking | Evidence of tracking users across multiple devices |
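The scan results above feed the Tracking dimension. A plausible shape for that hand-off, using the 3-tracker analytics threshold from the table (field names are assumptions):

```python
def tracking_signals(ad_trackers: list[str],
                     analytics_trackers: list[str],
                     honors_gpc: bool) -> dict:
    """Summarize scan results into signals the Tracking dimension consumes (shape assumed)."""
    return {
        "ad_tracker_count": len(ad_trackers),
        "analytics_excess": max(0, len(analytics_trackers) - 3),  # 3-tracker threshold
        "gpc_supported": honors_gpc,
    }
```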
Transparency
| Signal | Description |
|---|---|
| Policy presence | Whether a publicly accessible privacy policy exists |
| Sections found | Number of standard policy sections identified (data collection, sharing, retention, rights, etc.) |
| Word count | Total policy length; readable policies are under 6,000 words, excessive policies exceed 10,000 |
| Retention specificity | Whether specific data retention periods are stated vs. vague language |
| Contradictions | Inconsistencies found within the policy text, counted and detailed with evidence |
| DPA published | Whether a Data Processing Agreement is publicly available |
| Purposes stated | Whether the company explicitly states its data processing purposes |
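The word-count signal uses the two thresholds stated in the table. The label for the middle band is an assumption; only "readable" (under 6,000) and "excessive" (over 10,000) are named above.

```python
READABLE_MAX = 6_000    # policies under this are considered readable
EXCESSIVE_MIN = 10_000  # policies over this are considered excessive

def length_band(word_count: int) -> str:
    """Classify total policy length; the middle-band label is assumed."""
    if word_count < READABLE_MAX:
        return "readable"
    if word_count > EXCESSIVE_MIN:
        return "excessive"
    return "long"
```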
User Rights
| Signal | Description |
|---|---|
| Rights listed | 8 recognized rights: access, deletion, correction, portability, opt-out, withdraw consent, restrict processing, object to processing |
| Request channels | Number and types of channels for exercising rights (web form, email, in-app, postal) |
| Deletion difficulty | Scored 0–4 based on barriers, requirements, and friction in the deletion process |
| Data request form | Whether a structured form exists for submitting data access or deletion requests |
| Privacy email | Whether a dedicated privacy contact email address is published |
| Appeals process | Whether users can appeal denied requests |
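The 0-4 deletion-difficulty rubric is not published; one plausible reading is a point per observed friction factor, capped at 4. Both the factor names and the one-point-each rule below are assumptions.

```python
# Illustrative friction factors; the actual rubric is internal to the extraction step.
FRICTION_FACTORS = {
    "requires_identity_document",
    "login_required",
    "postal_mail_only",
    "manual_review_delay",
}

def deletion_difficulty(observed: set[str]) -> int:
    """One point per recognized friction factor, capped at the 0-4 scale (assumed rule)."""
    return min(4, len(observed & FRICTION_FACTORS))
```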
AI Practices
| Signal | Description |
|---|---|
| Usage disclosure | Whether AI usage is fully disclosed, partially disclosed, or not disclosed (yes/partial/no) |
| Training on user data | Whether personal user data is used to train AI models |
| Training on interactions | Whether user interactions (clicks, queries, behavior) are used for AI training |
| Training on public content | Whether publicly posted user content is used for AI training |
| Third-party AI sharing | Whether data is shared with external AI providers, and whether this is disclosed |
| Automated decisions | Whether AI is used for automated decision-making that affects users |
| Opt-out availability | Whether users can opt out of AI training on their data |
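The yes/partial/no disclosure signal and the boolean signals above might be assembled like this; the enum values match the table, but the field names are assumptions.

```python
from enum import Enum

class Disclosure(Enum):
    YES = "yes"          # AI usage explicitly disclosed
    PARTIAL = "partial"  # mentioned, but incompletely
    NO = "no"            # no disclosure found

def ai_signals(disclosure: Disclosure,
               trains_on_user_data: bool,
               opt_out: bool) -> dict:
    """Assemble the AI Practices signals (field names assumed)."""
    return {
        "usage_disclosure": disclosure.value,
        "trains_on_user_data": trains_on_user_data,
        "opt_out_available": opt_out,
    }
```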
Evidence-Based
Every signal extracted during analysis includes evidence: direct quotes from the policy text that support the finding. This means users can verify each assessment themselves by reading the original source material.
When the AI extraction identifies a practice (for example, that a company sells personal data), the specific passage from the privacy policy is preserved alongside the structured signal. Evidence is displayed on company profiles so that the assessment is never a black box.
Scoring factors also include evidence. When a penalty or bonus is applied to a dimension score, the factor entry records both a human-readable description and, where available, the supporting details or policy quote that triggered it.
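A signal-with-evidence record of the kind described above can be modeled as a small dataclass. The field names and shape here are assumptions for illustration; the quote is an invented example of the kind of passage that would be preserved.

```python
from dataclasses import dataclass, field

@dataclass
class Signal:
    """A structured finding plus the policy quotes that support it (shape assumed)."""
    name: str
    value: bool
    evidence: list[str] = field(default_factory=list)  # direct quotes from the policy

# Hypothetical example: a data-selling finding kept alongside its supporting passage
sells = Signal(
    name="sells_personal_data",
    value=True,
    evidence=["We may sell your personal information to third parties."],
)
```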
Analysis Pipeline
Each company assessment follows a five-stage pipeline. The entire process runs asynchronously via background jobs.
For a deeper look at each stage, see How Analysis Works.
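A skeletal view of a five-stage async pipeline. The stage names here are an assumption inferred from the four data sources plus a final scoring step; the real stage breakdown is described in How Analysis Works.

```python
import asyncio

async def run_stage(name: str, company: str) -> str:
    await asyncio.sleep(0)  # stand-in for real async work (HTTP fetches, model calls)
    return f"{name}({company})"

async def run_pipeline(company: str) -> list[str]:
    """Run five stages in sequence; each stage is itself async, as in a
    background-job runner. Stage names are assumed for illustration."""
    stages = ["scrape_policies", "check_breaches", "scan_trackers",
              "extract_signals", "score"]
    return [await run_stage(s, company) for s in stages]

log = asyncio.run(run_pipeline("example.com"))
```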
Limitations
Signals are derived from publicly available documents and infrastructure. Internal data handling practices, employee training, and unpublished policies are not captured. Companies without a published privacy policy receive zeroed scores for the Data Collection, Data Sharing, and Tracking dimensions to avoid false positives from missing data.
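The missing-policy rule above is simple enough to state directly in code; the dimension keys below are illustrative names for the three affected dimensions.

```python
# Dimensions zeroed when no privacy policy is published (key names assumed)
ZEROED_ON_MISSING_POLICY = ("data_collection", "data_sharing", "tracking")

def apply_missing_policy_rule(scores: dict, policy_found: bool) -> dict:
    """Zero the three policy-dependent dimensions when no policy was published,
    so missing data cannot produce false-positive findings."""
    if policy_found:
        return scores
    return {dim: (0 if dim in ZEROED_ON_MISSING_POLICY else s)
            for dim, s in scores.items()}
```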
AI extraction is automated and conservative. When the system is uncertain about a finding, it records the uncertainty rather than making an unsupported claim. All signals should be considered one input among many when evaluating a company's privacy practices.