Personally Identifiable Information

PII covers more than names and ID numbers. This article explains what counts as personally identifiable information under GDPR, DPDP, and CCPA, and why classification accuracy matters.

What is PII (Personally Identifiable Information)?

PII, or Personally Identifiable Information, is any information that can be used to identify a specific individual, either on its own or when combined with other data. It's the category of data that privacy regulations most explicitly govern, that breach notification obligations most directly apply to, and that data classification programmes must identify with the highest accuracy.

The definition sounds straightforward. In practice, identifying what counts as PII is more nuanced than most compliance checklists suggest, because identifiability depends on context, combination, and the state of available linkage methods, not just on the data type itself.

The two categories of PII

Security and compliance practitioners distinguish two functional categories that differ in how directly they enable identification.

Direct identifiers can identify an individual without any additional information. A full name combined with a date of birth. A government-issued national ID number. A social security number. A passport number. A biometric record such as a fingerprint or facial recognition template. An email address that includes a recognisable name. These elements don't require any additional context to enable identification: they point to a specific person on their own.

Indirect identifiers can identify an individual when combined with other available information. An IP address alone may not identify a specific person, but combined with a timestamp and a username in an access log, it enables identification. A job title and employer combination might not uniquely identify someone in a large organisation, but combined with a start date and department it can. A device ID is not inherently personal data, but tied to an authenticated user account it becomes PII. Geographic coordinates accurate to a building or room, especially when associated with time patterns, can identify an individual's home or workplace.

The distinction matters for classification because many organisations treat only direct identifiers as PII and miss the indirect identifiers that, in combination, create the same identification risk. A database containing IP addresses, timestamps, and session identifiers may not look like a PII dataset when any column is examined in isolation. Evaluated together, it contains detailed records of individual user behaviour that are unambiguously PII under GDPR and most equivalent frameworks.
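A minimal Python sketch of evaluating columns together rather than in isolation. The tag sets, the function name `assess_columns`, and the two-field threshold are illustrative assumptions, not part of any framework or product.

```python
# Hypothetical tag sets for illustration; a real data inventory would
# maintain its own, reviewed against the applicable regulations.
DIRECT_IDENTIFIERS = {"full_name", "national_id", "passport_number", "email"}
INDIRECT_IDENTIFIERS = {"ip_address", "timestamp", "session_id", "device_id", "username"}


def assess_columns(columns):
    """Classify a table by looking at its columns as a set, not one at a time."""
    present = set(columns)
    if present & DIRECT_IDENTIFIERS:
        return "pii-direct"
    indirect = present & INDIRECT_IDENTIFIERS
    if len(indirect) >= 2:
        # Assumed threshold: two or more linkable fields are treated as PII.
        return "pii-by-combination"
    if indirect:
        return "needs-review"
    return "no-identifiers-found"


print(assess_columns(["ip_address", "timestamp", "session_id"]))  # pii-by-combination
print(assess_columns(["event_type"]))                             # no-identifiers-found
```

A fixed threshold is crude. The point is only that the decision is made on the column set as a whole.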

PII examples across data types

The scope of PII is broader than most informal lists suggest. Compliance teams maintaining data inventories should account for all of these categories.

Identity documents and government identifiers

Social security numbers, national ID numbers, passport numbers, driver's licence numbers, tax identification numbers, voter registration numbers.

Contact and locational data

Full names, home addresses, email addresses, phone numbers, GPS coordinates, IP addresses (in many jurisdictions), precise location data, home and work location patterns over time.

Biometric and physical characteristics

Fingerprints, voiceprints, facial recognition data, iris scans, retinal scans, DNA profiles, physical descriptions specific enough to identify an individual.

Financial identifiers

Bank account numbers, credit and debit card numbers, credit history records when linked to an individual, account balances associated with a named individual.

Digital identifiers

Usernames, account IDs, device identifiers (IMEI, MAC address, device fingerprints), cookies when persistent and linked to an individual's behaviour, user agent strings combined with other identifying data.

Demographic combinations

Date of birth alone is not typically PII, but combined with a postal code and gender it can uniquely identify a significant proportion of individuals in a population, a well-documented re-identification risk. Age range, income bracket, occupation, and other demographic data become PII when their combination is granular enough to identify specific individuals.
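One way to make "granular enough" concrete is to measure how many records are unique on a chosen combination of fields. The sketch below uses made-up records, and the helper name `unique_fraction` is an assumption for illustration.

```python
from collections import Counter


def unique_fraction(records, quasi_identifiers):
    """Return the share of records that are unique on the given fields."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    unique = sum(1 for k in keys if counts[k] == 1)
    return unique / len(keys) if keys else 0.0


# Fabricated sample rows, for illustration only.
records = [
    {"dob": "1990-04-12", "postcode": "560001", "gender": "F"},
    {"dob": "1990-04-12", "postcode": "560001", "gender": "M"},
    {"dob": "1985-11-30", "postcode": "560001", "gender": "M"},
    {"dob": "1985-11-30", "postcode": "560001", "gender": "M"},
]

print(unique_fraction(records, ["postcode"]))                   # 0.0
print(unique_fraction(records, ["dob", "postcode", "gender"]))  # 0.5
```

No single field singles anyone out in this sample, yet the three fields together make half the records unique.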

Professional and educational records

Employee records including performance reviews, compensation data, disciplinary records, CV and employment history, educational transcripts and records.

How PII is defined across major regulatory frameworks

The definition of PII isn't universal. Different frameworks define it with different scope and different emphasis, which creates compliance complexity for organisations operating across multiple jurisdictions.

GDPR (EU General Data Protection Regulation) uses the term "personal data" rather than PII, defined as "any information relating to an identified or identifiable natural person." The emphasis on "identifiable" is significant: GDPR captures not just data that currently identifies someone, but data that could identify them if combined with other reasonably available information. GDPR also identifies "special categories" of personal data requiring heightened protection: racial or ethnic origin, political opinions, religious or philosophical beliefs, trade union membership, genetic data, biometric data used to uniquely identify a person, health data, and data concerning a person's sex life or sexual orientation.

DPDP (India's Digital Personal Data Protection Act, 2023) defines "personal data" as "any data about an individual who is identifiable by or in relation to such data." Like GDPR, this is a broad functional definition rather than a list. Unlike GDPR, the Act as enacted does not define a separate "sensitive personal data" subcategory. Earlier drafts of the law did, listing financial data, health and medical data, official identifiers, sex life, sexual orientation, biometrics, and religious or political beliefs, but under the enacted text these fall under the general definition and carry the same obligations as other personal data. DPDP also differs from GDPR in other ways: universal breach notification requirements that differ from GDPR's risk-based threshold, different consent mechanisms, and India-specific requirements around data fiduciaries and data processors.

CCPA (California Consumer Privacy Act) defines personal information as "information that identifies, relates to, describes, is reasonably capable of being associated with, or could reasonably be linked, directly or indirectly, with a particular consumer or household." CCPA extends the concept to households, which GDPR doesn't, and explicitly includes inferences drawn from personal information to create profiles about individuals.

HIPAA (Health Insurance Portability and Accountability Act) specifically governs Protected Health Information (PHI), which is a subset of PII that connects health conditions or treatment to an identifiable individual. HIPAA provides a specific list of 18 identifiers that make health information individually identifiable, including names, geographic subdivisions smaller than a state, dates more specific than year, phone numbers, email addresses, and others.

The practical implication: an organisation that processes US health records alongside personal data from EU, Indian, and Californian residents operates under all four frameworks at once, with overlapping but non-identical definitions of what requires protection.

Why PII classification accuracy matters

A PII dataset that isn't correctly classified receives no classification-driven protection. DLP policies don't enforce against unclassified data. Access reviews don't flag over-permissioned access to unclassified tables. Risk scores don't prioritise misconfigurations that expose unclassified data.

That gap is where classification accuracy produces material security and compliance differences.

Consider a practical scenario. An analytics database contains a table named user_events with columns for session_id, timestamp, event_type, and user_agent. No column name contains obvious PII identifiers. A pattern-matching classification rule looking for names, email addresses, or government ID formats finds nothing and classifies the table as non-sensitive.

Semantically, however, this table is PII. The session_id column contains persistent identifiers tied to authenticated user accounts. Combined with timestamp data, it creates a detailed behavioural record of each user's activity on the platform. Under GDPR's definition, this is personal data. Under DPDP's definition, this is personal data. Under CCPA's definition, this is personal information.

A semantic classification engine evaluates the surrounding context: the table is in a production analytics environment, the session_id column has a consistent format tied to the authentication system, the timestamps span months of continuous usage. The combination is user behaviour data linked to persistent identifiers: PII. A rule-based classifier sees individual columns that don't match any PII pattern and misses it entirely.

That false negative has real consequences: no DLP policy, no access review, no breach scope inclusion, no entry in the regulatory inventory. The data sits outside governance as completely as if it had never been discovered at all.
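The user_events scenario above can be sketched in Python as follows. Both classifiers are deliberately simplified, and `linked_to_auth` is a hypothetical input: a set of column names that lineage or schema metadata shows to be tied to authenticated accounts. A real semantic engine would use far richer context.

```python
import re

# Pattern for column names that look like obvious identifiers (illustrative).
OBVIOUS_PII = re.compile(r"(name|email|ssn|passport|phone)", re.IGNORECASE)


def rule_based(columns):
    """Flag a table only if some column name matches an obvious PII pattern."""
    return any(OBVIOUS_PII.search(c) for c in columns)


def context_aware(columns, linked_to_auth):
    """Also flag tables that pair a persistent, account-linked ID with timestamps."""
    if rule_based(columns):
        return True
    has_persistent_id = any(c in linked_to_auth for c in columns)
    has_time_series = "timestamp" in columns
    return has_persistent_id and has_time_series


user_events = ["session_id", "timestamp", "event_type", "user_agent"]

print(rule_based(user_events))                     # False
print(context_aware(user_events, {"session_id"}))  # True
```

The only difference between the two results is the context supplied about session_id. The column names are identical in both calls.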

PII vs PHI vs sensitive personal data

These three terms are frequently conflated.

PII is the broadest category: any information that can identify an individual. It's the umbrella term used primarily in US regulatory contexts.

PHI (Protected Health Information) is a specific regulated subcategory of PII: information that connects a health condition, treatment, or healthcare record to an identifiable individual. PHI is PII with healthcare context that triggers HIPAA's specific requirements. Not all PII is PHI. All PHI is PII.

Sensitive personal data is PII that carries heightened protection requirements due to the specific harm its exposure could cause. GDPR calls these "special categories" of personal data: racial or ethnic origin, health data, biometrics, genetic data, religious beliefs, sexual orientation, political opinions. Processing such data generally requires explicit consent or another specific legal basis, and its exposure triggers more significant regulatory consequences than exposure of standard PII.

How to protect PII

Protection of PII requires five operational capabilities working together.

Discovery

Finding where PII exists, including shadow data copies and development environments. You can't protect what you don't know about.

Classification

Accurately labelling PII assets with high enough confidence to drive enforcement. Misclassified PII receives no protection.

Access governance

Ensuring only identities with legitimate business need can access PII, and that access is reviewed and revoked when circumstances change.

DLP enforcement

Preventing PII from crossing egress boundaries to unauthorised destinations through policy enforcement at email, web, cloud, and endpoint channels.

Continuous monitoring

Detecting changes to PII access patterns, configuration, or exposure that indicate risk has increased since the last assessment.
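The dependency between classification and enforcement can be shown with a short Python sketch. The `Asset` type, the label value "pii", and the destination names are assumptions made for illustration, not the behaviour of any particular DLP product.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical set of destinations approved to receive PII.
APPROVED_DESTINATIONS = {"internal-warehouse"}


@dataclass
class Asset:
    name: str
    label: Optional[str]  # "pii" once classified, None if unclassified


def egress_decision(asset: Asset, destination: str) -> str:
    """Block PII-labelled assets from leaving to unapproved destinations."""
    if asset.label == "pii" and destination not in APPROVED_DESTINATIONS:
        return "block"
    # Unlabelled assets fall through: the policy has nothing to enforce on.
    return "allow"


customers = Asset("customers", label="pii")
user_events = Asset("user_events", label=None)  # PII in substance, but unlabelled

print(egress_decision(customers, "personal-email"))    # block
print(egress_decision(user_events, "personal-email"))  # allow
```

The second result is the failure mode described under Classification: the policy logic is sound, but it never fires because the label is missing.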

Frequently asked questions

What is PII?

PII (Personally Identifiable Information) is any information that can be used to identify a specific individual, either directly or when combined with other available information. It includes direct identifiers like names, government ID numbers, and biometrics, as well as indirect identifiers like IP addresses, device IDs, and demographic combinations that enable identification through linkage.

What are examples of PII?

Direct PII examples: full name, social security number, passport number, email address, home address, phone number, biometric data. Indirect PII examples: IP address combined with a timestamp, device identifier linked to a user account, demographic combinations granular enough to identify individuals, persistent session IDs tied to user behaviour records.

What is the difference between PII and sensitive personal data?

PII is any information that can identify an individual. Sensitive personal data is a subcategory of PII that carries heightened protection requirements due to the specific harm its exposure could cause: racial or ethnic origin, health data, biometric data, genetic data, religious beliefs, sexual orientation, and political opinions. GDPR designates these as "special categories" subject to additional consent and processing requirements.

What is the difference between PII and PHI?

PHI (Protected Health Information) is a regulated subcategory of PII that specifically connects a health condition, treatment, or healthcare record to an identifiable individual. PHI triggers HIPAA's requirements in the United States. All PHI is PII. Not all PII is PHI.

Is an IP address PII?

It depends on jurisdiction and context. Under GDPR, an IP address is personal data when it can be combined with other available information to identify an individual. US FTC guidance treats persistent identifiers such as static IP addresses as personally identifiable when they can reasonably be linked to a particular person or device. Under some narrower US legal interpretations, a standalone IP address without additional linkage may not qualify. In practice, security and compliance programmes should treat IP addresses as PII, particularly when stored in access logs alongside usernames, session IDs, or other identifiers.
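A rough Python sketch of spotting the higher-risk case, an IP address logged alongside another identifier. The log format and the `user=` and `session=` markers are assumptions, and IP validation uses the standard library `ipaddress` module.

```python
import ipaddress


def find_ips(line):
    """Return every whitespace-separated token that parses as an IP address."""
    found = []
    for token in line.split():
        try:
            found.append(str(ipaddress.ip_address(token.strip("[](),;"))))
        except ValueError:
            continue  # not an IP address
    return found


def ip_linked_to_identifier(line, markers=("user=", "session=")):
    """True when a line holds both an IP address and an assumed identifier marker."""
    return bool(find_ips(line)) and any(m in line for m in markers)


# Made-up log lines using a documentation-range IP address.
print(ip_linked_to_identifier("2026-05-01T10:00:00Z 203.0.113.7 user=alice GET /account"))  # True
print(ip_linked_to_identifier("2026-05-01T10:00:01Z 203.0.113.7 GET /health"))              # False
```

A False result here does not mean the line is free of PII. It only means this particular linkage was not found on that line.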

Published May 1, 2026