Protected Health Information

PHI is health information linked to an individual's identity. Learn what HIPAA's 18 identifiers cover, who must comply, and what a breach actually requires.

What is PHI (Protected Health Information)?

PHI, or Protected Health Information, is individually identifiable health information that is created, received, maintained, or transmitted by a HIPAA-covered entity or its business associates. It's the data category at the heart of HIPAA compliance: the specific intersection of health information and individual identity that triggers the regulation's strict handling requirements.

The definition has two components that must both be present. Health information alone isn't PHI if it can't be connected to a specific individual. An individual's identity alone isn't PHI without a health dimension. It's the combination, information that connects a health condition, treatment, payment, or care record to a specific identifiable person, that creates PHI and triggers HIPAA obligations.
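The two-part test can be sketched in a few lines of Python. This is an illustration only: the `Record` type and its field names are assumptions made for the example, not part of HIPAA or any product API, and a real determination also depends on who holds the data.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    # Hypothetical structure: the identifying and health-related fields
    # a record carries. The names are illustrative only.
    identifiers: set[str] = field(default_factory=set)
    health_fields: set[str] = field(default_factory=set)


def looks_like_phi(record: Record) -> bool:
    """Both components must be present: identity AND a health dimension."""
    return bool(record.identifiers) and bool(record.health_fields)


lab_result = Record(identifiers={"name", "medical_record_number"},
                    health_fields={"diagnosis"})
aggregate_stat = Record(health_fields={"diagnosis"})   # health data, no identity
mailing_list = Record(identifiers={"name", "email"})   # identity, no health data
```

Only `lab_result` carries both components, so it is the only one of the three that the check flags.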

HIPAA's 18 identifiers: what makes health information individually identifiable

HIPAA's Safe Harbor method for de-identification lists 18 identifiers whose presence makes health information individually identifiable. If health information held by a covered entity or business associate contains any of these identifiers, it's PHI. Removing all 18, with no actual knowledge that the remaining data could identify the individual, satisfies the Safe Harbor de-identification standard and takes the data out of HIPAA's scope.

The 18 identifiers are:

Names.
All geographic subdivisions smaller than a state, including street address, city, county, precinct, zip code.
All dates except year, including birth dates, admission dates, discharge dates, death dates, and all ages over 89.
Telephone numbers.
Fax numbers.
Email addresses.
Social security numbers.
Medical record numbers.
Health plan beneficiary numbers.
Account numbers.
Certificate or licence numbers.
Vehicle identifiers and serial numbers including licence plates.
Device identifiers and serial numbers.
Web URLs.
IP addresses.
Biometric identifiers including fingerprints and voiceprints.
Full-face photographic images and comparable images.
Any other unique identifying number, characteristic, or code.

That last category, "any other unique identifying number, characteristic, or code", is the catch-all that prevents de-identification from being a mechanical checklist exercise. It requires judgement about whether any remaining data element could enable re-identification. A rare diagnosis in combination with a geographic region and an age range, even after the listed 18 identifiers have been removed, may still constitute PHI if the combination uniquely identifies an individual in that population.
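A minimal sketch of the mechanical part of Safe Harbor de-identification, for a flat record, might look like the following. The field names are hypothetical and would need mapping to a real schema. The code only drops or generalises listed identifiers. It makes no attempt at the catch-all judgement described above.

```python
from datetime import date

# Fields dropped outright. Key names are hypothetical, not a standard schema.
DIRECT_IDENTIFIER_FIELDS = {
    "name", "street_address", "city", "county", "zip_code",
    "phone", "fax", "email", "ssn", "medical_record_number",
    "health_plan_beneficiary_number", "account_number",
    "licence_number", "vehicle_id", "device_serial", "url",
    "ip_address", "biometric_id", "photo",
}
DATE_FIELDS = {"birth_date", "admission_date", "discharge_date", "death_date"}


def strip_listed_identifiers(record: dict) -> dict:
    """Remove or generalise the listed identifiers in a flat record.

    Simplification: a fuller version would also suppress years that
    reveal an age over 89 (for example a birth year).
    """
    cleaned = {}
    for key, value in record.items():
        if key in DIRECT_IDENTIFIER_FIELDS:
            continue  # drop the field entirely
        if key in DATE_FIELDS and isinstance(value, date):
            cleaned[key.replace("_date", "_year")] = value.year  # keep year only
        elif key == "age" and isinstance(value, int) and value > 89:
            cleaned[key] = "90+"  # ages over 89 collapse into one band
        else:
            cleaned[key] = value
    return cleaned


record = {
    "name": "Jane Doe",
    "zip_code": "94110",
    "state": "CA",
    "admission_date": date(2024, 3, 14),
    "age": 93,
    "diagnosis": "rare condition X",
}
result = strip_listed_identifiers(record)
```

The output still holds a rare diagnosis, a state, and an age band. Whether that combination singles out one person is exactly the judgement the code cannot make.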

PHI vs ePHI: the electronic subset

ePHI (Electronic Protected Health Information) is PHI that is created, received, maintained, or transmitted electronically. It's a subset of PHI, not a separate category.

The distinction matters because HIPAA's Security Rule specifically governs ePHI, while the Privacy Rule governs all PHI regardless of format. Paper records containing PHI fall under the Privacy Rule. Electronic records fall under both the Privacy Rule and the Security Rule.
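As a compact restatement of that split, the function below (a hypothetical helper, not part of any HIPAA tooling) returns which of the two rules would apply to a given record.

```python
def applicable_rules(is_phi: bool, is_electronic: bool) -> list[str]:
    """Return which of the two rules discussed above apply to a record."""
    if not is_phi:
        return []                      # neither rule applies to non-PHI
    rules = ["Privacy Rule"]           # covers PHI in every format
    if is_electronic:
        rules.append("Security Rule")  # covers ePHI only
    return rules


paper_chart = applicable_rules(is_phi=True, is_electronic=False)
ehr_entry = applicable_rules(is_phi=True, is_electronic=True)
```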

In practice, the vast majority of PHI in modern healthcare and adjacent organisations is ePHI: electronic health records, laboratory information systems, billing and claims systems, patient portal data, wearable health device data, telehealth platform records. Security programmes addressing HIPAA compliance are primarily addressing ePHI security requirements.


Who creates and handles PHI

HIPAA's applicability is defined by entity type, not industry sector alone. Three categories of organisations create PHI obligations.

Covered entities are the primary HIPAA-regulated organisations: healthcare providers who transmit any health information in electronic form, health plans including insurers and employer-sponsored health programmes, and healthcare clearinghouses that process health information. If you're a hospital, a health insurance company, or an electronic claims processor, you're a covered entity.

Business associates are the organisations covered entities share PHI with in the course of conducting their operations. A cloud provider hosting EHR data is a business associate. A billing company processing patient payment information is a business associate. A security firm auditing healthcare IT systems is a business associate if they'll encounter PHI. Business associates are directly liable under HIPAA and must sign Business Associate Agreements (BAAs) with covered entities.

Subcontractors of business associates have the same obligations as business associates when they handle PHI. A cloud infrastructure provider subcontracted by a healthcare software company handling PHI is a subcontractor with HIPAA obligations.

The practical implication: organisations that don't self-identify as "healthcare" organisations may still be subject to HIPAA. HR platforms that process employee health plan data, wellness applications that collect health metrics, analytics firms that process de-identified but potentially re-identifiable patient data, and technology vendors providing services to hospitals: all of these may have PHI handling obligations.

What PHI protection requires under HIPAA

HIPAA's Security Rule establishes three categories of safeguards that covered entities and business associates must implement for ePHI.

Administrative safeguards cover the policies, procedures, and management practices that govern PHI handling: risk analysis and risk management programmes, workforce training, access management policies, contingency planning for system failures, and evaluation processes. The risk analysis requirement is particularly significant: organisations must conduct and document a thorough assessment of the risks to ePHI confidentiality, integrity, and availability across all systems that create, receive, maintain, or transmit it.

Physical safeguards govern the physical infrastructure where ePHI is stored or accessed: facility access controls, workstation security policies, media controls for devices that hold ePHI, and disposal requirements for hardware and media.

Technical safeguards are the security controls implemented in information systems: access controls including unique user identification and automatic logoff, audit controls that record and examine system activity, integrity controls that protect ePHI from improper alteration, and transmission security including encryption for ePHI in transit.
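Three of those controls (unique user identification, automatic logoff, and audit logging) can be illustrated together in a short sketch. Everything here is an assumption made for the example: the `Session` class, the in-memory `audit_log`, and the 15-minute timeout, which HIPAA itself does not specify.

```python
import time
from typing import Optional

IDLE_TIMEOUT_SECONDS = 15 * 60  # assumed value; HIPAA does not fix one

audit_log: list[dict] = []  # stand-in for an append-only audit store


class Session:
    def __init__(self, user_id: str, now: Optional[float] = None):
        self.user_id = user_id  # unique user identification
        self.last_activity = time.time() if now is None else now
        self.active = True

    def read_record(self, record_id: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        if now - self.last_activity > IDLE_TIMEOUT_SECONDS:
            self.active = False  # automatic logoff after inactivity
        allowed = self.active
        # Audit control: record every attempt, allowed or not.
        audit_log.append({"time": now, "user": self.user_id,
                          "record": record_id, "allowed": allowed})
        if allowed:
            self.last_activity = now
        return allowed


session = Session("nurse-17", now=0.0)
first = session.read_record("patient-001", now=60.0)
second = session.read_record("patient-001", now=60.0 + IDLE_TIMEOUT_SECONDS + 1)
```

The second read arrives after the idle window, so the session is logged off and the denied attempt still lands in the audit log.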

HIPAA doesn't prescribe specific technologies. It requires that covered entities and business associates implement the safeguards that are "reasonable and appropriate" given their size, capability, and the nature of the risks they face. That flexibility is both a feature and a complication: it means HIPAA compliance can't be achieved by checking a box against a fixed technical specification list.

PHI breach notification requirements

HIPAA's Breach Notification Rule requires covered entities to notify affected individuals, the Department of Health and Human Services (HHS), and in some cases the media following a breach of unsecured PHI. "Unsecured" specifically means PHI that hasn't been rendered unusable, unreadable, or indecipherable through encryption or destruction meeting NIST standards.

Notification to affected individuals must occur without unreasonable delay and no later than 60 days after the breach is discovered. If a breach affects more than 500 residents of a state or jurisdiction, the covered entity must also notify prominent media outlets in that jurisdiction. All breaches must be reported to HHS: large breaches (500 or more individuals) within 60 days of discovery, and smaller breaches annually, within 60 days after the end of the calendar year in which they were discovered.
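The timing rules can be turned into a small date calculation. The sketch below is a simplification under stated assumptions: the function and parameter names are invented for the example, the 60-day figures are outer limits rather than targets, and the annual HHS report for smaller breaches is assumed to be due 60 days after the end of the calendar year of discovery.

```python
from datetime import date, timedelta
from typing import Optional


def notification_deadlines(discovered: date,
                           affected_total: int,
                           max_affected_in_one_state: int) -> dict[str, Optional[date]]:
    """Latest permissible notification dates for a breach of unsecured PHI."""
    sixty_days = discovered + timedelta(days=60)
    if affected_total >= 500:
        hhs_by = sixty_days  # large breach: report within 60 days
    else:
        # Smaller breach: annual report after the calendar year ends.
        hhs_by = date(discovered.year, 12, 31) + timedelta(days=60)
    # Media notice applies only above 500 residents of one state or jurisdiction.
    media_by = sixty_days if max_affected_in_one_state > 500 else None
    return {"individuals_by": sixty_days, "hhs_by": hhs_by, "media_by": media_by}


large = notification_deadlines(date(2025, 3, 10), 12_000, 9_500)
small = notification_deadlines(date(2025, 3, 10), 40, 40)
```

For a breach discovered on 10 March 2025, the large case puts all three deadlines on 9 May 2025. The small case keeps the same date for individuals, moves the HHS report to 1 March 2026, and requires no media notice.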

The investigation that precedes notification (determining whether a breach occurred, what data was involved, and who was affected) requires exactly the PHI classification, data lineage, and incident scoping capabilities that distinguish mature data security programmes from reactive ones. An organisation that doesn't know where its PHI is stored can't determine breach scope. An organisation without continuous data lineage tracking can't determine how far PHI propagated after initial access.
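If lineage is recorded as a graph of which system feeds which, scoping a breach reduces to a reachability question. The sketch below assumes such a graph already exists. The system names and the `LINEAGE` mapping are invented for illustration.

```python
from collections import deque

# Hypothetical lineage graph: each system maps to the systems it feeds.
LINEAGE = {
    "ehr": ["billing", "analytics_warehouse"],
    "billing": ["claims_export"],
    "analytics_warehouse": ["research_db", "dashboard"],
    "claims_export": [],
    "research_db": [],
    "dashboard": [],
    "hr_platform": ["payroll"],
    "payroll": [],
}


def downstream_systems(graph: dict[str, list[str]], breached: str) -> set[str]:
    """Breadth-first walk over every system reachable from the breached one."""
    seen = {breached}
    queue = deque([breached])
    while queue:
        current = queue.popleft()
        for target in graph.get(current, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


scope = downstream_systems(LINEAGE, "analytics_warehouse")
```

Here a compromise of `analytics_warehouse` puts `research_db` and `dashboard` in scope but leaves the billing and HR branches out. That is only as reliable as the lineage graph behind it.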

The PHI scope problem in modern healthcare IT

HIPAA's Safe Harbor de-identification standard was designed for a world of structured medical records. Modern healthcare IT environments are considerably more complex.

Patient data appears in clinical notes written in natural language. It appears in medical imaging files. It appears in mobile health application telemetry. It appears in wearable device streams. It appears in genomic databases. It appears in population health analytics platforms that ingest data from dozens of source systems. It propagates through API integrations, ETL pipelines, analytics exports, and research databases in ways that the 1996 regulatory framework didn't anticipate.

Classification programmes designed to find PHI by pattern-matching against the 18 HIPAA identifiers in structured database fields will miss PHI embedded in unstructured clinical notes, PHI in medical imaging metadata, PHI that has been partially de-identified but retains re-identification risk through combination, and PHI that has propagated from original healthcare systems into downstream analytics and research environments.
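The gap is easy to reproduce. The sketch below uses three deliberately simple regular expressions (assumed patterns for US-style SSNs, email addresses, and phone numbers, not a complete identifier set) and runs them over a structured-looking string and a made-up free-text note.

```python
import re

# Simplified, assumed patterns; real formats vary far more than this.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}


def find_identifiers(text: str) -> dict[str, list[str]]:
    """Return every pattern hit, keyed by identifier type."""
    hits = {}
    for label, pattern in PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            hits[label] = matches
    return hits


structured = "SSN 123-45-6789, contact jane@example.org"
note = "Pt is the mayor's daughter, seen last Tuesday for relapse of a rare lymphoma."

structured_hits = find_identifiers(structured)
note_hits = find_identifiers(note)
```

The structured string yields an SSN and an email hit. The note yields nothing, although a reader in that town could likely identify the patient from it.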

Semantic classification evaluates data in context: it understands that a document containing a clinical narrative, a diagnosis code, and a patient account number is PHI regardless of how the fields are labelled. That approach covers the scope of PHI in real healthcare environments more accurately than pattern matching against the formal identifier list.

Frequently asked questions

What is PHI?

PHI (Protected Health Information) is individually identifiable health information created, received, maintained, or transmitted by HIPAA-covered entities or their business associates. It combines health information (diagnoses, treatments, payment records, or any other information related to an individual's past, present, or future physical or mental health) with any of HIPAA's 18 specified identifiers that make the information individually identifiable.

What are the 18 HIPAA identifiers?

The 18 identifiers are: names, geographic subdivisions smaller than a state, dates (except year), telephone numbers, fax numbers, email addresses, social security numbers, medical record numbers, health plan beneficiary numbers, account numbers, certificate or licence numbers, vehicle identifiers, device identifiers, web URLs, IP addresses, biometric identifiers, full-face photographs, and any other unique identifying number or characteristic.

What is the difference between PHI and ePHI?

ePHI (Electronic Protected Health Information) is PHI that exists in electronic form. All ePHI is PHI, but not all PHI is ePHI: paper records containing health information linked to an individual's identity are PHI but not ePHI. HIPAA's Security Rule specifically governs ePHI, while the Privacy Rule governs all PHI regardless of format.

What is the difference between PHI and PII?

PII (Personally Identifiable Information) is the broader category: any information that can identify an individual. PHI is a regulated subset of PII that specifically connects health information to an identifiable individual. All PHI is PII, but not all PII is PHI. A person's email address is PII. Their email address combined with their diagnosis and treatment record is both PII and PHI.

Who does HIPAA apply to?

HIPAA applies to covered entities (healthcare providers, health plans, and healthcare clearinghouses) and to their business associates who handle PHI in the course of providing services. Business associates' subcontractors have the same obligations when they handle PHI. Organisations that don't identify as healthcare companies may still have HIPAA obligations if they process employee health plan data, provide services to healthcare organisations, or operate health-related platforms.

Published May 1, 2026