Data Tokenisation

Data tokenisation replaces sensitive values with non-sensitive tokens, keeping the originals in a secure vault. See how it reduces PCI DSS scope and breach impact.

What is Data Tokenisation?

Data tokenisation is a data protection technique that replaces a sensitive value with a non-sensitive substitute called a token. The token can be generated in the same format as the original value, so it can be used in its place across systems and processes, but it carries no inherent sensitive information. The original value is stored securely in a separate system called a token vault, and can only be retrieved by systems with explicit authorisation to do so.

A customer's credit card number 4532015112830366 becomes a token like 9871234567891234. Both are 16-digit strings. Both pass format validation in payment systems. But the token has no value outside the tokenisation system. An attacker who steals it has nothing usable. Only the payment processor with access to the vault can map the token back to the real card number.

That's the core mechanism. Simple to describe. The operational implications are significant.

How tokenisation works

Three components make up a tokenisation system.

The tokenisation engine is the service that generates tokens and manages the vault. When a sensitive value is first presented, the engine generates a token, stores the mapping between token and original value in the vault, and returns the token to the calling system. On subsequent presentations of the same value, the engine can either return the same token (deterministic tokenisation) or generate a new one (random tokenisation), depending on whether consistent token mapping is required.

The token vault is the secure store containing the mapping between tokens and original values. The vault is the highest-security component in a tokenisation system: it's what an attacker would need to compromise to recover original sensitive values from tokens. Vault security typically includes encryption at rest, strict access controls, comprehensive audit logging, and separation from the systems that handle the tokens themselves.

Detokenisation is the reverse operation: retrieving the original value from the vault using a token. Not all systems that handle tokens need detokenisation access. A merchant's order management system can process transactions using a token throughout the order lifecycle; only the payment processor needs to detokenise to charge the actual card. That separation — most systems handle tokens, only privileged systems can detokenise — is what makes tokenisation effective as a scope reduction tool.
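
To make those three components concrete, here is a minimal Python sketch, assuming an in-memory dictionary stands in for the vault and a hard-coded caller list stands in for real authorisation. The names (TokenVault, TokenisationEngine, tokenise, detokenise) are illustrative, not any particular product's API; a production vault would be an encrypted, audited, access-controlled store.

```python
import secrets


class TokenVault:
    """Hypothetical in-memory vault mapping tokens to original values."""

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}  # reverse map enables deterministic tokenisation

    def store(self, token, value):
        self._token_to_value[token] = value
        self._value_to_token[value] = token

    def lookup_token(self, value):
        return self._value_to_token.get(value)

    def lookup_value(self, token):
        return self._token_to_value[token]


class TokenisationEngine:
    """Hypothetical engine: generates tokens, stores mappings, gates detokenisation."""

    def __init__(self, vault, authorised_callers, deterministic=True):
        self.vault = vault
        self.authorised_callers = set(authorised_callers)
        self.deterministic = deterministic

    def tokenise(self, value):
        # Deterministic mode: repeated presentations of the same value get the same token.
        if self.deterministic:
            existing = self.vault.lookup_token(value)
            if existing is not None:
                return existing
        # Random token: no mathematical relationship to the original value.
        token = secrets.token_hex(16)
        self.vault.store(token, value)
        return token

    def detokenise(self, token, caller):
        # Only explicitly authorised systems may resolve a token back to the original.
        if caller not in self.authorised_callers:
            raise PermissionError(f"{caller} is not authorised to detokenise")
        return self.vault.lookup_value(token)
```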

Tokenisation in payment card security: PCI DSS scope reduction

The dominant use case for tokenisation is payment card data protection under PCI DSS (Payment Card Industry Data Security Standard). Understanding this use case makes the value proposition concrete.

PCI DSS governs every system, network, and process that stores, processes, or transmits cardholder data. The compliance requirements are extensive: network segmentation, encryption, access controls, logging, vulnerability management, penetration testing, and more. Every system in scope requires those controls. The compliance cost scales directly with scope.

Tokenisation reduces scope by removing card data from most systems entirely. Here's how it works in practice.

A customer enters their card number at checkout. The tokenisation service receives the card number, stores it in the vault, and returns a token. The token — not the card number — is stored in the merchant's order database, passed to the fulfilment system, referenced in the customer's account history, and used throughout any subsequent interactions that reference that payment method. Only the payment processor's systems, which handle the actual transaction, ever see or use the real card number.

The merchant's order database doesn't store card data. The fulfilment system doesn't process card data. The customer account system references tokens, not card numbers. None of those systems fall into PCI DSS scope for cardholder data storage. The compliance boundary shrinks from the entire order management infrastructure to the tokenisation system and the payment processor's integration points. Compliance costs, audit scope, and breach impact all decrease proportionally.
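
Continuing the hypothetical engine sketched in the previous section, the checkout flow might look roughly like this: the order and fulfilment records hold only the token, and only the processor-facing call is allowed to detokenise. The caller names and record fields are illustrative.

```python
# Checkout: the card number is tokenised immediately; the merchant never stores it.
engine = TokenisationEngine(TokenVault(), authorised_callers={"payment-processor"})
token = engine.tokenise("4532015112830366")

# Merchant systems store and pass around the token only.
order = {"order_id": "ORD-1001", "payment_token": token, "amount_cents": 4999}
fulfilment_job = {"order_id": order["order_id"], "payment_token": token}

# Only the payment processor's integration resolves the token to charge the card.
card_number = engine.detokenise(order["payment_token"], caller="payment-processor")

# Any other system attempting detokenisation is refused.
try:
    engine.detokenise(order["payment_token"], caller="fulfilment-service")
except PermissionError as err:
    print(err)
```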

That's the practitioner-level argument for tokenisation in payment contexts. It's not primarily an encryption-equivalent security control. It's an architectural decision that removes sensitive data from systems that don't need the actual value, reducing both breach impact and compliance burden simultaneously.

Format-preserving tokenisation

Standard tokenisation replaces a sensitive value with a random token that may have no resemblance to the original format. A 16-digit card number becomes a random 16-digit string. An email address might become a random alphanumeric identifier.

Format-preserving tokenisation (FPT) produces tokens that match the format of the original value: a 16-digit card number produces a 16-digit token, a Social Security Number produces a nine-digit token, an email address produces a string that looks like an email address. This matters for systems that validate format as part of processing — a system expecting a 16-digit number in a specific field will break if the token is a random string of different length.

FPT preserves the operational properties that downstream systems depend on while still providing the core protection: the token carries no meaningful information and can only be resolved by the vault. It's more technically complex to implement — generating format-preserving tokens while maintaining cryptographic security requires specific algorithms — but it makes tokenisation transparent to downstream systems that weren't designed for it.
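
As a rough illustration of the format contract only, the sketch below generates a random all-digit token of the same length as the card number, so a downstream 16-digit check still passes. It is not a secure FPT algorithm and omits the vault write a real flow would perform; production implementations use format-preserving encryption such as the NIST-approved FF3-1 mode, or vault-backed random digit tokens.

```python
import secrets


def format_preserving_card_token(card_number):
    """Naive illustration: a random all-digit token of the same length.

    Guarantees nothing beyond 'same length, digits only'; real FPT relies on
    purpose-built algorithms and the vault still holds the token-to-value mapping."""
    if not card_number.isdigit():
        raise ValueError("expected a digits-only card number")
    return "".join(secrets.choice("0123456789") for _ in card_number)


token = format_preserving_card_token("4532015112830366")
assert len(token) == 16 and token.isdigit()  # passes a downstream 16-digit format check
```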

Tokenisation vs encryption vs masking

These three techniques are frequently compared because they all protect sensitive data, but they have different properties and different appropriate use cases.

Tokenisation replaces sensitive values with tokens stored in a vault that allows the original to be retrieved by authorised systems. It's reversible by design — that's its purpose. The token can be used in place of the original value in systems that don't need the actual data. Tokenisation doesn't encrypt the original value in place; it removes it from the system entirely and substitutes a reference.

Encryption transforms a value into ciphertext using a key. The original value can be recovered from the ciphertext using the key. Encryption protects the value where it sits: the data remains in the system but in an unreadable form. It generally doesn't reduce compliance scope for systems that store or process the encrypted data, because under PCI DSS encrypted cardholder data remains in scope for any entity that can decrypt it; it only falls out of scope where the entity has no means of decryption. Tokenisation removes the data from scope; encryption protects the data within scope.

Masking replaces sensitive values with fictitious values without maintaining a mapping to the original. It's irreversible. Masking is appropriate for non-production environments where the original value is not needed. Tokenisation is appropriate for production environments where the original value must be retrievable for specific authorised operations.

So: encryption for protecting data in place, tokenisation for removing sensitive data from systems that don't need it while preserving the ability to retrieve it, masking for non-production environments where the original value is never needed.
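
A compact way to see the difference is to run the three operations side by side. A short sketch, assuming the third-party cryptography package for the encryption step; the in-memory dictionary again stands in for a vault, and none of this is production-grade handling of card data.

```python
import secrets

from cryptography.fernet import Fernet  # assumes the third-party 'cryptography' package

value = "4532015112830366"

# Tokenisation: the value leaves the system; a vault keeps the reversible mapping.
vault = {}
token = secrets.token_hex(16)
vault[token] = value  # only systems with vault access can reverse this

# Encryption: the value stays in the system as ciphertext; anyone holding the key can reverse it.
key = Fernet.generate_key()
ciphertext = Fernet(key).encrypt(value.encode())
recovered = Fernet(key).decrypt(ciphertext).decode()

# Masking: the value is replaced with a fictitious one; there is nothing to reverse.
masked = "".join(secrets.choice("0123456789") for _ in value)
```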

Beyond payment cards: general-purpose tokenisation

Tokenisation was developed primarily for payment card data, but the same mechanism applies to any sensitive value that needs to be removed from systems that don't require its actual content.

Social security numbers and national ID numbers can be tokenised in systems that need to reference an individual's identity record without needing the actual ID number. An employee management system can store a token for each employee's SSN, while only the payroll system holds vault access to retrieve the actual numbers when required for tax processing.

Health record identifiers can be tokenised for use in analytics systems. Researchers working with de-identified datasets can use tokens as consistent identifiers across records without having access to the PHI that the tokens represent.

Email addresses can be tokenised for marketing analytics: campaign tracking and attribution analytics can reference email tokens across systems, with only the email delivery system holding vault access to resolve tokens to actual addresses.

The common pattern: a sensitive identifier that multiple systems need to reference, but only specific systems need to resolve to its actual value. Tokenisation lets all the systems participate in the operational workflow without expanding the access footprint of the actual sensitive value.
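
Reusing the hypothetical engine from earlier in deterministic mode illustrates the pattern. Every system presenting the same SSN gets back the same token, so records can be joined on the token without holding the real number, while only a caller on the authorised list (here a hypothetical payroll-service) can resolve it. The SSN shown is a placeholder value.

```python
engine = TokenisationEngine(TokenVault(), authorised_callers={"payroll-service"})

# HR and benefits systems reference the same deterministic token for the same SSN,
# so their records can be joined without either system holding the real number.
hr_record = {"employee_id": "E-042", "ssn_token": engine.tokenise("123-45-6789")}
benefits_record = {"employee_id": "E-042", "ssn_token": engine.tokenise("123-45-6789")}
assert hr_record["ssn_token"] == benefits_record["ssn_token"]

# Only the payroll system is authorised to resolve the token for tax processing.
ssn = engine.detokenise(hr_record["ssn_token"], caller="payroll-service")
```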

Frequently asked questions

What is data tokenisation?

Data tokenisation is a data protection technique that replaces sensitive values with non-sensitive substitutes called tokens. Tokens can be generated in the same format as the original values and used in their place, but carry no inherent sensitive information. The original value is stored securely in a token vault and can only be retrieved by systems with explicit authorisation to detokenise.

How does tokenisation differ from encryption?

Encryption transforms a value into ciphertext that remains in the same system and requires a key to decrypt. Under PCI DSS, the encrypted value generally remains in compliance scope for any entity that can decrypt it. Tokenisation removes the original value from the system entirely, replacing it with a token that references the value in a separate vault. Tokenisation reduces compliance scope; encryption protects data within scope.

What is the difference between tokenisation and data masking?

Tokenisation is reversible: the original value can be retrieved from the vault by authorised systems. Masking is irreversible: the original value is replaced with a fictitious value and cannot be recovered from the masked version. Tokenisation is for production environments where the original value must be retrievable for specific operations. Masking is for non-production environments where the original value is never needed.

How does tokenisation reduce PCI DSS scope?

PCI DSS applies to every system that stores, processes, or transmits cardholder data. Tokenisation removes actual card numbers from most systems, replacing them with tokens. Systems that handle tokens but never see the original card number are removed from PCI DSS cardholder data scope, dramatically reducing the number of systems subject to PCI requirements and the associated compliance cost and audit burden.

What is format-preserving tokenisation?

Format-preserving tokenisation (FPT) generates tokens that match the format of the original value: a 16-digit card number produces a 16-digit token, preserving the format expectations of downstream systems. Standard tokenisation may produce tokens in a different format, which can break systems that validate field format. FPT makes tokenisation transparent to downstream systems while maintaining the core security property.

Published May 1, 2026