Detect and/or Redact Personally Identifiable Information v1.0.0 Help
Inspects text for personally identifiable information (PII) entities and returns details about them; can redact identified PII entities with provided masks. The Step relies on named entity recognition (NER).
How can I use the Step?
The Step lets you find and redact PII entities in the text. This way, you can automate PII data collection or implement specific policies to deal with sensitive personal data. The Step only supports English.
How does the Step work?
A PII entity is a text reference to information that identifies a person, such as an address, bank account number, driver's license, etc.
For example, in the text "Dear John Doe! The credit balance on your card number 0000-1111-0000-1111 has been updated," the Step recognizes John Doe as a name and 0000-1111-0000-1111 as a creditDebitNumber.
In addition, the Step assigns a confidence score to each PII entity found in a text. This score indicates confidence that the Step correctly identified the PII entity type. To learn more, see the Output example.
You can also mask found PII entities using different Redaction options.
Input settings
To set up the section, do the following:
For Operations, select at least one of the following options:
- Detect
- Redact: Enables Redaction options
For Input text, enter text to analyze.
For PII entity types, select entity types you want to detect/redact in the text.
Input text
The input text must be a UTF-8 string. The string must contain at least 1 character. The maximum string size is 100 KB. English is the only valid language.
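The limits above can be checked before the text reaches the Step. The following is a minimal sketch, not the Step's own validation: the function name `validate_input_text` is hypothetical, and it assumes that "100 KB" means 100 * 1024 bytes of UTF-8 encoded text.

```python
MAX_INPUT_BYTES = 100 * 1024  # assumption: "100 KB" is 100 * 1024 bytes


def validate_input_text(text: str) -> bytes:
    """Return the UTF-8 encoding of text, or raise ValueError if a documented limit is broken."""
    if len(text) < 1:
        raise ValueError("input text must contain at least 1 character")
    try:
        encoded = text.encode("utf-8")
    except UnicodeEncodeError as exc:
        # Raised for strings that cannot be encoded, such as lone surrogates.
        raise ValueError("input text is not valid UTF-8") from exc
    if len(encoded) > MAX_INPUT_BYTES:
        raise ValueError(
            f"input text is {len(encoded)} bytes; the limit is {MAX_INPUT_BYTES}"
        )
    return encoded
```

The sketch does not check the language of the text; the English-only requirement has to be enforced separately.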
PII entity types
The Step uses a set of 22 PII entity types, which you can find in the following table:
PII entity type | Description |
---|---|
address | A physical address, such as "100 Main Street, Anytown, USA" or "Suite #12, Building 123". An address can include a street, building, location, city, state, country, county, zip, precinct, neighborhood, and |
age | An individual's age, including the quantity and unit of time. For example, in the phrase "I am 40 years old," the Step recognizes "40 years" as an age. |
awsAccessKey | A unique identifier that's associated with a secret access key; the access key ID and secret access key are used together to sign programmatic AWS requests cryptographically. |
awsSecretKey | A unique identifier that's associated with an access key; the access key ID and secret access key are used together to sign programmatic AWS requests cryptographically. |
bankAccountNumber | A US bank account number. These are typically between 10 - 12 digits long, but the Step also recognizes bank account numbers when only the last 4 digits are present. |
bankRouting | A US bank account routing number. These are typically 9 digits long, but the Step also recognizes routing numbers when only the last 4 digits are present. |
creditDebitCvv | A 3-digit card verification code (CVV) that is present on VISA, MasterCard, and Discover credit and debit cards. In American Express credit or debit cards, it is a 4-digit numeric code. |
creditDebitExpiry | The expiration date for a credit or debit card. This number is usually 4 digits long and formatted as month/year or MM/YY. For example, the Step can recognize expiration dates such as 01/21, 01/2021, and Jan 2021. |
creditDebitNumber | The number for a credit or debit card. These numbers can vary from 13 to 16 digits in length, but the Step also recognizes credit or debit card numbers when only the last 4 digits are present. |
dateTime | A date can include a year, month, day, day of week, or time of day. For example, the Step recognizes "January 19, 2020" or "11 am" as dates. The Step will identify partial dates, date ranges, and date intervals. It will also recognize decades, such as "the 1990s". |
driverId | The number assigned to a driver's license, which is an official document permitting an individual to operate one or more motorized vehicles on a public road. A driver's license number consists of alphanumeric characters. |
email | An email address, such as marymajor@email.com. |
ipAddress | An IPv4 address, such as 198.51.100.0. |
macAddress | A media access control (MAC) address is a unique identifier assigned to a network interface controller (NIC). |
name | An individual's name. This entity type does not include titles, such as Mr., Mrs., Miss, or Dr. The Step does not apply this entity type to names that are part of organizations or addresses. For example, the Step recognizes the "John Doe Organization" as an organization, and it recognizes "Jane Doe Street" as an address. |
passportNumber | A US passport number. Passport numbers range from 6 - 9 alphanumeric characters. |
password | An alphanumeric string that is used as a password, such as "Very20special#pass". |
phone | A phone number. This entity type also includes fax and pager numbers. |
pin | A 4-digit personal identification number (PIN) that allows someone to access their bank account information. |
ssn | A Social Security Number (SSN) is a 9-digit number that is issued to US citizens, permanent residents, and temporary working residents. The Step also recognizes Social Security Numbers when only the last 4 digits are present. |
url | A web address, such as www.example.com. |
username | A user name that identifies an account, such as a login name, screen name, nickname, or handle. |
Datetime settings
The Datetime feature converts dates found in the text from one timezone to another and returns the converted date in a selected date and time format.
To set up the section, follow these steps:
- For Timezone, select input and output timezones for date conversion.
- For Output format, select the date and time format and specify options that suit your application.
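As an illustration of the kind of conversion described above, here is a minimal sketch using Python's standard library. It is not the Step's implementation: the function name `convert_datetime`, the input and output formats, and the fixed UTC-5 offset are all assumptions chosen for the example.

```python
from datetime import datetime, timedelta, timezone, tzinfo


def convert_datetime(value: str, fmt_in: str, tz_in: tzinfo,
                     tz_out: tzinfo, fmt_out: str) -> str:
    """Parse value as local time in tz_in and format it as local time in tz_out."""
    naive = datetime.strptime(value, fmt_in)   # no timezone attached yet
    aware = naive.replace(tzinfo=tz_in)        # interpret it in the input timezone
    return aware.astimezone(tz_out).strftime(fmt_out)


# A fixed UTC-5 offset stands in for the input timezone (illustration only).
input_tz = timezone(timedelta(hours=-5))
converted = convert_datetime(
    "2020-01-19 11:00", "%Y-%m-%d %H:%M", input_tz, timezone.utc, "%Y-%m-%dT%H:%M:%S%z"
)
print(converted)  # 2020-01-19T16:00:00+0000
```

Fixed offsets ignore daylight saving time. Where the IANA timezone database is available, `zoneinfo.ZoneInfo("America/New_York")` can be passed instead of the fixed offset.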
Redaction options
The Redact operation lets you mask PII entities using one of the following two options:
- Entity type mask (default)
- Custom mask
Entity type mask
Entity type mask replaces each PII entity with the name of its predefined PII type.
For example, using the text "Dear John Doe! The credit balance on your card number 0000-1111-0000-1111 has been updated," with the Redact operation and Entity type mask, the Step returns the following text:
"Dear [NAME]! The credit balance on your card number [CREDIT DEBIT NUMBER] has been updated,"
Custom mask
Custom mask works similarly to Entity type mask but redacts PII entities with characters you provide instead of predefined PII types.
Output and exit behavior
To set up this section, take the following steps:
- For Output data options, select the appropriate options to configure the output structure. The setting is available only for the Detect operation.
- In Output data structure, ensure that the output structure suits your application.
Merge field settings
The Step returns the result as a JSON object and stores it in the Merge field variable. Thus you can access the output JSON object from any point of your Flow. To learn more about this Step's output, see the Output example.
Skip logic exit
Use this setting to handle cases where a Merge field variable with the same name already exists in your Flow and holds a value.
By default, in such cases, the Step overwrites the existing variable with the new value. Another option is to skip the Step execution and direct the Flow down the selected exit. To do so, follow these steps:
- Enable the Skip step execution if existing merge field has data toggle.
- In the Skip logic exit list, select the exit to direct the Flow down.
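The overwrite and skip behaviors described above can be modeled with a short sketch. Everything here is hypothetical and only illustrates the decision: the names `has_data` and `run_step`, the exit labels "skip" and "next", and the rule that None and empty values count as "no data" are assumptions, not the platform's API.

```python
def has_data(value) -> bool:
    """Assumption: None, empty strings, and empty collections count as 'no data'."""
    return value is not None and value != "" and value != {} and value != []


def run_step(merge_fields: dict, field_name: str, compute,
             skip_if_has_data: bool = False) -> str:
    """Return the label of the exit taken: 'skip' or 'next' (hypothetical labels)."""
    if skip_if_has_data and has_data(merge_fields.get(field_name)):
        return "skip"  # the existing value is kept and the Step does not run
    merge_fields[field_name] = compute()  # default behavior: overwrite
    return "next"


fields = {"pii": {"count": 1}}
print(run_step(fields, "pii", lambda: {"count": 2}, skip_if_has_data=True), fields)
print(run_step(fields, "pii", lambda: {"count": 2}), fields)
```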
Output example
The Step's output contains information about each detected PII entity, including its type, confidence score, start and end points in the text, and redacted text (if applicable).
For example, using the Detect and Redact operations with default settings and the input text "Dear John Doe! The credit balance on your card number 0000-1111-0000-1111 has been updated," the Step returns the following JSON object:
{
  "count": 2,
  "byOrder": [
    {
      "score": 0.9999272227287292,
      "type": "name",
      "beginOffset": 5,
      "endOffset": 13,
      "text": "John Doe"
    },
    {
      "score": 0.9999970197677612,
      "type": "creditDebitNumber",
      "beginOffset": 54,
      "endOffset": 73,
      "text": "0000-1111-0000-1111"
    }
  ],
  "redacted": "Dear [NAME]! The credit balance on your card number [CREDIT DEBIT NUMBER] has been updated"
}
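A consumer of this output might filter entities by confidence score and confirm that the offsets point at the reported text. The sketch below assumes the JSON shape shown in this example; the helper names `confident_entities` and `offsets_match` and the 0.9 threshold are hypothetical, and offsets are assumed to count characters.

```python
import json

step_output = json.loads("""
{
  "count": 2,
  "byOrder": [
    {"score": 0.9999272227287292, "type": "name",
     "beginOffset": 5, "endOffset": 13, "text": "John Doe"},
    {"score": 0.9999970197677612, "type": "creditDebitNumber",
     "beginOffset": 54, "endOffset": 73, "text": "0000-1111-0000-1111"}
  ],
  "redacted": "Dear [NAME]! The credit balance on your card number [CREDIT DEBIT NUMBER] has been updated"
}
""")

input_text = ("Dear John Doe! The credit balance on your card number "
              "0000-1111-0000-1111 has been updated,")


def confident_entities(result: dict, min_score: float = 0.9) -> list:
    """Keep only the entities whose confidence score reaches min_score."""
    return [e for e in result["byOrder"] if e["score"] >= min_score]


def offsets_match(text: str, result: dict) -> bool:
    """Check that text[beginOffset:endOffset] equals the reported entity text."""
    return all(
        text[e["beginOffset"]:e["endOffset"]] == e["text"] for e in result["byOrder"]
    )


for entity in confident_entities(step_output):
    print(entity["type"], entity["text"], round(entity["score"], 4))
print(offsets_match(input_text, step_output))
```

In this example, endOffset is exclusive: the slice from 5 to 13 yields exactly "John Doe".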
Error Handling
By default, the Step handles errors using a separate exit. So if any error occurs during the Step execution, the Flow proceeds down the error exit.
Note: If you disable the Handle error toggle, the Step does not handle errors. With this setup, if any error occurs during the Step execution, the Flow fails immediately after exceeding the Flow's timeout. To prevent the Flow from being suspended while continuing to handle errors in the Flow, place the Flow Error Handling Step before the main Flow logic.
Reporting
The Step reports once after its execution. You can change the Step log level and add new tags in this section.
Log level
By default, the Step inherits its log level from the Flow's log level. You can change the Step's log level by selecting an appropriate option from the Log level list.
Tags
Tags help organize and filter session information when generating reports. You can specify the tag category, label, and value when adding a new tag.
Service dependencies
- flow builder - v2.28.3
- event-manager - v2.3.0
- deployer - v2.6.0
- comprehend provider - v0.9.0
Release notes
v1.0.0
- Initial release