DataPrep performs a series of data cleansing and manipulation actions on inputs and returns the results of those actions. DataPrep actions can generate new fields, enriching the input records with extra data points. They can also overwrite existing fields, replacing messy inputs with cleansed data. Control over whether a new field is created or an existing field is overwritten lies completely with the end user.
Actions can take multiple inputs and have multiple outputs depending on the particular operation they perform. Actions can be chained together in a sequence such that the output(s) of one action becomes the input(s) of a subsequent action. The API takes the name of an action, input fields, output fields, and sometimes options that change the behavior of the action.
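The chaining and overwrite behavior described above can be sketched locally. This is only an illustration: the step dictionary layout, the `run_chain` helper, and the two stand-in functions are hypothetical and are not the DataPrep request format. Only the action strings `domain` and `strtoupper` come from the action list in this document.

```python
from typing import Callable, Dict, List, Sequence

Record = Dict[str, str]

# Hypothetical local stand-ins for two actions; the real service may behave differently.
def uppercase(values: Sequence[str]) -> List[str]:
    return [values[0].upper()]

def extract_domain(values: Sequence[str]) -> List[str]:
    email = values[0]
    return [email.rsplit("@", 1)[1] if "@" in email else ""]

ACTIONS: Dict[str, Callable[[Sequence[str]], List[str]]] = {
    "strtoupper": uppercase,
    "domain": extract_domain,
}

def run_chain(record: Record, steps: List[dict]) -> Record:
    """Apply each step in order, writing outputs back into a copy of the record."""
    result = dict(record)
    for step in steps:
        inputs = [result.get(name, "") for name in step["inputs"]]
        outputs = ACTIONS[step["action"]](inputs)
        for name, value in zip(step["outputs"], outputs):
            # Reusing an existing field name overwrites it; a new name adds a field.
            result[name] = value
    return result

steps = [
    # Step 1 creates a new field from the email.
    {"action": "domain", "inputs": ["email"], "outputs": ["email_domain"]},
    # Step 2 consumes the output of step 1 and overwrites it in place.
    {"action": "strtoupper", "inputs": ["email_domain"], "outputs": ["email_domain"]},
]
cleaned = run_chain({"email": "jane@example.com"}, steps)
print(cleaned)
```

The original `email` field is left untouched because no step names it as an output.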
There are two ways to call the API: a long-form API and a short-form API. Both provide the same functionality, except that the short form accepts only a single input record at a time, while the long form allows batching multiple records into a single request.
List of DataPrep Actions
The available actions are listed below. The inputs, outputs, and options for each action are included with its definition.
[+1 417.576 3685]
[+1 (417) 576-3685]
Clean Email
API Action String: email
Checks an email field for validity. Outputs the email on success and blank on fail. Provides light correction (e.g. gmail.co becomes gmail.com).
Clean name
API Action String: name
Cleans bad characters from a name (only allows alphabetic characters).
Input Idx | Input Type
0 | Fullname
Output Idx | Output Type
0 | Fullname
Clean First Name
API Action String: first
Cleans bad characters from a first name (only allows alphabetic characters).
Input Idx | Input Type
0 | First
Output Idx | Output Type
0 | First
Clean Last Name
API Action String: last
Cleans bad characters from a last name (only allows alphabetic characters).
Input Idx | Input Type
0 | Last
Output Idx | Output Type
0 | Last
Clean Business Name
API Action String: busname
Cleans bad characters from a business name (only allows alphanumeric characters).
Input Idx | Input Type
0 | Business
Output Idx | Output Type
0 | Business
Clean City Name
API Action String: city
Cleans bad characters from a city name (only allows alphabetic characters).
Input Idx | Input Type
0 | City
Output Idx | Output Type
0 | City
Clean State Name
API Action String: state
Checks for a valid US state and converts it to a 2-letter abbreviation if possible.
Input Idx | Input Type
0 | State
Output Idx | Output Type
0 | State
US ZIP5 Check
API Action String: uszip5
Checks whether a field looks like a 5-digit US postal code. Does NOT check for full validity (e.g. 00005 passes this check).
Input Idx | Input Type
0 | Zip
Output Idx | Output Type
0 | Zip
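A shape-only check like the one described for `uszip5` can be sketched with a regular expression. The function name `check_zip5` is hypothetical, and returning blank on failure is an assumption borrowed from how other actions on this page report failures.

```python
import re

# Exactly five ASCII digits; no lookup against real ZIP codes is attempted.
ZIP5_PATTERN = re.compile(r"[0-9]{5}")

def check_zip5(value: str) -> str:
    """Return the value if it has the shape of a ZIP5, otherwise blank."""
    candidate = value.strip()
    # Blank-on-failure is an assumption, not documented behavior for this action.
    return candidate if ZIP5_PATTERN.fullmatch(candidate) else ""

print(check_zip5("00005"))  # passes the shape check even though it is not a real ZIP
```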
US ZIP9 Check & Split
API Action String: uszip9
Checks whether a field looks like a potential 9-digit US postal code and if so, splits it into two columns containing the first 5 digits and the other 4.
Input Idx | Input Type
0 | Zip
Output Idx | Output Type
0 | Zip
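The check-and-split step can be sketched as follows. The name `split_zip9` is hypothetical, and accepting an optional hyphen or space after the fifth digit is an assumption; the service may accept a different set of layouts.

```python
import re
from typing import Tuple

# Nine digits, with an optional hyphen or space after the fifth (an assumption).
ZIP9_PATTERN = re.compile(r"([0-9]{5})[- ]?([0-9]{4})")

def split_zip9(value: str) -> Tuple[str, str]:
    """Return (first 5 digits, last 4 digits), or two blanks if the shape is wrong."""
    match = ZIP9_PATTERN.fullmatch(value.strip())
    if match is None:
        return "", ""
    return match.group(1), match.group(2)

print(split_zip9("65802-1234"))
```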
Clean Country Code
API Action String: country
Given a country name field, outputs a country code or blank if it is invalid (allows only alphabetic characters).
Input Idx | Input Type
0 | Country
Output Idx | Output Type
0 | Country
Clean Phone Number
API Action String: phone
Checks a phone field for validity. Outputs the number if valid, or blank otherwise.
Input Idx | Input Type
0 | Phone
Output Idx | Output Type
0 | Phone
Uppercase
API Action String: strtoupper
Converts a field to all uppercase.
Input Idx | Input Type
0 | Any
Output Idx | Output Type
0 | Generic String
Capitalize Words
API Action String: ucwords
Capitalizes the first letter of each word in a field.
Input Idx | Input Type
0 | Any
Output Idx | Output Type
0 | Generic String
Standardize & Rank Job Role
API Action String: titlerank3
Standardizes a job role field and ranks it. A lower number indicates a higher rank.
Input Idx | Input Type
0 | Title
Output Idx | Output Type
0 | Title Rank 3
1 | Generic String
Split Full Name
API Action String: splitfullname2
Splits a full name into separate columns for prefix, firstname, middlename, lastname, and suffix, all uppercased.
Input Idx | Input Type
0 | Fullname
Output Idx | Output Type
0 | Generic String
1 | First
2 | Generic String
3 | Last
4 | Generic String
Categorize & Split Name
API Action String: namecatparse
Parses a name field into separate columns for category (individual/business/unknown), prefix, firstname, middlename, lastname, suffix, and business name if applicable.
Fix Fuzzy Location
API Action String: tlilocmap
Attempts to parse a fuzzy location like "Greater Seattle Area" into a meaningful location like "Seattle, WA, US".
Input Idx | Input Type
0 | Any
Output Idx | Output Type
0 | Address
1 | City
2 | State
3 | Zip
4 | Country
Extract Domain
API Action String: domain
Extracts the domain from an email.
Input Idx | Input Type
0 | Email
Output Idx | Output Type
0 | Domain
ESP Email Check
API Action String: isespfree
Checks whether an email address appears to be from a free Email Service Provider (1 = Free ESP, 0 = Private ESP).
Input Idx | Input Type
0 | Email
Output Idx | Output Type
0 | Generic String
Public IP Address Check
API Action String: ip
Checks whether an IP looks like a valid public IP address. Outputs blank if the check fails.
Input Idx | Input Type
0 | Ip
Output Idx | Output Type
0 | Ip
IP Address to Integer
API Action String: ip2long
Converts a dotted quad IP address into an integer value. (e.g. 206.40.146.40 becomes 3458765352)
Input Idx | Input Type
0 | Ip
Output Idx | Output Type
0 | Generic String
Integer to IP Address
API Action String: long2ip
Converts an integer value into a dotted quad IP address. (e.g. 3458765352 becomes 206.40.146.40)
Input Idx | Input Type
0 | Any
Output Idx | Output Type
0 | Ip
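The arithmetic behind the two IP conversions is the standard IPv4 mapping: each of the four octets occupies 8 bits of a 32-bit integer. The sketch below reuses the action strings as Python function names purely for readability; it is a local illustration, not the service's implementation, and raising `ValueError` on bad input is a choice made here.

```python
def ip2long(dotted: str) -> int:
    """Pack a dotted quad into a 32-bit integer, most significant octet first."""
    parts = dotted.strip().split(".")
    if len(parts) != 4 or not all(part.isdigit() for part in parts):
        raise ValueError(f"not a dotted quad: {dotted!r}")
    octets = [int(part) for part in parts]
    if any(octet > 255 for octet in octets):
        raise ValueError(f"octet out of range in {dotted!r}")
    result = 0
    for octet in octets:
        result = (result << 8) | octet
    return result

def long2ip(value: int) -> str:
    """Unpack a 32-bit integer into a dotted quad."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError(f"not a 32-bit value: {value!r}")
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))

print(ip2long("206.40.146.40"))  # 3458765352
print(long2ip(3458765352))       # 206.40.146.40
```

The printed values match the example in the action descriptions: 206 × 2^24 + 40 × 2^16 + 146 × 2^8 + 40 = 3458765352.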
Generate Patterned Email
API Action String: gpe
Generates a probable email for a given first name, last name, and domain.
Input Idx | Input Type
0 | First
1 | Last
2 | Domain
Output Idx | Output Type
0 | Email
Name-To-Email Check
API Action String: n2echeck
Checks if a name appears to match an email and outputs component match count (how many name components match the email) and a weighted score. Higher scores are better.
Input Idx | Input Type
0 | Email
1 | First
2 | Last
Output Idx | Output Type
0 | Generic String
1 | Generic String
Clean/Extract Year
API Action String: year
Attempts to extract a valid year from a field. Will filter out non-numeric characters.
Input Idx | Input Type
0 | Any
Output Idx | Output Type
0 | Generic String
Merge Date Fields
API Action String: dobmerge
Merges separate year, month, and day fields into a single field.
Input Idx | Input Type
0 | Any
1 | Any
2 | Any
Output Idx | Output Type
0 | Date
Date Extract
API Action String: dob
Attempts to extract a date and format it as YYYYMMDD.
Input Idx | Input Type
0 | Any
Output Idx | Output Type
0 | Date
DateTime Merge
API Action String: tsmerge
Merges separate date and time fields into a single field with the format YYYY-MM-DD HH:MM:SS.
Input Idx | Input Type
0 | Date
1 | Time
Output Idx | Output Type
0 | Datetime
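A merge of this kind can be sketched with the standard library. The input layouts are assumptions: the date is taken to be YYYYMMDD (the format the `dob` action produces) and the time to be HH:MM:SS. The function name `merge_date_time` is hypothetical, as is returning blank when the inputs do not parse.

```python
from datetime import datetime

def merge_date_time(date_value: str, time_value: str) -> str:
    """Combine a YYYYMMDD date and an HH:MM:SS time into YYYY-MM-DD HH:MM:SS."""
    try:
        merged = datetime.strptime(
            f"{date_value.strip()} {time_value.strip()}", "%Y%m%d %H:%M:%S"
        )
    except ValueError:
        # Assumed failure behavior: blank output for unparseable input.
        return ""
    return merged.strftime("%Y-%m-%d %H:%M:%S")

print(merge_date_time("20240131", "09:05:00"))  # 2024-01-31 09:05:00
```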
Time to Hour
API Action String: time2hour
Extracts the hour from a timestamp field.
Input Idx | Input Type
0 | Any
Output Idx | Output Type
0 | Generic String
Format DateTime
API Action String: timestamp
Reformats a full datetime field into YYYY-MM-DD HH:MM:SS.
Input Idx | Input Type
0 | Datetime
Output Idx | Output Type
0 | Datetime
Date Transform
API Action String: datetransform
Transforms a date field from one format to another.
Input Idx | Input Type
0 | Date
Output Idx | Output Type
0 | Date
Option Idx | Options | Value
0 | datetransform (Transform Type) | 0 = (MMDDYYYY to YYYYMMDD); 1 = (MMDD to 0000MMDD); 2 = (DDMMYYYY to YYYYMMDD); 3 = ('Month YYYY' to YYYYMM01); 4 = ('MM DD YYYY' to YYYYMMDD)
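The five transform types listed above can be approximated locally with `datetime.strptime`. This is a sketch under assumptions: the function name `date_transform` is hypothetical, blank output for unparseable input is assumed, and month names for type 3 are parsed as full English names under the default locale.

```python
from datetime import datetime

# Input and output layouts for the transform types that parse a calendar date.
_LAYOUTS = {
    0: ("%m%d%Y", "%Y%m%d"),    # MMDDYYYY     -> YYYYMMDD
    2: ("%d%m%Y", "%Y%m%d"),    # DDMMYYYY     -> YYYYMMDD
    3: ("%B %Y", "%Y%m01"),     # 'Month YYYY' -> YYYYMM01
    4: ("%m %d %Y", "%Y%m%d"),  # 'MM DD YYYY' -> YYYYMMDD
}

def date_transform(value: str, transform_type: int) -> str:
    """Return the reformatted date, or blank when the input does not parse."""
    text = value.strip()
    try:
        if transform_type == 1:
            # MMDD -> 0000MMDD: validate against a leap year so 0229 is accepted.
            if len(text) != 4 or not text.isdigit():
                return ""
            datetime.strptime("2000" + text, "%Y%m%d")
            return "0000" + text
        if transform_type in (0, 2) and (len(text) != 8 or not text.isdigit()):
            return ""
        in_layout, out_layout = _LAYOUTS[transform_type]
        return datetime.strptime(text, in_layout).strftime(out_layout)
    except (KeyError, ValueError):
        # Unknown transform type or a value that is not a real date.
        return ""

print(date_transform("01312024", 0))    # 20240131
print(date_transform("March 2024", 3))  # 20240301
```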
Filter Values
API Action String: mvzonk
Filters out values matching a certain pattern.
Input Idx | Input Type
0 | Any
Output Idx | Output Type
0 | Generic String
Option Idx | Option | Value
0 | pattern (Pattern) | email (Is Email); !email (Is Not Email); domain (Is Domain); !domain (Is Not Domain); url (Is URL); !url (Is Not URL); phone (Is Phone); !phone (Is Not Phone)
Stem Domain
API Action String: stemdomain
Outputs the domain name of an email or URL, stemmed to exclude the host.
Input Idx | Input Type
0 | Domain
Output Idx | Output Type
0 | Domain
Transliterate
API Action String: transliterate
Converts accented characters to non-accented equivalents.
Input Idx | Input Type
0 | Any
Output Idx | Output Type
0 | Generic String
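One common way to remove accents is Unicode decomposition followed by dropping the combining marks. The sketch below uses that technique as a stand-in; whether the service works this way is not stated, and the name `strip_accents` is hypothetical. Characters with no decomposition (for example "ø" or "ß") pass through unchanged with this approach.

```python
import unicodedata

def strip_accents(text: str) -> str:
    """Decompose characters, then drop the combining accent marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(strip_accents("José Núñez"))  # Jose Nunez
```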
Hashmap
API Action String: hashmap
Takes an input and applies a hashing algorithm with an optional salt.
Input Idx | Input Type
0 | Any
Output Idx | Output Type
0 | Generic String
Option Idx | Options | Value
0 | algorithm | md5U; md5l; sha256U; sha256I; sha1U; sha1I
1 | salt | any
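Salted hashing of a field can be sketched with Python's `hashlib`. Two things here are assumptions rather than documented behavior: that the suffix on each algorithm value selects uppercase or lowercase hex output, and that the salt is prepended to the input before hashing. The name `hash_value` and its parameters are hypothetical.

```python
import hashlib

SUPPORTED = ("md5", "sha1", "sha256")

def hash_value(value: str, algorithm: str = "sha256",
               uppercase: bool = False, salt: str = "") -> str:
    """Hash salt + value and return the hex digest in the requested case."""
    if algorithm not in SUPPORTED:
        raise ValueError(f"unsupported algorithm: {algorithm!r}")
    # Assumption: the salt is prepended; the service may combine it differently.
    digest = hashlib.new(algorithm, (salt + value).encode("utf-8")).hexdigest()
    return digest.upper() if uppercase else digest

print(hash_value("hello", "md5"))  # 5d41402abc4b2a76b9719d911017c592
```

Because the salt placement is assumed, hashes produced by this sketch should not be expected to match the service's output for salted inputs.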
Phoneline Type
API Action String: linetype-pe
Returns the line type (mobile, landline, etc.) and the name of the carrier.
Input Idx | Input Type
0 | Phone
Output Idx | Output Type
0 | Line Type
1 | Generic String
LibPostal Address Standardizer
API Action String: lpaddrstd
Takes one or more inputs and tries to extract a postal address and standardize it. Returns House number, Street, Unit, City, State, Zip, Country.
Input Idx | Input Type
0 | Full Address
0 | ...
Output Idx | Output Type
0 | Generic String
1 | Generic String
2 | Generic String
3 | City
4 | State
5 | Zip
6 | Country
Combine & Standardize Address
API Action String: addrstd
Combines and standardizes address components into a single field.
Input Idx | Input Type
0 | Address
1 | City
2 | State
3 | Zip
Output Idx | Output Type
0 | Address
Option Idx | Option | Value
0 | blankifinvalid | 0 = No; 1 = Yes
Reverse Geocode
API Action String: reversegeocode
Accepts (US-based) latitude and longitude fields and attempts to output an address for the given location.
Input Idx | Input Type
0 | Any (Latitude)
1 | Any (Longitude)
Output Idx | Output Type
0 | Generic String (Precision)
1 | Address
2 | City
3 | State
4 | Zip
5 | Zip4
Simple Tag Individual
API Action String: stindiv
Takes a first name, last name, address, and zip as inputs and creates a pseudo-unique identifier for that individual.
Input Idx | Input Type
0 | First
1 | Last
2 | Address
3 | Zip
Output Idx | Output Type
0 | Generic String
Option Idx | Option | Value
0 | length | 1-50
Simple Tag Household
API Action String: sthhld
Takes a first name, last name, address, and zip as inputs and creates a pseudo-unique identifier for that household.
Input Idx | Input Type
0 | First
1 | Last
2 | Address
3 | Zip
Output Idx | Output Type
0 | Generic String
Option Idx | Option | Value
0 | length | 1-50
Zip to Congressional District
API Action String: z54tocongrdist
Takes a Zip5 and Zip4 and returns the congressional district for that area.
Input Idx | Input Type
0 | Zip
1 | Zip4
Output Idx | Output Type
0 | Generic String
Global Region
API Action String: globalregion
Takes the name of a country and outputs the region (Africa, APAC, US/CA, LATAM, etc.)
Input Idx | Input Type
0 | Country
Output Idx | Output Type
0 | Generic String
IP To Location
API Action String: ip2loc
Takes in an IP address and outputs IP Country, IP City, IP Zip, IP ISP Name, IP Domain Name, IP Usage Type, Proxy Type, IP Block ID, IP Block Len.
Input Idx | Input Type
0 | IP
Output Idx | Output Type
0 | Country
1 | City
2 | Zip
3 | Generic String
4 | Generic String
5 | Generic String
6 | Generic String
7 | Generic String
8 | Generic String
Format Phone Number
API Action String: naphonefmt
Formats a phone number into a selected standard format. Only works with 10-digit (or 11-digit with country code) North American phone numbers.
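Formatting a 10-digit (or 11-digit with country code) North American number can be sketched as below. The function name `format_na_phone`, the style names `paren` and `dotted`, and the blank result for unusable input are all hypothetical; the action's actual format options are not listed in this section.

```python
import re

def format_na_phone(raw: str, style: str = "paren") -> str:
    """Format a North American number, or return blank if it has the wrong length."""
    digits = re.sub(r"\D", "", raw)
    # Drop a leading country code of 1 from 11-digit input.
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    if len(digits) != 10:
        return ""
    area, exchange, line = digits[:3], digits[3:6], digits[6:]
    if style == "paren":
        return f"+1 ({area}) {exchange}-{line}"
    if style == "dotted":
        return f"+1 {area}.{exchange} {line}"
    raise ValueError(f"unknown style: {style!r}")

print(format_na_phone("4175763685"))  # +1 (417) 576-3685
```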