List of DataPrep Actions
The available actions are listed below. Each definition includes the action's inputs, outputs, and options.
What it does | Cleans this | To This |
---|---|---|
Checks an email field for validity. | | |
Cleans bad characters from a name. | Jonath4an Doe | Jonathan Doe |
Cleans bad characters from a first name. | Jonath4an | Jonathan |
Cleans bad characters from a last name. | Smi#th | Smith |
Cleans bad characters from a business name. | Micro$soft | Microsoft |
Cleans bad characters from a city name. | Seatt?le | Seattle |
Checks for a valid US state and converts it to a 2-letter abbreviation if possible. | Washington | WA |
Checks whether a field looks like a 5-digit US postal code. Does NOT check for full validity (e.g., 00005 passes this check). | 98052 | 98052 (valid) |
Checks whether a field looks like a potential 9-digit US postal code and, if so, splits it into two columns containing the first 5 digits and the last 4. | 90601-1051 | [90601] [1051] |
Given a country name field, outputs a country code, or blank if it is invalid. | Italy | IT |
Checks a phone field for validity and outputs the number if valid, or blank otherwise. | +14175763685 | 4175763685 |
Converts a field to all uppercase. | uppercase | UPPERCASE |
Capitalizes the first letter of each word in a field. | washington | Washington |
Standardizes a job role field and ranks it; a lower number indicates a higher rank. | Irrigation Sales Manager | Sales, Agriculture |
Splits a full name into separate columns for prefix, first name, middle name, last name, and suffix, all uppercased. | [Chris Angelo Smith] | [CHRIS] [ANGELO] [SMITH] |
Parses a name field into separate columns for category (individual/business/unknown), prefix, first name, middle name, last name, suffix, and business name if applicable. | | |
Attempts to parse a fuzzy location like "Greater Seattle Area" into a meaningful location like "Seattle, WA, US". | [Greater Seattle Area] | [Seattle] [WA] [US] |
Extracts the domain from an email. | | gmail.com |
Checks if an email address appears to be from a free Email Service Provider or not. | | 1 |
Checks if an IP looks like a valid public IP address; outputs blank if the check fails. | 50.242.100.253 | 50.242.100.253 |
Converts a dotted quad IP address into an integer value. | 206.40.146.40 | 3458765352 |
Converts an integer value into a dotted quad IP address. | 3458765352 | 206.40.146.40 |
Generates a probable email for a given first name, last name, and domain. | John, Doe, Versium.com | |
Checks if a name appears to match an email and outputs the component match count (how many name components match the email) and a weighted score. Higher scores are better. | Name = Wendi | N2E Matches = 7 |
Attempts to extract a valid year from a field. Filters out non-numeric characters. | | |
Merges separate year, month, and day fields into a single field. | | |
Attempts to extract a date and format it as YYYYMMDD. | | |
Merges separate date and time fields into a single field with the format YYYY-MM-DD HH:MM:SS. | | |
Extracts the hour from a timestamp field. | | |
Reformats a full datetime field into YYYY-MM-DD HH:MM:SS. | | |
Transforms a date field from one format to another. | | |
Filters out values matching a certain pattern. | | |
Outputs the domain name for an email address or URL, stemmed without the host. | | |
Converts accented characters to non-accented equivalents. | | |
Takes an input and applies a hashing algorithm with an optional salt. | | |
Returns the line type (mobile, landline, etc.) and the name of the carrier. | | |
Takes one or more inputs and tries to extract a postal address and standardize it. Returns house number, street, unit, city, state, ZIP, and country. | [5550 Newcastle ave #555, Encino, CA, 91316, United States] | [5550] [Newcastle ave] [#555] [Encino] [CA] [91316] [United States] |
Combines and standardizes address components into a single field. | [5550 Newcastle Ave #555] [Encino] [CA] [91316] | [5550 Newcastle Ave #555, Encino, CA, 91316] |
Accepts (US-based) latitude and longitude fields and attempts to output an address for the given location. | [47.62230901] [-122.3486291] | [1] [114 REPUBLICAN ST] [SEATTLE] [WA] [98109] [4534] |
Takes a first name, last name, address, and zip as inputs and creates a pseudo-unique identifier for that individual. | | |
Takes a first name, last name, address, and zip as inputs and creates a pseudo-unique identifier for that household. | | |
Takes a Zip5 and Zip4 and returns the congressional district for that area. | | |
Takes the name of a country and outputs the region (Africa, APAC, US/CA, LATAM, etc.). | Italy | EMEA |
Takes in an IP address and outputs IP Country, IP City, IP Zip, IP ISP Name, IP Domain Name, IP Usage Type, Proxy Type, IP Block ID, IP Block Len. | [161.69.123.10] | [US] [NY] [New York City] [DCH] [VPN] |
Formats a phone number into a selected standard format. Only works with 10-digit (or 11-digit with country code) North American phone numbers. | [+1 417.576 3685] | [+1 (417) 576-3685] |
Clean Email
API Action String: email
Checks an email field for validity. Outputs the email on success and blank on failure. Provides light correction (e.g. gmail.co becomes gmail.com).
Input Idx | Input Type |
---|---|
0 | Email |
Output Idx | Output Type |
---|---|
0 | Email |
Option Idx | Option | Values |
---|---|---|
0 | aggressive | 0 = No |
Examples:
{
"inputs": [
{
"FirstName": "John",
"LastName": "Smith",
"EmailAddr": "[email protected]"
},
{
"FirstName": "Jane",
"LastName": "Williams",
"EmailAddr": "[email protected]"
}
],
"actions": [
{
"name": "email",
"inputFields": [
"EmailAddr"
],
"outputFields": [
"EmailAddrClean"
],
"options": {
"aggressive": 1
}
}
],
"output": [
"FirstName",
"LastName",
"EmailAddrClean",
"EmailAddr"
]
}
https://api.versium.com/v2/dataprep?actions[]=email:EmailAddr:EmailAddrClean:1&FirstName=John&LastName=Smith&[email protected]&output=FirstName,LastName,EmailAddrClean,EmailAddr
https://api.versium.com/v2/dataprep?actions[]=email:EmailAddr:EmailAddrClean:1&FirstName=Jane&LastName=Williams&[email protected]&output=FirstName,LastName,EmailAddrClean,EmailAddr
{
"versium": {
"version": "2.0",
"match_counts": [],
"num_matches": 0,
"num_results": 2,
"query_id": "0fe4aa159cca6853dd",
"query_time": 0.145,
"results": [
{
"FirstName": "John",
"LastName": "Smith",
"EmailAddrClean": "",
"EmailAddr": "[email protected]"
},
{
"FirstName": "Jane",
"LastName": "Williams",
"EmailAddrClean": "[email protected]",
"EmailAddr": "[email protected]"
}
]
}
}
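The example GET URLs in this section appear to encode each action as a single `actions[]` parameter of the form `name:inputFields:outputFields:option`, with multiple fields separated by commas. The Python sketch below assumes that layout (inferred from the examples in this document, not from a formal specification) and builds such a URL with the standard library, percent-encoding values so that spaces and `@` are safe to send. The helper name `build_dataprep_url` is hypothetical, only a single option value is supported, and authentication is not covered here.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://api.versium.com/v2/dataprep"


def build_dataprep_url(action, input_fields, output_fields, record,
                       option=None, output=None):
    """Build a single-action DataPrep GET URL (hypothetical helper).

    Assumes the action string layout name:inputs:outputs[:option], which is
    inferred from the example URLs in this document.
    """
    parts = [action, ",".join(input_fields), ",".join(output_fields)]
    if option is not None:
        parts.append(str(option))
    params = [("actions[]", ":".join(parts))]
    # Each field of the input record becomes its own query parameter.
    params.extend(record.items())
    if output:
        params.append(("output", ",".join(output)))
    # Keep ':', ',', '[' and ']' readable; percent-encode everything else
    # (a space becomes %20 and '@' becomes %40).
    return BASE_URL + "?" + urlencode(params, safe=":,[]", quote_via=quote)
```

For instance, `build_dataprep_url("datetransform", ["DOB"], ["DOB"], {"DOB": "01 15 1980"}, option=4)` yields `https://api.versium.com/v2/dataprep?actions[]=datetransform:DOB:DOB:4&DOB=01%2015%201980`.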
Clean Name
API Action String: name
Cleans bad characters from a name (only allows alphabetic characters).
Input Idx | Input Type |
---|---|
0 | Fullname |
Output Idx | Output Type |
---|---|
0 | Fullname |
Clean First Name
API Action String: first
Cleans bad characters from a first name (only allows alphabetic characters).
Input Idx | Input Type |
---|---|
0 | First |
Output Idx | Output Type |
---|---|
0 | First |
Clean Last Name
API Action String: last
Cleans bad characters from a last name (only allows alphabetic characters).
Input Idx | Input Type |
---|---|
0 | Last |
Output Idx | Output Type |
---|---|
0 | Last |
Clean Business Name
API Action String: busname
Cleans bad characters from a business name (only allows alphanumeric characters).
Input Idx | Input Type |
---|---|
0 | Business |
Output Idx | Output Type |
---|---|
0 | Business |
Clean City Name
API Action String: city
Cleans bad characters from a city name (only allows alphabetic characters).
Input Idx | Input Type |
---|---|
0 | City |
Output Idx | Output Type |
---|---|
0 | City |
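The character-filtering cleaners above (`name`, `first`, `last`, `city`, and the alphanumeric `busname`) can be approximated in a few lines. This is a minimal Python sketch, assuming the service simply drops disallowed characters and keeps the spaces between words, as the summary-table examples suggest; the function names are hypothetical and the service's exact rules are not documented here.

```python
def _keep(value: str, allowed) -> str:
    """Drop characters rejected by `allowed`, then normalize whitespace."""
    kept = "".join(ch for ch in value if allowed(ch) or ch.isspace())
    # Collapse runs of whitespace left behind by removed characters.
    return " ".join(kept.split())


def clean_alpha(value: str) -> str:
    """Approximates name, first, last, and city: letters only."""
    return _keep(value, str.isalpha)


def clean_alnum(value: str) -> str:
    """Approximates busname: letters and digits."""
    return _keep(value, str.isalnum)
```

Note that this sketch turns "O'Brien" into "OBrien"; whether the service treats apostrophes and hyphens the same way is not stated.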
Clean State Name
API Action String: state
Checks for a valid US state and converts it to a 2-letter abbreviation if possible.
Input Idx | Input Type |
---|---|
0 | State |
Output Idx | Output Type |
---|---|
0 | State |
US ZIP5 Check
API Action String: uszip5
Checks whether a field looks like a 5-digit US postal code. Does NOT check for full validity (e.g., 00005 passes this check).
Input Idx | Input Type |
---|---|
0 | Zip |
Output Idx | Output Type |
---|---|
0 | Zip |
US ZIP9 Check & Split
API Action String: uszip9
Checks whether a field looks like a potential 9-digit US postal code and if so, splits it into two columns containing the first 5 digits and the other 4.
Input Idx | Input Type |
---|---|
0 | Zip |
Output Idx | Output Type |
---|---|
0 | Zip |
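Because both ZIP checks are format checks only, they reduce to regular expressions. A minimal Python sketch, assuming blank output on failure and accepting a ZIP+4 with or without the hyphen (both are assumptions, not documented behavior); `uszip5` and `uszip9` here are illustrative functions named after the action strings.

```python
import re

# ASCII digits only; \d would also accept non-ASCII digits.
ZIP5 = re.compile(r"[0-9]{5}")
ZIP9 = re.compile(r"([0-9]{5})-?([0-9]{4})")


def uszip5(value: str) -> str:
    """Return the value if it looks like a ZIP5, else blank (format check only)."""
    value = value.strip()
    return value if ZIP5.fullmatch(value) else ""


def uszip9(value: str) -> tuple:
    """Split a ZIP+4 into (zip5, zip4); return two blanks if it does not match."""
    match = ZIP9.fullmatch(value.strip())
    return (match.group(1), match.group(2)) if match else ("", "")
```

As in the description, `uszip5("00005")` passes even though no such ZIP code exists.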
Clean Country Code
API Action String: country
Given a country name field, outputs a country code or blank if it is invalid (allows only alphabetic characters).
Input Idx | Input Type |
---|---|
0 | Country |
Output Idx | Output Type |
---|---|
0 | Country |
Clean Phone Number
API Action String: phone
Checks a phone field for validity and outputs the number if valid, or blank otherwise.
Input Idx | Input Type |
---|---|
0 | Phone |
Output Idx | Output Type |
---|---|
0 | Phone |
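The summary table shows `+14175763685` cleaned to `4175763685`. A rough Python approximation, assuming North American numbers only and a purely structural check; the real service may validate far more (for example, whether an area code is actually assigned), and `clean_phone` is a hypothetical name.

```python
def clean_phone(value: str) -> str:
    """Return a 10-digit North American number, or blank if it does not look valid."""
    digits = "".join(ch for ch in value if ch in "0123456789")
    # Drop a leading country code of 1, e.g. +14175763685 -> 4175763685.
    if len(digits) == 11 and digits[0] == "1":
        digits = digits[1:]
    # NANP area codes cannot start with 0 or 1.
    if len(digits) != 10 or digits[0] in "01":
        return ""
    return digits
```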
Uppercase
API Action String: strtoupper
Converts a field to all uppercase.
Input Idx | Input Type |
---|---|
0 | Any |
Output Idx | Output Type |
---|---|
0 | Generic String |
Capitalize Words
API Action String: ucwords
Capitalizes the first letter of each word in a field.
Input Idx | Input Type |
---|---|
0 | Any |
Output Idx | Output Type |
---|---|
0 | Generic String |
Standardize & Rank Job Role
API Action String: titlerank3
Standardizes a job role field and ranks it; a lower number indicates a higher rank.
Input Idx | Input Type |
---|---|
0 | Title |
Output Idx | Output Type |
---|---|
0 | Title Rank 3 |
1 | Generic String |
Split Full Name
API Action String: splitfullname2
Splits a full name into separate columns for prefix, firstname, middlename, lastname, and suffix, all uppercased.
Input Idx | Input Type |
---|---|
0 | Fullname |
Output Idx | Output Type |
---|---|
0 | Generic String |
1 | First |
2 | Generic String |
3 | Last |
4 | Generic String |
Categorize & Split Name
API Action String: namecatparse
Parses a name field into separate columns for category (individual/business/unknown), prefix, firstname, middlename, lastname, suffix, and business name if applicable.
Input Idx | Input Type |
---|---|
0 | Fullname |
Output Idx | Output Type |
---|---|
0 | Generic String |
1 | Generic String |
2 | First |
3 | Generic String |
4 | Last |
5 | Generic String |
6 | Business |
Examples:
{
"inputs": [
{
"FullName": "John Smith"
},
{
"FullName": "Jane Williams"
}
],
"actions": [
{
"name": "namecatparse",
"inputFields": [
"FullName"
],
"outputFields": [
"EntityCategory",
"Prefix",
"First",
"Middle",
"Last",
"Suffix",
"BusName"
]
}
],
"output": [
"EntityCategory",
"First",
"Middle",
"Last",
"BusName"
]
}
http://api.versium.com/v2/dataprep?actions[]=namecatparse:FullName:EntityCategory,Prefix,First,Middle,Last,Suffix,BusName&FullName=John%20Smith&output=EntityCategory,First,Middle,Last,BusName
http://api.versium.com/v2/dataprep?actions[]=namecatparse:FullName:EntityCategory,Prefix,First,Middle,Last,Suffix,BusName&FullName=Jane%20Williams&output=EntityCategory,First,Middle,Last,BusName
{
"versium": {
"version": "2.0",
"match_counts": [],
"num_matches": 0,
"num_results": 2,
"query_id": "0fe4aa159cca6853dd",
"query_time": 0.145,
"results": [
{
"EntityCategory": "Individual",
"First": "John",
"Middle": "",
"Last": "Smith",
"BusName": ""
},
{
"EntityCategory": "Individual",
"First": "Jane",
"Middle": "",
"Last": "Williams",
"BusName": ""
}
]
}
}
Fix Fuzzy Location
API Action String: tlilocmap
Attempts to parse a fuzzy location like "Greater Seattle Area" into a meaningful location like "Seattle, WA, US".
Input Idx | Input Type |
---|---|
0 | Any |
Output Idx | Output Type |
---|---|
0 | Address |
1 | City |
2 | State |
3 | Zip |
4 | Country |
Extract Domain
API Action String: domain
Extracts the domain from an email.
Input Idx | Input Type |
---|---|
0 | Email |
Output Idx | Output Type |
---|---|
0 | Domain |
ESP Email Check
API Action String: isespfree
Checks whether an email address appears to be from a free email service provider (ESP): 1 = free ESP, 0 = private ESP.
Input Idx | Input Type |
---|---|
0 | Email |
Output Idx | Output Type |
---|---|
0 | Generic String |
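Domain extraction and the free-ESP flag are closely related: the flag is essentially a lookup on the extracted domain. A minimal Python sketch under that assumption; `FREE_ESP_DOMAINS` is a small hypothetical sample (the service's actual provider list is not documented), and returning blank for a malformed address is also an assumption.

```python
# Hypothetical sample list; the service's actual provider list is not documented.
FREE_ESP_DOMAINS = {"gmail.com", "yahoo.com", "hotmail.com", "outlook.com", "aol.com"}


def extract_domain(email: str) -> str:
    """Return the lowercased part after the last '@', or blank if malformed."""
    local, at, domain = email.strip().rpartition("@")
    if not at or not local or not domain:
        return ""
    return domain.lower()


def is_esp_free(email: str) -> str:
    """'1' for a free ESP, '0' otherwise, blank if no domain can be extracted."""
    domain = extract_domain(email)
    if not domain:
        return ""
    return "1" if domain in FREE_ESP_DOMAINS else "0"
```

The flag is returned as a string to match the action's Generic String output type.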
Public IP Address Check
API Action String: ip
Checks if an IP looks like a valid public IP address, outputs blank if the check fails.
Input Idx | Input Type |
---|---|
0 | Ip |
Output Idx | Output Type |
---|---|
0 | Ip |
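Python's standard `ipaddress` module offers a reasonable stand-in for "looks like a valid public IP": its `is_global` property is false for private, loopback, link-local, and reserved ranges. A minimal sketch, assuming the service's notion of "public" is similar (its exact rules are not documented); `public_ip` is a hypothetical name.

```python
import ipaddress


def public_ip(value: str) -> str:
    """Return the IP if it is globally routable, otherwise blank."""
    try:
        addr = ipaddress.ip_address(value.strip())
    except ValueError:
        # Not parseable as an IPv4 or IPv6 address.
        return ""
    # is_global is False for private, loopback, link-local, and reserved ranges.
    return str(addr) if addr.is_global else ""
```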
IP Address to Integer
API Action String: ip2long
Converts a dotted quad IP address into an integer value. (e.g. 206.40.146.40 becomes 3458765352)
Input Idx | Input Type |
---|---|
0 | Ip |
Output Idx | Output Type |
---|---|
0 | Generic String |
Integer to IP Address
API Action String: long2ip
Converts an integer value into a dotted quad IP address. (e.g. 3458765352 becomes 206.40.146.40)
Input Idx | Input Type |
---|---|
0 | Any |
Output Idx | Output Type |
---|---|
0 | Ip |
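The two IP conversions are the usual base-256 positional encoding: each octet is weighted by a power of 256, so 206.40.146.40 becomes 206·2^24 + 40·2^16 + 146·2^8 + 40 = 3456106496 + 2621440 + 37376 + 40 = 3458765352. A Python sketch of that arithmetic (illustrative functions named after the action strings, not the service's code):

```python
import ipaddress


def ip2long(ip: str) -> int:
    """Dotted quad -> 32-bit integer: a*2^24 + b*2^16 + c*2^8 + d."""
    octets = [int(part) for part in ip.split(".")]
    if len(octets) != 4 or not all(0 <= o <= 255 for o in octets):
        raise ValueError(f"not a dotted quad: {ip!r}")
    a, b, c, d = octets
    return (a << 24) | (b << 16) | (c << 8) | d


def long2ip(n: int) -> str:
    """32-bit integer -> dotted quad, taking one byte at a time from the top."""
    if not 0 <= n <= 0xFFFFFFFF:
        raise ValueError(f"out of IPv4 range: {n}")
    return ".".join(str((n >> shift) & 0xFF) for shift in (24, 16, 8, 0))


# The standard library performs the same conversion.
assert int(ipaddress.IPv4Address("206.40.146.40")) == ip2long("206.40.146.40")
```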
Generate Patterned Email
API Action String: gpe
Generates a probable email for a given first name, last name, and domain.
Input Idx | Input Type |
---|---|
0 | First |
1 | Last |
2 | Domain |
Output Idx | Output Type |
---|---|
0 | Email |
Name-To-Email Check
API Action String: n2echeck
Checks if a name appears to match an email and outputs component match count (how many name components match the email) and a weighted score. Higher scores are better.
Input Idx | Input Type |
---|---|
0 | Email |
1 | First |
2 | Last |
Output Idx | Output Type |
---|---|
0 | Generic String |
1 | Generic String |
Clean/Extract Year
API Action String: year
Attempts to extract a valid year from a field. Will filter out non-numeric characters.
Input Idx | Input Type |
---|---|
0 | Any |
Output Idx | Output Type |
---|---|
0 | Generic String |
Merge Date Fields
API Action String: dobmerge
Merges separate year, month, and day fields into a single field.
Input Idx | Input Type |
---|---|
0 | Any |
1 | Any |
2 | Any |
Output Idx | Output Type |
---|---|
0 | Date |
Date Extract
API Action String: dob
Attempts to extract a date and format it as YYYYMMDD.
Input Idx | Input Type |
---|---|
0 | Any |
Output Idx | Output Type |
---|---|
0 | Date |
DateTime Merge
API Action String: tsmerge
Merges separate date and time fields into a single field with the format YYYY-MM-DD HH:MM:SS.
Input Idx | Input Type |
---|---|
0 | Date |
1 | Time |
Output Idx | Output Type |
---|---|
0 | Datetime |
Time to Hour
API Action String: time2hour
Extracts the hour from a timestamp field.
Input Idx | Input Type |
---|---|
0 | Any |
Output Idx | Output Type |
---|---|
0 | Generic String |
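DateTime Merge and Time to Hour both revolve around the YYYY-MM-DD HH:MM:SS layout. A minimal Python sketch, assuming the date input is YYYYMMDD (the format Date Extract produces) and the time input is HH:MM:SS; the input formats `tsmerge` actually accepts, and whether `time2hour` zero-pads its output, are not documented, and the function names are hypothetical.

```python
from datetime import datetime

OUT_FORMAT = "%Y-%m-%d %H:%M:%S"


def ts_merge(date_str: str, time_str: str) -> str:
    """Merge a YYYYMMDD date and an HH:MM:SS time into YYYY-MM-DD HH:MM:SS."""
    merged = datetime.strptime(f"{date_str} {time_str}", "%Y%m%d %H:%M:%S")
    return merged.strftime(OUT_FORMAT)


def time_to_hour(timestamp: str) -> str:
    """Return the hour (0-23, unpadded) of a YYYY-MM-DD HH:MM:SS timestamp."""
    return str(datetime.strptime(timestamp, OUT_FORMAT).hour)
```

Parsing with `strptime` rather than slicing strings means impossible values such as hour 25 raise an error instead of passing through.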
Format DateTime
API Action String: timestamp
Reformats a full datetime field into YYYY-MM-DD HH:MM:SS.
Input Idx | Input Type |
---|---|
0 | Datetime |
Output Idx | Output Type |
---|---|
0 | Datetime |
Date Transform
API Action String: datetransform
Transforms a date field from one format to another.
Input Idx | Input Type |
---|---|
0 | Date |
Output Idx | Output Type |
---|---|
0 | Date |
Option Idx | Option | Value |
---|---|---|
0 | datetransform (Transform Type) | 0 = (MMDDYYYY to YYYYMMDD) |
Examples:
{
"inputs": [
{
"FirstName": "John",
"LastName": "Smith",
"DOB": "01 15 1980"
},
{
"FirstName": "Jane",
"LastName": "Williams",
"DOB": "06 24 1990"
}
],
"actions": [
{
"name": "datetransform",
"inputFields": [
"DOB"
],
"outputFields": [
"DOB"
],
"options": {
"datetransform": 4
}
}
]
}
http://api.versium.com/v2/dataprep?actions[]=datetransform:DOB:DOB:4&FirstName=John&LastName=Smith&DOB=01%2015%201980
http://api.versium.com/v2/dataprep?actions[]=datetransform:DOB:DOB:4&FirstName=Jane&LastName=Williams&DOB=06%2024%201990
{
"versium": {
"version": "2.0",
"match_counts": [],
"num_matches": 0,
"num_results": 2,
"query_id": "0fe4aa159cca6853dd",
"query_time": 0.145,
"results": [
{
"FirstName": "John",
"LastName": "Smith",
"DOB": "19800115"
},
{
"FirstName": "Jane",
"LastName": "Williams",
"DOB": "19900624"
}
]
}
}
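In the example above, "01 15 1980" becomes "19800115". The option table documents value 0 as MMDDYYYY to YYYYMMDD, while the example passes 4; what 4 selects is not documented here. The Python sketch below only illustrates the MM DD YYYY to YYYYMMDD reordering, using `datetime` so that impossible dates are rejected; stripping separators and returning blank on failure are assumptions, and the function name is hypothetical.

```python
from datetime import datetime


def mmddyyyy_to_yyyymmdd(value: str) -> str:
    """Reorder an MM DD YYYY date to YYYYMMDD; blank if it is not a real date."""
    # Keep digits only, so "01 15 1980", "01/15/1980" and "01151980" all work.
    digits = "".join(ch for ch in value if ch in "0123456789")
    if len(digits) != 8:
        return ""
    try:
        parsed = datetime.strptime(digits, "%m%d%Y")
    except ValueError:
        # e.g. February 30th or month 13.
        return ""
    return parsed.strftime("%Y%m%d")
```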
Filter Values
API Action String: mvzonk
Filters out values matching a certain pattern.
Input Idx | Input Type |
---|---|
0 | Any |
Output Idx | Output Type |
---|---|
0 | Generic String |
Option Idx | Option | Value |
---|---|---|
0 | pattern (Pattern) | email (Is Email) |
Stem Domain
API Action String: stemdomain
Outputs the domain name for an email address or URL, stemmed without the host.
Input Idx | Input Type |
---|---|
0 | Domain |
Output Idx | Output Type |
---|---|
0 | Domain |
Transliterate
API Action String: transliterate
Converts accented characters to non-accented equivalents.
Input Idx | Input Type |
---|---|
0 | Any |
Output Idx | Output Type |
---|---|
0 | Generic String |
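A common way to strip accents is Unicode compatibility decomposition (NFKD), which splits a character such as "è" into a base letter plus a combining mark, followed by dropping the combining marks. This Python sketch uses that technique as an illustration; whether the service does the same is not documented, and `transliterate` here is a hypothetical function.

```python
import unicodedata


def transliterate(value: str) -> str:
    """Strip accents by decomposing characters and dropping combining marks."""
    decomposed = unicodedata.normalize("NFKD", value)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

Letters with no decomposition, such as "ß" or "ø", pass through this sketch unchanged.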
Hashmap
API Action String: hashmap
Takes an input and applies a hashing algorithm with an optional salt.
Input Idx | Input Type |
---|---|
0 | Any |
Output Idx | Output Type |
---|---|
0 | Generic String |
Option Idx | Option | Value |
---|---|---|
0 | algorithm | md5U |
1 | salt | any |
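The option table lists `md5U` as an algorithm value and accepts any string as a salt. The Python sketch below assumes `md5U` means MD5 rendered as upper-case hexadecimal and that the salt is prepended to the input before hashing; both are guesses rather than documented behavior, so check the service's output against a known value before relying on it. `hashmap_md5u` is a hypothetical name.

```python
import hashlib


def hashmap_md5u(value: str, salt: str = "") -> str:
    """Upper-case hex MD5 of salt + value (salt placement is an assumption)."""
    return hashlib.md5((salt + value).encode("utf-8")).hexdigest().upper()
```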
Phoneline Type
API Action String: linetype-pe
Returns the line type (mobile, landline, etc.) and the name of the carrier.
Input Idx | Input Type |
---|---|
0 | Phone |
Output Idx | Output Type |
---|---|
0 | Line Type |
1 | Generic String |
LibPostal Address Standardizer
API Action String: lpaddrstd
Takes one or more inputs and tries to extract a postal address and standardize it. Returns house number, street, unit, city, state, ZIP, and country.
Input Idx | Input Type |
---|---|
0 | Full Address |
... | ... |
Output Idx | Output Type |
---|---|
0 | Generic String (House Number) |
1 | Generic String (Street) |
2 | Generic String (Unit) |
3 | City |
4 | State |
5 | Zip |
6 | Country |
Combine & Standardize Address
API Action String: addrstd
Combines and standardizes address components into a single field.
Input Idx | Input Type |
---|---|
0 | Address |
1 | City |
2 | State |
3 | Zip |
Output Idx | Output Type |
---|---|
0 | Address |
Option Idx | Option | Value |
---|---|---|
0 | blankifinvalid | 0 = No |
Reverse Geocode
API Action String: reversegeocode
Accepts (US-based) latitude and longitude fields and attempts to output an address for the given location.
Input Idx | Input Type |
---|---|
0 | Any (Latitude) |
1 | Any (Longitude) |
Output Idx | Output Type |
---|---|
0 | Generic String (Precision) |
1 | Address |
2 | City |
3 | State |
4 | Zip |
5 | Zip4 |
Simple Tag Individual
API Action String: stindiv
Takes a first name, last name, address, and zip as inputs and creates a pseudo-unique identifier for that individual.
Input Idx | Input Type |
---|---|
0 | First |
1 | Last |
2 | Address |
3 | Zip |
Output Idx | Output Type |
---|---|
0 | Generic String |
Option Idx | Option | Value |
---|---|---|
0 | length | 1-50 |
Simple Tag Household
API Action String: sthhld
Takes a first name, last name, address, and zip as inputs and creates a pseudo-unique identifier for that household.
Input Idx | Input Type |
---|---|
0 | First |
1 | Last |
2 | Address |
3 | Zip |
Output Idx | Output Type |
---|---|
0 | Generic String |
Option Idx | Option | Value |
---|---|---|
0 | length | 1-50 |
Zip to Congressional District
API Action String: z54tocongrdist
Takes a Zip5 and Zip4 and returns the congressional district for that area.
Input Idx | Input Type |
---|---|
0 | Zip |
1 | Zip4 |
Output Idx | Output Type |
---|---|
0 | Generic String |
Global Region
API Action String: globalregion
Takes the name of a country and outputs the region (Africa, APAC, US/CA, LATAM, etc.)
Input Idx | Input Type |
---|---|
0 | Country |
Output Idx | Output Type |
---|---|
0 | Generic String |
IP To Location
API Action String: ip2loc
Takes in an IP address and outputs IP Country, IP City, IP Zip, IP ISP Name, IP Domain Name, IP Usage Type, Proxy Type, IP Block ID, IP Block Len.
Input Idx | Input Type |
---|---|
0 | IP |
Output Idx | Output Type |
---|---|
0 | Country |
1 | City |
2 | Zip |
3 | Generic String |
4 | Generic String |
5 | Generic String |
6 | Generic String |
7 | Generic String |
8 | Generic String |
Format Phone Number
API Action String: naphonefmt
Formats a phone number into a selected standard format. Only works with 10-digit (or 11-digit with country code) North American phone numbers.
Input Idx | Input Type |
---|---|
0 | Phone |
Output Idx | Output Type |
---|---|
0 | Phone |
Option Idx | Option | Value |
---|---|---|
0 | phoneformat | 0 = (XXX) XXX-XXXX |
1 | includenacountrycode | 0 = No (Always Stripped) |
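The summary table shows "+1 417.576 3685" formatted as "+1 (417) 576-3685". The Python sketch below reproduces `phoneformat` value 0, "(XXX) XXX-XXXX", and uses a hypothetical `include_country_code` flag as a stand-in for `includenacountrycode`. Only value 0 ("always stripped") is documented, so the "+1 " prefix is inferred from the summary example rather than from the option table.

```python
def format_na_phone(value: str, include_country_code: bool = False) -> str:
    """Format a North American number as (XXX) XXX-XXXX, or blank if not 10/11 digits."""
    digits = "".join(ch for ch in value if ch in "0123456789")
    # Accept 11 digits only when the extra leading digit is the NA country code 1.
    if len(digits) == 11 and digits[0] == "1":
        digits = digits[1:]
    if len(digits) != 10:
        return ""
    formatted = f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"
    return f"+1 {formatted}" if include_country_code else formatted
```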