DataPrep API

DataPrep performs a series of data cleansing and manipulation actions on inputs and returns the results of those actions. DataPrep actions can generate new fields, enriching the input records with extra data points. They can also overwrite existing fields, replacing messy inputs with cleansed data. Control over whether a new field is created or an existing field is overwritten lies completely with the end user.

Actions can take multiple inputs and produce multiple outputs, depending on the operation they perform. Actions can be chained in sequence so that the output(s) of one action become the input(s) of a subsequent action. Each API call takes the name of an action, its input fields, its output fields, and, for some actions, options that change the action's behavior.
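
To make chaining concrete, here is a minimal sketch that builds a request body in which the output field of one action is the input field of the next. The key names (`inputs`, `actions`, `inputFields`, `outputFields`, `options`, `output`) and the action strings `first` and `strtoupper` are taken from this reference; the helper name `action` and the field names `FirstNameClean` and `FirstNameUpper` are hypothetical, and nothing is sent over the network.

```python
import json


def action(name, input_fields, output_fields, options=None):
    """Build one action object in the shape used by the long-form examples."""
    spec = {
        "name": name,
        "inputFields": list(input_fields),
        "outputFields": list(output_fields),
    }
    if options:
        spec["options"] = dict(options)
    return spec


# Step 1 writes a cleaned first name to a new field (the original is kept).
# Step 2 reads that new field, so the two actions form a chain.
chain = [
    action("first", ["FirstName"], ["FirstNameClean"]),
    action("strtoupper", ["FirstNameClean"], ["FirstNameUpper"]),
]

payload = {
    "inputs": [{"FirstName": "Jonath4an"}],
    "actions": chain,
    "output": ["FirstName", "FirstNameUpper"],
}

print(json.dumps(payload, indent=2))
```

Naming an existing field as the output field (for example `["FirstName"]` in step 1) would overwrite it instead of creating a new one.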

There are two ways to call the API: a long-form API and a short-form API. Both provide the same functionality, except that the short form accepts only a single input record per request, while the long form allows batching multiple records into a single request.
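
The difference between the two forms can be sketched as follows. The endpoint and the `name:inputs:outputs[:option]` layout of the `actions[]` parameter are inferred from the examples later in this reference; the function names are hypothetical, authentication is not covered in this section and is left out, and no request is actually sent.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://api.versium.com/v2/dataprep"  # endpoint as shown in the examples


def long_form_body(records, actions, output=None):
    """One JSON body can carry many records (batching)."""
    body = {"inputs": list(records), "actions": list(actions)}
    if output:
        body["output"] = list(output)
    return body


def short_form_url(record, name, input_fields, output_fields, option=None, output=None):
    """One URL carries exactly one record; fields become query parameters."""
    parts = [name, ",".join(input_fields), ",".join(output_fields)]
    if option is not None:
        # Assumption: a single option value is appended as a fourth segment.
        parts.append(str(option))
    query = [("actions[]", ":".join(parts))]
    query.extend(record.items())
    if output:
        query.append(("output", ",".join(output)))
    # Keep []:, readable and encode spaces as %20.
    return BASE_URL + "?" + urlencode(query, safe="[]:,", quote_via=quote)


records = [{"FullName": "John Smith"}, {"FullName": "Jane Williams"}]

# Short form: one URL per record.
urls = [
    short_form_url(r, "namecatparse", ["FullName"], ["EntityCategory", "Prefix"],
                   output=["EntityCategory"])
    for r in records
]

# Long form: a single body for both records.
body = long_form_body(
    records,
    [{"name": "namecatparse", "inputFields": ["FullName"],
      "outputFields": ["EntityCategory", "Prefix"]}],
    output=["EntityCategory"],
)
```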

List of DataPrep Actions

The available actions are listed below. The inputs, outputs, and options for each action are included with its definition.

Action | What it does | Cleans this | To this
--- | --- | --- | ---
Clean Email | Checks an email field for validity. | [email protected] | [email protected]
Clean Name | Cleans bad characters from a name. | Jonath4an Doe | Jonathan Doe
Clean First Name | Cleans bad characters from a first name. | Jonath4an | Jonathan
Clean Last Name | Cleans bad characters from a last name. | Smi#th | Smith
Clean Business Name | Cleans bad characters from a business name. | Micro$soft | Microsoft
Clean City Name | Cleans bad characters from a city name. | Seatt?le | Seattle
Clean State Name | Checks for a valid US state and converts to 2 letter abbreviation if possible. | Washington | WA
US ZIP5 Check | Checks whether a field looks like a 5-digit US postal code. Does NOT check for full validity (e.g. 00005 passes this check). | 98052 | 98052 (if valid); [blank] (if invalid)
US ZIP9 Check & Split | Checks whether a field looks like a potential 9-digit US postal code and if so, splits it into two columns containing the first 5 digits and the other 4. | 90601-1051 | [90601] [1051]
Clean Country Code | Given a country name field, outputs a country code or blank if it is invalid. | Italy | IT
Clean Phone Number | Checks a phone field for validity; outputs the number if valid, or blank otherwise. | +14175763685 | 4175763685
Uppercase | Converts a field to all uppercase. | uppercase | UPPERCASE
Capitalize Words | Capitalizes the first letter of each word in a field. | washington | Washington
Standardize & Rank Job Role | Standardizes a job role field and ranks it; a lower number indicates a higher rank. | Irrigation Sales Manager | Sales, Agriculture
Split Full Name | Splits a full name into separate columns for prefix, firstname, middlename, lastname, and suffix, all uppercased. | [Chris Angelo Smith] | [CHRIS] [ANGELO] [SMITH]
Categorize & Split Name | Parses a name field into separate columns for category (individual/business/unknown), prefix, firstname, middlename, lastname, suffix, and business name if applicable. | [Chris Angelo Smith]; [Versium Analytics] | [INDIVIDUAL] [CHRIS] [ANGELO] [SMITH]; [BUSINESS] [Versium Analytics]
Fix Fuzzy Location | Attempts to parse a fuzzy location like "Greater Seattle Area" into a meaningful location like "Seattle, WA, US". | [Greater Seattle Area] | [Seattle] [WA] [US]
Extract Domain | Extracts the domain from an email. | [email protected] | gmail.com
ESP Email Check | Checks if an email address appears to be from a free Email Service Provider or not. | [email protected]; [email protected] | 1; 0
Public IP Address Check | Checks if an IP looks like a valid public IP address, outputs blank if the check fails. | 50.242.100.253; 127.0.0.1 | 50.242.100.253; [blank]
IP Address to Integer | Converts a dotted quad IP address into an integer value. | 206.40.146.40 | 3458765352
Integer to IP Address | Converts an integer value into a dotted quad IP address. | 3458765352 | 206.40.146.40
Generate Patterned Email | Generates a probable email for a given first name, last name, and domain. | John, Doe, Versium.com | [email protected]
Name-To-Email Check | Checks if a name appears to match an email and outputs component match count (how many name components match the email) and a weighted score. Higher scores are better. | Name = Wendi; Last = Mitchell; Email = [email protected] | N2E Matches = 7; N2E Score = 113
Clean/Extract Year | Attempts to extract a valid year from a field. Will filter out non-numeric characters. | |
Merge Date Fields | Merges separate year, month, and day fields into a single field. | |
Date Extract | Attempts to extract a date and format it as YYYYMMDD. | |
DateTime Merge | Merges separate date and time fields into a single field with the format YYYY-MM-DD HH:MM:SS. | |
Time to Hour | Extracts the hour from a timestamp field. | |
Format DateTime | Reformats a full datetime field into YYYY-MM-DD HH:MM:SS. | |
Date Transform | Transforms a date field from one format to another. | |
Filter Values | Filters out values matching a certain pattern. | |
Stem Domain | Outputs the domain name for an email or url stemmed without the host. | |
Transliterate | Converts accented characters to non-accented equivalents. | |
Hashmap | Takes an input and applies a hashing algorithm with an optional salt. | |
Phone Line Type | Returns the line type (mobile, landline, etc.) and the name of the Carrier. | |
LibPostal Address Standardizer | Takes one or more inputs and tries to extract a postal address and standardize it. Returns Housenumber, Street, Unit, City, State, Zip, Country. | [5550 Newcastle ave #555, Encino, CA, 91316, United States] | [5550] [Newcastle ave] [#555] [Encino] [CA] [91316] [United States]
Combine & Standardize Address | Combines and standardizes address components into a single field. | [5550 Newcastle Ave #555] [Encino] [CA] [91316] | [5550 Newcastle Ave #555, Encino, CA, 91316]
Reverse Geocode | Accepts (US-based) latitude and longitude fields and attempts to output an address for the given location. | [47.62230901] [-122.3486291] | [1] [114 REPUBLICAN ST] [SEATTLE] [WA] [98109] [4534]
Simple Tag Individual | Takes a first name, last name, address, and zip as inputs and creates a pseudo-unique identifier for that individual. | |
Simple Tag Household | Takes a first name, last name, address, and zip as inputs and creates a pseudo-unique identifier for that household. | |
Zip to Congressional District | Takes a Zip5 and Zip4 and returns the congressional district for that area. | |
Global Region | Takes the name of a country and outputs the region (Africa, APAC, US/CA, LATAM, etc.). | Italy | EMEA
IP to Location | Takes in an IP address and outputs IP Country, IP City, IP Zip, IP ISP Name, IP Domain Name, IP Usage Type, Proxy Type, IP Block ID, IP Block Len. | [161.69.123.10] | [US] [NY] [New York City] [DCH] [VPN]
Format Phone Number | Formats a phone number into a selected standard format. Only works with 10-digit (or 11-digit with country code) North American phone numbers. | [+1 417.576 3685] | [+1 (417) 576-3685]

Clean Email

API Action String: email
Checks an email field for validity. Outputs the email on success and blank on fail. Provides light correction (e.g. gmail.co becomes gmail.com).

Input Idx | Input Type
--- | ---
0 | Email

Output Idx | Output Type
--- | ---
0 | Email

Option Idx | Option | Value
--- | --- | ---
0 | aggressive | 0 = No; 1 = Yes

Examples:

Long-form request:

{
  "inputs": [
    {
      "FirstName": "John",
      "LastName": "Smith",
      "EmailAddr": "[email protected]"
    },
    {
      "FirstName": "Jane",
      "LastName": "Williams",
      "EmailAddr": "[email protected]"
    }
  ],
  "actions": [
    {
      "name": "email",
      "inputFields": [
        "EmailAddr"
      ],
      "outputFields": [
        "EmailAddrClean"
      ],
      "options": {
        "aggressive": 1
      }
    }
  ],
  "output": [
    "FirstName",
    "LastName",
    "EmailAddrClean",
    "EmailAddr"
  ]
}

Short-form requests (one record per request):

https://api.versium.com/v2/dataprep?actions[]=email:EmailAddr:EmailAddrClean:1&FirstName=John&LastName=Smith&EmailAddr=[email protected]&output=FirstName,LastName,EmailAddrClean,EmailAddr

https://api.versium.com/v2/dataprep?actions[]=email:EmailAddr:EmailAddrClean:1&FirstName=Jane&LastName=Williams&EmailAddr=[email protected]&output=FirstName,LastName,EmailAddrClean,EmailAddr

Response:

{
  "versium": {
    "version": "2.0",
    "match_counts": [],
    "num_matches": 0,
    "num_results": 1,
    "query_id": "0fe4aa159cca6853dd",
    "query_time": 0.145,
    "results": [
      {
        "FirstName": "John",
        "LastName": "Smith",
        "EmailAddrClean": "",
        "EmailAddr": "[email protected]"
      },
      {
        "FirstName": "Jane",
        "LastName": "Williams",
        "EmailAddrClean": "[email protected]",
        "EmailAddr": "[email protected]"
      }
    ]
  }
}

Clean Name

API Action String: name
Cleans bad characters from a name (only allows alphabetic characters).

Input Idx | Input Type
--- | ---
0 | Fullname

Output Idx | Output Type
--- | ---
0 | Fullname

Clean First Name

API Action String: first
Cleans bad characters from a first name (only allows alphabetic characters).

Input Idx | Input Type
--- | ---
0 | First

Output Idx | Output Type
--- | ---
0 | First

Clean Last Name

API Action String: last
Cleans bad characters from a last name (only allows alphabetic characters).

Input Idx | Input Type
--- | ---
0 | Last

Output Idx | Output Type
--- | ---
0 | Last

Clean Business Name

API Action String: busname
Cleans bad characters from a business name (only allows alphanumeric characters).

Input Idx | Input Type
--- | ---
0 | Business

Output Idx | Output Type
--- | ---
0 | Business

Clean City Name

API Action String: city
Cleans bad characters from a city name (only allows alphabetic characters).

Input Idx | Input Type
--- | ---
0 | City

Output Idx | Output Type
--- | ---
0 | City

Clean State Name

API Action String: state
Checks for a valid US state and converts it to a 2-letter abbreviation if possible.

Input Idx | Input Type
--- | ---
0 | State

Output Idx | Output Type
--- | ---
0 | State

US ZIP5 Check

API Action String: uszip5
Checks whether a field looks like a 5-digit US postal code. Does NOT check for full validity (e.g. 00005 passes this check).

Input Idx | Input Type
--- | ---
0 | Zip

Output Idx | Output Type
--- | ---
0 | Zip

US ZIP9 Check & Split

API Action String: uszip9
Checks whether a field looks like a potential 9-digit US postal code and if so, splits it into two columns containing the first 5 digits and the other 4.

Input Idx | Input Type
--- | ---
0 | Zip

Output Idx | Output Type
--- | ---
0 | Zip
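
The "looks like" checks described for the two ZIP actions can be approximated locally. The sketch below is a rough stand-in, not the service's implementation: the function names are hypothetical, and returning blank strings on failure and accepting a ZIP9 without a hyphen are assumptions.

```python
import re

ZIP5_RE = re.compile(r"[0-9]{5}")
ZIP9_RE = re.compile(r"([0-9]{5})-?([0-9]{4})")


def zip5_check(value):
    """Return the value if it is exactly five digits, else blank.

    Format only: 00005 passes even though it is not a real ZIP code.
    """
    value = value.strip()
    return value if ZIP5_RE.fullmatch(value) else ""


def zip9_split(value):
    """Split a ZIP+4 into (first five digits, last four digits)."""
    match = ZIP9_RE.fullmatch(value.strip())
    if not match:
        return ("", "")  # assumption: both parts blank on failure
    return (match.group(1), match.group(2))


print(zip5_check("98052"))
print(zip9_split("90601-1051"))
```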

Clean Country Code

API Action String: country
Given a country name field, outputs a country code or blank if it is invalid (allows only alphabetic characters).

Input Idx | Input Type
--- | ---
0 | Country

Output Idx | Output Type
--- | ---
0 | Country

Clean Phone Number

API Action String: phone
Checks a phone field for validity; outputs the number if valid, or blank otherwise.

Input Idx | Input Type
--- | ---
0 | Phone

Output Idx | Output Type
--- | ---
0 | Phone

Uppercase

API Action String: strtoupper
Converts a field to all uppercase.

Input Idx | Input Type
--- | ---
0 | Any

Output Idx | Output Type
--- | ---
0 | Generic String

Capitalize Words

API Action String: ucwords
Capitalizes the first letter of each word in a field.

Input Idx | Input Type
--- | ---
0 | Any

Output Idx | Output Type
--- | ---
0 | Generic String
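
For previewing results without calling the API, the two case actions can be mimicked locally. This is only an approximation under assumptions: the function names are hypothetical, and whether the service lowercases the remaining letters of each word is not stated here, so the sketch leaves them untouched.

```python
import re


def to_upper(value):
    """Local stand-in for the Uppercase action."""
    return value.upper()


def capitalize_words(value):
    """Uppercase the first character of each whitespace-delimited word.

    Characters after the first are left as they are.
    """
    return re.sub(
        r"(^|\s)(\S)",
        lambda m: m.group(1) + m.group(2).upper(),
        value,
    )


print(to_upper("uppercase"))
print(capitalize_words("washington"))
```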

Standardize & Rank Job Role

API Action String: titlerank3
Standardizes a job role field and ranks it; a lower number indicates a higher rank.

Input Idx | Input Type
--- | ---
0 | Title

Output Idx | Output Type
--- | ---
0 | Title Rank 3
1 | Generic String

Split Full Name

API Action String: splitfullname2
Splits a full name into separate columns for prefix, firstname, middlename, lastname, and suffix, all uppercased.

Input Idx | Input Type
--- | ---
0 | Fullname

Output Idx | Output Type
--- | ---
0 | Generic String
1 | First
2 | Generic String
3 | Last
4 | Generic String

Categorize & Split Name

API Action String: namecatparse
Parses a name field into separate columns for category (individual/business/unknown), prefix, firstname, middlename, lastname, suffix, and business name if applicable.

Input Idx | Input Type
--- | ---
0 | Fullname

Output Idx | Output Type
--- | ---
0 | Generic String
1 | Generic String
2 | First
3 | Generic String
4 | Last
5 | Generic String
6 | Business

Examples:

Long-form request:

{
  "inputs": [
    {
      "FullName": "John Smith"
    },
    {
      "FullName": "Jane Williams"
    }
  ],
  "actions": [
    {
      "name": "namecatparse",
      "inputFields": [
        "FullName"
      ],
      "outputFields": [
        "EntityCategory",
        "Prefix",
        "First",
        "Middle",
        "Last",
        "Suffix",
        "BusName"
      ]
    }
  ],
  "output": [
    "EntityCategory",
    "First",
    "Middle",
    "Last",
    "BusName"
  ]
}

Short-form requests (one record per request):

https://api.versium.com/v2/dataprep?actions[]=namecatparse:FullName:EntityCategory,Prefix,First,Middle,Last,Suffix,BusName&FullName=John%20Smith&output=EntityCategory,First,Middle,Last,BusName

https://api.versium.com/v2/dataprep?actions[]=namecatparse:FullName:EntityCategory,Prefix,First,Middle,Last,Suffix,BusName&FullName=Jane%20Williams&output=EntityCategory,First,Middle,Last,BusName

Response:

{
  "versium": {
    "version": "2.0",
    "match_counts": [],
    "num_matches": 0,
    "num_results": 1,
    "query_id": "0fe4aa159cca6853dd",
    "query_time": 0.145,
    "results": [
      {
        "EntityCategory": "Individual",
        "First": "John",
        "Middle": "",
        "Last": "Smith",
        "BusName": ""
      },
      {
        "EntityCategory": "Individual",
        "First": "Jane",
        "Middle": "",
        "Last": "Williams",
        "BusName": ""
      }
    ]
  }
}
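
Reading a response like the one above might look like the following sketch. It assumes only the `versium` and `results` keys visible in the sample; the function names are hypothetical, and the embedded JSON string is a trimmed copy of the sample response.

```python
import json

# Trimmed copy of the sample response shown above.
SAMPLE = """
{"versium": {"version": "2.0", "results": [
  {"EntityCategory": "Individual", "First": "John", "Middle": "",
   "Last": "Smith", "BusName": ""},
  {"EntityCategory": "Individual", "First": "Jane", "Middle": "",
   "Last": "Williams", "BusName": ""}
]}}
"""


def extract_results(body):
    """Return the list of result records, or an empty list if absent."""
    return json.loads(body).get("versium", {}).get("results", [])


def non_blank(record):
    """Drop fields that came back blank (no value produced)."""
    return {key: value for key, value in record.items() if value != ""}


for record in extract_results(SAMPLE):
    print(non_blank(record))
```

Filtering out blank fields is useful because many actions signal a failed check with a blank output.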

Fix Fuzzy Location

API Action String: tlilocmap
Attempts to parse a fuzzy location like "Greater Seattle Area" into a meaningful location like "Seattle, WA, US".

Input Idx | Input Type
--- | ---
0 | Any

Output Idx | Output Type
--- | ---
0 | Address
1 | City
2 | State
3 | Zip
4 | Country

Extract Domain

API Action String: domain
Extracts the domain from an email.

Input Idx | Input Type
--- | ---
0 | Email

Output Idx | Output Type
--- | ---
0 | Domain

ESP Email Check

API Action String: isespfree
Checks if an email address appears to be from a free Email Service Provider or not (i.e. 1 = Free ESP, 0 = Private ESP).

Input Idx | Input Type
--- | ---
0 | Email

Output Idx | Output Type
--- | ---
0 | Generic String

Public IP Address Check

API Action String: ip
Checks if an IP looks like a valid public IP address, outputs blank if the check fails.

Input Idx | Input Type
--- | ---
0 | Ip

Output Idx | Output Type
--- | ---
0 | Ip
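
A comparable check can be sketched with Python's standard `ipaddress` module. This is an approximation and an assumption about what "public" means here: it treats an address as public when `is_global` is true, which may not match the service's rules exactly. The function name is hypothetical.

```python
import ipaddress


def public_ip_or_blank(value):
    """Return the address if it parses and is globally routable, else blank."""
    try:
        address = ipaddress.ip_address(value.strip())
    except ValueError:
        return ""  # not a valid IP address at all
    return str(address) if address.is_global else ""


print(public_ip_or_blank("50.242.100.253"))
print(repr(public_ip_or_blank("127.0.0.1")))
```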

IP Address to Integer

API Action String: ip2long
Converts a dotted quad IP address into an integer value (e.g. 206.40.146.40 becomes 3458765352).

Input Idx | Input Type
--- | ---
0 | Ip

Output Idx | Output Type
--- | ---
0 | Generic String

Integer to IP Address

API Action String: long2ip
Converts an integer value into a dotted quad IP address (e.g. 3458765352 becomes 206.40.146.40).

Input Idx | Input Type
--- | ---
0 | Any

Output Idx | Output Type
--- | ---
0 | Ip
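
The conversion in both directions is plain base-256 arithmetic on the four octets: 206·256³ + 40·256² + 146·256 + 40 = 3458765352. A minimal local equivalent using the standard `ipaddress` module (the function names are hypothetical) is:

```python
import ipaddress


def ip_to_int(dotted_quad):
    """Dotted quad to integer, e.g. for range comparisons or compact storage."""
    return int(ipaddress.IPv4Address(dotted_quad))


def int_to_ip(number):
    """Integer back to dotted quad."""
    return str(ipaddress.IPv4Address(number))


print(ip_to_int("206.40.146.40"))
print(int_to_ip(3458765352))
```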

Generate Patterned Email

API Action String: gpe
Generates a probable email for a given first name, last name, and domain.

Input Idx | Input Type
--- | ---
0 | First
1 | Last
2 | Domain

Output Idx | Output Type
--- | ---
0 | Email

Name-To-Email Check

API Action String: n2echeck
Checks if a name appears to match an email and outputs component match count (how many name components match the email) and a weighted score. Higher scores are better.

Input Idx | Input Type
--- | ---
0 | Email
1 | First
2 | Last

Output Idx | Output Type
--- | ---
0 | Generic String
1 | Generic String

Clean/Extract Year

API Action String: year
Attempts to extract a valid year from a field. Will filter out non-numeric characters.

Input Idx | Input Type
--- | ---
0 | Any

Output Idx | Output Type
--- | ---
0 | Generic String

Merge Date Fields

API Action String: dobmerge
Merges separate year, month, and day fields into a single field.

Input Idx | Input Type
--- | ---
0 | Any
1 | Any
2 | Any

Output Idx | Output Type
--- | ---
0 | Date

Date Extract

API Action String: dob
Attempts to extract a date and format it as YYYYMMDD.

Input Idx | Input Type
--- | ---
0 | Any

Output Idx | Output Type
--- | ---
0 | Date

DateTime Merge

API Action String: tsmerge
Merges separate date and time fields into a single field with the format YYYY-MM-DD HH:MM:SS.

Input Idx | Input Type
--- | ---
0 | Date
1 | Time

Output Idx | Output Type
--- | ---
0 | Datetime

Time to Hour

API Action String: time2hour
Extracts the hour from a timestamp field.

Input Idx | Input Type
--- | ---
0 | Any

Output Idx | Output Type
--- | ---
0 | Generic String

Format DateTime

API Action String: timestamp
Reformats a full datetime field into YYYY-MM-DD HH:MM:SS.

Input Idx | Input Type
--- | ---
0 | Datetime

Output Idx | Output Type
--- | ---
0 | Datetime

Date Transform

API Action String: datetransform
Transforms a date field from one format to another.

Input Idx | Input Type
--- | ---
0 | Date

Output Idx | Output Type
--- | ---
0 | Date

Option Idx | Option | Value
--- | --- | ---
0 | datetransform (Transform Type) | 0 = MMDDYYYY to YYYYMMDD; 1 = MMDD to 0000MMDD; 2 = DDMMYYYY to YYYYMMDD; 3 = 'Month YYYY' to YYYYMM01; 4 = 'MM DD YYYY' to YYYYMMDD

Examples:

Long-form request:

{
  "inputs": [
    {
      "FirstName": "John",
      "LastName": "Smith",
      "DOB": "01 15 1980"
    },
    {
      "FirstName": "Jane",
      "LastName": "Williams",
      "DOB": "06 24 1990"
    }
  ],
  "actions": [
    {
      "name": "datetransform",
      "inputFields": [
        "DOB"
      ],
      "outputFields": [
        "DOB"
      ],
      "options": {
        "datetransform": 4
      }
    }
  ]
}

Short-form requests (one record per request):

https://api.versium.com/v2/dataprep?actions[]=datetransform:DOB:DOB:4&FirstName=John&LastName=Smith&DOB=01%2015%201980

https://api.versium.com/v2/dataprep?actions[]=datetransform:DOB:DOB:4&FirstName=Jane&LastName=Williams&DOB=06%2024%201990

Response:

{
  "versium": {
    "version": "2.0",
    "match_counts": [],
    "num_matches": 0,
    "num_results": 1,
    "query_id": "0fe4aa159cca6853dd",
    "query_time": 0.145,
    "results": [
      {
        "FirstName": "John",
        "LastName": "Smith",
        "DOB": "19800115"
      },
      {
        "FirstName": "Jane",
        "LastName": "Williams",
        "DOB": "19900624"
      }
    ]
  }
}
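
The five transform types can be reproduced locally to check what a given option should yield. The sketch below mirrors the option numbers listed for `datetransform`; the function name is hypothetical, it raises `ValueError` on unparseable input (how the service reports bad input is not stated here), and option 3 assumes English month names.

```python
from datetime import datetime

# strptime patterns for the input side of options 0, 2, 3, and 4.
INPUT_FORMATS = {
    0: "%m%d%Y",    # MMDDYYYY
    2: "%d%m%Y",    # DDMMYYYY
    3: "%B %Y",     # 'Month YYYY' (day defaults to 01)
    4: "%m %d %Y",  # 'MM DD YYYY'
}


def date_transform(value, transform_type):
    """Return the date as YYYYMMDD (or 0000MMDD for option 1)."""
    value = value.strip()
    if transform_type == 1:
        # MMDD to 0000MMDD: no year is available, so validate the month and
        # day against a leap year and prefix a zero year.
        if len(value) != 4 or not value.isdigit():
            raise ValueError("expected MMDD")
        datetime.strptime("2000" + value, "%Y%m%d")
        return "0000" + value
    parsed = datetime.strptime(value, INPUT_FORMATS[transform_type])
    return parsed.strftime("%Y%m%d")


print(date_transform("01 15 1980", 4))
print(date_transform("06 24 1990", 4))
```

With option 4, the two `DOB` values from the example request produce the values shown in the response.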

Filter Values

API Action String: mvzonk
Filters out values matching a certain pattern.

Input Idx | Input Type
--- | ---
0 | Any

Output Idx | Output Type
--- | ---
0 | Generic String

Option Idx | Option | Value
--- | --- | ---
0 | pattern (Pattern) | email (Is Email); !email (Is Not Email); domain (Is Domain); !domain (Is Not Domain); url (Is URL); !url (Is Not URL); phone (Is Phone); !phone (Is Not Phone)

Stem Domain

API Action String: stemdomain
Outputs the domain name for an email or url stemmed without the host.

Input Idx | Input Type
--- | ---
0 | Domain

Output Idx | Output Type
--- | ---
0 | Domain

Transliterate

API Action String: transliterate
Converts accented characters to non-accented equivalents.

Input Idx | Input Type
--- | ---
0 | Any

Output Idx | Output Type
--- | ---
0 | Generic String

Hashmap

API Action String: hashmap
Takes an input and applies a hashing algorithm with an optional salt.

Input Idx | Input Type
--- | ---
0 | Any

Output Idx | Output Type
--- | ---
0 | Generic String

Option Idx | Option | Value
--- | --- | ---
0 | algorithm | md5U; md5l; sha256U; sha256I; sha1U; sha1I
1 | salt | any

Phoneline Type

API Action String: linetype-pe
Returns the line type (mobile, landline, etc.) and the name of the Carrier.

Input Idx | Input Type
--- | ---
0 | Phone

Output Idx | Output Type
--- | ---
0 | Line Type
1 | Generic String

LibPostal Address Standardizer

API Action String: lpaddrstd
Takes one or more inputs and tries to extract a postal address and standardize it. Returns House number, Street, Unit, City, State, Zip, Country.

Input Idx | Input Type
--- | ---
0 | Full Address
... | ...

Output Idx | Output Type
--- | ---
0 | Generic String
1 | Generic String
2 | Generic String
3 | City
4 | State
5 | Zip
6 | Country

Combine & Standardize Address

API Action String: addrstd
Combines and standardizes address components into a single field.

Input Idx | Input Type
--- | ---
0 | Address
1 | City
2 | State
3 | Zip

Output Idx | Output Type
--- | ---
0 | Address

Option Idx | Option | Value
--- | --- | ---
0 | blankifinvalid | 0 = No; 1 = Yes

Reverse Geocode

API Action String: reversegeocode
Accepts (US-based) latitude and longitude fields and attempts to output an address for the given location.

Input Idx | Input Type
--- | ---
0 | Any (Latitude)
1 | Any (Longitude)

Output Idx | Output Type
--- | ---
0 | Generic String (Precision)
1 | Address
2 | City
3 | State
4 | Zip
5 | Zip4

Simple Tag Individual

API Action String: stindiv
Takes a first name, last name, address, and zip as inputs and creates a pseudo-unique identifier for that individual.

Input Idx | Input Type
--- | ---
0 | First
1 | Last
2 | Address
3 | Zip

Output Idx | Output Type
--- | ---
0 | Generic String

Option Idx | Option | Value
--- | --- | ---
0 | length | 1-50

Simple Tag Household

API Action String: sthhld
Takes a first name, last name, address, and zip as inputs and creates a pseudo-unique identifier for that household.

Input Idx | Input Type
--- | ---
0 | First
1 | Last
2 | Address
3 | Zip

Output Idx | Output Type
--- | ---
0 | Generic String

Option Idx | Option | Value
--- | --- | ---
0 | length | 1-50

Zip to Congressional District

API Action String: z54tocongrdist
Takes a Zip5 and Zip4 and returns the congressional district for that area.

Input Idx | Input Type
--- | ---
0 | Zip
1 | Zip4

Output Idx | Output Type
--- | ---
0 | Generic String

Global Region

API Action String: globalregion
Takes the name of a country and outputs the region (Africa, APAC, US/CA, LATAM, etc.).

Input Idx | Input Type
--- | ---
0 | Country

Output Idx | Output Type
--- | ---
0 | Generic String

IP To Location

API Action String: ip2loc
Takes in an IP address and outputs IP Country, IP City, IP Zip, IP ISP Name, IP Domain Name, IP Usage Type, Proxy Type, IP Block ID, IP Block Len.

Input Idx | Input Type
--- | ---
0 | IP

Output Idx | Output Type
--- | ---
0 | Country
1 | City
2 | Zip
3 | Generic String
4 | Generic String
5 | Generic String
6 | Generic String
7 | Generic String
8 | Generic String

Format Phone Number

API Action String: naphonefmt
Formats a phone number into a selected standard format. Only works with 10-digit (or 11-digit with country code) North American phone numbers.

Input Idx | Input Type
--- | ---
0 | Phone

Output Idx | Output Type
--- | ---
0 | Phone

Option Idx | Option | Value
--- | --- | ---
0 | phoneformat | 0 = (XXX) XXX-XXXX; 1 = XXX-XXX-XXXX; 2 = XXX XXX XXXX; 3 = XXX.XXX.XXXX
1 | includenacountrycode | 0 = No (Always Stripped); 1 = Yes (Always Included)
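
To see how the two options interact, here is a local sketch of the four formats. The option names and format layouts come from the table above; the function name is hypothetical, and returning blank for numbers that are not 10 digits (or 11 with a leading 1) as well as rendering the country code as a "+1 " prefix are assumptions based on the example in the action list.

```python
import re

# Layouts keyed by the phoneformat option value.
PHONE_FORMATS = {
    0: "({area}) {prefix}-{line}",
    1: "{area}-{prefix}-{line}",
    2: "{area} {prefix} {line}",
    3: "{area}.{prefix}.{line}",
}


def format_na_phone(raw, phoneformat=0, includenacountrycode=0):
    """Format a North American number; return blank if it is not one."""
    digits = re.sub(r"[^0-9]", "", raw)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the country code before formatting
    if len(digits) != 10:
        return ""  # assumption: blank on unsupported input
    formatted = PHONE_FORMATS[phoneformat].format(
        area=digits[:3], prefix=digits[3:6], line=digits[6:]
    )
    return "+1 " + formatted if includenacountrycode else formatted


print(format_na_phone("+1 417.576 3685", phoneformat=0, includenacountrycode=1))
```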