
What is Survivorship and How Does it Improve Data Quality?

Survivorship is the process of consolidating all the pertinent information about a customer into a single contact record.



Survivorship uses Golden Record criteria to determine the single best contact record in a group of duplicate records. Once that record is chosen as the master contact record, data from the other matching records can be consolidated or merged into it. The result is a “single source of truth”: the record that captures all the pertinent information needed about a customer, resource, product, or service. Consolidation may draw on verified data points, and may also use custom methods so that multiple values can be added, stacked, or evaluated for relevance.
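
The consolidation step can be sketched in Python. This is a minimal illustration, not a description of any particular product: the dict-based record layout, the field names, the `consolidate` helper, and the "fill blanks, stack selected fields" rule are all assumptions made for the example.

```python
def consolidate(master, duplicates, stack_fields=()):
    """Build a golden record from a master record and its duplicates.

    Fields listed in stack_fields keep every distinct non-empty value
    (as a list). All other fields keep the master's value unless it is
    empty, in which case the first non-empty duplicate value is used.
    """
    golden = dict(master)

    # Stacked fields: collect distinct values from every record, in order.
    for field in stack_fields:
        values = []
        for rec in [master, *duplicates]:
            value = rec.get(field)
            if value and value not in values:
                values.append(value)
        golden[field] = values

    # Remaining fields: only fill gaps, never overwrite the master.
    for rec in duplicates:
        for field, value in rec.items():
            if field in stack_fields:
                continue
            if not golden.get(field) and value:
                golden[field] = value
    return golden


# Hypothetical sample data.
master = {"name": "Ann Lee", "email": "", "phone": "555-0100"}
dupes = [
    {"name": "Ann Lee", "email": "ann@example.com", "phone": "555-0199"},
    {"name": "A. Lee", "email": "", "phone": "555-0100"},
]
golden = consolidate(master, dupes, stack_fields=("phone",))
```

Here the master keeps its own name, picks up the missing email address from a duplicate, and stacks both distinct phone numbers rather than discarding one.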

How Does the Survivorship Process Work?

During the record-matching process, various techniques are used to select the best possible candidate. The structure and source of the data, how it is populated, and what kind of data is stored all come into play when performing survivorship. Three techniques commonly used to select the surviving record are:

  • Most Recent: Orders date-stamped records from most to least recent
  • Most Frequent: Matches repeating records that contain the same information
  • Most Complete: Considers field completeness as an indication of record accuracy
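
As a rough sketch, the three techniques above might look like this in Python. The function names, the `updated` date field, and the sample records are hypothetical, and ties are simply resolved in favor of the first matching record.

```python
from collections import Counter
from datetime import date


def most_recent(records, date_field="updated"):
    """Return the record with the latest date stamp."""
    return max(records, key=lambda r: r[date_field])


def most_frequent(records, fields):
    """Return the first record whose values for `fields` occur most often."""
    def key(r):
        return tuple(r.get(f) for f in fields)

    counts = Counter(key(r) for r in records)
    top_key, _ = counts.most_common(1)[0]
    return next(r for r in records if key(r) == top_key)


def most_complete(records, fields):
    """Return the record with the most non-empty values among `fields`."""
    return max(records, key=lambda r: sum(1 for f in fields if r.get(f)))


# Hypothetical group of duplicate records.
records = [
    {"name": "Ann Lee", "phone": "555-0100", "city": "", "updated": date(2021, 3, 1)},
    {"name": "Ann Lee", "phone": "", "city": "", "updated": date(2023, 6, 15)},
    {"name": "Ann Lee", "phone": "555-0100", "city": "Austin", "updated": date(2020, 1, 10)},
]

recent = most_recent(records)
frequent = most_frequent(records, fields=("name", "phone"))
complete = most_complete(records, fields=("name", "phone", "city"))
```

With this sample data each technique picks a different survivor (the second, first, and third record respectively), and none of them checks whether the values are actually valid.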

Below are visual examples of these record-matching techniques, which illustrate why pattern recognition methods are not always the most reliable way to determine the surviving record.

Most Recent

Here, the record with a valid phone number is determined to be the surviving record, rather than the most recent one:

survivorship-most-recent

Most Frequent

Though the last two records are the most frequent, their data is invalid. The first record is therefore the surviving record, as it contains a valid address:

survivorship-most-frequent

Most Complete

In this example, the most complete record contains an invalid phone number, so the surviving record would be the second one:

survivorship-most-complete

These pattern-based techniques are just the first step in determining the surviving record. The next step is to add a data quality tool like Melissa’s MatchUp, which quickly finds and links customer data and consolidates data across multiple sources to remove unwanted business and customer records. Using the Melissa approach, the Golden Record is determined by a data quality score that takes into consideration the quality of the information provided and uses it as the basis for survivorship.
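
The idea of quality-based survivorship can be illustrated with a small Python sketch. It does not reproduce MatchUp or Melissa's actual scoring, which is not described here. The validators are deliberately simplistic, and the weights, field names, and sample records are hypothetical.

```python
import re


def phone_is_valid(phone):
    """Toy check: exactly 10 digits once punctuation is stripped."""
    return len(re.sub(r"\D", "", phone or "")) == 10


def email_is_valid(email):
    """Toy check: local@domain.tld with no spaces."""
    return re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", email or "") is not None


# Hypothetical weights: a valid phone counts more than a valid email.
WEIGHTS = {"phone": 2, "email": 1}
VALIDATORS = {"phone": phone_is_valid, "email": email_is_valid}


def quality_score(record):
    """Sum the weights of the fields whose values pass validation."""
    return sum(
        WEIGHTS[field]
        for field, check in VALIDATORS.items()
        if check(record.get(field))
    )


def pick_survivor(records):
    """Return the record with the highest quality score (first one on ties)."""
    return max(records, key=quality_score)


# Hypothetical duplicates: the first is fully populated but has a bad phone.
records = [
    {"name": "Ann Lee", "phone": "555-01", "email": "ann@example.com"},
    {"name": "Ann Lee", "phone": "(512) 555-0100", "email": "ann@example.com"},
    {"name": "Ann Lee", "phone": "512-555-0100", "email": ""},
]
survivor = pick_survivor(records)
```

In this sketch a populated field earns nothing unless its value passes validation, so a record that only looks complete cannot win on field count alone.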
