Entity matching tools are essential for a robust data governance program. This software simplifies your data by linking duplicate entries and flagging semantically equivalent records: those that would be identical if not for a spelling mistake or abbreviation that splits one entity into two entries. Record linking software is particularly useful for managing client, customer, or patient interactions across distinct corporate divisions or platforms. Record linking is valuable for any data-driven company, but which linking measurement matters most for your data governance program depends on whether your use case favors recall or precision.
Recall and Precision
Recall and precision are the two factors used to measure record linking, and in practice they tend to trade off against each other. Both are calculated from whether each record is relevant or irrelevant to the identity with which it is paired. At their most basic, precision is the likelihood that a link made between records is for the same person, and recall is the likelihood that records for the same person were linked.
- Precision is the ratio of relevant records correctly linked with an identity to the total number of records linked, which can include irrelevant, incorrectly linked records. Precision measures the record linkage accuracy of ER matching tools. If a system has a precision value of 100%, every linked record can be verified as being equivalent, which is also known as a true positive match. In practice, pushing precision higher tends to lower recall.
- Recall is the ratio of records correctly matched with an identity to the total number of relevant records in a data set that should have been matched. Recall measures the completeness of ER matching tools. If a system has a recall value of 100%, every record that should be matched is matched. In practice, pushing recall higher tends to lower precision.
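The two ratios above can be sketched in a few lines of Python. This is a minimal illustration, not a description of any particular ER product: the function name `precision_recall`, the record IDs, and the sample links are all hypothetical, and the sketch assumes you already have a verified set of true links to compare against.

```python
def precision_recall(predicted_links, true_links):
    """Compute precision and recall for a set of record links.

    Each link is a pair of record IDs. Pairs are treated as unordered,
    so ("r1", "r2") and ("r2", "r1") count as the same link.
    """
    predicted = {frozenset(pair) for pair in predicted_links}
    actual = {frozenset(pair) for pair in true_links}

    # True positives: links the tool made that are verified as correct.
    true_positives = len(predicted & actual)

    # Assumed convention: report 1.0 when the denominator is empty.
    precision = true_positives / len(predicted) if predicted else 1.0
    recall = true_positives / len(actual) if actual else 1.0
    return precision, recall


# Hypothetical data: four verified links, three links made by the tool.
true_links = [("r1", "r2"), ("r1", "r3"), ("r2", "r3"), ("r4", "r5")]
predicted_links = [("r1", "r2"), ("r2", "r3"), ("r5", "r6")]

precision, recall = precision_recall(predicted_links, true_links)
print(f"precision={precision:.2f} recall={recall:.2f}")
# prints: precision=0.67 recall=0.50
```

In this made-up example, two of the three links the tool made are correct (precision of about 67%), while only two of the four links that should exist were found (recall of 50%).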
Evaluating Your Use Case
Because linking more records usually lowers precision, and linking fewer usually lowers recall, these factors are typically in conflict. As a result, your company must examine its use case for the data to determine which factor matters most for the success of your master data management system. Prioritizing the wrong factor can lead to inaccurate reporting, poor decision making, and even legal issues.
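One common way this tension shows up is through a match threshold: the similarity score a pair of records must reach before it is linked. The sketch below assumes a tool that assigns such scores, which the discussion above does not specify, and uses invented scores and labels purely for illustration. The name `precision_recall_at` is hypothetical.

```python
# Hypothetical candidate pairs: (similarity score, whether the pair is
# verified to be the same person). All values are invented.
scored_pairs = [
    (0.95, True),
    (0.90, True),
    (0.80, False),
    (0.70, True),
    (0.60, False),
    (0.50, True),
    (0.30, False),
]


def precision_recall_at(threshold, pairs):
    """Link every pair scoring at or above threshold, then measure."""
    linked = [is_match for score, is_match in pairs if score >= threshold]
    total_true = sum(is_match for _, is_match in pairs)
    true_positives = sum(linked)

    # Assumed convention: report 1.0 when the denominator is empty.
    precision = true_positives / len(linked) if linked else 1.0
    recall = true_positives / total_true if total_true else 1.0
    return precision, recall


for threshold in (0.85, 0.65, 0.45):
    p, r = precision_recall_at(threshold, scored_pairs)
    print(f"threshold={threshold:.2f} precision={p:.2f} recall={r:.2f}")
# prints:
# threshold=0.85 precision=1.00 recall=0.50
# threshold=0.65 precision=0.75 recall=0.75
# threshold=0.45 precision=0.67 recall=1.00
```

With this sample data, a strict threshold links only verified matches but finds half of them, while a loose threshold finds every match at the cost of some incorrect links. Real data will not always trade off this cleanly.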
Health care and legal companies value high precision because it is crucial that records are matched to the correct identity with the lowest margin of error possible. Failing to secure high precision in the record linking process can lead to personally identifiable information (PII) being sent to the wrong patient or client, which could trigger costly policy violations for these organizations. As a result, they typically consider recall the less important factor. Some health care institutions, such as clinics, still value recall because having two records for one person is considered a liability.
Business analysts who rely on data to manage marketing outreach efforts and client bases typically favor high recall because matching a larger number of records to an identity allows them to track key interactions more closely. Linking more records to a single identity makes it easier to retain a valid contact, a benefit that for these teams outweighs the risks of lower precision. Higher recall also makes it easier to link duplicate records. This can cut the cost of sending multiple copies of the same materials to a person who appears twice on a mailing list with slightly different records.
Choosing the Right Factor
It is crucial to secure buy-in from all involved parties when deciding whether your company’s record linkage software should prioritize precision or recall. Your chief data officer, who will be familiar with the current structure of the company’s master data management system, should start this conversation. With input from other key stakeholders, the CDO will be able to weigh the intended use cases and recommend entity matching tools that will help your company reach its business goals while avoiding the risks associated with bad data.
If you are interested in improving your data quality through record linkage software, contact Black Oak Analytics today at (877) 805-0736 or request a consultation to learn more about our HiPER platform.