Big data is a term that comes up in nearly every industry whenever the discussion turns to leveraging new and existing data sets for marketing opportunities and improved customer engagement. Big data can generate new insights into customer behavior patterns through the capture, analysis, and comparison of both structured and unstructured data, but this simple term describes a complex series of processes and procedures. Knowing some of the key terms currently used in big data will help your business frame the discussion of how to use big data for marketing.
- Structured Data
Structured data is machine-readable data in which each data attribute is clearly and separately labeled in fields such as first name, last name, and telephone number. Historically, operational data has been stored in a structured format, such as a relational database table or a comma-separated values (CSV) file, in order to simplify programming. Traditional entity resolution and identity resolution systems are restricted to matching on structured data and have a difficult time with unstructured data sets.
- Unstructured Data
Unstructured data is often text with little to no labeling of data items, such as social media posts, or data in a non-character format, such as images or recordings. Unstructured data is difficult to process in traditional data systems that expect each data item to be labeled. Unstructured text must be cleaned and analyzed to extract and label specific data items before it can be integrated with structured data and used in operational processes. However, these extraction processes are often complex and unable to decipher data accurately. With nearly 80% of new data being unstructured, being able to match on all new data is crucial to crafting a 360-degree view of customers.
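As a rough illustration of the extract-and-label step, the sketch below pulls an email address and a phone number out of a free-text post and returns them as labeled fields. The sample post, the field names, and both patterns are assumptions made for this example; real extraction rules are far more involved and will miss or misread many inputs.

```python
import re

# Illustrative patterns only (assumptions): real-world emails and phone
# numbers come in many more shapes than these two expressions cover.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")


def extract_fields(text):
    """Return a structured record of the labeled fields found in free text."""
    email = EMAIL_RE.search(text)
    phone = PHONE_RE.search(text)
    return {
        "email": email.group(0) if email else None,
        "phone": phone.group(0) if phone else None,
    }


# Hypothetical social media post (unstructured text).
post = "Loved the service! Reach me at bob.smith@example.com or 501-555-0142."
record = extract_fields(post)
print(record)
```

Once the fields are labeled, the record can be matched against structured data like any other row in a table or CSV file.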
- 360-Degree View of Customers
The 360-degree view of customers is a newer buzzword in big data. It refers to marketing to a customer based on a comprehensive view of all of their data touch points. At its core, developing a 360-degree view involves aggregating all known data assets, both structured and unstructured, to better engage customers with potential products and services and help foster long-term loyalty to your business.
Many businesses keep information about their employees, customers, and potential customers in different data stores, depending upon their lines of business or business units. Just because a customer has a checking account, holds a credit card, and likes your bank’s Facebook page does not mean your bank has all of this information in one place to create that elusive 360-degree view of the customer. Linking across these data stores, regardless of location or quality of the data, is necessary to create a 360-degree view of your customers. Then, using data management tools, you can leverage numerous business applications for better ROI.
- Data Governance
Data governance is a system of policies, processes, and procedures intended to guide the management of data as a shared enterprise asset. Data governance establishes a single point of communication and control over enterprise data:
- Communication in the sense that all data stakeholders in the enterprise agree to the definition, level of quality, and business rules and regulations applying to each data item under governance.
- Control in the sense that procedures and processes have been put in place to assure all data stakeholders comply with the policies applying to each data item under governance.
Policies usually include change notification; for example, data definitions and formats may not be changed without agreement by all data stakeholders. Data governance is built on the concept of data stewardship, the idea that all data belongs to the enterprise and that individual business units and employees are only caretakers (stewards) of the data.
- Entity Resolution
Entity resolution determines whether ambiguous records in data sources are referencing the same entity (a customer, patient, vendor, product, location, etc.) or different entities. The entity resolution process is necessary to produce and maintain persistent entity identifiers that bring clarity to ambiguous references and data with slight variations (such as “Robert Smith” and “Bob Smith”). Entity resolution provides unique identifiers within a coherent dataset that can be used for improved analysis, customer engagement, and product development.
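The idea can be sketched in a few lines of code. The example below is a deliberately simplified, rule-based illustration and not a description of any particular product: the nickname table, the matching rule (same normalized name and same phone digits), and the identifier format are all assumptions made for this sketch.

```python
# Hypothetical nickname table (assumption): maps a nickname to a formal name.
NICKNAMES = {"bob": "robert", "bill": "william", "liz": "elizabeth"}


def match_key(record):
    """Build a normalized key; records with equal keys are treated as one entity."""
    first = record["first_name"].strip().lower()
    first = NICKNAMES.get(first, first)
    last = record["last_name"].strip().lower()
    # Keep only the digits so "(501) 555-0142" and "501-555-0142" compare equal.
    digits = "".join(ch for ch in record["phone"] if ch.isdigit())
    return (first, last, digits)


def resolve_entities(records):
    """Return copies of the records, each labeled with an entity identifier."""
    ids = {}
    resolved = []
    for record in records:
        key = match_key(record)
        if key not in ids:
            ids[key] = f"E{len(ids) + 1:04d}"
        resolved.append({**record, "entity_id": ids[key]})
    return resolved


records = [
    {"first_name": "Robert", "last_name": "Smith", "phone": "501-555-0142"},
    {"first_name": "Bob", "last_name": "Smith", "phone": "(501) 555-0142"},
    {"first_name": "Roberta", "last_name": "Smith", "phone": "501-555-0199"},
]
resolved = resolve_entities(records)
for row in resolved:
    print(row["entity_id"], row["first_name"], row["last_name"])
```

Here “Robert Smith” and “Bob Smith” receive the same identifier, while “Roberta Smith” is kept as a separate entity. Exact-key matching like this is brittle; it is shown only to make the concept concrete.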
- Identity Resolution
Identity resolution is the process for resolving whether a data record is referencing an entity whose identity has previously been determined and stored in an operational system. This process allows for an identity to be referenced and identified in different ways across data silos. For example, if a patient is admitted to a hospital, the patient’s name, address, birth date, and other identifying information are used to search the patient database to see if they have been previously admitted.
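The hospital lookup described above might look something like the following minimal sketch. The in-memory patient database, the field names, and the matching rule (normalized name plus birth date) are assumptions made for illustration; a production system would use more attributes and more tolerant matching.

```python
# Hypothetical store of previously determined identities (assumption).
PATIENT_DB = {
    "P001": {"name": "Robert Smith", "birth_date": "1980-04-12"},
    "P002": {"name": "Maria Lopez", "birth_date": "1975-11-03"},
}


def normalize(value):
    """Lowercase the value and collapse runs of whitespace."""
    return " ".join(value.lower().split())


def find_identity(incoming, database=PATIENT_DB):
    """Return the ID of a known patient matching the record, or None."""
    for patient_id, known in database.items():
        same_name = normalize(incoming["name"]) == normalize(known["name"])
        same_birth_date = incoming["birth_date"] == known["birth_date"]
        if same_name and same_birth_date:
            return patient_id
    return None


admission = {"name": "  robert   SMITH", "birth_date": "1980-04-12"}
print(find_identity(admission))
```

A return value of `None` means the record does not match any stored identity, so the patient would be registered as new.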
Entity resolution and identity resolution are similar, but the difference between the two can be illustrated through an analogy with criminal investigation. Suppose two thumbprints are found at a crime scene. A fingerprint expert can determine whether the two thumbprints belong to the same suspect or whether two different suspects are involved. This is like the process of entity resolution: determining whether the thumbprints are from the same suspect or from different suspects. If one of the thumbprints is sent to the FBI to search its criminal database and it matches one on record, then the suspect’s identity becomes known to the investigators. This is like the process of identity resolution: determining whether the thumbprint belongs to a person whose identity has previously been determined and stored in the criminal database.
- Persistent Entity Identifiers
Most organizations label records referencing a key business entity, such as a customer or product, with a unique identifier value to represent the entity, like an employee I.D. number. Persistent entity identifiers are used to quickly link, query, and find information about a particular entity. A problem arises when business units use different data management systems and the same entity ends up with different identifiers in different data stores and business units. At Black Oak, for the purposes of data governance and master data management, we recommend that every record referencing the same entity consistently carry the same unique identifier value across all data stores and processes. Establishing persistent entity identifiers is the first step in allowing managers, data scientists, marketers, and data stakeholders to have a complete, comprehensive view of the company’s data assets over time.
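To make the linking step concrete, the sketch below uses a crosswalk table that maps each data store's local identifier to one persistent entity identifier, then gathers every record for that entity. The store names, local IDs, record fields, and crosswalk contents are all hypothetical and exist only for this example.

```python
# Hypothetical data stores, each keyed by its own local identifiers.
STORES = {
    "checking": {"CHK-88": {"name": "Robert Smith", "balance": 1250.00}},
    "credit_card": {
        "CC-1207": {"name": "Bob Smith", "limit": 5000},
        "CC-3310": {"name": "Maria Lopez", "limit": 3000},
    },
}

# Hypothetical crosswalk: (store, local ID) -> persistent entity identifier.
CROSSWALK = {
    ("checking", "CHK-88"): "E0001",
    ("credit_card", "CC-1207"): "E0001",
    ("credit_card", "CC-3310"): "E0002",
}


def customer_view(entity_id, stores=STORES, crosswalk=CROSSWALK):
    """Collect every record, from every store, linked to one entity."""
    view = {}
    for (store, local_id), persistent_id in crosswalk.items():
        if persistent_id == entity_id:
            view.setdefault(store, []).append(stores[store][local_id])
    return view


view = customer_view("E0001")
print(view)
```

Because both the checking record and the credit card record carry the same persistent identifier, a single lookup returns them together even though the two stores spell the customer's name differently.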
Black Oak Analytics employs High Performance Entity Resolution (HiPER), an entity identity information management (EIIM) system that allows you to effectively identify and market to a targeted audience with increased efficiency through entity matching and resolution.
If you want to know more about how to use big data for marketing, contact Black Oak Analytics today.