By Dr. John R. Talburt

Big data tools are key pieces of marketing technology in all corporate environments today. They reached that position as the use cases for big data changed and grew along with their potential. Now that this paradigm has clearly shifted, it is just as important to understand where it is going as where it came from, because it is here to stay.
1) Processing Big Data
In the beginning, big data tools were invented because a few companies genuinely needed to process big data, such as every webpage on the Internet. Google published its MapReduce approach to that problem, and the open-source Hadoop implementation that followed represented a new approach to distributed computing. Supercomputers already existed, but they were also super expensive. Many organizations had attempted to design parallel programming languages, but these proved hard to learn and use because programmers generally think about algorithms sequentially, not in parallel.
“Thread safety” is still an alien concept for most of us. Hadoop avoided all of this by letting programmers continue to write sequential-style code (Java in this case), while under the covers it took care of the steps necessary to distribute the data and the code across machines. This was the first use case for big data: a way to process really big datasets on relatively inexpensive computers without needing a PhD in parallel processing.
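To see what that looks like in practice, here is a minimal sketch along the lines of the classic Hadoop word-count example, written against the org.apache.hadoop.mapreduce API (the input and output paths are supplied as command-line arguments, and the job name is arbitrary). The map and reduce methods are ordinary sequential Java; the framework handles splitting the input, shipping the code to the machines that hold the data, and shuffling the intermediate results.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Each mapper processes one split of the input; the method body is plain sequential Java.
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one); // emit (word, 1); the framework groups by key
      }
    }
  }

  // Each reducer receives all counts for a given word, wherever in the cluster they were produced.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Nothing in the programmer's code decides which machine reads which block of input or where the intermediate results travel; that is exactly the burden the framework lifted.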
2) Analyzing Internal Big Data
Once other companies started learning how to use this new technology to process large datasets, it became apparent that it could create a competitive advantage for them. It was sort of like a data warehouse on steroids: processing bigger data could yield deeper insights. In highly competitive markets, even marginal improvements in business intelligence can result in huge gains. Even if you weren't faced with processing every webpage on the Internet, the capability to analyze a billion transactions over ten years instead of only a million transactions over a few months could be an advantage. So the second use case for big data was the idea of deeper insights through analytics.
An interesting artifact of the first two use cases is that they gave the term “data science” two different meanings, one for each use case.
Some view data scientists as those who understand and are able to employ the big data tools, more on the IT side and a reflection of the first use case. Others expect data scientists to be experts in analytics and modeling, conversant with such things as inferential statistics and machine learning, a reflection of the second use case. Now data scientists are being called on to fill even broader roles inside organizations, as evidenced by the appointment of the United States' first Chief Data Scientist.
3) Analyzing Unstructured Big Data
The new big data processing paradigm opened the door to the widespread use of distributed and parallel computing for processing large datasets, and the same technology supports the large-scale, interactive online networks that have popularized social media. Thanks to the popularity of those sites, social media has produced an immense amount of a new kind of unstructured data: user-generated data.
Unstructured data is not new, but traditional unstructured data such as reports, emails, and contracts is created for a specific purpose and recipient. Social media users generate data that is far less pragmatic: a multitude of ideas, thoughts, opinions, and random status updates that are nowhere near as purposeful. Nevertheless, companies are feverishly attempting to mine this data for customer insights and to understand consumer intent. This is the third use case: joining unstructured social media data with traditional structured data to improve marketing and customer relations.
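As a toy illustration of this third use case, the sketch below (plain Java, with made-up customer IDs, segments, posts, and keywords) scans unstructured user-generated posts for crude purchase-intent keywords and joins the counts to a structured customer-segment table. A real pipeline would use far more sophisticated text mining, but the basic shape of the join is the same.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// A toy join of unstructured user-generated posts with a structured customer table.
// All customer IDs, segments, posts, and keywords here are invented for illustration.
public class SocialSignalJoin {

  public static void main(String[] args) {
    // Structured side: customer ID -> segment (would normally live in a warehouse table)
    Map<String, String> segmentById = Map.of(
        "c1", "premium",
        "c2", "standard");

    // Unstructured side: free-text posts, each paired with the posting customer's ID
    List<String[]> posts = List.of(
        new String[] {"c1", "Thinking about upgrading my plan, any advice?"},
        new String[] {"c2", "Just posted some vacation photos"},
        new String[] {"c2", "Really want a better upgrade option"});

    // Very crude "mining": count purchase-intent keywords per customer segment
    List<String> intentWords = Arrays.asList("upgrade", "upgrading", "buy", "want");
    Map<String, Integer> intentBySegment = new HashMap<>();

    for (String[] post : posts) {
      String segment = segmentById.getOrDefault(post[0], "unknown");
      for (String token : post[1].toLowerCase().split("\\W+")) {
        if (intentWords.contains(token)) {
          intentBySegment.merge(segment, 1, Integer::sum);
        }
      }
    }

    // Prints something like {standard=2, premium=1}
    System.out.println(intentBySegment);
  }
}
```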
4) Using Big Data as the New IT
The fourth use case is more of an emergent property not directly related to a specific business problem. It is simply that the new big data tools have created a more attractive paradigm for data processing than the traditional approach anchored in relational database management systems (RDBMS). In the new paradigm, data is ingested first, then models, structure, and data cleansing are imposed later. This is quite different from starting with an abstract schema, then forcing the incoming data to first be structured and cleansed to fit the schema.
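As a small sketch of that "ingest first, impose structure later" idea, the following plain-Java example (with made-up key=value records standing in for a real landing zone) keeps every record that arrives, however messy, and only applies a schema at the moment a question is asked of the data.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Schema-on-read sketch: raw records are ingested as-is, and structure is imposed only
// when a question is asked. The records below are invented for illustration.
public class SchemaOnRead {

  // "Ingest" keeps whatever arrived, even records a rigid schema would have rejected.
  static final List<String> rawLandingZone = List.of(
      "id=101;amount=19.99;channel=web",
      "id=102;amount=5.00",                       // missing channel: still stored
      "id=bad-row;note=malformed upstream feed"   // does not fit the model: still stored
  );

  // Structure is applied at read time; records that cannot be interpreted are skipped
  // for this question, but they remain in the landing zone for other uses.
  static Optional<Double> amountOf(String raw) {
    Map<String, String> fields = new HashMap<>();
    for (String pair : raw.split(";")) {
      String[] kv = pair.split("=", 2);
      if (kv.length == 2) fields.put(kv[0], kv[1]);
    }
    try {
      return Optional.of(Double.parseDouble(fields.getOrDefault("amount", "")));
    } catch (NumberFormatException e) {
      return Optional.empty();
    }
  }

  public static void main(String[] args) {
    double total = rawLandingZone.stream()
        .map(SchemaOnRead::amountOf)
        .filter(Optional::isPresent)
        .mapToDouble(Optional::get)
        .sum();
    System.out.println("Total amount across parsable records: " + total); // roughly 24.99
  }
}
```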
Another significant change is the idea of letting middleware move processes (programs) to the data, rather than writing applications that move the data to the processes. These paradigms have unburdened computing from overhead such as relational database normalization, data typing, and parallel coding, legacies of performance optimization and storage conservation. The bottom line is that the big data processing paradigm will soon become the norm for IT shops. Traditional client-server RDBMS tools will begin to recede into a supporting role rather than being the driver for most applications. This will hold true with or without big data, simply because the new approach is too compelling.
For more information about how big data tools can help your company, contact Black Oak Analytics at 501-379-8008.