Black Oak recently did two things: we sent our CSO, Dr. Talburt, to the White House to participate in a roundtable of 80 industry thought leaders discussing how to improve data quality in open data, and we just published a blog post, “Collection is Not Enough; You Have the Data, Now What?” That seems to be the problem a lot of people are having. Bernard Marr published an article recently reminding us that “in the rush to avoid being left behind, I also see that many companies risk becoming data rich but insight poor.” And private companies aren’t the only ones facing this problem.
The news has been rife with articles about government agencies collecting massive amounts of data, but not being able to create meaningful information from that data in a timely manner. The problem is no longer the ability to find or collect data.
Kristen Honey policy advisor for OSTP says data quality is a pain point for open data #opendata #dataquality @BlackOakUSA
— John Talburt (@JohnTalburt) April 27, 2016
One problem seems to be a labor shortage: a lack of data scientists to analyze and find meaning in large amounts of data.
Prioritizing datasets to clean is a major issue for federal agencies participating in open data #opendata @BlackOakUSA #dataquality
— John Talburt (@JohnTalburt) April 27, 2016
Without properly trained people to look at the data and determine which of it needs thorough analysis, collecting that data yields little value.
Megan Smith US CTO says data science talent unevenly distributed across agencies, must change, #datascience #cto @BlackOakUSA
— John Talburt (@JohnTalburt) April 27, 2016
Fortunately, because the crux of the issue is data quality, Black Oak Analytics has strategies to solve it. One of the newest buzzwords in the Big Data news sector is the Citizen Data Scientist. The term refers to putting the tools used and developed by data scientists into the hands of analysts, who can then participate in the workflow. The news feeds are abuzz with article titles like How the Citizen Data Scientist Will Democratize Big Data and The Rise of the Citizen Data Scientists. The conversation is really about the aforementioned shortage of data scientists: people trained to sift through large amounts of data and glean value from it.
In 2016, we have enough software tools developed by data scientists that business and data analysts can now perform some of the functions that, in the past, only highly trained data scientists could. One such example is our High Performance Entity Resolution (HiPER) system. In the past, entity resolution (also known as deduplication) and data integration required two things: software and standardized, structured data, such as name and address. Trying to resolve entities on unstructured data or non-standard entity attributes, especially on extremely large datasets, remained the work of specialized experts like data scientists.
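To make the idea of entity resolution concrete, here is a minimal, self-contained sketch of rule-based deduplication on structured records with name and address fields. It is an illustration of the general technique only, not a description of how HiPER works internally; the field names, normalization rules, and similarity threshold are all assumptions chosen for the example.

```python
# Minimal entity-resolution (deduplication) sketch on structured records.
# Assumes each record has "name" and "address" fields; this is NOT the
# HiPER algorithm, just an illustration of the basic idea.
from difflib import SequenceMatcher

def normalize(s):
    """Lowercase, drop punctuation, and collapse whitespace."""
    cleaned = "".join(c for c in s.lower() if c.isalnum() or c.isspace())
    return " ".join(cleaned.split())

def similar(a, b, threshold=0.8):
    """Fuzzy string match using a simple character-level ratio."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold

def resolve(records):
    """Greedily cluster records that match on both name and address."""
    clusters = []
    for rec in records:
        for cluster in clusters:
            rep = cluster[0]  # compare against the cluster's first record
            if similar(rec["name"], rep["name"]) and similar(rec["address"], rep["address"]):
                cluster.append(rec)
                break
        else:
            clusters.append([rec])  # no match found: start a new cluster
    return clusters

records = [
    {"name": "John Q. Smith", "address": "123 Main St."},
    {"name": "john q smith",  "address": "123 Main Street"},
    {"name": "Jane Doe",      "address": "456 Oak Ave."},
]
print(len(resolve(records)))  # the two Smith records collapse into one cluster
```

Even this toy version shows why non-standard attributes and very large datasets are hard: the pairwise comparisons grow quadratically, and the matching rules must be tuned per domain, which is exactly where specialized tooling and expertise come in.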
HiPER allows analysts to access large datasets and extract value from them. The ability to control data quality on large amounts of data, open or internal, frees data practitioners at all levels to sift through data and get back to the analytics.
Open data in government sees need for large-scale entity resolution, HiPER seen as a scalable solution @BlackOakUSA #WhiteHouseOSTP
— John Talburt (@JohnTalburt) April 27, 2016
But the need for large-scale entity resolution goes beyond government agencies using open data. Any industry relying on data analytics, which should be most of them at this point, can leverage data quality software like HiPER and the experts who support those systems. Now, even the restaurant industry is beginning to leverage big data technologies.
For more information about HiPER, or to find out what your entity resolution software can do for you, contact Black Oak Analytics today at (877) 805-0736 or request a consultation to learn more about our HiPER platform. For more event coverage from Dr. Talburt, follow us on Twitter @BlackOakUSA.