Today, organizations want to derive more value from their data than ever before: they want their data, their understanding of it, and the insights they can gain from it to be an asset and a competitive advantage. A key enabler has been increasingly sophisticated analytics.
Analytics capabilities have exploded in multiple ways in recent years. We can now consume and analyze larger volumes of data in less time. There have been vast improvements in how both structured and unstructured data can be analyzed. Statistical engines can be accessed from most analytics packages, which has led to wider use of techniques such as multivariate analysis and principal component analysis. Visualization capabilities have brought specialized graphical techniques into the mainstream (e.g., heat maps, radar plots, tree maps, box-and-whisker plots, combination charts, and GIS maps).
More sophisticated analytics does not immediately mean better insight. There are issues at two levels that hold back the analysis. At the surface are issues that can be managed by data wrangling. Along with analysis capabilities, data wrangling capabilities have also grown. Options for cleaning and transforming data include scripts, functionality built into analytics packages, and a wide range of dedicated data wrangling software. Some data wrangling is always going to be required, given the variety of data sources and the complexity of data seen today. The second level of issues is more systemic and harder to notice, as these issues result from changing assumptions and differences in definitions across the organization. When they are noticed, fixes are attempted with furious data wrangling efforts. When the only tool you have is a hammer, everything looks like a nail. What causes these types of issues, and what is a better way to tackle them?
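To make the surface-level kind of fix concrete, here is a minimal sketch of script-based data wrangling: normalizing dates and country labels that arrive in different formats from different sources. The field names, formats, and alias table are illustrative assumptions, not taken from any particular system.

```python
# Illustrative data wrangling script: harmonize dates and country labels
# coming from multiple sources. All formats and aliases are assumptions.
from datetime import datetime

DATE_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%d %b %Y"]
COUNTRY_ALIASES = {"usa": "US", "united states": "US", "u.s.": "US", "uk": "GB"}

def normalize_date(raw):
    """Try each known source format; return ISO-8601, or None if unparseable."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None  # flag for manual review rather than guessing

def normalize_country(raw):
    """Map known aliases to a standard code; otherwise pass the value through."""
    key = raw.strip().lower()
    return COUNTRY_ALIASES.get(key, raw.strip().upper())

rows = [
    {"date": "03/14/2021", "country": "USA"},
    {"date": "2021-03-14", "country": "United States"},
]
cleaned = [{"date": normalize_date(r["date"]),
            "country": normalize_country(r["country"])} for r in rows]
print(cleaned)  # both rows normalize to {"date": "2021-03-14", "country": "US"}
```

A script like this fixes the symptoms in one dataset, but it must be re-run and re-maintained as long as the upstream sources keep disagreeing, which is exactly why such fixes stay at the surface level.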
An illustration of this is something I faced with a large hotel chain. My last name is the same as my husband's first name, an uncommon cultural variation in naming that the hotel chain's data records could not handle correctly. Because we also shared the same residential address, our hotel reservations and frequent-stay accounts were repeatedly mixed up. The problem would get fixed every time we called customer service, but it would recur the next time. The result was that we stopped using that hotel chain because of the inconvenience. Another illustration is Chevron's experience cleaning up its well data, as described in a Harvard Business Review article. Chevron realized that comprehensively cleaning its data would be a multi-year effort that would not necessarily leave it in a better place than where it started. It would be much better to change the way it operated so as to eliminate the errors in how the data was acquired in the first place.
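The hotel example can be sketched as a record-matching problem. The code below contrasts a sloppy matching rule (shared name token plus shared address) with a stricter one; the names, fields, and loyalty IDs are hypothetical, and this is an assumption about the kind of logic that could produce the mix-up, not the chain's actual system.

```python
# Hypothetical sketch of how a naive customer-matching rule merges two
# distinct guests. All records and rules here are illustrative assumptions.

def sloppy_same_guest(a, b):
    """Treat two records as one guest if they share any name token and the
    same address -- this breaks when a last name matches a first name."""
    tokens_a = {a["first_name"].lower(), a["last_name"].lower()}
    tokens_b = {b["first_name"].lower(), b["last_name"].lower()}
    same_address = a["address"].lower() == b["address"].lower()
    return same_address and bool(tokens_a & tokens_b)

def safer_same_guest(a, b):
    """Prefer a stable identifier (here a loyalty ID) when available;
    otherwise require the full ordered name plus address to match."""
    if a.get("loyalty_id") and b.get("loyalty_id"):
        return a["loyalty_id"] == b["loyalty_id"]
    return ((a["first_name"].lower(), a["last_name"].lower(), a["address"].lower())
            == (b["first_name"].lower(), b["last_name"].lower(), b["address"].lower()))

# Wife's last name equals husband's first name; same home address.
wife = {"first_name": "Anita", "last_name": "Raj",
        "address": "12 Oak Lane", "loyalty_id": "A-100"}
husband = {"first_name": "Raj", "last_name": "Kumar",
           "address": "12 Oak Lane", "loyalty_id": "A-200"}

print(sloppy_same_guest(wife, husband))  # True  -> accounts get mixed up
print(safer_same_guest(wife, husband))   # False -> records stay distinct
```

The point of the sketch is that patching the merged records downstream fixes nothing; the matching rule at the source has to change, which is the Chevron lesson in miniature.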
In other words, it is important to methodically improve data quality across the organization. It is more strategic to change the organizational culture so that data quality is improved at the source than to deal with it reactively. On the surface, this looks harder to accomplish than data wrangling. But a fully thought-out approach to data management, implemented with a coordinated effort across all parts of the organization, will ensure that the quality of data and the data flows across the organization remain high. The benefits of cleaning data at the source go beyond just improving the outputs of analytics. Organizations that realize this have a significant edge in executing a successful digital transformation.