MetaData - The Key to Findability of Data

Mar 30, 2021
Updated: May 9, 2022

We talked previously about how organizations are often drowning in large volumes of meaningless data, which makes data-driven decision-making almost impossible. A good example is the number of security alerts and logs that are routinely generated about cloud infrastructure.

Organizations inundated with security alerts and logs struggle to eliminate false positives and focus on the threats that matter. This problem is not restricted to security information and event management (SIEM); it is prevalent across most parts of an organization.

Organizations need better strategies for managing their data so that they can quickly identify the data, and the data risks, relevant to their business purposes. Spending some effort to properly tag, classify, and describe data allows teams to clearly understand the risks associated with the types of data they manage.

Enterprises can benefit significantly by following the approach used by search engines. While the search appears transparent to the users, there is a lot that goes on behind the scenes to improve the accuracy and relevance of a search. Metadata plays a very important role in this.

What is Metadata?

Metadata is data about data. It can be routine information such as size, location, and owner, but it can also cover richer characteristics of the data: its source, keywords, who uses it, how frequently it is used, its importance to the business, how it was acquired, and whether it includes sensitive or confidential information. Metadata is crucial for ensuring that searches return relevant results, and it is an integral part of most major search engines.
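To make this concrete, here is a minimal Python sketch of what a metadata record for a dataset might look like, and how tags on that record can drive a simple lookup. The class name, field names, and sample catalog entries are all hypothetical and chosen only for illustration; a real catalog would track many more attributes.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetMetadata:
    """Hypothetical metadata record for one dataset (illustrative fields only)."""
    name: str
    location: str
    owner: str
    size_bytes: int
    source: str = "unknown"
    keywords: set = field(default_factory=set)
    contains_sensitive_data: bool = False


def find_datasets(catalog, keyword):
    """Return the names of datasets tagged with the keyword (case-insensitive)."""
    wanted = keyword.lower()
    return [m.name for m in catalog if wanted in {k.lower() for k in m.keywords}]


# Made-up catalog entries, used only to exercise the lookup.
catalog = [
    DatasetMetadata("crm_contacts", "s3://example-bucket/crm/", "sales-ops",
                    52_000_000, source="CRM export",
                    keywords={"customers", "email"},
                    contains_sensitive_data=True),
    DatasetMetadata("web_logs", "s3://example-bucket/logs/", "platform",
                    9_800_000_000, source="load balancer",
                    keywords={"traffic", "security"}),
]

matches = find_datasets(catalog, "Security")
```

The lookup never touches the underlying data. It works only from the descriptive record, which is why the quality of that record determines what can be found.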

How Metadata Improves Findability and Relevance

When we think of search, Google and Amazon are some of the examples that come to mind immediately. What might not be instantly obvious is the active effort that goes into curating and managing the metadata that helps return relevant results for a query.

These efforts can be broadly classified as search engine optimization (SEO), and they encompass active updates to both the metadata and the search algorithm parameters to ensure the relevance of returned results.

Metadata is critical in defining the search ranking of a webpage. It helps the search engine understand what a webpage is about and display the relevant phrases and keywords of that page as part of the search results. A webpage's metadata must be actively managed so that search engines can identify the page and rank it highly for a search request.
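As an illustration of the kind of metadata a webpage carries, the following sketch reads the title and the `<meta>` tags from an HTML page using Python's standard `html.parser` module. The sample page and the `MetaTagParser` class are made up for this example, and the sketch says nothing about how any particular search engine weighs these fields.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collects the <title> text and <meta name=... content=...> pairs."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attr_map.get("name") and attr_map.get("content") is not None:
            # Store each named meta tag, e.g. "description" or "keywords".
            self.meta[attr_map["name"].lower()] = attr_map["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


# A made-up page used only for illustration.
SAMPLE_PAGE = """
<html><head>
  <title>Metadata and Findability</title>
  <meta name="description" content="How metadata makes data easier to find.">
  <meta name="keywords" content="metadata, search, data governance">
</head><body><p>Article text.</p></body></html>
"""

parser = MetaTagParser()
parser.feed(SAMPLE_PAGE)
page_title = parser.title.strip()
page_meta = parser.meta
```

The title and description collected here are the kind of fields a page owner would curate as part of SEO.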

The metadata demonstrates the relevancy of the page and promotes it in the search engine rankings. Metadata on websites (in the case of an internet search) or product pages (in the case of a search in an online marketplace) is continuously curated by owners as part of SEO to ensure the pages appear correctly in search queries.

Search engines also make an active and continuous effort to refine their search algorithms so that the most relevant results are returned, especially to weed out the false positives that result from websites trying to boost their rankings.

For instance, Amazon employs a product-based search algorithm called A9 that uses metadata to determine whether a product or page shows up in the top results. The metadata is harvested by AI and machine learning but is also managed, tweaked, and presented by humans to ensure proper classification and ranking.

The A9 algorithm pulls relevant results from the catalog and then ranks them according to their relevancy to the customer. Amazon's algorithms keep learning how to combine several relevance signals, and they analyze the user's previous search patterns to provide the best results.
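The two-stage idea described here, first pulling candidates and then ranking them, can be sketched with a toy example. This is not Amazon's A9 algorithm, whose internals are not described in this article. The catalog, the `conversion_rate` field, and the scoring formula (keyword overlap multiplied by conversion rate) are all assumptions made purely for illustration.

```python
def search(catalog, query):
    """Toy retrieve-then-rank search over product metadata."""
    terms = set(query.lower().split())
    scored = []
    for item in catalog:
        keywords = {k.lower() for k in item["keywords"]}
        overlap = len(terms & keywords)
        if overlap == 0:
            continue  # Stage 1: drop items whose metadata matches no query term.
        # Stage 2: assumed score, keyword overlap weighted by conversion rate.
        score = overlap * item["conversion_rate"]
        scored.append((score, item["title"]))
    # Highest score first; ties are broken alphabetically by title.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored]


# Made-up catalog entries.
products = [
    {"title": "Steel Water Bottle", "keywords": {"water", "bottle", "steel"},
     "conversion_rate": 0.10},
    {"title": "Glass Water Bottle", "keywords": {"water", "bottle", "glass"},
     "conversion_rate": 0.20},
    {"title": "Desk Lamp", "keywords": {"lamp", "desk"},
     "conversion_rate": 0.30},
]

results = search(products, "water bottle")
```

Even in this toy version, a product with missing or poorly chosen keywords never reaches the ranking stage, which is why sellers invest in curating their metadata.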

The metadata helps with optimizing product content, removing duplicate content, and making the results more structured. It is also used to identify the targeted region and language and to promote the pages across various search engines.

Amazon continuously evaluates its algorithms “using human judgments, programmatic analysis, key business metrics, and performance metrics” and metadata is a key factor in determining what shows up in search results. Amazon also keeps track of conversion rates, relevancy, customer feedback, retention, and product descriptions to rank a product.

Sellers also continuously update and tweak their product descriptions and pages as the search algorithms change. In fact, a whole industry of experts (Amazon SEO) constantly tracks these changes and provides guidance on how to tweak product details for better ranking in Amazon searches.

A similar process happens with Google search rankings. A lot of effort goes into both the search algorithms and the webpages, and the work of creating, managing, and tweaking metadata is what makes these searches effective.

Enterprise data – how is it different? Why?

In an enterprise context, the teams generating the data, the data owners, and the teams using the content are typically not thinking about metadata or rankings. Their focus is on the business problems they are solving, and not much time is spent on curating and managing information about the data. Users might not be thinking about how findable their data and workflows are to others in the organization outside their immediate teams.

While owners of internet pages or product listings have a motivation to optimize metadata for search rankings, enterprise data is usually not actively tagged, labeled, or classified. The footprint of enterprise data is also broader: it includes not only the data that is produced and stored but also the data that needs to be disposed of. Enterprise data also has restrictions around how sensitive data should be handled and accessed.

How can enterprise data be better managed?

Some key elements are needed to better manage enterprise data. It is important to have clearly identified data owners for the different systems within the organization, because this establishes responsibility and accountability for the data.

While data governance has a significant focus on how to preserve and protect data, it is also important to expand that focus to how the metadata about different systems is managed. Current and accurate metadata about different systems helps organizations use their data more effectively to address their different needs.

Maintaining metadata accurately requires a combination of automation, to determine the type of data contained within each system, and the active involvement of the data owners. It is equally critical to define key metrics around governance and to decide which metadata must be tracked to support those metrics.
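A minimal sketch of this combination might look like the following: an automated scan flags likely data types in sampled values, and the result is merged with information that only the data owner can supply. The patterns, function names, and sample values are hypothetical and far simpler than what a production classifier would need.

```python
import re

# Illustrative patterns only; real classifiers need far more robust rules.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def detect_data_types(sample_values):
    """Automated step: return the sorted labels of patterns found in the samples."""
    found = set()
    for value in sample_values:
        for label, pattern in PATTERNS.items():
            if pattern.search(value):
                found.add(label)
    return sorted(found)


def build_metadata(system_name, owner, sample_values, owner_notes=""):
    """Merge the automated findings with details supplied by the data owner."""
    detected = detect_data_types(sample_values)
    return {
        "system": system_name,
        "owner": owner,
        "detected_types": detected,
        "contains_sensitive_data": bool(detected),
        "owner_notes": owner_notes,  # Context that automation cannot infer.
    }


# Made-up system and sample values.
record = build_metadata(
    "hr_portal",
    "hr-team",
    ["jane.doe@example.com", "123-45-6789", "Room 4B"],
    owner_notes="Retain for 7 years",
)
```

The automated scan answers what kind of data a system holds, while the owner's notes capture context such as retention requirements. Both are needed for the metadata to stay accurate.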

Meru Data has significant experience helping our customers manage this. Our DataMaps can help you maintain and update information about data, systems, and data flows with minimal effort from users by utilizing powerful automation capabilities.

It is vital that you understand where your data exists and how it flows within your organization. Different stakeholders and teams within an organization should be able to easily see and understand how their data is used. We also help organizations track and manage their key risks around data.