Metadata: The Key to Data Findability

We talked previously about how organizations are often drowning in large volumes of meaningless data that make data-driven decision-making almost impossible. A good example is the number of security alerts and logs routinely generated by cloud infrastructure.

Organizations inundated with security alerts and logs struggle to eliminate false positives and focus on the threats that matter. The problem is not restricted to security information and event management (SIEM); it is prevalent across most parts of an organization.

Organizations need better strategies for managing their data so they can quickly identify the data, and the data risks, relevant to their business. Spending some effort to properly tag, classify, and describe data allows teams to clearly understand the risks associated with the types of data they manage.
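One lightweight way to start tagging is rule-based classification over metadata that already exists, such as column names. The sketch below is illustrative only: the `DataAsset` type, the `SENSITIVITY_RULES` patterns, and the tag names are hypothetical, and matching on names alone will miss sensitive data stored under innocuous column names.

```python
import re
from dataclasses import dataclass, field

# Hypothetical rules: a tag applies when any column name matches its pattern.
SENSITIVITY_RULES = {
    "pii": re.compile(r"ssn|social_security|email|phone|birth|address", re.IGNORECASE),
    "financial": re.compile(r"card|iban|account_number|salary", re.IGNORECASE),
    "credentials": re.compile(r"password|secret|api_key|token", re.IGNORECASE),
}


@dataclass
class DataAsset:
    name: str
    columns: list[str]
    tags: set[str] = field(default_factory=set)


def classify(asset: DataAsset) -> DataAsset:
    """Add one tag per rule that matches at least one column name."""
    for tag, pattern in SENSITIVITY_RULES.items():
        if any(pattern.search(column) for column in asset.columns):
            asset.tags.add(tag)
    if not asset.tags:
        # Mark for human review rather than assuming the data is safe.
        asset.tags.add("unclassified")
    return asset


customers = classify(DataAsset("crm.customers", ["id", "full_name", "email", "phone_number"]))
print(sorted(customers.tags))  # ['pii']
```

Defaulting to `unclassified` rather than something like `public` keeps untagged assets visible as a risk until someone reviews them.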

Enterprises can benefit significantly by following the approach used by search engines. While search appears simple to users, a lot goes on behind the scenes to improve the accuracy and relevance of results. Metadata plays a very important role in this.

What is Metadata?

Metadata is data about data. It can be routine information such as size, location, and owner, but it can also capture richer characteristics such as the source of the data, keywords, who uses it, how frequently it is used, its importance to the business, how it was acquired, and whether it includes sensitive or confidential information. Metadata is crucial for ensuring that searches return relevant results and is an integral part of most major search engines.
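To make the distinction concrete, here is a minimal sketch of a metadata record holding both routine and richer attributes, plus a keyword lookup that uses them. The `MetadataRecord` fields, the `find` helper, and the sample catalog entries are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataRecord:
    # Routine metadata
    path: str
    size_bytes: int
    owner: str
    # Richer, descriptive metadata (field names are illustrative)
    source: str = ""
    keywords: set[str] = field(default_factory=set)  # assumed stored in lower case
    monthly_accesses: int = 0
    business_criticality: str = "low"  # e.g. "low", "medium", "high"
    contains_sensitive_data: bool = False


def find(catalog, keyword, sensitive=None):
    """Return records tagged with `keyword`, optionally filtered on sensitivity."""
    keyword = keyword.lower()
    return [
        record
        for record in catalog
        if keyword in record.keywords
        and (sensitive is None or record.contains_sensitive_data == sensitive)
    ]


catalog = [
    MetadataRecord("warehouse/sales/orders.parquet", 52_000_000, "sales-eng",
                   source="order service", keywords={"orders", "revenue"},
                   monthly_accesses=340, business_criticality="high",
                   contains_sensitive_data=True),
    MetadataRecord("warehouse/marketing/campaigns.csv", 80_000, "marketing",
                   source="ad platform export", keywords={"campaigns", "revenue"},
                   monthly_accesses=12),
]
print([r.path for r in find(catalog, "Revenue", sensitive=False)])
# ['warehouse/marketing/campaigns.csv']
```

Routine fields alone (path, size, owner) could not answer "which revenue data is safe to share?"; the richer keyword and sensitivity fields can.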

How Metadata Improves Findability and Relevance

When we think of search, Google and Amazon come to mind immediately. What might not be obvious is the active effort that goes into curating and managing the metadata that helps return relevant results for a query.

These efforts happen on two sides: site and product owners actively update their metadata, a practice broadly known as search engine optimization (SEO), while search engines tune their algorithm parameters to ensure the relevance of returned results.

Metadata is critical in determining the search ranking of a webpage. It helps the search engine understand what a webpage is about and lets it display the page's relevant phrases and keywords in the search results. A webpage's metadata must be actively managed for that page to be identified and ranked highly by search engines.

Metadata signals the relevance of a page and helps promote it in search engine rankings. Owners continuously curate the metadata of websites (for internet search) or product pages (for search in an online marketplace) as part of SEO, to ensure those pages appear in the right search queries.

Search engines, in turn, continuously refine their algorithms so that the most relevant results are returned, in particular to weed out false positives that arise when websites try to game their rankings.

For instance, Amazon uses a product search algorithm called A9 that relies on metadata to determine whether a product or page shows up in the top results. The metadata is harvested using AI and machine learning, but it is also managed, tweaked, and presented by humans to ensure proper classification and ranking.

The A9 algorithm pulls relevant results from the catalog and then ranks them by their relevance to the customer. Amazon's algorithms continuously learn to combine several relevant signals and analyze the user's previous search patterns to provide the best results.
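The following is not Amazon's A9 implementation, whose details are not covered here. It is a generic, hypothetical sketch of the same two-stage pattern: retrieve candidates whose metadata matches the query, then rank them with a score that mixes metadata match, a behavioural signal (conversion rate), and the user's past preferences. The `Product` fields and the 0.6/0.2/0.2 weights are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Product:
    title: str
    keywords: set[str]
    category: str
    conversion_rate: float  # fraction of views that led to a purchase


def terms_of(product):
    """Searchable terms: curated keywords plus the words of the title."""
    return product.keywords | set(product.title.lower().split())


def retrieve(catalog, query_terms):
    """Stage 1: keep products whose metadata shares at least one term with the query."""
    return [p for p in catalog if query_terms & terms_of(p)]


def score(product, query_terms, preferred_categories):
    """Stage 2: hypothetical weighted relevance score."""
    match = len(query_terms & terms_of(product)) / len(query_terms)
    history = 1.0 if product.category in preferred_categories else 0.0
    return 0.6 * match + 0.2 * product.conversion_rate + 0.2 * history


def search(catalog, query, preferred_categories=frozenset()):
    query_terms = set(query.lower().split())
    candidates = retrieve(catalog, query_terms)
    return sorted(candidates,
                  key=lambda p: score(p, query_terms, preferred_categories),
                  reverse=True)


catalog = [
    Product("Stainless steel water bottle", {"bottle", "insulated", "hiking"}, "outdoor", 0.08),
    Product("Glass water bottle", {"bottle", "kitchen"}, "kitchen", 0.12),
    Product("Hiking backpack", {"backpack", "hiking"}, "outdoor", 0.05),
]
print([p.title for p in search(catalog, "water bottle")])
# ['Glass water bottle', 'Stainless steel water bottle']
print([p.title for p in search(catalog, "water bottle", preferred_categories={"outdoor"})])
# ['Stainless steel water bottle', 'Glass water bottle']
```

Both bottles match the query equally on metadata, so without history the higher conversion rate wins; a user who has previously favoured outdoor gear sees the stainless bottle first instead.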

Metadata helps with product content optimization, removal of duplicate content, and structuring results. It is also used to identify the target region and language, and to promote pages across various search engines.
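As a rough illustration of metadata-driven duplicate removal, the sketch below treats two listings as duplicates when their brand and title words match after normalizing case, punctuation, and word order. The listing fields and the `dedup_key` rule are hypothetical; exact-key matching like this is deliberately simple and will miss near-duplicates that a fuzzier comparison would catch.

```python
import re


def dedup_key(listing):
    """Hypothetical duplicate key: normalized brand plus the sorted words of the title."""
    words = re.findall(r"[a-z0-9]+", listing["title"].lower())
    return (listing["brand"].strip().lower(), tuple(sorted(words)))


def remove_duplicates(listings):
    """Keep the first listing seen for each duplicate key, preserving order."""
    seen = set()
    unique = []
    for listing in listings:
        key = dedup_key(listing)
        if key not in seen:
            seen.add(key)
            unique.append(listing)
    return unique


listings = [
    {"id": 1, "brand": "Acme", "title": "Water Bottle, 1L"},
    {"id": 2, "brand": "acme ", "title": "1L water bottle"},
    {"id": 3, "brand": "Acme", "title": "Water Bottle, 500ml"},
]
print([l["id"] for l in remove_duplicates(listings)])  # [1, 3]
```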

Amazon continuously evaluates its algorithms “using human judgments, programmatic analysis, key business metrics, and performance metrics”, and metadata is a key factor in determining what shows up in search results. Amazon also tracks conversion rates, relevance, customer feedback, retention, and product descriptions to rank a product.