Data Governance Series: Part 2 – The Importance of Data Taxonomy
Data governance can be a powerful agent in scaling the use and distribution of trusted data throughout the company. However, more often than not, it conjures up the idea of a central authority strictly guarding against such access. In this 3-part series, we’ll cover the critical and often misunderstood components of data governance and offer perspective on how to implement data governance strategies that deliver trusted data at the speed of business. If you missed it, make sure to catch up on Part 1 – Data Timeliness.
What Is Data Taxonomy?
A taxonomy, very broadly, is a system of organized information that allows the user to classify and show relationships between things. A common example of a taxonomy is the Dewey Decimal System of library classification, in which numbers form a code that correlate to topics, subtopics, and sub-subtopics. Wikipedia illustrates the way this hierarchy is set up:
500 Natural sciences and mathematics
510 Mathematics
516 Geometry
516.3 Analytic geometries
516.37 Metric differential geometries
516.375 Finsler geometry
In the Dewey classification system, each number is associated unambiguously with a single entry in the hierarchy. A number such as 516.375 above identifies a book or other resource specifically as dealing with Finsler Geometry. That number also shows how that book relates to others above and below it in the hierarchy.
A data taxonomy uses a system of unambiguous metadata terms (such as a filename or tags attached to a file) that allow an enterprise to classify a file or dataset into important business categories. Categories can be configured in any way that meets the needs of the organization, but some common ones include the date of creation, date last modified, account name of the creator/modifier, required access privileges, personal identifying information (PII), the department that owns the dataset, and the primary business use of the dataset.
Properly designed and developed, a data taxonomy improves discoverability, observability, and security for your data. Data that is properly classified, catalogued, and tagged is usually well-governed data.
How Does Taxonomy Aid Data Governance?
A proper data taxonomy addresses many problems in your data and metadata, including:
- Ambiguity – The same term can have different meanings according to different business users, leading to confusing query results. Taxonomy provides a hierarchy that helps remove ambiguity. It includes mechanisms for understanding context and making meaning precise.
- Consistency – Sometimes the terms used in legacy applications differ from those used in newer systems. Terminology can also vary between business units. For various reasons, the data sometimes can’t be re-tagged to provide consistent metadata. A taxonomy lets you create a thesaurus to map disparate terms to a single label, mitigating the impact of natural inconsistency.
- Connections – Taxonomies can also represent related concepts (technically also part of a thesaurus) that can be used to connect processes, business logic, or dynamic/related content to support specific tasks.
- Incompleteness – Data may be missing important attributes. Consistent terminology, through a taxonomy, makes it easier to identify missing fields or metadata. The missing attributes may then be added from other sources or the incomplete data may be deleted or set aside and not used.
- Utilization – A data taxonomy can emphasize the suitability of a dataset for a particular purpose (for example, by tags that identify department ownership and business purposes). This can increase discoverability and thereby promote the use of that dataset for the intended purpose.
How to Build a Data Catalog with Taxonomy
The first and most important step to data discoverability is a data catalog. The first essential step in building a catalog is tagging data with business vocabulary so users can easily find the data they need. A data taxonomy makes cataloging much more powerful, improving data quality and discoverability. DataOS® can automate tagging and indexing to add incoming data to your catalog immediately.
How to Build a Taxonomy for Your Data
The two keys to building a usable data taxonomy from scratch are focused changes and using the language of your users as much as possible.
Focused Changes
Focus your taxonomy on one business area at a time. Balance your choice of area by beginning with high-priority targets, while keeping your scope manageable. For example, don’t begin with something like compliance with HIPAA or GDPR. Those are too large and too sweeping to start with. Save those to address after you build the taxonomies for a few smaller areas, such as marketing, sales, or security. Not only will this give you more practice with the methods of taxonomy, but much of what you build there will be needed for something like GDPR, so you’re whittling the scope of that project down as you go.
Use your narrow focus to plan and keep milestones as your taxonomy progresses from one target to the next.
User Vocabulary
More than many other data projects, a data taxonomy is a team effort. Your IT team or data steward can’t do it on their own. A data taxonomy needs to use the language of your business users, which means a polling process and meetings with users to learn how they think of their data.
You may add a hierarchy to your taxonomy to address the variety of terms that users may have for the same thing. If users have terms like “POS revenues,” “sales,” and “revenues,” then you can set up the taxonomy so all of those searches point back to “sales,” which is the tag that appears in your metadata. This is one of the primary ways in which a taxonomy enforces consistency and aids discoverability.
The focus of your taxonomy efforts can also help users see the value of the taxonomy to their particular projects, increasing enthusiasm and interest in developing the vocabulary for their area.
A Data Taxonomy Improves Data Value
Most modern businesses spend a lot of money on collecting their data. The ROI on that effort depends on deriving business insights from the data. A data taxonomy makes data easier to find and easier to use while improving data governance and data quality. It makes your data more valuable to your business.
Subscribe to
Our Blog
Be the first to know about the latest insights from Modern.
People Also Read
Solving The Persistent Challenges of Data Modeling
The elegance of Data Products is undeniable, but many leaders question the efficacy of their data strategies: Why does the return on data investments often disappoint? Why is proving data's value becoming harder? Why do data models become more cumbersome than...
Building Your Data Product Machine: Less Tech, More Strategy
Data is vital to business but the process of getting from data to insights is often murky. Many on the business side may not even care how it happens but understanding this process matters. It matters a lot. With this in mind, let's explore how to demystify the...
How a Data Product Strategy Impacts Both Business and Tech Stakeholders
We don't want to restrict the scope of this article to only data leaders and influential executives. As startup folks, we are confident in how individual contributors or ICs, such as Data Engineers, DevOps experts, or even the surprising intern, could influence the...
Why Evolutionary Architecture is Important in a Data-Driven World
It's a tale as old as time. A startup manages to disrupt an entire industry only to find itself at a critical juncture a few years down the road. Data, the lifeblood of its operations, was becoming increasingly complex and unwieldy. With each new product launch and...
The Just-In-Time Revolution for Data-Driven Enterprises
For today's Chief Data Officers (CDOs) and data teams, the struggle is real. We're drowning in data yet thirsting for actionable insights. Traditional data architectures, with their centralized data lakes and batch-oriented processing, are like bloated, slow-moving...
Latest Resources
A Modern Data Strategy for Enterprises – 2nd Edition
A Modern Data Strategy for Enterprises – 2nd EditionStuck with unusable data? Enterprises drowning in information can unlock its true value with Modern DataOS. Take a data products approach and create secure, high-quality data packages designed for specific business...
A Paradigm Shift in Data Management – 2nd Edition
A Paradigm Shift in Data Management – 2nd Edition Buried in data silos? Traditional data management is slow, rigid, and keeps valuable insights locked away. Enter a paradigm shift: Data Products. These are user-friendly, pre-packaged datasets designed for specific...
Creating a Single Source of Truth (SSOT)
Creating a Single Source of Truth (SSOT)[placeholder] Traditional project-centric data management stifles AI innovation with siloed data, slow workflows, and limited reusability. Enter the era of data products: self-contained modules of data, logic, and infrastructure...
DataOS Sales Accelerator for Food & Beverage
DataOS Sales Accelerator for Food & Beverage The dynamic food & beverage industry demands a data-driven approach to success. The Modern Data Company's DataOS® Sales Accelerator acts as your all-in-one data concierge. Our pre-built solutions, designed...
Unleashing the Power of AI with Data Products
Unleashing the Power of AI with Data Products Traditional project-centric data management stifles AI innovation with siloed data, slow workflows, and limited reusability. Enter the era of data products: self-contained modules of data, logic, and infrastructure that...