How to clean up your data with Master Data Management
In your organization, you may find multiple applications that independently manage some or all of a particular business object (e.g., address, service, or vendor). As a result, you may have inconsistent data across your applications.
In this article, you’ll learn:
What is master data management (MDM)?
What is considered master data?
How to start an MDM program
What makes an MDM program successful
Who should be involved in an MDM program
Questions and inconsistencies result in fragmented data. When these inconsistencies are multiplied across many systems, the scale of the problem is vast.
Master data management works to eliminate duplication and inconsistencies in commonly used business data, giving all staff a single, trusted source of truth.
What is master data management?
Master Data Management (MDM) is the practice of cleansing, rationalizing and integrating data into an enterprise-wide “system of record” for core business activities.
Master data management involves using a combination of ongoing practices and technology across an organization to maintain consistent, accurate, and reliable core business data.
Your organization may require fundamental changes in its business processes to maintain clean master data. Issues are more often interpersonal and political than technical. Once you’ve identified the issues and defined processes, your organization can implement technical tools to facilitate these processes.
An MDM program involves creating and maintaining master data. You’ll need tools and processes as your master data expands and is updated over time.
What is master data?
Gartner defines master data as “the consistent and uniform set of identifiers and extended attributes that describes the core entities of the enterprise including customers, prospects, citizens, suppliers, sites, hierarchies and chart of accounts.”
Master data is essential data around which business is conducted. This data changes infrequently. For example, a customer’s address may need to be changed if they move.
Master data is often categorized into major domains. These domains are major components of what you make or deliver, to whom you deliver, and where delivery occurs. Typically, domains are:
Customer: Your customers, citizens, employees, and vendors. Think of each business unit and whom it considers its suppliers and its customers. Some are internally facing, others are external.
Product / Service: Your organization provides products and/or services to customers, citizens, businesses, and visitors. This domain organizes what you make or deliver to your customers. It includes the breakdown of products into parts and materials, and services into their constituent parts (including service delivery).
Location: You can start by thinking of the locations you deliver your products or services from, such as agencies, branches, or facilities. These can then be broken down into sub-parts, such as rooms. Governmental bodies that encompass a geographic territory can also consider all properties and rights-of-way forming the territory, and geographic divisions representing service areas, electoral geographies, and transit routes (part of service delivery).
Other: Anything else that doesn’t fall into one of the above categories.
How do you determine what qualifies as master data?
Some criteria to consider when identifying master data:
Behaviour: Describes the way that data interacts with other data, often as a noun-verb relationship between master and transactional data. For example, a Customer buys a Product. When you log in to Amazon to buy something, you are using your Customer profile, so Amazon knows you’re a Customer. You buy a Product, which Amazon presents in a product catalog to select from. Amazon records your purchase as a transaction that links two master data elements: you (the Customer) and what you bought (the Product).
Lifecycle (CRUD Cycle): A data lifecycle considers the Creation, Reading, Update, and Deletion of data (CRUD), and potentially searching for data by exploring its metadata. The important thing to note about Lifecycle is that there is a formal business process to support the creation, update, and deletion of data. Adding a new Customer often means setting up an account or profile (a business process) to capture information about the new Customer, and potentially linking the profile to other data (claiming your water bill account, for example).
Cardinality: Here, you’re counting the number of data elements (records) related to a business entity (dataset). Typically, the more items managed, the more important the data. For Customer data, if you have 3 records, it may be valuable but not important enough to consider as master data. Conversely, tens of thousands of Customer records increase the importance of the data.
Lifetime: Master data tends to be less volatile. It only changes occasionally, and through a well-regulated process. Consider a Contract, which may be considered to have a long or a short lifespan. For example, a Supply Arrangement may be a multi-year contract with one or more vendors. The Supply Arrangement may be considered master data, whereas each task authorization (a short-term contract) may be considered transactions against the Supply Arrangement.
Complexity: Simple entities rarely need master data management, even when they are valuable. If a local government distributes a lot of water, it makes money distributing water, so the water is valuable. However, water is not a challenge to manage: there is no benefit in treating each cubic meter as master data, because you would simply count the volume delivered.
Value: If a data element has value, it is more likely to be considered master data. You may want to look at it from the other side: what is the impact on your organization of getting a value wrong? Suppose you have the wrong street address for a property owner, and you need to notify the owner that they may not have access to their property due to emergency water repairs. The owner never receives the notification and has construction work scheduled on the property at the same time. They then face a delay-of-work claim from their contractor because they could not give the contractor timely notice. Value and complexity are often correlated.
Volatility: Master data changes slowly over time as compared to transactional data. However, entities whose attributes never change are not typically considered master data. If you have a rare coin collection, you may store information in a database about each coin. You may add new coins to your collection, but you’ll never change the attribute values in your database for coins you already own.
Reuse: If a business object is widely used across your organization, reusing a standardized version of the information has a high return on the cost of formal maintenance through master data management practices. Reuse is one of the primary drivers of a master data program.
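To make the Behaviour criterion above more concrete, here is a minimal sketch of a transaction that links two master records. It is an illustration only: the class names, fields, and IDs are hypothetical and not taken from any particular MDM product.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Customer:
    """Master data (hypothetical fields): changes rarely, reused widely."""
    customer_id: str
    name: str
    address: str


@dataclass(frozen=True)
class Product:
    """Master data (hypothetical fields)."""
    product_id: str
    description: str


@dataclass(frozen=True)
class Purchase:
    """Transactional data: the 'verb' that links two master records by ID."""
    purchase_id: str
    customer_id: str
    product_id: str
    purchased_on: date
    quantity: int


# Illustrative records only.
customers = {"C-001": Customer("C-001", "Pat Example", "12 Main St")}
products = {"P-100": Product("P-100", "Rain barrel")}
purchase = Purchase("T-9001", "C-001", "P-100", date(2024, 5, 1), 2)


def describe(p: Purchase) -> str:
    """Resolve the master records that a transaction points to."""
    customer = customers[p.customer_id]
    product = products[p.product_id]
    return f"{customer.name} bought {p.quantity} x {product.description}"


print(describe(purchase))
```

Because the purchase stores only the IDs, a corrected customer address is changed in one place and every transaction that references the customer sees the correction.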
Determining what qualifies as master data can be a complex project. If you need help identifying your organization's requirements, book a free consult with us:
How to start a master data management program
There are four major activities involved in starting and maintaining an MDM project:
Plan and analyse
Set up governance
Select and configure technology
Creation and maintenance process
As with any successful initiative, it’s important to start small and win (big). Demonstrating success after success makes it easier to invite stakeholders into the conversation and get them engaged. It is equally important to answer two questions: “What’s in it for me?” and “What’s in it for us?”
Plan and analyse
The first major group of activities, as in any project, is the analysis. What does the current situation look like? In your analysis, you’ll be looking at the potential sources of master data and considering the following:
Identify sources of master data
Which systems may have master data?
Identify the producers and consumers of master data
Which applications create/update master data?
Which applications read/search master data?
Collect and analyse metadata for master data, including:
Attribute name and data type
Allowable values, constraints, and default values
Who owns the definition and maintenance of the data?
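The analysis checklist above can be captured in a simple metadata inventory. The sketch below is one possible shape, assuming a small in-memory catalog; the class, the attribute names, and the application names ("billing", "crm", "permits") are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AttributeMetadata:
    """One row of the metadata inventory (hypothetical structure)."""
    name: str
    data_type: str
    allowable_values: List[str] = field(default_factory=list)  # empty means unrestricted
    default: Optional[str] = None
    owner: Optional[str] = None  # who owns definition and maintenance
    producers: List[str] = field(default_factory=list)  # applications that create/update
    consumers: List[str] = field(default_factory=list)  # applications that read/search


# Illustrative entries only.
catalog = [
    AttributeMetadata("customer.address", "string", owner="Billing",
                      producers=["billing", "crm"], consumers=["permits"]),
    AttributeMetadata("customer.status", "string",
                      allowable_values=["active", "inactive"], default="active",
                      producers=["crm"], consumers=["billing"]),
]


def multi_producer(entries):
    """Attributes updated by more than one application: likely sources of inconsistency."""
    return [a.name for a in entries if len(a.producers) > 1]


def unowned(entries):
    """Attributes with no named owner yet: a question for governance."""
    return [a.name for a in entries if a.owner is None]


print("Updated by several systems:", multi_producer(catalog))
print("No owner assigned:", unowned(catalog))
```

Even a spreadsheet with these columns would do; the point is that producers, consumers, and ownership are recorded per attribute before any tooling is chosen.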
Set up governance
Master Data Management places a heavy emphasis on governance. Remember, changing master data typically involves a formal business process. So likewise, there is oversight in the definition and curation of master data.
Appoint data stewards
People with knowledge of the current source data
Appointed by owners of each master data source, the architects responsible for MDM software, and the business users of master data
Implement a data governance program and data governance council
The group has the knowledge and authority to make decisions on how master data is maintained, what it contains, how long it is kept, and how changes are made and audited.
A well-defined decision-making body and process is required for project success. There are hundreds of decisions to make, and politics can get in the way without an effective decision-making process.
Select and configure technology
The technology selection and configuration considers the data model, the software and supporting tools for connecting to systems and cleaning data, and the scalable technology infrastructure to ensure availability of master data for the organization.
Develop the Master Data model
What does a master record look like (attributes, data types, allowable values)?
Data mapping between master record and current data sources
Choose a toolset
Tools required to clean and merge master data - techniques are different per domain, so multiple tools may be required.
Toolset should have support for fixing data quality issues, as well as managing versions and hierarchies
Design the infrastructure
Many applications will need to access master data, so scalability and reliability are critical to the design.
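As a rough illustration of the master data model and the data mapping described above, the sketch below defines a master Vendor record and maps two source systems onto it. Everything here is an assumption made for the example: the schema, the allowable country codes, and the source field names ("VendNo", "supplier_id", and so on) are hypothetical.

```python
# Master record definition: attribute name -> expected Python type (hypothetical schema).
MASTER_SCHEMA = {"vendor_id": str, "legal_name": str, "country": str}

# Allowable values for one attribute (assumed for illustration).
ALLOWED_COUNTRIES = {"CA", "US"}

# Data mapping: source system -> {source field: master attribute} (hypothetical names).
FIELD_MAP = {
    "finance": {"VendNo": "vendor_id", "VendName": "legal_name", "Ctry": "country"},
    "procurement": {"supplier_id": "vendor_id", "name": "legal_name",
                    "country_code": "country"},
}


def to_master(source: str, record: dict) -> dict:
    """Rename source fields to master attributes, then validate the result."""
    mapping = FIELD_MAP[source]
    # Fields that are not in the mapping are dropped from the master record.
    master = {mapping[k]: v for k, v in record.items() if k in mapping}

    missing = set(MASTER_SCHEMA) - set(master)
    if missing:
        raise ValueError(f"missing attributes: {sorted(missing)}")
    for attr, expected in MASTER_SCHEMA.items():
        if not isinstance(master[attr], expected):
            raise TypeError(f"{attr} should be {expected.__name__}")
    if master["country"] not in ALLOWED_COUNTRIES:
        raise ValueError(f"country {master['country']!r} is not an allowable value")
    return master


print(to_master("finance", {"VendNo": "V1", "VendName": "Acme Ltd", "Ctry": "CA"}))
```

Keeping the mapping as data rather than code means a data steward can review it, and adding a new source system is a matter of adding one more entry.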
Creation and maintenance process
The final set of activities revolves around the build of your master data itself. Here you will generate and test master data. You may modify producing or consuming systems to update data or automate the change process for local copies.
The steps below cover generating and testing master data, and the implementation patterns that dictate any modifications to producing and consuming systems. Once your master data is built, your data steward will need to implement maintenance processes for continuous improvement to data quality, and to maintain data currency. Your software should help your data steward find and correct inconsistencies in master data.
Generate and test Master Data
Merge source data into the master data list. A 100% match is not possible.
Data stewards will need to correct match and merge deficiencies.
Modify the producing and consuming systems
Depending on the MDM Implementation Pattern, source and consuming systems may need updates.
Source systems may need to provide updates to the MDM system, while consuming systems should obtain or look up master data.
Implement maintenance processes
The data steward is responsible for maintaining the data quality of master data.
Tools should be available to detect inconsistencies for correction by the data steward.
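The match-and-merge step above can be sketched with nothing more than Python’s standard library. This is a simplified illustration, not how any particular MDM tool works: the similarity measure (difflib’s SequenceMatcher), the normalization rules, and the two thresholds are all assumptions chosen for the example. It shows why a 100% match is not possible and where the data steward comes in.

```python
import re
from difflib import SequenceMatcher

# Assumed thresholds: tune these per domain with your data stewards.
MERGE_THRESHOLD = 0.9
REVIEW_THRESHOLD = 0.7


def normalize(name: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace before comparing."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", name.lower())
    return " ".join(cleaned.split())


def classify(candidate: str, master_name: str) -> str:
    """Decide whether a source record merges, needs steward review, or is new."""
    score = SequenceMatcher(None, normalize(candidate), normalize(master_name)).ratio()
    if score >= MERGE_THRESHOLD:
        return "merge"
    if score >= REVIEW_THRESHOLD:
        return "review"  # close but uncertain: queue for the data steward
    return "new"


# Illustrative names only.
master_name = "ACME Ltd."
incoming = ["Acme  LTD", "Acme Limited", "Zenith Paving"]
decisions = {name: classify(name, master_name) for name in incoming}
print(decisions)
```

The middle band is the important design choice: records that are neither clear matches nor clearly new are routed to a person rather than merged automatically.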
What makes a master data management program successful?
Grouping activities into categories makes them easier to explain and gives us a framework to guide us through the process. There are six disciplines in a strong MDM program:
Governance: A key part of any MDM program. A cross-functional team from across the organization defines and oversees various aspects of your MDM program.
Measurement: Any business activity requires measurement to ensure it has the resources to achieve goals and a measure of its progress. Measurement for an MDM program looks at data quality and continuous improvement to judge performance.
Organization: Get the right people in the right seats on the bus. You’ll need to draw from key business people from across the organization to bridge business and technical knowledge about your master data; these are your data stewards. Data owners are key stakeholders, as are participants in governance activities.
Policy: Requirements, policies and standards need to be defined before they can be measured. Policies orient the activities around MDM.
Process: Remember that changes and access to master data are governed by formal business processes. So you’ll need to define the processes around the data lifecycle (CRUD Cycle) used to manage Master Data.
Technology: A key enabler of an MDM program. The Master Data Hub (or Data Virtualization) organizes Master Data, contains tools for cleaning, matching, and merging, as well as tools to identify and correct inconsistencies. Technology also includes the data load/transform (extract-transform-load or ETL) tools, and application integration tools to interface with your applications portfolio.
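For the Measurement discipline above, two simple data quality indicators are completeness and duplicate rate. The sketch below is a minimal example under stated assumptions: the record layout, the required fields, and the choice of these two metrics are illustrative, and a real program would agree on its own measures through governance.

```python
def completeness(records, required_fields):
    """Share of required fields that are actually filled in, from 0.0 to 1.0."""
    total = len(records) * len(required_fields)
    if total == 0:
        return 1.0
    filled = sum(
        1 for r in records for f in required_fields if r.get(f) not in (None, "")
    )
    return filled / total


def duplicate_rate(records, key):
    """Share of records whose key value repeats an earlier record, from 0.0 to 1.0."""
    keys = [r.get(key) for r in records]
    if not keys:
        return 0.0
    return 1 - len(set(keys)) / len(keys)


# Illustrative vendor records only.
vendors = [
    {"vendor_id": "V1", "name": "Acme Ltd", "address": "1 First St"},
    {"vendor_id": "V2", "name": "Beta Inc", "address": ""},
    {"vendor_id": "V2", "name": "Beta Inc.", "address": "2 Second St"},
    {"vendor_id": "V3", "name": "Gamma Co", "address": "3 Third St"},
]

print("completeness:", completeness(vendors, ["name", "address"]))
print("duplicate rate:", duplicate_rate(vendors, "vendor_id"))
```

Tracking numbers like these over time gives the governance council a way to judge continuous improvement rather than relying on impressions.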
Who should be involved in the master data management program?
There is no exhaustive list of stakeholders that can and should be included in an MDM program. Here, we’ll look at 3 key roles that are critical to the success of a program:
Administrators – technologists working in your IT team that are responsible for setting up and administering the master data, and the supporting technology solution and infrastructure.
Data Governance – individuals that are driving definitions and requirements for master data. They identify the master data list. They also provide feedback from the MDM solution relating to data quality and continuous improvement of master data.
Data Stewards – responsible for fixing, cleaning, and managing the data in the MDM solution once it is available and/or loaded from source system(s). Ideally, there is a steward from each department where a source system is identified, and they are technically oriented (they don’t have to be adept). Their activities are generally defined by the data governance team.
Answers to your questions about MDM
Some questions we’ve been asked by organizations about master data management:
Do you have recommendations for best practice Master Data sets?
Some sets that are of interest to our customers are:
Employee (including organization, position profile, reporting structure), customer profile, infrastructure assets, business and property addresses, and vendors/suppliers.
Data Lake vs. Data Warehouse: which is right for my organization?
The industry is moving toward a data platform, which usually includes both. The principles for maintaining data are the same. A data lake provides a more current view and doesn’t require you to restructure data on input; instead, restructuring occurs on output. A data warehouse restructures data on input. The decision you’re faced with is whether you want restructuring to occur during input or during output.
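The difference between restructuring on input and restructuring on output can be shown in a few lines. This is a toy sketch, assuming JSON events held in Python lists in place of real storage; the field names ("cust", "amt", "note") are hypothetical.

```python
import json

# Raw events as they arrive from a source system (illustrative data).
raw_events = [
    '{"cust": "C-001", "amt": "19.50", "note": "rush"}',
    '{"cust": "C-002", "amt": "5"}',
]


def structure(raw: str) -> dict:
    """Restructure one raw event into a typed row with agreed attribute names."""
    doc = json.loads(raw)
    return {"customer_id": doc["cust"], "amount": float(doc["amt"])}


# Warehouse style: restructure on input and store only the structured rows.
warehouse = [structure(event) for event in raw_events]

# Lake style: store the raw events untouched and restructure on output.
lake = list(raw_events)


def query_lake():
    """Apply the structure at read time."""
    return [structure(event) for event in lake]


print(warehouse == query_lake())  # both routes give the same structured view
print(json.loads(lake[0]).get("note"))  # the lake still holds fields the warehouse dropped
```

The trade-off is visible even at this scale: the warehouse pays the restructuring cost once and queries stay simple, while the lake keeps everything and pays the cost on each read.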
Do you have any recommendations for data warehouses with a mix of spatial and non-spatial data? How can we layer analytics packages?
Data warehouses have been around for 25-30 years and tend to focus on non-spatial data. When we layer in spatial information, it tends to be hierarchical data (e.g., census information, administrative areas) or point data.
Linear data tends to be poorly represented in a data warehouse concept. There are no strong visual tools to view or analyse linear data, and data visualization tools have limited support for working with linearly-referenced data. However, there are tools and approaches coming out that use dynamic linear referencing to support visualization, which allows the geographic or geometric view to be separated from the data view. We can incorporate geography in a view and overlay data onto it.
Point data tends to have the best support in data warehouses right now and can be easily visualized in your analytics package.
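To show what separating the data view from the geometric view can look like for linear data, here is a minimal sketch of the idea behind linear referencing. It assumes planar coordinates and measures in the same units as the coordinates; the route, the event record, and the function names are hypothetical and not drawn from any specific GIS product.

```python
import math

# Geometric view: one route stored once as a polyline (illustrative coordinates).
route_geometry = {"R1": [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)]}

# Data view: an event stored only as measures along the route, with no geometry.
condition_events = [
    {"route": "R1", "from_m": 5.0, "to_m": 15.0, "condition": "poor"},
]


def locate(polyline, measure):
    """Return the (x, y) point found by travelling `measure` units along the polyline."""
    if measure < 0:
        raise ValueError("measure must not be negative")
    travelled = 0.0
    for (x1, y1), (x2, y2) in zip(polyline, polyline[1:]):
        length = math.hypot(x2 - x1, y2 - y1)
        if length > 0 and travelled + length >= measure:
            t = (measure - travelled) / length  # fraction along this segment
            return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))
        travelled += length
    raise ValueError("measure is beyond the end of the route")


def event_endpoints(event):
    """Compute an event's start and end points on demand from its measures."""
    line = route_geometry[event["route"]]
    return locate(line, event["from_m"]), locate(line, event["to_m"])


print(event_endpoints(condition_events[0]))
```

The event table can live in a warehouse as ordinary rows, and the geometry is only combined with it at display time.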
What are typical implementation costs?
Costs vary widely, as there are many variables unique to each organization. In order to look at costs to implement, an in-depth analysis of objects and workflows must first be conducted.
We suggest thinking about Master Data Management as a program, starting small with focused data sets. Once adoption occurs across the organization, it becomes easier to invest further.
Master data management is primarily a people and processes issue, and secondly, a technology issue. You can start small, by addressing some key datasets, formalizing your creation and update process, and designating a data steward. This doesn’t necessarily involve adopting new technology if you are designating a particular system to hold the master record. This can be your first master data project. Your biggest expense in this case is people: the first 3-5 people to start the first project.
Once you move beyond the initial project, the governance, process, and technology become more formalized and require a cross-organizational approach. Once you start to explore beyond a single-source application contributing master data, you get into managing a master data program. Exploring a core business object and focusing on data quality and reuse across the organization reaps a significant return on investment. This return is a result of better decisions, less time spent searching for or compiling data, and automating the access to a master record.
We have built out a Master Data Management Framework, which helps with the identification and specification of master data into a data model. This can be used to estimate implementation costs for a particular core business object. The cost of this engagement typically ranges from $10,000 to $20,000.
What risks are involved?
Ongoing support and maintenance can be challenging. Skilled technical staff are difficult to retain in-house.
Sometimes, beginning a Master Data Management process can be like ‘opening a can of worms’. It requires your organization to engage all departments, and can seem like a big undertaking.
There is also a cost risk, in the event that programs or systems need to be updated, acquired, or replaced as a result of implementing the program or roadmap.
Involving the right people to support the master data program is key to mitigating risk. This means appointing a data governance council and data stewards, and involving IT administrators, as well as a sponsor from each department to manage data in your master list.
Implementing appropriate technical tools is also important in addressing potential risks.
What are some best practices for documenting Master Data Management? There is so much information to track!
Tools are available for managing the master data lists and program, and it’s important to have documentation in place that is updated as things change. There are data modelling tools and master data management tool providers.
Appointing stewards and a governance council is necessary. They will be responsible for determining how to organize information so that everyone knows what’s going on, and formalizing this as business data.
Is Master Data a shared responsibility that no department owns? What happens if one department spearheads it and wants people to conform?
There is typically a strong sponsor to set up a program, often from finance because of financial reporting requirements. A governance council should involve higher management and voices from across departments, and authority should be delegated to it.
How can I get further advice and support to start an MDM program in my organization?
Book a free consultation with Spatial DNA to discuss your master data challenge: