Wednesday 26 October 2011

'Data Layers' in a master data management hub

“Data layers” is a diagrammatic representation of high-level data entities that assists the scoping and road mapping of an MDM program. This is done generically here, using “customer” as the example MDM domain. It needs to be customised for each organization.

In this diagram, concentric layers of data are represented as circles around a core MDM domain concept. The core represents the ID of the concept, e.g. the enterprise-level Customer ID. Data entities with the closest affinity to the IDs are drawn in the immediate circle, and further concentric circles are drawn with decreasing affinity to the core.

These layers denote a logical and optimal sequence in which to build them out from a technical perspective. So they can represent the major phases of an MDM program, with each phase running for months or even years. However, business needs and value always get more weight, and that is okay. What we are looking for is a way to balance the technically optimal world and the business-need-driven world and arrive at a practical roadmap.

A sample picture of a ‘Single View of Customer’ using customer master data layers:


 




This is only a ‘logical’ picture of MDM data entities and non-MDM data entities. It is definitely not a suggestion that we should clog the master data stores with data from all these non-MDM systems in the physical architecture of MDM. Rather, it is to get a conceptual understanding of what the business team could be looking for when asking for things like a ‘Customer Profile’ or a ‘360 degree customer view’, and how to build it using the master data as a starting point.

1.1      Master Data Layer

1.1.1     Layer 1.1 a.k.a The Core



The ‘Core’ means the centralisation of the IDs from all the important internal and external databases, linking the ones that represent the ‘same record’ or ‘instance of information’ to one another. As the maturity level increases, this linking turns into identifying or creating a new enterprise ID as required. The legacy IDs are then slowly decommissioned from the enterprise over many years, as the downstream systems also become able to use the new enterprise IDs completely. This transformation takes years and can run into decades.

The assumption here is that, in any large organization, the master data is spread across the organization in multiple data stores, mostly with their own unique IDs.

This forms something like an initial ‘registry’ architecture of MDM. In a pure registry form, no properties of the data concepts are maintained or consolidated as part of this, including hierarchies. The IDs mostly represent the highest parents.

To create a core, we usually need a lot more data to identify the links: names, addresses, status, phone/email, external IDs etc. This information is used to identify the duplicates and finally link the records to each other to form the core. However, storing this non-key information is deferred to the next phase/layer. The main reason is that each of these entities, like names, addresses and status, brings in its own data lifecycle and thus more integration and maintenance needs, so the scope increases a lot.
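The linking step above can be sketched in a few lines. This is a deliberately minimal illustration, not a production matching algorithm: the source systems, field names, and the matching rule (normalized name plus email) are all assumptions for the example.

```python
def normalize(value: str) -> str:
    """Crude normalization for matching: lowercase, keep only letters/digits."""
    return "".join(ch for ch in value.lower() if ch.isalnum())

def build_core(source_records):
    """Assign an enterprise ID to each (system, local_id) pair.

    Records sharing the same normalized name + email are linked to the
    same enterprise ID; only the cross-reference is kept, registry-style.
    """
    match_key_to_eid = {}
    xref = {}  # (system, local_id) -> enterprise ID
    next_eid = 1
    for rec in source_records:
        key = (normalize(rec["name"]), normalize(rec["email"]))
        if key not in match_key_to_eid:
            match_key_to_eid[key] = f"ENT-{next_eid:06d}"
            next_eid += 1
        xref[(rec["system"], rec["local_id"])] = match_key_to_eid[key]
    return xref

records = [
    {"system": "CRM",     "local_id": "C-101", "name": "Jane Doe", "email": "jane@x.com"},
    {"system": "BILLING", "local_id": "B-77",  "name": "JANE DOE", "email": "Jane@X.com"},
    {"system": "CRM",     "local_id": "C-102", "name": "John Roe", "email": "john@x.com"},
]
xref = build_core(records)
# The CRM and billing records for Jane Doe link to the same enterprise ID.
print(xref[("CRM", "C-101")] == xref[("BILLING", "B-77")])  # True
```

Note that the names, addresses and emails are used only transiently to find the links; in a pure registry form they are not persisted in the hub.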

However, most organizations that build a registry MDM keep some additional information along with these IDs, such as names, addresses or hierarchy. This varies by organization and would be necessary if a quick identification of the master data entity is required without going to the source systems. So, for practical purposes, they would actually be doing a little more than a pure registry form of MDM. In that case it is important to take steps on the data quality and data integrity of that information. In other words, it starts to bring out the different aspects of a true MDM implementation, in terms of processes, data quality, data governance etc.


1.1.2     Layer 1.2 a.k.a Operational MDM


The second layer represents all the master data which fits the classical definition of master data for the given domain, i.e. “slowly changing”. For example, a customer will have only one legal name at a point in time, and at most a couple of mailing addresses. These are set once and then expected to remain constant unless specific life events, re-evaluations etc. occur. Regular business operations like a customer order or customer bill generation do not change this data.

This information is more or less the same across organizations and even industries; this is the core reason the ‘Master Data Management’ industry arose. However, each of these types of information, like name or phone number, has its own information lifecycle, in addition to the core data concept. For example, a customer has a lifecycle that represents his relationship with the organization: he opens accounts, buys products, and may eventually stop being a customer by closing all relationships. But he also has a lifecycle outside the given organization: he gets married, has children, moves to a new house changing addresses, etc. And each portion of the information can also change on its own: he changes his name, his address, his email, his phone and so on. All these information lifecycles are managed in this operational MDM layer, made available to the entire organization over a period of time, and sent to downstream systems as a single source of truth.
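One common way to carry these per-attribute lifecycles is to keep an effective-dated history for each slowly changing attribute, so a name change is a new version rather than an overwrite. The sketch below is illustrative (not a vendor MDM model); the class and field names are assumptions.

```python
from datetime import date

class AttributeHistory:
    """Effective-dated history for one slowly changing attribute."""

    def __init__(self):
        self._versions = []  # list of (effective_from, value)

    def set(self, effective_from: date, value: str):
        self._versions.append((effective_from, value))
        self._versions.sort()  # keep versions in date order

    def as_of(self, when: date):
        """Return the value in effect on a given date, or None."""
        current = None
        for effective_from, value in self._versions:
            if effective_from <= when:
                current = value
        return current

legal_name = AttributeHistory()
legal_name.set(date(2005, 3, 1), "Jane Doe")
legal_name.set(date(2010, 6, 15), "Jane Smith")  # name change life event

print(legal_name.as_of(date(2008, 1, 1)))  # Jane Doe
print(legal_name.as_of(date(2011, 1, 1)))  # Jane Smith
```

Each attribute (name, address, email, phone) gets its own history, which is exactly why the scope grows when these entities are brought into the hub.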


1.2      Layer 2 : The transactional data and events

This layer represents the core transactions of the organization: sales, orders, shipments, bills, product usage etc. In large organizations, most of these are handled by the major IT systems of the organization (not the MDM system). Unfortunately, most of the legacy applications and vendor packages that execute these functions have built-in provisions to capture and manage master data as well, such as customer data and product data, forgetting that customer data is not owned by those transactions. Most vendor products claim that they can interact with outside systems through a range of interfaces, from file formats to web services. But the reality is that most of these systems expect the master data to be resident inside them to function properly, and it can be very costly to redirect the customer management portion of these systems to the master data stores. This is a major integration challenge in the adoption of master data management. This layer represents these types of transactional data and the information they provide to understand the customer holistically.

1.3      Layer 3 : The analytical data : BI/data mining

Generally this layer analyses the data produced by the two inner layers and produces interpretations, summaries etc., generating a lot of new data in the process. Almost always, the data from the master and transactional systems has to be replicated to the BI systems. There will also be many other analytical programs, especially in web-centric organizations, such as clickstream analytics, which produce a lot of data. This can also be classified as analytical data.



1.4      Beyond Operational MDM : Enterprise MDM


The master data layer (1.1 and 1.2) represents master data in its pure form; i.e. the remaining layers are NOT master data. However, the introduction of analytical MDM, the 360 degree view of customer, and even the recent trend of highly analytical, information-based operational processes demand a lot more information than these two layers provide. And master data management is asked to take more and more of this scope onto its plate, or it is simply ‘ordered’ to by the senior-most levels of management.

So the industry has started using very creative terms to satisfy these needs, such as ‘Analytical MDM’ and ‘Enterprise MDM’.

It is important to note that the objective of MDM is to manage the core data assets of the organization, like customers and products, and provide a single source for that data. The moment we step away from that, we are making a conscious choice to deviate from the basic definition of MDM. However, a demand is a demand and it needs to be met. The IT side needs to architect for it, and we are making choices.

One of the choices is to stay clearly at the operational MDM level, and count on the services layer to assemble the rest of the data on the fly from other systems, including legacy systems, and provide it to the requesting applications. There are vendor products which claim to do this, but often they ask for the data to be persisted in the product.
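The assemble-on-the-fly choice can be sketched as a composite service: the MDM store answers only for master data, and everything else is fetched from the owning systems at request time. The system names and fetch functions below are hypothetical stand-ins for real service calls.

```python
def fetch_master(enterprise_id):
    # Stand-in for a lookup against the operational MDM store.
    return {"enterprise_id": enterprise_id, "name": "Jane Smith"}

def fetch_accounts(enterprise_id):
    # Stand-in for a call to the (possibly legacy) account system.
    return [{"account_id": "A-9", "type": "checking"}]

def fetch_open_tickets(enterprise_id):
    # Stand-in for a call to the trouble-ticket system.
    return [{"ticket_id": "T-3", "status": "open"}]

def customer_profile(enterprise_id):
    """Composite 'single view' assembled at the services layer.

    Nothing beyond master data is persisted in the MDM store; the
    account and ticket data stay in their source systems.
    """
    profile = fetch_master(enterprise_id)
    profile["accounts"] = fetch_accounts(enterprise_id)
    profile["open_tickets"] = fetch_open_tickets(enterprise_id)
    return profile

print(customer_profile("ENT-000001")["accounts"][0]["type"])  # checking
```

The trade-off is latency and availability: every profile request fans out to several systems, which is exactly why vendors tend to ask for the data to be persisted in the product instead.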

Often the choice will be to bring in the data needed and expand the MDM data store for these ‘additional data elements’ as well.

1.5      Scoping considerations and phasing beyond Layer two.


As discussed before, even though master data ends with layer two, it is very inadequate for many of the frequent business expectations and needs. For example, it does not make sense to have a simple customer record without knowing the accounts the customer has with the organization or the products he has subscribed to, or even the recent interactions or trouble tickets raised by the customer. Business users expect this by default, without even knowing that master data management does not really address these needs.

So in order to meet these needs, we need a framework to explore the rest of the customer-centric data. We use this framework to identify data in the transactional and analytical domains, link it, and use it in conjunction with the master data in the operational master data store.

This is done by identifying the scope of the master data domain needed by the organization in the end state. So let us consider these as a set of layers of data which are either added to the master data store or made available along with master data via integration at the services layer.


1.6      The transactional data layer and MDM data store.




1.6.1     Replicated transactional data skeletons in master data store

The first step is to identify the types of data that are needed along with master data almost always, or at least some information about these types of data.

For example:
1.    Customer accounts (the savings/checking/mortgage accounts they have, in the case of banking; the accounts on which they pay monthly recurring bills, in the case of utilities etc.).
2.    Products of the customer (the products the customer currently has). Sometimes this overlaps with accounts, especially in the financial industry, but in many retail, utility and manufacturing domains there will be a big difference.
3.    Users of the customer: the web users etc., especially in a business-to-business scenario; the contact people of the customer etc.
These types of data are deeply interwoven with the transactional processes, and those processes maintain the data. Taking it out of the transactional systems may not be a valuable exercise, as it is costly and would mean too many integration difficulties. So a select few data elements of these are usually replicated from the transactional systems to the master data stores, or created as federated data components; this, however, is an architectural consideration.

These types of data, even though they occur throughout the various business processes in the company, will be sought after every time master data is requested. So it is a good idea to replicate some of the important data elements of this data to the master data store, keeping the source system reference IDs alongside. Let us call this ‘transactional summary data’, or the transactional summary data layer. It is important to remember that this data will be physically present in the master data store.
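The replication step above can be sketched as follows. Which elements count as the ‘summary’ is an assumption here (account type and status), as are the system and field names; the point is that only a few fields plus the source-system reference ID land in the master store.

```python
# Fields copied into the master data store; everything else stays in
# the owning transactional system. (Illustrative choice of fields.)
SUMMARY_FIELDS = ["account_type", "status"]

master_store = {}  # enterprise_id -> list of account summaries

def summarize(source_system, source_record):
    summary = {f: source_record[f] for f in SUMMARY_FIELDS}
    # Keep the pointer back to the owning transactional system.
    summary["source_system"] = source_system
    summary["source_id"] = source_record["id"]
    return summary

def replicate_account(enterprise_id, source_system, source_record):
    master_store.setdefault(enterprise_id, []).append(
        summarize(source_system, source_record))

replicate_account("ENT-000001", "BILLING",
                  {"id": "B-77", "account_type": "utility",
                   "status": "active", "invoice_history": "..."})

# Only the summary and the back-reference are stored; the invoice
# history remains in the billing system, reachable via source_id.
print(master_store["ENT-000001"][0]["source_id"])  # B-77
```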

1.6.2     Identification of customer centric light weight transactions (a.k.a candidate federated components)



The next step is to identify the lightweight transactions, which are probably not as important or frequent as the first layer.

A good example is ‘identification and authorisation’ information, i.e. who has access to data in the organization. Usually the services layer checks this information before giving out data. It is also used for the users of web portals which the customers or their authorised representatives use to pay bills, interact with the organization etc. This is arguably master data, but it is not kept along with master data for security reasons.
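The services-layer check described above amounts to a small guard in front of every customer lookup. The entitlement store, caller IDs and returned fields below are all hypothetical; the shape of the check is what matters.

```python
# Hypothetical entitlement store: which enterprise IDs each caller may see.
entitlements = {"agent-42": {"ENT-000001"}}

def get_customer(caller_id, enterprise_id):
    """Services-layer lookup: verify the caller's entitlement before
    returning any customer master data."""
    if enterprise_id not in entitlements.get(caller_id, set()):
        raise PermissionError("caller not authorised for this customer")
    # Stand-in for the actual MDM store read.
    return {"enterprise_id": enterprise_id, "name": "Jane Smith"}

print(get_customer("agent-42", "ENT-000001")["name"])  # Jane Smith
```

Keeping the entitlement data in its own store, rather than inside the master data itself, matches the security separation noted above.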

Another example is survey information, which collects feedback from customers: a very important database for understanding the pulse of the customer. It is not master data, as it can be collected frequently and is mostly considered part of a customer touchpoint. So it is transactional data which is collected and maintained in close association with customer data.


Some other examples:
·         Customer alerts: bill-due notifications etc. sent to the customers.
·         Customer calls: a call from the customer about a problem.
·         Trouble tickets raised by customers.


These are definitely transactional data, but they are lightweight and very customer-centric. Some or all of these are candidates to be considered as federated components of master data, i.e. separate standalone systems and databases which use the customer enterprise IDs from the master data hub.
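A federated component in this sense can be sketched as a standalone store that keeps all of its own data but keys every record by the enterprise ID issued by the hub. The ticket store below is illustrative; only the use of the shared enterprise ID reflects the pattern described.

```python
class TicketStore:
    """Standalone trouble-ticket store, federated with the MDM hub
    only through the shared enterprise customer ID."""

    def __init__(self):
        self._tickets = []

    def raise_ticket(self, enterprise_id, description):
        ticket = {"ticket_id": f"T-{len(self._tickets) + 1}",
                  "enterprise_id": enterprise_id,  # the only shared key
                  "description": description}
        self._tickets.append(ticket)
        return ticket["ticket_id"]

    def for_customer(self, enterprise_id):
        return [t for t in self._tickets
                if t["enterprise_id"] == enterprise_id]

store = TicketStore()
store.raise_ticket("ENT-000001", "No dial tone")
print(len(store.for_customer("ENT-000001")))  # 1
```

Because the only coupling is the enterprise ID, the ticket system can evolve independently while the services layer can still join its data to the master record.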

1.6.3     Pure transactional entities. Integration via services layer.


The third and final step in analysing the transactional data is to take steps, or at least be mindful of, linking the master data and the enterprise IDs to the rest of the transactional systems, so that the services layer can integrate the data between the master data systems and the transactional data systems. The important thing is to make arrangements in the overall architectural standards and design of the MDM solution to ensure smooth operation with this type of information.


At this point the master data store at the target state becomes something like this.



1.7      The analytical data layer and MDM data store.




The output of customer-focused business intelligence, data mining, clickstream analytics, social media analytics etc. is relevant at this layer. However, again, not all information is considered here: only the most summarised and important attributes, which are used widely in the enterprise.

A common technical property of these ‘selected’ attributes is the singular relationship of the data element to the master data concept being addressed. For example, a customer satisfaction index could be measured on a scale of 1 to 10 and recalculated every quarter, so at any point in time there is only one value. Or it could be an attribute which has one or only a few values.

This type of “business value vs volume vs frequency” consideration keeps the principle of master data in check, and keeps master data stores from becoming runaway systems that store and handle every kind of data for the enterprise.

At the analytical data layer, the types of data are not as well defined as in the two inner layers. The information is derived, or one could say invented, and varies greatly by industry and organization. So it could be hard to list all the entities and then figure out what can be moved to the operational master data store. We need a different approach here.

It is better to provide a flexible mechanism to add new data elements, and their values, to the master data store. Similar to the replication from the transactional layer, a few key data elements can be replicated to the master data store from the analytical layer.
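One flexible mechanism of this kind is a simple name/value structure per enterprise ID, so new analytical indicators can be added without a schema change. The attribute names and values below are illustrative assumptions.

```python
# Name/value store for analytical attributes, keyed by enterprise ID.
# New indicators can be added at any time without schema changes.
analytical_attrs = {}  # enterprise_id -> {attribute_name: value}

def set_attr(enterprise_id, name, value):
    analytical_attrs.setdefault(enterprise_id, {})[name] = value

def get_attr(enterprise_id, name, default=None):
    return analytical_attrs.get(enterprise_id, {}).get(name, default)

# Quarterly-recalculated, single-valued indicators fit this shape well.
set_attr("ENT-000001", "satisfaction_index", 7)
set_attr("ENT-000001", "churn_risk", "high")

print(get_attr("ENT-000001", "satisfaction_index"))  # 7
```

The trade-off of such a name/value design is weaker typing and validation than dedicated columns, which is acceptable here precisely because only a few widely used, single-valued indicators are kept.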

It is also imperative to have a clear understanding of the purpose of this layer. It is not for replicating all the customer-related (or, say, MDM-domain-related) analytical data. The purpose is to support the business applications handling transactional processes but needing some of these analytical summaries for their transactional operations.

For example, a customer inbound call system (commonly known as IVR, interactive voice response) may want to give a higher priority to customers likely to switch to competitors, or to prioritise based on the customer’s value to the organization. Most of this would be derived information from analytical systems. The popular option is to store it in the IVR database itself. However, there could be many other systems that would use this data if it were readily available, like the ordering system when ordering new products for existing customers, or the marketing system which plans to mail certain customers.

The purpose of this layer is to provide a few key data elements which represent these most important indicators and make them available to the operational processes. It will also hold the links (by storing the corresponding IDs or link information) to the source systems, for retrieving more information in an automated fashion.
Overall, this avoids unnecessary replication of the master data, and these systems access the ‘best available data’ directly.

It could be difficult to foresee these types of requirements in the initial phases of a master data management program. So this layer can be considered completely optional and can be added later on. However, enough thought needs to be put into designing the services layer etc. to include these in the future.



This brings the final end-state picture of the customer master data store to:


1.8      Summary


This represents a way a complete ‘enterprise’ view of master data can be developed in a systematized, modular and multi-phased approach, starting even with a registry model and leading to an Enterprise MDM solution in the ideal, technologically optimal case. So we initially focus our resources on building or cleaning up the Core (a.k.a. Layer 1), then move on to Operational MDM (a.k.a. Layer 2), then to Layer 3, and so on. Let this be the starting-point approach. Depending on the needs, available skill set etc. of the organization, a direct adoption of an enterprise MDM could even be considered.

However, as mentioned initially, these represent a data-dependency-driven sequence which results in a logical and optimal way of building and refining these components. But business needs, whether they are band-aid solutions or strategic business capabilities, carry more weight. And that is okay; in a practical world, any MDM implementation should be okay with this reality. After all, all of this is done to support the business. For example, if a trouble ticket resolution system is totally broken, we cannot say: wait until we create or clean up the ‘Core’ customer master data hub, then fix the next layer, and only then the trouble ticket resolution. It is the modularisation, overall architecture and road mapping portions of master data management which solve that puzzle.