Friday, April 4, 2008

Reading notes for "Building and Managing the Meta Data Repository," Part I

This is a wonderful book by David Marco. As Mr. Inmon suggested, anyone looking for "meta data" education should turn to Mr. Marco's writings.

Here are my notes on the first part.

Part 1, Laying the Foundation

One reason to have a metadata system is to preserve the flexibility and integrity of information in an enterprise’s IT systems. Another is that, because data volume grows so fast, many enterprises have to split data from a single server across multiple systems, and perhaps add a federation server to help people use it. Under such conditions, controlling the metadata becomes much more important.

Most metadata systems today are providers of information rather than monitors. A metadata system is at its most powerful when you can not only retrieve the metadata but also modify it and manage those modifications.

What can be the ROI of metadata? There are quite a few benefits.

1. Data definition reporting
This is a very basic metadata solution, essentially a data dictionary. Very experienced people often do not see the importance of this benefit, but for less experienced IT people and business users it is a must-have.

2. Data quality tracking
Controlling data redundancy, accuracy, and completeness is always a worthwhile concern.

3. Business user access to metadata
If there is a semantic layer between the IT systems and the business users, it becomes much easier for the business users to understand the data. For example, a business user may receive a report and want to know how the values in its columns are calculated. This is where business metadata comes into play.

4. Impact analysis
If an enterprise has an enterprise-wide metadata system, impact analysis becomes much easier for most subjects. And if the data is kept at high quality, the result of the impact analysis is an excellent input to decision-making or enterprise analysis.
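To make the impact analysis idea concrete, here is a minimal sketch of my own (not from the book). It assumes the metadata system can tell us which item feeds which; the `LINEAGE` table and all the item names in it are made up for illustration. The question "what is affected if this source column changes?" then becomes a breadth-first walk over that lineage.

```python
from collections import deque

# Hypothetical lineage metadata: each key feeds the items listed under it.
LINEAGE = {
    "crm.customer.birth_date": ["dw.dim_customer.age_band"],
    "dw.dim_customer.age_band": ["report.sales_by_age", "report.churn_by_age"],
    "erp.orders.amount": ["dw.fact_sales.revenue"],
    "dw.fact_sales.revenue": ["report.sales_by_age"],
}


def impacted_by(source, lineage):
    """Return every downstream item reachable from `source`, breadth-first."""
    seen = set()
    order = []
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for child in lineage.get(current, []):
            if child not in seen:  # skip items already reached by another path
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


# Which warehouse columns and reports depend on the CRM birth date?
print(impacted_by("crm.customer.birth_date", LINEAGE))
```

The answer is only as good as the lineage that has been recorded, which is why the quality of the metadata matters so much here.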

A good example for understanding what metadata is, is the card catalog in a library. In a data warehouse environment there are normally two types of metadata, technical and business, distinguished by their target audience: technical metadata supports technical and IT users, while business metadata supports business users.

Even though external data is often ad hoc and unstable, it is important to have and maintain metadata about any external data source that is used.

There are three main types of metadata users: business users, technical users, and power users.

In the data life cycle of a data warehouse, many components can produce metadata, for example the ETL tools, the data modeling tools, the reporting tools, and the data quality tools. Some vendors provide independent metadata management systems, but such systems look more like a metadata source than a solution. When there are third-party applications focused on one or a few business areas, such as CRM or ERP systems, managing metadata may become more complicated. The reason is that these vendors do not want users to manipulate their internal infrastructure (because this might lead users to build their own systems rather than use the vendor's).

Enterprise metadata comes from two types of sources, structured and unstructured. Structured sources are those that people have discussed, documented, and agreed on; they are kept in tools and documents. On the other hand, much of the most useful information is actually unstructured. It sits on a Post-It note or just in some people’s minds (and is assumed to be common sense). Still, such information should be captured, recorded, and managed if possible.

What has not been well established, even now (8 years after the book was published), is metadata security. In general, there are two approaches to metadata security: proactive security, which prevents unwanted access before it occurs, and reactive security, which uses audits to check what has happened.
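The two approaches can sit side by side. The following is only a rough sketch of my own, not something the book prescribes: the rule table, the metadata area names, and the function names are all hypothetical. The roles are the three user types mentioned earlier. The rule check before access is the proactive part, and the log that can be queried afterwards is the reactive part.

```python
from datetime import datetime, timezone

# Hypothetical access rules: which roles may read which metadata area.
READ_RULES = {
    "business_definitions": {"business_user", "power_user", "technical_user"},
    "etl_mappings": {"technical_user", "power_user"},
}

AUDIT_LOG = []  # reactive side: a record of every access attempt


def read_metadata(user, role, area):
    """Check the rule before access (proactive) and log the attempt (reactive)."""
    allowed = role in READ_RULES.get(area, set())
    AUDIT_LOG.append({
        "when": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "role": role,
        "area": area,
        "allowed": allowed,
    })
    if not allowed:
        raise PermissionError(f"{role} may not read {area}")
    return f"contents of {area}"  # placeholder for the real metadata


def denied_attempts(log):
    """Audit query: which attempts were refused?"""
    return [entry for entry in log if not entry["allowed"]]
```

Calling `read_metadata("bob", "business_user", "etl_mappings")` raises `PermissionError`, and the refused attempt then shows up in `denied_attempts(AUDIT_LOG)`.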

A meta model is important as a standard that lets different tools interchange information. There used to be two competing meta models, one from the MDC (backed by Microsoft) and CWM (backed by the OMG folks), but the MDC effort was merged into CWM in the early 2000s. And since XML is so popular now, the meta model should also be represented in XML.
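As a small illustration of exchanging metadata as XML, here is a sketch using Python's standard `xml.etree.ElementTree`. The element and attribute names (`Column`, `DataType`, `BusinessDefinition`) are invented for this example; this is not the CWM schema, only the general idea that one tool writes a description and another tool reads it back.

```python
import xml.etree.ElementTree as ET


def column_to_xml(table, column, data_type, definition):
    """Build an XML element describing one column (hypothetical layout)."""
    elem = ET.Element("Column", attrib={"table": table, "name": column})
    ET.SubElement(elem, "DataType").text = data_type
    ET.SubElement(elem, "BusinessDefinition").text = definition
    return elem


def xml_to_column(xml_text):
    """Parse the same layout back into a plain dictionary."""
    elem = ET.fromstring(xml_text)
    return {
        "table": elem.get("table"),
        "name": elem.get("name"),
        "data_type": elem.findtext("DataType"),
        "definition": elem.findtext("BusinessDefinition"),
    }


# One tool exports the description as text ...
xml_text = ET.tostring(
    column_to_xml("dim_customer", "age_band", "VARCHAR(10)",
                  "Customer age grouped into ten-year bands"),
    encoding="unicode",
)
print(xml_text)

# ... and another tool imports it again.
print(xml_to_column(xml_text))
```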
