Tuesday, April 17, 2012

Initial Impression of Metadata Tools in Information Server 8.7

I just got my hands on a sandbox installation of Information Server 8.7. After looking at the InfoCenter and the Redbook about metadata management, I have formed an initial view of the new metadata offerings in Information Server 8.7.

Just to add a bit of background here: I have been using Information Server 8.1.x for the past 4 years.
  • In InfoSphere Business Glossary, I can see that the BG "Editor" and BG "Browser" interfaces have been merged into a single user interface, which is a good improvement on the user experience side. The new UI looks more "modern" and gives an easy overview of terms, categories, and IT assets. In addition, Business Glossary has added a workflow feature that enables an "approval workflow" for managing BG terms. This workflow is disabled in the default installation.
  • The meta-model in the Metadata Server has been extended with a few new elements. Among others, the most exciting one to me is the Logical Data Model. In the 8.1.x version, only physical data models (database table schemas) could be imported into the Metadata Server. Now a logical data model created in InfoSphere Data Architect (IDA), including its documentation in IDA, can be imported into the Metadata Server. This is a very good step towards making BG and Metadata Workbench more widely accepted and used by both business users and IT developers in enterprises.
  • Metadata Workbench looks similar to the previous versions. The "Administration" tab in Metadata Workbench has an improved interface. And as I have heard (not tried myself yet), the data lineage and business lineage functionality has been improved a lot, as has the automated metadata service (which was a pain in the 8.1.x version).
  • A new tool, "Metadata Asset Management," seems to provide an excellent way of administering metadata in an enterprise setup. This tool allows users to import and compare different types of metadata, such as data models, data files, and BI reports. As described in the Redbook (link), one can use this tool to design an enterprise metadata environment for all kinds of metadata related to the data integration process.
  • istool seems to keep the same functionality as before and adds a "reporting asset" type for import/export.
I have also heard that Business Glossary provides more ways of integrating with other tools, and that the "Blueprint Director" is a very interesting architecture tool. Hopefully I will post more findings when I begin to do more work with this sandbox in the coming days :)

Monday, April 9, 2012

The Need for an Overview of Enterprise Data Lineage

The concept of data lineage has gained prominence in the Information Management world due to the need for more detailed, quality-focused metadata about the data flows and deliveries across enterprise IT systems.

In a large enterprise where regulatory reporting, dashboards, scorecards and analytics are widely used, different IT systems need to deliver their data to fulfill the various purposes of business intelligence applications. As a best practice, such an enterprise would already have established a data warehouse team to maintain a central repository of all kinds of data. Using a data warehouse makes the architecture of enterprise data flows quite simple. The warehouse acts as a centralized hub that collects all kinds of input and delivers outputs to various parties. In such a case, managing data flows and data deliveries across the enterprise seems simple (although still very challenging) because all management focus can be put on the data warehouse. Data lineage in such an architecture is focused on the lineage of data before and after the data warehouse.

A very good example is the data lineage tool inside IBM InfoSphere Metadata Workbench. In the data lineage tool, a source delivery file can be connected to an ETL job that loads this file into a data warehouse table. The table can then be connected to the ETL job that loads the data warehouse table into the data mart tables, which are in turn extracted and loaded into BI reports. With a naive (yes, not native but naive) support for defining external applications and extension mappings, the data lineage can be (manually) extended to external systems and tools to cover the whole data life cycle, i.e., starting from the front-end system where the data was initially created, through the deliveries, the data warehouse, and the data mart, to the BI applications, not to mention the various ETL jobs, data quality checks, etc., along the "life cycle" of this data-to-information flow.

The data lineage functionality in InfoSphere Metadata Workbench does give us a good example of how enterprise users can benefit from having such a view across different architectural elements. Here is a short list:
  1. The traceability of technical elements that data lineage brings can reduce the cost of system maintenance and support. For example, a developer in the system management group can easily find all the tables and jobs that are related to a piece of "problematic" data.
  2. The cost of doing impact analysis can be dramatically reduced if the data lineage information is available. Whenever a change is to be applied to an architectural element, being able to estimate the impact, in terms of cost or even feasibility, can save a great deal of budget.
  3. The ability to align architecture designs and to consider optimal refactoring options is strengthened when a detailed data lineage map is available. It is much easier for an IT architect to choose among design options if the relevant data flows, together with performance and throughput data, are presented on a diagram (which makes the job of an IT architect more or less similar to a construction architect's work).
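
To make the first two points concrete, here is a minimal Python sketch of lineage as a directed graph. It is not how Metadata Workbench works internally; the class name, method names, and asset names are all made up for illustration. Following the flows downstream answers the impact analysis question, and following them upstream answers the traceability question.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Toy lineage model: assets are strings, flows are directed edges."""

    def __init__(self):
        self._downstream = defaultdict(set)
        self._upstream = defaultdict(set)

    def add_flow(self, source, target):
        # Record that data flows from `source` into `target`.
        self._downstream[source].add(target)
        self._upstream[target].add(source)

    @staticmethod
    def _walk(start, edges):
        # Breadth-first traversal that collects every reachable asset.
        seen = set()
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        seen.discard(start)
        return seen

    def impact_of(self, asset):
        """Everything that may be affected if `asset` changes."""
        return self._walk(asset, self._downstream)

    def trace_back(self, asset):
        """Everything that `asset` was derived from."""
        return self._walk(asset, self._upstream)


# Hypothetical asset names, following the file -> DW -> mart -> report chain.
lineage = LineageGraph()
for src, tgt in [
    ("delivery_file", "etl_load_dw"),
    ("etl_load_dw", "dw_customer"),
    ("dw_customer", "etl_load_mart"),
    ("etl_load_mart", "mart_customer"),
    ("mart_customer", "bi_report"),
]:
    lineage.add_flow(src, tgt)

print(sorted(lineage.impact_of("dw_customer")))
print(sorted(lineage.trace_back("mart_customer")))
```

In this toy chain, changing the warehouse table touches the mart load job, the mart table, and the report, while a problem in the mart table can be traced back to the delivery file and both load jobs.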
Having seen the benefits of data lineage, we need to come back and ask whether we need an overview of the whole data lineage across the enterprise IT systems instead of only the lineage around the data warehouse.

Typically, most data needs can be fulfilled by requesting a data delivery from the data warehouse to a business intelligence application. However, in many enterprises, lots of IT systems act as a "hub" for a certain group of enterprise data and generate outputs by adding "value" to the data. For example, a CRM system or an accounting system normally needs to take data such as customers, branches, and agreements from other systems and then create output data with more "corrected," "aligned," or "calculated" information. From an enterprise point of view, the output from such "intermediate" systems is also a main data source for the data warehouse. And the data flows to and from these systems are as "mission-critical" as those of the data warehouse, or even more so.

Another example is the case where there is a strong need for "operational reporting" or "operational business intelligence" in a real-time or "near real-time" manner. In such cases, an event-based architecture is used to send data directly to the business intelligence applications. No data warehouse is involved here (except that an end-of-day summary may be delivered to the data warehouse through a batch window). Such data flows often require great architectural care because of their mission-critical purpose.

So, it does make a lot of sense to maintain an overview of the whole enterprise data lineage across all IT systems, not just around the data warehouse. To my knowledge, there is so far no software product that can fulfill this need completely. We need to cross our fingers and see what the vendors (I expect IBM, ASG, and maybe some start-ups) can bring to the table in the future.

Friday, April 6, 2012

How to start the implementation of InfoSphere Business Glossary

InfoSphere Business Glossary (IBG or BG) is a very useful "Data Concept Management Tool" for connecting all users of data in an enterprise. BG is part of the IBM Information Server tool suite and is intended to be used as an enterprise glossary of business concepts and terms. With the metadata interchange support and the Information Server Metadata Server, a business term described in BG can be connected to a technical asset such as a column in a table, an ETL job that populates a table, a field in a Business Intelligence report, or a flat file contained in a delivery between enterprise applications. Another key feature of BG is the use of data stewards. A data steward (person, role or group) can be assigned as the owner of a business term defined in BG. This provides an opportunity to create an enterprise data governance structure. The ownership concept promotes an enterprise-wide awareness of data and data quality.
 
Implementation of BG normally requires a team of allocated resources (e.g., a project team or a task force) to run an initial "enabling" phase that defines the work process and actually implements a substantial number of key business terms and concepts. The following is a list of key issues or topics that such a team must work with in order to create a successful initial implementation of Business Glossary.

1. Creating a data owner/steward role in the enterprise
    Obviously, defining terms without knowing whether they are correct does not make any sense at all. The enterprise must have employees who care about and use the data that a term describes. A well-designed role such as data steward will not cost the employee a huge amount of extra working hours. In the long term, the data owner can act as guardian of the data and its quality, which in return helps the enterprise make the right business decisions from the reports, dashboards, and mining results that are based on the data.

2. Designing a basic category structure in BG
   Business Glossary is a very simple tool. All business terms are located in categories. The categories form a tree structure. A term physically located in one category can be "referred" to by another category. A term can be connected to other terms as synonyms.
   Generally, a good structure in BG has top categories that classify the business concepts. For example, there can be a category called "Customer" which includes all kinds of business terms and concepts regarding the customers of the enterprise. If all the top categories in BG cover such major business conceptual areas, for example, "Customer," "Branch," "Product," "Agreements," etc., any user in the enterprise will find it easy to browse around and understand all the key business areas and business concepts. Besides, the search functionality in BG is good enough for any newcomer to look around.

3. Designing the work processes around BG
    Typically, when users start to use BG, there will be requirements on how to let data stewards edit the terms, how to manage the life cycle of terms (in BG, a term can be "standard," "accepted," "candidate," or "deprecated"), how to make sure that all content is backed up, etc. The team maintaining the BG content can expect many more questions of this kind.
    In addition, when BG is used and maintained in a strict enterprise environment where there can be TEST, FREEZE and PRODUCTION instances, the work processes become even more important.

4. Marketing and making organizational implementation
    A tool like Business Glossary does not contain the kind of fancy and complicated functionality that would make all users in the enterprise suddenly start cheering every time they find it useful. The team must go out and communicate with almost everyone in the enterprise in order to get a good start with the tool. Here, the data steward team should be the first group that becomes familiar with (and enjoys) the tool and starts (with their charm) to influence other enterprise users as "Business Glossary advocates" until everybody likes it.

5. Starting populating the terms
    Normally an enterprise has several key business concepts that are used throughout almost all activities and documentation, such as customer, internal unit, employee, etc. It is very unlikely that no employee can be found who cares about the terms and concepts in these core conceptual areas. In other words, it should be possible, and possibly easy, to define and describe terms and assign data stewards for these core concepts. Dedicating a period of time and resources to populating terms in such core business conceptual areas has a huge positive impact on the success of Business Glossary.
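
To illustrate the structure described in topics 2 and 3, here is a minimal Python sketch of a glossary model. It is only a thinking aid and not the Business Glossary API: the class names, field names, and the choice of "candidate" as the default status are my own assumptions. Only the four status values, the category tree, the "referred" terms, the synonyms, and the steward assignment come from the description above.

```python
from dataclasses import dataclass, field
from typing import Optional

# The four term statuses mentioned above.
STATUSES = ("candidate", "accepted", "standard", "deprecated")


@dataclass
class Term:
    name: str
    description: str = ""
    status: str = "candidate"        # assumed default, for illustration only
    steward: Optional[str] = None    # person, role or group owning the term
    synonyms: set = field(default_factory=set)

    def set_status(self, status):
        if status not in STATUSES:
            raise ValueError(f"unknown status: {status!r}")
        self.status = status


def make_synonyms(a, b):
    # Synonym links are kept symmetric, by term name.
    a.synonyms.add(b.name)
    b.synonyms.add(a.name)


@dataclass
class Category:
    name: str
    parent: Optional["Category"] = field(default=None, repr=False, compare=False)
    subcategories: dict = field(default_factory=dict)
    terms: dict = field(default_factory=dict)             # terms located here
    referenced_terms: dict = field(default_factory=dict)  # located elsewhere

    def add_subcategory(self, name):
        child = Category(name, parent=self)
        self.subcategories[name] = child
        return child

    def add_term(self, term):
        self.terms[term.name] = term
        return term

    def refer_to(self, term):
        self.referenced_terms[term.name] = term

    def path(self):
        # Walk up to the top category to build a breadcrumb-like path.
        parts = []
        node = self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return " > ".join(reversed(parts))


# Hypothetical content for two top categories.
customer = Category("Customer")
private = customer.add_subcategory("Private Customer")
agreements = Category("Agreements")

cust_no = private.add_term(Term("Customer Number", steward="CRM data steward"))
client_id = private.add_term(Term("Client ID"))
make_synonyms(cust_no, client_id)
agreements.refer_to(cust_no)   # same term, visible from another category
cust_no.set_status("accepted")

print(private.path())
print(cust_no.status, cust_no.synonyms)
```

Writing the model down like this early in the enabling phase makes it easier to discuss with the data stewards which top categories are needed and who may move a term from one status to the next.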

The idea behind having a Business Glossary is to use the power of "crowdsourcing" to improve and align the organization's use of data and information. It paves the way for successful business intelligence implementation in the long run. Hopefully, we can expect more social and collaborative features from Business Glossary in the future.