Monday, April 4, 2011

Does SCRUM fit for data warehousing?

There has been broad discussion about how agile concepts fit into the landscape of data warehousing. Given the very special nature of data warehousing (well, many things are special, I agree), concepts like SCRUM seem beneficial for running data warehousing projects. However, can all data warehousing activities fit into a sprint?

Well, I do agree that there are ways to break down the activities into smaller steps so that they fit into sprints. However, the different life-cycle stages of a data warehouse may call for different tactics when implementing SCRUM.

When you work with a mature platform, where more than 80% of the major data subject areas (such as people, organization, employee, customer, and products & services) have been populated in the DW, and the need to add new data sources has been fading over the last two years, it is time to consider using SCRUM to manage and control the development activities around the DW. Why? Because the data model, the ETL, and the various rules and guidelines have matured. People are used to the way things are supposed to be done, so it is easy to estimate which activities should fit into each sprint.

If less than 20% of the data warehouse data is populated from source systems, adopting SCRUM seems very challenging. At this stage, the DW team is still struggling with the rules and ways of working, and substantial rework appears on a weekly or even daily basis. In such a case, trying out any agile method can be risky unless the SCRUM team has all key technical developers enrolled and gets full management support (to cover the rework).

What if you are in between these two states (21%-79% of the data populated)? I would look very carefully at what has already been populated in the DW. If the majority of the enterprise master data, such as Customer, Products, Organization, and Arrangements, is ready, it is safe to consider SCRUM, provided key technical developers are included in the process. Otherwise, consider a more classical DW approach.
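The rule of thumb above can be sketched as a small decision function. This is a minimal, hypothetical sketch for illustration only: the names `suggest_approach`, `Approach`, and `master_data_ready` are my own labels, and the sketch simplifies the rule by ignoring the "new data sources are fading" criterion and by placing exactly 20% and 80% in the in-between band.

```python
from enum import Enum


class Approach(Enum):
    # Hypothetical labels for the outcomes described in the post.
    SCRUM = "SCRUM"
    SCRUM_WITH_KEY_DEVELOPERS = "SCRUM, with key technical developers in the team"
    SCRUM_HIGH_RISK = "SCRUM only with all key developers and full management support"
    CLASSICAL = "classical DW approach"


def suggest_approach(populated_pct: float, master_data_ready: bool) -> Approach:
    """Rough rule of thumb; the 20%/80% thresholds are the post's figures.

    populated_pct: share (0-100) of DW data already populated from source systems.
    master_data_ready: True if most enterprise master data (Customer, Products,
    Organization, Arrangements) is already in the DW. Assumed simplification.
    """
    if not 0 <= populated_pct <= 100:
        raise ValueError("populated_pct must be between 0 and 100")
    if populated_pct > 80:
        # Mature platform: model, ETL, and rules are stable, so estimates are easy.
        return Approach.SCRUM
    if populated_pct < 20:
        # Early stage: frequent rework makes any agile method risky.
        return Approach.SCRUM_HIGH_RISK
    # In-between stage: the decision hinges on master data readiness.
    if master_data_ready:
        return Approach.SCRUM_WITH_KEY_DEVELOPERS
    return Approach.CLASSICAL
```

For example, `suggest_approach(50, master_data_ready=False)` returns `Approach.CLASSICAL`, while the same 50% with master data in place suggests SCRUM with the key developers on board.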

By the way, this book may be worth reading.

Saturday, April 2, 2011

Is data valuable for an enterprise?

There is a recent post on Information Management where people questioned and debated the value of data itself to an organization.

As a general assumption, most enterprises regard data as a "valuable asset." Whatever is kept in their IT systems is useful for creating business benefits. However, people always have to put data into a certain context, such as a business process like "sales," "marketing," or "credit profiling," in order to create value from it. In that sense, the data itself does not seem to be a valuable asset until the moment it is pulled into a context.

Well, this is an interesting observation, or argument. But I would definitely question what "data" is to an enterprise. When a customer record is written to a database table in an IT system, this piece of data has already been put into a context, i.e., the business definition of a customer for this organization and the business rules applied to this customer. From that moment on, this piece of data creates value for the business. How? There are a few reasons.

1. The customer record is in fact hard evidence that a business transaction happened. This transaction information gives the organization a legal means to protect its business. When a customer comes and asks for a refund or a modification of certain products or services, the data kept in the system is legal protection regarding what should or should not be considered. One could argue that I am using a certain context in this example. However, I would then ask: when and how can you find any piece of data that has nothing to do with the business in any context? If that ever happens, why would you need that data in the IT system at all?

2. In certain business domains, such as banking and health care, keeping the data (I mean archiving it) is mandatory from a regulatory and compliance perspective. If an organization in these domains cannot fulfill this requirement, it has to stop doing business.

So, data in an enterprise is definitely a valuable asset. Why? Because you will lose your business if you do not keep data.

Friday, April 1, 2011

What is "One Version of Truth" for a Data Warehouse?

One version of truth, or single version of truth (SVOT), has been a popular vision among data warehouse developers for many years. Large organizations tend to treat one version of truth as a major milestone in the implementation of a centralized data warehouse. So, what does "one version of truth" mean? Is it achievable?

First of all, one version of truth means that all enterprise data is consolidated into a single data warehouse. The data is kept in a consistent and non-redundant way, so that all data coming out of the data warehouse can be understood as the enterprise's common view of information. For example, if there are minor differences in the organization hierarchy data across business areas, the data from the data warehouse should be considered the correct, commonly accepted, enterprise-wide organization hierarchy.

In many cases, the interpretation of single version of truth is extended so that one can have a "federated" view of different versions of truth in the single data warehouse. This implies that the data warehouse tends to give away governance of certain business logic in order to maintain the appearance of "centralization" at a technical level. I think Malcolm Chisholm's article "There Is No Single Version of the Truth" is in fact quite valid in the real data warehouse world. The only way to achieve a "single version of the truth" is to reach agreement and establish a governance process at the business level.

However, if you manage to get all business units to agree on almost everything about the organization's data, are we losing the power of being different and of thinking outside the box? The very nature of a successful business is that people try to work out innovative approaches, new concepts, or new views of old things.

In my view, one version of truth is achievable in certain sectors, such as the military and the public sector. In an enterprise where business innovation and improvement are important, one version of truth sounds more like an item on the roadmap...