Sunday, July 28, 2013

Understanding Data Analysis: Part One

Data analysis is a basic skill for business personnel at any modern enterprise. Most decision-making processes at such organizations rest on a solid analytical foundation before they reach a final resolution. Almost all modern Business Intelligence technologies originate from requirements that arise during data analysis. Learning the process and concepts of data analysis is a good way to understand why and how BI technologies have evolved into their current state.

I am writing a series of blog entries on data analysis concepts, processes, and techniques. This is the first part of the series. This entry is motivated by, and based on, the following reference.

  1. Michael Milton: Head First Data Analysis

 

Part I: The basic steps

Making decisions in a management position can be either easy or difficult. Relying on intuition and always betting on one's luck has been a tradition for many "cowboy" style managers. To stay successful and keep their careers on track, many managers choose a more disciplined approach and base their decisions on rational and serious analysis. This is where the era of data analysis begins.

No matter what kind of analysis it is, the basic steps are the same: define the problem, break it down, make observations and work out partial solutions, then assemble the results and draw the final conclusions. Data analysis applies these steps to a large set of data at hand.

The information age brings massive amounts of data to every business entity. Data means value and opportunities if you know how to understand and use it. In many situations we find ourselves working as data analysts without knowing the basic techniques and practices of the role. A data analyst knows how to break down a large set of data and give it structure so that the data becomes intelligence and insight. With a powerful toolbox, a data analyst transforms data into knowledge, which often drives the decisions to be made. Thus, the basic skill or instinct of a data analyst is the ability to understand and structure data.

Given a scenario or just a piece of data, the typical steps that every data analyst goes through can be summarized as follows.

1. Define the problem

This is the very first step. Knowing your problem is the basic instinct of a data analyst. Every analysis is aimed at a goal, that is, a question to be answered or a problem to be solved. Without a defined question or problem, the analysis leads nowhere. How do you define the problem? Just ask! You should of course ask the person who will make decisions based on your analysis. Sometimes that person is you, if you are the one making the decision. Don't just ask for the final goal of the problem. Ask what the decision maker means.

In many scenarios, the decision maker needs help from a data analyst because she/he

- does not know the data

- knows something about the data but is not totally sure about it

- is not sure about the problem

- is not sure about the scenario of the problem

- is not good at making decisions

- uses more intuition than analytics

The data analyst should start by clearly defining the problem with her/his clients. The client may bring more problems or requirements after this step, which is totally fine. Changes to the requirements may bring a little overhead in terms of administrative effort, but they also help to narrow down the problem in many cases. Very often, helping the client identify her/his problem is the first and most important task for the analyst. Many clients are not even aware of the problems that they are facing.

There is, of course, a method or concept called "exploratory data analysis." It means the analyst must find hypotheses in the data that are worth evaluating. In that case the task is already broader than solving one concrete problem.

2. Disassemble the scenario

To find a solution to a problem scenario, a very typical approach is to divide the problem and the scenario into small pieces and analyze each of them. This step requires the data analyst to break the problem down into smaller pieces in order to start the detailed analysis. It is important to keep breaking down until you reach the level that best fits the analysis.

To break down the problem, one approach is to create a kind of tree structure, starting with the problem defined in the previous step and extending to different sub-nodes until you reach the leaf level.
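As a rough illustration of such a tree, here is a minimal Python sketch. The `Node` class and the example questions are hypothetical and only show the idea; they are not taken from the book.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One question in the break-down tree (hypothetical helper)."""
    question: str
    children: List["Node"] = field(default_factory=list)

    def add(self, question: str) -> "Node":
        """Attach a sub-question and return it so it can be refined further."""
        child = Node(question)
        self.children.append(child)
        return child

    def leaves(self) -> List[str]:
        """Return the leaf-level questions, i.e. the pieces to analyze."""
        if not self.children:
            return [self.question]
        found: List[str] = []
        for child in self.children:
            found.extend(child.leaves())
        return found


# Made-up example: the root is the problem defined in step 1.
root = Node("How can we increase sales?")
customers = root.add("Are we reaching the right customers?")
customers.add("Which customer groups buy the most?")
customers.add("Which customer groups stopped buying?")
root.add("Is the price right?")

for question in root.leaves():
    print(question)
```

The leaves are the questions small enough to be answered directly from data; everything above them only organizes the work.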

This also applies to the data. The data analyst will need more detailed data if the first step only provides a general level summary of the data. More data brings more room for analysis and testing of conclusions.

What is the most important technique to use when looking at data? Making comparisons! In fact, making good comparisons is at the core of data analysis. A data analyst will always find it useful to break down the data by finding good and interesting comparisons within it.
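To make the idea of a comparison concrete, here is a small Python sketch. The sales records and the `average_by` helper are made up for illustration. The same data is compared along two different dimensions, and only one of the comparisons turns out to be interesting.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records; the numbers are invented for this example.
sales = [
    {"region": "North", "quarter": "Q1", "revenue": 120},
    {"region": "North", "quarter": "Q2", "revenue": 150},
    {"region": "South", "quarter": "Q1", "revenue": 90},
    {"region": "South", "quarter": "Q2", "revenue": 60},
]


def average_by(records, dimension, measure):
    """Group the records by one dimension and average the measure per group."""
    groups = defaultdict(list)
    for record in records:
        groups[record[dimension]].append(record[measure])
    return {key: mean(values) for key, values in groups.items()}


by_quarter = average_by(sales, "quarter", "revenue")
by_region = average_by(sales, "region", "revenue")

print(by_quarter)  # both quarters have the same average
print(by_region)   # the regions differ clearly
```

Comparing by quarter shows no difference at all, while comparing by region reveals a clear gap. Choosing which comparison to make is part of the analysis.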

As the product of this step, I imagine the data analyst ending up with a document of two sections. The first is the break-down structure of the problem. The second is a set of rules on how to divide and slice the data.

As the saying goes, you cannot move the whole mountain all at once, but you can move it piece by piece.

3. Evaluate your conclusion

This step evaluates all the observations from the previous steps. The key to evaluation is comparison. With all the problems on one side and all the observations on the other, compare, compare and compare!

A compulsory part of this step is for the analyst to insert her/him-self into the process. This means the analyst takes responsibility and focuses on getting the best value for the client as well as for her/him-self. While the analyst is indeed betting her/his credibility, doing so also makes it more likely that she/he will earn more trust and belief from the clients.

I'd like to quote this passage from the book Head First Data Analysis.

"Whether you’re building complex models or making simple decisions, data analysis is all about you: your beliefs, your judgement, your credibility"

4. Make the decision

At this step, the analyst summarizes the conclusions and gives the final decision or recommendation. All the analysis results must be put into a format that makes decisions easy. Otherwise, the analysis is useless.

The analyst must keep it simple and straightforward for the decision-makers. It is important to get your voice heard (that is, to make it understandable) so that people can make sensible decisions based on the input from the analyst.

One more thing to add is about the analysis report. The report can be as simple as three sections: "background," "understanding of data," and "recommendation." And again, it must be simple, concise and direct.

 

With all the steps of data analysis in hand, are we ready for all kinds of challenges? Not exactly. There is quite a list of things to be prepared for. The first question is "What do I do when my analysis turns out to be wrong or incomplete?"

Well, not everything is easy and safe. In many cases, the initial data analysis report gets bad feedback and poor results. What could have gone wrong?

The first thing to consider is the assumptions made in the previous analysis. Assumptions are usually based on the mental models we have built of the scenario. A mental model is how we normally see the world around us. It helps us to understand and interpret the information we receive. It is like a model in mathematics, where one defines a set of basic elements and describes the world with them. Very often, the analyst's mental model is affected by the clients' mental model. So an important lesson for the analyst is to make all the mental models, or at least their important parts, explicit in the analysis.

Every part of the analysis, such as the statistical model, the data model, etc., depends on the analyst's mental model. When the mental model is wrong, all steps of the analysis must be run again to produce a new round of results.

To build the right mental model, the analyst must include both the things that are taken as assumptions and the things that are unknown or uncertain. The "anti-assumption" is very often more important and can lead to new discoveries and insights. For example, consider making a list of "things I do not know," "things that I do not know how to do," "things that I have not done," etc. This must be applied to the analysis process, and the analyst needs to challenge the client on many aspects of the client's mental model at the "disassemble" step of the analysis.

And of course, the analyst must ask for new and more detailed data for the new round of analysis. Results from the previous analysis can shed light on where and how to ask for the new data. A very typical mistake of many analysts is to focus too much on the numerical values (also called "measures") in the data. In fact, the row and column header information is even more important to look at. In many cases, how you drill into and slice the data decides what the data looks like in the end.
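The following Python sketch illustrates why the headers matter as much as the measures. The rows and the `total_by` helper are hypothetical. The measure ("units") never changes; only the headers used for drilling down do.

```python
from collections import defaultdict

# Hypothetical rows: two header columns and one measure.
rows = [
    {"product": "A", "channel": "online", "units": 40},
    {"product": "A", "channel": "store", "units": 10},
    {"product": "B", "channel": "online", "units": 5},
    {"product": "B", "channel": "store", "units": 45},
]


def total_by(rows, headers, measure):
    """Sum the measure for every combination of the chosen header values."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[header] for header in headers)
        totals[key] += row[measure]
    return dict(totals)


overall = total_by(rows, [], "units")
by_product = total_by(rows, ["product"], "units")
by_both = total_by(rows, ["product", "channel"], "units")

print(overall)
print(by_product)
print(by_both)
```

At the product level the two products look identical. Only after drilling down by channel does it become visible that A sells mostly online and B mostly in stores.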

Analysis never stops. As long as the analyst keeps finding new clues or observations, she/he can repeat the four steps forever.