Friday, December 31, 2010

The dilemma of consistency in technology architecture

As one of the key elements of enterprise architecture, the "platform" or "technology" architecture concerns which software and hardware tools and systems are used across the whole organization.

One can imagine that there are ways of maintaining a "buying list" of these systems for an enterprise. Historically, and even nowadays, large enterprises tend to put a lot of effort into maintaining the "open system" concept. Here the "open system" concept means that an enterprise uses software systems that accept and enable interoperability, portability, and open software standards. But in reality, not all software vendors give full support to open standards, and using systems from various technology and vendor backgrounds means extra cost of ownership for IT managers. The following two phenomena show the general dilemma most enterprises face when making technology choices.

  • Due to business and economic requirements, different departments are becoming more and more vendor dependent, in spite of the enterprise-wide strategy of supporting the open-system concept.
  • Although most departments are using standard products, whether they come from a single vendor or a limited group of them, different local units still choose to use packaged solutions (often called appliances) in situations where they find it hard (or expensive) to use the standard tools.
So should an architect put all her/his effort into ensuring the consistent usage of standard tools?

Although consistency is definitely a personal virtue, it does not automatically extend to IT architecture. Instead of spending all the effort on keeping everything consistent, it is more useful to focus on maintaining the architectural strategy in a timely manner and being ready to accept new ideas and changes from time to time.

In terms of integrating different systems, it is always important to consider a set of integration capabilities in the infrastructure when adding new items to the buying list:

  1. Middleware and gateways that enable the integration;
  2. Communication protocols, such as web services;
  3. Information brokers, such as those that transform data types, character sets (e.g., ASCII to EBCDIC), or perform XML-based transformation (see the sketch after this list);
  4. BPM tools that cope with processes running at various frequencies;
  5. Event and alert management tools;
  6. Message-based systems, i.e., solutions that can hold messages for various systems;
  7. Application-oriented adapters that support integration of applications with other existing solutions.
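
As a small illustration of item 3, here is a minimal sketch of an information-broker step that decodes an EBCDIC-encoded record into plain text before handing it on. It is written in Python purely for illustration; the record layout and field names are invented, and only the standard codecs support for the cp500 EBCDIC code page is assumed.

# Minimal sketch of an "information broker" transformation step:
# decode an EBCDIC (code page 500) record and map it to named fields.
# The fixed-width layout and field names are hypothetical.

EBCDIC_CODEC = "cp500"  # a common EBCDIC code page shipped with Python

def broker_transform(raw_record: bytes) -> dict:
    """Decode a fixed-width EBCDIC record into a simple dictionary."""
    text = raw_record.decode(EBCDIC_CODEC)
    return {
        "customer_id": text[0:10].strip(),
        "name": text[10:40].strip(),
        "balance": text[40:52].strip(),
    }

if __name__ == "__main__":
    # Build a sample EBCDIC record to show the round trip.
    sample_text = "1234567890" + "John Doe".ljust(30) + "000000123.45".rjust(12)
    print(broker_transform(sample_text.encode(EBCDIC_CODEC)))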

Friday, October 22, 2010

OO, star-schema, and anchor modelling

Apparently there are many other data modelling methodologies besides relational modelling. In the data warehouse and BI world, the keyword "multi-dimensional model" has been dominant for more than two decades. Kimball's theory on creating dimensional models has been widely adopted by the industry. What we have found out, based on many people's (hard) experiences, is that multi-dimensional data models are well suited to data analysis purposes and fit the analytical mindset of most business users. But they are not a proper model for maintaining a large data warehouse where multiple data sources are ETL-ed into a single version of the truth.
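
To make the dimensional idea concrete, here is a minimal star-schema sketch: one fact table surrounded by dimension tables, queried by aggregating the measures along dimension attributes. The table and column names are invented for illustration, and Python's built-in sqlite3 module is used only so the sketch is runnable.

import sqlite3

# A hypothetical star schema: one fact table referencing two dimensions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date (
    date_key      INTEGER PRIMARY KEY,  -- surrogate key, e.g. 20101022
    calendar_date TEXT,
    month         TEXT,
    year          INTEGER
);
CREATE TABLE dim_product (
    product_key   INTEGER PRIMARY KEY,
    product_name  TEXT,
    category      TEXT
);
CREATE TABLE fact_sales (
    date_key      INTEGER REFERENCES dim_date(date_key),
    product_key   INTEGER REFERENCES dim_product(product_key),
    quantity      INTEGER,
    amount        REAL
);
""")

# A typical analytical query: aggregate the measures by dimension attributes.
query = """
SELECT d.year, p.category, SUM(f.amount) AS total_amount
FROM fact_sales f
JOIN dim_date d    ON f.date_key = d.date_key
JOIN dim_product p ON f.product_key = p.product_key
GROUP BY d.year, p.category;
"""
print(conn.execute(query).fetchall())  # empty here, but shows the access pattern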

Object-oriented modelling was introduced into the database world more than 10 years ago. As things stand now, I can only see that Oracle has adopted some parts of this methodology into its commercial product. Maybe OO is just not the right way to manage data, at least in the OLTP/OLAP world.

In the data warehouse modelling world, one of the key challenges for every data warehouse is how to keep the history of data. Different data has different profiles: some changes often, some needs a traceable history of changes, some never changes, and some only needs the most recent value. To model the different enterprise data and keep the histories in a good manner, the concept of anchor modelling has recently been discussed at the last ER conferences. It is worth reading up on this topic and getting to know more details about the method.
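
Purely as a sketch of the underlying idea, and not of the full method described in the ER-conference papers, anchor-style modelling keeps identity in a narrow anchor table and each attribute in its own historized table, so a change in the data becomes a new row rather than an update. The tables and names below are hypothetical.

import sqlite3

# Hypothetical, highly simplified anchor-style historization:
# the anchor holds only identity; the attribute table keeps full history.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_anchor (
    customer_id INTEGER PRIMARY KEY
);
CREATE TABLE customer_name (
    customer_id INTEGER REFERENCES customer_anchor(customer_id),
    name        TEXT,
    valid_from  TEXT,                  -- when this value became current
    PRIMARY KEY (customer_id, valid_from)
);
""")

conn.execute("INSERT INTO customer_anchor VALUES (1)")
conn.execute("INSERT INTO customer_name VALUES (1, 'Anna Svensson', '2009-01-01')")
conn.execute("INSERT INTO customer_name VALUES (1, 'Anna Lindqvist', '2010-06-15')")  # a change = a new row

# The most current value is simply the row with the latest valid_from.
print(conn.execute("""
    SELECT customer_id, name, MAX(valid_from)
    FROM customer_name
    GROUP BY customer_id
""").fetchall())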

Friday, September 24, 2010

Relational model and E-R diagram

Well, anyone can pick up the definition of the relational model by Googling and reading the Wikipedia page on it. What I want to point out, based on my experience, is that the relational model is a clean way to sort out concepts around data in an enterprise situation. There have been other types of models in database theory, such as hierarchical and network models. Relational models fit the needs of OLTP system design quite perfectly and have been the dominant modelling method for more than 40 years. Nowadays, object-oriented models and other variations of the relational model also exist in the industry.

One of the key challenges in relational modelling is the management of history inside a model. In relational theory, the normal forms from 1NF and 2NF through BCNF, 4NF, and 5NF are still not enough to keep historical data in a clean way. People have introduced 6NF to manage historical data in relational theory.

The key concepts in relational modelling are, for example, entity, entity set, relationship, relationship set, and one-to-many, many-to-many, and one-to-one relationships. So, how do people in the enterprise world understand a data model? Diagrams, yes, the diagrams. Most people cannot understand modelling tools such as IDA, RSM, or WBM until they see the diagrams shown in the tool. Diagrams are the key output of a modelling session.

So, improve your "diagramming" skill if you plan to be a modeller :)

Friday, September 10, 2010

How data modelling has been in the enterprises

Well, it all depends...

For an enterprise where modelling is considered an important step towards the maturity of IT development, process, functionality, and data modelling (and others, like user experience modelling) are an important part of developers' lives.

What is interesting to see is that data modelling has not been considered as important as the functionality or process parts in most organizations. It is not difficult to understand this. The E-R modelling discipline has served most transactional system designs, and other tricks, such as multidimensional modelling, have been well used in most situations before. And I believe many small-to-medium, or even close to large, organizations will still have no problem doing only some basic E-R modelling and hiring good DBAs to take care of the rest for the next decades.

The difference is that some, and I cannot find a better word than some, organizations have done good business in the past, acquired different lines of business, and owned different systems for many years. When they do the integration of IT solutions (and they will end up doing this most of the time), it has proven extremely difficult to integrate at the process, functionality, or any other level. The data model is often the only feasible way to let the integration succeed. So, it is only in recent years that many large or extra-large organizations have started to realize the importance of controlling their data, meaning data modelling, data quality, master data, and metadata.

Another angle is to look at the vendors of data modelling tools and data models. There have been quite a lot of data modelling tools (we mentioned this in a previous note) on the market, but only a few leaders, most of which also provide large database engines. And vendors of data model content are pretty much limited to giant software vendors. So this market has, up to now, been very limited.

This phenomenon looks interesting and even funny to me. Most people talk about the information explosion of the new IT era. But it is also these people who choose not to understand their data, which is what eventually gets translated into information for some purpose. :)

Friday, August 27, 2010

Useful resources about data modelling

I would definitely start with Wikipedia: http://en.wikipedia.org/wiki/Data_modeling, which has a lot of good links, and http://www.databaseanswers.org, which is also very useful.


To get a further and more detailed understanding of data modelling, there are plenty of tutorials, books, and presentations available. Among these, I found the following quite useful.

  • Len Silverston (and his team)'s work "The Data Model Resource Book" is definitely a "must-have" for all data modellers. The three volumes of the book are a "make-a-living" tool for many data modellers in the industry.
  • The classical book "Data Modelling Essentials" by Graeme Simsion and Graham Witt is another book that I recommend everybody keep on their bookshelf.
  • If you are more on the business side and would like to know a bit more about data modelling, I would recommend Steve Hoberman's book "Data Modelling Made Simple."
  • As we have been talking about BI and data warehousing in this blog, the typical theory of multidimensional modelling, star schemas, and snowflakes is well introduced in Kimball's book series. Here I would recommend the dimensional modelling book in his series.
  • Besides these books for industry users, there exist many database theory books that use one or two chapters to talk about E-R modelling, normalization, and dimensional modelling. That is fair enough for university students, and I would actually recommend newcomers to the data modelling world to start their reading from these classical theories. (I've introduced two books in a previous posting.)
  • More intellectual work exists as products or services from various companies rather than being described in books or blog notes. The IBM industry data models have been well made for several industries. Similarly, Teradata has its industry data model sold together with its platform. Even SAS has its own data model to support its well-known BI platform. What one needs is to find an employer who would like to hire you and train you to get the knowledge of these industry products. Right?

    Friday, August 20, 2010

    Data modelling tools and vendors

    It is important to point out that a data modelling tool (I mean, a decent and correct one) should support at least logical and physical modelling. A tool that one can only use to create a database schema or database design is DEFINITELY not a data modelling tool, but a database development tool.

    I found this link very useful: http://www.databaseanswers.org/modelling_tools.htm
    It is actually quite up to date on the most recent modelling tools.

    For enterprise users, ERwin, IDA (previously RDA), MS Visio, Sybase PowerDesigner and Data Architect, and the Oracle tools are the most relevant from the list. I believe SAS also has tools to support a certain level of data modelling.

    There is no need to compare these tools here. But there are a few things to keep in mind if you are evaluating data modelling tools.

    First, a data modelling tool should be able to support different types of models, such as conceptual, logical, and physical models. Typical database theory tells us that one should start from conceptual modelling and move down to logical models and then to physical data models.

    Second, UML modelling or diagramming is not data modelling. Creating UML elements and showing some diagrams (Visio is a good example here) is not modelling but sketching. Data modelling requires one to create entities, relationships, attributes, keys, etc. (I listed them in a previous note), and the right data modelling tool should keep this metadata in the data models. Creating diagrams is only a small part of data modelling. When the data model gets complicated, it is impossible to show all the details in the diagram. That is when you need the modelling tool to keep the design details.
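
    To illustrate the difference between a diagram and the metadata behind it, here is a tiny, hypothetical sketch of the kind of structured information a modelling tool keeps about entities, attributes, keys, and relationships; the names and structure are invented, but the point is that this metadata can be validated and exported even when the diagram itself becomes unreadable.

from dataclasses import dataclass, field
from typing import List

# Hypothetical sketch of the metadata a modelling tool keeps behind the diagram.
@dataclass
class Attribute:
    name: str
    data_type: str
    is_primary_key: bool = False

@dataclass
class Entity:
    name: str
    attributes: List[Attribute] = field(default_factory=list)

@dataclass
class Relationship:
    name: str
    from_entity: str
    to_entity: str
    cardinality: str  # e.g. "1:N"

customer = Entity("Customer", [
    Attribute("customer_id", "INTEGER", is_primary_key=True),
    Attribute("name", "VARCHAR(100)"),
])
order = Entity("Order", [
    Attribute("order_id", "INTEGER", is_primary_key=True),
    Attribute("customer_id", "INTEGER"),
])
places = Relationship("places", "Customer", "Order", "1:N")

# The tool can check rules on the metadata regardless of how the diagram looks.
assert all(any(a.is_primary_key for a in e.attributes) for e in (customer, order))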

    Third, most of these data modelling tools only provide functionality and do not come with any model content. This is absolutely fine. Data modelling is a process of designing intellectual property. There have been a few vendors of data model content, such as the IBM industry models, the Teradata model, the SAS data model, etc. For small and medium-sized enterprises, I believe it is quite OK to just take Silverston's book (The Data Model Resource Book, Vol. I, II, and III) and copy some of the content into the modelling tool.

    Fourth, I believe that vendors are quite important for data modelling tools in large enterprises. One thing is that when your developers work on a model and find something wrong in the modelling tool, a good vendor will be able to provide sufficient consultancy support in time. The second thing is that many large enterprises tend to tailor the modelling tool so that it leaves their own "mark" on the assets created by the tool. This requires a close relationship between vendor and user.

    So, this is about data modelling tools and vendors. I will write something about useful books, references, and resources on data modelling in a coming post....

    Sunday, August 15, 2010

    Why do we do logical data modelling?

    In the domain of my work, there has been a long-running debate on the purpose of logical data modelling. There are people who care more about the end result of a database design, choose to focus on the database part, and call it "physical data modelling." There are also people who care about both logical and physical data models but are in doubt about whether these two are the same or different.

    Well, by opening the few books I have on my shelf and checking the Wikipedia site, I think I have some ways to answer these questions.

    First, what is a logical data model? I guess the Wikipedia definition below is fair enough for people to understand.

    "A logical data model (LDM) in systems engineering is a representation of an organization's data, organized in terms entities and relationships and is independent of any particular data management technology."

    So, the principle is that no specific database technology should be involved in the logical data modelling part. If we look at typical database theories, such as the ones you can get from these books, here are the key areas that we look at during the logical modelling phase.
    • Entities
    • Attributes
    • Relationships: binary/ternary/n-ary
    • Roles
    • Participation
    • Keys: super keys, candidate keys, primary keys
    • Weak entity types
    • Ternary relationships
    • Multi-valued attributes
    • Lossless-join decomposition
    • Functional dependency
    • Normal forms
    • Boyce-Codd Normal Form (BCNF)
    • 3rd Normal Form (3NF)
    • Update anomalies
    It is important to note that part of the E-R modelling is initiated in the conceptual modelling phase. The logical data modelling activity starts by inspecting the E-R model and deciding how the entities and relationships are further arranged into tables.

    The term 'Logical Data Model' is sometimes used as a synonym of 'Domain Model' or as an alternative to the domain model. While the two concepts are closely related, and have overlapping goals, a domain model is more focused on capturing the concepts in the problem domain rather than the structure of the data associated with that domain.

    Second, why do we need logical data modelling? I think Wikipedia has a few good points, for example, "Helps common understanding of business data elements and requirements" and "Facilitates avoidance of data redundancy and thus prevent data & business transaction inconsistency."

    It is quite apparent that logical data modelling provides the foundation for designing the database schema. However, many people choose to ignore this fact by creating the database design directly. In fact, the "logical data model" is already inside these people's minds when they are creating the database tables. Otherwise, how could one say that attribute A and attribute B should be in the same table? How could one determine the unique key of a table?
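
    One way to make that reasoning explicit is the textbook attribute-closure algorithm: given the functional dependencies you believe hold, you can check whether a set of attributes determines the whole relation and is therefore a candidate key. A minimal sketch follows, with made-up attributes and dependencies.

# Textbook attribute-closure check: does a candidate set of attributes
# functionally determine every attribute of the relation?
# The relation and dependencies below are made up for illustration.

def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

relation = {"order_id", "customer_id", "customer_name", "order_date"}
fds = [
    ({"order_id"}, {"customer_id", "order_date"}),
    ({"customer_id"}, {"customer_name"}),
]

print(closure({"order_id"}, fds) == relation)     # True: a candidate key
print(closure({"customer_id"}, fds) == relation)  # False: not a key

    Note that the same dependencies also expose the transitive dependency customer_id -> customer_name, which is exactly the kind of observation that drives the normalization decisions listed above.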

    Another vital part of logical data modelling is the set of decisions that help to reuse and share data. Most people only understand this point after the database has been used for several years (and when new requirements arrive at the database design). It is hard to say whether this is a shame. But a lot of enterprises only come to understand the importance of data after losing billions re-creating solutions every time the database cannot accommodate the changes.

    Third, there have always been debates and discussions about the boundary between logical data modelling and physical database design. When one comes to generic data modelling, where an individual or an organization is generally modelled as an "Involved Party," the decisions about rolling up or down the class hierarchy can be either a logical data model decision or a physical database design decision. It is hard to say who should make the final decision in most enterprises. However, a better way to handle this situation is to involve both the logical data modellers and the DBAs in the discussion and reach a design that the DBAs agree with. To make this point clear: this is a logical data model decision as long as no specific database technology is involved. The DBAs of a specific database technology can be consulted (so that they feel involved and engaged) to find out whether rework will be needed when the logical design is applied to the physical database design.
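
    As a small illustration of such a roll-up/roll-down choice, here are two hypothetical physical renderings of the same logical "Involved Party" hierarchy: one rolled up into a single table with a type discriminator, and one rolled down into separate subtype tables. The DDL is invented for illustration; which variant to pick is exactly the kind of decision where the logical modellers and the DBAs should agree.

import sqlite3

# Two hypothetical physical designs for one logical "Involved Party" hierarchy.
ROLLED_UP = """
CREATE TABLE involved_party (
    party_id   INTEGER PRIMARY KEY,
    party_type TEXT,    -- 'PERSON' or 'ORGANIZATION' discriminator
    full_name  TEXT,    -- populated for persons
    legal_name TEXT     -- populated for organizations
);
"""

ROLLED_DOWN = """
CREATE TABLE involved_party (
    party_id   INTEGER PRIMARY KEY
);
CREATE TABLE person (
    party_id   INTEGER PRIMARY KEY REFERENCES involved_party(party_id),
    full_name  TEXT
);
CREATE TABLE organization (
    party_id   INTEGER PRIMARY KEY REFERENCES involved_party(party_id),
    legal_name TEXT
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(ROLLED_DOWN)  # either variant runs; the choice is the design decision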

    Wednesday, August 11, 2010

    Good article on Eclipse Development Projects

    I've found a very decent list of excellent Eclipse development projects via this link at eweek.com.
    To make it easier to read, I've extracted part of the content here.

    Eclipse Modeling Framework (EMF)
    Eclipse is huge in the modeling community. EMF is the core framework and code generation facility that allows developers to create applications based on a structured data model.
    Xtext
    Xtext is a relatively new project but is quickly becoming very popular for creating domain-specific languages. With Xtext you can easily create your own programming languages and domain-specific languages (DSLs). The framework supports the development of language infrastructures, including compilers and interpreters, as well as full-blown Eclipse-based IDE integration.

    Jetty
    Jetty is an open-source project providing an HTTP server, HTTP client, and javax.servlet container. It is a very popular Web server and servlet container, often found embedded in applications such as the Yahoo Hadoop Cluster, Google App Engine, and Zimbra. Beyond the Web server and javax.servlet container, Jetty offers support for WebSockets, OSGi, JMX, JNDI, JASPI, AJP, and many other integrations.

    CDT
    The CDT Project provides a fully functional C and C++ Integrated Development Environment based on the Eclipse platform. CDT is now the de facto C/C++ IDE in the non-Microsoft world. Most embedded vendors and Linux distros use CDT as their C/C++ IDE.

    PDT Eclipse PHP Development Tools
    The PDT project provides a PHP Development Tools framework for the Eclipse platform. This project encompasses all development components necessary to develop PHP and facilitate extensibility. It leverages the existing Web Tools Platform (WTP) and Dynamic Languages Toolkit (DLTK) to provide developers with PHP capabilities. PDT has quickly become one of the more popular IDEs in the Eclipse community.

    Mylyn Framework
    Mylyn is the task and application lifecycle management (ALM) framework for Eclipse. Over the last three years Mylyn has become the hub or integration point for many of the Agile ALM vendors. Mylyn has over 45 different connectors that make it possible to link different ALM tools to its unique task perspective.

    BIRT—The Business Intelligence and Reporting Tools
    BIRT is an open-source, Eclipse-based reporting system that integrates with your Java/J2EE application to produce compelling reports. BIRT provides core reporting features such as report layout, data access and scripting. BIRT has become a popular reporting solution for Java developers.

    Web Tools/Java EE Tools/Eclipse Java Development Tools (JDT)
    Eclipse continues to be the standard for Java developers. If you are creating Java applications chances are you are using some combination of the JDT and Web Tools or Java EE Tools project.

    Eclipse Plug-in Development Environment (PDE)
    The Plug-in Development Environment (PDE) provides tools to create, develop, test, debug, build and deploy Eclipse plug-ins, fragments, features, update sites and RCP products. PDE also provides comprehensive OSGi tooling, which makes it an ideal environment for component programming, not just Eclipse plug-in development.

    eGit Version Control
    The rest of this list highlights up-and-coming projects that have become popular with developers. One of them is eGit, an Eclipse Team provider for the Git version control system. Git is a distributed SCM, which means every developer has a full copy of the entire history of every revision of the code, making queries against the history very fast and versatile. The eGit project implements Eclipse tooling on top of the JGit Java implementation of Git, providing tight integration between Eclipse and an increasingly popular source code management system.

    Gemini
    The Enterprise Modules Project—Gemini—is all about modular implementations of Java EE technology. It provides the ability for users to consume individual modules as needed, without requiring unnecessary additional runtime pieces. Gemini implements many of the OSGi Enterprise Specifications.

    Memory Analyzer (MAT)
    The Eclipse Memory Analyzer is a fast and feature-rich Java heap analyzer that helps developers find memory leaks and reduce memory consumption. Memory Analyzer is becoming a very popular tool with Java developers.

    Connected Data Objects (CDO)
    CDO is both a technology for distributed shared EMF models and a fast server-based object-relational (O/R) mapping solution. With CDO you can easily enhance your existing models in such a way that saving a resource transparently commits the applied changes to a relational database. CDO is a model repository for EMF models. It provides the scalability and transactional capabilities required to use EMF for large scale applications. CDO has a 3-tier architecture supporting EMF-based client applications, featuring a central model repository server and leveraging different types of pluggable data storage back-ends like relational databases, object databases and file systems.

    Eclipse Device Software Development Platform (DSDP) Project
    The Eclipse Device Software Development Platform (DSDP) Project is an open source collaborative software development project dedicated to providing an extensible, standards-based platform to address a broad range of needs in the device software development space using the Eclipse platform. DSDP is a top-level container project that includes several independent technology sub-projects focused on the embedded and mobile space. Sub-projects under the DSDP include Blinki, Device Debugging, Mobile Tools for Java, Native Application Builder, Real-Time Software Components (RTSC), Sequoyah, Target Management, and Tools for Mobile Linux.

    JavaScript Development Tools
    The JavaScript Development Tools provide plug-ins that implement an IDE supporting the development of JavaScript applications and JavaScript within web applications. It adds a JavaScript project type and perspective to the Eclipse Workbench as well as a number of views, editors, wizards, and builders.

    Eclipse Marketplace
    The Eclipse Marketplace offers the Eclipse community a convenient portal to help users find open source and commercial Eclipse-related offerings. The new Marketplace client makes it easier for users to download and install tools from Instantiations and others.

    I think these books about Eclipse are very useful....

    Sunday, August 8, 2010

    Agile BI in my view (1)

    TDWI has had a few good discussions on Agile BI recently, such as the following.

    Reflection on an Agile BI program

    An Imperative to Build, Not Buy, Agile BI

    While most of these articles are focused on establishing processes, guidelines, or toolsets to ensure Agile BI success, I have a different point of view.

    If one looks at the current HR setup in the IT branches of large enterprises, the key to ensuring agility in BI is having qualified people to do the good work. This is, in most cases, done by contractors such as external consultants, outsourcing partners, or experts who are going to be head-hunted to a better-paying job in one or two years.

    Even if the right processes, tools, and guidelines are available, the more important part is that the developers have the awareness, competency, and willingness to follow agile development.

    Awareness means that the developers (IT and business) know about the agile process and know how to follow it. In large organizations, it sometimes takes more time to learn how to follow the process than to execute it.

    Competency is always an issue for managers in large enterprises. Employees who are eager to learn and improve their skills are normally looking for new challenges most of the time; once a project is done, it is hard to keep this competency inside the organization. In order to do agile BI, it is very important that the developers have a good understanding of the toolset. Otherwise, the first 3-5 sprints will be used just to train the developers. Do we still consider such training "agile development?" That is one of the reasons that a lot of companies use external consultants.

    By the way, using a single toolset, such as the MS BI tools, SAS tools, or Cognos tools, seems to be much better than using different tools from different vendors at the same time. There are two reasons: (i) it is impossible to have developers with knowledge of all these tools; (ii) the communication between these tools carries a potentially large cost.

    Willingness is an interesting issue. Sometimes people may not follow the process even if they know the process and have the right competency. Think about a team with a hybrid structure: employees who have been working with you for 20 years (and who may know COBOL very well), external consultants from company A with the latest BI tool knowledge, external consultants from company B with knowledge of another BI toolset, and developers from outsourcing partners. It is hard to imagine that all the team members will work together perfectly in such a situation.

    Another important issue for agile BI is communication, by which I mean communication both inside the sprint team and with the outside world.

    By the way, I think these books are very useful when you need to learn about agile methods and BI/DW.

    Saturday, June 5, 2010

    Is Apple going to be the next 20-year IT leader?

    Some years ago I was in the mood of feeling sorry for Jobs and his Apple team, because Gates and Microsoft inherited the concept of the UI (from Apple) and dominated the market with Win3.2, Win95, Win98, XP, etc. The great era of the personal computer was the golden time for Microsoft and Gates.

    What surprised me was Apple's comeback with the iPod, iPhone, and many other wonderful products. Perhaps Jobs was much better at foreseeing the "far future" of IT than Gates was 20 years ago. Another "phenomenon" is, of course, Google. The Internet era and the mobile era are converging at this moment, and as far as I can see, Apple has been the most successful at leading and dominating the trend.

    Will Apple be the market leader for the next 20 years? Well, perhaps yes.

    Saturday, May 15, 2010

    The time of "touch screen" and "tablet"

    I just read a review of one of Logitech's recent products, called "Squeezebox." Despite its horrible price (I mean, relative to its capabilities), this product shows a vision that many IT people have been working on since the late 1990s.

    Streaming media at home? Yes, I have heard about this since college, and the next step will be to control your home appliances through a control board. I am sure that Apple is working on this in some of its labs. Let's wait and see who comes out on top.

    Wednesday, April 28, 2010

    The "net income increases" time

    It is quite surprising, and a bit "happy," to learn that so many IT vendors had net income increases in the past Q1 of 2010. All the signals show that we are a bit past the lowest point of economic growth. Sybase, Microsoft, Western Digital, and many other well-known IT software and service vendors are having a good time now. Hopefully, the investors are optimistic about the market as well. ;)

    Wednesday, April 14, 2010

    Is iPad so useful?

    I have been asking myself this question ever since I first heard about the iPad. Well, if one just wants to be "cool" or "show off," that can be a reason to buy this interesting gadget. But otherwise, what is the purpose of having it?

    I think the key question is: what is the "killer feature" of the iPad? The "touch screen?" Or the Apple kind of "usability?" I have to admit that the iPad and iPhone are very good at usability, and the design is so cool. But stripping that part away, what else does Apple have as a competitive advantage?

    I am glad that there are still companies like Apple to keep setting the trends in the IT industry, and that there are still lots of fans keeping their eyes on it. Good luck, iPad, really.

    Tuesday, April 13, 2010

    Nokia enhances its involvement in LBS

    Apparently Nokia, the giant mobile phone vendor, is not afraid of the challenges from Google and Apple. In a recent acquisition, Nokia got MetaCarta, a provider of geographic intelligence solutions based in Massachusetts. In the area of location-based services (LBS), Nokia is launching more capabilities with MetaCarta's technology for local search in location and other services. In an earlier acquisition, Nokia got Novarra, an internet mobility company whose mobile browser and services platform are going to be used to improve the Nokia Series 40 mobile phones.

    Monday, April 12, 2010

    Who could be buying Palm?

    Given the hot competition from the Apple iPhone and Google phones, it is not unusual that the well-known smartphone maker is seeking a way out. Perhaps HTC could be a good choice. Personally, I would think of a software vendor, such as Google or Microsoft, jumping in here.

    Again, this is just another consolidation activity in the mobile phone market. Hopefully the users of Palm can live with the changes.

    Saturday, April 10, 2010

    LBS features in new iPhone OS

    Apparently Apple is giving a lot of new life to the new version of iPhone OS. Take Location-Based Services as an example: in the new OS, location services can be configured to use cell towers rather than the power-hogging GPS chip. iPhone OS 4 will notify users when an app is trying to discover their location, log that data, and allow users granular, app-by-app control of how that location data is used.

    This is really decent support for those who have been working in the LBS world. The "tower-based" tracking can be a huge boost to many LBS applications. And the ability to let users control the privacy and usage of their location data shows Apple's clear understanding of users' privacy and security.

    About decent laptops

    Some friends have asked me for tips about laptops that seem decent. Well, there are quite a lot on the market. My first point is to forget about netbooks. I believe netbooks will die out soon or simply be merged with typical laptops (just think about why the iPhone has a new OS to support multi-tasking).... I would look at the CPU and RAM first, and then size and weight. The brand is very important as well. Sony and ASUS seem to produce something really good (both in price and quality). And do not forget Lenovo and HP. Well, there is also something called Dell.

    Hope this is a long enough suggestion....

    Tuesday, February 9, 2010

    Who is the "killer" of iPad

    Among most of the tablets produced by the major vendors, such as HP, Sony, Lenovo, etc., one of the key issues is the price. How can it be cut down to a few hundred dollars?
    Well, that pushes me to put my bet on the Freescale products. Apparently the smartbooks produced by Freescale are not suitable for direct use by consumers; much of the value lies on the tailoring and customization side. If someone can actually set up an open-source project to develop extensions to the Freescale smartbooks, it is going to take away a lot of iPad users.