Sometimes software developers are not the best candidates for creating a winning solution, because they do not have a clear view of what product management is.
So, what is product management? It has many definitions. From a product manager’s point of view, it has two sides: an external one and an internal one. The external side is about going to the customers (or users) and turning their requirements into strategies and objectives for the product. The internal side is about getting the organization to support and implement those strategies and objectives.
Why should we think about product management before talking about architecture? Generally speaking, what you deliver as a software product must be maintained, upgraded, and re-developed by a set of people in your development organization. So it matters a lot that, at the very beginning of development, you consider how the software will be managed by the product management team. Without successful product management, no architecture can be called successful.
There are many product development models, such as waterfall, spiral, and XP. What makes these models successful in individual teams is bearing in mind that the deliverable is not just a piece of software; it is a suite of things, including the software, a set of product management protocols, and so on. The principles behind agile development should be very well understood by modern software teams. One important principle is to always put testing first: before you design anything, design the test.
Normally a product development process starts from a business plan, then a project proposal, and so on. When we come to new releases, it is very important that all differences between the releases are captured in at least one document. The book mentions a “marketing requirements document (MRD)” to keep track of these differences.
Quite a few things are important to the product development cycle.
1. Freezing. For any serious work, it is important to have test, freeze, and production environments. The freeze environment is where a batch of updates is put together, committed, and tested before the updates are finally moved to production.
2. Change management. Every change should be documented, implemented, tested, analyzed, and linked to its source.
3. Documentation. This is always a challenge. It is hard to know how much documentation, and which kinds, are enough for the users and the product management team.
4. Recycle bin. It is a good idea to say “no” if you know that you do not have enough capacity to develop a new feature requested by a customer. Put it in a recycle bin and come back to it in the next life cycle of the product.
Modern software architecture design must also be tightly connected to the product marketing process. There are four “Ps” of marketing: product, price (and the business model), place (distribution channel), and promotion (advertising and marketing communication). Some of these can be considered in the architecture design and implemented accordingly. One should also look at market segmentation when thinking about the architecture of the product.
Wednesday, March 26, 2008
Monday, March 24, 2008
What's new in week 13, 2008
Monday, 2008-03-24, Copenhagen
Safari is known mostly to Mac users, but it seems that this browser is trying to join the competition between MS IE and Mozilla’s Firefox. The recent release, Safari 3.1, is claimed to be an excellent tool for both Windows and Mac users. Most of the foundational work of Safari comes from the WebKit open source project. The key features of Safari are speed and simplicity, and it has the best score on the Web Standards Project’s Acid3 test. However, this focus also brings disadvantages: compared to Firefox, the extended features of Safari seem poor.
Sun has recently released the NetBeans 6.1 IDE with extended features to support Java, C/C++, JavaScript, and Ruby. MySQL is more tightly integrated with NetBeans in this new release. It seems that Sun is building an IDE platform similar to Visual Studio from MS. Hopefully developers all over the world can benefit from this competition.
IBM has recently introduced a new BI tool called ProAct. It helps companies automate customer interaction tasks and boost sales. The software was developed by the IBM India research lab. It does not seem that this tool will be very well integrated with the Information Server platform; rather, it will stay a standalone system.
Unified threat management (UTM) solutions are quite popular among SMBs. It is costly to employ people who can maintain the multiple devices that handle different aspects of threats, such as firewalls, anti-spam, malware detection, and web content filtering, along with their physical hookups, licenses, services, and support. Most SMBs are looking for a unified solution that minimizes the maintenance cost of the different tools.
An IDE for SQL developers? Yes! Embarcadero just shipped PowerSQL, an Eclipse-based SQL development tool. The tool supports Oracle, SQL Server, Sybase, and DB2. It seems that everybody (except MS, of course) is trying to extend Eclipse to everything, and people are now taking the importance of database developers seriously.
Tuesday, 2008-03-25, Copenhagen
Sun has just upgraded its virtualization software, VDI, to support the management of virtual desktop sessions on operating systems such as Solaris, Windows, Mac OS, and Linux. It seems that, in addition to VMware and MS, other OS vendors also want a slice of the “virtualization” cake.
IBM just launched its first cloud computing center in Europe. The Cloud Computing Center is located in Dublin, Ireland, and offers the same types of services and technology as the Blue Cloud program. In addition, IBM has developed yet another social networking application for business on cloud computing, called “Idea Factory.” The collaboration platform is targeted at business users inside an enterprise.
BI in Google spreadsheets? Yes! Google just said that it is using BI software from Panorama, a Toronto-based software company, to give users analytical and reporting tools. Given that Panorama sold its OLAP platform to Microsoft in 1996 (which was then rebranded as SSAS and has been very successful since), Google’s move with Panorama seems to be a step toward challenging MS in the Office and BI platform markets.
Wednesday, 2008-03-26, Copenhagen
Blist went public two days ago and opened its social database application to everyone on the internet. Any user of Blist can create a database that uses a spreadsheet-style front end. Users create databases, called Blists, to store information for private use and for groups to collaborate. The great idea behind Blist is this: even if you have a social network with a lot of participants, you may not be able to connect to the right person at the right time, because you do not have enough information about the other people in the community. On the other hand, nobody wants to make all of his or her own information public. So how do you spread your part of the knowledge, skills, stories, or whatever in the network? Technology can help. If you keep your data in the database and let a kind of “data miner” work on it, then when other people search for something related to the same topic, the “data miner” will be able to relate you and your data to those people. That way, we can easily find and use everybody’s expertise in an optimal way. Of course, users of Blist may experience privacy and security problems in the future, but the idea of improving collaboration behind this tool is more important and exciting.
Thursday, 2008-03-27, Copenhagen
British Telecom (BT) moving into Asia? Yes, and it is no longer news. BT has just completed its acquisition of a Singapore-based firm, Frontline Technology. The company is listed on the Singapore Exchange and has operations in most countries of South-East Asia. BT seems quite interested in the growing market in this area.
What to note about Vertica? It is a company that produces an RDBMS focused mainly on the data warehouse and business intelligence market. The special thing about the Vertica RDBMS is that it is column-based (while the traditional ones are row-based). It seems that the company has chosen a completely different direction from all the other RDBMS vendors. It will be interesting to see how long this RDBMS will exist (or maybe all the others will disappear).
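To make the row-based versus column-based distinction concrete, here is a minimal Python sketch. It says nothing about how Vertica is actually implemented; the table, column names, and values are invented for illustration. It only shows why an aggregate over one column touches less data when values are stored column by column.

```python
# Hypothetical sales data; names and values are illustrative only.
rows = [
    {"order_id": 1, "region": "EU", "amount": 120.0},
    {"order_id": 2, "region": "US", "amount": 80.0},
    {"order_id": 3, "region": "EU", "amount": 50.0},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# Row layout: the aggregate has to walk every full record.
total_from_rows = sum(record["amount"] for record in rows)

# Column layout: the same aggregate reads a single list and
# never touches order_id or region.
total_from_columns = sum(columns["amount"])
```

Both totals are the same; the difference is how much data has to be read to get there, which is what matters for the scan-heavy queries typical of data warehouses.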
With so many social networking sites (Facebook, Bebo, LinkedIn, hi5, Zorpia, …) on the internet, will anyone finally unite or unify these networks? Yes: Microsoft. It seems that MS is seriously eyeing the big pie of the internet and starting to open its jaws. When you can unite these networks, you can of course bring more users (either from these networks or new ones) into the world that you have designed. When all the users depend so much on you, you can just abandon those networks so that they soon die out.
What is Imeem? As described on its website, “imeem is an online community where millions of fans and artists discover new music, videos, and photos, and share their tastes with friends.” What is interesting about this community is that it publishes a lot of open APIs so that other developers can use them to create tools for the application. This is a brilliant idea for attracting manpower from the open source community. But wait a minute: how can we make sure that these outside applications do not introduce security leaks? Clearly you cannot force outside developers to follow your development model, but then what is the maintenance cost when you decide to adopt a tool provided by outsiders?
Friday, 2008-03-28, Copenhagen
Is PostgreSQL just an open source toy? Definitely not! A lot of people have been working on different extensions of PostgreSQL, and there are software vendors that seriously harness the energy of the open source community and use the tool commercially. For example, EnterpriseDB, after a successful new round of Series C venture capital financing, has just released Postgres Plus 8.3 and Postgres Plus Advanced Server 8.3. Maybe PostgreSQL will be the only tool left for medium-size vendors to work on now that MySQL has been acquired by Sun.
Yahoo and Google just started an initiative in which developers from both sides will create an open framework for social networks. This is a threat to big network owners like MySpace or Facebook. On the other hand, this is the ultimate destiny of all networks. If you stay isolated, you will die out very soon. Everyone needs to connect to the rest of the world.
Microsoft is entering the VOIP market and focusing on SMBs. Normally it is very costly to move your office from one geographical location to another. I do not mean the cost of hiring someone to move the tables, but the cost of hiring someone who can reconfigure everything in the new location so that the IT systems work the way they are expected to. One important part is the VOIP system, which used to be very complicated to work with. For SMBs, this is a serious problem. MS has just released a new version of its Response Point VOIP phone system, which aims to strip away the need for VOIP expertise. The ultimate target is to let even business users install and configure the system without any technical support. I wonder what will happen when MS links its VOIP part with the rest of its unified communication plan (then there comes another big piece of cake).
Motorola is splitting into two parts, with the handset division becoming a standalone company and the rest of the business staying in the other part. As Motorola is already losing its handset market in the US (from number 2 down to number 3, overtaken by Samsung), splitting up may be a very good way to revive the business.
SAS is taking serious steps to occupy future BI markets. Last week, SAS announced its acquisition of Teragram, a provider of multilingual natural language processing technologies and text analytics. For some time, people in the BI market have been talking about text-based business intelligence, but it has always seemed rather far from reality. Now SAS is taking its first step in this direction. I bet other big vendors will follow very soon.
Tuesday, March 18, 2008
Notes for reading "The Data Warehouse Toolkit"
The second edition of "The Data Warehouse Toolkit" by Ralph Kimball and Margy Ross is a great book about dimensional modeling. Most data warehouse people regard it as an industry must-read.
Here are my reading notes for Ch. 1.
Chapter 1 Dimensional Modeling Primer
Do you need background knowledge to work with a data warehouse? Yes, definitely. This whole chapter is about what you need to know before you can understand the data warehouse.
The first thing to clarify is the difference between "operational systems" and the data warehouse. In general, an operational system deals with one record at a time. It takes orders, registers customers, or logs complaints. It does not do much summarizing, aggregating, or dashboarding as a data warehouse does. It is a rigid system. It works with transactions when necessary. It repeats the same procedures over and over to complete the same business processes. Operational systems also keep historical records, but for a different purpose than the data warehouse: these records are used only for validation and recovery, not for summarization. The data warehouse is where the historical data is re-organized, conformed, and summarized. It is used in a more dynamic way. I admit that a data warehouse must present quite a lot of similar, recurring reports. But the power of the data warehouse is that, when users come up with new questions, it can answer them easily. When the same thing happens to an operational system, it requires re-developing the platform, which is very expensive.
So, what is the role of a data warehouse in an enterprise? The book gives the following answers.
a. It should make all the data "easily accessible," which means a data warehouse is not another castle in the enterprise that everybody must struggle to learn and understand. It should be easily understood and used by most business people.
b. It is a place to keep all the data consistent. Nowadays, people talk about "one version of the truth." The data warehouse is definitely the best place to keep the truth of all the data.
c. The data warehouse should be ready for any changes. That is also one big difference between the data warehouse and operational systems.
d. To quote the book: "An organization's informational crown jewels are stored in data warehouse." The data warehouse must have effective control over the confidential information of the enterprise.
e. The data warehouse must ensure the correctness and completeness of the information it contains in order to serve the decision-making of the enterprise.
f. Once created, the data warehouse must be used. So it must be widely accepted by the business community from the beginning of its life cycle.
So, to build a successful data warehouse, it is important that your team combines knowledge and skills from both the DBA side and the MBA side.
We can think of a data warehouse publishing data as an editor publishing a magazine. You need to talk to the audience, understand what they pay attention to, and provide the right information at the right time. You have to go to people regularly to collect new requests for improvement. You need to find new sources of information. You need to develop and manage a good network of people working with the magazine. You need to keep all relevant people (especially those on the business side) happy. Data warehousing is about publishing the right data at the right time to the right people.
So, let us talk about the data warehouse itself (I mean, seriously: not the things around the data warehouse, but the things close to and inside it). What are the parts that make up a complete data warehouse environment? There are four distinct types of components: source systems, the data staging area, the data presentation area, and data access tools.
Let us discuss each in turn.
1. Source systems. What can be a source system? I would say any application that delivers data to the data warehouse. Is a source system the same as an operational system? The answer may be “no,” because nowadays people also talk about collecting data into the data warehouse from the Internet or from certain ERP and CRM systems (which can be regarded as a kind of BI appliance). But most source systems are operational ones. If an enterprise invests in EAI (enterprise application integration), which means these systems are re-engineered to have a consistent view of processes, functionality, data, or all of them, the data warehousing task becomes much easier.
2. Data staging area. I like the example provided in the book very much: imagine the data staging area as the kitchen of a restaurant. One thing to keep in mind is that this is both a storage area and an area of many processes. By “processes” I mean those that do the extract-transform-load (ETL) jobs. The staging area is, as described in the book, everything between the source systems and the data presentation area. One thing that normally should not happen is letting the customers come into the kitchen or eat directly in the kitchen. That is a rule of thumb. From the book’s point of view, those in the industry who talk about the “enterprise data warehouse” are actually talking about something in the staging area. A more general meaning of enterprise data warehouse includes both the staging area and the presentation area, and perhaps also the source systems.
3. Data presentation area. Using a dimensional model here comes naturally, because people tend to find it a very simple way of understanding things. So you have to accept that dimensional modeling is successful and has to be the only modeling approach used in the data presentation area. Regarding the data marts in the presentation area, I agree that the “bus” structure is a natural choice when different BI applications make it very challenging to conform everything into single dimensions and facts. However, there has to be a place where the “one version of the truth” of the data is kept. If you cannot hold it in the data presentation area, then it has to be in the data staging area. That is perhaps why a lot of people nowadays talk about establishing a relational 3NF model at an EDW before everything is mapped into dimensional models. As we all know, the star schema is very widely used to support the dimensional model in relational databases. Another way of building the presentation area is the multidimensional database, or OLAP. I believe that, in the not too distant future, OLAP applications will become mature enough that the “star schema” era will soon be over.
4. Data access tools. These are normally called BI applications nowadays. Excel and front-end query tools like SQL Server Query Analyzer are also data access tools.
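The flow through the four components above can be sketched in a few lines of Python. This is only a toy illustration under my own assumptions: the record layout, the cleaning rules, and the function names (`extract`, `transform`, `load`, `total_by_customer`) are hypothetical and do not come from the book.

```python
# Source system: hypothetical operational records, one per transaction.
source_orders = [
    {"order_id": 1, "customer": " alice ", "amount": "120.50"},
    {"order_id": 2, "customer": "BOB", "amount": "80.00"},
    {"order_id": 3, "customer": "Alice", "amount": "19.50"},
]


def extract(source):
    """Copy raw records out of the source system into the staging area."""
    return [dict(record) for record in source]


def transform(staged):
    """Clean and conform the staged records (the 'kitchen' work)."""
    cleaned = []
    for record in staged:
        cleaned.append({
            "order_id": record["order_id"],
            "customer": record["customer"].strip().title(),
            "amount": float(record["amount"]),
        })
    return cleaned


def load(cleaned):
    """Publish the conformed records to the presentation area."""
    return list(cleaned)


def total_by_customer(presentation):
    """Stand-in for a data access tool: a simple summary report."""
    totals = {}
    for record in presentation:
        name = record["customer"]
        totals[name] = totals.get(name, 0.0) + record["amount"]
    return totals


presentation_area = load(transform(extract(source_orders)))
report = total_by_customer(presentation_area)
```

Note that the report only ever reads `presentation_area`; the raw and half-cleaned records stay in the "kitchen," which is the rule of thumb from point 2.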
Are there other possible components? Yes, according to the current trend. Examples include the metadata repository, the ODS, and data quality tools.
Regarding metadata, the whole data warehousing industry has reached a stage where it realizes its importance, and quite a few tools are emerging to support a centralized, unified metadata framework. Most people knew that metadata is of great importance but did not realize how bad things could get without doing anything about it. That is why a lot of enterprises are eagerly looking for a metadata solution. To work out metadata management, the first thing is the scope, i.e., how much you want the metadata framework to cover. You need a complete conceptual framework to support the implementation of the metadata system. How do you define the scope of the metadata framework? That depends on your ambition and budget. Do you want a framework that is scalable for the next 20 years, or just something that can help you for the next 5 years? Not many people have the stamina to support a very long-term metadata framework. At the very least, you need to figure out the relevant metadata around the data warehouse and to corral, catalog, and integrate these varieties of metadata. The result is like a resource library. Compared to dimensional data modeling, metadata modeling is much more complicated.
Next, the ODS, or operational data store. The ODS normally sits in the data staging area. It is updated frequently, and it is a place where the data from source systems is somewhat integrated. It is a database with a 3NF design. Some time ago, I was confused about the ODS and the “enterprise data warehouse” that I described above. In fact, if you have a 3NF relational data model at the EDW, this EDW is actually part of the data staging area and is to some extent equivalent to the ODS. The difference between the ODS and the “EDW” is where you put the many business calculations defined by the BI applications. There should not be many such calculations between the source systems and the ODS. People normally need an ODS because they need some kind of immediate report that integrates a lot of data from different source systems. Real-time BI may be a reason to have an ODS. There is a trend to make the ODS a specially administered part of the conventional data warehouse.
Vocabulary of dimensional modeling
A fact table is where the numerical performance measurements of the business are stored. Normally measurement data is the largest portion of any data mart. What is important in defining a fact table is the grain of the data. It must be consistent with the grain of all the relevant dimension tables. The difficult part of building a fact table can be deciding which columns belong in it. There are additive facts and semi-additive facts that can be hard to place, either in a fact table or in a dimension table. A fact can also be textual, but developers normally try to move such values into dimension tables.
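The difference between additive and semi-additive facts is easier to see with numbers. The sketch below uses a made-up daily account snapshot (the accounts, dates, and amounts are my own illustration): deposits are additive across every dimension, while balances can be summed across accounts but not across dates.

```python
# Hypothetical daily snapshot fact rows:
# (date, account, deposit_amount, balance)
snapshots = [
    ("2008-03-24", "A", 100.0, 100.0),
    ("2008-03-24", "B", 50.0, 50.0),
    ("2008-03-25", "A", 20.0, 120.0),
    ("2008-03-25", "B", 0.0, 50.0),
]

# Additive fact: deposits can be summed across accounts AND across dates.
total_deposits = sum(row[2] for row in snapshots)

# Semi-additive fact: balances can be summed across accounts for ONE date...
balance_on_25th = sum(row[3] for row in snapshots if row[0] == "2008-03-25")

# ...but adding balances across dates is meaningless, so over time
# we average the daily totals instead of summing them.
dates = sorted({row[0] for row in snapshots})
daily_totals = [sum(r[3] for r in snapshots if r[0] == d) for d in dates]
average_daily_balance = sum(daily_totals) / len(daily_totals)
```

Summing the `balance` column over all four rows would give 320, a number that corresponds to nothing in the business, which is why a report over a semi-additive fact has to know which dimensions it may aggregate over.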
Dimension tables contain the descriptions of the business. Each dimension table may have more than 50 attributes. These attributes are used to divide, group, and merge business entities, which are then joined with the fact tables to produce business-oriented calculations. The result of such a calculation may lead to a report, a performance review, a business decision, or a new business product.
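Here is a minimal sketch of a fact table joined to its dimension tables, using Python's built-in `sqlite3` module. The schema (`fact_sales`, `dim_product`, `dim_date`) and the data are invented for illustration and are not taken from the book; real dimension tables would carry many more attributes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT,
    category     TEXT
);
CREATE TABLE dim_date (
    date_key  INTEGER PRIMARY KEY,
    full_date TEXT,
    month     TEXT
);
-- Grain of the fact table: one row per product per day.
CREATE TABLE fact_sales (
    date_key     INTEGER REFERENCES dim_date(date_key),
    product_key  INTEGER REFERENCES dim_product(product_key),
    quantity     INTEGER,
    sales_amount REAL
);
INSERT INTO dim_product VALUES
    (1, 'Espresso', 'Coffee'),
    (2, 'Green tea', 'Tea');
INSERT INTO dim_date VALUES
    (20080324, '2008-03-24', '2008-03'),
    (20080325, '2008-03-25', '2008-03');
INSERT INTO fact_sales VALUES
    (20080324, 1, 2, 6.0),
    (20080324, 2, 1, 2.5),
    (20080325, 1, 3, 9.0);
""")

# Dimension attributes do the grouping; fact columns supply the numbers.
sales_by_category = conn.execute("""
    SELECT p.category, SUM(f.sales_amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
```

Swapping `p.category` for `d.month` (with a join to `dim_date`) answers a different business question without touching the fact table, which is the flexibility the dimensional model is meant to give.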
There are few things that always stay around the dimension table design. a) Operational codes (suffix, etc.) b) The hierarchy behind a business area, such as family relations, organization relations, etc., is normally stored by the dimension table (although it brings redundancy) c) A snow flake design may let the business understanding a bit clearer, but it is definitely a performance disaster. d) The granularity of data must be clarified before the data model is designed. Normally people come to the finest grain as much as possible. e) There is also an issue about time and updates, i.e., the so-called slowly-changing-dimension (SCD) problem.
The dimensional model, compared to an ER diagram, may be simpler. Normally ER diagrams combines and conforms to a lot of business scenarios together. When making dimensional model based on the ER diagrams, those many-to-many relationships in the ER diagrams that contain numeric and additive non-key fields are mostly facts and should be put into fact tables. The remaining tables then should be de-normalized into flat tables like the dimension tables.
There are quite a few things to be careful when doing dimensional modeling in the industry.
1. The grain of data in the dimensional model should be the finest, not for summary. And the dimensional model should have data of the whole history.
2. One should be very careful with how to scope the dimensional models. It can be departmental or process-oriented (i.e., one model for one business process). Some people support the approach of spide-web (or Hub-spoke) approach that you always have a central place to feed data to all the data marts, but Kimbal seems to disagree with this idea and support the idea of multiple feeds. I would agree with a hub-spoke way because, if we look at the data from a senior management point of view, it is very important that you have an easy way of managing the data flow and the “bus” architecture seems to bring a lot more correlations and connections than the “hub-spoke” one. And it is better to use the “hub-spoke” if you are serious about the “one-version-of-the-truth.”
3. Dimensional model can be applied to many different industries.
4. It is better to have the usage pattern when you are designing a dimensional model. But it is not compulsory. A dimensional model can still be successful without knowing how it is going to be used exactly in the beginning of the design. Anyways, the designer can coach the user.
5. Be focused more on business requirements and goals when developing a data warehouse as they are more important than technologies.
6. Make sure that you include influential, accessible, and reasonable business sponsors in the project.
7. It may be more useful to consider an iterative project process to develop a data mart.
8. Make sure that the data presentation area is considered equally important to the staging area by th project, and allocate time on the data presentation area as much as you can.
9. Be aware that, many requirements and analytics around the data warehouse is changing over the time.
10. The success of a data warehouse is ultimately decided by the users.
This the end of my note on Ch. 1.
Here are my reading notes for Ch. 1.
Chapter 1 Dimensional Modeling Primer
Do you need background knowledge to work with a data warehouse? Yes, definitely. This whole chapter covers what you have to know before you can understand the data warehouse.
The first thing to clarify is the difference between "operational systems" and the data warehouse. In general, an operational system deals with one record at a time. It takes orders, registers customers, or logs complaints. It does not do much summarization, aggregation, or dashboard reporting the way a data warehouse does. It is a rigid system. It works with transactions where necessary. It repeats the same procedures over and over to complete the same business processes. Operational systems also keep historical records, but for a different purpose than the data warehouse: these records are used for validation and recovery, not for summarization. The data warehouse is where the historical data is reorganized, conformed, and summarized, and it is used in a more dynamic way. I admit that a data warehouse must also present many similar, recurring reports. But the power of the data warehouse is that, when users come up with new questions, it can answer them easily. When the same thing happens to an operational system, it requires a redevelopment of the platform, which is very expensive.
So, what is the role of a data warehouse in an enterprise? There are several answers.
a. It should make all the data "easily accessible," which means a data warehouse is not another castle in the enterprise that everybody must work hard to learn and understand. It should be easily understood and used by most business folks.
b. It is a place to keep all the data consistent. Nowadays, people are talking about "one version of the truth." The data warehouse is definitely the best place to keep the truth of all the data.
c. The data warehouse should be ready for any changes. That is also one big difference between the data warehouse and operational systems.
d. To quote the book: "An organization's informational crown jewels are stored in data warehouse." The data warehouse must effectively control access to the confidential information of the enterprise.
e. The data warehouse must ensure the correctness and completeness of the information it contains in order to serve the decision-making of the enterprise.
f. Once created, the data warehouse must be used. So the data warehouse must be widely accepted by the business community from the beginning of its life cycle.
So, to build a successful data warehouse, your team must draw on knowledge and skills from both the DBA side and the MBA side.
We can think of a data warehouse publishing data as an editor publishing a magazine. You need to talk to the audience, understand their interests, and provide the right information at the right time. You regularly have to go to people to collect new requests for improvement. You need to find new sources of information. You need to develop and manage a good network of people working with the magazine. You need to keep all relevant people (especially those on the business side) happy. Data warehousing is about publishing the right data at the right time to the right people.
So, let us begin to talk about the data warehouse itself (I mean seriously: not the things around the data warehouse, but the things right next to and inside it). What are the parts that make up a complete data warehouse environment? There are four distinct types of components: source systems, the data staging area, the data presentation area, and data access tools.
Let us discuss each in turn.
1. Source systems. What can be a source system? I would say any application that delivers data to the data warehouse. Is a source system the same as an operational system? The answer may be “no,” because nowadays people are talking about collecting data from the Internet or from certain ERP and CRM systems (which can be regarded as a kind of BI appliance) into the data warehouse. But most source systems are operational ones. If an enterprise makes the effort to do EAI (enterprise application integration), which means these systems are re-engineered to have a consistent view of processes, functionality, data, or all of them, the data warehousing task becomes much easier.
2. Data staging area. I very much like the example provided in the book: imagine the data staging area as the kitchen of a restaurant. One thing to keep in mind is that this is both a storage area and an area of many processes. By “processes” I mean those that do the extract-transform-load (ETL) jobs. It is, as described in the book, everything between the source systems and the data presentation area. One thing that normally should not happen is to let the customers come into the kitchen or eat directly in the kitchen. That’s a rule of thumb. From the book’s point of view, those in industry who talk about an “enterprise data warehouse” are actually talking about something in the staging area. A more general meaning of enterprise data warehouse includes both the staging area and the presentation area, and perhaps also the source systems.
3. Data presentation area. Using the dimensional model here feels natural because people tend to think of it as a very simple way of understanding things. So you have to accept that dimensional modeling is successful and has to be the only modeling approach used in the data presentation area. Regarding the data marts in the presentation area, I would agree that the “bus” structure is a natural choice when different BI applications make it a great challenge to conform everything into single dimensions and facts. However, there has to be a place where the “one version of the truth” of the data is kept. If you cannot hold it in the data presentation area, then it has to be in the data staging area. That is perhaps why a lot of people nowadays are talking about establishing a 3NF relational data model (RDM) at an EDW before everything is mapped into dimensional models. As we all know, the star schema is very widely used to support the dimensional model in a relational database. Another way of building the presentation area is the multidimensional database, or OLAP. I believe that, in the not-too-distant future, OLAP applications will become mature enough that the “star-schema” era will be over very soon.
4. Data access tools. These are normally called BI applications nowadays. Excel and front-end query tools such as SQL Server Query Analyzer are also data access tools.
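To make the four components a bit more concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the source rows, the function names (`extract`, `transform`, `load`, `sales_by_customer`), and the cleaning rules are all hypothetical and do not come from the book.

```python
from collections import defaultdict

# Hypothetical rows as an operational order system might deliver them.
ORDER_SYSTEM = [
    {"order_id": 1, "cust": " alice ", "amount": "120.50"},
    {"order_id": 2, "cust": "BOB", "amount": "80.00"},
    {"order_id": 3, "cust": "Alice", "amount": "19.50"},
]

def extract(source):
    """Staging, step 1: copy raw rows out of the source system."""
    return [dict(row) for row in source]

def transform(rows):
    """Staging, step 2: trim and unify customer names, convert amounts to numbers."""
    return [
        {
            "order_id": row["order_id"],
            "customer": row["cust"].strip().title(),
            "amount": float(row["amount"]),
        }
        for row in rows
    ]

def load(rows, presentation_area):
    """Staging, step 3: publish the conformed rows to the presentation area."""
    presentation_area.extend(rows)

def sales_by_customer(presentation_area):
    """Stand-in for a data access tool: one simple summary query."""
    totals = defaultdict(float)
    for row in presentation_area:
        totals[row["customer"]] += row["amount"]
    return dict(totals)

presentation_area = []
load(transform(extract(ORDER_SYSTEM)), presentation_area)
print(sales_by_customer(presentation_area))  # {'Alice': 140.0, 'Bob': 80.0}
```

Note that the "customer" (the query function) only ever touches the presentation area, never the raw rows in the "kitchen."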
Are there other possible components? Yes, according to the current trend. Examples include the metadata repository, the ODS, and data quality tools.
Regarding metadata, the whole data warehousing industry has reached the stage of realizing its importance, and quite a few tools are emerging to support a centralized, unified metadata framework. Most people knew that metadata is of great importance but did not realize how bad things could get without doing anything about it. That is why a lot of enterprises are eagerly looking for a metadata solution. To work out metadata management, the first thing is the scope, i.e., how much you want the metadata framework to cover. You need a complete conceptual framework to support the implementation of the metadata system. How do you define the scope of the metadata framework? That depends on your ambition and budget. Do you want a framework that scales for the next 20 years? Or just something that can help you over the next 5 years? Not a lot of people are mentally strong enough to support a very long-term metadata framework. At the least, you need to figure out the relevant metadata around the data warehouse, and to corral, catalog, and integrate these varieties of metadata. The result is like a resource library. Compared to dimensional data modeling, the metadata model is much more complicated.
Next, the ODS, or operational data store. The ODS normally sits in the data staging area. It is updated frequently, and it is a place where the data from source systems is somewhat integrated. It is a database with a 3NF design. Some time ago, I was confused by the ODS and the “enterprise data warehouse” that I described above. In fact, if you have a 3NF relational data model at the EDW, this EDW is actually part of the data staging area and is, to some extent, equivalent to the ODS. The difference between the ODS and the “EDW” is where you put the many business calculations defined by the BI applications. There should not be many such calculations between the source systems and the ODS. People normally need an ODS because they need some kind of immediate report that integrates a lot of data from different source systems. Real-time BI may be a reason to have an ODS. There is a trend toward treating the ODS as a specially administered part of the conventional data warehouse.
Vocabulary of dimensional modeling
A fact table is where the numerical performance measurements of the business are stored. Normally measurement data is the largest portion of any data mart. What is important when defining a fact table is the grain of the data: it must be consistent with the grain of all the relevant dimension tables. What can make a fact table difficult to build is deciding which columns belong in it. There are additive facts and semi-additive facts, which can be difficult to place in either a fact table or a dimension table. It is also possible for a fact to be textual, but developers normally try to put such values into dimension tables.
Dimension tables contain the descriptions of the business. Each dimension table may have more than 50 attributes. These attributes are used to divide, group, and merge different business entities, which are then joined with the fact tables to produce a business-oriented calculation. The result of such a calculation may lead to a report, a performance report, a business decision, or a new business product.
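A small Python sketch may help with the difference between additive and semi-additive facts. The account rows and function names below are made up for illustration. The grain is assumed to be "one account per day." Deposits can be summed across every dimension (additive), while a balance can be summed across accounts for one day but not across days (semi-additive).

```python
# Hypothetical fact rows at the grain "one account per day".
fact_rows = [
    {"day": "2008-03-01", "account": "A", "deposits": 100, "balance": 100},
    {"day": "2008-03-02", "account": "A", "deposits": 50, "balance": 150},
    {"day": "2008-03-01", "account": "B", "deposits": 300, "balance": 300},
    {"day": "2008-03-02", "account": "B", "deposits": 0, "balance": 250},
]

def total_deposits(rows):
    """Additive fact: summing over all days and all accounts is meaningful."""
    return sum(row["deposits"] for row in rows)

def total_balance_on(rows, day):
    """Semi-additive fact: sum across accounts, but only within a single day."""
    return sum(row["balance"] for row in rows if row["day"] == day)

print(total_deposits(fact_rows))                  # 450
print(total_balance_on(fact_rows, "2008-03-02"))  # 400
# Summing balances over both days gives 800, a number with no business meaning.
print(sum(row["balance"] for row in fact_rows))   # 800
```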
There are a few issues that always surround dimension table design.
a) Operational codes (suffixes, etc.).
b) The hierarchy behind a business area, such as family relations or organization relations, is normally stored in the dimension table (although it brings redundancy).
c) A snowflake design may make the business understanding a bit clearer, but it is definitely a performance disaster.
d) The granularity of the data must be clarified before the data model is designed. Normally people go for the finest grain possible.
e) There is also an issue with time and updates, i.e., the so-called slowly changing dimension (SCD) problem.
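The SCD problem is easier to see in code. The sketch below shows one common way to handle it, usually called a "Type 2" change: instead of overwriting an attribute, the current row is closed and a new row with a new surrogate key is added. My notes above do not name any particular technique, so treat this as an assumption. The column names (`customer_key`, `customer_code`, `valid_from`, `valid_to`) and the function `apply_change` are hypothetical.

```python
from datetime import date

def apply_change(dimension, customer_code, new_attrs, change_date):
    """Keep history by closing the current row and appending a new version."""
    current = next(
        (row for row in dimension
         if row["customer_code"] == customer_code and row["valid_to"] is None),
        None,
    )
    if current is not None:
        if all(current[key] == value for key, value in new_attrs.items()):
            return  # nothing changed, keep the current row as it is
        current["valid_to"] = change_date  # close the old version
    next_key = max((row["customer_key"] for row in dimension), default=0) + 1
    dimension.append({
        "customer_key": next_key,        # surrogate key: new for every version
        "customer_code": customer_code,  # operational code: stable across versions
        **new_attrs,
        "valid_from": change_date,
        "valid_to": None,                # None marks the current version
    })

customer_dim = []
apply_change(customer_dim, "C-001", {"city": "Copenhagen"}, date(2008, 1, 1))
apply_change(customer_dim, "C-001", {"city": "Aarhus"}, date(2008, 3, 17))
apply_change(customer_dim, "C-001", {"city": "Aarhus"}, date(2008, 3, 18))  # no change

for row in customer_dim:
    print(row["customer_key"], row["city"], row["valid_from"], row["valid_to"])
```

After these calls the dimension holds two rows for the same operational code: the closed Copenhagen version and the current Aarhus version. Fact rows loaded before the move keep pointing at the old surrogate key, which is what preserves the history.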
The dimensional model, compared to an ER diagram, may be simpler. Normally an ER diagram combines and conforms many business scenarios together. When building a dimensional model from ER diagrams, the many-to-many relationships that contain numeric, additive non-key fields are mostly facts and should be put into fact tables. The remaining tables should then be denormalized into flat tables, which become the dimension tables.
There are quite a few things to be careful about when doing dimensional modeling in industry.
1. The data in the dimensional model should be at the finest grain, not summarized. And the dimensional model should hold the data of the whole history.
2. One should be very careful about how to scope the dimensional models. The scope can be departmental or process-oriented (i.e., one model for one business process). Some people support the spider-web (or hub-spoke) approach, in which you always have a central place that feeds data to all the data marts, but Kimball seems to disagree with this idea and supports multiple feeds. I would agree with the hub-spoke way because, from a senior management point of view, it is very important to have an easy way of managing the data flow, and the “bus” architecture seems to bring many more correlations and connections than the “hub-spoke” one. And it is better to use “hub-spoke” if you are serious about “one version of the truth.”
3. The dimensional model can be applied to many different industries.
4. It is better to know the usage pattern when you are designing a dimensional model, but it is not compulsory. A dimensional model can still be successful without knowing exactly how it is going to be used at the beginning of the design. In any case, the designer can coach the users.
5. Focus more on business requirements and goals than on technologies when developing a data warehouse, as they are more important.
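As an illustration of how flat dimension tables and a fact table work together, here is a tiny star schema built with Python's standard `sqlite3` module. The table names, columns, and sample rows are all invented for this sketch; the grain is assumed to be "one product per day."

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Flat, denormalized dimension tables (hypothetical columns).
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, full_date TEXT, month TEXT);
-- Fact table: foreign keys to the dimensions plus numeric, additive measures.
CREATE TABLE fact_sales (
    date_key INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    quantity INTEGER,
    amount REAL
);
INSERT INTO dim_product VALUES
    (1, 'Espresso', 'Coffee'), (2, 'Latte', 'Coffee'), (3, 'Green tea', 'Tea');
INSERT INTO dim_date VALUES
    (20080317, '2008-03-17', '2008-03'), (20080318, '2008-03-18', '2008-03');
INSERT INTO fact_sales VALUES
    (20080317, 1, 2, 6.0),
    (20080317, 3, 1, 2.5),
    (20080318, 2, 3, 12.0),
    (20080318, 1, 1, 3.0);
""")

# Dimension attributes group and filter; the fact table supplies the numbers.
rows = conn.execute("""
    SELECT p.category, d.month, SUM(f.amount) AS total_amount
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    JOIN dim_date AS d ON d.date_key = f.date_key
    GROUP BY p.category, d.month
    ORDER BY p.category
""").fetchall()
print(rows)  # [('Coffee', '2008-03', 21.0), ('Tea', '2008-03', 2.5)]
```

Every query against such a schema has the same shape: join the fact table to the dimensions it needs, group by dimension attributes, and aggregate the facts.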
6. Make sure that you include influential, accessible, and reasonable business sponsors in the project.
7. It may be more useful to consider an iterative project process to develop a data mart.
8. Make sure that the project considers the data presentation area as important as the staging area, and allocate as much time to the data presentation area as you can.
9. Be aware that many requirements and analytics around the data warehouse change over time.
10. The success of a data warehouse is ultimately decided by the users.
This is the end of my notes on Ch. 1.
Monday, March 17, 2008
What's new in week 12, 2008
Monday, 17-03-2008, Copenhagen
Never heard of Kyte.tv? It is a video streaming service that lets users upload video from their mobile phones or webcams to be distributed on the web through Kyte channels. I wonder what the difference between Kyte.tv and YouTube is. Perhaps Kyte is more focused on the "channels," which means the company tries to connect its channels to more customers; once it has a lot of people's attention, it can attract a lot of businesses to distribute their content through these channels. YouTube can be used for B2B, B2C, and C2C (and C2C is probably the most widely used). Kyte is more likely to be focused on B2C.
What happens when AOL gets Bebo? Bebo is a social networking site with 40 million users. It has a heavy presence in the UK, Ireland, and New Zealand. Bebo can be counted among the top three most popular social networking sites in the United States, behind MySpace and Facebook. There are so many social networking sites. Which ones will stay successful? From a user's point of view, do I need an account on every one of these sites to keep up with all my friends (as they may have different beliefs or tastes regarding social networks and may choose different sites)? There are two ways this could go. First, in the near future these networks may decide to open up to each other in certain ways so that users do not have to remember so many accounts and passwords. It is as if you had both an ICQ and an MSN account and could talk to your MSN friends from your ICQ software. Second, there may be a kind of "mediator" site that helps people connect to all these social networks in an easy way. It is as if there were a piece of software that, once you log on (yes, you may need yet another account), automatically logs you into both ICQ and MSN so that you do not feel any difference talking to friends in either world.
Hulu.com, a legal and commercial media service on the web? As claimed, Hulu.com is a legal site where people can watch and share TV shows and full-length movies. It is backed by NBC and News Corp and is open only to US users for now. The player is built on Flash technology so that it can run on multiple platforms. There is no charge to use Hulu, but users have to watch commercials at a certain frequency, and there are small banners running across the bottom of the screen. It is a success story (well, maybe not yet) for the many P2P and online media services in the 'dark world' to read.
Google and Postini. It was last summer that Google bought Postini to bring much-needed security to Google Apps. Postini provides message security, archiving, encryption, and policy enforcement services for hosted software. Well, we used to receive security attacks (I mean viruses, spam, etc.) on our PCs. Later there were many attacks on web servers. As SaaS becomes more and more popular, I guess the next generation of security attacks will target hosted applications. And yes, the attacks will be more and more clever.
Tuesday, 18-03-2008, Copenhagen
Do we need to migrate to Vista just because it is a newer version of the OS? If you are at an enterprise and you think of all your PCs: no! Not now. For many enterprises, such an OS upgrade will only happen when the hardware needs to be upgraded. A direct OS upgrade on established hardware occurs only if the new OS brings a lot more performance, usability, and reliability benefits, as with Win95 to Win98. Microsoft has said it will end mainstream support for Windows XP in April 2009 and extended support in 2014.
MS vs. Google in the online market? I guess the only winner will be the customers. MS recently bought Rapt, a company that makes advertising yield management software for online publishers. If we think of MS's recent moves to acquire Yahoo, it seems that MS is seriously looking for the next big piece of the cake. And Google is definitely uncomfortable with such a giant player in one of its dominant business areas.
What do we expect from 802.11n? Well, speeds four times faster than 802.11a/b/g and a longer connection range? It seems that WiFi is getting closer and closer to our lives. Can anybody doing medical research tell me whether WiFi is always safe for pregnant women and babies?
Talking about the information security of an enterprise, do you think authentication, authorization, and coding are enough? There is an ongoing paradigm shift in the security industry: people are thinking of an information-centric security model. One example is the prevention of information loss in an enterprise. A master data management (MDM) application can partially solve this issue, but a whole security framework (plus the privacy concerns) is more likely the ultimate solution in the long run.
MS released a new component library for financial services developers. Good news for developers in the banking and insurance industries! The component library seems to have a tight connection with both the Office tools and back-end systems such as SQL Server. It seems that this library is trying to package the IT developers’ efforts into a product that business developers can make use of. On the other hand, are there any jobs left for IT developers, or do they have to migrate to the business planet?
Sunday, 23-03-2008, Copenhagen
After a long Easter holiday, here is the latest info that I’ve got from my side.
There are stories of people building a “hot-pluggable architecture” for their BI infrastructure, which enables BI to succeed across quite a few legacy systems. It seems that this term in the BI and DW area has been put forward mainly by Oracle. The idea of this architecture is to allow an enterprise to mix and match existing and emerging technologies from a range of software vendors. Having a hot-pluggable architecture allows the enterprise to easily connect and extend its existing heterogeneous systems and to maximize the return on current and future IT investments. I think the basis of such an architecture is to enable and enrich the semantic layer of the enterprise information.
A lot of us are talking about cloud computing. Are we ready for an era in which we do not have to care about upgrading our software? It seems that SaaS is a trend with proven cases in industry. But the full era of cloud computing has to wait for a few important factors to mature. One important factor, for example, is the availability of Internet connections. And a lot of people have been worrying about Internet security since it first appeared. When security threats turn toward servers and the network, we will need to see enough effort to protect cloud computing technology against them. BTW, IBM has an initiative called “Blue Cloud” following this trend, and Google has already established Google Apps.
It seems that the “collaboration tool” is one of the most important technologies of 2008. For a big enterprise, such a tool allows employees to brainstorm, analyze, share work, and make decisions together. If the collaboration tool is equipped with unified communication capabilities and supported on various types of hardware such as mobile phones and PDAs, it is going to be the ultimate system that everybody will use.
Sunday, March 16, 2008
Note of Ch. 1 for "Beyond Software Architecture"
Chapter 1 [Software Architecture]
"The foundation of a winning solution lies in the architecture that creates and sustains it."
First topic, how to define software architecture
From my own point of view, software architecture, or system architecture, is about the "big picture" of the system. If we talk only about a software system, this big picture includes all non-trivial modules (or functionalities), processes, data, user interfaces (if necessary), and the relationships among these elements.
Software architecture can be described at multiple levels, depending on the level of detail required. It is just like looking at a picture. Standing far away, you get only an approximate view, and things become clearer and clearer as you get closer. However, there is always a point at which to say 'stop': if you move even closer, you will lose some parts of the picture. Fortunately, many people can work on one architecture.
That means someone will have the overview and someone else only needs to read part of the picture. That's the way a software team works.
Second topic, what is also important and related to software architecture.
Software architecture is not always a purely technical issue. It is often influenced by the 'people' and the 'business' around the big picture. For example, when decomposing a big system into sub-systems, it is sometimes necessary to consider how easy it will be to manage the dependencies among the development teams, if each team is in charge of one or a few sub-systems. Suppose there are sub-systems A and B, and teams T1 and T2. It is much better to let T1 (or T2) work only on A and the other team work only on B than to let T1 work on parts of both A and B and ask T2 to finish the rest of A and B.
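The A/B and T1/T2 example above can be checked mechanically. The following is only a minimal sketch: the function name `shared_subsystems` and the dictionary layout are my own assumptions for illustration, not something from the book. It lists every sub-system that more than one team has to touch, which is where the cross-team dependencies come from.

```python
from collections import defaultdict


def shared_subsystems(assignments):
    """Return sub-systems worked on by more than one team.

    `assignments` maps a team name to the set of sub-systems it works on
    (a hypothetical layout chosen for this sketch).
    """
    owners = defaultdict(set)
    for team, subsystems in assignments.items():
        for subsystem in subsystems:
            owners[subsystem].add(team)
    # Sort both keys and team names so the result is deterministic.
    return {s: sorted(owners[s]) for s in sorted(owners) if len(owners[s]) > 1}


# One team per sub-system: no shared ownership.
clean = {"T1": {"A"}, "T2": {"B"}}
# Both teams work on parts of both sub-systems.
tangled = {"T1": {"A", "B"}, "T2": {"A", "B"}}

print(shared_subsystems(clean))    # {}
print(shared_subsystems(tangled))  # {'A': ['T1', 'T2'], 'B': ['T1', 'T2']}
```

An empty result means each sub-system has a single owning team, which is the situation the paragraph above recommends.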
It is also important to design sub-systems by "putting different things into different groups." The problem is then how many groups we can find and perhaps predefine. It is a matter of experience.
An experienced architect may have the advantage of knowing all the necessary pieces that should be put into the architecture picture. Completeness is an important criterion for the success of an architecture.
As an architect, it is important to know when and how to 'give in.' In an IT organization, one often finds it difficult to agree with teammates on the architecture picture. What is important then is to make design decisions based solely on the needs of the customers, and to always try to keep an objective understanding of those needs. And be ready to change the design when the customers change their needs (of course, if we follow CMMI procedures, this means a lot of change management procedures).
Is an architecture good or bad? Normally you will not get a consistent answer if you change who you ask and when you ask. If you really do get a consistent answer, that means the architecture is indeed a masterpiece (and I personally do not believe that any masterpieces exist).
Third topic, why software architecture is important
Normally a good architecture is an important key to the long-term success of a system or a solution.
Architecture influences success in many ways.
1. Longevity. Normally an architecture stays longer than the team that develops it. A developer may be active on the same system for 2-4 years, but the system itself can stay in use for 10-12 years or more.
2. Stability. A stable architecture means that only a minimal amount of fundamental re-work is necessary. If a system's functionality is extended through regular release cycles, the necessary changes (or refactoring) and improvements are added to the system continuously, and the cost of change is minimized.
3. Nature of change. In principle, if a change will improve customer satisfaction or bring new features that can attract new customers, it is a good change. It is also a good idea to design at least parts of the architecture so that they can accommodate changes very easily. Such a design is like the 'plug-and-play' feature of current operating systems.
4. Profitability. The profitability of an architecture often depends more on the business than on the architecture itself. But in the long run, a simple and elegant architecture is the best 'return on investment' choice for any system.
5. Social structure. A good architecture works for the team that creates and maintains it. On the other hand, architecture also influences the structure, relation, and formation of the team.
6. Boundaries Defined. It is very important that people understand and agree on what the new things are. Boundary questions are inevitable. What is important is that these boundaries are thoroughly analyzed, documented, and managed (through a comprehensive configuration management method).
7. Sustainable, Unfair Advantage. A good architecture must have sophisticated, sustainable advantages so that it is very difficult for competitors to duplicate. Even if this is impossible to achieve, the architecture should be designed so that it provides a deep advantage for other ways of competing (e.g., many commercial ways). Normally, usability and performance are the two features that any new architecture must have.
8. About replacing an old architecture with a new one. It is a very difficult management decision to totally replace an old architecture with a brand new one. There are a few things to consider seriously.
i. If you can split off half of your current team and do the re-platforming in approximately one year, then it makes sense to do it.
ii. It is natural that the team on the old architecture will go through a certain amount of change. You may not need all the people from the old platform (maybe they are unwilling to learn new things, or not capable of it), but you do need one or two trusted veterans of its development to make certain that the new system faithfully implements the functionality of the old one. There will be skeletons, and you're going to need these veterans to know where the skeletons are buried.
iii. The rebuild of the architecture should of course 'be careful,' and it 'may take longer than expected.' Economically, a good rule of thumb is that it costs at least 20 percent of the total investment in the old architecture to create a functionally equivalent new one, if you are using experienced people, with no interruptions, and a radically improved development process.
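To make the 20 percent rule of thumb concrete, here is a small arithmetic sketch. The function name and the investment figure are hypothetical, picked only for illustration; the rule gives a lower bound under the favorable conditions listed above, not an estimate.

```python
def minimum_rebuild_cost(old_investment, fraction=0.20):
    """Lower bound on the cost of a functionally equivalent rebuild.

    `fraction` defaults to the 20 percent rule of thumb quoted above.
    """
    if old_investment < 0:
        raise ValueError("old_investment must be non-negative")
    return old_investment * fraction


# Hypothetical figure: total investment in the old architecture so far.
old_investment = 5_000_000
floor = minimum_rebuild_cost(old_investment)
print(f"Budget at least {floor:,.0f} for the rebuild")
```

Anything that breaks the stated conditions (less experienced people, interruptions, an unchanged development process) should push the real number above this floor.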
Fourth topic, what happens in the beginning of creating a new architecture.
Normally it starts from sketches on whiteboards or paper. The immature picture is then changed through an honest process in which the architecture team discusses all aspects and makes the necessary changes according to the feedback.
Sometimes people say that, once you have an architecture picture, you need to explore the alternatives. This is 'virtually useless.' There are three reasons.
1. You may not have the time or budget to 'explore the alternatives.'
2. The nature of the problem already limits the number of alternative choices. Sometimes it does make sense to explore alternative solutions for a specific, detailed part of the architecture, if the alternatives are radically different.
3. Even if you have the chance to look at the alternatives, do you know exactly what is 'good' or 'bad' in each picture? It takes years to establish an architecture, and only after the system has been running for a number of years can you really understand whether it was a 'good' choice. Sometimes it is necessary to kill a project, if you can spot serious problems that would lead to economic disaster for the system.
Fifth topic, patterns
There are quite a few works that seriously document architectural patterns. Sometimes the patterns that tell you an architecture is bad are more useful than the ones that tell you exactly what to do. Patterns are not beliefs, just what 'they' did before. Anyone can choose to start a new 'pattern' and try to be successful.
Sixth topic, about re-factoring, care and feeding of architecture
On many occasions the project team has to deliver the product on a tight schedule, so parts of the architecture are done in an "ugly" way. The team must then pre-schedule a "re-factoring" phase to improve the "ugly" implementation after the initial work is delivered.
Another message is that the features and capabilities of a solution must be managed.
An architecture is somewhat like a garden. You have to keep taking care of it from time to time. Otherwise, it is going to be ugly after a while.
Seventh topic, a few architectural principles
It is natural for most developers and architects, when designing an architecture, to try their best to explore all the alternative designs and give it their best shot after careful and thorough consideration. The following principles seem to be the best ones most of the time.
1. Encapsulation. This is a very well-known design principle. The basic idea is to group different functionalities into different "black boxes" (should I call them components?) so that the whole architecture is just these boxes plus the communication among them.
2. Interface. It is very important that the interfaces among different components are documented clearly and well.
3. Loose coupling. This means that the interconnectedness among different components should be reduced to a minimum. This makes changes easier, and it also improves the architecture if parallelism is used in the design.
4. Appropriate granularity. The architecture design cannot go into very fine detail. The finer the granularity, the less freedom the developer has, which makes the implementation more difficult.
5. High cohesion. This describes the pieces inside each component of an architecture. If all the elements are strongly related and contribute to a single task, the component is highly cohesive.
6. Parametrization. This is also related to components. Every component must define clear inputs and outputs.
7. Deferral. In many architecture design processes, business requirements force many design choices to be made ahead of time. This, of course, brings risk to the architecture and the solution. In general, it is a good idea to defer such choices until you are well prepared to make them.
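Several of the principles above can be seen together in a few lines of code. This is only a sketch with made-up names (`Storage`, `InMemoryStorage`, and `NoteService` are hypothetical and do not come from the book): an interface documents the contract, one component hides its internals behind it, and the other component receives its dependency as an input.

```python
from typing import Protocol


class Storage(Protocol):
    """Interface: the documented contract between the two components."""

    def save(self, key: str, value: str) -> None: ...

    def load(self, key: str) -> str: ...


class InMemoryStorage:
    """Encapsulation: the dictionary is a hidden detail of this black box."""

    def __init__(self) -> None:
        self._items = {}

    def save(self, key: str, value: str) -> None:
        self._items[key] = value

    def load(self, key: str) -> str:
        return self._items[key]


class NoteService:
    """Loose coupling: this component knows only the Storage interface.

    Parametrization: the concrete storage is a clearly defined input.
    High cohesion: every method here is about notes and nothing else.
    """

    def __init__(self, storage: Storage) -> None:
        self._storage = storage

    def add_note(self, title: str, body: str) -> None:
        # Normalize whitespace before handing the text to the storage box.
        self._storage.save(title, body.strip())

    def read_note(self, title: str) -> str:
        return self._storage.load(title)


service = NoteService(InMemoryStorage())
service.add_note("ch1", "  architecture is the big picture  ")
print(service.read_note("ch1"))
```

Because `NoteService` never names a concrete storage class, the choice between memory, file, or database storage can be deferred, which is one small way to practice the deferral principle as well.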
Eighth topic, how to let people understand the big picture
Not everyone in a development team can easily understand the picture you draw, because people have different roles, backgrounds, and knowledge. The best way is to prepare different pictures for different people and to make sure that these pictures are consistent. A very well-known method is the Rational "4+1" model. There are four views of an architecture.
i. Logical view. This is more or less a static snapshot of all the components of a system. This view is focused on functionality.
ii. Process view. If the product is just software, this view is about the concurrent processes. If the product is a combination of software and services, this view includes a lot of business processes and the underlying IT processes that enable them.
iii. Development view. This is a static presentation of how the functionality of the architecture is organized.
iv. Physical view. This is a more "physical" picture of the architecture (it includes the physical technology maps).
One more thing, the "+1" is all about use cases. You can add as many use cases as necessary to
Ninth topic, about the architecture team
The team is closely connected to the architecture, and this connection is natural. One thing to remember is that, in a large project, the initial team that creates the architecture has the most freedom to make choices, and the later teams do not.
Notes for reading "Beyond Software Architecture"
This is a wonderful book by Luke Hohmann.
I read some chapters a year ago and used them as foundations for several of my lectures at Aalborg University. Having been in industry for 6 months, I just found out that it is very beneficial to read the book again and write down notes based on my past and current experiences.
So, let us just start.