BI, SOA, and EDA

InfoQ recently posted an article on Business Intelligence (BI) and SOA, and it made some good points in the latter half. Hopefully readers get that far, because the opening discussion of ETL and data access initially gave me the impression that the article wasn’t going to reach the really important factors in the relationship between SOA and BI. It began the discussion by stating:

The road to BI usually starts with extract, transform, and load (ETL).

I don’t have any disagreement with that statement. However, the next part of the article goes on to talk about the potential problems SOA introduces for ETL. In essence, it states that because SOA is all about isolating internal data behind interfaces, it becomes problematic for BI systems rooted in ETL, which need to understand the data intimately. The article seems to suggest that companies may try to use SOA, rather than an ETL approach, to pull data into the data warehouse. This would be a mistake, in my opinion. If the existing ETL processes work just fine, why change them? Service usage is typically about transactional processing, not the bulk data movement associated with ETL.

Where the discussion gets interesting is when we start looking at the future of BI. There are two areas for improvement. The first is the timeliness of the information: BI that depends on ETL is typically refreshed by a scheduled job that runs each night, so the intelligence is only as current as the last run. The second is the information itself.

To address the first area, a move toward an event-driven architecture for pushing information into the business intelligence system is necessary. The article does a very good job in addressing this, and even correctly calls out that advancing technologies in event stream processing such as CEP (complex event processing) will play a key role in this.
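To make the CEP idea a little more concrete, here is a minimal sketch of one kind of rule such an engine might evaluate: fire when a given number of events of one type arrive within a sliding time window. This is not taken from the InfoQ article or from any CEP product. The names Event and WindowRule, the "order_failed" event type, and the thresholds are all hypothetical, and the sketch assumes events arrive in timestamp order.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str         # hypothetical event type, e.g. "order_failed"
    timestamp: float  # seconds; events are assumed to arrive in time order


class WindowRule:
    """Fires when `threshold` events of one kind fall within `window` seconds."""

    def __init__(self, kind: str, threshold: int, window: float):
        self.kind = kind
        self.threshold = threshold
        self.window = window
        self._times = deque()  # timestamps of matching events still in the window

    def on_event(self, event: Event) -> bool:
        if event.kind != self.kind:
            return False
        self._times.append(event.timestamp)
        # Drop timestamps that have fallen out of the sliding window.
        while self._times and event.timestamp - self._times[0] > self.window:
            self._times.popleft()
        return len(self._times) >= self.threshold


# Hypothetical stream: three failures inside one minute, then a late one.
rule = WindowRule(kind="order_failed", threshold=3, window=60.0)
stream = [
    Event("order_failed", 0.0),
    Event("order_placed", 5.0),
    Event("order_failed", 20.0),
    Event("order_failed", 45.0),
    Event("order_failed", 200.0),
]
alerts = [e.timestamp for e in stream if rule.on_event(e)]
```

The point of the sketch is that the rule is evaluated as each event arrives, so the BI side learns about the pattern when it happens rather than after the next nightly load.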

The second area is not as clearly addressed in the article. I am by no means an ETL and BI expert, but my limited experience with it indicates that it is largely based on the end results of processing. That is, any given processing chain has a node that represents the endpoint where that chain ends. Users typically begin the chain, and some resource (relational database, content management system, file system) typically ends it, if there’s any permanent record associated with the processing. Graphically, it looks like this:


Today, BI and its associated ETL jobs may focus only on relational databases that make a permanent record of a transaction. That’s only one resource in the chain. What about all of the intermediate steps along the way? Think of what Amazon does by looking not only at your purchase history but also at the things you (and others) browse for, whether or not you actually purchase them. Arguably, this gives a far better picture of your interests than your orders do.

This is where I feel SOA can really provide benefits. Adopting SOA should result in more individual components for any given solution, but by accessing those components in a standards-based manner, we’re still able to manage the associated increase in complexity. If we have more components, we have greater visibility into the information flowing in and out. Employ some standard service management technology, and you can easily capture the message flows as they occur and store them centrally for extraction into your BI environment. Or, if you’re ready to do it all in real time, those same tools can simply publish those service messages out to an event bus for real-time extraction into your BI environment.

This is where I think the potential lies. It’s not about getting at the same old information in an easier way; it’s about getting new information that can yield better intelligence. Consistency of service schemas is important to SOA, and if it is done right, incorporating this wealth of information sources into your BI system shouldn’t become a complex nightmare. This is yet another example of how we need to think outside the box and realize that information can always be used in novel ways beyond the immediate consumer-provider interaction that is driven by the project at hand.
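As a rough illustration of publishing service messages to an event bus, here is a minimal in-process sketch. Everything in it is hypothetical: EventBus is a toy stand-in for real messaging middleware, the monitored decorator stands in for a service management intermediary, and the service names, topic name, and message fields are invented for the example.

```python
from collections import Counter, defaultdict


class EventBus:
    """Toy in-process publish/subscribe bus (stand-in for real middleware)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        for handler in self._subscribers[topic]:
            handler(message)


def monitored(bus, service_name):
    """Hypothetical interceptor: copies each request/response onto the bus."""
    def wrap(func):
        def inner(request):
            response = func(request)
            bus.publish("service.messages", {
                "service": service_name,
                "request": request,
                "response": response,
            })
            return response
        return inner
    return wrap


bus = EventBus()
interest = Counter()  # BI-side tally keyed by (service, product)


def record_interest(message):
    # The BI subscriber sees every service interaction, not just stored orders.
    interest[(message["service"], message["request"]["product"])] += 1


bus.subscribe("service.messages", record_interest)


@monitored(bus, "catalog.browse")
def browse_product(request):
    return {"status": "ok"}  # nothing is persisted for a browse


@monitored(bus, "orders.place")
def place_order(request):
    return {"status": "accepted"}  # only this call would reach the order database


browse_product({"customer": "c1", "product": "camera"})
browse_product({"customer": "c1", "product": "tripod"})
browse_product({"customer": "c1", "product": "camera"})
place_order({"customer": "c1", "product": "tripod"})
```

An ETL job reading only the order database would see a single tripod purchase. The subscriber on the bus also sees two camera views that never became an order, which is the kind of new information described above. The sketch also relies on every message carrying a "product" field in the same place, which is the schema consistency point in miniature.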



This blog represents my own personal views, and not those of my employer or any third party. Any use of the material in articles, whitepapers, blogs, etc. must be attributed to me alone without any reference to my employer. Use of my employer’s name is NOT authorized.