Archive for March, 2007

Teaming Up for SOA

I recently “teamed up” with Phil Windley. He interviewed me for his latest InfoWorld article on SOA Governance, which is now available online and is in the March 5th print issue. Give it a read and let me know what you think. I think Phil did a great job in articulating a lot of the governance challenges that organizations run into. Of the areas where I was quoted, the one that I think is a significant culture change is the funding challenge. It’s not just about getting funding for shared services, which is a challenge on its own. It’s also a challenge of changing the way that organizations make decisions so that architectural elements are part of the decision. Many organizations that I have dealt with tend to be schedule driven. That is, the least flexible element of the project is schedule. Conversely, the thing that always gives is scope. Unfortunately, it’s not usually visible scope; it’s usually the difference between taking the quickest path (tactical) and the best path (strategic). If you’re one of many organizations trying to do grass roots SOA, this type of IT governance makes life very difficult, as the culture rewards schedule success, not architectural success. It’s a big culture shift. Does your Chief Architect have a seat at the IT Governance table?

Anyway, I hope you enjoy the article. Feel free to post your questions here, and I’d be happy to follow up.

IT in homes, schools

I’ve had some lightweight posts on SOA for the home in the past, and for whatever reason, it seems to be tied to listening to IT Conversations. Well, it’s happened again. In Phil and Scott’s discussion with Jon Udell, they lamented the problems of computers in the home. Phil discussed the issues he’s encountered with replacing servers in his house and moving from 32-bit to 64-bit servers (nearly everything had to be rebuilt, and he indicated that he would have been better off sticking with 32-bit servers). Jon and Phil both discussed some of the challenges that they’ve had in helping various relatives with technology.

It was a great conversation and made me think of a recent email exchange concerning my father-in-law’s school. He’s a grade school principal, and I built their web site for them several years ago. They host it themselves, and the computer teacher has done a great job in keeping it humming along. That being said, there’s still room for improvement. Many of the teachers still host their pages externally. My father-in-law sends a letter home with the kids each week that is a number of short paragraphs and items that have occurred throughout the week. Boy, that could easily be syndicated as a blog. Of course, that would require installing WordPress on the server, which while relatively easy for me, is something that could get quite frustrating for someone not used to operating at the command line. Anyway, the email conversation was about upgrading the server. One of the topics that came up was hosting email ourselves. Now, while it’s very easy to set up a mail server, the real concern here comes up with reliability. People aren’t going to be happy if they can’t get to their email. Even if we just look at the website, as it increasingly becomes part of the way the school communicates with the community, it starts to become critical.

When I was working in an enterprise, redundancy was the norm. We had load balancers and failover capabilities. How many people have a hardware load balancer at home? I don’t. You may have a Linux box that does this, but it’s still a single point of failure. A search at Amazon really didn’t turn up too many options for the consumer, or even a cash-strapped school for that matter. This really brings up something that will become an increasing concern as we march toward a day where connectivity is ubiquitous. Vendors are talking about the home server, but when corporations have entire staffs dedicated to keeping those same technologies running, how on earth are we going to expect Mom and Pop in Smalltown U.S.A. to be able to handle the problems that will occur?

Think about this. Today, I would argue that most households still have normal phones and answering machines. Why don’t we have the email equivalent? Wouldn’t it be great if I could purchase a $100 device that I just plug into my network and now have my own email server? Yes, it would be okay if I had to call my Internet provider and say, “please associate this with biske.com” just as I must do when I establish a phone line. What do I do, however, if that device breaks? What if it gets hacked and becomes a zombie device contributing to the flood of spam on the Internet? How about a device that enables me to share videos and pictures with friends and family? Again, while hosted solutions are nice, it would be far more convenient to merely pull them off the camcorder and digital camera and make it happen.

I fully believe that the right thing is to always have a mix of options. Some people will be fine with hosted solutions. Some people will want the control and power of being able to do it themselves, and there’s a marketplace for both. I get tired of these articles that say things like “hosted productivity apps will end the dominance of Microsoft Office.” Phooey. They won’t. It will evolve to somewhere in the middle, rather than one side or the other. Conversations like that are always like a pendulum, and the pendulum always swings back.

I’m off on a tangent here. Back to the topic: we are going to need order-of-magnitude improvements in how systems are managed today. Listen to the podcast, and hear the things that Jon and Phil, two leading technologists who are certainly capable of solving most any problem, lament. Phil gives the example of calls from his wife (I get them as well) that “this thing is broken.” While he immediately understands that there must be a way to fix it, because we understand the way computers operate behind the scenes, the average Joe does not. We’ve got a long way to go to get the ubiquity that we hope to achieve.

Starter SOA

Jeff Schneider has posted a series of entries on “Starter SOA” on his blog. The first deals with what he “believes is at the heart of the SOA issue.” It recommends attacking three specific areas in getting started: portfolio management, enterprise architecture, and information management. I think this is right on, for very straightforward reasons. First, portfolio management deals with what services should be created. If you don’t make any changes to this discipline, you’re simply going to get the same solutions you always have, except with some services thrown in. That’s not SOA. Secondly, enterprise architecture is the technical counterpart to the portfolio management side. While portfolio management is concerned about the business aspects of shared services, enterprise architecture needs to be concerned about the technical aspects of shared services. Finally, information management is the source of consistency across our services. If every service team defines its own service schemas, we really haven’t made things much better, as additional effort must now be made to mediate between the information models of every consumer and every service that must talk to each other. Get two or more services and consumers involved, and it simply increases in complexity.

In the next entry, Jeff discusses the fact that SOA will challenge the existing organizational structure. How are organizations supposed to address these challenges? He suggests forming an SOA Steering Committee. The committee consists of a cross-discipline team of people who are normally thinking in enterprise terms, rather than project-specific ones. Importantly, however, he emphasizes that this committee must interact with their project-specific counterparts. That is, the enterprise architect works with application architects. The portfolio analyst works with the project analyst. The PMO rep works with the project manager, and so on. An important aspect of this group is that they can make enterprise decisions as things progress with SOA. An enterprise architect trying to drive SOA on his or her own isn’t left trying to find an open ear when they determine that organizational change is needed, or that a project should be split into multiple projects.

In part 3 (I don’t know if he has more parts planned!), he gets into a more sensitive and difficult area: money. The most important thing that he introduces here is the simple notion that the funding model has to change. Where funding was previously all about getting the “application” completed, we now need models that fund shared items: shared services and shared infrastructure. This shouldn’t be new to organizations, as shared infrastructure is certainly something that they should be dealing with today. This now just extends it into the application development domain.

It’s good to get back to the basics every now and then. Those of us that are out there commenting on this on a regular basis can get into modes where the only other people who care about what we’re saying are other commentators, and not everyone is at that point.

Metrics, metrics, metrics

James McGovern threw me a bone in a recent post, and I’m more than happy to take it. In his post, “Why Enterprise Architects need to noodle metrics…” he asks:

Hopefully bloggers such as Robert McIlree, Scott Mark, Todd Biske and others would be willing to share not only successes within their own enterprise when it comes to metrics but also any unintended consequences in terms of collecting them.

I’m a big, big fan of instrumentation. One of the projects that I’m most proud of was when we built a custom application dashboard using JMX infrastructure (when JMX was in its infancy) for a pretty large web-based system. The people that used it really enjoyed the insight it gave them into the run-time operations of the system. I personally didn’t get to use it, as I was rolled onto another project, but the operations staff loved it. Interestingly, my first example of metrics being useful comes from that project, but not from the run-time management. It came from our automated build system. At the time, we had an independent contractor who was acting as a project management / technical architecture mentor. He would routinely visit the web page for the build management system and record the number of changed files for each build. This was a metric that the system captured for us, but no one paid much attention to it. He started posting graphs showing the number of changed files over time, and how we had spikes before every planned iteration release. He let us know that until those spikes disappeared, we weren’t going live. Regardless of the number of defects logged, the significant amount of change before a release was a red flag for risk. This message did two things. First, it kept people from working to a date, and got them to just focus on doing their work at an appropriate pace. Secondly, I do think it helped us release a more stable product. Fewer changes meant more time for integration testing within the iteration.
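As a rough illustration of the changed-files red flag, here is a minimal Python sketch. It is not the build system described above: the function names, the median baseline, the 3x threshold, the three-build window, and the sample counts are all assumptions made for the example.

```python
from statistics import median

def churn_spikes(changed_files, factor=3.0):
    """Return the indexes of builds whose changed-file count is more
    than `factor` times the median count (an assumed definition of a spike)."""
    if not changed_files:
        return []
    baseline = median(changed_files)
    return [i for i, n in enumerate(changed_files) if n > factor * baseline]

def safe_to_release(changed_files, window=3, factor=3.0):
    """True when none of the last `window` builds is a spike."""
    recent_start = len(changed_files) - window
    return all(i < recent_start for i in churn_spikes(changed_files, factor))

# Hypothetical changed-file counts for ten consecutive builds.
builds = [12, 9, 15, 11, 14, 10, 13, 58, 71, 12]
print(churn_spikes(builds))      # indexes of the spiking builds
print(safe_to_release(builds))   # spikes are still recent, so not yet
```

The point of gating on churn rather than on defect counts is the one made above: a burst of late change is a risk signal even when few defects have been logged.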

The second area where metrics have come into play was the initial use of Web Services. I had response time metrics on every single web service request in the system. This became valuable for many reasons. First, because the thing collecting the metrics was new infrastructure, everyone wanted to blame it when something went wrong. The metrics it collected easily showed that it wasn’t the source of any problem, and actually was a great tool in narrowing down where possible problems were. The frustration switched more to the systems that didn’t have these metrics available because they were big, black boxes. Secondly, we caught some rogue systems. A service that typically had 200,000 requests per day showed up on Monday with over 3 million. It turns out a debugging tool had been written by a project team, but that tool itself had a bug and started flooding the system with requests. Nothing broke, but had we not had these metrics and someone looking at them, it eventually would have caused problems. This could have gone undetected for weeks. Third, we saw trends. I looked for anything that was out of the norm, regardless of whether any user complained or any failures occurred. When the response time for a service had doubled over the course of two weeks, I asked questions because that shouldn’t happen. This exposed a memory leak that was fixed. When loads that had been stable for months started going up consistently for two weeks, I asked questions. A new marketing effort had been announced, resulting in increased activity for one service consumer. This marketing activity would have eventually resulted in loads that could have caused problems a couple months down the road, but we detected it early. An unintended consequence was a service that showed a 95% failure rate, yet no one was complaining. It turns out a SOAP fault was being used for a non-exceptional situation at the request of the consumer. The consuming app handled it fine, but the data said otherwise. Again, no problems in the system, but it did expose incorrect use of SOAP.
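The three kinds of checks described here (a sudden jump in volume, a slow drift in response time, and a suspicious failure rate) can be sketched in a few lines of Python. This is only an illustration under assumed thresholds; the function names, the 3x and 2x factors, and the sample numbers are hypothetical and not taken from the system discussed above.

```python
from statistics import mean

def volume_anomaly(daily_history, today, factor=3.0):
    """True when today's request count is more than `factor` times
    the average of the recent daily counts."""
    return today > factor * mean(daily_history)

def response_time_drift(daily_avg_ms, ratio=2.0):
    """True when the latest daily average response time is at least
    `ratio` times the first value in the window."""
    return daily_avg_ms[-1] >= ratio * daily_avg_ms[0]

def failure_rate(total_requests, faults):
    """Fraction of requests that returned a fault (0.0 when idle)."""
    return faults / total_requests if total_requests else 0.0

# A service that normally sees about 200,000 requests per day.
history = [200_000] * 7
print(volume_anomaly(history, 3_000_000))   # the rogue debugging tool
print(volume_anomaly(history, 210_000))     # an ordinary day

# Hypothetical daily averages over two weeks, roughly doubling.
two_weeks = [120, 125, 131, 140, 150, 158, 170,
             181, 195, 205, 214, 226, 238, 245]
print(response_time_drift(two_weeks))

print(failure_rate(1000, 950))              # the 95% "failure" case
```

None of these checks depends on a user complaining or a failure occurring, which is the point: they only flag what is out of the norm so that someone can ask questions.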

While these metrics may not all be pertinent to the EA, you really only know by looking at them. I’d much rather have an environment where metrics are universally available and individuals can tailor the reporting and views to the information they find pertinent. Humans are good at drawing correlations and detecting anomalies, but you need the data to do so. The collection of these metrics did not have any impact on the overall performance of the system, but it was architected to ensure that. Metric collection should be performed as an out-of-band operation. As far as the practice of EA is concerned, one metric that I’ve seen recommended is watching policy adherence and exception requests. If your rate of exception requests is not going down, you’re probably sitting off in an ivory tower somewhere. Exception requests shouldn’t be at zero either, however, because then no one is pushing the envelope. Strategic change shouldn’t solely come from EA, as sometimes the people in the trenches have more visibility into niche areas for improvement. Policy adherence is also needed to determine what policies are important. If there are policies out there that never even come up in a solution review, are they even needed?
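One common way to keep metric collection out-of-band is to have the request path hand each sample to an in-memory queue and let a background thread do the storing. The Python sketch below shows that pattern only; it is not the design of the system described above. The class name `OutOfBandRecorder`, the queue size, and the in-memory `samples` list (a stand-in for a real metrics store) are assumptions for illustration.

```python
import queue
import threading

class OutOfBandRecorder:
    """Accepts samples without blocking and stores them on a worker thread."""

    def __init__(self, maxsize=10_000):
        self._queue = queue.Queue(maxsize=maxsize)
        self.samples = []   # stand-in for a real metrics store
        self.dropped = 0    # approximate count of samples shed under load
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def record(self, service, elapsed_ms):
        """Called on the request path. Never waits: if the queue is
        full, the sample is dropped instead of slowing the request."""
        try:
            self._queue.put_nowait((service, elapsed_ms))
        except queue.Full:
            self.dropped += 1

    def _drain(self):
        """Runs on the worker thread and moves samples into the store."""
        while True:
            item = self._queue.get()
            if item is None:    # sentinel sent by close()
                break
            self.samples.append(item)

    def close(self):
        """Flush what is queued and stop the worker."""
        self._queue.put(None)
        self._worker.join()

recorder = OutOfBandRecorder()
recorder.record("quote", 42.0)
recorder.record("order", 17.5)
recorder.close()
print(recorder.samples)
```

The design choice worth noting is that the recorder prefers losing a sample to delaying a request, which is what keeps instrumentation from affecting the performance of the system it measures.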

The biggest risk I see with extensive instrumentation is not resource consumption. Architecting an instrumentation solution is not terribly difficult. The real risk is in not providing good analytics and reporting capabilities. It’s great to have the data, but if someone has to perform extracts to Excel or write their own SQL and graphing utilities, they can waste a lot of time that should be spent on other things. While access to the raw data lets you do any kind of analysis that you’d like, it can be a time-consuming exercise. It only gets worse when you show it to someone else, and they ask whether you can add this or that.

Disclaimer
This blog represents my own personal views, and not those of my employer or any third party. Any use of the material in articles, whitepapers, blogs, etc. must be attributed to me alone without any reference to my employer. Use of my employer’s name is NOT authorized.