Friday, January 16, 2009

Extremely Poor ROI

This week I took a Project Management class at work on the same days I was watching Battlestar Galactica. It was an odd juxtaposition that eventually demanded my attention until I wrote this, a "project evaluation" for Battlestar Galactica. It's long, and extremely dull if you haven't (and probably even if you have) seen the new series' premiere. But here it is.

Colonial Fleet Modernization Upgrade – Project Post-Mortem
Major Richard Jolsen, QC Auditor, PMP

Overview: This project was intended to network all fleet assets together and upgrade their navigation systems, making it easier to share information and make navigation safer. On the surface, this project appears to have been a success. The vast majority of the fleet was networked and upgraded on time and on budget. That this project ultimately resulted in the destruction of nearly all of humanity suggests that the wrong metrics were used in gauging project success. This report will attempt to address what likely went wrong in each stage of the project in order to uncover “lessons learned” for future endeavors.

Stakeholder Identification: At first glance there were no defects at this stage. The appropriate fleet sponsors and stakeholders were identified. On further analysis, however, two important stakeholders were excluded in this phase: Humanity, and fleet personnel in general. Identifying these two important stakeholder groups and their need for security may have helped focus the project in directions that would have resulted in dramatically different results.

Scope Definition: The first specific project defects were introduced during this activity. For a military project it seems inconceivable that the Business Vision excluded the goal “Make Humanity Safer”. Indeed, this project seems to have suffered from the far-too-common problem of “Geek Technology Attraction”, more clearly defined as the need for the best and fanciest technology for the sake of technology. Networking the fleet and upgrading the navigation software in hindsight seem not to support the goal of keeping humanity safe at best, and run completely counter to that goal at worst.

The entire reason for the fleet’s existence is to protect humanity. The primary goal is to protect them from the Cylons. The secondary goal is protecting interplanetary commerce from the hazards of space travel. While the Colonial Fleet Modernization Upgrade (CFMU) Project would render non-trivial benefits toward the secondary goal, it runs distinctly contrary to success in the primary goal, based on known Cylon strategies and tactics.

Therefore it can only be assumed that while the fleet’s formal business vision has been to protect humanity from both Cylons and interstellar hazards, the long silence from the Cylons has resulted in an unofficial and unannounced alteration of that business vision. It’s as if the Cylons were dropped from the list of potential threats from which humanity must be protected, even while the Colonial Fleet continued to train first and foremost for combat against known Cylon weapons and tactics with only lesser preparation for rescue operations.

This defect in vision cascaded throughout the project, resulting in insufficient Business Goals and Objectives, inappropriate identification of Business Needs/Pain, and hence potentially harmful Project Objectives and Deliverables. In short, when someone suggested we needed to upgrade the fleet, no one asked the critical question of “Why?”

One possible explanation for this indefensible error is the recent discovery of Cylon infiltrator and sleeper units, who look and act like human beings and have been able to acquire significant positions within our society and fleet. That a Cylon agent reached a position high enough in the fleet to influence and authorize this otherwise-inexplicable deviation in Business Vision would explain much.

Risk Assessment: To the project’s credit, “The Cylons return and declare war” was identified as a project risk. It was defined as a Low-Probability, High-Impact risk. While the probability rating was, in hindsight, set too low, what is really inexcusable is the decision to simply accept the risk. Considering the original vision of the fleet—to defend against the Cylons—and the fact that it is the traditional role of the military to prepare for the unexpected, this risk should have been mitigated.

Similarly, the risks “Network attack” and “Catastrophic system failure” were identified, classified as Low-Probability/High-Impact, and simply accepted. This is also inexcusable. Even ignoring the Cylon threat and their known tactics of attacking systems and networks as a possibility, hackers and malware developers are a known threat in both the civilian and military world. While it has been determined that the Cylons had inside help in attacking the networks, the fact that we had no manual overrides available on our vessels is simply inexcusable. To make our fleet assets completely reliant on networked computer systems and leave them no way to operate should those systems fail is astonishingly negligent.

Risks that were not identified were “Lack of security surrounding project deliverables”, and “Outsourcing of critical systems”. That outside experts were brought into the project is understandable. That they were allowed to develop the entire system outside of the military security structure is astonishing. Gaius Baltar had security clearance, but he was allowed to bring outside consultants onto the project without clearing it with anyone else. This was not a fast-food ordering system we were building. This was a critical component in our entire defense establishment. There are fast-food companies with tighter project security.

Estimating & Scheduling, Tracking, and Communications: Once the project reached this stage everything progressed quite smoothly. The catastrophic defects were already in place and were simply allowed to propagate as the project executed on schedule and on budget. There is little of note here.

Quality: It would be stating the obvious at this point to suggest that quality was not achieved in this project. There are no adjectives to describe the level of failure this project represents. Here too there were serious oversights that, had they been caught, might have resulted in much different outcomes than we currently experience.

The first error was in code review. No one reviewed Gaius Baltar’s code. If they did, they did not ask him to explain the unidentified sections that proved to be the “back door” the Cylons used to cripple our fleet. While Mr. Baltar is arguably a genius, he would have been using known coding languages, and our people should have been able to figure out everything his code would do. If not, it would have been within the rights of our contract to make him explain it, as our IT team would be responsible for supporting his code. The fact that Gaius Baltar either did not notice this code himself or did not bring it to our attention at best makes him unreliable and a security risk for the future, and at worst makes him a willful collaborator and a traitor.

The other error was in reviewing the system design. Networks and navigation software are limited systems and should have been logically and perhaps physically separated from other systems. Weapons and Propulsion systems, for example, should have no need of navigation inputs and should not need to communicate with other vessels. That the navigation system was designed to interface with every other system, both critical and non, should have been easy to notice and questioned immediately. It’s almost as if Baltar were told “Go and create us a system. Let us know when you think it’s ready and we’ll implement it sight unseen.” Best practices mandate that developers not test their own code and that each deliverable in a project undergo some level of QA and QC evaluation. These seem not to have been done.

Implementation: The happiest of failures in this project occurred in implementation, proving that occasionally two wrongs do make a right. Commander Adama of the Battlestar Galactica failed to implement both systems. Citing the fact that his vessel was due to be decommissioned soon, and through dogged stubbornness, he avoided compliance. Under normal circumstances this would be considered a failure. The project objective was 100% compliance, not 99.99%, after all.

Commander Adama has a long history of avoiding upgrades to his vessel. The latest of the late adopters, he should have been specifically identified as an implementation risk and specifically ordered to install the upgrades. He was not, and he did not. In this case the “project saboteur” was correct, and only through his intense opposition can this project be considered even a minute fraction short of a complete and utter failure. Indeed, it is because of Commander Adama that this post-mortem is even possible.

Recommendations: As indicated above, the chief defect in this project was a faulty Business Vision. Had we not lost sight of that vision, many errors in this project would have been avoided before they were even a possibility. The Colonial Fleet simply forgot who they were and what their job was. This simple inattention to detail and failure to adhere to the mission resulted in a cascading failure of unprecedented magnitude.

Even so, this disaster might have been avoided had proper Quality Assurance practices been observed. The defects in the deliverables were in the open for anyone to see. No one looked. This cannot happen again. Any future projects simply must follow good Quality standards. We must test all deliverables as if our lives depended on them, for indeed they do.

Opportunities: With the notable exception of Gaius Baltar, the entire project team is dead. It is a regrettable outcome, but suitably just. We have the opportunity to build a project management culture from the ground up. Those who will participate in future projects will be painfully aware of the cost of failure for a good, long time. The motivation to “get things right” will be very high. On the other hand, we should also make every effort to ensure that no Cylon agents find their way into positions to define or implement the new and improved methodology we need.
