A few years ago, I was speaking at a conference on the west coast. The attendees were mostly IT executives from startup companies, and it was a lively group, though some IT folks from more established companies were there as well. Afterwards, I sat beside one such chap at a sponsored evening at a nice restaurant. He was attending the conference because he wanted to build a new call center on the west coast for a hotel chain that was looking to expand into other regions, and we got talking. As it turns out, he had an interesting mainframe migration story to tell, and after a few drinks he gave me all the juicy details.
It started with me telling him what my company did: mainframe optimization. He replied, “Where were you guys three years ago?” He proceeded to tell me that they had recently moved off the mainframe. They had been running a 480 MSU mainframe system that managed their reservations, property management and payroll activities, with the reservation activity accounting for about 80% of their MSU usage. Their reservation application was a custom app integrated into their property management system, and it let them do some really intensive analysis of patterns within the year, room-night rates, and all kinds of other things.
Their biggest concern was that the cost-per-reservation of their current mainframe was creating a problem for them because of their size. He told me that they spent about $2500 per MIPS, but he suspected that the real cost was actually closer to $2800. Before I go on, I must be fair and state that this was at least a year before IBM came out with the zEnterprise BC12 and their improved pricing metrics. So at that time, he felt that increasing costs were forcing his hand, and that he had no option but to move to Unix boxes to help mitigate his costs. Another concern was that (at that time) mobile reservations were increasing sharply, and they didn’t have a mainframe solution to handle it. Of course, that’s all changed now, but it was an important deciding factor for migration at the time.
Before the migration, they were handling about 1.1M reservations per year, so each reservation cost them about $1.25. When you factor in cancellations and such, it was probably closer to $1.30. They saw themselves going through growth and really needed to upgrade their mainframe to keep pace. But with the pricing available at the time, coupled with their specific growth pattern, their costs were going to increase to about $1.50 or $1.55 per reservation, just based on the system upgrade.
It is important to note that he is an accomplished IT professional who really appreciated what his mainframe system could do in terms of throughput and reliability, but he also knew that (at that time) you needed to be running at least 2500 MIPS to run it effectively. He also knew that a machine of that capacity would have been serious overkill for their needs, even accounting for their growth projections.
So they made the decision to move to a distributed systems solution from a well-known reservation systems solutions provider, and began the migration process. The vendor’s solution was fairly close to what they needed to do, but not 100%, and did not provide all the custom room-night analysis capabilities that their mainframe application had. The vendor insisted that they could obtain the same data out of their solution, and use a spreadsheet to do the analysis.
When they did the math initially, it looked like the cost-per-reservation would actually go down to about $0.65, and likely even lower than that. The assumption was that they could enter into an agreement with a company that would supply them with servers during peak periods, and the cost would actually decline as they built up the number of room-nights resulting from their expected growth. The idea was that the cost would go down to about $0.50 per room-night at about 2M reservations per year, but because of some clustering issues it would actually jump back up again to about $0.65, and then decline again.
It all sounded pretty exciting and impressive, that is, until things started going wrong. The initial phase of the project took two years longer than planned. They spent $160K more with the vendor in professional services to fix “crap that the guys promised us would work anyway,” and they still could not get the reports to do what they were hoping for (and what they had running on their mainframe before), so for the most part they had no analytics.
That was just the start. There were several times throughout the project when he was certain that he was going to be fired. In one incident, the new system lost all their reservations for a six-week period during an upgrade, and they had to manually go back through reports to rebuild them, a process that took three IT people, working 65-hour weeks, more than three weeks to complete. But the worst incident occurred during Thanksgiving, an extremely busy time for the hotel chain, as families often book Thanksgiving reservations soon after Labour Day because they want to plan the next big get-together. Their new systems went down at the worst possible time, again losing reservation information, and the result was an unmitigated business disaster. Their results for the Thanksgiving period that year were only 8% of the previous year's: the system reported to potential customers that the hotels were full when they were not, so all those reservations were lost. This hit the corporate bottom line, and as a result 15 people were laid off at their head office, their largest single layoff in company history.
To justify the project, he felt compelled to promise the corporate management team that he could reduce the reservation cost from $1.50 to $0.65, an annual savings of nearly $950K. This was something the vendor had promised him, but the new system failed to deliver. Even so, the story had a reasonably happy ending; he didn’t lose his job, and he eventually solved most of his problems.
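The savings figure is easy to sanity-check. The short Python sketch below is not something from the project; it simply multiplies the per-reservation costs quoted above by the roughly 1.1M reservations per year they were handling before the migration. The function names are made up for illustration, and the assumption that reservation volume stays flat is mine.

```python
# Figures quoted in the story. Holding the reservation volume flat at the
# pre-migration level is an assumption made only for this back-of-envelope check.
RESERVATIONS_PER_YEAR = 1_100_000
COST_AFTER_UPGRADE = 1.50  # dollars per reservation, projected mainframe cost
COST_PROMISED = 0.65       # dollars per reservation, promised for the new system


def annual_cost(cost_per_reservation, reservations=RESERVATIONS_PER_YEAR):
    """Total yearly spend implied by a per-reservation cost."""
    return cost_per_reservation * reservations


def annual_savings(old_cost, new_cost, reservations=RESERVATIONS_PER_YEAR):
    """Yearly difference between two per-reservation costs at the same volume."""
    return annual_cost(old_cost, reservations) - annual_cost(new_cost, reservations)


if __name__ == "__main__":
    savings = annual_savings(COST_AFTER_UPGRADE, COST_PROMISED)
    print(f"Promised annual savings: ${savings:,.0f}")
```

At that volume the difference works out to roughly $935K a year, which is consistent with the "nearly $950K" he promised management.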
Ultimately, this story is about bad planning, perhaps on the part of the hotel tech team, but certainly on the part of the vendor. While it may be true that a hotel chain of this size might be better served by distributed systems (it certainly was true at the time), if the zEnterprise BC12 had been available then, along with today’s more favorable pricing, they would still be running on the mainframe today, and would have suffered none of the pain that they endured over the migration process.
At the time, I asked my colleague if I could write about this. He said that, given the issues he was having with the vendor, the serious problems experienced with the migration, and his personal involvement, he would have to say no, and I respected that. Now, years later, I am writing about it without naming him, his employer, or the vendor.
This is one of the reasons that you rarely read detailed accounts of failed IT projects, particularly mainframe migration projects. And as mainframe migration disaster stories go, this one isn’t really that bad. I’m sure that some of you have heard of the $100M motor vehicle licensing agency disaster, the $50M oil company disaster, the $100M retail chain disaster, and so on, but nobody really wants to put their name to them. And who can really blame them?