Friday, July 10, 2009

The unpredictable future and buying hardware...

At my previous client's site, they have some old Sun servers that they are upgrading to M5000s. The hardware they evaluated (and they did very rigorous testing of the hardware) was a T5240, a T5440 and an M5000. The M5000 was configured with 4 processors, not the full 8 that are possible, which is the subject of this post. When choosing the M-series server they decided to go with the M5000 because it would have 2 free slots, allowing them to add memory or processors later. In other words, they are trying to protect themselves against a CPU utilization problem down the road by having slots to put additional capacity in. I've been down this road a few times myself. I bought the 880s and 890s with 4 procs just in case we needed the other 4 down the road. Unfortunately, most of the time I never needed those slots and just wasted the rack space, power and cooling. In this client's case they should probably go with the M4000 instead.


On a list price comparison over five years you get:


M4000 $66,380 M5000 $81,880


Maintenance (numbers are swags for platinum pre-paid for 3 years):


M4000 $17,000 M5000 $22,000


I'm typically a Veritas user, so that adds complexity. Last time I looked (a year and a half ago) the M4000 was a tier E and the M5000 was a tier H. So SF for Oracle on each would be:


M4000 $4,000 M5000 $9,000


Veritas Maintenance would be somewhere around (swag, 3 years):


M4000 $2,400 M5000 $2,700


The rest is a bit of a wash, which brings us to:


M4000 $89,780 M5000 $115,580
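For anyone who wants to re-run this comparison with their own quotes, here is a rough sketch of the tally in Python. The dollar figures are the ones listed above (several of them swags); the dictionary layout and function names are just illustrative. Summing the line items exactly as listed gives $89,780 for the M4000 and $115,580 for the M5000.

```python
# Line items from the comparison above, in US dollars.
# Several of these are rough guesses, so treat the totals as approximate.
COSTS = {
    "M4000": {
        "hardware_list_price": 66_380,
        "hardware_maintenance_3yr": 17_000,
        "veritas_sf_license": 4_000,
        "veritas_maintenance_3yr": 2_400,
    },
    "M5000": {
        "hardware_list_price": 81_880,
        "hardware_maintenance_3yr": 22_000,
        "veritas_sf_license": 9_000,
        "veritas_maintenance_3yr": 2_700,
    },
}


def total_cost(model: str) -> int:
    """Sum every line item for one server model."""
    return sum(COSTS[model].values())


def premium_pct(base: str, other: str) -> float:
    """Percentage by which `other` costs more than `base`."""
    base_total = total_cost(base)
    return 100.0 * (total_cost(other) - base_total) / base_total


if __name__ == "__main__":
    for model in COSTS:
        print(f"{model}: ${total_cost(model):,}")
    print(f"M5000 premium over M4000: {premium_pct('M4000', 'M5000'):.1f}%")
```

Swapping in real quotes is just a matter of editing the dictionary; the premium calculation stays the same.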


So the M5000, which has two advantages, 4 internal drives (no value unless you're partitioning) and 2 expansion slots (potential future value), carries a price premium of roughly 29% over the M4000. The reason engineers like myself make choices like this is the unpredictable future. Oftentimes when I'm asked to spec out hardware for an application, I'm given initial requirements like 12,000 total users, 300 users concurrently, and, if I'm lucky, some information about the resource utilization associated with each user session. Most of the time it's a shot in the dark, however, and I have to dig around for similar usage profiles via Google and try to work that into my sizing model. But that's relatively straightforward. There's some art and finesse to it, but at the end of the day it usually comes down to some variation of the formula X sessions * Y MB per session + overhead + wiggle room = Z GB of memory. Same kind of thing for CPU and I/O.


Where it gets hard is when you have to forecast the life of the machine. You're forced to try to pick a machine that will meet the needs of not only year one but years two through four or five as well. When we ask the customer what their growth rate is, they'll usually shrug and give a non-answer. Or they'll give you an answer that's based directly on other non-knowable facts like "our user base will increase at the same percentage as our market share". Great. Thanks for that.


It's very tempting to just go out and buy the top-of-the-line server to ensure we never get a resource problem. Buy a tour bus when all we need is a passenger van. But when they see the sticker price of that tour bus, we're usually back to the drawing board. That's what makes machines like the M5000, or the 890 it replaced, so appealing. It has room for an extra row of seats in case the number of passengers increases drastically. Unfortunately you have to pay extra fuel costs to haul that extra space around (maintenance), and there are the up-front acquisition costs as well.
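The memory formula mentioned above (sessions times per-session footprint, plus overhead and wiggle room) is easy to sketch in Python. Only the 300 concurrent users come from the example requirements; the per-session size, overhead, wiggle room and growth rate below are made-up placeholders, and the compound-growth projection is one assumed model of "years two through five", not a forecast method anyone handed me.

```python
def estimate_memory_gb(concurrent_sessions: float,
                       mb_per_session: float,
                       overhead_gb: float,
                       wiggle_room_gb: float) -> float:
    """X sessions * Y MB per session + overhead + wiggle room, in GB."""
    session_gb = concurrent_sessions * mb_per_session / 1024
    return session_gb + overhead_gb + wiggle_room_gb


def project_sessions(initial_sessions: float,
                     annual_growth: float,
                     years: int) -> float:
    """Assumed model: sessions grow by a fixed fraction each year."""
    return initial_sessions * (1 + annual_growth) ** years


if __name__ == "__main__":
    # Hypothetical inputs: 50 MB per session, 4 GB OS/database overhead,
    # 4 GB of wiggle room, 20% growth per year.
    for year in range(5):
        sessions = project_sessions(300, 0.20, year)
        memory = estimate_memory_gb(sessions, 50, 4, 4)
        print(f"year {year + 1}: {sessions:.0f} sessions, {memory:.1f} GB")
```

The arithmetic is the easy part. The growth rate is the number nobody can give you, and small changes to it move the year-five answer a lot.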


It's all about going back to the well. The reason we over-build our infrastructure this way is the difficulty of going back to the well for additional funding. In my work in the non-profit space there's a real risk of that well being dry, too. For example, I could buy the M4000 and then, if I have problems in year 2 or 3, do a forklift upgrade to the M5000 (swap the boot disks and away I go). Easy stuff, except I have to actually buy that M5000, which comes with lots of questions: Why didn't you buy an M5000 from the start? Why were your forecasts wrong? Where do you think we can come up with that kind of money? Collective amnesia will shift all the blame to the people who spec'ed out the system. Blame rolls downhill. It picks up mass and speed as it rolls, and engineers are usually at the bottom of the hill with the operations folks (often one and the same).


So we buy machines that have that 'extra reserve' built in. In high-end servers you can usually turn on the additional capacity by purchasing a license key. But in the mid-to-low-end range we're only offered machines with expandability. So if our forecast is off or the conditions change, we can bring a lower incremental cost to the table to gain additional performance and capacity. Unfortunately for me, however, I have rarely needed that expanded capacity. I can only remember two examples: one success, where we added two boards (4 CPUs + memory) to an 890, and one case where they no longer made the version of the board we had in the server, which meant we would have had to replace all the boards, and that would have cost almost as much as replacing the server outright.


I used to be a 'keep something in reserve' kind of engineer: be able to pull the rabbit out of the hat to meet the increased demand that we didn't know was coming. Basically, pull off a Montgomery Scott to save the day. By doing so, however, I have enabled the behavior that got me here in the first place. Every time I buy more than the requirements suggest, adding some reserve "just in case", the cycle repeats itself. Now, I'm not going to purchase the bare minimum needed to meet the requirements given to me (however flawed they may be), but I am going to start putting the decision back on the requesters and have them make the choice. In writing. With as much concurrence as can be achieved from the project team as a whole.


So if I were to travel back in time to the period before the aforementioned M5000s were purchased, I would offer the M4000s instead. I would tell them: you save X dollars up front; your downside risk is that you may have to replace this server if your usage or growth models are wrong. And, perhaps most importantly, I would get documented concurrence from the stakeholders.
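One way to put that choice in front of the requesters is a simple break-even number: how likely does outgrowing the smaller server have to be before the bigger one is the cheaper bet? The sketch below is deliberately crude. It assumes the replacement is a full purchase of the bigger server at today's price, with no trade-in, no price drops and no time value of money, and the function names are just illustrative.

```python
def break_even_probability(upfront_premium: float,
                           replacement_cost: float) -> float:
    """Probability of outgrowing the small server at which both options
    have the same expected cost (assumes a full replacement purchase)."""
    return upfront_premium / replacement_cost


def expected_cost_small_first(small_cost: float,
                              replacement_cost: float,
                              p_outgrow: float) -> float:
    """Expected spend if we buy small now and replace only if we outgrow it."""
    return small_cost + p_outgrow * replacement_cost


if __name__ == "__main__":
    # Totals taken from summing the line items in the comparison above.
    small, big = 89_780, 115_580
    p = break_even_probability(big - small, big)
    print(f"Break-even chance of outgrowing the small server: {p:.0%}")
    for p_outgrow in (0.1, 0.25, 0.5):
        cost = expected_cost_small_first(small, big, p_outgrow)
        print(f"p={p_outgrow:.2f}: expected cost ${cost:,.0f} vs ${big:,} up front")
```

If the stakeholders believe the odds of outgrowing the smaller box are well below the break-even figure, the cheaper server is the defensible choice, and their written estimate of those odds is exactly the concurrence worth having on file.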


This has turned into a much longer post than I had originally intended... phew. Now on to my next client/project.

