A song of server room ice and fire: How a day of mop duty led me to the cloud


This is the 232nd article in the Spotlight on IT series. If you're interested in writing an article on backup, security, storage, virtualization, mobile, networking, wireless, cloud and SaaS, or MSPs for the series, PM Eric to get started.

I remember the days before “the cloud.” Most of us probably do, but I also remember the very specific day when I went from “there is no way I will give over control” to “so I am gonna send this data offsite—how can I keep from losing control?”

It was about 10 years ago when a unique onsite experience prompted me to change my tune on moving to the cloud.

My IT team had a killer onsite data center, which meant speedy and reliable data access for local folks. Remote people used VPN, and it wasn’t so bad, typically. We had concentrators in the U.S. on the east and west coasts, and a combination of leased lines and MPLS to tie our sites together so they could make it back to the data center in HQ. It wasn’t perfect, but it worked pretty reliably, as long as my guy was on top of things, and our redundant routes were performing as planned.

However, our data costs began to grow right along with the company. This was especially apparent with the small engineering office we had recently added halfway across the country. Even though there were only 10 people in the new office, the data flowing between that office and the HQ data center added up quickly. We hit our data limit, and started seeing performance degradation and helpdesk tickets. We switched to bonded, burstable lines, but the costs were high because we kept bursting. Adding more employee Internet traffic from HQ wasn’t helping the situation either. But, I had a network that worked and enough budget to try to help the remote performance issues. I was lucky, and I was satisfied.

At this time, more and more folks in my circle—and in “the industry”—talked about the costs of AC, power and space. They started to extol the virtues of virtualization and offsite hosting. I wasn’t swayed: I had a sweet onsite situation.

See, the previous tenants of our building saw fit to add 4,000 sq. ft. of dedicated server space, insane amounts of power, a very large set of backup batteries that were rated to keep the data center up for 15 minutes in the event of a total power loss, and an onsite medium-duty diesel generator that was tested and topped off every 60 days. No joke. I assume the previous company was planning to work straight through the zombie apocalypse. Fortunately, they went out of business before that happened, and we got to move in on the cheap.

We had it pretty good—until the unthinkable happened.

At about 5:30 in the morning, the pagers went off. It wasn’t the end of the world, just a disk in one of the big disk shelves that had gone bad. Now, our monitoring was a little overzealous in this case—it was a RAID 5 shelf and only one disk was bad, so the array was degraded rather than down—but it was still worth swapping out as soon as reasonable. Our sys admin was on the job, and all was fine.

About 30 minutes later, we got another disk failure in a server: same row, different rack. This time it was more urgent—this was a mail server. The failed disk was half of a mirror, so mail was still up, but losing its partner would mean mail was down until we restored from onsite backup.

Again, sys admin to the rescue, but two disk failures in two separate machines within one hour meant something was up. Dust storm in the data center? Secret testing of a new electromagnetic pulse weapon? Air cover from ground-based ion cannons so the troop transports could leave Hoth? There was definitely a tremor in the Force here. I got in the car and headed into the office so I could be ready for whatever was in store.

By the time I arrived, the pager was on meltdown and the source was clear. Dozens of servers were reporting overheat conditions, well into the low 90s. One of the giant AC units had failed, and the machines on that side of the data center were heating up quickly.

My first thought, of course, was to solve the problem. My second thought was to send a notification to prepare employees for potential outages. Then there were no thoughts, just a lot of frantic phone calls to AC repair specialists. When the AC guy was on the way, I started thinking, “When did I get in the business of AC maintenance?”

I had all kinds of backup for my backup, but apparently three out of four AC units were not enough to keep all our whirring, spinning, glowing, blowing data center cool. I had ceased being an IT guy and started being a facilities guy in the course of an hour. And, I don’t know a lot about facilities.

Back in the data center, we had opened the doors, popped in a couple of fans to blow out the hot air, and seemed to be keeping temps hovering around 85 degrees. Not great, but we could last out the day. The worst was behind us.

Except it so totally wasn’t.

The entire inside of the AC unit—roughly 50 cubic feet—had frozen solid. I have never seen anything like it. Ice had grown to fill the chassis, completely engulfing the innards in a winter wonderland. It was amazing: an AC-unit-sized block of ice hanging off the wall/ceiling of our data center, right at the end of the row that housed the mail, file, and Oracle database servers and disk shelves.

We now knew the problem—a Freon leak had led to a frozen unit—but worse was the solution: chip it out, melt it out, or remove the whole unit (as it melted itself everywhere). It was all hands on deck—the world’s dumbest deck. As I pushed a mop back and forth, trying to keep the dripping mess from running down my data center, I began for the first time to give real thought to cloud hosting.

It wasn’t just that mopping felt like a poor use of my time, though honestly, that was a very strong influence right then and there. I started to think back to my network costs. All of the inbound and outbound traffic from HQ added up to a big pipe, and of course, I had two of them for redundancy. As the traffic from HQ to the Internet grew, so too did traffic to HQ, since we were hiring more remote employees and adding other offices that needed access to our servers.

Of course we were used to direct access and directly managed networking, but when things went sideways, we in IT were the ones who ended up leaving the helpdesk (and our jobs) unmanned to push the proverbial, and literal, mops back and forth in the data center. Bad times.

We ended up consolidating and hosting lots of services in the cloud—mail went first, then storage. Salesforce replaced our onsite CRM soon after, but Oracle hung out in HQ until after I left. It was a balance, but one that kept availability and reliability at the top of the list. And pushed mopping closer to the bottom…

