
Bummer Day for Production

November 17, 2009

Posted by skunkworkscmj in DBMS Systems, Hardware.

Our production Greenplum database system consists of 1 master node and 3 segment node servers. The hardware is Sun X4540 (Thor) servers with 32 GB of RAM and 250 GB drives. We've been working through some workload management issues as we've brought the system online, mostly struggling with a few very high-cost queries that run concurrently with our normal queries. Either the single high-cost query consumes all of the resources, or it never makes it into the queue.

Part of the resolution was to double the memory in the 3 segment nodes. The RAM was on back-order with Sun for a few weeks, then showed up today. We brought down our QA system to install its memory upgrade, and everything went fine. At 3:00 pm we brought down production and began the installation . . . at 6:45 pm I was notified that after the memory was installed, 2 of the 3 segments came back to life, but the third wouldn't even boot. We are on Sun's Gold service package with a 4-hour SLA, so we logged a ticket with them and hoped for the best. It's now 10:40 pm, and I am hearing that we are in the process of installing a replacement server and redistributing the data.

Definitely a bummer day for production; we'll see what the damage is in the morning. The sysadmin teams are burning the midnight oil on this one. Thanks, guys!
