Bummer Day for Production November 17, 2009
Posted by skunkworkscmj in DBMS Systems, Hardware. Tags: Greenplum, Sun 4540
Our production Greenplum database system consists of 1 master node and 3 segment node servers. The hardware is Sun X4540/Thor servers with 32 GB of RAM and 250 GB drives. We've been working through some workload management issues as we've brought the system online, mostly struggling with the few very high-cost queries that run at the same time as normal queries. Either a single high-cost query consumes all of the resources, or it never makes it into the queue.
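The dilemma in that last sentence can be illustrated with a toy model of cost-capped admission control. This is a minimal sketch under assumed behavior, not Greenplum's actual resource-queue implementation: the `Query` class, the `admit` function, the `max_cost` cap, and all cost numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Query:
    name: str
    cost: float  # hypothetical planner cost estimate, arbitrary units


def admit(queue, max_cost):
    """Toy cost-capped admission (first-fit, hypothetical).

    Walks the queue in order and admits each query whose cost still fits
    under max_cost alongside the queries already admitted. A query that
    does not fit right now waits; a query whose cost alone exceeds
    max_cost can never fit, so it is reported as rejected.
    """
    running, waiting, rejected = [], [], []
    total = 0.0
    for q in queue:
        if q.cost > max_cost:
            rejected.append(q.name)
        elif total + q.cost <= max_cost:
            running.append(q.name)
            total += q.cost
        else:
            waiting.append(q.name)
    return running, waiting, rejected


# One expensive report submitted ahead of three ordinary queries.
workload = [Query("big_report", 1000), Query("q1", 50), Query("q2", 50), Query("q3", 50)]

# A cap high enough for the big query lets it monopolize the budget.
print(admit(workload, max_cost=1000))  # (['big_report'], ['q1', 'q2', 'q3'], [])

# A cap sized for normal work means the big query never gets in.
print(admit(workload, max_cost=200))   # (['q1', 'q2', 'q3'], [], ['big_report'])
```

In this simplified model, no single cap value serves both kinds of work, which is why adding capacity (such as more memory) was part of the fix.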
Part of the resolution was to double the memory in the 3 segment nodes. The RAM was on back-order with Sun for a few weeks, then showed up today. We brought down our QA system to install its memory upgrade, and everything went fine. At 3:00 pm we brought down production and began the installation . . . at 6:45 pm I was notified that after the memory was installed, 2 of the 3 segment nodes came back to life, but the third wouldn't even boot. We are on Sun's Gold service package with a 4-hour SLA, so we logged a ticket with them and hoped for the best. It's now 10:40 pm, and I am hearing that we are in the process of installing a replacement server and redistributing the data.
Definitely a bummer day for production; we'll see what the damage is in the morning. The sysadmin teams are burning the midnight oil on this one. Thanks, guys!
TDWI – Day 1 – Open Source BI and Data Warehousing Tools November 3, 2009
Posted by skunkworkscmj in Analysts and Experts, Conferences, DBMS Systems, Open Source, Reporting. Tags: Birt, Greenplum, Jaspersoft, Linux, MySQL, Open Source, Palo, Pentaho, PostgreSQL, Talend
Course taught by Krish Krishnan and Mark Madsen
I spent Monday afternoon attending the Open Source BI/DW course. The first part of the session was really a history lesson on technology patents and the economic forces behind the open source movement. The point here is that when technology makes leaps that change underlying usage and business models, someone loses and someone gains. The entrenched old guard (the ones making the money) throws up roadblocks and challenges (barriers to entry), but eventually loses to a new generation of entrepreneurs and technology. Rinse and repeat. Regardless of the technology, historical precedent, or culture, economics always wins in the long term.
Following the patent law/economic history lesson, we looked at the growth of the commercial software market, especially around DW/BI. This is a maturing market. As the market for enterprise software matures, sales and licensing revenues decline, and vendors place increasing emphasis on professional services and consulting to keep overall profitability flat. Shifts in technology and architecture open the door to low-cost providers: enter open source.
We are seeing various models for open source software: either completely free and community supported (e.g., PostgreSQL) or models that build on the community product by adding features or professional services for a cost (e.g., Greenplum, which turned PostgreSQL into an MPP database for a price). The space for open source operating systems and DBMSs is maturing (Linux, SUSE, MySQL, MonetDB, etc.), as is the space for reporting and dashboarding tools (Pentaho, Jaspersoft, BIRT, Palo, etc.). Open source data integration tools are not far behind (Pentaho/Kettle, Talend, etc.). Options for metadata management and data visualization are fewer, as these are niche markets still served by commercial products while open source catches up.
Primary reasons why open source is catching on: (1) Cost savings, (2) Integration and customization options, (3) Reduced vendor dependence.
Risks of going open source are both real and imagined, depending on circumstances: (1) Lack of support, (2) Products are immature, (3) Perceived quality problems, (4) Lack of internal skills.
The “right” approach for implementing open source in your organization is probably one where these technologies coexist with commercial solutions, with the commercial software being phased out over time as more open source options become available or mature.
This was a great presentation that dispelled myths and gave good insight into where the future is taking us. My notes above are summarizations of the material presented by Mark Madsen and Krish Krishnan – I take no credit for it. They have posted an older version of their presentation at Slideshare.
TDWI – Day 1 – Choosing the Right Data Warehousing Approach November 2, 2009
Posted by skunkworkscmj in Conferences, Program/Project Management. Tags: Architecture, Stanford, Strategic Execution
Course taught by David Wells
Notice that the title doesn't say "Choosing the Right Data Warehousing Architecture"? After some brief definitions to get everyone on the same page, the first order of business was recognizing that an approach to data warehousing is bilateral. An approach must account for (1) the Data Architecture and (2) the Project Architecture, both existing within the context of a company's environment and culture.
Data Architecture speaks to “What is being built” while Project Architecture is about “How it gets built.” Project teams and sponsors often confuse the two architectures or don’t recognize the distinction between them. These aren’t completely independent aspects of building a data warehouse, though. Certain data architectures are best implemented with specific project architectures. The “how” has to be compatible with the “what.” Data architectures based on a high degree of integration are best developed and maintained with project methodologies that can drive standardization and agreement across the enterprise.
I've seen this at my own company: our data architecture is based on Inmon's Corporate Information Factory. We have high levels of integration before the data ever leaves our "hub" for the downstream marts and extracts. When efforts to standardize data or definitions for that integration are not driven "top-down," we flounder and end up with eight slightly different versions of the same field just to keep all of the departments and end users happy. When the standardization is driven by senior executives at the corporation, we are much more successful and reach that single version of the truth.
Forgotten or disregarded even more often than Project Architecture are the constraints coming from the environment. This is where a company’s culture, values, and organizational structure influence project execution. Again, from my own experience in a company with a very decentralized and segmented organizational structure, getting to the standards needed for true enterprise capabilities is a struggle. More on the influence of culture and organizational identity on project execution can be found at Stanford’s Advanced Project Management series.

I was only in this session until midday, so I missed the second half of the course. Before wrapping up for lunch, we covered 16 critical factors to consider when determining your data warehousing approach. Covering topics like integration levels, metadata needs, latency, costs to deploy and support, and scalability, these factors are also great for identifying your current weaknesses and deciding where your data warehouse program needs to go. Great session: learned a lot, remembered a lot.
Off to TDWI-Orlando November 2, 2009
Posted by skunkworkscmj in Conferences.
A few of us from the team flew out this morning to the TDWI conference in Orlando. We'll be there through Friday and are looking forward to a great week. We'll be covering most aspects of data warehousing at the conference, everything from Data Governance and MDM to Enterprise Architecture, Cloud Computing, and trends in Open Source BI. You can follow me or the conference on Twitter. More updates to follow during the week.
As a side note, I was a little shocked when the Delta kiosk asked me to pay $20 to check my bag – guess I’ve been flying Southwest too long.