Tuesday, December 26, 2006

Postgresql - shockingly fast and scalable

I'm a postgresql nut. I have a bias against mysql because I think it's a great database for simple problems, and I don't have any database problems that simple.

Don't get me wrong, I recommend using the right tool for the right job, and no doubt there are some jobs where mysql is the right tool. I just don't get those kinds of projects.

I would work with Oracle or MS SQL Server or IBM DB2 if I had the opportunity. However, when I work with a client, I recommend postgresql first, and only after they've rejected that (they really want to spend money) would I look at proprietary databases. That hasn't happened yet. Everyone faints with sticker shock. I know some people who use Microsoft solutions whenever they can, but that's because it's what they know, and I don't think the cost of the software factors in since they just pirate it.

In any case, there's a great essay on how one guy used postgres and got incredible performance. As he says:


I’ve pushed postgres on performance. And rather than finding it slow, I’ve found it shockingly fast and scalable. The traditional reward for a job well done is usually another, harder, job. But I’m not worried. I’m using Postgres. The worst that’ll happen is that I’ll get an excuse to actually get new hardware.


He used partitioning. I need to do that sometime. Maybe now, with version 8.2 out. Or possibly when 8.3 comes out. I'm concerned about some optimizations for queries that hit multiple tables (e.g., 160x30 tables). When people on the pgsql-general mailing list start talking about how great the optimizations on partitioned tables are, it'll be time to switch.

For now, fat tables with appropriate indices and autovacuum/autoanalyze are good, and when queries still take too long (I have 300+GB of data at $DAYJOB), I just build materialized views which are populated by triggers. Everything is good so far and I don't see any need to optimize further for another few years. Maybe when the data hits 1 terabyte (in 2-3 years), or when some individual tables hit 100GB, I'll need to look into partitioning.
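For the curious, the trigger-maintained "materialized view" trick looks roughly like this. Postgres has no built-in materialized views (as of 8.x), so you keep an ordinary summary table current yourself. This is a minimal sketch with invented names (orders, daily_sales); a real setup would also need UPDATE and DELETE triggers:

```sql
-- A hand-rolled "materialized view": a summary table kept current by a trigger.
-- Table and column names here are hypothetical.
CREATE TABLE orders (
    id        serial PRIMARY KEY,
    sale_date date    NOT NULL,
    amount    numeric NOT NULL
);

-- The summary table that stands in for the expensive aggregate query.
CREATE TABLE daily_sales (
    sale_date date PRIMARY KEY,
    total     numeric NOT NULL DEFAULT 0
);

CREATE OR REPLACE FUNCTION update_daily_sales() RETURNS trigger AS $$
BEGIN
    -- Add the new amount to the day's row, creating it if it doesn't exist.
    UPDATE daily_sales SET total = total + NEW.amount
     WHERE sale_date = NEW.sale_date;
    IF NOT FOUND THEN
        INSERT INTO daily_sales (sale_date, total)
        VALUES (NEW.sale_date, NEW.amount);
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER orders_daily_sales
    AFTER INSERT ON orders
    FOR EACH ROW EXECUTE PROCEDURE update_daily_sales();
```

Queries then hit daily_sales instead of aggregating over the fat table, at the cost of a little extra work on every insert.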
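And for reference, partitioning in the 8.x line is done with table inheritance: child tables carry CHECK constraints, and with constraint_exclusion on, the planner skips children that can't match the query. A hedged sketch, with invented names (measurements and one monthly child):

```sql
-- 8.x-style partitioning via inheritance. Names are hypothetical.
CREATE TABLE measurements (
    logged  timestamp NOT NULL,
    reading numeric
);

-- One child per month; the CHECK constraint lets the planner prune it.
CREATE TABLE measurements_2006_12 (
    CHECK (logged >= '2006-12-01' AND logged < '2007-01-01')
) INHERITS (measurements);

CREATE INDEX measurements_2006_12_logged ON measurements_2006_12 (logged);

-- Route inserts on the parent into the right child.
CREATE OR REPLACE FUNCTION measurements_insert() RETURNS trigger AS $$
BEGIN
    IF NEW.logged >= '2006-12-01' AND NEW.logged < '2007-01-01' THEN
        INSERT INTO measurements_2006_12 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'no partition for timestamp %', NEW.logged;
    END IF;
    RETURN NULL;  -- the row is already stored in the child
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER measurements_insert_trig
    BEFORE INSERT ON measurements
    FOR EACH ROW EXECUTE PROCEDURE measurements_insert();

-- With constraint_exclusion on, queries against the parent only scan
-- children whose CHECK constraints can match the WHERE clause:
SET constraint_exclusion = on;
SELECT * FROM measurements WHERE logged >= '2006-12-15';
```

The manual bookkeeping (one child, index, and routing branch per partition) is exactly the overhead that makes me want to wait for the planner work before switching.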
