LinuxQuestions.org
Go Job Hunting at the LQ Job Marketplace
Go Back   LinuxQuestions.org > Blogs > Musings on technology, philosophy, and life in the corporate world
User Name
Password

Notices

Hi. I'm a Unix Administrator, mathematics enthusiast, and amateur philosopher. This is where I rant about that which upsets me, laugh about that which amuses me, and jabber about that which holds my interest most: Unix.
Rate this Entry

PostgreSQL and Indexes: a formula for...disaster?

Posted 01-27-2011 at 12:03 PM by rocket357
Updated 01-29-2011 at 03:41 PM by rocket357

I've noticed a ton of complaining about PostgreSQL's apparent lack of sanity when it comes to index usage, so I'd like to throw a few things out that explain the "idiotic" way PostgreSQL handles indexes.

First things first. If the table fits in a single page, YOU ARE NOT GOING TO SEE INDEX ACTIVITY. That gets more people than any other issue. If the table has 17 rows, don't expect PostgreSQL to bother with indexes. Using an index on a table that fits in a single page is like asking someone to fly from San Diego to Los Angeles. The cities are 116 miles apart...so by car you can get there in roughly 2 hours. Sure, the flight might only take 45 minutes, but you still have to drive to the airport, get through security, check in, load, wait for your baggage, drive to your destination, etc...In these cases, it's usually faster to just drive straight there. PostgreSQL is smart enough to figure this out and it "just drives straight there" (i.e. use a sequential scan, rather than read the index, filter out unwanted rows, then read the rows it needs (i.e. random seeks) out of the table file).

Ok, so let's say the table has 17 million rows, not 17. Chances are pretty good that the table is going to incur some disk i/o. In this instance, it makes sense that an index scan (given that an index exists that is a match for the query) would greatly reduce the disk i/o required...or does it? If you have a query like such:

SELECT name FROM BIG_TABLE WHERE type = 1;

And you have an index that covers the entire table (i.e. the index is roughly the same size as the table), the planner isn't going to bother reading the index. Why read a 10 GB index off the disk, just so you can read another 1 GB of the table off the disk? Why not just read the 10 GB table once and scan through it? (Aside from the disk i/o, doing a single table read can preserve the OS's filesystem cache instead of potentially nuking it by reading multiple files). PostgreSQL will, however, read a 500 MB index so it can read 1 GB off of the disk...

But wait! I have a 500 MB index on my 10 GB table, and PostgreSQL is *STILL* performing sequential scans! WHY?!

Here's where the fun begins. In postgresql.conf, there are a lot of settings that clue the planner in to your environment. Are you running with a single IDE drive from 1995? Well, then you probably want to do everything you can to reduce disk i/o. Tons of RAM? The planner needs to know that the filesystem cache is large, so it can take that into account. PostgreSQL comes with some relatively sane defaults, but the further you get from the "norm", the less efficient your installation will be. If you EXPLAIN ANALYZE and see big errors (bad row-count estimates, skipping indexes, etc...), chances are good you can tweak some planners hints in postgresql.conf and get better plans (or you need better statistics...but that's an entirely different issue). That shiny new SSD, for example, has much better random access time than conventional disks...drop your random_page_cost down so the planner has an accurate "map" of it's environment.

A good paper on the planner hints in postgresql.conf can be found here: http://www.pgexperts.com/document.html?id=7

Once you get a realistic environment set up, you'll find that the planner does make decent decisions and efficient use of the resources it has at it's disposal. So remember, test on realistic data, keep your indexes small, and by all means tune postgresql.conf **to match your environment**!
Posted in Uncategorized
Views 1115 Comments 0
« Prev     Main     Next »
Total Comments 0

Comments

 

  



All times are GMT -5. The time now is 04:53 AM.

Main Menu
Advertisement

My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
identi.ca: @linuxquestions
Facebook: linuxquestions Google+: linuxquestions
Open Source Consulting | Domain Registration