Tuesday, February 21, 2006

php+postgresql+cursors

I ran into a problem recently where PHP was dying when processing a query that returned some 20 million rows: PHP was running out of memory.

After some googling and running the program several more times (some runs with an increased memory limit for PHP), it became clear that PHP's postgres support code loads the whole result set into RAM (and feeds it to PHP one row at a time when the PHP code calls pg_fetch_row or pg_fetch_array) rather than using a cursor so that it can load one row at a time. I don't know if ADODB or PEAR do better. If they're still using the underlying postgres-specific functions, though, then they're still going to have the same problem.
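For illustration, the failing pattern looks roughly like this (a sketch, with $conn, $q and process_row() as placeholders): pg_query buffers the entire result set in PHP's memory before pg_fetch_array ever returns the first row.

// the naive version: the whole result set is buffered in RAM by pg_query
$res=pg_query($conn,$q);
if(!$res)
    die(pg_last_error($conn));

while($row=pg_fetch_array($res))
{
    process_row($row);
}

pg_free_result($res);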

The solution is to use cursors explicitly. There are some caveats.
1. Using a cursor requires using a transaction. This isn't onerous, just something to remember.

2. It's necessary to check after the FETCH to see if there are no more rows. There's probably a prettier way, but I just go with an infinite while loop that breaks when there are no more rows.

// assume that $conn is the db connection and that $q is the query; the following was typed
// straight into Blogger and might contain minor errors.


pg_query($conn,"BEGIN WORK");

$cres=pg_query($conn,"DECLARE my_cursor CURSOR FOR $q FOR READ ONLY");
if(!$cres)
{
    die(pg_last_error($conn));
}

// query succeeded, fetch the rows from the cursor
while(true)
{
    $res=pg_query($conn,"FETCH my_cursor");
    if(!$res)
        die(pg_last_error($conn));

    // quit loop if no more rows in cursor
    if(pg_num_rows($res)==0)
        break;

    // there's a row, get it and process it.
    $row=pg_fetch_array($res);
    process_row($row);

    pg_free_result($res);
}

pg_query($conn,"CLOSE my_cursor");
pg_free_result($cres);

pg_query($conn,"END WORK");


Fetching rows one at a time, as above, is for illustration only. For efficiency, one would of course fetch many rows at a time and loop through them, as in the sketch below.
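A minimal sketch of the batched version, assuming the same $conn, $q, and process_row() placeholders as above (the batch size of 1000 is arbitrary, and again this is untested):

pg_query($conn,"BEGIN WORK");
pg_query($conn,"DECLARE my_cursor CURSOR FOR $q FOR READ ONLY");

while(true)
{
    // fetch the next batch of up to 1000 rows
    $res=pg_query($conn,"FETCH 1000 FROM my_cursor");
    if(!$res)
        die(pg_last_error($conn));

    // no rows left in the cursor, quit the loop
    if(pg_num_rows($res)==0)
    {
        pg_free_result($res);
        break;
    }

    // loop through the rows in this batch
    while($row=pg_fetch_array($res))
    {
        process_row($row);
    }

    pg_free_result($res);
}

pg_query($conn,"CLOSE my_cursor");
pg_query($conn,"END WORK");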
