Thursday, July 22, 2010

CTEs for clarity (no efficiency gain here)

Some messages are sent to two kannels. I've got the essential data in a postgresql table but I wanted to find the messages which were sent to both kannels (within 5 seconds of each other, most such duplicated messages are sent within the same second, or within 1 second of each other).

The query could have been done without CTEs (using subqueries) but I prefer the CTEs since they move the subqueries "out" of the select statement, making the select much easier to read.

/* set up the CTEs although they're not really common except in the sense that they're the same statement, I'm just using them as *table*expressions* :-) */
WITH lhs AS
(
select id,kannel,tstamp,dest,msg_text from decmtmo WHERE mt_mo='mt'
), rhs as
(
select id,kannel,tstamp,dest,msg_text from decmtmo WHERE mt_mo='mt'
)
SELECT lhs.id lid,rhs.id rid,abs(extract('epoch' from lhs.tstamp-rhs.tstamp)),
lhs.kannel lk, rhs.kannel rk, rhs.dest,trim(rhs.msg_text )
FROM lhs,rhs /* this is what improved, otherwise we'd have the subselects here */
WHERE lhs.id<>rhs.id /* make sure we don't look at the same row on both sides */
AND lhs.dest=rhs.dest AND lhs.msg_text=rhs.msg_text /* MT identity */
AND lhs.kannel<>rhs.kannel /* but different kannels */
AND lhs.id>rhs.id /* avoid showing two copies of the same row, with lhs and
rhs swapped */
AND 5 > abs(extract('epoch' from lhs.tstamp-rhs.tstamp))
/* within 5 seconds of each other */
ORDER by lhs.id,rhs.id

No comments: