
Fast ordered delete

We had a visit from the MySpace database team in our lab this week. I used the opportunity to ask Kevin Stephenson, MySpace’s Senior Database Developer, about the “pain points” he encounters in his everyday job supporting SQL Server. He brought up the problem of efficient ordered deletes. He periodically needs to delete large portions of older custom log entries, and he needs to delete them in a certain order. He also knows that it is more efficient to break a huge modification operation into smaller batches so the transaction log can be truncated in between. Kevin is aware of several methods. He immediately illustrated the problem with a small example:

if object_id('t1') is not null
    drop table t1;
create table t1 (a int primary key, b char(100));

-- populate t1 with roughly 100,000 rows
set nocount on;
declare @i int;
set @i = 1;
while (@i < 100000)
begin
    insert into t1 values (@i, 'x');
    set @i = @i + 1;
end

set statistics io on;
set statistics time on;
delete from t1 where a in (select top (10000) a from t1 order by a);
set statistics time off;
set statistics io off;

Running the script produced the following result messages:

Table 't1'. Scan count 2, logical reads 30564, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 1, logical reads 20152, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
   CPU time = 78 ms,  elapsed time = 80 ms.

The delete query in the above script has the following plan (before performing the delete):

[Pic1: query plan for the DELETE with the TOP subquery, showing two accesses to t1]

We can see that table t1 is accessed twice: the lower branch performs the “select top (10000) a from t1 order by a” subquery, and its result is joined back to a second occurrence of the same table t1. So the “deleted” portion of the table is read twice to identify the rows to delete, and then once more to perform the delete itself. The question was: can the same operation be performed more efficiently?

The answer is a simple trick: a view with ORDER BY. You cannot use ORDER BY in a view definition unless you also use TOP in the view's SELECT clause, and here we want to delete exactly 10000 rows anyway. So the view definition is as follows:

create view v1 as (select top (10000) * from t1 order by a)

and we can delete the “top” rows simply with

delete from v1

The query plan for this delete is much simpler.

[Pic2: query plan for the DELETE against view v1, with a single access to t1]

and the I/O and CPU statistics demonstrate the improvement:

Table 't1'. Scan count 1, logical reads 654, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
   CPU time = 31 ms,  elapsed time = 75 ms.
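To apply Kevin's batching idea mentioned above, the view-based delete can be repeated until no qualifying rows remain, which lets the transaction log be truncated between batches. This loop is my sketch, not part of the original example:

```sql
-- Sketch (not from the original post): purge in batches of 10000 rows
-- using the v1 view defined above. Each iteration deletes the next
-- 10000 rows in order of column a; the loop stops when nothing is left
-- to delete. Between batches the log can be truncated (e.g. under the
-- SIMPLE recovery model, or by log backups under FULL).
declare @rows int;
set @rows = 1;
while (@rows > 0)
begin
    delete from v1;            -- removes the next batch of up to 10000 rows
    set @rows = @@ROWCOUNT;    -- 0 once the table is empty
end
```

Keeping each batch small also shortens the time locks are held on the rows being purged.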

Kevin Stephenson and Lubor Kollar

Posted: Thursday, May 21, 2009 1:54 AM by LuborK

Comments

Grumpy Old DBA said:

such a simple solution - I've just been working around applying an update based upon a TOP which of course will not take an order by - this makes a simple way around it - thanks.

# May 21, 2009 7:57 AM

SimonS Blog on SQL Server Stuff said:

The simplest way to delete the top X rows is use to use a derived table. delete t1 from ( select top

# May 22, 2009 9:11 AM
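Simon's comment above is cut off; a plausible completion of the derived-table form he describes (my reconstruction, not Simon's original code) would be:

```sql
-- Sketch of the derived-table variant: SQL Server allows a DELETE to
-- target an updatable derived table, so the TOP ... ORDER BY can live
-- in the derived table instead of a view.
delete d
from (select top (10000) * from t1 order by a) as d;
```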

sunwei said:

Another way is using a CTE:

with d as
(
    select top (10000) a from t
)
delete from d;

# June 3, 2009 11:01 AM

LuborK said:

Thank you Simon and sunwei for your additions to my blog. Both derived tables and CTEs are great alternatives to views, and I'm happy to see that they both work in the case of the "efficient" delete as well!

# June 9, 2009 8:50 PM