Av rating:
Total votes: 62
Total comments: 42


Robyn Page
Robyn Page's SQL Server Cursor Workbench
24 January 2007
/* SQL Server Cursors Workbench */

--------------------------------

/*
The purpose of this series of workshops is to try to encourage you to take
a practical approach to SQL skills. I always find I learn things much quicker
by trying things out and experimenting. Please don't just run the
samples, but make changes, alter the data, look for my mistakes, try to see
if there are different ways of doing things. Please feel free to criticise
or disagree with what I say, if you can back this up.

This workbench on cursors is not intended to tell you the entire story, as a
tutorial might, but the details on BOL should make much more sense after you've
tried things out for yourself!

Contents
--------

What are cursors for?
Where would you use a cursor?
Global cursors
Are Cursors Slow?
Cursor Variables
Cursor Optimisation
Questions
Acknowledgements


What are cursors for?
---------------------

Cursors were created to bridge the 'impedence mismatch' between the 'record-
based' culture of conventional programming and the set-based world of the
relational database.

They had a useful purpose in allowing existing applications to change from
ISAM or KSAM databases, such as DBaseII, to SQL Server with the minimum of
upheaval. DBLIB and ODBC make extensive use of them to 'spoof' simple
file-based data sources.

Relational database programmers won't need them but, if you have an
application that understands only the process of iterating through
resultsets, like flicking through a card index, then you'll probably
need a cursor.

Where would you use a Cursor?
-----------------------------

An simple example of an application for which cursors can provide a good
solution is one that requires running totals. A cumulative graph of monthly
sales to date is a good example, as is a cashbook with a running balance.
We'll try four different approaches to getting a running total...
*/

/*so lets build a very simple cashbook */
CREATE TABLE #cb ( cb_ID INT IDENTITY(1,1),--sequence of entries 1..n
Et VARCHAR(10), --entryType
amount money)--quantity
INSERT INTO #cb(et,amount) SELECT 'balance',465.00
INSERT INTO #cb(et,amount) SELECT 'sale',56.00
INSERT INTO #cb(et,amount) SELECT 'sale',434.30
INSERT INTO #cb(et,amount) SELECT 'purchase',20.04
INSERT INTO #cb(et,amount) SELECT 'purchase',65.00
INSERT INTO #cb(et,amount) SELECT 'sale',23.22
INSERT INTO #cb(et,amount) SELECT 'sale',45.80
INSERT INTO #cb(et,amount) SELECT 'purchase',34.08
INSERT INTO #cb(et,amount) SELECT 'purchase',78.30
INSERT INTO #cb(et,amount) SELECT 'purchase',56.00
INSERT INTO #cb(et,amount) SELECT 'sale',75.22
INSERT INTO #cb(et,amount) SELECT 'sale',5.80
INSERT INTO #cb(et,amount) SELECT 'purchase',3.08
INSERT INTO #cb(et,amount) SELECT 'sale',3.29
INSERT INTO #cb(et,amount) SELECT 'sale',100.80
INSERT INTO #cb(et,amount) SELECT 'sale',100.22
INSERT INTO #cb(et,amount) SELECT 'sale',23.80

/* You dont actually need a cursor. You can get a running total using a
correlated subquery */
SELECT [Entry Type]=Et, amount,
[balance after transaction]=(
SELECT SUM(--the correlated subquery
CASE WHEN total.Et='purchase'
THEN -total.amount
ELSE total.amount
END)
FROM #cb total WHERE total.cb_id <= #cb.cb_id )
FROM #cb ORDER BY #cb.cb_id

--or you can do this simple inner join and group-by clause if you don't
--like correlated subqueries
SELECT [Entry Type]=MIN(#cb.Et), [amount]=MIN (#cb.amount),
[balance after transaction]=
SUM(CASE WHEN total.Et='purchase'
THEN -total.amount
ELSE total.amount
END)
FROM #cb total INNER JOIN #cb ON total.cb_id <= #cb.cb_id
GROUP BY #cb.cb_id ORDER BY #cb.cb_id

--and here is a very different technique that takes advantege
--of the quirky behavionr of SET in an UPDATE command in SQL Server

DECLARE @cb TABLE(cb_ID INT,--sequence of entries 1..n
Et VARCHAR(10), --entryType
amount money,--quantity
total money)
DECLARE @total money
SET @total = 0

INSERT INTO @cb(cb_id,Et,amount,total)
SELECT cb_id,Et,CASE WHEN Et='purchase'
THEN -amount
ELSE amount
END,0 FROM #cb
UPDATE @cb
SET @total = total = @total + amount FROM @cb
SELECT [Entry Type]=Et, [amount]=amount,
[balance after transaction]=total FROM @cb ORDER BY cb_id


-- or you can give up trying to do it a set-based way and
-- iterate through the table

DECLARE @ii INT, @iiMax INT, @CurrentBalance money
DECLARE @Runningtotals TABLE (cb_id INT, Total money)
SELECT @ii=MIN(cb_id), @iiMax=MAX(cb_id),@CurrentBalance=0 FROM #cb

WHILE @ii<=@iiMax
BEGIN
SELECT
@currentBalance=@currentBalance
+CASE WHEN Et='purchase'
THEN -amount
ELSE amount
END FROM #cb WHERE cb_ID=@ii
INSERT INTO @runningTotals(cb_id, Total) SELECT @ii,@currentBalance
SELECT @ii=@ii+1
END
SELECT
[Entry Type]=Et,amount,total
FROM #cb INNER JOIN @Runningtotals r ON #cb.cb_id=r.cb_id

/*
or alternatively you can use......
----------....A CURSOR!!!
the use of a cursor will normally involve a DECLARE, OPEN, several
FETCHs, a CLOSE and a DEALLOCATE
*/
SET Nocount ON
DECLARE
@Runningtotals TABLE (cb_id INT, Et VARCHAR(10), --entryType
amount money, Total money)
DECLARE @CurrentBalance money, @Et VARCHAR(10), @amount money
--Declare the cursor
--declare current_line cursor -- SQL-92 syntax--only scroll forward

DECLARE current_line CURSOR fast_forward--SQL Server only--only scroll forward
FOR
SELECT
Et,amount
FROM #cb ORDER BY cb_id
FOR READ ONLY
--now we open the cursor to populate any temporary tables (in the case of
-- cursors) etc..
--Cursors are unusual because they can be made GLOBAL to the connection.
OPEN current_line
--fetch the first row
FETCH NEXT FROM current_line
INTO @Et,@amount
WHILE @@FETCH_STATUS = 0--whilst all is well
BEGIN
SELECT
@CurrentBalance = COALESCE(@CurrentBalance,0)
+
CASE WHEN @Et='purchase'
THEN -@amount
ELSE @amount
END
INSERT INTO
@Runningtotals (Et, amount,Total)
SELECT @Et,@Amount,@CurrentBalance
-- This is executed as long as the previous fetch succeeds.
FETCH NEXT FROM current_line
INTO @Et,@amount
END
SELECT
[Entry Type]=Et,amount,Total FROM @Runningtotals ORDER BY cb_id
CLOSE current_line--Do not forget to close when its result set is not needed.
--especially a global updateable cursor!
DEALLOCATE current_line

-- although the Cursor code looks bulky and complex, on small tables it will
-- execute just as quickly as a simple iteration, and will be faster with tables
-- of any size if you forget to put an index on the table through which you're
-- iterating!

-- The first two solutions are faster with small tables but slow down
-- exponentially as the table size grows.

/* here is the result of all the routines above
Entry Type amount balance after transaction
---------- --------------------- -------------------------
balance 465.00 465.00
sale 56.00 521.00
sale 434.30 955.30
purchase 20.04 935.26
purchase 65.00 870.26
sale 23.22 893.48
sale 45.80 939.28
purchase 34.08 905.20
purchase 78.30 826.90
purchase 56.00 770.90
sale 75.22 846.12
sale 5.80 851.92
purchase 3.08 848.84
sale 3.29 852.13
sale 100.80 952.93
sale 100.22 1053.15
sale 23.80 1076.95
*/

--Why not try these different approaches, with tables of different
--sizes and see how long the routines take? (I demonstrate a suitable test-rig
-- shortly).

-- Is there a quicker or more elegant solution?
/* Global Cursors
-----------------

If you are doing something really complicated with a listbox, or scrolling
through a rapidly-changing table whilst making updates, a GLOBAL cursor
could be a good solution, but is is very much geared for traditional
client-server applications, because cursors have a lifetime only of
the connection. Each 'client' therefore needs their own connection.
The GLOBAL cursors defined in a connection will be implicitly deallocated
at disconnect.

Global Cursors can be passed too and from stored procedure and referenced
in triggers. They can be assigned to local variables. A global cursor
can therefore be passed as a parameter to a number of stored procedures
Here is an example, though one is struggling to think of anything
useful in a short example*/

CREATE PROCEDURE spReturnEmployee (
@EmployeeLastName VARCHAR(20),
@MyGlobalcursor CURSOR VARYING OUTPUT
)
AS
BEGIN
SET
NOCOUNT ON
SET
@MyGlobalcursor = CURSOR STATIC FOR
SELECT
lname, fname FROM pubs.dbo.employee
WHERE lname = @EmployeeLastName
OPEN @MyGlobalcursor
END

.
DECLARE @FoundEmployee CURSOR,
@LastName VARCHAR(20),
@FirstName VARCHAR(20)
EXECUTE spReturnEmployee 'Lebihan', @FoundEmployee OUTPUT
--see if anything was found
--note we are careful to check the right cursor!
IF CURSOR_STATUS('variable', '@FoundEmployee') = 0
SELECT 'no such employee'
ELSE
BEGIN
FETCH
NEXT FROM @FoundEmployee INTO @LastName, @FirstName
SELECT @FirstName+' '+@LastName
END
CLOSE
@FoundEmployee
DEALLOCATE @FoundEmployee

/*Transact-SQL cursors are efficient when contained in stored
procedures and triggers. This is because everything is compiled into one
execution plan on the server and there is no overhead of network traffic
whilst fetching rows.


Are Cursors Slow?
-----------------

So what really are the performance differences? Let's set up a
test-rig. We'll give it a really big cashbook to work on and give
it a task that doesn't disturb SSMS/Query analyser too much.
We'll calculate the average balance, and the highest and lowest balance.
Now, which solution is going to be the best?*/

--recreate the cashbook but make it big!
DROP TABLE #cb
CREATE TABLE #cb (cb_ID INT IDENTITY(1,1),--sequence of entries 1..n
Et VARCHAR(10), --entryType
amount money)--quantity
INSERT INTO #cb(et,amount) SELECT 'balance',465.00
DECLARE @ii INT
SELECT
@ii=0
WHILE @ii<20000
BEGIN
INSERT INTO
#cb(et,amount)
SELECT CASE WHEN RAND()<0.5 THEN 'sale' ELSE 'purchase' END,
CAST(RAND()*180.00 AS money)
SELECT @ii=@ii+1
END
--and put an index on it
CREATE CLUSTERED INDEX idxcbid ON #cb(cb_id)

--first try the correlated subquery approach...
DECLARE @StartTime Datetime SELECT @StartTime= GETDATE()
SELECT MIN(balance), AVG(balance), MAX(balance)
FROM
(
SELECT [balance]=(
SELECT SUM(--the correlated subquery
CASE WHEN total.Et='purchase'
THEN -total.amount
ELSE total.amount
END)
FROM #cb total WHERE total.cb_id <= #cb.cb_id )
FROM #cb)g
SELECT [elapsed time (secs)]=DATEDIFF(second,@StartTime,GETDATE())
/*
elapsed time (secs)
-------------------
250

-- Now let's try the "quirky" technique using SET
*/
DECLARE @StartTime Datetime SELECT @StartTime= GETDATE()
DECLARE @cb TABLE(cb_ID INT,--sequence of entries 1..n
Et VARCHAR(10), --entryType
amount money,--quantity
total money)
DECLARE @total money
SET @total = 0

INSERT INTO @cb(cb_id,Et,amount,total)
SELECT cb_id,Et,CASE WHEN Et='purchase'
THEN -amount
ELSE amount
END,0 FROM #cb
UPDATE @cb
SET @total = total = @total + amount FROM @cb
SELECT MIN(Total), AVG(Total), MAX(Total)
FROM @cb
SELECT [elapsed time (secs)]=DATEDIFF(second,@StartTime,GETDATE())

/*
elapsed time (secs)
-------------------
1
almost too fast to be measured in seconds
*/

-- now the simple iterative solution
DECLARE @StartTime Datetime
SELECT @StartTime= GETDATE()
SET nocount ON
DECLARE
@ii INT, @iiMax INT, @CurrentBalance money
DECLARE @Runningtotals TABLE (cb_id INT, Total money)
SELECT @ii=MIN(cb_id), @iiMax=MAX(cb_id),@CurrentBalance=0 FROM #cb

WHILE @ii<=@iiMax
BEGIN
SELECT
@currentBalance=@currentBalance
+CASE WHEN Et='purchase'
THEN -amount
ELSE amount
END FROM #cb WHERE cb_ID=@ii
INSERT INTO @runningTotals(cb_id, Total) SELECT @ii,@currentBalance
SELECT @ii=@ii+1
END
SELECT MIN
(Total), AVG(Total), MAX(Total)
FROM @Runningtotals
SELECT [elapsed time (secs)]=DATEDIFF(second,@StartTime,GETDATE())
/*
elapsed time (secs)
-------------------
2

Hmm.. thats a lot better than a correlated subquery but slower than
using the SET trick

now what about a cursor?
*/
SET Nocount ON
DECLARE
@StartTime Datetime
SELECT @StartTime= GETDATE()
DECLARE @Runningtotals TABLE (cb_id INT, Total money)
DECLARE @CurrentBalance money, @Et VARCHAR(10), @amount money
--Declare the cursor
--declare current_line cursor -- SQL-92 syntax
---scroll forward only

DECLARE current_line CURSOR fast_forward--SQL Server only
---scroll forward
FOR
SELECT
Et,amount
FROM #cb ORDER BY cb_id
FOR READ ONLY
--now we open the cursor to populate any temporary tables (in the case of
-- cursors) etc..
--Cursors are unusual because they can be made GLOBAL to the connection.
OPEN current_line
--fetch the first row
FETCH NEXT FROM current_line
INTO @Et,@amount
WHILE @@FETCH_STATUS = 0--whilst all is well
BEGIN
SELECT
@CurrentBalance = COALESCE(@CurrentBalance,0)
+
CASE WHEN @Et='purchase'
THEN -@amount
ELSE @amount
END
INSERT INTO
@Runningtotals (Total)
SELECT @CurrentBalance
-- This is executed as long as the previous fetch succeeds.
FETCH NEXT FROM current_line
INTO @Et,@amount
END
SELECT MIN
(Total), AVG(Total), MAX(Total)
FROM @Runningtotals
CLOSE current_line--Do not forget to close when result set is not needed.
--especially a global updateable cursor!
DEALLOCATE current_line

SELECT [elapsed time (secs)]=DATEDIFF(second,@StartTime,GETDATE())
/*
elapsed time (secs)
-------------------
2

The iterative solution takes exactly the same time as the cursor!
(I got it to be slower (3 secs) with a SQL92 standard cursor


Cursor Variables
------------------------

--@@CURSOR_ROWS The number of rows in the cursor
--@@FETCH_STATUS Boolean value, success or failure of most recent fetch
---2 if a keyset FETCH returns a deleted row
So here is a test harness just to see what the two variables will
give at various points. Try changing the cursor type to see what
@@Cursor_Rows and @@Fetch_Status returns. It works on our temporary

Table


*/
--Declare the cursor
DECLARE @Bucket INT
--declare current_line cursor--we only want to scroll forward
DECLARE current_line CURSOR keyset --we scroll about (no absolute fetch)
/* TSQL extended cursors can be specified
[LOCAL or GLOBAL] [FORWARD_ONLY or SCROLL] [STATIC, KEYSET, DYNAMIC
or FAST_FORWARD]
[READ_ONLY, SCROLL_LOCKS or OPTIMISTIC]
[TYPE_WARNING]*/

FOR SELECT 1 FROM #cb
SELECT @@FETCH_STATUS, @@CURSOR_ROWS
OPEN current_line
--fetch the first row
FETCH NEXT --NEXT , PRIOR, FIRST, LAST, ABSOLUTE n or RELATIVE n
FROM current_line INTO @bucket
WHILE @@FETCH_STATUS = 0--whilst all is well
BEGIN
SELECT
@@FETCH_STATUS, @@CURSOR_ROWS
FETCH NEXT FROM current_line INTO @Bucket
END
CLOSE
current_line
DEALLOCATE current_line /*

if you change the cursor type definition routine above you'll notice that
@@CURSOR_ROWS returns different values
a negative value >1 is the number of rows currently in the keyset.
If it is -1 The cursor is dynamic.
A 0 means that no cursors are open or no rows qualified for the last opened
cursor or the last-opened cursor is closed or deallocated.
a positive integer represents the number of rows in the cursor
the most important type of cursors are...
FORWARD_ONLY
you can only go forward in sequence from data source, and changes made
to the underlying data source appear instantly.
DYNAMIC
Similar to FORWARD_ONLY, but You can access data using any order.
STATIC
Rows are returned as 'read only' without showing changes to the underlying
data source. The data may be accessed in any order.
KEYSET
A dynamic data set with changes made to the underlying data appearing
instantly, but insertions do not appear.



Cursor Optimization
--------------------

. Use them only as a last resort. Set-based operations are usually fastest
(but not always-see above), then a simple iteration, followed by a cursor

. Make sure that the cursor's SELECT statement contains only the
rows and columns you need

. To avoid the overhead of locks, Use READ ONLY cursors rather than
updatable cursors, whenever possible.

. , static and keyset cursors cause a temporary table to be
created in TEMPDB, which can prove to be slow

. Use FAST_FORWARD cursors, whenever possible, and choose FORWARD_ONLY
cursors if you need updatable cursor and you only need to FETCH NEXT.

Questions
----------
1/ What is the fastest wasy of calculating a running total in
SQL Server? Does that depend on the size of the table?

2/ what does it suggest if the @@CURSOR_ROWS variable returns a -1?

3/ What is the scope of a cursor?

4/ When might you want locking in a cursor? Which would you choose?

5/ Why wouldn't the use of a cursor be a good idea for scrolling
through a table in a web-based application?



Acknowledgements
----------------

Thanks to Nigel Rivett, Phil Factor and Adam Machanic for their ideas */


This article has been viewed 23474 times.
Robyn Page

Author profile: Robyn Page

Robyn Page is a consultant with Enformatica and USP Networks. She is also a well known actress, being most famous for her role as Katie Williams, barmaid and man-eater in the Television Series Family Affairs, when she was nominated as 'Most sexy newcomer' at the British Soap awards.

Search for other articles by Robyn Page

Rate this article:   Avg rating: from a total of 62 votes.


Poor

OK

Good

Great

Must read
 
Have Your Say
Do you have an opinion on this article? Then add your comment below:
You must be logged in to post to this forum

Click here to log in.


Subject: excellent
Posted by: Anonymous (not signed in)
Posted on: Thursday, January 25, 2007 at 3:35 AM
Message: We have to read any article posted by Robin Page as it falls in the category of very informative and example ridden piece. Any way this article was also very informative and examples are real. Keep up the excellent work.

Subject: thanks for the info..
Posted by: Anonymous (not signed in)
Posted on: Thursday, January 25, 2007 at 4:48 PM
Message: nicely packaged i.e., good background and history... simple examples and nice approach to proof.. i feel like i learned something.. thanks!

Subject: What is Recursive cursors ?
Posted by: Anonymous (not signed in)
Posted on: Friday, January 26, 2007 at 1:40 AM
Message: Realy Good artical, please Can u Give more information about Recursive cursors?

Subject: Very helpful
Posted by: Anonymous (not signed in)
Posted on: Friday, January 26, 2007 at 4:33 AM
Message: Excellent workbench, especially the stats on the different approaches,

Thanks

Subject: Very helpful
Posted by: Anonymous (not signed in)
Posted on: Friday, January 26, 2007 at 5:18 AM
Message: Excellent workbench, especially the stats on the different approaches,

Thanks

Subject: UPDATE method
Posted by: Anonymous (not signed in)
Posted on: Monday, January 29, 2007 at 1:16 PM
Message: The UPDATE method is not reliable because it assumes the rows will be updated in a particular order, but there is no such guarantee.

Subject: re: Update Method
Posted by: Robyn Page (view profile)
Posted on: Tuesday, January 30, 2007 at 1:00 PM
Message: Well spotted! I agree. I've been worrying a bit about that but I felt that pinning down the update order for certain would have complicated clarity of the script (it can be done, I'm told!). I experimented and couldn't get a table variable to update in any other order than the order in which the records were inserted into it, so I felt it was safe enough to include. Can anyone get it to update in a different order?

Subject: good work
Posted by: Anonymous (not signed in)
Posted on: Friday, February 02, 2007 at 12:44 AM
Message: Robin this what i would like to say about u.

"Give a man a fish, you feed him for a day. Teach a man to fish, you feed him for a life."

Subject: Excellent
Posted by: Anonymous (not signed in)
Posted on: Friday, February 02, 2007 at 1:18 AM
Message: this really excellent

Subject: SQL 2005, CTE's, Ranking/Windowing Functions
Posted by: Anonymous (not signed in)
Posted on: Monday, February 05, 2007 at 10:59 PM
Message: Where do you see CTE's and Ranking/Windowing Functions (SQL 2005) fitting into the mix?

Subject: Memory Tables v/s cursors
Posted by: jacob.sebastian (view profile)
Posted on: Tuesday, February 06, 2007 at 12:30 AM
Message: I had been wondering if a memory table (using a table variable) is a better choice than a cursor.

do you have an idae whether a memory table would give better performance?

Subject: What a crock!
Posted by: SHATCHARD (view profile)
Posted on: Tuesday, February 06, 2007 at 3:46 AM
Message: Although Ms Page may be quite an accomplished developer, she must have been ON THE SET the day I learned about SETS

What about GROUP BY for aggregations?

Also, GIVE CURSORS a wide berth. I have been optimising SQL stored procedures for 5 years and by far the biggest performance and locking issues come from cursors.

There is nearly always a SET based alternative to cursors. Its the clash of the procedural programming mind against the database/SET mind.

Nearly all database activities are quicker when done in bulk compared to iterating a loop.

If you must iterate across a resultset either farm that out to a Business Rules Layer in a procedural language or create an @ temporary table variable to hold your iteration data. The locking for the iteration will be only on a temporary table not the actual data tables.




Subject: Tempoary Tables, Table Variables, and Cursors
Posted by: Paker (view profile)
Posted on: Tuesday, February 06, 2007 at 6:38 AM
Message: I recently had cause to try processing the same data using each of the subject approaches - by far the best result was achieved using a cursor.

I've also had experience in using running totals in a report - while this can be done at least two ways using set theory, using a cursor improved performance by more than an order of magnitude.

Shatchard, I don't feel that you are being particularly helpful - there are typically multiple ways to achieve a desired result, and the variety of configurations that people use mean that experiment and experience on a particular target platform are worth at least as much as the dogma you present so doggedly.

Ms. Page has provided huge service to the user community, and is definitely on my 'must read' list.

I am not aware of any similar contribution on your part - and certainly did not find it in your posting. A well worked out example on the problems of misusing cursors would have served us much better than what you provided.

Subject: Cursors vs Set Based operations
Posted by: Anonymous (not signed in)
Posted on: Tuesday, February 06, 2007 at 7:04 AM
Message: I also feel it's misleading to promote cursors for use in some of these operations (arguably, running totals are a valid use, although this can also be achieved by aggregation) or by using incrementing variables (I also shy away from loops where possible).

Cursors have aggressive locking models, spread out over a (typically) longer period of time than an aggregate query/stream would take to execute, which can lead to serious problems in a high transaction volume (or high user load) environment - it is however quite likely that they can perform faster (execution time only, not necessarily resource cost) in test environments.

Adding up the sum of column x in table y using a cursor will always scale to a far lesser degree than using SUM() and GROUP BY, to give a simple example - it might be quicker to use the cursor on a single run, but try 500-1000 concurrent executions.

To be honest, I'd go further than SHatchard and say that these days, there is no excuse for embedding this sort of logic in SQL Server at all - if you need to perform complex operations of this type, you need a middle tier. Then you get the option of scaling out, rather than up...

Subject: Dillusional
Posted by: SHATCHARD (view profile)
Posted on: Tuesday, February 06, 2007 at 7:30 AM
Message: In answer to Paker, of course, theres always many ways to do anything. Indeed you can drive up a motorway backwards on the wrong side of the road. You could also jump out of an aeroplane without a parachute. However, theres one thing having the option to do it, it's a whole different ball game to RECOMMEND it.

Theres usually one right way to do something for that particular system and environment and in my experience no system has ever been improved with the use of cursors.

My comments derive from ACTUALLY solving performance problems in the REAL world; comparing different methods and approaches for tackling the same problem and my conclusion is the always the same - avoid cursors.

Actually I'm doing myself out of a job here, carry on using cursors, use as many as you can then I can earn huge consultancy fees going round clearing them up when your systems break down.



Subject: re: what a crock, dillusional etc.
Posted by: Tony Davis (view profile)
Posted on: Tuesday, February 06, 2007 at 8:50 AM
Message: Yes, there are always a lot of options, but the article does not just say "here are all the options and one of them is using a cursor". It does try to provide a measure of the performance of each solution on a reasonably-large database. I agree with "anonymous" that this is just a standalone performance test, and the next step is to test the *scalability* of each of these solutions and find out where each one breaks down. How can we do that?

Nor does the article say "go ahead and use cursors wherever you like, you'll be fine". It actually says "Use them only as a last resort. Set-based operations are usually fastest".

It's really is valuable to have Shatchard and others share their real experiences with using cursors -- but how can we test this out for ourselves? These workbenches are all about test cases and hard data, so that readers can find out for themselves what solution will work for their system.

Subject: double-entry accounting?
Posted by: Anonymous (not signed in)
Posted on: Tuesday, February 06, 2007 at 10:06 AM
Message: Does anyone else think that the sample table is fundamentally flawed, as it uses the same sign for the income and outgo values? I'd think you'd either indicate flow direction through sign, or have seperate income and outgo columns.

Also, how much is the string comparison in the CASE costing relative to the total times?

Lastly, would performance be enhanced (especially if using Group By set logic) with an index of (et, amount)?

Subject: Re: Double-entry Accounting
Posted by: Robyn Page (view profile)
Posted on: Tuesday, February 06, 2007 at 3:28 PM
Message: Yes, originally I used the same convention for the sample data that you suggested, but I looked up a cashbook example in an accounting textbook and this was the way the original cashbook entries were made. When I originally wrote the workbench, I did a complete worked example of a double-entry cashbook but took it out again as it wasn't directly relevant to the main issue of whether there were any classic accounting calculations for which a cursor is necessary. I just left the stub.
On the last two points, I'd very much hope that you'd tell me! The workbench approach would, I hope, allow you to put this to the test. I'd expect that a covering non-clustered index would be an improvement, as one would expect SQL Server to be able to get the data from the index leaf and not to have to access the data at all. My guess is that it would favour the set-based approaches.

Subject: Re: SHatchard and the Crock
Posted by: Robyn Page (view profile)
Posted on: Tuesday, February 06, 2007 at 4:06 PM
Message: Many thanks to SHatchard, Paker, (and the anonymous author of 'Cursors vs Set Based operations') for their contributions. I agree with quite a lot of what they say. I'm a little surprised, though, that SHatchard interpreted an article that starts by stating 'Relational database programmers won't need them', as a defense of the use of Cursors.
I always use a set-based solution wherever I can, and have never had any need to put a cursor-based solution into a production system. (I will happily use an iterative solution occasionally). However, I'm just keen to keep an open mind and adopt an experimental approach to the issue. Do cursors ALWAYS have an aggressive locking strategy even if you specify a read-only cursor? Are there any common problems that are better solved by an iterative or cursor-based strategy?

Surely we should put this sort of issue to the test?

I would be very interested in SHatchard's better GROUP BY solution, If he'd be prepared to send it to the editor, and would like to do some comparative timings with the solutions I used, and add this to the article.


Subject: Where to use cursors
Posted by: Anonymous (not signed in)
Posted on: Wednesday, February 07, 2007 at 2:23 PM
Message: I wholeheartedly agree that cursors should generally be avoided, but they can be a tool in your belt, even if little-used. If scalability is a concern, then cursors are probably not the best solution. However, there are times when they are, in fact, the best tool for the job.

The main place I use cursors is in administrative scripts - compiling fragmentation data for all tables for all databases, for instance. The DMVs in SQL 2005 finally offer more ways to use set-based solutions to get the desired results, but cursors are still perfectly fine for this use, since scalability is not a problem.

If you are working with very large amounts of data, you can use cursors to split up very large operations so as to avoid disk thrashing and excessive use of tempdb.

To simply throw out cursors completely is like tossing out your hammer because you have access to a pile driver.

Subject: The crux of my argument
Posted by: SHATCHARD (view profile)
Posted on: Wednesday, February 07, 2007 at 5:03 PM
Message: If i may re-iterate the last point from my first reply

"If you must iterate across a resultset either farm that out to a Business Rules Layer in a procedural language or create an @ temporary table variable to hold your iteration data. The locking for the iteration will be only on a temporary table not the actual data tables"

I was not saying that you never need to loop or iterate with T-SQL. I was saying 2 things

a) For loops required for production then a Business Layer would normally be the most efficient. Pass a resultset back to the layer and iterate there.

b) If you must do a loop in T-SQL then here is an example of looping using a temporary table variable (for SQL 2K/2005)

SET NOCOUNT ON
DECLARE @tab TABLE (id INT)
DECLARE @iCurrentRow INT
DECLARE @vName SYSNAME

INSERT INTO @tab(id)
SELECT id FROM master.dbo.sysobjects

WHILE EXISTS (SELECT 1 FROM @tab)
BEGIN
SELECT @iCurrentRow=id FROM @tab

SELECT @vName=name from master.dbo.sysobjects WHERE id=@iCurrentRow
PRINT @vName

DELETE FROM @tab WHERE id=@iCurrentRow
END

You dump the data you want use in your loop into the temporary table, thus moving the locking away from the actual tables to the temporary table.

My argument is that an article designed to explain cursors would actually lead to encouraging people to try them when actually there are much better (and efficient) alternatives.

As for a GROUP BY example, hmmmm, I think it was during my first day of learning about SQL that it dawned on me "ah SQL is designed for aggregations!".

Subject: Temp Tables
Posted by: Anonymous (not signed in)
Posted on: Wednesday, February 07, 2007 at 6:47 PM
Message: The whole point of having the locking on the base tables is for data integrity. Using a temp table circumvents the locking mechanism and can lead to integrity issues when more than 1 person is using a system.

Still waiting for a "running total" implementation using GROUP BY ...
(preferably one that can be used in SQL2000, so no custom aggregate functions)

Subject: Set vs. Procedural
Posted by: Anonymous (not signed in)
Posted on: Wednesday, February 07, 2007 at 10:00 PM
Message: I, for one, don't understand the distinction people make all the time between "set" and "procedural" in regards to DBMS's.

The simple fact is that the execution of the query is ultimately procedural. Either we program the procedure ourselves, or we describe our query and let the optimizer generate a procedure for us. The reason "set"-based solutions are usually faster is because the optimizer usually comes up with a very good procedure -- especially considering that it can leverage parallelism and (just about) as much memory as it wants.

If you look at the execution plans generated, you can see very clearly that SQL Server is simply converting "set" operations into "procedural" operations.

Take, for example, the correlated subquery and self-join execution plans. You will probably see a "nested loop" operation and later a "stream aggregate" operation. Not only are nested loops inefficient, but it generates a huge dataset that it then iterates through to calculate the aggregates.

The most efficient way is to pass through the data as few times as possible, which is what happens with the "quirky update", iterative, and cursor solutions. I am guessing that the iterative and cursor solutions are slower because of the implicit transactions that surround the DML being executed within the loops -- the update method makes 2 passes through the data, but only incurs the overhead of 2 transactions. You might want to try enclosing the iterative and cursor methods in a transaction to see if the elapsed times come closer to the time of the update method.

I believe the "quirky update" method does indeed rely on the details of the execution plan, so in more complex systems an index or statistics change could alter the execution plan and subsequently break the calculation. Also, because of its quirky nature, this method would probably be confusing to anyone trying to maintain the code in the future.

When using cursors, two key things to keep in mind are locking and transactions. Be sure you understand them and use them carefully! The availability of your system and the integrity of your data depend on them.

As far as the example from b) above, the proposed method accesses @tab many more times than it needs to. It scans @tab from top to bottom in the first SELECT statement of the loop and then scans @tab from top to bottom (due to no index) in order to execute the DELETE at the end of the loop. It performs these scans for every record in the table -- not the best use of system resources.

Please understand that I am not trying to provoke anybody here -- just trying to offer an honest analysis of the methods presented.

J. Steele

(Both my previous posts were marked "Anonymous" even though I had filled in the Name textbox - must I sign in for "nonanonymous" posting?)

Subject: Other uses for Cursors
Posted by: Anonymous (not signed in)
Posted on: Friday, February 09, 2007 at 1:25 PM
Message: I’ve got to weigh in on this to get a reaction to the following. Here is a real world example of using cursors that I’ve constructed. I have a table that contains a 2K varchar field (among other variables). The content of this field is a stylized stored procedure with one or more ‘replaceable’ variables. The purpose of this table of stored procedures is to perform data validation in a myriad of ways and process order. So, the processing sequence goes something like this. With a cursor, identify the validation records to run based on several parameters passed into the stored procedure. The results order from the cursor is based on an ordinal value associated with the record. For each record in the cursor set, do a search / replace on the replaceable fields. Construct a dynamic query where I have a standardized Insert into <table> [the select query from the cursor record with the ‘where’ variables updated from passed in parameter(s)]. I iterate through until no more validation routines need to be processed. This routine is called as part of a bigger data import process. Different data file formats trigger different validation requirements hence the passed in variables.

Why not do this outside the sql system? Or for that matter using sql stored procedures. Because the scenarios for validation change with enough frequency that test/release cycles to release binary code make it impractical. The current method allows the user to insert a new validation to be used in a file load within minutes of identifying the need. There is an interface that guides the end user department in constructing these validations thus allowing them to be self sufficient without requiring support / assistance from IT. It’s not always about faster execution.

The discussion has been centered on iterating through tables for some result but a SQL system can be a complete application in itself. As eluded to in various previous threads, what are you trying to achieve and are you smart about it. This article allows those without years of seat experience to see something tangible and build upon what was presented. Many thanks to Ms. Page for taking the time to outline ideas for others to digest and expand there breadth of knowledge. Thanks to those with their counter points as well.

Subject: flawed comparisons
Posted by: Anonymous (not signed in)
Posted on: Monday, February 12, 2007 at 7:45 PM
Message: Several things stand out on this performance comparison.
The example using the "SET" technique included an unused (but populated) field Et in the table variable @cb, this table variable was missing a primary key, and was missing a SET Nocount ON command.
The example using the cursor did not have this useless field created and populated, had a SET Nocount ON command. A primary key on the table variable used in the cursor is not possible.
By making some minor tweaks to the "SET" technique (adding a primary key to @cb, NOCOUNT, getting rid of the useless field), I next had to change to measure to milliseconds - performance was 15 to 20 times better than the cursor method.
So, if this article is an attempt to compare approaches, it is misleading. It does demonstrate that cursors can be almost as slow as poorly thought out set based solutions. However, that wasn't the message the author intended, and I'm still in the group of developers that view cursors as the technique of last resort.

Subject: re: Flawed comparisons
Posted by: Robyn Page (view profile)
Posted on: Tuesday, February 13, 2007 at 2:59 PM
Message:

Thanks very much for the contribution. It has certainly given food for thought.

Several points:

ET (EntryType) is indeed used in all the methods. One wants the comparison to be fair.

NOCOUNT was ON for all the timings.

The introduction of the primary key on the table variable on the 'Quirky Update' approach slows the performance very slightly on my test system.

Changing the index on the test table to a primary key clustered or a compound (covering) index made little difference to the timings.

Timings were all in seconds as the worst performers were running into several minutes.

The 'message that the author intended' about cursors was quite clearly stated at the beginning of the article 'Relational database programmers won't need them', but I would want to keep an open mind, and to put matters to the test. I'd particularly like to hear of any application for which a cursor provides the best approach.


Subject: Possible cursor application
Posted by: Anonymous (not signed in)
Posted on: Thursday, February 15, 2007 at 8:10 PM
Message: How about a scenario where you want to retrieve a threshold based on a percentage of a total?

Say you are a non-profit with a large pool of benefactors, and you want to view a "Top X%" list - where the percentage represents a portion of the total sum of contributions for a time period. So if you wanted a "Top 75%" list, you would start with your largest contributor and in descending order keep a running total until you have a sum that meets or exceeds 75% of your total. The amount that pushed the running total past 75% would then be used as the threshold for the list.

As a small example, say you received $100 total this month, with contributions of the following amounts: $55, $22, $5, $4, $4, $3, $2, $2, $2, $1

If you asked for "Top 50%", you would get a threshold of $55, since the first contribution pushed the running total over $50, and the list would contain 1 record.

If you asked for "Top 75%", you would get a threshold of $22, and the list would contain 2 records.

Seems like an odd type of request, but when dealing with large numbers of entities with widely varying values, it can be an effective way to view significant records.

Sure, this could be implemented outside of the database - but that would require pumping a lot of data outside of the SQL Server process.

Any thoughts on (MSSQL2000) implementations that might be more effective than a cursor or iteration?

Subject: Re: Possible Cursor Application
Posted by: Robyn Page (view profile)
Posted on: Saturday, February 17, 2007 at 11:39 AM
Message:
SET nocount ON

DECLARE
@Donors TABLE (donor VARCHAR(10),Amount money)
INSERT INTO @Donors (donor,amount) VALUES('Bill',55)
INSERT INTO @Donors (donor,amount) VALUES('Fred',22)
INSERT INTO @Donors (donor,amount) VALUES('Alice',5)
INSERT INTO @Donors (donor,amount) VALUES('Tom',4)
INSERT INTO @Donors (donor,amount) VALUES('May',4)
INSERT INTO @Donors (donor,amount) VALUES('Joan',3)
INSERT INTO @Donors (donor,amount) VALUES('Britney',2)
INSERT INTO @Donors (donor,amount) VALUES('Henry',2)
INSERT INTO @Donors (donor,amount) VALUES('George',2)
INSERT INTO @Donors (donor,amount) VALUES('Humph',1)

DECLARE @RankedDonors TABLE (donor VARCHAR(10),Amount money,
percentage numeric (8,2), cumulativePercent numeric (8,2))

--OK, we inserted them in rank order but let's be certain
--by inserting them into a new table and calculate the
--percentages at the same time
INSERT INTO @RankedDonors (Donor, Amount,Percentage)
SELECT Donor, Amount,[percentage]=(amount/total)*100 FROM
@donors CROSS JOIN
(SELECT [total]=SUM(amount) FROM @Donors) f
ORDER BY amount DESC

--now update this to give the cumulative percentage
DECLARE @RunningPercentage numeric (8,2)
UPDATE @RankedDonors
SET @RunningPercentage=cumulativePercent
=COALESCE(@RunningPercentage,0)+percentage

--and get the two top donors who breached the 75% limit
SELECT * FROM @RankedDonors
WHERE cumulativePercent <=
(
SELECT MIN(