Mysteries of the NET Framework: Question 1
Last post 10-27-2008, 2:25 AM by Tormod. 22 replies.
09-23-2008, 3:32 AM
When processing some data into a list, is it faster to build up a collection and return it as an enumerator, or use yield to create the enumerator 'on the fly'?
(See the article Mysteries of The Net Framework)
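For anyone unsure what the two options in the question look like, here is a minimal sketch. The method names (SquaresEager, SquaresLazy) and the squaring step are made up for illustration; they are not from the article.

```csharp
using System;
using System.Collections.Generic;

static class TwoApproaches
{
    // Option 1: build up the whole collection first, then hand it back.
    // All the work (and all the memory) is spent before the caller sees anything.
    public static IEnumerable<int> SquaresEager(IEnumerable<int> source)
    {
        List<int> results = new List<int>();
        foreach (int n in source)
        {
            results.Add(n * n);
        }
        return results;
    }

    // Option 2: use yield. The compiler generates an iterator object and
    // each result is computed only when the caller asks for the next item.
    public static IEnumerable<int> SquaresLazy(IEnumerable<int> source)
    {
        foreach (int n in source)
        {
            yield return n * n;
        }
    }

    static void Main()
    {
        int[] data = { 1, 2, 3 };
        Console.WriteLine(string.Join(",", SquaresEager(data)));
        Console.WriteLine(string.Join(",", SquaresLazy(data)));
    }
}
```

Both calls produce the same sequence; the difference is only in when the work happens and how much is held in memory.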

09-24-2008, 4:47 AM
David Connell
Joined on 11-14-2005
Re: Mysteries of the NET Framework: Question 1
Also, does it not depend on how big the result set could be? Couldn't that affect the overall performance?
I have also found that yield-style code can be easier to read, and hence easier to maintain and debug.
So we used it in Red Gate Data Generator for that reason. (For more examples, check out Data Generator on http://www.codeplex.com/SDGGenerators .)

09-30-2008, 8:17 AM
Steve S
Joined on 09-07-2007
Re: Mysteries of the NET Framework: Question 1
Could the result set be infinite? That would make it expensive to compute the whole set up-front, both in terms of time and memory!
For finite result sets, there's still a time/memory tradeoff, because yield introduces the overhead of allocating an iterator object on the heap, then repeatedly calling and returning from its MoveNext method. Yield (and the idea of deferred execution) is important in LINQ, which uses it extensively.
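To make the infinite case concrete, here is a small sketch. The Naturals method is a made-up example; the point is only that a yield-based sequence never needs to be materialised, so the caller decides how much of it to consume (here via LINQ's Take).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class InfiniteSequence
{
    // Conceptually unbounded: there is no way to build this up as a list first.
    public static IEnumerable<long> Naturals()
    {
        long n = 0;
        while (true)
        {
            yield return n++;
        }
    }

    static void Main()
    {
        // Take(5) stops pulling from the iterator after five items.
        foreach (long n in Naturals().Take(5))
        {
            Console.WriteLine(n);
        }
    }
}
```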

09-30-2008, 10:05 AM
Andrew Hunter
Joined on 09-09-2008
Re: Mysteries of the NET Framework: Question 1
You get the overhead of calling 'MoveNext' and of creating at least one object on the heap whether you're using an enumerator generated by yield or another type of enumerable object. Even so, the performance characteristics of yield might be worth investigating: the results are interesting (as are the potential effects on memory usage).
Something that might also be worth thinking about is what happens when you try to debug a method that returns its results through yield, and what kinds of things can happen when a yield method calls another yield method.
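One debugging surprise that can be shown in a few lines: none of the body of a yield method runs when the method is called, only when the first MoveNext happens. The sketch below uses a hypothetical CountTo method and a BodyEntered counter purely to make that visible.

```csharp
using System;
using System.Collections.Generic;

static class DeferredStart
{
    // Counts how many times the iterator body has actually started running.
    public static int BodyEntered = 0;

    public static IEnumerable<int> CountTo(int limit)
    {
        BodyEntered++;
        // Because this is an iterator, the check below is deferred too.
        if (limit < 0) throw new ArgumentOutOfRangeException("limit");
        for (int i = 1; i <= limit; i++)
        {
            yield return i;
        }
    }

    static void Main()
    {
        IEnumerable<int> bad = CountTo(-1);   // no exception is thrown here
        Console.WriteLine(BodyEntered);       // still 0: the body has not started

        try
        {
            foreach (int i in bad) { }        // the first MoveNext runs the body
        }
        catch (ArgumentOutOfRangeException)
        {
            Console.WriteLine("thrown on first MoveNext, not at the call site");
        }
    }
}
```

So a breakpoint or an argument check at the top of a yield method is hit at the point of enumeration, which may be a long way from the call that created the enumerable.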

09-30-2008, 10:16 AM
RobertChipperfield
Joined on 11-21-2006
Cambridge, UK
Re: Mysteries of the NET Framework: Question 1
When I was developing the "Read from Backup" technology in Data Compare 6, I made use of yield quite a bit.
One of the particularly nice bits was a class that took n ordered IEnumerable<T>s, and was itself an ordered IEnumerable<T>. So you could have three incoming "streams" of results, something like {1,3,5}, {2, 2, 4}, {7, 8, 9}, and it'd return {1, 2, 2, 3, 4, 5, 7, 8, 9}, all without having to read in all of the (potentially very large amount of) data...
(I think that was for partitioned tables, by the way!)
Robert Chipperfield Developer Red Gate Software Ltd
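A rough sketch of the kind of ordered merge described above. This is not the Data Compare code; the Merge name, the IComparable<T> constraint, and the simple linear scan for the smallest head are all assumptions made to keep the example short.

```csharp
using System;
using System.Collections.Generic;

static class OrderedMerge
{
    // Merges already-ordered inputs into one ordered output, holding only
    // one "current" item per input in memory at any time.
    public static IEnumerable<T> Merge<T>(params IEnumerable<T>[] sources)
        where T : IComparable<T>
    {
        List<IEnumerator<T>> active = new List<IEnumerator<T>>();
        try
        {
            foreach (IEnumerable<T> source in sources)
            {
                IEnumerator<T> e = source.GetEnumerator();
                if (e.MoveNext()) active.Add(e); else e.Dispose();
            }

            while (active.Count > 0)
            {
                // Find the input whose current item is smallest.
                int best = 0;
                for (int i = 1; i < active.Count; i++)
                {
                    if (active[i].Current.CompareTo(active[best].Current) < 0) best = i;
                }

                yield return active[best].Current;

                // Advance that input; drop it once it is exhausted.
                if (!active[best].MoveNext())
                {
                    active[best].Dispose();
                    active.RemoveAt(best);
                }
            }
        }
        finally
        {
            // Runs if the caller stops early: release the remaining inputs.
            foreach (IEnumerator<T> e in active) e.Dispose();
        }
    }

    static void Main()
    {
        IEnumerable<int> merged = Merge<int>(
            new[] { 1, 3, 5 }, new[] { 2, 2, 4 }, new[] { 7, 8, 9 });
        Console.WriteLine(string.Join(", ", merged));
    }
}
```

With many inputs a heap would be a better choice than the linear scan, but for a handful of streams this keeps the idea visible.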

09-30-2008, 10:42 PM
Eric Gunn
Joined on 10-01-2008
Portland, ME
Re: Mysteries of the NET Framework: Question 1
I'd say in most cases yield will be faster. Looking at the disassembled code for yield with a tool like Reflector, you can see that the compiler creates an internal facade class around the enumerator of the source data. So a call to a method that returns an enumerator only returns a new instance of the facade. Then, while iterating, the calls to MoveNext are passed through to the backing enumerator. Creating the "enumerator" takes effectively no time, whereas building up a collection to get the enumerator from takes time and memory.
Some superficial testing with large collections, and collections of collections, seems to confirm this. Comparing two methods that return IEnumerator<char> built up from a 100-element array of StringBuilders, each holding a 50,000-character string (5,000,000 chars in total), the method that uses yield completes quicker than the one that builds a char collection and returns its enumerator. This is also true when the results are filtered, say to return only the 'a's.
There's also the advantage that if the entire collection may not be enumerated, the time savings are even greater. If the enumeration needs to stop, say as soon as a 'g' is found, then with yield you only need to iterate over the collection until 'g' is found. When building up the collection, the entire collection first needs to be created (assuming you don't know how the enumerator will eventually be used), and only then is it iterated until a 'g' is found.
This does highlight a potential issue. If you think you can muck about with the original data once you have the enumerator, that's not safe if you used yield. You're actually using the original data's enumerator, so you'll get an exception if the data is changed while still iterating. If you build up a new collection and return its enumerator, then you can do whatever you want with the original data.
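The early-exit point can be illustrated by counting how many source characters each approach touches before a 'g' is found. This is only a sketch, not the test described above: the names (LazyChars, EagerChars, CountVisits) and the Visited counter are invented for the example.

```csharp
using System;
using System.Collections.Generic;

static class EarlyExit
{
    // Number of source characters read so far.
    public static int Visited = 0;

    public static IEnumerable<char> LazyChars(string source)
    {
        foreach (char c in source)
        {
            Visited++;
            yield return c;
        }
    }

    public static IEnumerable<char> EagerChars(string source)
    {
        List<char> result = new List<char>();
        foreach (char c in source)
        {
            Visited++;
            result.Add(c);
        }
        return result;
    }

    // Enumerates until 'stop' is seen and reports how many source
    // characters had to be read to get there.
    public static int CountVisits(Func<string, IEnumerable<char>> producer, string source, char stop)
    {
        Visited = 0;
        foreach (char c in producer(source))
        {
            if (c == stop) break;
        }
        return Visited;
    }

    static void Main()
    {
        Console.WriteLine(CountVisits(LazyChars, "abcdefghij", 'g'));   // 7
        Console.WriteLine(CountVisits(EagerChars, "abcdefghij", 'g'));  // 10
    }
}
```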

09-30-2008, 10:52 PM
Damon
Joined on 06-26-2006
Dallas, TX
Re: Mysteries of the NET Framework: Question 1
I was under the distinct impression that mucking around with the original data while you are enumerating on it, regardless of whether you are using yield or not, would muck things up (at least add/remove operations). Is that not the case?
Damon Armstrong, Technology Consultant

10-01-2008, 3:23 AM
RobertChipperfield
Joined on 11-21-2006
Cambridge, UK
Re: Mysteries of the NET Framework: Question 1
Damon - I'm guessing Eric meant something like:

    public IEnumerable<T> FilterMyTs (IEnumerable<T> original, Filter<T> filter)
    {
        // Return the Ts that match the filter
    }

Then if you used a yield, changing original after calling FilterMyTs would potentially cause a concurrent modification exception if you hadn't finished iterating over the result of FilterMyTs. However, if FilterMyTs built up the entire collection before returning, you'd be free to muck around with original as much as you wanted, as soon as it returns.
I guess this could tie in nicely with question 4 - what happens when it gets multi-threaded? If you build up the whole collection first, you only need to hold a lock on original (or some appropriate mutex that keeps other threads out, at least) for as long as that operation takes. However, using yield, you'd potentially need to ensure it didn't get modified for the whole life of the consume operation...
Robert Chipperfield Developer Red Gate Software Ltd
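A runnable sketch of the two FilterMyTs variants discussed here. It swaps the Filter<T> placeholder for the framework's Predicate<T> delegate, and the names FilterLazy and FilterEager are made up; the behaviour shown (List<T> throwing InvalidOperationException when modified during enumeration) is standard.

```csharp
using System;
using System.Collections.Generic;

static class FilterDemo
{
    // yield version: reads from 'original' only while the caller iterates.
    public static IEnumerable<T> FilterLazy<T>(IEnumerable<T> original, Predicate<T> filter)
    {
        foreach (T item in original)
        {
            if (filter(item)) yield return item;
        }
    }

    // Build-up version: finished with 'original' by the time it returns.
    public static IEnumerable<T> FilterEager<T>(IEnumerable<T> original, Predicate<T> filter)
    {
        List<T> matches = new List<T>();
        foreach (T item in original)
        {
            if (filter(item)) matches.Add(item);
        }
        return matches;
    }

    static void Main()
    {
        List<int> original = new List<int> { 1, 2, 3, 4 };

        IEnumerable<int> eager = FilterEager(original, n => n % 2 == 0);
        original.Add(6);                       // fine: eager already holds its own copy
        Console.WriteLine(string.Join(",", eager));

        IEnumerable<int> lazy = FilterLazy(original, n => n % 2 == 0);
        try
        {
            foreach (int n in lazy)
            {
                original.Add(99);              // modifies the list mid-enumeration
            }
        }
        catch (InvalidOperationException)
        {
            Console.WriteLine("original was modified while the lazy filter was still reading it");
        }
    }
}
```

The same reasoning carries over to the locking question: the eager version only needs original protected for the duration of the call, while the lazy one needs it protected until the consumer finishes iterating.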

10-01-2008, 4:43 AM
Nuno Gomes
Joined on 10-01-2008
Lisboa, Portugal
Re: Mysteries of the NET Framework: Question 1
I'd definitely go for the yield approach, not just for the effective performance improvement but also for the deferred execution.
From my point of view as an ASP.NET programmer, I always have to keep in mind the question "Does it look faster for the client?". What I'm trying to say is that the client's overall perception is as important as the server application's performance.
Imagine the case where we have a GridView and, in some scenarios, we want to bind something to it. Using yield, I know that the enumerator will only be created when I do a DataBind, and this is perfect: I'm deferring the processing of the data until I really need the data to be available.
From a more general perspective, the key to choosing between the two approaches is to know how many times we need to iterate through the enumerator and whether changes to the underlying data must be reflected in the enumerator.
If you must iterate through the results several times, then I would say build up a collection and return its enumerator, but if you only iterate once, or you must always be in sync with the underlying data, then yield is definitely the correct answer.
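The "how many times do you iterate" point can be seen in a short sketch: each pass over a yield-based enumerable re-runs the method body, whereas a built-up collection pays once. The Expensive method and the Computations counter are hypothetical stand-ins for real work.

```csharp
using System;
using System.Collections.Generic;

static class ReEnumeration
{
    // Counts how many items have been computed in total.
    public static int Computations = 0;

    public static IEnumerable<int> Expensive()
    {
        for (int i = 0; i < 3; i++)
        {
            Computations++;           // stands in for costly per-item work
            yield return i * 10;
        }
    }

    static void Main()
    {
        IEnumerable<int> lazy = Expensive();
        foreach (int x in lazy) { }
        foreach (int x in lazy) { }   // second pass runs the body again
        Console.WriteLine(Computations);   // 6

        Computations = 0;
        List<int> cached = new List<int>(Expensive());   // computed once here
        foreach (int x in cached) { }
        foreach (int x in cached) { }
        Console.WriteLine(Computations);   // 3
    }
}
```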

10-01-2008, 5:16 AM
Eric Gunn
Joined on 10-01-2008
Portland, ME
Re: Mysteries of the NET Framework: Question 1
Robert - Thank you for clarifying my point.
In the scenario where you build up a collection, I was considering "original data" to be the source data that the built up collection is derived from, and not the collection the enumerator is ultimately created for.

10-01-2008, 6:38 PM
Lewis Moten
Joined on 10-06-2006
Re: Mysteries of the NET Framework: Question 1
Tricky one. I suppose one benefit of producing the enumerator on the fly is that dependent resources (managed or unmanaged) don't need to be closed until the end-user disposes of the object providing the enumerator. However, you could suffer a penalty if the end-user needs to access the same enumerator repeatedly: having the data "cached" in a collection would speed up successive calls, because the collection only needs to be built up on the first request.
Is the collection being cached? What type of collection is it? A hybrid dictionary seems like it would be optimal in this case, since it can handle small and large lists of data differently. Are we working with excessive amounts of data, or with objects that hold blobs? An enumerator on the fly may be preferable in this case, so that performance isn't hindered by excessive use of memory.
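On the resource-lifetime point, a yield method can wrap its loop in try/finally (or using), and the cleanup runs when the consumer finishes or disposes the enumerator early. A minimal sketch, with a made-up ReadItems method and a Log list standing in for a real resource such as a file or connection:

```csharp
using System;
using System.Collections.Generic;

static class ResourceLifetime
{
    public static List<string> Log = new List<string>();

    public static IEnumerable<int> ReadItems()
    {
        Log.Add("open");              // stands in for opening a resource
        try
        {
            for (int i = 0; i < 100; i++)
            {
                yield return i;
            }
        }
        finally
        {
            Log.Add("close");         // runs on completion or early Dispose
        }
    }

    static void Main()
    {
        foreach (int item in ReadItems())
        {
            if (item == 2) break;     // foreach disposes the enumerator here
        }
        Console.WriteLine(string.Join(",", Log));   // open,close
    }
}
```

The flip side is that the resource stays open for as long as the consumer keeps the enumeration going, which is the cost being weighed against caching in a collection.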

10-05-2008, 6:52 PM
mrhassell
Joined on 10-06-2008
Melbourne, Australia
Re: Mysteries of the NET Framework: Question 1
Yield. Very simple question.

10-15-2008, 8:21 AM
Hugh Worm
Joined on 10-15-2008
Re: Mysteries of the NET Framework: Question 1
If you have to go down the hen hutch, collect all your eggs in your basket while you're there. If they're already in the fridge, let it yield them up for your omelette on the fly. Horses for courses (if you're really hungry). (Oh dear.)

10-15-2008, 1:36 PM
Lewis Moten
Joined on 10-06-2006
Re: Mysteries of the NET Framework: Question 1
Yea, but what do you do if you can't carry all the eggs back in one trip? Or even worse, what if the egg is bigger than you? Seems you would have to either make many small trips while keeping the door open, or bring the family to the hen-house to have dinner or help carry the egg back.