foreach, IEnumerable and modifying contents

I have a class that parses web pages and extracts all relevant file addresses. It allows me to download every pdf on a web page, for instance. I would like to incorporate threads so that I can download N files separately.

The obvious solution is a thread pool. However, I need to make sure that I download the files Async - so I can get percentage and status information to my interface.

I have decided that the best way to do this is to have my Download (a class representing the file to download) raise an event when it is finished. I was hoping to have my threads rejoin the thread pool when the downloads are finished.

However, I have my Download instances coming out of an IEnumerable<Download> that is received from the WebExtractor class (which parses the HTML) on-the-fly using "yield return".

I think I am lacking some basics about Thread Pools. How can I use a thread pool and have the events fired by the Downloads still reach the interface? Is there a way to add an event handler to an instance while in a foreach or IEnumerator code block?

Any help would put me one step closer to being done with my second release of the software. Thanks in advance!

~Travis

On 2007-11-27 20:24:28 -0800, "jehugalea***@gmail.com"
<jehugalea***@gmail.com> said:

> I have a rather complex need.

Perhaps. Though, I suspect it's more that you've created a complex need, where it wasn't really necessary to do so.

> I have a class that parses web pages and extracts all relevant file
> addresses. It allows me to download every pdf on a web page, for
> instance. I would like to incorporate threads so that I can download N
> files separately.

A reasonably common operation.

> The obvious solution is a thread pool. However, I need to make sure
> that I download the files Async - so I can get percentage and status
> information to my interface.

It seems to me that a different "obvious" solution would be to just use the async methods on the HttpWebRequest class, or even just a plain TcpClient or Socket instance, along with a queue. The producer of the queue would add URLs to be downloaded, while the consumer would keep track of how many active downloads are going on (via HttpWebRequest, TcpClient, or Socket).

Every time the producer adds something to the queue, it would signal the consumer. The consumer in response would remove items from the queue, stopping when either the queue is empty or your maximum number of concurrent operations has been reached, whichever comes first.

Upon completion of an item, the consumer would also be signaled, allowing it to pull a new item from the queue.

In the above, I'm thinking of the consumer and producer as individual threads. But you could easily implement it without a thread dedicated to either, with the consumer and producer classes simply being called by whatever thread happens to be managing them at the time. In that case, "signaling" the consumer would be more a matter of just executing the method that attempts to dequeue more download operations.
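The queue arrangement described above, in its "no dedicated thread" form, might be sketched roughly as follows. This is only an illustration under assumptions: the DownloadPump class and all of its member names are invented here, and the startDownload callback stands in for whatever actually begins an asynchronous request (for example a Begin... call on HttpWebRequest).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration: DownloadPump and its members are invented names.
// startDownload stands in for the code that kicks off a real async request.
public class DownloadPump
{
    private readonly object sync = new object();
    private readonly Queue<string> pending = new Queue<string>();
    private readonly int maxConcurrent;
    private readonly Action<string> startDownload;
    private int active;

    public DownloadPump(int maxConcurrent, Action<string> startDownload)
    {
        this.maxConcurrent = maxConcurrent;
        this.startDownload = startDownload;
    }

    public int ActiveCount { get { lock (sync) { return active; } } }
    public int PendingCount { get { lock (sync) { return pending.Count; } } }

    // Producer side: add a URL, then "signal" the consumer by calling it directly.
    public void Enqueue(string url)
    {
        lock (sync) { pending.Enqueue(url); }
        StartMore();
    }

    // Called by whatever code learns that one download has finished.
    public void OnDownloadCompleted()
    {
        lock (sync) { active--; }
        StartMore();
    }

    // Consumer side: dequeue until the queue is empty or the limit is reached.
    private void StartMore()
    {
        while (true)
        {
            string url;
            lock (sync)
            {
                if (pending.Count == 0 || active >= maxConcurrent) return;
                url = pending.Dequeue();
                active++;
            }
            // Invoke the callback outside the lock so that outside code
            // never runs while the lock is held.
            startDownload(url);
        }
    }
}
```

An empty queue needs no special handling in this shape: StartMore simply returns, and the next Enqueue call starts things up again.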
> I have decided that the best way to do this is to have my Download (a
> class representing the file to download) to have events raised when
> they are finished. I was hoping to have my threads rejoin with the
> thread pool when the downloads are finished.

If you use the async methods on the above-mentioned classes, you get the thread pooling behavior for free.

> However, I have my Download instances coming out of an
> IEnumerable<Download> that is received from the WebExtractor class
> (which parses the HTML) on-the-fly using "yield return".

This is another reason I think a queue would be better. There's no technical reason you can't implement an asynchronous enumerator, but having done so in this case seems to have overcomplicated the issue. A queue seems like a much more natural fit to me, and wouldn't have the same complicating factors you seem to be running into.

> I think I am lacking some basics about Thread Pools. How can I use a
> thread pool and have the events fired by the Downloads still reach the
> interface?

I think you can avoid the question altogether, but the basic answer is that the idea of a thread pool and having "the events...reach the interface" are orthogonal ideas. Because of the thread pool, you may have thread synchronization issues to deal with. But the basic question of raising an event in a way that some implementer of some interface receives them isn't affected by whether there are multiple threads involved.

> Is there a way to add an event handler to an instance
> while in a foreach or IEnumerator code block?

You can subscribe to an event at any time you find convenient.

> Any help would put me one step closer to being done with my second
> release of the software. Thanks in advance!

See above. I recommend abandoning this asynchronous enumerator idea and going with a nice, simple queue.

Pete

On Nov 27, 10:04 pm, Peter Duniho <NpOeStPe***@NnOwSlPiAnMk.com>
wrote:

My first implementation actually had a Queue<Download> that was consumed when I received notice that a download had finished.
However, it was difficult for my code to say, "Hey, stop trying to consume!" I ended up having a very rigid code set and I was hoping to get away from it. I was having BIG issues with the events from one download finishing interrupting while another thread was in the middle of a locked block. I kept getting the occasional deadlock.

My hope in my new design was to get away from the need for so much concurrency management. I did that by using the yield return statement and making that my Queue, in a sense. It also makes the termination point a lot easier to see. However, without a way of saying, "Hey, we're not ready to start downloading you yet - wait for a moment", I was downloading as many files at once as my computer could handle. So my hope was to find a way to say, "Hey wait" while not needing to necessarily manage the number of threads/concurrent downloads manually.

I could try to manage the downloads manually again. I did move a lot of code around to separate the interface from the downloading, so it might be easier now than before. ThreadPools seemed more intuitive to me the second time around. Perhaps my first approach is the better one.

Thanks for your thoughts,
Travis

On 2007-11-27 21:23:40 -0800, "jehugalea***@gmail.com"
<jehugalea***@gmail.com> said:

> My first implementation actually had a Queue<Download> that was
> consumed when I received that a download had finished. However, it was
> difficult for my code to say, "Hey, stop trying to consume!"

Typically with a queue, that point is when the queue is empty. It's not usually difficult.

> I ended
> up having a very rigid code set and I was hoping to get away from it.
> I was having BIG issues with the events from one download finishing
> interrupting while another thread was in the middle of a locked block.
> I kept getting the occasional deadlock.

Well, for what it's worth you seem to be dealing with threading issues anyway. Deadlock is a consequence of a buggy implementation. If you had trouble dealing with thread synchronization in the previous design, you're likely to have trouble with any other design that also involves threads.

> My hope in my new design was to get away from the need for so much
> concurrency management.

How you intended to do that by introducing your own thread pool, I'm not really clear on. :)

> I did that by using the yield return statement
> and making that my Queue, in a sense. It also makes the termination
> point a lot easier to see. However, without a way of saying, "Hey,
> we're not ready to start downloading you yet - wait for a moment", I
> was downloading as many files at once as my computer could handle. So
> my hope was to find a way to say, "Hey wait" while not needing to
> necessarily manage the number of threads/concurrent downloads
> manually.

Managing that with the queue/async paradigm I mentioned would be simple. Especially given the efficiency advantages of using the async i/o methods on the network classes, it seems to me that managing the concurrent consumer count by creating your own thread pool is much more complicated and error-prone. I'd say the puzzlement you appear to have put yourself into here is a good indication of that. :)

> I could try to manage the downloads manually again. I did move a lot
> of code around to separate the interface from the downloading, so it
> might be easier now than before. ThreadPools seemed more intuitive to
> me the second time around. Perhaps my first approach is the better
> one.

If it's like what I suggested, obviously I'd agree. :)

Pete

On Nov 27, 10:53 pm, Peter Duniho <NpOeStPe***@NnOwSlPiAnMk.com>
wrote:

> > My first implementation actually had a Queue<Download> that was
> > consumed when I received that a download had finished. However, it was
> > difficult for my code to say, "Hey, stop trying to consume!"
>
> Typically with a queue, that point is when the queue is empty. It's
> not usually difficult.

Well, the first go around, the queue being empty didn't mean I was done. It occurred quite often that I would finish downloading all my files before more files were added to the list. I should have mentioned that the application pulls all web pages off of a page and descends into those as well. It happened often that a web page was slow to download or that one would have many links, but not much media. I ended up having an empty queue regularly toward the beginning of a run.

Since I had code for extracting html pages and another for specific file types, I had to keep them in sync so that the application would finish when and only when both were done. Again, this was a bit of a concurrency issue. Before I used the yield return method, my biggest indication that the program was being cancelled was a class-wide variable that needed to be checked regularly (requiring lots of locks). However, I can just stop the web extractor now and the downloads will stop being yielded, which stops the downloader. The downloader can then cancel all running downloads and break out of the consuming loop. It did make concurrency simpler in this case.

However, now I just have Downloads coming in as fast as they are found. I will try your approach of starting the next download when I have time. What I will have to do is make the Download consumer without a loop, and just call MoveNext on the enumerator when I am notified that a download has finished.

Here is a scenario: One download finishes and my code begins pulling the next Download. However, the web extractor is not ready. While waiting, another download finishes and now a second piece of code begins pulling the next Download. Now I have two pieces of code trying to access the same enumerator. Can I be sure that this won't corrupt my enumerator? If I were to lock the IEnumerator<Download>, would this cause a deadlock since they are different event handlers?

Concurrency isn't that simple for someone who hasn't had to deal with it. I had plenty of theory in school, including producer/consumer algorithms. Dealing with events seems similar to threads, but they take complete control. Threads at least switch context when they hit a lock.

Thanks again,
Travis

On 2007-11-28 07:01:58 -0800, "jehugalea***@gmail.com"
<jehugalea***@gmail.com> said:

> Well, the first go around, the queue being empty didn't mean I was
> done. It occurred quite often that I would finish downloading all my
> files before more files were added to the list.

The queue being empty did in fact mean you were done, at least for the moment.

In a typical queue design, you would gracefully deal with an empty queue. A queue that's empty just means there's no work to do. The consumer sits idle (either as an actual thread blocked on a wait event, or just a class that doesn't do anything until some code calls something that adds something new to the queue) until there's more work to do. The logic is the same for the case of starting up some processing as it is for the case of temporarily running out of work to do and then being presented with some more.

If your design didn't support that, then you probably did not separate the logic of the producer, consumer, and client of the queue well enough.

> [...]
> Here is a scenario: One download finishes and my code begins pulling
> the next Download. However, the web extractor is not ready. While
> waiting, another download finishes and now a second piece of code
> begins pulling the next Download. Now I have two pieces of code trying
> to access the same enumerator. Can I be sure that this won't corrupt
> my enumerator? If I were to lock the IEnumerator<Download>, would this
> cause a deadlock since they are different event handlers?

I can't really comment on an enumerator that you haven't posted code for. Also, I haven't used any custom enumerators in real-world code, so I don't have much experience with them. However, I would say that if you have two pieces of code trying to access the same enumerator, you've got a bug. I would think that each call to GetEnumerator() should return a brand new one, so that different parts of the code don't interfere with each other.
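For what it's worth, an iterator method written with "yield return" already behaves this way: each call to GetEnumerator() on the returned IEnumerable gives an enumerator with its own position. A small sketch, assuming a made-up FindDownloads method as a stand-in for WebExtractor.Start (the file names are placeholders):

```csharp
using System;
using System.Collections.Generic;

public static class EnumeratorDemo
{
    // Hypothetical stand-in for WebExtractor.Start(); the strings are placeholders.
    public static IEnumerable<string> FindDownloads()
    {
        yield return "first.pdf";
        yield return "second.pdf";
        yield return "third.pdf";
    }

    // Advances two enumerators over the same sequence by different amounts
    // and returns what each one is currently positioned on.
    public static string[] Run()
    {
        IEnumerable<string> source = FindDownloads();

        using (IEnumerator<string> a = source.GetEnumerator())
        using (IEnumerator<string> b = source.GetEnumerator())
        {
            a.MoveNext();
            a.MoveNext();   // a is now on the second item
            b.MoveNext();   // b independently starts from the first item
            return new[] { a.Current, b.Current };
        }
    }
}
```

Note that each enumerator runs the iterator body again from the top, so two enumerators over a web extractor would each repeat the extraction work. Handing one enumerator to several event handlers is a different situation, and there the MoveNext/Current pair would need to be guarded by a lock.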
If you do decide to return the same enumerator to different parts of the code, or different instances of the same code, I'd say that at a bare minimum you will need to be VERY careful about how you use the enumerator (and for sure it will need to be written in a thread-safe way to account for this multiple access usage), and it's very likely there's a better way to design the code (like using a queue :) ).

> Concurrency isn't that simple for someone who hasn't had to deal with
> it. I had plenty of theory in school, including producer/consumer
> algorithms. Dealing with events seems similar to threads, but they
> take complete control. Threads at least switch context when they hit a
> lock.

Events and threads are, as I mentioned, orthogonal to each other. An event is really just a nice syntax for a multi-subscriber callback mechanism. When an event is raised, the handler always executes in the same thread in which it was raised. Multiple threads impose synchronization requirements on your code, and these requirements are the same whether you are using events or not.

That said, I never meant to imply that concurrency was simple. It's not. If anything, my intent is to point out that concurrency is _not_ simple, and that your second design appears to have just made it more complicated than it otherwise needed to be.

If you want to have multiple threads processing things, you _are_ going to have to deal with concurrency. So the question is not whether you can get away from concurrency issues or not; you can't. The question is how complicated are you going to make those issues.

So far, it seems that you've made them very complicated. :)

For fun, I'm thinking about working on a simple download simulation that uses a queue to manage the downloads. If and when it's finished, I'll post the code here in case you or anyone else is interested. Might not be done today, as I've got a busy day, but maybe tomorrow.
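The point above that a handler executes on the thread that raises the event can be checked with a few lines. This is a sketch under assumptions: FakeDownload and its Completed event are invented stand-ins, not the poster's actual Download class.

```csharp
using System;
using System.Threading;

// Hypothetical minimal stand-in for a Download class.
public class FakeDownload
{
    public event EventHandler Completed;

    // Simulates a download whose completion is reported from a worker thread.
    public void RunOnWorker()
    {
        Thread worker = new Thread(() =>
        {
            EventHandler handler = Completed;   // copy to avoid a race with unsubscribe
            if (handler != null) handler(this, EventArgs.Empty);
        });
        worker.Start();
        worker.Join();
    }
}

public static class EventThreadDemo
{
    // Returns true only if the handler ran on the thread that subscribed.
    public static bool HandlerRanOnCallerThread()
    {
        int callerId = Thread.CurrentThread.ManagedThreadId;
        int handlerId = -1;

        FakeDownload download = new FakeDownload();
        download.Completed += (sender, e) =>
        {
            // Records the thread the handler actually runs on.
            handlerId = Thread.CurrentThread.ManagedThreadId;
        };
        download.RunOnWorker();
        return handlerId == callerId;
    }
}
```

HandlerRanOnCallerThread() returns false here, because the handler runs on the worker thread that raised the event. For a user interface this usually means the handler has to marshal back to the UI thread itself, for instance with Control.Invoke in Windows Forms.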
Pete

On Nov 28, 12:00 pm, Peter Duniho <NpOeStPe***@NnOwSlPiAnMk.com>
wrote:

Your extended effort to help me is commendable. Thank you very much.

> In a typical queue design, you would gracefully deal with an empty
> queue. A queue that's empty just means there's no work to do. The
> consumer sits idle (either as an actual thread blocked on a wait
> event, or just a class that doesn't do anything until some code calls
> something that adds something new to the queue) until there's more work
> to do. The logic is the same for the case of starting up some
> processing as it is for the case of temporarily running out of work to
> do and then being presented with some more.

I grasp what you are saying, but I'm not sure what the thread does while it is idle. That or I'm not sure how to wake it up.

When you use "yield return", it actually is very much like a thread. It returns one thing and goes away until the next is needed. The class processing the downloads does idle before the next Download is yielded. This is just how "yield return" works and it did make my code *seem* cleaner.

All methods with "yield return" return IEnumerable. You access the yielded data using an IEnumerator. So, I'm just using a foreach loop. It looks like this:

    public class DownloadManager
    {
        WebExtractor extractor = new WebExtractor(/* Arguments */);
        bool cancelled = false;
        object cancelSync = new object();

        public void DownloadFiles()
        {
            // BEGIN THREAD
            // WebExtractor.Start yield returns Downloads as they are found.
            foreach (Download download in extractor.Start())
            {
                // add event handlers
                download.Start();
                lock (cancelSync)
                {
                    if (cancelled)
                    {
                        break;
                    }
                }
            }
            // END THREAD
        }

        public void Cancel()
        {
            // BEGIN THREAD
            lock (cancelSync)
            {
                cancelled = true;
            }
            // END THREAD
        }
    }

With Semaphores for example:
    public class DownloadManager
    {
        WebExtractor extractor = new WebExtractor(/* Arguments */);
        bool cancelled = false;
        object cancelSync = new object();
        Semaphore semaphore = new Semaphore(5, 5);   // at most 5 downloads at once

        public void DownloadFiles()
        {
            // BEGIN THREAD
            // WebExtractor.Start yield returns Downloads as they are found.
            foreach (Download download in extractor.Start())
            {
                // wait for a free download slot
                semaphore.WaitOne();
                // add event handlers (subscribe with the handler method,
                // not with the event-args type)
                download.StatusChanged += status_Changed;
                download.Start();
                lock (cancelSync)
                {
                    if (cancelled)
                    {
                        break;
                    }
                }
            }
            // END THREAD
        }

        private void status_Changed(object sender, StatusChangedEventArgs e)
        {
            // a finished download gives its slot back
            if (e.Status == DownloadStatus.Complete)
            {
                semaphore.Release();
            }
        }

        public void Cancel()
        {
            // BEGIN THREAD
            lock (cancelSync)
            {
                cancelled = true;
            }
            // END THREAD
        }
    }
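One property of the loop above is that semaphore.WaitOne() blocks until a slot frees up, so a Cancel() call goes unnoticed while the loop is waiting. A possible way around that, sketched here with an invented Throttle class (not part of the posted code), is to wait on the semaphore and a cancel event together using WaitHandle.WaitAny:

```csharp
using System;
using System.Threading;

// Hypothetical helper: Throttle and its member names are invented for illustration.
public class Throttle
{
    private readonly Semaphore slots;
    private readonly ManualResetEvent cancelRequested = new ManualResetEvent(false);

    public Throttle(int maxConcurrent)
    {
        slots = new Semaphore(maxConcurrent, maxConcurrent);
    }

    // Blocks until a slot is free or cancellation is requested.
    // Returns true if a slot was acquired, false if cancelled.
    public bool WaitForSlot()
    {
        // The cancel event sits at index 0, so it takes priority
        // when both handles are signaled.
        int signaled = WaitHandle.WaitAny(new WaitHandle[] { cancelRequested, slots });
        return signaled == 1;
    }

    // Call when a download completes to hand its slot back.
    public void ReleaseSlot()
    {
        slots.Release();
    }

    // Wakes up any waiter and makes later WaitForSlot calls return false.
    public void Cancel()
    {
        cancelRequested.Set();
    }
}
```

In the foreach loop this would read as "if (!throttle.WaitForSlot()) break;", with the status handler calling ReleaseSlot() on completion. Whether this fits depends on how the real Download class reports completion and failure.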
On Nov 28, 3:07 pm, "jehugalea***@gmail.com" <jehugalea***@gmail.com> wrote:

> With Semaphores for example:
> [...]

Actually, it appears that using Semaphores with WebClient is a no-no.

<jehugalea***@gmail.com> wrote in message
news:15b074ac-3777-4b3d-bbfb-afeca6dd9784@o42g2000hsc.googlegroups.com...

> Actually, it appears that using Semaphores with WebClient is a no-no.

That surprises me. I thought you might have some issues with the spidering/page parsing not running until there is a download slot available, and the code you posted clearly won't cancel the spider until one of the downloads completes (perhaps you can cancel each download somehow).

What exactly is going wrong? Does it help to use BeginInvoke to perform the download from a thread other than the one holding the semaphore?

<jehugalea***@gmail.com> wrote in message
news:71a6cde4-2d04-46f8-aa41-e3ec39226702@e23g2000prf.googlegroups.com...

> Here is a scenario: One download finishes and my code begins pulling
> the next Download. However, the web extractor is not ready. While
> waiting, another download finishes and now a second piece of code
> begins pulling the next Download. Now I have two pieces of code trying
> to access the same enumerator. Can I be sure that this won't corrupt
> my enumerator? If I were to lock the IEnumerator<Download>, would this
> cause a deadlock since they are different event handlers?

something like:

    delegate ... DownloadProcessor(...);
    // the Semaphore constructor takes both an initial and a maximum count
    Semaphore limit = new Semaphore(N, N);

    foreach (Download down in GetDownloads()) {
        limit.WaitOne();
        DownloadProcessor dp = down.Process;
        // using the AsyncCallback to release one more semaphore slot
        // after each download completes; EndInvoke completes the async call
        dp.BeginInvoke(..., delegate(IAsyncResult ar) { dp.EndInvoke(ar); limit.Release(); }, null);
    }
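A self-contained variation on the same idea is sketched below. It deliberately swaps Delegate.BeginInvoke for ThreadPool.QueueUserWorkItem, since BeginInvoke on delegates is not supported on .NET Core and later, as far as I know. BoundedRunner and RunAll are invented names, and the jobs are plain Action delegates standing in for downloads.

```csharp
using System;
using System.Threading;

// Hypothetical illustration: BoundedRunner and RunAll are invented names.
public static class BoundedRunner
{
    // Runs every job on the thread pool, never more than maxConcurrent at once.
    // Returns the highest number of jobs observed running at the same time.
    public static int RunAll(Action[] jobs, int maxConcurrent)
    {
        int running = 0, peak = 0;
        object sync = new object();

        using (Semaphore limit = new Semaphore(maxConcurrent, maxConcurrent))
        using (CountdownEvent allDone = new CountdownEvent(jobs.Length))
        {
            foreach (Action job in jobs)
            {
                limit.WaitOne();            // block until a slot is free
                Action current = job;       // per-iteration copy for the closure
                ThreadPool.QueueUserWorkItem(delegate
                {
                    lock (sync) { running++; if (running > peak) peak = running; }
                    try
                    {
                        current();
                    }
                    finally
                    {
                        lock (sync) { running--; }
                        limit.Release();    // hand the slot back when the job ends
                        allDone.Signal();   // count this job as finished
                    }
                });
            }
            allDone.Wait();                 // wait for the last jobs to finish
        }
        return peak;
    }
}
```

The thread that runs the foreach is the only one that ever waits on the semaphore; the pool threads only release it, which is the same division of labour as in the snippet above.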
On Nov 28, 12:26 pm, "Ben Voigt [C++ MVP]" <r...@nospam.nospam> wrote:

> something like:
> [...]

The semaphore tells me to wait. I will try that, when I get a chance, as well. I will have to learn about Semaphores as well.