get the actual size of a file

Generally,

    FileInfo fi = new FileInfo(path);
    long size = fi.Length;

gets you the length of a file in bytes. However, when copying files, even while the copy operation is still in progress, the file size, as indicated in Windows Explorer or derived with the above two lines of code, will be the size of the file once the copy operation has completed. Is there a way to get the actual number of bytes written to the hard disk while a copy operation is under way?

The reason I'm asking is that I have to copy rather large files and I'm currently using File.Copy(input, output) to do this. For a progress indication, I have a thread that gets the size of the output via the above-mentioned code. Once the file has been copied, I append a second (binary) file, but prior to starting to append, I set the length of the output file to the total length the output is going to have. So, my progress indicator has only 2 values, and my thread getting the file size could just as well not exist.

The only way around this I can imagine is to dump File.Copy, create a new file manually, and copy the binary data from input to output in chunks of a certain size. Besides the additional complexity, is there any inherent performance disadvantage of such a mechanism versus the built-in file copy mechanism? I'm just guessing here, but I assume the size of I/O buffers could have a noticeable effect on performance.

Regards
Stephan

You are absolutely correct. There will be a noticeable effect on performance.

"Stephan Steiner" <stei***@isuisse.com> wrote in message news:OA9M983NFHA.1948@TK2MSFTNGP14.phx.gbl...
> [...]
> Besides the additional complexity, is there any inherent
> performance disadvantage of such a mechanism versus the built-in file copy
> mechanism? I'm just guessing here but I assume the size of I/O buffers
> could have a noticeable effect on performance.

Stephany Young <noone@localhost> wrote:
> You are absolutely correct. There will be a noticeable effect on
> performance.

Well, there will be a noticeable effect on performance depending on the buffer size. There needn't be a noticeable effect on performance between File.Copy and copying chunk-by-chunk if the buffer size is chosen appropriately.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too

So what would be a suitable buffer size? I need to make a copy of one file and append another file to it. The first file can be anywhere from 100 MB to 2 GB, whereas the file to be appended will more likely be in the 10 - 100 MB range.

"Jon Skeet [C# MVP]" <sk***@pobox.com> wrote in message news:MPG.1cb8cd3ac10c98e498bf28@msnews.microsoft.com...
> Well, there will be a noticeable effect on performance depending on the
> buffer size. There needn't be a noticeable effect on performance
> between File.Copy and copying chunk-by-chunk if the buffer size is
> chosen appropriately.

I think, depending on the OS (and whether File.Copy is calling the right APIs), File.Copy can be a huge winner, especially if neither file is on the machine that executes the copy. (For instance, \\machA executes File.Copy("\\machB\foo\bar", "\\machC\foo\bar").) This isn't a common scenario, but I was under the impression that in certain configurations one could avoid the bits going through \\machA at all.

m

"Stephan Steiner" <stei***@isuisse.com> wrote in message news:%231wamp9NFHA.1172@TK2MSFTNGP12.phx.gbl...
> So what would be a suitable buffer size? I need to make a copy of one file
> and append another file to it.
> [...]

Mike <vimakef***@yahoo.com> wrote:
> I think, depending on the OS (and if file copy is calling the right APIs),
> File.Copy can be a huge winner especially if both the files are not on the
> same machine as the machine the copy is executed on. (For instance, \\machA
> executes File.Copy("\\machB\foo\bar", "\\machC\foo\bar"). This isn't a
> common scenario, but I was under the impression that in certain
> configurations one could avoid the bits going through \\machA at all.

I'm not sure, to be honest. I think I'd want to see it working before saying for certain either way :)

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too

Stephan Steiner <stei***@isuisse.com> wrote:
> So what would be a suitable buffer size? I need to make a copy of one file
> and append another file to it. The first file can be anywhere from a 100 MB
> to 2 GB, whereas the file to be appended will more likely be in the 10 - 100
> MB area.

I suspect with buffers larger than about 64K you end up with diminishing returns - and if the buffers are large enough to get on the large object heap, the memory won't be compacted. (It'll be collected after a long time, but not compacted, as far as I know.) Of course, if your app just runs and then exits after doing this copy, that isn't an issue.

I suggest you experiment with buffer sizes to find out what suits your app best.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too

"Stephan Steiner" <stei***@isuisse.com> wrote in message news:%231wamp9NFHA.1172@TK2MSFTNGP12.phx.gbl...
> So what would be a suitable buffer size? I need to make a copy of one file
> and append another file to it. The first file can be anywhere from a 100
> MB to 2 GB, whereas the file to be appended will more likely be in the
> 10 - 100 MB area.

All you can do is measure. However, you should keep in mind that ALL file IO using the Framework IO classes is buffered IO. That means that, irrespective of the buffer size you specify at the API level, the file system will buffer reads and writes from/to disk in the FS cache. The number of bytes buffered depends on the FS type used (NTFS, FAT32, ...) and the usage pattern (sequential, random, mixed). So whether you read a byte or 256 KB at a time, the FS will always transfer a block of data from the disk device to the FS cache. That means that the transfer rate is theoretically determined by the speed of the physical IO path; however, transferring the data blocks to the FS cache, and from the FS cache on to the IO buffer in your application, means CPU overhead.
It's obvious that the smaller the buffers at the API level, the larger the overhead; buffers below a certain size will saturate the CPU, at which point the IO rate becomes CPU bound.

So basically we have four determining factors for IO transfer rate:
1. Physical IO system (FS type, disk rotational speed, disk cache size, RAID level ...)
2. CPU speed and number of...
3. Sequential or Random IO.
4. Buffer size.

I wrote a program to measure the impact of buffer size on the sequential IO rate, and measured the CPU consumption, the (logical) IO count and the transfer speed. Following are the results obtained reading a single large file (10 GB) from a single 10.000 RPM SATA drive (of course your mileage may vary).

blocksize    cpu       speed         IO
16 bytes     99,99%    10,44 MB/s    684444/s
128 bytes    78,31%    57,89 MB/s    474199/s
256 bytes    43,42%    56,37 MB/s    230906/s
2 KB         16,98%    57,84 MB/s    29613/s
4 KB         14,63%    56,37 MB/s    14432/s
8 KB         12,76%    56,4 MB/s     7219/s
16 KB        13,23%    57,84 MB/s    3702/s

What does this tell us:
- The transfer speed is optimal at buffer sizes of 128 bytes and above; anything smaller reduces the transfer speed to ~10 MB/s due to CPU saturation.
- Anything larger than 128 bytes doesn't increase the IO rate but reduces the CPU consumption.
- CPU consumption stabilizes with >4 KB buffers (cache manager's overhead). If you want to further reduce CPU consumption, you will have to perform unbuffered IO using PInvoke.

Conclusion: anything between 2 KB and 8 KB gives you optimal results for both IO transfer and CPU consumption. Bigger buffers are only a waste of memory; buffers that are too small are a waste of CPU resources.

Willy.

TJB replied to:
> Conclusion, anything between 2KB and 8 KB gives you optimal results for both
> IO transfer and CPU consumption. Bigger buffers are only a waste of memory,
> too small buffers are a waste of CPU resources.

That conclusion is a bit too sweeping given the benchmark. For a 10k RPM drive, those numbers sound underperforming.

It sounds like your reads are synchronous, and we also don't know what underlying options got sent to CreateFile in that case. I think you can get different results between sequential and random access caching. Also, it might be interesting to see what happens if you do a bunch of asynchronous random reads. Doesn't SATA support a form of command queuing? The drive might be able to do the SCSI-like thing of reordering a batch of reads into what it thinks is the fastest order.
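The point about sequential versus random access caching can be checked by timing the same read loop under different cache hints. What follows is only a rough sketch, not the program used for the figures quoted above: the class and method names (`ReadTiming`, `TimeRead`) are made up for illustration, and it assumes .NET 2.0 or later, where the `FileStream` constructor accepts a `FileOptions` value.

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Hypothetical helper, not part of the .NET Framework.
public static class ReadTiming
{
    // Reads the whole file front to back in blockSize-byte requests, using the
    // given cache hint (e.g. FileOptions.SequentialScan or FileOptions.RandomAccess).
    // Returns the number of bytes read; elapsed receives the wall-clock time taken.
    public static long TimeRead(string path, int blockSize, FileOptions options,
                                out TimeSpan elapsed)
    {
        byte[] buffer = new byte[blockSize];
        long total = 0;
        Stopwatch watch = Stopwatch.StartNew();

        using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read,
                                              FileShare.Read, blockSize, options))
        {
            int read;
            // Read returns 0 only at end of file.
            while ((read = fs.Read(buffer, 0, buffer.Length)) > 0)
            {
                total += read;
            }
        }

        watch.Stop();
        elapsed = watch.Elapsed;
        return total;
    }
}
```

Calling `TimeRead` in a loop over block sizes such as 16, 128, 2048, 8192 and 65536, once per hint, gives numbers comparable in shape to the table above. Keep in mind that a second pass over the same file is likely to be served from the file system cache rather than the disk, so use a file much larger than RAM or a fresh file per run.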
"stork" <tband***@mightyware.com> wrote in message news:1112542151.319533.99670@o13g2000cwo.googlegroups.com...
> That conclusion is a bit too sweeping given the benchmark. For a 10k
> RPM drive, those numbers sound underperforming.
>
> It sounds like your reads are synchronous and we also don't know what
> the underlying options that got sent to CreateFile in that case.
> [...]

Note that the transfer rates are Buffer from/to Disk, NOT Buffer from/to Host. What makes you think that the numbers are underperforming?

Note that this wasn't meant as a benchmark; my only purpose was to show the impact of the buffer sizes on CPU consumption and IO throughput for simple reads.

Anyway, to answer some of your questions: the reads are synchronous and sequential, from a non-fragmented single disk, using the buffered FileStream IO .NET API.

    fs = new FileStream(fileName, FileMode.OpenOrCreate, FileAccess.ReadWrite, FileShare.None, blockSize);

No additional options can be specified when running v1.1 of the framework. Running on v2.0 with the "SequentialScan" option results in a 5% increase of the transfer rate.
Running the same test asynchronously didn't result in a higher throughput (as expected).
Doing sequential synchronous writes gave approx. the same figures for the IO throughput, with a smaller CPU overhead compared to the reads.

Willy.