Replace strings in a text file and get the number of replacements made

Hi

Usually I only replace strings with text = text.Replace("old text", "new text");

Now I need to display the number of replacements made. Is there an easy way or do I need some custom replacement method?

    String a = ... ;
    int n = (a.Length - a.Replace("old text", "").Length) / ("old text".Length);

should do.

Vanderghast, Access MVP

On Tue, 07 Jul 2009 09:46:48 -0700, Smith <n*@thank.you> wrote:
> Hi
>
> Usually I only replace strings with text = text.Replace("old text", "new
> text");
>
> Now I need to display the number of replacements made. Is there an easy
> way or do I need some custom replacement method?

As Vanderghast suggests, as long as the new text is always a different length than the original, you can simply look at the difference in length of the modified string, which will be an exact multiple of the difference in length between the searched-for text and the replacement text.

If you don't have that guarantee -- that is, the new text could be the same length as the original -- then an alternative approach would be to use the Regex class. Normally I'd say for simple replacement it's overkill, but one thing it provides is an API that returns a list of matched sites in your original string; the length of that list is exactly the number you're looking for.

Custom code to do the replacement would be most efficient, but Regex won't be awful, it's already there written for you, and if this isn't a bottleneck in your code it won't matter anyway.

Pete

I should have specified that you use String.Empty, NOT the new REPLACING
string, in the replace statement, so the result of myString.Replace(fragment, String.Empty) will be the original myString shortened by fragment.Length each time fragment can be ***consumed*** from myString.

Sure, that assumes you use the convention that "ohoh" occurs only two times in "ohohohohoh", not four times (***consuming*** assumption, rather than ***matching*** assumption).

Vanderghast, Access MVP

Hello Peter,
Looping through the string in a StringBuilder is probably the safest way to do this:

    using System;
    using System.Text;

    string input = "bla bla bla bla bla bla blabla";
    string search = "bla";
    string replacement = "bli";

    StringBuilder sb = new StringBuilder(input);
    int count = 0;
    for (int i = 0; i < sb.Length; )
    {
        if (AreEqual(sb, search, i))
        {
            // Replace in place, then skip past the inserted text.
            sb.Remove(i, search.Length);
            sb.Insert(i, replacement);
            i += replacement.Length;
            count++;
        }
        else
        {
            i++;
        }
    }

    Console.WriteLine(count);
    Console.WriteLine(sb.ToString());

    // Returns true if val occurs in sb starting at pos.
    static bool AreEqual(StringBuilder sb, string val, int pos)
    {
        // Too few characters left for a match; also avoids indexing past the end.
        if (pos + val.Length > sb.Length)
        {
            return false;
        }
        for (int i = 0; i < val.Length; i++)
        {
            if (sb[pos + i] != val[i])
            {
                return false;
            }
        }
        return true;
    }

It might be faster to use sb.Replace(search, replacement, i, search.Length) instead of sb.Remove and sb.Insert, I'm not sure, but they won't differ that much.

It is probably a lot faster than using a regex, though I haven't done any measurements.

Jesse

--
Jesse Houwing
jesse.houwing at sogeti.nl

On Tue, 07 Jul 2009 14:45:02 -0700, Jesse Houwing
<jesse.houwing@newsgroup.nospam> wrote:

> Looping through the string in a stringbuilder is probably the safest way
> to do this:
>
> [...]
> int count = 0;
> for (int i = 0; i < sb.Length; )
> {
>     if (AreEqual(sb, search, i))
>     {
>         sb.Remove(i, search.Length);
>         sb.Insert(i, replacement);
>         i += replacement.Length;
>         count++;
>     }
>     else
>     {
>         i++;
>     }
> }
> [...]
>
> It might be faster to use sb.Replace(search, replacement, i,
> search.Length) instead of a sb.Remove, sb.Insert, I'm not sure, but they
> won't differ that much.

If you simply call StringBuilder.Replace(string, string, int, int) instead of having your own AreEqual() method followed by a call to Remove() and Insert(), the performance should be practically identical, but you wouldn't get any information about how many replacements occurred.

Alternatively, if you still call AreEqual() and then call StringBuilder.Replace(string, string, int, int), you're duplicating effort (which costs performance), because StringBuilder.Replace(string, string, int, int) has to actually do the string comparison again. That would actually be _slower_ than your original code.

You could do a little hack by searching for the first character that differs between the search and replacement strings (as an initialization, not as part of the loop), and then bumping a counter after each call to StringBuilder.Replace() based on whether the character at the same offset within the current StringBuilder has changed. That would be only slightly slower than just calling StringBuilder.Replace(string, string, int, int), but would include the count.

That said, I would hope that any Replace() method in .NET, including Regex.Replace(), String.Replace(), or StringBuilder.Replace() would be faster than the code you posted. The main reason being that all of those methods have the opportunity to optimize the construction of the new string, whereas your example doesn't optimize at all.
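For what it's worth, the "first differing character" hack described above could look roughly like the sketch below. It is only an illustration under stated assumptions: the class and method names (ProbeReplace, ReplaceWithProbe) are made up, and it is restricted to the case where search and replacement have the same length and are not identical, which is exactly the case where the length-difference trick gives no information.

```csharp
using System;
using System.Text;

static class ProbeReplace
{
    // Hypothetical helper, not a framework API. Assumes search and
    // replacement are equal in length and differ in at least one character.
    public static int ReplaceWithProbe(StringBuilder sb, string search, string replacement)
    {
        if (sb == null || replacement == null || string.IsNullOrEmpty(search)
            || search.Length != replacement.Length || search == replacement)
        {
            throw new ArgumentException(
                "search and replacement must be non-empty, equal in length, and different");
        }

        // Initialization (outside the loop): offset of the first character
        // at which search and replacement differ.
        int probe = 0;
        while (search[probe] == replacement[probe])
        {
            probe++;
        }

        int count = 0;
        int i = 0;
        while (i + search.Length <= sb.Length)
        {
            char before = sb[i + probe];

            // Limiting the range to search.Length characters means only a
            // match that starts exactly at index i can be replaced.
            sb.Replace(search, replacement, i, search.Length);

            if (sb[i + probe] != before)
            {
                // The probe character changed, so a replacement happened.
                count++;
                i += replacement.Length; // skip the text just written
            }
            else
            {
                i++;
            }
        }
        return count;
    }
}
```

With the sample input from earlier in the thread, "bla bla bla bla bla bla blabla" with "bla" replaced by "bli", this should report 8 replacements. Whether it beats the Remove()/Insert() loop is something only a measurement can tell.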
At the very least, I would not use the Remove()/Insert() pattern you've shown. Instead, I would use a String as input, and a StringBuilder as output, appending text segments to the output StringBuilder as I scan the input String. That way the code avoids having to repeatedly shift your character buffer in the StringBuilder (which happens _twice_ for each replacement in your code). That's exactly the kind of optimization I'd expect to find inside the .NET classes.

It might even be worthwhile to defer creation of the output StringBuilder until you detect the first match that needs to be replaced, if there's an expectation that for a significant frequency of input, no replacements would be needed.

> It is probably a lot faster than using a regex, though I haven't done
> any measurements.

I would expect Regex to be on par with other explicit mechanisms like that, especially given the need to count the replacements (which for non-Regex solutions requires replacing the search text twice). If performance is an issue, then a "scan and build" approach as I suggest above is probably slightly faster than using built-in Replace() methods simply because you can incorporate a count into the replacement logic.

All that said, if performance is an issue (and there's nothing in the OP to suggest it is), the only way to know for sure what the best solution is would be to try the different alternatives and measure them. Even theoretical advantages and disadvantages may be irrelevant for a typical data set, and intuition is a terrible way to measure performance. :)

For best performance, it may be that none of the suggestions offered so far are suitable. There's an optimized text search algorithm, the name of which I can't recall at the moment, that can probably be adapted, but if not then a degenerate state-graph implementation (since there's only one string to search for) would probably work too.
Either approach would avoid having to keep performing full string comparisons at each character index in the original string (consider an original string "aaaaaaaaaaaaaaaaaaaaaaaaaa" where you want to replace all occurrences of "aaaaaab" with something :) ).

But even there, as I said, there's no way to know for sure without measuring. Performance of the various choices is to some extent going to be data dependent; liabilities that exist in the general case might not really be that much of a problem. For example, if dealing with essentially random data, it's not too terrible to just keep comparing over and over at an incremented index, because those comparisons will normally terminate quickly when there's no match.

In other words, even the theoretically worst-case implementation might not turn out to be much different than the more optimized ones.

Until there's a performance problem shown, the OP should stick with whatever solution _reads_ the best, and is the most maintainable. And if there is a performance problem shown, measuring each viable alternative is the only way to know for sure which will be fastest.

Pete

Hello Peter,
Agreed on the readability part, but using Regex.Replace opens up a new can of worms, which people aren't usually prepared for. Say this search/replace action can be entered from the UI; then adding . or * or { into your search pattern can lead to unexpected behaviour, or worse, a regex parse error. The regex will also be expensive, because it will have to be parsed/compiled every time a new pattern is used (and if it is a user-defined replacement, that would be more often than not).

So this would have to be extended with a Regex.Escape call first. The same applies for the replacement pattern. Say I want to search $2 and replace it with $0.1, you'd get funny things... ($2.1 actually)... So it isn't just using a different call to get the same results.

That said, I'd opt for an extension method on string and write an efficient version (could use mine as an example) of a Replace method that returns the number of matches. And from that moment on, use that. Just as readable (or even better) and no crazy unexpected regex problems due to not exactly understanding what is involved.

Jesse

--
Jesse Houwing
jesse.houwing at sogeti.nl

There is a difference between matching and replacing.
Someone can say "ohoh" is matched twice in "ohohoh", once starting at position 0 and once starting at position 2, but if you speak of replacing (consuming) it, you have only one possible 'action'. I haven't tried, but I assume Regex would find 2 matches, while Replace will replace the pattern just once.

And again,

    (InitialString.Length - InitialString.Replace(pattern, String.Empty).Length) / pattern.Length

is 'safe', as far as I know, for all cases, and uses no external loop, to count the number of replacements of pattern that will be made in InitialString (by whatever newPattern, which is irrelevant).

Vanderghast, Access MVP

Smith wrote:
> Hi
>
> Usually I only replace strings with text = text.Replace("old text", "new
> text");
>
> Now I need to display the number of replacements made. Is there an easy way
> or do I need some custom replacement method?

You can do the replacing yourself using IndexOf and a StringBuilder, so that you can count them:

    using System;
    using System.Text;

    string original = "1234567890123456789012345678901234567890";
    string find = "23";
    string replacement = "twentythree";

    StringBuilder result = new StringBuilder();
    int replacements = 0;
    int index = 0;
    do
    {
        int newIndex = original.IndexOf(find, index);
        if (newIndex != -1)
        {
            // Copy the text before the match, then the replacement.
            result.Append(original, index, newIndex - index);
            result.Append(replacement);
            replacements++;
            index = newIndex + find.Length;
        }
        else
        {
            // No more matches: copy the rest of the string.
            result.Append(original, index, original.Length - index);
            index = original.Length;
        }
    } while (index < original.Length);

    Console.WriteLine(result.ToString());
    Console.WriteLine(replacements);
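The same IndexOf loop can be packaged as a reusable helper, along the lines of the "Replace method that returns the number of matches" suggested earlier in the thread. The following is a minimal sketch, not a framework API: the names StringReplaceHelper and ReplaceAndCount are invented for illustration, ordinal comparison is an assumption chosen to mirror String.Replace(string, string), and the StringBuilder is only created once a first match is found.

```csharp
using System;
using System.Text;

static class StringReplaceHelper
{
    // Hypothetical helper (name and signature are assumptions).
    // Replaces non-overlapping occurrences of find, scanning left to right,
    // and reports the number of replacements through count.
    public static string ReplaceAndCount(string original, string find,
                                         string replacement, out int count)
    {
        if (original == null) throw new ArgumentNullException(nameof(original));
        if (string.IsNullOrEmpty(find))
            throw new ArgumentException("find must not be empty", nameof(find));

        count = 0;
        int index = original.IndexOf(find, StringComparison.Ordinal);
        if (index < 0)
        {
            return original; // nothing to replace, no StringBuilder needed
        }

        var result = new StringBuilder(original.Length);
        int start = 0;
        while (index >= 0)
        {
            result.Append(original, start, index - start); // text before the match
            result.Append(replacement);
            count++;
            start = index + find.Length;                   // consume the match
            index = original.IndexOf(find, start, StringComparison.Ordinal);
        }
        result.Append(original, start, original.Length - start); // remaining tail
        return result.ToString();
    }
}
```

Called as StringReplaceHelper.ReplaceAndCount(original, "23", "twentythree", out int n) on the sample string above, n should come out as 4. Because each match is consumed before the search continues, "ohoh" is counted twice in "ohohohohoh", matching the consuming convention discussed earlier. Adding `this` to the first parameter (in a top-level static class) would turn it into an extension method on string.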