ms
newsgroups
replacing characters in a string

Hi

In my application I get a lot of strings which I have to "clean up" before I pass them to a third-party library. The strings I have contain characters which are invalid for the third-party library, so I have to either remove them or replace them with reasonable alternatives.

What is a good method of doing this?

At the moment I have the following:

string Clean(string element)
{
    element = element.Replace(",", "");
    element = element.Replace("-", "");
    element = element.Replace("!", "");
    element = element.Replace("/", "");
    element = element.Replace("\\", "");

    element = element.Replace("æ", "ae");
    element = element.Replace("Æ", "AE");
    element = element.Replace("ä", "ae");
    element = element.Replace("Ä", "AE");
    element = element.Replace("ø", "oe");
    element = element.Replace("Ø", "OE");
    element = element.Replace("ö", "oe");
    element = element.Replace("Ö", "OE");
    element = element.Replace("å", "aa");
    element = element.Replace("Å", "AA");

    element = element.Trim(' ', '.');

    return element;
}

Thanks,
Peter

It will work, but you are scanning the whole string for each replace. I have no experience with it, but Regex.Replace is likely to scan the string only once, calling a delegate (a MatchEvaluator) each time it finds a match for the pattern you specify. A single pattern, such as a character class, can cover all of the characters you want to replace.
http://msdn.microsoft.com/en-us/library/ms149475.aspx
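A minimal sketch of the single-pass Regex.Replace idea, assuming one character-class pattern built from the replacement table in Peter's Clean method and a MatchEvaluator lambda that looks up each match. The class name RegexCleaner is hypothetical, and whether this actually beats the chained String.Replace calls has not been measured; for short strings the regex overhead may dominate.

```csharp
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper: one regex pass instead of fifteen String.Replace passes.
public static class RegexCleaner
{
    // Replacement table copied from the Clean method in the original post.
    private static readonly Dictionary<char, string> Replacements =
        new Dictionary<char, string>
        {
            { ',', "" }, { '-', "" }, { '!', "" }, { '/', "" }, { '\\', "" },
            { 'æ', "ae" }, { 'Æ', "AE" }, { 'ä', "ae" }, { 'Ä', "AE" },
            { 'ø', "oe" }, { 'Ø', "OE" }, { 'ö', "oe" }, { 'Ö', "OE" },
            { 'å', "aa" }, { 'Å', "AA" }
        };

    // Builds a character class such as [\u002C\u002D...]. Writing every key as
    // a \uXXXX escape avoids the special meaning of '-', '\' and ']' inside [...].
    // (Static initializers run in textual order, so Replacements is ready here.)
    private static readonly Regex Pattern = BuildPattern();

    private static Regex BuildPattern()
    {
        var sb = new StringBuilder("[");
        foreach (char c in Replacements.Keys)
            sb.AppendFormat("\\u{0:X4}", (int)c);
        sb.Append(']');
        return new Regex(sb.ToString());
    }

    public static string Clean(string element)
    {
        // Regex.Replace with a MatchEvaluator: the lambda runs once per match
        // and returns that character's replacement from the table.
        string replaced = Pattern.Replace(element, m => Replacements[m.Value[0]]);
        return replaced.Trim(' ', '.');
    }
}
```

For example, RegexCleaner.Clean("Blåbær-syltetøy!") returns "Blaabaersyltetoey". Adding a character to the table automatically extends the pattern, so the two cannot drift apart.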
On Wed, 17 Dec 2008 01:10:24 -0800, "Peter" <xdz***@hotmail.com> wrote:

You are creating a new String for every replace. Using a StringBuilder instead avoids this and may run faster:

// Needs: using System.Text;
string Clean(string element)
{
    StringBuilder sb = new StringBuilder(element);
    sb.Replace(",", "");
    // Other replaces
    // StringBuilder has no Trim, so trim after converting back to a string.
    return sb.ToString().Trim(' ', '.');
}

rossum

> element = element.Replace("æ", "ae");
> element = element.Replace("Æ", "AE");
> element = element.Replace("ä", "ae");
> element = element.Replace("Ä", "AE");
> element = element.Replace("ø", "oe");
> element = element.Replace("Ø", "OE");
> element = element.Replace("ö", "oe");
> element = element.Replace("Ö", "OE");
> element = element.Replace("å", "aa");
> element = element.Replace("Å", "AA");

Any chance to get a new version of the 3rd party library?

Some of these replacements are locale sensitive. And even for the locales where they are valid, they affect (negatively) the quality of the text. So that is not "clean up", that is "crap".
Imagine someone would do this to English strings:
    element = element.Replace("w", "vv");
because some stupid library does not support 'w'.

-- 
Mihai Nita [Microsoft MVP, Visual C++]
http://www.mihai-nita.net
------------------------------------------
Replace _year_ with _ to get the real email
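Filling in the "// Other replaces" placeholder in rossum's StringBuilder suggestion with the replacements from the original post gives roughly the sketch below (the class name StringBuilderCleaner is hypothetical). Each StringBuilder.Replace call still scans the whole buffer; what it saves is the new string that every String.Replace call may allocate, so any speed-up should be measured rather than assumed.

```csharp
using System.Text;

// Hypothetical complete version of the StringBuilder approach: the same
// replacements as the original Clean method, applied to one mutable buffer.
public static class StringBuilderCleaner
{
    public static string Clean(string element)
    {
        StringBuilder sb = new StringBuilder(element);

        // Characters that are removed outright (StringBuilder.Replace returns
        // the same builder, so the calls can be chained).
        sb.Replace(",", "").Replace("-", "").Replace("!", "")
          .Replace("/", "").Replace("\\", "");

        // Letters that are transliterated (subject to the locale caveat above).
        sb.Replace("æ", "ae").Replace("Æ", "AE")
          .Replace("ä", "ae").Replace("Ä", "AE")
          .Replace("ø", "oe").Replace("Ø", "OE")
          .Replace("ö", "oe").Replace("Ö", "OE")
          .Replace("å", "aa").Replace("Å", "AA");

        // StringBuilder has no Trim, so trim after converting back to a string.
        return sb.ToString().Trim(' ', '.');
    }
}
```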
"Peter" <xdz***@hotmail.com> wrote in message news:%236MjMdCYJHA.4596@TK2MSFTNGP06.phx.gbl...

My take: build a "conversion matrix" and then run every character in the string through the matrix, outputting a clean string at the end. Something like this (air code!):

// Needs: using System.Collections.Generic; using System.Text;
private readonly Dictionary<char, string> _conversions = new Dictionary<char, string>();

// Constructor
public <your class name>()
{
    // Ideally you would read these from a database or settings file so
    // that you wouldn't have to recompile if you find new things to replace
    _conversions.Add(',', "");
    _conversions.Add('-', "");
    _conversions.Add('!', "");
    _conversions.Add('/', "");
    _conversions.Add('\\', "");
    _conversions.Add('æ', "ae");
    _conversions.Add('Æ', "AE");
    _conversions.Add('ä', "ae");
    _conversions.Add('Ä', "AE");
    _conversions.Add('ø', "oe");
    _conversions.Add('Ø', "OE");
    _conversions.Add('ö', "oe");
    _conversions.Add('Ö', "OE");
    _conversions.Add('å', "aa");
    _conversions.Add('Å', "AA");
}

private string Clean(string element)
{
    StringBuilder sb = new StringBuilder(element.Length);
    foreach (char c in element)
    {
        // Dictionary has no Contains(key); TryGetValue checks and looks up in
        // one call. A full if/else is needed because the branches append
        // different types (string vs. char), so ?: would not compile.
        string replacement;
        if (_conversions.TryGetValue(c, out replacement))
            sb.Append(replacement);
        else
            sb.Append(c);
    }
    return sb.ToString().Trim(' ', '.');
}

Oh, and for what it's worth, it sounds like your third-party library sucks....

Thanks for all the comments.
With regards to the 3rd-party library: it is a content management system, and it imposes rules on the names that can be used for path elements and for the "items" or "nodes" which make up the hierarchical content structure. Some restrictions I do accept, such as not allowing / or \ in a name (much the same as in Windows), but I don't really know why one can't use [ or ) or "international" letters like æ or ø. I don't have an exhaustive list of all the invalid characters.

The data I receive comes from a database, and I then have to insert it in the CMS, which gives problems if I read "invalid" strings from the database, so I have to do some sort of "conversion".

/Peter

> The data I receive comes from a database, and I have to then insert it
> in the CMS - which gives problems if I read "invalid" strings from the
> database, so I have to make some sort of "conversion".

Is the result visible somewhere "as is", or will it always go through some "conversion layer"?

Maybe you can come up with some kind of escaping system? For instance, keep the string as UTF-8, then escape all bytes > 127. When you get them back, you unescape them and get the original UTF-8 strings, with no damaged characters.

-- 
Mihai Nita [Microsoft MVP, Visual C++]
http://www.mihai-nita.net
------------------------------------------
Replace _year_ with _ to get the real email
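A minimal sketch of the escaping scheme Mihai describes, assuming the CMS accepts the underscore used here as the escape marker (that choice, and the class name NameEscaper, are hypothetical). Every UTF-8 byte above 127 becomes _XX in hex; the marker itself is escaped too, because otherwise an original "_" could not be told apart from an escape. As described, ASCII characters such as "!" pass through unchanged, so they would need the same treatment if the CMS rejects them.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical reversible escaping: UTF-8 encode, then escape non-ASCII bytes.
public static class NameEscaper
{
    // Assumption: '_' is a character the CMS accepts in names.
    private const char Marker = '_';

    public static string Escape(string name)
    {
        var sb = new StringBuilder();
        foreach (byte b in Encoding.UTF8.GetBytes(name))
        {
            if (b > 127 || b == (byte)Marker)
                sb.Append(Marker).Append(b.ToString("X2")); // 'ø' -> "_C3_B8"
            else
                sb.Append((char)b); // plain ASCII is kept as-is
        }
        return sb.ToString();
    }

    public static string Unescape(string escaped)
    {
        // Assumes the input was produced by Escape (ASCII only, well-formed).
        var bytes = new List<byte>();
        for (int i = 0; i < escaped.Length; i++)
        {
            if (escaped[i] == Marker)
            {
                // The marker is followed by exactly two hex digits.
                bytes.Add(Convert.ToByte(escaped.Substring(i + 1, 2), 16));
                i += 2;
            }
            else
            {
                bytes.Add((byte)escaped[i]);
            }
        }
        return Encoding.UTF8.GetString(bytes.ToArray());
    }
}
```

For example, Escape("ø") returns "_C3_B8", and Unescape turns that back into "ø", so the original name is never lost.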
> Maybe you can come with some kind of escaping system?

Hi - I'm not sure I completely follow you. What I am doing is reading company data from a database and putting it into the hierarchical structure of the CMS (as items/nodes in the CMS), along with some accompanying data (like contact info, address, images etc). This is to make it easy for site editors to access and change information which is shown on some of the website's pages. E.g.

IT companies
    microsoft
    yahoo

And some of the companies might have "illegal" characters in their names (e.g. ! in Yahoo!).

/Peter
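Since there is no exhaustive list of invalid characters, one option is to invert the check: keep only characters assumed to be safe, transliterate the letters from the original post, and drop everything else, so a name like "Yahoo!" becomes "Yahoo". The sketch below assumes ASCII letters, digits and spaces are always accepted by the CMS; the class and method names are hypothetical. Like any lossy mapping, two different company names can end up with the same node name, so duplicates may need checking.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical whitelist cleaner for CMS node names.
public static class NodeNameCleaner
{
    // Transliterations taken from the original Clean method.
    private static readonly Dictionary<char, string> Transliterations =
        new Dictionary<char, string>
        {
            { 'æ', "ae" }, { 'Æ', "AE" }, { 'ä', "ae" }, { 'Ä', "AE" },
            { 'ø', "oe" }, { 'Ø', "OE" }, { 'ö', "oe" }, { 'Ö', "OE" },
            { 'å', "aa" }, { 'Å', "AA" }
        };

    // Assumption: ASCII letters, digits and space are always valid in a name.
    private static bool IsSafe(char c)
    {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
            || (c >= '0' && c <= '9') || c == ' ';
    }

    public static string ToNodeName(string companyName)
    {
        var sb = new StringBuilder(companyName.Length);
        foreach (char c in companyName)
        {
            string replacement;
            if (IsSafe(c))
                sb.Append(c);
            else if (Transliterations.TryGetValue(c, out replacement))
                sb.Append(replacement);
            // Any other character (e.g. '!', '[', ')', '.') is dropped.
        }
        return sb.ToString().Trim(' ');
    }
}
```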