
RegExReplace

Hello,

   Quick question about RegExReplace. I have a source column from which I want to remove two different kinds of characters.

The first is control characters, the second is the caret '^'. My RegExReplace command below is accepted, but it seems to ignore the removal of the caret.

RegExReplace(tableA.COLUMNA,
             U'[[:cntrl:][^]]', U' ')

/* Replace control characters with spaces and remove all carets '^' */

Any help would be appreciated!

Thank You!


Replies

  • Hi Don,

    The caret has a special meaning in regular expressions. When it is the first character inside square brackets ‘[]’, it is interpreted with that special meaning: it negates the set, so the brackets match any character that is not listed. When it is not the first character in the brackets, it is interpreted as the literal ‘^’ character, as you’d expect.
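The positional rule can be tried out in any regex engine. Below is a minimal sketch in Python, used here only as a stand-in: its `re` module is not the DMExpress engine and does not support POSIX classes such as `[:cntrl:]`, so the ASCII control range is spelled out by hand. The sample string and variable names are made up for illustration.

```python
import re

# Hypothetical sample value: 'a', a caret, 'b', a TAB (control character), 'c'.
sample = "a^b\tc"

# Leading caret negates the class: every character except 'a' is replaced.
negated = re.sub(r"[^a]", "_", sample)           # "a____"

# A caret that is not first in the brackets is a literal caret.
literal_late = re.sub(r"[a^]", "_", sample)      # "__b\tc"

# An escaped caret is literal even in first position.
literal_escaped = re.sub(r"[\^]", "_", sample)   # "a_b\tc"

# One class covering ASCII control characters plus a literal caret.
# The range \x00-\x1f and \x7f stands in for [:cntrl:], which Python lacks.
cleaned = re.sub(r"[\x00-\x1f\x7f^]", " ", sample)  # "a b c"

print(negated, literal_late, literal_escaped, cleaned, sep="\n")
```

The last pattern keeps the caret in a non-leading position inside a single pair of brackets, so no escape is needed there. Whether DMExpress treats the equivalent pattern the same way is an assumption worth checking on a small test file.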

    In your case, the caret is interpreted with its special meaning because it is the first and only thing in its brackets. To actually remove the caret character from your data, you have to put an escape character (a backslash) before the ‘^’ so the regular expression reads it as the literal character instead of interpreting its special meaning. However, because of the level of interpretation of the regex command within DMExpress, that backslash must itself be escaped by another backslash. The GUI will add the second backslash for you if you’re editing the RegExReplace expression from the pop-up display, but you will still need to type “\^” in the pattern box.

    So your final command would now look like this:

    RegExReplace(tableA.COLUMNA,
                 U'[[:cntrl:][\\^]]', U' ')

    Note the section with the caret now contains two escape characters (the ‘\\’ before the ‘^’).
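The two levels of escaping can be illustrated with an analogy in Python, where an ordinary string literal also consumes one backslash before the regex engine sees the pattern. This is only a sketch of the general idea, not a description of how DMExpress parses its expressions; the variable names and the input value are hypothetical.

```python
import re

# In a normal (non-raw) Python string literal, "\\" yields one backslash,
# so the regex engine receives the four characters  [ \ ^ ]
pattern_from_literal = "[\\^]"

# A raw string skips the first level, so a single backslash is enough.
pattern_raw = r"[\^]"

same_pattern = pattern_from_literal == pattern_raw   # True
received_length = len(pattern_from_literal)          # 4

# Either spelling replaces the literal caret with a space.
result = re.sub(pattern_from_literal, " ", "10^3")   # "10 3"

print(same_pattern, received_length, result)
```

The first backslash is used up by the string layer and the second one reaches the regex engine, which is the same reason the written expression needs ‘\\’ while the GUI pattern box only needs “\^”.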

    Also, just to clarify, the command you wrote will remove all instances of a control character followed by a caret character. Was that your intended behavior?

    Let me know if this helped!
