<< Click to Display Table of Contents >> Navigation: 5. Detailed description of the Actions > 5.5. Standard > 5.5.12. Replace String (High-Speed action) |
Icon:
Function: ReplaceStrings
Property window:
Short description:
Replace a given string by another into some selected columns.
Long Description:
This operator searches for some given strings into some selected columns. If there is a “match”, then the “found” string is replaced by another one (specified inside the “After” field).
There are three different ways to compute the “matches”:
a)String-Replace Partial String.
For example, the default settings of this Action:
… will process the column “Value” and transform the string “1,000.9” into this string: “1000.9” (i.e. it transforms a number from the “French notation” to the more common“English notation” used everywhere into Anatella). This is not a very good example because the changeDataType Action is usually the best way to perform such transformation.
b)String-Replace Whole Column.
For example, these settings:
… will transform a column containing the “.” string (that represents the “missing value” inside the SAS system) into a column containing a string of zero-length (which usually represents the “missing” value in many systems). Note that the string “1,000.9” will NOT be changed (it won’t be replaced by “1,1009”) because there is a “match” ONLY IF the whole column matches the given string.
c)String-Replace based on Regular expression.
There are 2 operating modes when working with regular expressions:
I.You regular expressions do not contain any capturing parenthesis.
Anatella replace the whole string matching your regular expression with the (unique) “After” field. For example:
II.You regular expressions contain capturing parenthesis.
You must give a different “After” string for each different capturing parenthesis. By default, the many different “After” strings are written in the “After” field, separated by a comma (or separated by a semicolon in the example below):
The many “Regular expression” options (24 in total) are all the different combinations of the following basic options:
I.Case Sensitive/Case insensitive (self-explanatory)
II.Replace All Matching Strings / Replace First Matching String (self-explanatory)
III.Maximal Matching / Minimal Matching.
Let’s assume that we are searching for the regular expression “<b>.*</b>” inside the string “We must be <b>bold</b>, very <b>bold</b>!”. Each match must be replaced with “small”. We obtain:
a. For Minimal Matching: “We must be small, very small!”
b.For Maximal Matching: “We must be small!”
This option has no effect on how the capturing parenthesis are working.
IV.Perl Like / Perl Like with greedy captures / Reg.Exp. as defined by W3C.
Regular expression syntax is slightly different inside Perl than inside a web page. With this option, you can:
a)select the desired syntax (Perl or Web).
b)select how the capturing parenthesis are working: For historical reasons, quantifiers (e.g. *) that apply to capturing parentheses are more "greedy" than other quantifiers. For example, in the Regular Expression “ba*(a*)b”, we want to replace all capturing parenthesis with the “!” character. The Regular Expression “a*(a*)” will, of course, match "baaab" but we obtain, as output:
owithout greedy capture (i.e. perl-like): “baaab”
owith greedy capture: “b!b”
The optional table on the second input pin of this Action is also used to do “whole column” content replacement. It contains 3 columns:
1.The name of the column to process
2.The string to search for (before)
3.The replacement string (after)
Usually, the second input pin is attached to an “Inline-Table” that contains a list of replacements to make to correct spelling mistakes.
The last tab of the property window of this parameter describes some string processing that you can activate on specific columns. These processings are performed BEFORE any attempt to replace anything. In particular, when you are using the “String Replace” action with the “Correct Spelling” Action of Anatella, it’s strongly suggested to always activate the following string-processing:
These String-Processings are performed in a top-to-bottom order (referring to the order of the controls inside the property window of the “String Replace” action). Here is a small explanation of these String-Processings:
a)Remove Punctuation: replace the character dot, comma, exclamation mark, quotation mark, semi-column, double-dot with the blank (‘ ‘) character.
b)Simplifies String: remove any un-necessary blank character. For example the string “ AB C D ” simplifies to “AB C D”.
c)To Upper Case: self-explanatory
d)To Lower Case: self-explanatory
e)Un-Accent Case: self-explanatory
f)Remove the first ‘n’ characters: self-explanatory
g)Remove the last ‘n’ characters: self-explanatory
h)Truncate to ‘n’ characters: self-explanatory