Hi,
The following is an example of a file I have on an Ubuntu platform. Column 2 contains the reference strings against which the strings in column 1 are compared. When the two strings on the same row are compared, the symbols that mark additional or missing characters are added only to the string in column 1.
Note:
1) . (a full stop) denotes one missing character at an end of the string.
2) .. (two full stops) denote two missing characters at an end of the string, and so on.
3) ' ' (a pair of apostrophes) encloses additional character(s) at an end of the string.
4) Additional and missing characters occur only at the ends of strings (the start or the end, as in the examples), never in the middle.
5) The number of additional/missing characters in my actual file ranges from 0 to 10.
6) The strings in my actual file are around 20 characters long.
Input:
Code:
Column 1 Column 2
PETER PETER
PETER PETERAB
PETER PETERABC
JOHN ABJOHN
JOHN ABCJOHN
JOHNSON JOHN
JOHNSON JOH
JOHN OHN
JOHN HN
JOHNSON ABJOHN
ABJOHN JOHNSON
Expected output:
Code:
Column 1 Column 2
PETER PETER
PETER.. PETERAB
PETER... PETERABC
..JOHN ABJOHN
...JOHN ABCJOHN
JOHN'SON' JOHN
JOH'NSON' JOH
'J'OHN OHN
'JO'HN HN
..JOHN'SON' ABJOHN
'AB'JOHN... JOHNSON
Can anyone provide me with a script to do this data processing? Thank you very much.
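For reference, here is a minimal Python sketch of one way the rules above could be implemented. It is not a tested solution for the real file. It assumes the two columns are separated by whitespace, that the part shared by both strings is their longest common contiguous block (found with `difflib.SequenceMatcher.find_longest_match`), and that when missing and additional characters occur on the same side, the dots go on the outside. That last case is not covered by the examples. The names `annotate` and `annotate_lines` are hypothetical.

```python
from difflib import SequenceMatcher


def annotate(query, reference):
    """Return `query` marked up against `reference` (hypothetical helper).

    Dots stand for reference characters missing from `query`;
    apostrophes enclose characters of `query` absent from `reference`.
    """
    # Longest contiguous block shared by both strings.
    matcher = SequenceMatcher(None, query, reference, autojunk=False)
    m = matcher.find_longest_match(0, len(query), 0, len(reference))

    extra_left = query[:m.a]                  # extra chars before the block
    core = query[m.a:m.a + m.size]            # the shared block itself
    extra_right = query[m.a + m.size:]        # extra chars after the block
    missing_left = m.b                        # reference chars before the block
    missing_right = len(reference) - (m.b + m.size)  # and after it

    def quoted(s):
        return "'" + s + "'" if s else ""

    # Assumption: when both occur on one side, dots go on the outside.
    return ("." * missing_left + quoted(extra_left) + core
            + quoted(extra_right) + "." * missing_right)


def annotate_lines(lines):
    """Annotate two-field lines; pass anything else (e.g. a header) through."""
    for line in lines:
        fields = line.split()
        if len(fields) == 2:
            yield annotate(fields[0], fields[1]) + " " + fields[1]
        else:
            yield line.rstrip("\n")


if __name__ == "__main__":
    sample = ["Column 1 Column 2", "PETER PETERAB",
              "JOHNSON ABJOHN", "ABJOHN JOHNSON"]
    for out in annotate_lines(sample):
        print(out)
```

To process a real file, an open file object can be passed to `annotate_lines`, since it only iterates over lines. If several common blocks tie for the longest, `find_longest_match` picks the one starting earliest in column 1, which may or may not be the desired choice for the real data.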