Replace 2nd occurrence of a special character after nth occurrence of a delimiter from

dhiru_b25@rediffmail.com · 10-24-2013, 03:32 AM

My question is:

Replace the 2nd (or all) occurrences of a special character after the nth occurrence of a delimiter in a string, in unix/linux

or

Replace "Text Qualifier" character from data field in unix.

I have the string below, in which the '"' (double quote) characters inside the data should be replaced with a space.

String:- "123"~"23"~"abc"~24.50~"descr :- nut size 12" & bolt size 12"1/2, Quantity=20"~"2013-03-13"

From the above string, I want the output below:- "123"~"23"~"abc"~24.50~"descr :- nut size 12 & bolt size 12 1/2, Quantity=20"~"2013-03-13"

Here the " (double quote) characters inside the data have been replaced with a space character, so that

"descr :- nut size 12" & bolt size 12"1/2, Quantity=20" becomes "descr :- nut size 12 & bolt size 12 1/2, Quantity=20"

I want to identify such rows in a file and replace the text qualifier characters that appear inside the data, in unix/linux.

Please share your suggestions. Thanks in advance.

pan64 · 10-24-2013, 04:25 AM

I would split the line into three parts (before, interesting, after). You can probably split on the markers descr and Quantity. Next I would replace all the " characters in the interesting part, and finally rebuild the line. I do not know how you can identify such rows.
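
A minimal sketch of that before/interesting/after idea, using awk's own field splitting rather than the descr and Quantity markers. It assumes the delimiter is ~, that the description is always field 5, and that "such rows" means rows whose quoted fifth field contains a further " between its opening and closing qualifier. The function names find_stray and fix_field5 are made up for illustration.

```shell
# Sketch only: "~" as delimiter and field 5 as the quoted description
# field are assumptions taken from the sample line in the question.

# Print the line numbers of rows whose field 5 has a stray qualifier.
find_stray() {
  awk -F'~' '
  {
    f = $5
    if (length(f) >= 2 && substr(f, 1, 1) == "\"" && substr(f, length(f)) == "\"" &&
        index(substr(f, 2, length(f) - 2), "\"") > 0)
      print NR
  }'
}

# Replace every double quote inside field 5 with a space, keeping the
# opening and closing qualifiers. Other rows pass through unchanged.
fix_field5() {
  awk -F'~' -v OFS='~' '
  {
    f = $5
    if (length(f) >= 2 && substr(f, 1, 1) == "\"" && substr(f, length(f)) == "\"") {
      inner = substr(f, 2, length(f) - 2)   # the "interesting" part
      if (index(inner, "\"") > 0) {
        gsub(/"/, " ", inner)
        $5 = "\"" inner "\""                # rebuild the line
      }
    }
    print
  }'
}

sample='"123"~"23"~"abc"~24.50~"descr :- nut size 12" & bolt size 12"1/2, Quantity=20"~"2013-03-13"'
printf '%s\n' "$sample" | find_stray
printf '%s\n' "$sample" | fix_field5
```

Because each " becomes exactly one space, the result has two spaces between 12 and &, where the requested output shows one. Collapsing repeated spaces would need an extra step.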

danielbmartin · 10-24-2013, 11:50 AM

Replace the SECOND comma after the FOURTH tilde with a #.

With this InFile ...

Code:

aaaa~bb~cc,cccc~d,,d,,d,,~ee,ee,ee,ee,~fff,f,fff~gg

... this cut-and-paste code ...

Code:

cut -d~ -f5-  $InFile >$Work1
cut -d\, -f1-2 $Work1 >$Work2
cut -d\, -f3-  $Work1 >$Work3
cut -d~ -f1-4 $InFile |paste -d'~#' - $Work2 $Work3 >$OutFile

... produced this OutFile ...

Code:

aaaa~bb~cc,cccc~d,,d,,d,,~ee,ee#ee,ee,~fff,f,fff~gg

Daniel B. Martin

danielbmartin · 10-24-2013, 11:52 AM

Replace the SECOND comma after the FOURTH tilde with a #.

With this InFile ...

Code:

aaaa~bb~cc,cccc~d,,d,,d,,~ee,ee,ee,ee,~fff,f,fff~gg

... this awk ...

Code:

awk -F "" '{tc=0; cc=0; hit=0;      # tc = tilde count;  cc = comma count
  for (j=1;j<=NF;j++)               # examine each character, left-to-right
  {if ($j=="~") {tc++; cc=0};       # at each tilde, reset the comma count
   if ($j==",") cc++;               # increment comma count
   if (tc==4 && cc==2) {hit=1; break}}; # when criteria are met, bail out!
  if (hit) print substr($0,1,j-1)"#"substr($0,j+1)  # replace that comma
  else print}' "$InFile" >"$OutFile"                # no match: leave the line as-is

... produced this OutFile ...

Code:

aaaa~bb~cc,cccc~d,,d,,d,,~ee,ee#ee,ee,~fff,f,fff~gg

Daniel B. Martin

grail · 10-24-2013, 12:58 PM

Or maybe:

Code:

awk 'BEGIN{OFS=FS="~"}$5 = gensub(/,/,"#",2,$5)' file

This example is based on Daniel's example input.
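
gensub is a gawk extension, so here is a rough sketch of the same replacement for an awk that lacks it. It walks the field with index and substr until it reaches the nth occurrence. The names replace_nth and nth_repl are hypothetical, the ~ delimiter, the comma and the # are hard-coded from the example, and the search string is assumed to be a single character.

```shell
# Sketch only: replace the Nth comma in field FIELD_NO with "#",
# without gensub. Usage: replace_nth FIELD_NO N < input
replace_nth() {
  awk -F'~' -v OFS='~' -v fld="$1" -v which="$2" '
  # Return s with the n-th occurrence of the single character ch
  # replaced by rep; i, pos and off are local variables.
  function nth_repl(s, ch, n, rep,    i, pos, off) {
    off = 0
    for (i = 1; i <= n; i++) {
      pos = index(substr(s, off + 1), ch)
      if (pos == 0) return s        # fewer than n occurrences: unchanged
      off += pos                    # absolute position of occurrence i
    }
    return substr(s, 1, off - 1) rep substr(s, off + 1)
  }
  {
    if (fld <= NF) $fld = nth_repl($fld, ",", which, "#")
    print
  }'
}

printf '%s\n' 'aaaa~bb~cc,cccc~d,,d,,d,,~ee,ee,ee,ee,~fff,f,fff~gg' | replace_nth 5 2
```

Unlike the one-liner, this prints every line, including lines where the field is missing or ends up empty.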

danielbmartin · 10-24-2013, 01:28 PM

Quote:

Originally Posted by grail

Code:

awk 'BEGIN{OFS=FS="~"}$5 = gensub(/,/,"#",2,$5)' file

Once again, grail comes through with a solution which is concise and elegant. Bravo!

Daniel B. Martin

PTrenholme · 11-01-2013, 11:27 PM

Since the OP has not marked this thread as "Solved," and the proposed solutions do not seem to address the OP's original question, here's another stab:

The O.P. phrased the question like this:

Quote:

Replace "Text Qualifier" character from data field in unix.

I have below string where '"'(Double Quote) should get replaced with space.

String:- "123"~"23"~"abc"~24.50~"descr :- nut size 12" & bolt size 12"1/2, Quantity=20"~"2013-03-13"

From above string, i want below output:- "123"~"23"~"abc"~24.50~"descr :- nut size 12 & bolt size 12 1/2, Quantity=20"~"2013-03-13"

I have replaced " double quote character with space character.

"descr :- nut size 12" & bolt size 12"1/2, Quantity=20" & "descr :- nut size 12 & bolt size 12 1/2, Quantity=20"

I want to identify such rows from file & would like to replace such text qualifier character from data in unix/linux.

The first part of the question is "I want to identify such rows [in the] file." Phrased that way, there is no reasonable way to answer it, since "such rows" has nowhere been defined.

Some possibilities:

Rows containing "size <number>"{<number>/<number>}
Rows containing 5 or more ~ delimited fields, and at least 1 "<number>{<number>{/<number>} in the fifth field.
Rows containing <ordinal number> or more <symbol> delimited fields, and at least 1 "<number>{<number>{/<number>} in the <ordinal number> field.
Rows containing <ordinal number> or more <symbol> delimited fields, and at least 1 standard linear SAE measurement in the <ordinal number> field.

Since that last interpretation seems most likely, here is an expansion of grail's solution:

Code:

$field ~ /[[:digit:]][\047\042]/ {
  OFS=FS
  $field=gensub(/([[:digit:]]+)[\047\042](.)/,"\\1 \\2","G",$field)
}
{print}

Warning: This code uses gawk extensions, and may not work for other AWK programs.

Using the (single) line provided by the OP, I get:

Code:

$ echo ' "123"~"23"~"abc"~24.50~"descr :- nut size 12" & bolt size 12"1/2, Quantity=20"~"2013-03-13"' | gawk -f ./replace.gawk -- FS=\~ field=5 -
 "123"~"23"~"abc"~24.50~"descr :- nut size 12  & bolt size 12 1/2, Quantity=20"~"2013-03-13"

Notes:

The "awk" part of the line has been highlighted. The final dash, at the end, is a gawk shorthand for /dev/stdin.
The program was saved in a file called replace.gawk for testing.
The last line, {print} is there so that non-SAE lines will be printed as well as the ones converted.
The single-quote (feet) and quote (inches) characters were entered as \047 and \042 to avoid problems when the code is parsed. (The number of backslashes needed is too dependent on the source of the code - command line, program, inside other quotes, etc.)
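
To illustrate the last note, here is a tiny sketch showing that the octal escape \042 stands for the double quote both inside an awk string and inside a regular expression, so the program text itself never contains a literal " that the shell or awk could misread. The function name octal_demo is made up for the example.

```shell
# Sketch only: \042 is the octal code for the double quote character.
octal_demo() {
  awk 'BEGIN {
    s = "12\0421/2"          # the string 12"1/2, written without a literal quote
    gsub(/\042/, " ", s)     # replace each double quote with a space
    print s
  }'
}

octal_demo
```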

The program could be made into a command like this:

Code:

#!/bin/gawk -f
$field ~ /[[:digit:]][\047\042]/ {
  OFS=FS
  $field=gensub(/([[:digit:]]+)[\047\042](.)/,"\\1 \\2","G",$field)
}
{print}

Save it and make the file executable. If that code were saved as replace, this would result:

Code:

$ echo ' "123"~"23"~"abc"~24.50~"descr :- nut size 12" & bolt size 12"1/2, Quantity=20"~"2013-03-13"' | ./replace FS=\~ field=5 -
 "123"~"23"~"abc"~24.50~"descr :- nut size 12  & bolt size 12 1/2, Quantity=20"~"2013-03-13"

Since FS= and field= are passed as command-line arguments rather than being set inside the program, you can change those values, if you need to, by placing new assignments before different input files.
As written, all output goes to /dev/stdout. If the final line, {print} were changed to {print > out}, then placing an out=name before an input file would create the name file.
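
A small sketch of how those command-line assignments behave: awk applies each var=value operand just before it opens the file that follows it, so the same program can read two files with different separators. The temporary file names and the function name assign_demo are invented for the demonstration.

```shell
# Sketch only: each assignment takes effect just before the next file is read.
assign_demo() {
  tmp=$(mktemp -d) || return 1
  printf 'a~b\n' > "$tmp/one.txt"    # tilde-delimited sample
  printf 'c,d\n' > "$tmp/two.txt"    # comma-delimited sample
  # FS is "~" while one.txt is read, then "," while two.txt is read.
  awk '{ print $2 }' FS='~' "$tmp/one.txt" FS=',' "$tmp/two.txt"
  rm -r "$tmp"
}

assign_demo
```

The same mechanism is what would make an out=name placed before an input file work with {print > out}.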