How to ignore Pipe in Pipe delimited file?

rohit_shinez · 08-08-2013, 01:22 PM

Hi guys,

I need to know how I can ignore a pipe '|' when it appears inside a column value in a pipe-delimited file.

For example:

file 1:
xx|yy|"xyz|zzz"|zzz|12...

Using the awk command below:

awk 'BEGIN { FS = OFS = "|" } { print $3 }'

I get "xyz (only the part before the embedded pipe).

But I want:

xyz|zzz to be treated as a whole, as the 3rd column of that file.

rtmistler · 08-08-2013, 01:30 PM

Try a backslash or a double backslash. Those are usually the escape characters. I don't know awk syntax too well, so that's only a guess.

---------- Post added 08-08-13 at 02:31 PM ----------

This appears similar to your question:

http://stackoverflow.com/questions/1...elimiter-regex

rohit_shinez · 08-08-2013, 01:40 PM

Where would I use the backslash? I want to ignore a pipe when it appears as part of a column's value, so that the value counts as a single column when I print a column with awk.

szboardstretcher · 08-08-2013, 01:42 PM

Nested delimiters... interesting. Very first result from Google:

http://stackoverflow.com/questions/5...ted-delimiters

Good use of Awk there.

colucix · 08-08-2013, 02:26 PM

GNU awk (gawk) version 4 provides a way to manage such situations. With the built-in variable FPAT you define fields by a regular expression. This means you don't set a field separator; instead you describe what a field looks like. In your example a field is either a run of characters containing no pipe, or everything inside a pair of double quotes. Here we go:

Code:

echo 'xx|yy|"xyz|zzz"|zzz|12' | awk 'BEGIN{ FPAT = "([^|]+)|(\"[^\"]+\")" }{ for ( i = 1; i <= NF; i++ ) print $i }'
xx
yy
"xyz|zzz"
zzz
12

This is explained in the GNU awk manual, here: http://www.gnu.org/software/gawk/man...ing-By-Content.

konsolebox · 08-08-2013, 02:41 PM

This could be a good concept for that:

Code:

#!/usr/bin/gawk -f

BEGIN {
    OFS = "|"
}

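{
    string = $0
    NF = 0    # clear the fields; they are rebuilt one by one below

    if (length(string)) {
        # Peel off the leading field: a quoted one (quotes dropped) or a plain one.
        # The third (array) argument to match() is a gawk extension.
        while (match(string, /^"([^"]+)"\|(.*)/, temp) || match(string, /^([^|]*)\|(.*)/, temp)) {
            $(++NF) = temp[1]
            string = temp[2]
        }

        $(++NF) = string    # whatever remains is the last field
    }

    print $3
}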

Setting OFS to | is actually not necessary here, since only $3 is printed. You could use another OFS as well.

rohit_shinez · 08-09-2013, 01:19 AM

I will try the solutions above, but what I actually need is this: I have a pipe-separated file in which I need to look for characters in the 3rd column and replace them with an empty value. Only the 3rd field should be replaced, and only where it contains such a character.
For example:

Code:
file1.txt
xx|yy|xx|12

output file:
xx|yy||12

I achieved that with the code below:
awk 'BEGIN {FS=OFS="|" } $3 ~ /[[:alnum:]]/ { $3="" }1' file

The problem I ran into is that a column containing a pipe should be treated as a single column:

xx|yy|"xyz|xx"|AAA|12...

Now I need to get this:

xx|yy|"xyz|xx"||12

AAA should be replaced with an empty value, that is, AAA should count as the 4th column when I use:

awk 'BEGIN {FS=OFS="|" } $4 ~ /[[:alnum:]]/ { $4="" }1' file

danielbmartin · 08-09-2013, 06:02 AM

I want to write this post tactfully and respectfully. I realize that English is not your first language. Your post (directly above) is confusing. Reword it carefully -- get help from a friend if necessary. Strive for clarity. Give more than two examples of input strings and the corresponding desired output strings.

Daniel B. Martin

grail · 08-09-2013, 07:19 AM

colucix's solution will work with what you need to do.

rohit_shinez · 08-09-2013, 10:45 AM

Hi Martin,

Let me clarify my requirements.

For example:
input file1.txt

xx|yy|"abc|xyz"|zz|12 .. .... ...

output file:

xx|yy|"abc|xyz"||12 .. .... ....

I want to replace the fourth column of file1.txt with an empty value when it contains an alphanumeric value, and zz should count as the fourth column, not the fifth.

awk 'BEGIN {FS=OFS="|" } $4 ~ /[[:alnum:]]/ { $4="" }1' file

The code above does blank a column, but it does not treat zz as the 4th column. Instead it blanks xyz", because the third column, "abc|xyz", is itself split at the pipe.
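To make the miscount concrete, here is a small sketch that numbers the fields awk sees when it splits on every pipe. The helper name show_fields is made up for this illustration, and the input is the sample line from this post without its trailing dots.

```shell
# show_fields is a hypothetical helper: it prints each field with its index,
# splitting on every "|" exactly as FS="|" does.
show_fields() {
    awk -F'|' '{ for (i = 1; i <= NF; i++) printf "%d: %s\n", i, $i }'
}

printf '%s\n' 'xx|yy|"abc|xyz"|zz|12' | show_fields
# Field 3 is "abc and field 4 is xyz", so zz ends up as field 5.
```

The quoted value is cut in two, which shifts every later field by one. That is why $4 points at xyz" instead of zz.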

konsolebox · 08-09-2013, 11:41 AM

Can you tell us what version of awk you're using? Most of the solutions provided here already do what you want. Only some minor modifications are needed.

rohit_shinez · 08-09-2013, 12:38 PM

I am using nawk on Solaris.

schneidz · 08-09-2013, 12:48 PM

This seems related (a comma inside a field of a CSV file):
http://www.linuxquestions.org/questi...0/#post5001726

grail · 08-09-2013, 11:19 PM

For future reference, you should mention that you are working on Solaris, as it is quite a different beast from Linux and often has a smaller or different set of applications.

I have not tested konsolebox's solution, but the one from colucix will not work in nawk.

You could also look at Perl or Ruby if they are options.

konsolebox · 08-10-2013, 12:28 AM

Mine won't work with it either. The array argument to match() is a GNU extension.
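For comparison, the two-argument form of match() is standard awk. It sets RSTART and RLENGTH, and substr() can then cut out the matched text. A minimal sketch, with first_field as a made-up helper name:

```shell
# first_field is a hypothetical helper: it prints the first field of each line,
# keeping a leading "quoted|value" in one piece. No gawk extensions are used.
first_field() {
    awk '{
        # Try a quoted field first, then fall back to a plain one.
        if (match($0, /^"[^"]*"/) || match($0, /^[^|]*/))
            print substr($0, RSTART, RLENGTH)
    }'
}

printf '%s\n' '"abc|xyz"|zz|12' | first_field
# prints "abc|xyz"
```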

Here is my attempt at a version that does without that extension. It works, but it runs slower in other awks than in GNU awk, since they regenerate $0 right away whenever $x or NF is altered.

Code:

#!/usr/bin/awk -f

BEGIN {
    OFS = "|"
}

# j and k are declared as extra parameters so that they are local to the function.
function delete_column(i,    j, k) {
    j = 0
    for (k = 1; k <= NF; ++k) {
        if (k == i) {
            ++j
        } else if (j) {
            $(k - j) = $k
        }
    }
    NF -= j
}

{
    string = $0
    NF = 0

    if (l = length(string)) {
        for (;;) {
            if (match(string, /^"[^"]+"\|/)) {
                next_string = string
                sub(/^"[^"]+"\|/, "", next_string)
            }
            else if (match(string, /^[^|]*\|/)) {
                next_string = string
                sub(/^[^|]*\|/, "", next_string)
            }
            else {
                break
            }

            $(++NF) = substr(string, 1, l - length(next_string) - 1)
            string = next_string
            l = length(string)
        }

        $(++NF) = string
    }

    #
    # Modify any field here, e.g. $4 = "" to blank it, or call delete_column(3)
    # to remove the column entirely instead of just emptying it.
    #

    print
}
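
For the requirement stated earlier in the thread (blank the 4th column when it holds an alphanumeric value, with "abc|xyz" counted as one column), a sketch along the same lines could look like the one below. It is not konsolebox's script. It collects the fields in an ordinary array instead of assigning to NF, and it uses only standard match(), index() and substr(). The name blank_field and its argument are made up for this example. [A-Za-z0-9] stands in for [[:alnum:]] on the assumption that the data is ASCII, and quotes are assumed never to be nested or escaped.

```shell
# blank_field N (hypothetical helper): empty field N when it contains an
# alphanumeric character, treating a "quoted|value" as a single field.
blank_field() {
    awk -v n="$1" '
    {
        rest = $0
        nf = 0
        k = n + 0
        # Peel one field at a time off the front of the line.
        while (1) {
            # A quoted field counts only if a pipe or the end of line follows it.
            if (match(rest, /^"[^"]*"/) &&
                (RLENGTH == length(rest) || substr(rest, RLENGTH + 1, 1) == "|"))
                len = RLENGTH
            else {
                len = index(rest, "|") - 1      # plain field: up to the next pipe
                if (len < 0) len = length(rest) # no pipe left: take everything
            }
            field[++nf] = substr(rest, 1, len)
            if (len >= length(rest)) break      # nothing after this field
            rest = substr(rest, len + 2)        # skip the field and its pipe
        }
        if (k >= 1 && k <= nf && field[k] ~ /[A-Za-z0-9]/) field[k] = ""
        out = field[1]
        for (i = 2; i <= nf; i++) out = out "|" field[i]
        print out
    }'
}

printf '%s\n' 'xx|yy|"abc|xyz"|zz|12' | blank_field 4
# prints xx|yy|"abc|xyz"||12
```

Because the fields live in a plain array, $0 is never rebuilt, which sidesteps the slowdown mentioned above.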