LinuxQuestions.org
Old 08-08-2013, 01:22 PM   #1
rohit_shinez
Member
 
Registered: Aug 2013
Posts: 40

Rep: Reputation: Disabled
How to ignore Pipe in Pipe delimited file?


Hi guys,

I need to know how to ignore a pipe '|' when it appears inside a column value in a pipe-delimited file.

For example:


file 1:
xx|yy|"xyz|zzz"|zzz|12...

Using the awk command below:

awk 'BEGIN { FS = OFS = "|" } { print $3 }' file

I get only the first part, "xyz

But I want:

xyz|zzz treated as a single value, i.e. as the whole 3rd column of that file.
 
Old 08-08-2013, 01:30 PM   #2
rtmistler
Moderator
 
Registered: Mar 2011
Location: USA
Distribution: MINT Debian, Angstrom, SUSE, Ubuntu, Debian
Posts: 9,882
Blog Entries: 13

Rep: Reputation: 4930
Try a backslash or a double backslash. Those are usually the escape characters. I don't know awk syntax very well, so this is only a guess.

---------- Post added 08-08-13 at 02:31 PM ----------

This appears similar to your question:

http://stackoverflow.com/questions/1...elimiter-regex
 
Old 08-08-2013, 01:40 PM   #3
rohit_shinez
Member
 
Registered: Aug 2013
Posts: 40

Original Poster
Rep: Reputation: Disabled
Where would I use the backslash? I want to ignore a pipe when it appears as part of a column's value, so that the value is treated as a single column when I print that column with awk.
 
Old 08-08-2013, 01:42 PM   #4
szboardstretcher
Senior Member
 
Registered: Aug 2006
Location: Detroit, MI
Distribution: GNU/Linux systemd
Posts: 4,278

Rep: Reputation: 1694
Nested delimiters... interesting. Very first result from Google:

http://stackoverflow.com/questions/5...ted-delimiters

Good use of Awk there.

Last edited by szboardstretcher; 08-08-2013 at 01:44 PM.
 
Old 08-08-2013, 02:26 PM   #5
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1983
GNU awk (gawk) version 4 provides a way to manage such situations. Using the built-in variable FPAT, you can define fields by a regular expression. This means you don't set a field separator; instead you describe what a field looks like. In your example a field is either a run of characters containing no pipe or everything inside a pair of double quotes. Here we go:
Code:
echo 'xx|yy|"xyz|zzz"|zzz|12' | awk 'BEGIN{ FPAT = "([^|]+)|(\"[^\"]+\")" }{ for ( i = 1; i <= NF; i++ ) print $i }'
xx
yy
"xyz|zzz"
zzz
12
This is explained in the GNU awk manual, here: http://www.gnu.org/software/gawk/man...ing-By-Content.
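
Applied to the blanking task that comes up later in the thread, the same FPAT idea might look like the sketch below. It is only an illustration: blank_col4_gawk is a made-up helper name, it assumes GNU awk 4.0 or later is installed as gawk, and it reuses the field pattern from the post above unchanged.

```shell
# Sketch only: blank_col4_gawk is a hypothetical helper name.
# Assumes GNU awk 4.0 or later is installed as "gawk" (FPAT is gawk-only).
blank_col4_gawk() {
    gawk '
    BEGIN {
        FPAT = "([^|]+)|(\"[^\"]+\")"   # same field pattern as in the post above
        OFS  = "|"                      # used when $0 is rebuilt after the assignment
    }
    $4 ~ /[[:alnum:]]/ { $4 = "" }      # blank the 4th field if it holds a letter or digit
    { print }
    '
}

# Example call (needs gawk):
#   printf '%s\n' 'xx|yy|"abc|xyz"|zz|12' | blank_col4_gawk
# prints: xx|yy|"abc|xyz"||12
```

One caveat with this pattern: because both alternatives require at least one character, a column that is already empty is not counted as a field, so the numbering shifts on lines that contain empty columns.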
 
Old 08-08-2013, 02:41 PM   #6
konsolebox
Senior Member
 
Registered: Oct 2005
Distribution: Gentoo, Slackware, LFS
Posts: 2,248
Blog Entries: 8

Rep: Reputation: 235
This could be a good concept for that:
Code:
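#!/usr/bin/gawk -f
# Requires GNU awk: the three-argument form of match() is a gawk extension.

BEGIN {
    OFS = "|"
}

{
    string = $0
    NF = 0    # discard the default split; the fields are rebuilt below

    if (length(string)) {
        # Peel one field off the front per pass: try a quoted field first
        # (its quotes are dropped), then an ordinary one.
        # Caveat: a quoted field at the very end of the line is not handled.
        while (match(string, /^"([^"]+)"\|(.*)/, temp) || match(string, /^([^|]*)\|(.*)/, temp)) {
            $(++NF) = temp[1]
            string = temp[2]
        }

        $(++NF) = string    # whatever is left is the last field
    }

    print $3
}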
Setting OFS to | is not actually necessary here, and any other OFS would work as well.

Last edited by konsolebox; 08-08-2013 at 02:45 PM.
 
Old 08-09-2013, 01:19 AM   #7
rohit_shinez
Member
 
Registered: Aug 2013
Posts: 40

Original Poster
Rep: Reputation: Disabled
I will try the suggestions above, but what I actually need is this: I have a pipe-separated file in which I need to check the 3rd column for alphanumeric characters and, where they occur, replace that column with null. Only the 3rd field should be changed.
For example:


Code:
file1.txt
xx|yy|xx|12

output file:
xx|yy||12

I achieved the above with this code:
awk 'BEGIN { FS = OFS = "|" } $3 ~ /[[:alnum:]]/ { $3 = "" } 1' file

The problem I ran into is that a column containing a pipe should be treated as a single column:

xx|yy|"xyz|xx"|AAA|12...

What I need to get is this:

xx|yy|"xyz|xx"||12

Here AAA should be replaced with null, with AAA counted as the 4th column, when I use:

awk 'BEGIN { FS = OFS = "|" } $4 ~ /[[:alnum:]]/ { $4 = "" } 1' file
 
Old 08-09-2013, 06:02 AM   #8
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Rep: Reputation: 660
Quote:
Originally Posted by rohit_shinez (post #7)
I want to write this post tactfully and respectfully. I realize that English is not your first language. Your post (quoted above) is confusing. Reword it carefully -- get help from a friend if necessary. Strive for clarity. Give more than two examples of input strings and the corresponding desired output strings.

Daniel B. Martin

Last edited by danielbmartin; 08-09-2013 at 06:03 AM.
 
Old 08-09-2013, 07:19 AM   #9
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3192
colucix's solution will work with what you need to do.
 
Old 08-09-2013, 10:45 AM   #10
rohit_shinez
Member
 
Registered: Aug 2013
Posts: 40

Original Poster
Rep: Reputation: Disabled
Hi Martin,

Let me clarify my requirements.

For example:
input file1.txt

xx|yy|"abc|xyz"|zz|12 .. .... ...

output file:

xx|yy|"abc|xyz"||12 .. .... ....

I want to blank out the fourth column of file1.txt whenever it holds an alphanumeric value, and the value zz should count as the fourth column rather than the fifth.

awk 'BEGIN { FS = OFS = "|" } $4 ~ /[[:alnum:]]/ { $4 = "" } 1' file

The code above does blank out a column, but it does not treat zz as the 4th column. Instead it blanks xyz, because the third column, "abc|xyz", is itself split at its pipe.
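
For anyone who wants to reproduce the behaviour described in this post, here is a minimal sketch. The function name naive_blank4 is made up for the illustration, and [A-Za-z0-9] is substituted for [[:alnum:]] on the assumption that some very old awks lack POSIX character classes. Otherwise it is the command from the post.

```shell
# Sketch only: naive_blank4 is a hypothetical name for the command from the post.
# [A-Za-z0-9] stands in for [[:alnum:]] so that very old awks also accept it.
naive_blank4() {
    awk 'BEGIN { FS = OFS = "|" } $4 ~ /[A-Za-z0-9]/ { $4 = "" } 1'
}

# With FS set to "|" the quoted value is cut in two, so field 4 is: xyz"
printf '%s\n' 'xx|yy|"abc|xyz"|zz|12' | naive_blank4
# prints: xx|yy|"abc||zz|12
```

The quotes mean nothing to awk's ordinary field splitting, which is why the second half of the quoted value is the field that gets blanked.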
 
Old 08-09-2013, 11:41 AM   #11
konsolebox
Senior Member
 
Registered: Oct 2005
Distribution: Gentoo, Slackware, LFS
Posts: 2,248
Blog Entries: 8

Rep: Reputation: 235
Can you tell us what version of awk you're using? Most of the solutions provided here already do what you want. Only some minor modifications are needed.
 
Old 08-09-2013, 12:38 PM   #12
rohit_shinez
Member
 
Registered: Aug 2013
Posts: 40

Original Poster
Rep: Reputation: Disabled
It is nawk, on Solaris.
 
Old 08-09-2013, 12:48 PM   #13
schneidz
LQ Guru
 
Registered: May 2005
Location: boston, usa
Distribution: fedora-35
Posts: 5,313

Rep: Reputation: 918
This seems related (a comma inside a field of a CSV file):
http://www.linuxquestions.org/questi...0/#post5001726
 
Old 08-09-2013, 11:19 PM   #14
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3192
For future reference, you should mention that you are working on Solaris, as it is quite a different beast from Linux and often has a smaller or different set of applications.

I have not tested konsolebox's solution, but the one from colucix will not work in nawk.

You could also look at Perl or Ruby if they are options.
 
Old 08-10-2013, 12:28 AM   #15
konsolebox
Senior Member
 
Registered: Oct 2005
Distribution: Gentoo, Slackware, LFS
Posts: 2,248
Blog Entries: 8

Rep: Reputation: 235
Mine won't work with it either. The array-populating third argument of match() is a GNU extension.

Here is an attempt at a portable solution. It works, but it runs slower in other awks than in GNU awk, since they regenerate $0 right away whenever $x or NF is altered.

Code:
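#!/usr/bin/awk -f

BEGIN {
    OFS = "|"
}

# Remove field i entirely by shifting the later fields left.
# j and k are listed as extra parameters so that they stay local.
function delete_column(i,    j, k) {
    j = 0
    for (k = 1; k <= NF; ++k) {
        if (k == i) {
            ++j
        } else if (j) {
            $(k - j) = $k
        }
    }
    NF -= j
}

{
    string = $0
    NF = 0    # discard the default split; the fields are rebuilt below

    if (l = length(string)) {
        for (;;) {
            # Strip one leading field plus its trailing pipe:
            # a quoted field first, otherwise an ordinary one.
            if (match(string, /^"[^"]+"\|/)) {
                next_string = string
                sub(/^"[^"]+"\|/, "", next_string)
            }
            else if (match(string, /^[^|]*\|/)) {
                next_string = string
                sub(/^[^|]*\|/, "", next_string)
            }
            else {
                break
            }

            # The field is what was stripped, minus the trailing pipe.
            $(++NF) = substr(string, 1, l - length(next_string) - 1)
            string = next_string
            l = length(string)
        }

        $(++NF) = string    # whatever is left is the last field
    }

    #
    # Do anything with $<any> here, e.g. $3 = "", or call delete_column(3),
    # which removes the field instead of just setting it to a null value.
    #

    print
}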
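
A different way to get the same effect with only POSIX awk features is to let awk split on every pipe and then glue the halves of a quoted value back together. The sketch below is not konsolebox's script. The helper name blank_field and the [A-Za-z0-9] test (standing in for [[:alnum:]]) are assumptions made for the illustration, and it has not been tried on Solaris nawk. It also assumes a quoted value starts and ends its column and contains no quote characters of its own.

```shell
# Sketch only: blank_field is a hypothetical helper, not konsolebox's script.
# Usage: blank_field N < input    (N = 1-based column to blank)
blank_field() {
    awk -v col="$1" '
    BEGIN { FS = "|"; c = col + 0 }
    {
        n = 0
        for (i = 1; i <= NF; i++) {
            piece = $i
            # A piece that opens a quote without closing it was cut at an
            # embedded pipe, so glue the following pieces back on.
            while (piece ~ /^"/ && piece !~ /^".*"$/ && i < NF)
                piece = piece "|" $(++i)
            out[++n] = piece
        }

        # Blank the requested column if it holds a letter or digit.
        if (c >= 1 && c <= n && out[c] ~ /[A-Za-z0-9]/)
            out[c] = ""

        # Join the logical columns back into one line.
        line = (n > 0) ? out[1] : ""
        for (j = 2; j <= n; j++)
            line = line "|" out[j]
        print line
    }'
}

printf '%s\n' 'xx|yy|"abc|xyz"|zz|12' | blank_field 4
# prints: xx|yy|"abc|xyz"||12
```

Because the logical columns live in an ordinary array rather than in $1..$NF, nothing forces awk to rebuild $0 on every assignment, and empty columns keep their position.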

Last edited by konsolebox; 08-10-2013 at 01:19 AM.
 
  


All times are GMT -5.