LinuxQuestions.org
01-01-2018, 12:28 AM   #1
azurite
LQ Newbie
 
Registered: May 2016
Posts: 29

Rep: Reputation: Disabled
text processing file using awk


Hello,

I am learning awk and would like some help performing specific actions on a file.

I have File A as follows: it's a list of IDs in the 1st column, with X's or no X's in subsequent columns. I think it is either tab or space delimited.
Code:
101939-200   X X X X X
108409-100   X X     X 
108674-100   X X         X
110007-100_2   X X 
110056-100     
104714-100   X X 
109556-100   X X 
109937-100   X X 
109556-100_2   X X 
109937-100_2   X X 
107990-100_2  X X X 
105762-100   X X
I would like to get an output file that will (1) sort the lines by a specified column, according to whether or not there is an X in that column, and (2) print "ID is complete" if there is an X in that column and "ID is incomplete" if there is not. So if I sort by column 3, it would print something like the following.
Code:
101939-200 is complete 
107990-100_2 is complete 
108409-100   is incomplete
108674-100   is incomplete
110007-100_2   is incomplete
110056-100    is incomplete
104714-100   is incomplete
109556-100   is incomplete
109937-100   is incomplete 
109556-100_2   is incomplete 
109937-100_2   is incomplete
105762-100   is incomplete
I'd like to have code that I can reuse in the future by changing the column number and the message echoed depending on the situation. I know I can compare two files using awk, but I don't know how to go about writing something for what I want to do above.

========
The second part of my question is how I can keep on filtering by columns. Basically, print lines if there's an X in column 2, then of those lines print the lines that have an X in column 3. Then of the lines remaining, print lines that have an X in column 4, and so forth. Also, I would like it to look for both X and x. If I can achieve this using awk, that would be great. If there's a script that will do all of this, I'm also open to that option.


Thank you in advance.

Last edited by azurite; 01-01-2018 at 12:39 AM.
 
01-01-2018, 12:58 AM   #2
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 17,164

Rep: Reputation: 2628
Seems you'd better get hold of some documentation - I like the "GNU Awk User's Guide" from the GNU website. Others may find it a bit too much of a reference manual, rather than a learning resource.
Others on LQ will have other (better ?) recommendations.

You need to understand how fields are referenced and processed. Simple but necessary starting point. The document I referenced above has a "getting started" section that will help.
 
01-01-2018, 03:11 AM   #3
hazel
Senior Member
 
Registered: Mar 2016
Location: Harrow, UK
Distribution: Debian, Crux, LFS, AntiX
Posts: 2,354
Blog Entries: 5

Rep: Reputation: 1039
By default awk treats strings that are separated by any kind of whitespace as fields. They are called $1, $2, etc. So you need to set up an instruction that checks for an X in the field you are interested in, then prints out $1 followed by the appropriate diagnostic string.

The awk command can accept variable assignments as arguments using the syntax 'name = "string_value"'. So running with an argument like 'fieldno="3"' and a match with $(fieldno) should do it.

Here's one I did earlier:
Code:
awk '$col ~ /hazel/' /etc/passwd "col=1" 
hazel:x:1000:100::/home/hazel:/bin/bash
 
01-01-2018, 10:42 AM   #4
rknichols
Senior Member
 
Registered: Aug 2009
Distribution: CentOS
Posts: 4,048

Rep: Reputation: 1798
Quote:
Originally Posted by hazel
By default awk treats strings that are separated by any kind of whitespace as fields. They are called $1, $2, etc. So you need to set up an instruction that checks for an X in the field you are interested in, then prints out $1 followed by the appropriate diagnostic string.
It's not going to be that simple. An empty column just becomes part of the "any kind of whitespace" leading to the next field. As far as awk is concerned, the following two lines have exactly the same fields following the ID field:
Code:
108409-100   X X     X
108674-100   X X         X
Both have four fields. The amount of whitespace between fields is irrelevant. If all you cared about was "complete" vs. "incomplete" you could just check the number of fields, which is available in the awk variable NF. However, that's not going to tell you which are the missing "X"s.
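
A quick way to see this for yourself is the small sketch below. It feeds the two sample lines to awk with the default field splitting and prints NF for each; the lines are assumed to be space-padded exactly as pasted, and nf_report is just an illustrative variable name.

```shell
# Two sample lines that differ only in the amount of whitespace (spaces assumed).
nf_report=$(printf '%s\n' '108409-100   X X     X' '108674-100   X X         X' |
    awk '{ print $1, "has", NF, "fields" }')

# Both lines report the same field count, so the gaps are invisible to awk.
printf '%s\n' "$nf_report"
```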

The situation is further complicated by the ID field having variable width, so you can't just use FIELDWIDTHS to process the line as fixed-width fields. You'll need to strip off the characters of the ID field, then process the rest of the line character-by-character to see where there is or is not an "X".
Code:
{
    data = substr($0, length($1) + 1)   # start just past the last character of the ID
    nfields = split(data, fields, "")
Now the "fields" array has an element for each column that followed the ID field, with each odd-numbered column being a separator character. You can look at the even-numbered columns and check for an "X".
Code:
    for (n = 2; n <= nfields; n += 2) {
        if (fields[n] == "X") {
            # ... this column has an "X"
        }
        else {
            # ... this column is empty
        }
    }
}
I haven't tested the above, so there may be errors. Yes, there's probably a more elegant way to do that.
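
For reference, here is a minimal runnable sketch of the character-position idea, not a tested solution for the real file. It assumes a layout that may not match File A: the ID is followed by repeated pairs of one separator character and one single-character column, with a missing X stored as a single space. It uses substr() to pick out one character instead of split() with an empty separator, and it counts columns the awk way (the ID is column 1). The check_col function, the sample lines, and the variable names are made up for illustration.

```shell
# Hypothetical sample in the assumed layout: ID, then <separator><column> pairs,
# where an empty column is a single space.
filea=$(mktemp)
printf '%s\n' '101939-200 X X X' '108409-100 X   X' '110056-100      ' > "$filea"

# check_col COLUMN FILE: report whether COLUMN holds an X (column 1 is the ID).
check_col() {
    awk -v col="$1" '
    {
        data = substr($0, length($1) + 1)      # everything after the ID
        # Odd positions of data are separators, even positions are columns,
        # so column col sits at character 2 * (col - 1).
        mark = substr(data, 2 * (col - 1), 1)
        print $1, (mark == "X" ? "is complete" : "is incomplete")
    }' "$2"
}

col3=$(check_col 3 "$filea")
col4=$(check_col 4 "$filea")
printf 'column 3:\n%s\ncolumn 4:\n%s\n' "$col3" "$col4"
rm -f "$filea"
```

If the real file uses tabs with nothing at all in an empty column, the positions shift and this sketch will not apply as written.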

Last edited by rknichols; 01-01-2018 at 10:46 AM. Reason: Now there's one less error
 
01-01-2018, 06:18 PM   #5
MadeInGermany
Member
 
Registered: Dec 2011
Location: Simplicity
Posts: 934

Rep: Reputation: 403
Quote:
Originally Posted by hazel
The awk command can accept variable assignments as arguments using the syntax 'name = "string_value"'. So running with an argument like 'fieldno="3"' and a match with $(fieldno) should do it.

Here's one I did earlier:
Code:
awk '$col ~ /hazel/' /etc/passwd "col=1" 
hazel:x:1000:100::/home/hazel:/bin/bash
The example is not correct, or at least is misleading.
The arguments are processed in sequence. The col=1 must happen before the file /etc/passwd is read.
Otherwise it is unset, zero in number context, so $0 (the full line) is matched against "hazel".
Further, if the col=1 works, the field separator FS must be set to a colon.
Correct is
Code:
awk -F: '$col=="hazel"' col=1 /etc/passwd
Or
Code:
awk -v col=1 -F: '$col=="hazel"' /etc/passwd
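
The effect of the argument order can be reproduced with a small sketch. The two passwd-style entries below are made up and stand in for /etc/passwd, and the variable names are only for illustration.

```shell
# Hypothetical sample data standing in for /etc/passwd.
sample=$(mktemp)
printf '%s\n' 'hazel:x:1000:100::/home/hazel:/bin/bash' \
              'bob:x:1001:100::/home/hazel2:/bin/bash' > "$sample"

# col is assigned only after the file has been read, so while reading,
# $col is $0 and any line containing "hazel" anywhere matches.
late=$(awk -F: '$col ~ /hazel/' "$sample" col=1)

# col is assigned before the file is read, so only field 1 is compared.
early=$(awk -F: '$col == "hazel"' col=1 "$sample")

printf 'assignment after the file:\n%s\n' "$late"
printf 'assignment before the file:\n%s\n' "$early"
rm -f "$sample"
```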
 
01-01-2018, 09:21 PM   #6
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,628

Rep: Reputation: 2943
If you set FS to a single character other than a space (either in BEGIN or with the -F switch), runs of that separator are no longer collapsed into one. Therefore, if the data is separated by tabs, each tab starts a new field and you can then test whether a field is empty.
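
A minimal sketch of the difference, assuming the data really is tab-delimited with nothing at all in an empty column. The sample line and the variable names are invented for illustration.

```shell
# Assumed tab-delimited line: ID, X, X, (empty), (empty), X
line=$(printf '108409-100\tX\tX\t\t\tX')

# Default FS: runs of whitespace collapse, so the empty columns vanish.
default_nf=$(printf '%s\n' "$line" | awk '{ print NF }')

# FS set to a single tab: every tab starts a new field, empty ones included.
tab_nf=$(printf '%s\n' "$line" | awk -F'\t' '{ print NF }')
field4=$(printf '%s\n' "$line" | awk -F'\t' '{ print ($4 == "" ? "empty" : $4) }')

printf 'default NF=%s, tab NF=%s, field 4 is %s\n' "$default_nf" "$tab_nf" "$field4"
```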
 
01-02-2018, 04:55 AM   #7
MadeInGermany
Member
 
Registered: Dec 2011
Location: Simplicity
Posts: 934

Rep: Reputation: 403
This allows the first field separator to be any whitespace, and all the others to be a single tab or space character:
Code:
awk '
{
  data = substr($0, length($1) + 1)   # the rest of the line after the ID
  sub(/^[[:blank:]]+/, "", data)
  nfields = split(data, fields, "[[:blank:]]")
  for (n = 1; n <= nfields; n++)
    if (fields[n] == "X") {
      # ... column n has an "X"
    } else {
      # ... column n is empty
    }
}
' FileA

Last edited by MadeInGermany; 01-02-2018 at 04:57 AM.
 
01-02-2018, 09:30 AM   #8
rknichols
Senior Member
 
Registered: Aug 2009
Distribution: CentOS
Posts: 4,048

Rep: Reputation: 1798
Quote:
Originally Posted by MadeInGermany
Allowing the first field separator to be a whitespace, and all others one tab or space character
That mishandles the case where the first "X" is missing.
 
01-02-2018, 01:04 PM   #9
MadeInGermany
Member
 
Registered: Dec 2011
Location: Simplicity
Posts: 934

Rep: Reputation: 403
Ah yes thanks.
Correction, delete one space (a 2nd space means a missing X)
Code:
  sub(/^[[:blank:]]/, "", data)
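
Put together with this correction, the approach could look like the sketch below. It assumes exactly one tab or space after the ID and between columns, with nothing at all in an empty column, and it starts data just past the ID with length($1) + 1. The bracket expression [ \t] stands in for [[:blank:]] only because some older awk builds lack POSIX character classes. The sample lines and the "missing" report are invented for illustration, and the column numbers count from the first X column, as n does in the loop.

```shell
# Hypothetical tab-delimited sample: a missing X is an empty field.
filea=$(mktemp)
printf '101939-200\tX\tX\tX\n108409-100\tX\t\tX\n110056-100\t\t\t\n' > "$filea"

missing_report=$(awk '
{
  data = substr($0, length($1) + 1)        # everything after the ID
  sub(/^[ \t]/, "", data)                  # drop exactly one separator
  nfields = split(data, fields, "[ \t]")   # one separator per column boundary
  missing = ""
  for (n = 1; n <= nfields; n++)
    if (fields[n] != "X")
      missing = missing " " n              # remember each column without an X
  print $1 ":" (missing == "" ? " all present" : " missing" missing)
}
' "$filea")

printf '%s\n' "$missing_report"
rm -f "$filea"
```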

Last edited by MadeInGermany; 01-02-2018 at 01:06 PM.
 
01-03-2018, 12:26 AM   #10
azurite
LQ Newbie
 
Registered: May 2016
Posts: 29

Original Poster
Rep: Reputation: Disabled
Wow, thank you to everyone who replied! Now it makes sense why I was getting strange results. However, I'm very, very new to this, so could you explain a little more about what I have to change or input in your respective code to check for what I want to do? I'm getting a bit lost trying to figure out the whole leading spaces/tabs thing. I would really appreciate the guidance.
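
As one possible starting point, the sketch below combines the suggestions in this thread. It is only a sketch under a loud assumption: File A is tab-delimited, with a single tab between columns and nothing at all in an empty column. If the file is space-padded instead, this will not work as written. The function names report_col and filter_cols, the sample data, and the default messages are made up for illustration. Columns are counted the awk way, so the ID is column 1 and the first X column is column 2.

```shell
# Hypothetical tab-delimited sample (an empty column is an empty field).
filea=$(mktemp)
printf '101939-200\tX\tX\tX\n108409-100\tX\t\tx\n110056-100\t\t\t\n107990-100_2\tx\tX\tX\n' > "$filea"

# report_col FILE COLUMN [YES_MESSAGE] [NO_MESSAGE]
# Prints the IDs that have X or x in COLUMN first, then the ones that do not.
report_col() {
    awk -F'\t' -v col="$2" -v yes="${3:-is complete}" -v no="${4:-is incomplete}" '
    toupper($col) == "X" { print $1, yes; next }      # X or x present: print now
    { later[++n] = $1 " " no }                        # otherwise hold the line back
    END { for (i = 1; i <= n; i++) print later[i] }   # held lines come last
    ' "$1"
}

# filter_cols FILE COLUMN...
# Prints only the lines that have X or x in every listed column.
filter_cols() {
    file=$1; shift
    awk -F'\t' -v cols="$*" '
    BEGIN { ncols = split(cols, want, " ") }
    {
        for (i = 1; i <= ncols; i++)
            if (toupper($(want[i] + 0)) != "X") next  # one miss drops the line
        print
    }' "$file"
}

report=$(report_col "$filea" 3)
filtered=$(filter_cols "$filea" 2 3 4)
printf '%s\n' "$report"
printf '%s\n' "$filtered"
rm -f "$filea"
```

To change the column or the wording, pass different arguments, for example report_col FileA 4 "has data" "needs data". Adding one more column number to filter_cols at a time gives the step-by-step filtering described in the first post.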
 
  


All times are GMT -5.