LinuxQuestions.org


maddyfreaks 11-27-2018 05:44 PM

Help with awk
 
Hi Team,

I have a file with the contents below:
Code:

COL1  | COL2 | COL3
----------------------
A1    | 98  | P
A1    | 98  | P
A1    | 98  | P
B1    | 98  | P
B1    | 98  | P
B1    | 98  | P
C1    | 98  | P
C1    | 98  | P
C1    | 98  | P

I need to convert it to the form below. The awk/sed change should apply to COL1 only:
Code:

COL1  | COL2 | COL3
----------------------
A1    | 98  | P
      | 98  | P
      | 98  | P
B1    | 98  | P
      | 98  | P
      | 98  | P
C1    | 98  | P
      | 98  | P
      | 98  | P


I tried this:

awk '!x[$1]++' file   <-- this removes the whole line

berndbausch 11-27-2018 06:58 PM

You don't provide a very thorough description of your task, so I have to make a few assumptions.

1. Input: Entries are collated, perhaps also sorted according to col1. That is, all A1 entries are kept together, all B1 entries etc.
2. Output: Essentially identical to the input, except that each col1 value appears only once.

In this case, I would write an awk program that checks whether the value in col1 has changed. When it detects a change, it prints col1. Otherwise, it doesn't, but prints all other fields.

Here is a possible fragment:
Code:

$1 != previous_col1 { printf $1 }            # value in col1 changed
                    { previous_col1 = $1      # remember current col1 value
                      for (col=2;col<=$NF;col++)  # print remaining columns
                          printf $col " "
                    }

Disclaimers: I am sure there are more elegant ways to solve the problem. This is just a suggestion and hasn't been tested. I leave the pretty formatting as an exercise for the reader.

EDIT: Another solution is to use the sub() function to replace A1, B1, etc. with a string of blanks. This way, you don't have to worry about re-creating the pretty formatting.
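
A minimal sketch of that sub() idea, assuming the col1 values are grouped together as described above. The function name blank_repeats and the variables key, prev and blanks are made up for illustration, and this has only been checked against the sample data:

```shell
# Hypothetical helper: blank out repeated column-1 values while keeping the
# original spacing of every line. Reads from standard input.
blank_repeats() {
    awk '
    {
        key = $1                      # save column 1 before the record is changed
        if (key == prev) {
            # Build a run of spaces exactly as wide as the key ...
            blanks = sprintf("%" length(key) "s", "")
            # ... and overwrite the first run of non-blank characters with it.
            sub(/[^ \t]+/, blanks)
        }
        prev = key
        print                         # $0 still has its original column spacing
    }'
}

blank_repeats <<'EOF'
COL1  | COL2 | COL3
----------------------
A1    | 98  | P
A1    | 98  | P
B1    | 98  | P
B1    | 98  | P
EOF
```

Because sub() edits $0 in place instead of rebuilding it from the fields, the spacing between the other columns is left untouched.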

maddyfreaks 11-27-2018 08:27 PM

Sorry for leaving out the details.

I was given a file in which Col1 (field 1) always contains duplicate data. The other fields may or may not, but I am not worried about those columns. All I need is for a duplicate value in field 1 to be printed as empty space. The values of field 1 are ordered, so no value repeats further down the rows.

Hope this is clear.

I tried your code, but I see errors. Can you let me know where I made a mistake?

$ cat /tmp/A1|awk '$1 != previous_col1 { printf $1 } { previous_col1 = $1 for (col=2;col<=$NF;col++) printf $col " " }'
awk: syntax error at source line 1
context is
$1 != previous_col1 { printf $1 } { previous_col1 = $1 >>> for <<< (col=2;col<=$NF;col++) printf $col " " }
awk: illegal statement at source line 1

AwesomeMachine 11-27-2018 08:48 PM

I have to say there is excellent documentation on awk. Is this homework?

maddyfreaks 11-27-2018 09:26 PM

No, it's not homework.
I'm writing a script and got stuck at the final part.

I tried my best. As I said, I'm a newbie, so I'm asking for help on how to achieve this.

berndbausch 11-27-2018 10:32 PM

Quote:

Originally Posted by maddyfreaks (Post 5930887)
awk: syntax error at source line 1
context is
$1 != previous_col1 { printf $1 } { previous_col1 = $1 >>> for <<< (col=2;col<=$NF;col++) printf $col " " }
awk: illegal statement at source line 1

The error message does its best to mark the location of the error. The for statement must either be on a separate line or separated by a semicolon.
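
As a rough sketch of the semicolon form on one line (the wrapper name print_compact is hypothetical). Note that two things beyond the semicolon are changed here as assumptions on my part: the loop bound is NF rather than $NF, and a newline is printed at the end of each record:

```shell
# Hypothetical wrapper around the one-liner; reads from standard input.
print_compact() {
    # The semicolon after "previous_col1 = $1" separates it from the for statement.
    awk '$1 != previous_col1 { printf "%s", $1 } { previous_col1 = $1; for (col = 2; col <= NF; col++) printf " %s", $col; printf "\n" }'
}

printf 'A1    | 98  | P\nA1    | 98  | P\n' | print_compact
```

The same program split over several lines needs no semicolon, because a newline also ends a statement in awk.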

I agree that the awk user guide is pretty good, and that there are many tutorials out there that help you come up to this level of awk programming. It's worthwhile investing a few hours to learn this tool.

Turbocapitalist 11-27-2018 10:33 PM

It would help if you were to use [code] [/code] tags when posting scripts. There was an extraneous dollar sign changing how the NF field was being used in the for loop, and a missing output field separator:

Code:

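#!/usr/bin/awk -f

# Print column 1 only when it differs from the previous record's value.
$1 != previous_col1 {
        printf "%s", $1
}
# For every record: remember column 1, then print the remaining columns.
{
        previous_col1 = $1
        printf "%s", OFS
        for (col=2;col<=NF;col++) {     # NF, not $NF: loop up to the field count
                printf "%s%s", $col, OFS
        }
        printf "\n"
}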

Please look at AWK's manual page and find the many mentions of NF and how it can be used as an indirect reference (or not).
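
A small illustration of the difference, using one line of the sample data (the function name show_nf is made up for this example). With the default field separator, each "|" counts as a field of its own:

```shell
# NF is the number of fields in the record; $NF is the contents of the last field.
show_nf() {
    awk '{ printf "NF=%d  $NF=%s\n", NF, $NF }'
}

echo 'A1    | 98  | P' | show_nf
# prints: NF=5  $NF=P
```

So a loop bound of col<=NF counts up to 5, while col<=$NF compares the counter against the string "P".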

grail 11-28-2018 12:22 AM

You can play with how to fix up the column alignments, but you can simply do:
Code:

awk '{if($1 == prev)$1 = "";else prev = $1}1' file
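
One possible way to handle the alignment, as a sketch only: pad column 1 to a fixed width with sprintf(). The width of 5 and the name blank_and_pad are assumptions chosen to suit the sample data, not something from the original post:

```shell
# Hypothetical helper: blank repeated column-1 values and pad column 1
# to a fixed width so the "|" separators stay lined up. Reads standard input.
blank_and_pad() {
    awk '
    NF {                                # skip empty lines so they stay empty
        key = $1
        if (key == prev) {
            key = ""                    # repeated value: show nothing in column 1
        } else {
            prev = key                  # new value: remember it
        }
        $1 = sprintf("%-5s", key)       # left-justify in 5 characters (assumed width)
    }
    1'
}

blank_and_pad <<'EOF'
COL1  | COL2 | COL3
----------------------
A1    | 98  | P
A1    | 98  | P
B1    | 98  | P
EOF
```

Assigning to $1 makes awk rebuild the record with single spaces between fields, so the first column lines up again but any extra spaces between the other columns are collapsed.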

berndbausch 11-28-2018 12:55 AM

Quote:

Originally Posted by Turbocapitalist (Post 5930929)
There was an extraneous dollar sign changing how the NF field was being used in the for loop

which I added to make the task a little more interesting. Thanks for spotting it.


All times are GMT -5.