LinuxQuestions.org - Help with awk

Hi Team

I have a file with the contents below:

Code:

 COL1  | COL2 | COL3
----------------------
A1    | 98  | P
A1    | 98  | P
A1    | 98  | P
B1    | 98  | P
B1    | 98  | P
B1    | 98  | P
C1    | 98  | P
C1    | 98  | P
C1    | 98  | P



I need to convert it to the format below, with the awk/sed change applied to col1 only:

 COL1  | COL2 | COL3
----------------------
A1    | 98  | P
      | 98  | P
      | 98  | P
B1    | 98  | P
      | 98  | P
      | 98  | P
C1    | 98  | P
      | 98  | P
      | 98  | P

I tried this:

awk '!x[$1]++' file <-- this removes the whole line

You don't provide a very thorough description of your task, so I have to make a few assumptions.

1. Input: Entries are collated, perhaps also sorted, according to col1. That is, all A1 entries are kept together, likewise all B1 entries, and so on.
2. Output: Essentially identical to the input, except that each col1 value appears only once.

In this case, I would write an awk program that checks whether the value in col1 has changed. When it detects a change, it prints col1; otherwise it skips col1. Either way, it prints all the other fields.

Here is a possible fragment:

Code:

$1 != previous_col1 { printf $1 }            # value in col1 changed
                    { previous_col1 = $1      # remember current col1 value
                      for (col=2;col<=$NF;col++)  # print remaining columns
                          printf $col " "
                    }

Disclaimers: I am sure there are more elegant ways to solve the problem. This is just a suggestion and hasn't been tested. I leave the pretty formatting as an exercise for the reader.

EDIT: Another solution is to use the sub() function to replace A1, B1, etc. with a string of blanks. This way, you don't have to worry about re-creating the pretty formatting.
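
A rough sketch of the sub() idea, tried only against the sample data in this thread. It assumes the default whitespace field splitting and that equal col1 values sit on adjacent lines. The function name blank_repeats is made up for illustration.

```shell
# blank_repeats is a hypothetical helper name, not something from the thread.
blank_repeats() {
    awk '
    {
        key = $1                        # col1 under default field splitting
        if (key != "" && key == prev) {
            # Build a run of blanks exactly as wide as the repeated value.
            blanks = sprintf("%" length(key) "s", "")
            # Overwrite the first run of non-blank characters, which is col1.
            sub(/[^ \t]+/, blanks)
        } else {
            prev = key                  # remember the new col1 value
        }
        print                           # the rest of the line is untouched
    }' "$@"
}
```

Called as blank_repeats file, this should blank out the repeated A1, B1 and C1 entries while leaving the | separators in the same columns, because only the characters of col1 are overwritten.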

Sorry for leaving out the details.

I was given a file in which col1 (field 1) always contains duplicate data. The other fields may or may not, but I am not concerned with them. All I need is for a repeated col1 value to be printed as blank space. The values in field 1 are ordered, so no value reappears further down the rows.

Hope this is clear.

I tried your code, but I get errors. Can you tell me where I made a mistake?

$ cat /tmp/A1|awk '$1 != previous_col1 { printf $1 } { previous_col1 = $1 for (col=2;col<=$NF;col++) printf $col " " }'
awk: syntax error at source line 1
context is
$1 != previous_col1 { printf $1 } { previous_col1 = $1 >>> for <<< (col=2;col<=$NF;col++) printf $col " " }
awk: illegal statement at source line 1

I have to say there is excellent documentation on awk. Is this homework?

No, it's not homework.
I'm writing a script and got stuck on the final part.

I tried my best. As I said, I'm a newbie, so I'm asking for help on how to achieve this.

Quote:

Originally Posted by maddyfreaks (Post 5930887)

awk: syntax error at source line 1
context is
$1 != previous_col1 { printf $1 } { previous_col1 = $1 >>> for <<< (col=2;col<=$NF;col++) printf $col " " }
awk: illegal statement at source line 1

The error message does its best to mark the location of the error. The for statement must either be on a separate line or separated by a semicolon.
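
To illustrate the rule on made-up input (this is not the program from the thread, and the function name is hypothetical): when two statements share a line inside one action, a semicolon has to separate them.

```shell
# semicolon_demo is a hypothetical name used only for this illustration.
semicolon_demo() {
    # The assignment and the for loop share one line, so ";" separates them.
    # Dropping the ";" after "first = $1" gives a syntax error at "for".
    printf 'a b c\n' | awk '{ first = $1; for (i = 2; i <= NF; i++) print first, $i }'
}
```

Running semicolon_demo prints "a b" and then "a c", one pair per line.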

I agree that the awk user guide is pretty good, and that there are many tutorials out there that help you come up to this level of awk programming. It's worthwhile investing a few hours to learn this tool.

It would help if you were to use [code] [/code] tags when posting scripts. There was an extraneous dollar sign changing how the NF field was being used in the for loop, and a missing output field separator:

Code:

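#!/usr/bin/awk -f

# Print col1 only when its value differs from the previous row's.
$1 != previous_col1 {
        printf "%s", $1
}

# For every row: remember col1, then print the remaining fields.
{
        previous_col1 = $1
        printf "%s", OFS
        for (col = 2; col <= NF; col++) {
                printf "%s%s", $col, OFS
        }
        printf "\n"
}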

Please look at AWK's manual page and find the many mentions of NF and how it can be used as an indirect reference (or not).
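
A small illustration of the difference, on made-up input with a hypothetical function name: NF is the number of fields in the current record, while $NF is the contents of the last field.

```shell
# nf_demo is a hypothetical name used only for this illustration.
nf_demo() {
    # First print the field count (NF), then the last field itself ($NF).
    printf 'A1 98 P\n' | awk '{ print NF; print $NF }'
}
```

This prints 3 and then P. In a loop condition such as col<=$NF, the counter is therefore compared with the last field's contents (here P) rather than with the field count.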

You can play with how to fix up the column alignments, but you can simply do:

Code:

awk '{if($1 == prev)$1 = "";else prev = $1}1' file

Quote:

Originally Posted by Turbocapitalist (Post 5930929)

There was an extraneous dollar sign changing how the NF field was being used in the for loop

which I added to make the task a little more interesting. Thanks for spotting it.