[SOLVED] manipulating variable columns in shell or in perl

baidym · 09-07-2009, 03:14 PM

Hello all,

I'm having an ongoing battle manipulating a string of numbers. I need to multiply all columns except 1 and 2 by 100, and output all columns except 2 as whole numbers, as below.

If I have a file with
FNUM,SET,C1N,C2N,C3N,C1S,C2S,C3S,
4535, 109, 5.0709, 5.1546, 5.2002, 304.4215, 315.4393, 299.0198,
4536, 109, 5.1311, 5.2059, 5.2861, 282.5050, 295.5363, 288.6789,
4537, 109, 4.7416, 4.9326, 5.1422, 305.3368, 316.0573, 297.5717,

and I want an output in the form
FNUM C1N C2N C3N C1S C2S C3S
4535 507 515 520 30442 31543 29901
4536 513 520 528 28250 29553 28867
4537 474 493 514 30533 31605 29757

I can use

Quote:

in csh:
awk '{if(NR!=1) printf("%-6d%-6d%-6d%-6d\t%-6d%-6d%-6d\n",$1,($3*100),($4*100),($5*100),($6*100),($7*100),($8*100))}' $rfile > ${set}_tmp

or in perl:
system "awk \'{if(NR!=1) printf(\"%-6d%-6d%-6d%-6d\t%-6d%-6d%-6d\\n\",\$1,(\$3*100),(\$4*100),(\$5*100),(\$6*100),(\$7*100),(\$8*100))}\' $file > ${seq}_rtmp";

The problem is, every time I have a file with a different number of C values I have to change the script.
I want to be able to use the script to do the same thing regardless of the number of columns, without having to change it every time. So if I had a file with 4 "C" values:

FNUM,SET,C1N,C2N,C1S,C2S,
4535, 109, 5.0709, 5.1546, 304.4215, 315.4393,
4536, 109, 5.1311, 5.2059, 282.5050, 295.5363,
4537, 109, 4.7416, 4.9326, 305.3368, 316.0573,

I could use the same script and get:
FNUM C1N C2N C1S C2S
4535 507 515 30442 31543
4536 513 520 28250 29553
4537 474 493 30533 31605

Can anyone show me how to do this in shell or in perl?
Many thanks,
M.

colucix · 09-07-2009, 03:38 PM

In awk, the built-in variable NF holds the number of fields in the current line, however many there are. Hence you can print any number of "C fields" with the same loop, that is, with the very same awk script:

Code:

BEGIN { FS = "," }

NR == 1 {
  printf "%-6s", $1
  for ( i = 3; i <= NF-1; i++)
     printf "%-6s", $i
  print ""
}

NR > 1 {
  printf "%-6d", $1
  for ( i = 3; i <= NF-1; i++)
     printf "%-6d", $i * 100
  print ""  
}

The code above matches your requirements exactly, assuming you want to keep the header in the output file and that every line ends with a comma. The trailing comma produces an empty last field, which is why both loops stop at NF-1.
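The NF-1 bound relies on that trailing comma being present on every line. Below is a sketch of a variant, assuming bash and a POSIX awk, that drops the empty last field only when it is actually there, so the same code should handle files with or without trailing commas. The function name `reformat_cols` is hypothetical; it takes file names as arguments or reads stdin.

```shell
#!/bin/bash
# Sketch only: NF-based column reformatting that does not depend on
# every line ending with a comma. reformat_cols is a hypothetical name.
reformat_cols() {
  awk '
    BEGIN { FS = "," }

    NF == 0 { next }                      # skip blank lines

    {
      # A trailing comma leaves an empty last field; drop it if present
      last = ($NF ~ /^[ \t]*$/) ? NF - 1 : NF

      if (NR == 1) {                      # header: FNUM plus the C* names
        printf "%-6s", $1
        for (i = 3; i <= last; i++)
          printf "%-6s", $i
      } else {                            # data: FNUM, then C* values x100, truncated by %d
        printf "%-6d", $1
        for (i = 3; i <= last; i++)
          printf "%-6d", $i * 100
      }
      print ""
    }
  ' "$@"
}

# Same data with and without trailing commas; both should print the same table
printf '%s\n' 'FNUM,SET,C1N,C2N,C1S,C2S,' '4535, 109, 5.0709, 5.1546, 304.4215, 315.4393,' | reformat_cols
printf '%s\n' 'FNUM,SET,C1N,C2N,C1S,C2S' '4535, 109, 5.0709, 5.1546, 304.4215, 315.4393' | reformat_cols
```

The only change from the NR == 1 / NR > 1 version is that the upper bound of the loop is computed per line instead of being fixed at NF-1.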

catkin · 09-07-2009, 04:56 PM

And here's a bash solution, less elegant than colucix's awk:

Code:

#!/bin/bash
shopt -s extglob   # enables the *( ) patterns used below
line_no=0
while read -r line
do
    let line_no++
    output=''
    if [[ "$line_no" -eq 1 ]]; then
        # Parse the headings into an array
        IFS=',' headings=( $line )
        num_cols="${#headings[@]}"
        for (( i = 0; i < num_cols; i++ ))
        do
            if [[ "$i" -ne 1 ]]; then  # Skip second column
                output="$output ${headings[$i]}"
            fi
        done
        echo "${output# }" > output.txt
    else
        # Parse the numbers into an array
        IFS=',' numbers=( $line )
        for (( i = 0; i < num_cols; i++ ))
        do
            case $i in
                0 )
                    # First column: number is unchanged
                    number="${numbers[$i]##*( )}"
                    output="$output $number"
                    ;;
                1 )
                    # Second column: skip
                    ;;
                * )
                    # Other columns: multiply by 100 and truncate to an integer
                    # (with bc's default scale=0, dividing by 1 drops the fraction)
                    number="${numbers[$i]##*( )}"
                    if [[ "$number" != '' ]]; then  # Skip any empty columns
                        number="$( echo "$number * 100 / 1" | /usr/bin/bc )"
                    fi
                    output="$output $number"
                    ;;
            esac
        done
        echo "${output##*( )}" >> output.txt
    fi
done < input.txt

EDIT:

Code:

IFS=',' numbers=( $line )

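is dangerous: because no command follows the assignments, IFS=',' is not a temporary, per-command setting but an ordinary assignment, so IFS stays set to "," for the rest of the script. See this post for an explanation.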
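A sketch of the usual safe alternative, assuming bash: when IFS=',' is placed in front of a command such as the read builtin, the assignment applies to that one command only. Under that assumption, the two splitting lines in the script could become `IFS=',' read -r -a headings <<< "$line"` and `IFS=',' read -r -a numbers <<< "$line"`. The sample line and variable names below are illustrative.

```shell
#!/bin/bash
# Sketch: split one comma-separated line into an array without
# touching the global IFS. The sample line and names are illustrative.
line='4535, 109, 5.0709, 5.1546, 304.4215, 315.4393,'
saved_ifs=$IFS

# IFS=',' prefixes the read builtin, so it applies to this command only
IFS=',' read -r -a numbers <<< "$line"

echo "number of fields: ${#numbers[@]}"
echo "first field: ${numbers[0]}"

# IFS still holds its previous value (normally space, tab, newline)
if [[ "$IFS" == "$saved_ifs" ]]; then
    echo "IFS unchanged"
fi
```

In bash the trailing comma does not add an extra empty element. Only the comma is a separator here, so elements such as " 5.0709" keep their leading space; the script strips it with ${numbers[$i]##*( )}.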

baidym · 09-08-2009, 09:30 AM

Thanks for the replies.

How can I use the awk script in a C shell script? If I use:

Quote:

#!/bin/csh

awk 'BEGIN { FS = "," }

NR == 1 {
printf "%-6s", $1
for ( i = 3; i <= NF-1; i++)
printf "%-6s", $i
print ""
}

NR > 1 {
printf "%-6d", $1
for ( i = 3; i <= NF-1; i++)
printf "%-6d", $i * 100
print ""
}'

it comes back with an "unmatched '" error. Do I need to specify the input file within that string?

Thanks,
M

colucix · 09-08-2009, 09:43 AM

The C shell does not treat an unclosed quote as "continue on the next line until the closing quote" the way bash does. You have to put the continuation character (a backslash) explicitly at the end of each line:

Code:

#!/bin/csh
awk 'BEGIN { FS = "," } \
\
NR == 1 { \
  printf "%-6s", $1 \
  for ( i = 3; i <= NF-1; i++ ) \
     printf "%-6s", $i \
  print "" \
} \
\
NR > 1 { \
  printf "%-6d", $1 \
  for ( i = 3; i <= NF-1; i++ ) \
     printf "%-6d", $i * 100 \
  print "" \
}' file

Specify the input file the same way as for a one-line awk command: as an argument after the closing quote on the last line (see "file" above).
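Another option, sketched here on the assumption that a separate file is acceptable, is to keep the awk program in its own file and run it with `awk -f`. Then no shell has to parse the program's quotes, and the same command line (`awk -f cols.awk input.txt`) works from csh, bash, or a perl system() call. The file names `cols.awk` and `input.txt` are hypothetical; bash is used here only to create them for the demo.

```shell
#!/bin/bash
# Sketch: store the awk program in its own file so no shell needs to
# parse its quotes. cols.awk and input.txt are hypothetical names.
workdir=$(mktemp -d)   # scratch directory for the demo (remove it when done)

# Quoted heredoc delimiter: the program is written out verbatim
cat > "$workdir/cols.awk" <<'EOF'
BEGIN { FS = "," }

NR == 1 {
  printf "%-6s", $1
  for (i = 3; i <= NF-1; i++)
    printf "%-6s", $i
  print ""
}

NR > 1 {
  printf "%-6d", $1
  for (i = 3; i <= NF-1; i++)
    printf "%-6d", $i * 100
  print ""
}
EOF

# The six-column sample from the question
cat > "$workdir/input.txt" <<'EOF'
FNUM,SET,C1N,C2N,C1S,C2S,
4535, 109, 5.0709, 5.1546, 304.4215, 315.4393,
4536, 109, 5.1311, 5.2059, 282.5050, 295.5363,
4537, 109, 4.7416, 4.9326, 305.3368, 316.0573,
EOF

# From csh this would simply be: awk -f cols.awk input.txt > output.txt
awk -f "$workdir/cols.awk" "$workdir/input.txt"
```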

baidym · 09-08-2009, 06:02 PM

Many thanks colucix!!