LinuxQuestions.org


jb2011 05-18-2011 02:37 PM

"Paste" output in Bash script not as expected...
 
Hi.. I have several files I am trying to 'paste' together via bash script.

Input file format:
FileA.txt:

HeaderA
LineA1
LineA2
LineA3
LineA4
...

FileB.txt:

HeaderB
LineB1
LineB2
LineB3
LineB4
...

FileC.txt:

HeaderC
LineC1
LineC2
LineC3
LineC4
...

etc.

What I want is the output file to look like this:

output.txt:

HeaderA HeaderB HeaderC ...
LineA1 LineB1 LineC1 ...
LineA2 LineB2 LineC2 ...
LineA3 LineB3 LineC3 ...
LineA4 LineB4 LineC4 ...
... ... ...

So, this is my code:

>paste FileA.txt FileB.txt FileC.txt ... > output.txt

However, the result is coming out as this:

output.txt:

HeaderA HeaderB HeaderC ...
LineA1
LineB1
LineC1
...
LineA2
LineB2
LineC2
...
LineA3
LineB3
LineC3
...
LineA4
LineB4
LineC4
...

I can't figure out why it's behaving this way.
Any ideas?

Much appreciated,
J

SL00b 05-18-2011 02:54 PM

I have no idea. I created three files exactly like the examples you provided, and here's what I got:

Code:

:~> paste fileA.txt fileB.txt fileC.txt > output.txt
:~> less output.txt
HeaderA HeaderB HeaderC
LineA1  LineB1  LineC1
LineA2  LineB2  LineC2
LineA3  LineB3  LineC3
LineA4  LineB4  LineC4
...    ...    ...

It looks to me like your example is very different from the real-world files you're working with. In your place, I'd first try it with this simplified exercise. If it works, that would validate that it's not a code problem, but a data problem, and you can start looking at your data format.
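A minimal sketch of that simplified exercise, assuming a POSIX shell with `mktemp` and `paste` available. The scratch directory and file names are only for illustration:

```shell
# Scratch directory so the test does not touch any existing files
work=$(mktemp -d)

# Three small LF-terminated files shaped like the ones in the first post
for x in A B C; do
    printf 'Header%s\nLine%s1\nLine%s2\n' "$x" "$x" "$x" > "$work/file$x.txt"
done

# paste joins line N of every file into output line N, separated by TABs
paste "$work/fileA.txt" "$work/fileB.txt" "$work/fileC.txt" > "$work/output.txt"
cat "$work/output.txt"
```

If this prints three aligned, TAB-separated rows, paste itself is behaving and the difference lies in the data.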

the dsc 05-18-2011 07:08 PM

This is quite curious. I didn't know about this command; I thought that to do this, one would have to write a relatively complex script that reads one line from each file at a time, writes it to the output file, and carries on that way.

And it worked for me as well.

I have no clue why it does not work for you. The only thing I can think of, a remote possibility, is some subtle difference between Windows, Linux and Mac text files, something I think doesn't even exist anymore.

I don't remember the details, but back in the day you had to convert a Windows plain-text (.txt) file to Linux format and vice versa; otherwise you would get no "new lines" and/or extra new lines, depending on who wrote the file and who reads it.

But I think these days are gone now, this would no longer be an issue. But I can't really tell.

MTK358 05-18-2011 07:11 PM

Quote:

Originally Posted by the dsc (Post 4360430)
But I think these days are gone now, this would no longer be an issue. But I can't really tell.

What do you mean?

Linux and Windows still use different line endings.

At least Mac OS uses LF newlines just like Linux since Mac OS X.
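To see the difference directly, a small sketch (the file names are made up) that writes the same two lines with LF and with CRLF endings and makes the carriage return visible with `od -c`:

```shell
work=$(mktemp -d)

# The same two lines saved with Unix (LF) and Windows (CRLF) endings
printf 'HeaderA\nLineA1\n' > "$work/lf.txt"
printf 'HeaderA\r\nLineA1\r\n' > "$work/crlf.txt"

# od -c shows the otherwise invisible carriage return as \r
od -c "$work/lf.txt"
od -c "$work/crlf.txt"

# Count the lines that contain a carriage return in each file
cr=$(printf '\r')
grep -c "$cr" "$work/lf.txt" "$work/crlf.txt"
```

On the CRLF file, `od -c` should show `\r  \n` at the end of every line, and `grep -c` should count every line.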

crts 05-18-2011 07:41 PM

Hi,

I tried to replicate your "error". This is the closest I got:
Code:

$ cat fileA
headerA
lineA1


lineA2


lineA3


$ cat fileB
headerB

lineB1


lineB2


lineB3

$ cat fileC
headerC


lineC1


lineC2


lineC3
$ paste fileA fileB fileC
headerA        headerB        headerC
lineA1               
        lineB1       
                lineC1
lineA2               
        lineB2       
                lineC2
lineA3               
        lineB3       
                lineC3

Notice the positions of the blank lines in the text files.

I also ran the command on Linux with Windows files, and the results were still correct.
Is it possible that you are trying to 'paste' Linux files in Windows? I'm not sure whether Windows would handle Linux files correctly in this case.
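A minimal sketch of the effect crts describes, assuming plain LF files (the file names are made up): paste pairs lines by position only, so one stray empty line shifts a whole column down.

```shell
work=$(mktemp -d)

# fileB has a stray empty line after its header
printf 'headerA\nlineA1\nlineA2\n' > "$work/fileA"
printf 'headerB\n\nlineB1\nlineB2\n' > "$work/fileB"

# paste matches lines purely by position, so the empty line shifts fileB down
paste "$work/fileA" "$work/fileB"

# Removing empty lines first puts the rows back in step
grep -v '^$' "$work/fileB" > "$work/fileB.clean"
paste "$work/fileA" "$work/fileB.clean" > "$work/fixed"
cat "$work/fixed"
```

Note that `grep -v '^$'` only drops truly empty lines; in a CRLF file an "empty" line still holds a `\r` and would survive this filter.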

AnanthaP 05-18-2011 10:00 PM

OP's post
Quote:

>paste FileA.txt FileB.txt FileC.txt ... > output.txt
three dots after FileC.txt

the dsc 05-18-2011 11:05 PM

Quote:

Originally Posted by MTK358 (Post 4360431)
What do you mean?

Linux and Windows still use different line endings.

At least Mac OS uses LF newlines just like Linux since Mac OS X.

I didn't know that. I just assumed it, because I think I've opened a few .txt files from Windows over the years without noticing anything strange, but perhaps the effect only shows up in the other direction. Or maybe some text editors, like KWrite, work around it.

the dsc 05-18-2011 11:11 PM

Quote:

Originally Posted by AnanthaP (Post 4360513)
OP's post
three dots after FileC.txt

With me it just interprets three dots as another file (and the OP probably just used that to omit more files, not as an actual command) and ends in error: "paste: ...: No such file or directory"
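If the real command needs all of the files, a glob avoids typing them out (and avoids a literal `...`). A sketch, assuming the files share a name prefix; the names here are illustrative:

```shell
work=$(mktemp -d)
for x in A B C; do
    printf 'Header%s\nLine%s1\n' "$x" "$x" > "$work/File$x.txt"
done

# A literal ... is just another file name, so paste fails on it
paste "$work/FileA.txt" ... 2>&1 || echo "paste exited with status $?"

# Let the shell expand a glob instead; the matches come back sorted by name
paste "$work"/File*.txt > "$work/output.txt"
cat "$work/output.txt"
```

Glob matches are sorted by name, so File1 to File20 would come out as File1, File10, File11 and so on before File2; zero-padded names (File01, File02) keep the columns in numeric order.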

jb2011 05-19-2011 06:49 AM

Thank you for the responses,

"With me it just interprets three dots as another file (and the OP probably just used that to omit more files, not as an actual command) and ends in error: "paste: ...: No such file or directory"


That's correct, I used the ellipsis to indicate more than just the three files. I actually have about 20 data files that go in this way. As for the environment, I'm using Bash on a Windows XP machine. Sorry, I should have included that in my original post. The problem seems to lie in the format of the data files; other 'dummy' files that I created work fine with this method. I'll have to investigate some more.
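Since the dummy files work and the real ones don't, one quick check is whether the real files carry Windows CRLF endings. A sketch using a hypothetical helper `count_crlf` (not a standard command), demonstrated on two made-up files:

```shell
# Hypothetical helper: for each file, print how many lines end in a CR
count_crlf() {
    cr=$(printf '\r')
    for f in "$@"; do
        printf '%s: %s\n' "$f" "$(grep -c "${cr}\$" "$f")"
    done
}

# Demonstration on one Unix file and one Windows file
work=$(mktemp -d)
printf 'HeaderA\nLineA1\n' > "$work/unix.txt"
printf 'HeaderB\r\nLineB1\r\n' > "$work/dos.txt"
count_crlf "$work/unix.txt" "$work/dos.txt"
```

On the real data it would be run as `count_crlf *.txt`; any non-zero count means that file contains CR characters, which paste copies into the output unchanged.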

Nominal Animal 05-19-2011 07:12 AM

If you have awk, you can write a paste replacement that supports any newline convention, removes leading (except on the first line) and trailing whitespace on each line, and ignores empty lines. Supply the file names separated by a pipe | at the beginning:
Code:

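awk -v "files=file1|file2|fileN" -v "separator=\t" '
    BEGIN {
        # A run of blanks around CR and/or LF ends a record (regex RS: gawk/mawk)
        RS="[\t\v\f ]*[\n\r][\t\n\v\f\r ]*"
        n=split(files, file, "|")
        while (1) {
            ok=0
            sep=""
            line=""
            for (i=1; i<=n; i++) {
                # getline returns 1 on success, 0 at end of file, -1 on error
                if ((getline field < file[i]) > 0)
                    ok=1
                else
                    field=""
                line=line sep field
                sep=separator
            }
            # Stop once every file is exhausted, without printing empty fields
            if (!ok) exit(0)
            print line
        }
    }'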

The above uses TAB (\t) as the field separator, but you can change that too.
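The script above expects the file list as one |-separated string. A sketch of building that string from a glob, assuming no file name contains a | (the file names are illustrative). Two caveats, as far as I know: a multi-character RS is treated as a regular expression by gawk and mawk but is unspecified in POSIX awk, and `-v` assignments process backslash escapes (which is what turns `separator=\t` into a TAB), so Windows-style paths containing backslashes would be altered.

```shell
work=$(mktemp -d)
for x in A B C; do
    printf 'Header%s\n' "$x" > "$work/File$x.txt"
done

# Join the glob matches with "|" to build the files=... value
# (assumption: no file name itself contains a "|")
files=$(printf '%s|' "$work"/File*.txt)
files=${files%|}    # drop the trailing "|"
echo "$files"
```

The result would then be passed to the script as `-v "files=$files"`.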

grail 05-19-2011 08:13 AM

Well, I would agree that paste is the tool for the job, but Nominal got me thinking (as usual). Note that I have assumed printing will stop once the first file's data is consumed:
Code:

#!/usr/bin/awk -f

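BEGIN{
    # Read the first file line by line; stop at its end (or on a read error)
    while((getline < ARGV[1]) > 0){
        for(i=2;i<=(ARGC - 1);i++){
            # Use an empty field once a shorter file is exhausted
            if((getline add < ARGV[i]) <= 0)
                add = ""
            $0 = $0"\t"add
        }
        print
    }
}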

The nice thing here is you can call it like so:
Code:

./script.awk file*.txt
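Neither awk version removes the carriage returns themselves. A sketch of the same getline idea with a `sub(/\r$/, ...)` per line added, wrapped in a hypothetical shell function `paste_crlf`; it should work with any POSIX awk:

```shell
# Hypothetical function: pastes files with TABs, stripping one trailing CR
# from every line so CRLF input does not leak "\r" into the output
paste_crlf() {
    awk 'BEGIN {
        while ((getline line < ARGV[1]) > 0) {
            sub(/\r$/, "", line)
            for (i = 2; i < ARGC; i++) {
                # Empty field once a shorter file runs out
                if ((getline add < ARGV[i]) <= 0)
                    add = ""
                sub(/\r$/, "", add)
                line = line "\t" add
            }
            print line
        }
    }' "$@"
}

work=$(mktemp -d)
printf 'HeaderA\r\nLineA1\r\n' > "$work/a.txt"
printf 'HeaderB\r\nLineB1\r\n' > "$work/b.txt"
paste_crlf "$work/a.txt" "$work/b.txt"
```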

jb2011 05-19-2011 09:11 AM

Thank you for the work-around :)

To try to figure out what is going on, I took SL00b's advice and actually created the simplified version of the input files:

Code:

fileA.txt
HeaderA
LineA1
LineA2
LineA3
LineA4

fileB.txt
HeaderB
LineB1
LineB2
LineB3
LineB4

fileC.txt
HeaderC
LineC1
LineC2
LineC3
LineC4

and when I run

Code:

>paste fileA.txt fileB.txt fileC.txt > test.txt
I get this:

Code:

HeaderA
        HeaderB
        HeaderC
LineA1
        LineB1
        LineC1
LineA2
        LineB2
        LineC2
LineA3
        LineB3
        LineC3
LineA4        LineB4        LineC4

For giggles, I put a delimiter in the paste command:
Code:

>paste -d* fileA.txt fileB.txt fileC.txt > test.txt
And as similarly before,
Code:

HeaderA
*HeaderB
*HeaderC
LineA1
*LineB1
*LineC1
LineA2
*LineB2
*LineC2
LineA3
*LineB3
*LineC3
LineA4*LineB4*LineC4

It almost looks like it's using the serial option of 'paste', except for the last line.
I then ran it again with the serial option:
Code:

>paste -sd* fileA.txt fileB.txt fileC.txt > test.txt
And it appears that this actually works correctly:
Code:

HeaderA
*LineA1
*LineA2
*LineA3
*LineA4
HeaderB
*LineB1
*LineB2
*LineB3
*LineB4
HeaderC
*LineC1
*LineC2
*LineC3
*LineC4

I'm really scratching my head.
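For comparison, a sketch of what `-s` does on clean LF files (file names are illustrative). In serial mode each whole file should land on a single output line, so output that still shows one item per line, as above, points to the same hidden line-ending characters rather than to `-s` working correctly. Quoting `-d'*'` also keeps the shell from ever treating `*` as a glob.

```shell
work=$(mktemp -d)
printf 'HeaderA\nLineA1\nLineA2\n' > "$work/fileA.txt"
printf 'HeaderB\nLineB1\nLineB2\n' > "$work/fileB.txt"

# Default (parallel) mode: output line N joins line N of every file
paste -d'*' "$work/fileA.txt" "$work/fileB.txt" > "$work/parallel.txt"

# Serial mode (-s): all lines of one file are joined into one output line
paste -s -d'*' "$work/fileA.txt" "$work/fileB.txt" > "$work/serial.txt"

cat "$work/parallel.txt" "$work/serial.txt"
```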

SL00b 05-19-2011 09:22 AM

I'd say something is goofy with the implementation of the paste command in an XP environment.

I read your post saying you were running it in Bash on XP, so I created the same three text files on my XP desktop, ran the paste command from a Cygwin client, and received the following:

$ paste fileA.txt fileB.txt fileC.txt
HeaderA HeaderC
LineA1 LineC1
LineA2 LineC2
LineA3 LineC3
LineA4 LineC4
... ...

Same data, same command, different environment, and for some reason fileB.txt was randomly ignored. I double and triple-checked the files, and they're all the same.

Since this simplified test doesn't work, that says "code problem" to me.

konsolebox 05-19-2011 09:29 AM

In bash, you could perhaps do it like this:
Code:

#!/bin/bash

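for (( ;; )); do
        C=0 A=()
        # -r keeps backslashes literal; C counts files that still had a line
        read -r -u 3 A\[0\] && (( C++ ))
        read -r -u 4 A\[1\] && (( C++ ))
        read -r -u 5 A\[2\] && (( C++ ))
        # Stop once every file is exhausted
        [[ C -eq 0 ]] && break
        echo "${A[@]}" >&6
done 3<"file1" 4<"file2" 5<"file3" 6>"outputfile"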

---- edit ----

or:
Code:

until
        C=0 A=()
        read -r -u 3 A[0] && (( C++ ))
        read -r -u 4 A[1] && (( C++ ))
        read -r -u 5 A[2] && (( C++ ))
        [[ C -eq 0 ]]
do
        echo "${A[@]}" >&6
done 3<"file1" 4<"file2" 5<"file3" 6>"outputfile"

---- edit ----

as a complete script:
Code:

#!/bin/bash

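(
        # Pad with two dummy arguments so the first input file is $3,
        # which lets input file number I be opened on file descriptor I
        set -- . . "$@"

        # Open every input file (all arguments but the last) on its own fd
        for (( I = 3; I < $#; I++ )); do
                eval "exec $I<\"${!I}\""
        done

        # The last argument is the output file
        exec >"${!#}"

        until
                C=0 A=()
                for (( I = 3; I < $#; I++ )); do
                        read -r -u "$I" "A[$I]" && (( C++ ))
                done
                [[ C -eq 0 ]]
        do
                echo "${A[@]}"
        done
)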

Code:

bash script.sh file1 file2 ... outputfile

Chirel 05-19-2011 09:30 AM

Hi,

Try dos2unix file*.txt before the paste.
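If `dos2unix` isn't available, a sketch using `tr`, which should be present wherever paste is. It deletes every CR, not just the ones at line ends, and writes cleaned copies with a hypothetical `.unix` suffix so the originals stay untouched:

```shell
work=$(mktemp -d)
# Sample input saved with Windows (CRLF) line endings
printf 'HeaderA\r\nLineA1\r\n' > "$work/fileA.txt"
printf 'HeaderB\r\nLineB1\r\n' > "$work/fileB.txt"

# tr -d '\r' deletes every carriage return; the .unix suffix is arbitrary
for f in "$work"/file*.txt; do
    tr -d '\r' < "$f" > "$f.unix"
done

paste "$work/fileA.txt.unix" "$work/fileB.txt.unix" > "$work/test.txt"
cat "$work/test.txt"
```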

To check, you can also use
Code:

paste -d'-_' fileA.txt fileB.txt fileC.txt > test.txt
You should get this result:
Code:

HeaderA-HeaderB_HeaderC
LineA1-LineB1_LineC1
LineA2-LineB2_LineC2
LineA3-LineB3_LineC3
LineA4-LineB4_LineC4



I just ran unix2dos on my files as a test, and the result is:
Code:

$ od -c test.txt
0000000  H  e  a  d  e  r  A  \r  -  H  e  a  d  e  r  B
0000020  \r  _  H  e  a  d  e  r  C  \r  \n  L  i  n  e  A
0000040  1  \r  -  L  i  n  e  B  1  \r  _  L  i  n  e  C
0000060  1  \r  \n  L  i  n  e  A  2  \r  -  L  i  n  e  B
0000100  2  \r  _  L  i  n  e  C  2  \r  \n  L  i  n  e  A
0000120  3  \r  -  L  i  n  e  B  3  \r  _  L  i  n  e  C
0000140  3  \r  \n  L  i  n  e  A  4  \r  -  L  i  n  e  B
0000160  4  \r  _  L  i  n  e  C  4  \r  \n
$
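The dump shows a `\r` in front of every delimiter. A sketch of one plausible reading of the OP's last output, assuming (this is not confirmed in the thread) that the files use CRLF but their final line has no line ending at all: every row except the last then carries a `\r` before each delimiter, which a viewer that treats a lone CR as a line break would display as a new line, while the last row looks correct.

```shell
work=$(mktemp -d)
# Assumption: CRLF files whose last line has no line ending
printf 'HeaderA\r\nLineA1' > "$work/fileA.txt"
printf 'HeaderB\r\nLineB1' > "$work/fileB.txt"

paste -d'-' "$work/fileA.txt" "$work/fileB.txt" > "$work/test.txt"

# The \r before each "-" is what a CR-aware viewer shows as a line break
od -c "$work/test.txt"
```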



All times are GMT -5.