LinuxQuestions.org
Old 05-18-2011, 02:37 PM   #1
jb2011
LQ Newbie
 
Registered: May 2011
Posts: 3

Rep: Reputation: Disabled
"Paste" output in Bash script not as expected...


Hi.. I have several files I am trying to 'paste' together via bash script.

Input file format:
FileA.txt:

HeaderA
LineA1
LineA2
LineA3
LineA4
...

FileB.txt:

HeaderB
LineB1
LineB2
LineB3
LineB4
...

FileC.txt:

HeaderC
LineC1
LineC2
LineC3
LineC4
...

etc.

What I want is the output file to look like this:

output.txt:

HeaderA HeaderB HeaderC ...
LineA1 LineB1 LineC1 ...
LineA2 LineB2 LineC2 ...
LineA3 LineB3 LineC3 ...
LineA4 LineB4 LineC4 ...
... ... ...

So, this is my code:

>paste FileA.txt FileB.txt FileC.txt ... > output.txt

However, the result is coming out as this:

output.txt:

HeaderA HeaderB HeaderC ...
LineA1
LineB1
LineC1
...
LineA2
LineB2
LineC2
...
LineA3
LineB3
LineC3
...
LineA4
LineB4
LineC4
...

I can't figure out why it's behaving this way..
Any ideas?

Much appreciated,
J
 
Old 05-18-2011, 02:54 PM   #2
SL00b
Member
 
Registered: Feb 2011
Location: LA, US
Distribution: SLES
Posts: 375

Rep: Reputation: 111
I have no idea. I created three files exactly like the examples you provided, and here's what I got:

Code:
:~> paste fileA.txt fileB.txt fileC.txt > output.txt
:~> less output.txt
HeaderA HeaderB HeaderC
LineA1  LineB1  LineC1
LineA2  LineB2  LineC2
LineA3  LineB3  LineC3
LineA4  LineB4  LineC4
...     ...     ...
It looks to me like your example is very different from the real-world files you're working with. In your place, I'd first try it with this simplified exercise. If it works, that would validate that it's not a code problem, but a data problem, and you can start looking at your data format.
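A minimal sketch of that simplified exercise, for anyone who wants to repeat it. The scratch directory and the `printf` calls are scaffolding assumed here, not something from the posts above, and the files are deliberately written with Unix (LF) line endings:

```shell
# Build three small LF-terminated test files in a scratch directory
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'HeaderA\nLineA1\nLineA2\n' > fileA.txt
printf 'HeaderB\nLineB1\nLineB2\n' > fileB.txt
printf 'HeaderC\nLineC1\nLineC2\n' > fileC.txt

# paste joins line N of every file, separated by a TAB
paste fileA.txt fileB.txt fileC.txt > output.txt
cat output.txt
```

If this produces one row per line number but the real data does not, the difference is more likely in the data files than in the command.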
 
Old 05-18-2011, 07:08 PM   #3
the dsc
Member
 
Registered: May 2009
Distribution: Debian
Posts: 136
Blog Entries: 71

Rep: Reputation: 33
This is quite curious. I didn't know of this command; I thought that to do this you would need a relatively complex script that reads one line from each file at a time, writes them to a file, and carries on that way.

And it worked for me as well.

I have no clue why it doesn't work for you. The only thing I can think of, and it's a remote possibility, is the subtle differences between Windows, Linux and Mac text files, something I thought didn't even exist anymore.

I don't remember the details, but back in the day you had to convert a Windows plain-text (.txt) file to Linux format and vice versa; otherwise you would get no newlines, or extra ones, depending on who wrote the file and who read it.

But I think those days are gone now and this would no longer be an issue. I can't really tell, though.

Last edited by the dsc; 05-18-2011 at 07:11 PM.
 
Old 05-18-2011, 07:11 PM   #4
MTK358
LQ 5k Club
 
Registered: Sep 2009
Posts: 6,443
Blog Entries: 3

Rep: Reputation: 713
Quote:
Originally Posted by the dsc View Post
But I think those days are gone now and this would no longer be an issue. I can't really tell, though.
What do you mean?

Linux and Windows still use different line endings.

At least Mac OS uses LF newlines just like Linux since Mac OS X.
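To make the difference concrete, here is a small sketch that dumps the bytes of the same line written both ways. The sample text and the variable names are made up for the illustration:

```shell
# Same text, two line-ending conventions, dumped byte by byte
unix_dump=$(printf 'LineA1\n' | od -c)     # Linux / Mac OS X: LF only
dos_dump=$(printf 'LineA1\r\n' | od -c)    # Windows: CR followed by LF

printf '%s\n\n%s\n' "$unix_dump" "$dos_dump"
```

In the second dump `od -c` shows an extra `\r` just before the `\n`. Tools that only split on `\n` treat that `\r` as the last character of the line.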
 
Old 05-18-2011, 07:41 PM   #5
crts
Senior Member
 
Registered: Jan 2010
Posts: 1,604

Rep: Reputation: 446
Hi,

I tried to replicate your "error". This is the closest I got:
Code:
$ cat fileA
headerA
lineA1


lineA2


lineA3


$ cat fileB
headerB

lineB1


lineB2


lineB3

$ cat fileC
headerC


lineC1


lineC2


lineC3
$ paste fileA fileB fileC
headerA	headerB	headerC
lineA1		
	lineB1	
		lineC1
lineA2		
	lineB2	
		lineC2
lineA3		
	lineB3	
		lineC3
Notice the positions of the blank lines in the text files.

I also ran the command with Windows files on Linux. The results were still correct.
Is it possible that you are trying to 'paste' Linux files in Windows? I'm not sure whether Windows would handle Linux files correctly in this case.
 
Old 05-18-2011, 10:00 PM   #6
AnanthaP
Member
 
Registered: Jul 2004
Location: Chennai, India
Distribution: UBUNTU 5.10 since Jul-18,2006 on Intel 820 DC
Posts: 627

Rep: Reputation: 137
OP's post
Quote:
>paste FileA.txt FileB.txt FileC.txt ... > output.txt
three dots after FileC.txt

Last edited by AnanthaP; 05-18-2011 at 10:01 PM.
 
Old 05-18-2011, 11:05 PM   #7
the dsc
Member
 
Registered: May 2009
Distribution: Debian
Posts: 136
Blog Entries: 71

Rep: Reputation: 33
Quote:
Originally Posted by MTK358 View Post
What do you mean?

Linux and Windows still use different line endings.

At least Mac OS uses LF newlines just like Linux since Mac OS X.
I didn't know that. I just assumed so because I've opened a few .txt files from Windows over the years without noticing anything strange, but perhaps the effect only shows in the other direction. Or maybe some text editors, like KWrite, work around it.
 
Old 05-18-2011, 11:11 PM   #8
the dsc
Member
 
Registered: May 2009
Distribution: Debian
Posts: 136
Blog Entries: 71

Rep: Reputation: 33
Quote:
Originally Posted by AnanthaP View Post
OP's post
three dots after FileC.txt
For me, paste just interprets the three dots as another file name and fails with the error "paste: ...: No such file or directory". The OP probably only used them to stand in for more files, not as part of the actual command.
 
Old 05-19-2011, 06:49 AM   #9
jb2011
LQ Newbie
 
Registered: May 2011
Posts: 3

Original Poster
Rep: Reputation: Disabled
Thank you for the responses.

"With me it just interprets three dots as another file (and the OP probably just used that to omit more files, not as an actual command) and ends in error: "paste: ...: No such file or directory""


That's correct, I used the ellipsis to stand in for more than just the three files. I actually have about 20 data files that go in this way. As for the environment I'm using, it's Bash on a Windows XP machine. Sorry, I should have included that in my original post. The problem seems to lie in the format of the data files; other 'dummy' files that I created work fine using this method. I'll have to investigate some more.
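With around 20 inputs, a shell glob can supply the file names instead of typing each one. A sketch only: the `File*.txt` pattern and the four sample files are hypothetical, since the real file names aren't given in this thread.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical stand-ins for the real data files
for name in A B C D; do
    printf 'Header%s\nLine%s1\n' "$name" "$name" > "File$name.txt"
done

# The glob expands in sorted order; the output name must not match the pattern
paste File*.txt > output.tsv
cat output.tsv
```

The column order follows the sort order of the file names, so names like File10.txt would sort before File2.txt.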
 
Old 05-19-2011, 07:12 AM   #10
Nominal Animal
Senior Member
 
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
Blog Entries: 3

Rep: Reputation: 942
If you have awk, you can write a paste replacement that supports any newline convention, removes leading (except on the first line) and trailing whitespace on each line, and ignores empty lines. Supply the file names separated by a pipe | at the beginning:
Code:
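awk -v "files=file1|file2|fileN" -v "separator=\t" '
    BEGIN {
        # Record separator: any line ending plus surrounding whitespace.
        # A regex RS works in gawk and mawk; POSIX leaves a multi-character RS unspecified.
        RS="[\t\v\f ]*[\n\r][\t\n\v\f\r ]*"
        n=split(files, file, "|")
        while (1) {
            ok=0
            line=""
            sep=""
            for (i=1; i<=n; i++) {
                # getline returns 1 on success, 0 at end of file, -1 on error
                if ((getline field < file[i]) > 0)
                    ok=1
                else
                    field=""
                line=line sep field
                sep=separator
            }
            # Stop once every file is exhausted, without printing an empty row
            if (!ok) exit(0)
            print line
        }
    }'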
The above uses TAB (\t) as the field separator, but you can change that too.
 
Old 05-19-2011, 08:13 AM   #11
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,513

Rep: Reputation: 1895
Well, I would agree that paste is the tool for the job, but Nominal got me thinking (as usual). I have assumed that printing stops once the first file's data is consumed:
Code:
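#!/usr/bin/awk -f

BEGIN{
    # Read the first file line by line; "> 0" stops on end of file and on read errors
    while((getline < ARGV[1]) > 0){
        for(i=2;i<=(ARGC - 1);i++){
            # Use an empty field once a shorter file runs out of lines
            if((getline add < ARGV[i]) <= 0)
                add = ""
            $0 = $0"\t"add
        }
        print
    }
}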
The nice thing here is you can call it like so:
Code:
./script.awk file*.txt
 
Old 05-19-2011, 09:11 AM   #12
jb2011
LQ Newbie
 
Registered: May 2011
Posts: 3

Original Poster
Rep: Reputation: Disabled
Thank you for the work-around.

To try to figure out what is going on, I took SL00b's advice and actually created the simplified version of the input files:

Code:
fileA.txt
HeaderA
LineA1
LineA2
LineA3
LineA4

fileB.txt
HeaderB
LineB1
LineB2
LineB3
LineB4

fileC.txt
HeaderC
LineC1
LineC2
LineC3
LineC4
and when I run

Code:
>paste fileA.txt fileB.txt fileC.txt > test.txt
I get this:

Code:
HeaderA
	HeaderB
	HeaderC
LineA1
	LineB1
	LineC1
LineA2
	LineB2
	LineC2
LineA3
	LineB3
	LineC3
LineA4	LineB4	LineC4
For giggles, I put a delimiter in the paste command:
Code:
>paste -d* fileA.txt fileB.txt fileC.txt > test.txt
And, similarly to before,
Code:
HeaderA
*HeaderB
*HeaderC
LineA1
*LineB1
*LineC1
LineA2
*LineB2
*LineC2
LineA3
*LineB3
*LineC3
LineA4*LineB4*LineC4
It almost looks like it's trying to use the serial option of 'paste' except for the last line.
I then ran it again with the serial option:
Code:
>paste -sd* fileA.txt fileB.txt fileC.txt > test.txt
And it appears that this actually works correctly:
Code:
HeaderA
*LineA1
*LineA2
*LineA3
*LineA4
HeaderB
*LineB1
*LineB2
*LineB3
*LineB4
HeaderC
*LineC1
*LineC2
*LineC3
*LineC4
I'm really scratching my head.
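For comparison, this is roughly how the two modes differ on files with plain LF line endings. The file contents and the output names `parallel.txt` and `serial.txt` are made up for this sketch:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'HeaderA\nLineA1\nLineA2\n' > fileA.txt
printf 'HeaderB\nLineB1\nLineB2\n' > fileB.txt

# Default (parallel) mode: one output line per input line number
paste -d'*' fileA.txt fileB.txt > parallel.txt

# Serial mode (-s): one output line per input file
paste -s -d'*' fileA.txt fileB.txt > serial.txt

cat parallel.txt serial.txt
```

On LF files, parallel mode gives rows such as `HeaderA*HeaderB`, while serial mode gives one long row per file such as `HeaderA*LineA1*LineA2`. Output that breaks before every delimiter in both modes suggests there is an extra character at the end of each input line.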
 
Old 05-19-2011, 09:22 AM   #13
SL00b
Member
 
Registered: Feb 2011
Location: LA, US
Distribution: SLES
Posts: 375

Rep: Reputation: 111
I'd say something is goofy with the implementation of the paste command in an XP environment.

I read your post saying you were running it in Bash on XP, so I created the same three text files on my XP desktop, ran the paste command from a Cygwin client, and received the following:

$ paste fileA.txt fileB.txt fileC.txt
HeaderA HeaderC
LineA1 LineC1
LineA2 LineC2
LineA3 LineC3
LineA4 LineC4
... ...

Same data, same command, different environment, and for some reason fileB.txt was randomly ignored. I double and triple-checked the files, and they're all the same.

Since this simplified test doesn't work, that says "code problem" to me.
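One way to check whether the terminal is hiding something is to pipe the pasted output through `cat -v`, which prints a carriage return as `^M`. A terminal moves the cursor back to the start of the line when it receives a carriage return, so a later column can overwrite an earlier one on screen. The sketch below assumes the files have Windows (CRLF) line endings, which has not been confirmed for the files in this test:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'HeaderA\r\nLineA1\r\n' > fileA.txt
printf 'HeaderB\r\nLineB1\r\n' > fileB.txt
printf 'HeaderC\r\nLineC1\r\n' > fileC.txt

paste fileA.txt fileB.txt fileC.txt > pasted.txt

# cat -v makes each carriage return visible as ^M
cat -v pasted.txt > visible.txt
cat visible.txt
```

If `^M` shows up after every field, all three columns are present in the data even when only two are visible on screen.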
 
Old 05-19-2011, 09:29 AM   #14
konsolebox
Senior Member
 
Registered: Oct 2005
Distribution: Gentoo, Slackware, LFS
Posts: 2,245
Blog Entries: 15

Rep: Reputation: 233
in bash, you can perhaps do it like this:
Code:
#!/bin/bash

for (( ;; )); do
	C=0 A=()
	read -u 3 A\[0\] && (( C++ ))
	read -u 4 A\[1\] && (( C++ ))
	read -u 5 A\[2\] && (( C++ ))
	[[ C -eq 0 ]] && break
	echo "${A[@]}" >&6
done 3<"file1" 4<"file2" 5<"file3" 6>"outputfile"
---- edit ----

or:
Code:
until
	C=0 A=()
	read -u 3 A[0] && (( C++ ))
	read -u 4 A[1] && (( C++ ))
	read -u 5 A[2] && (( C++ ))
	[[ C -eq 0 ]]
do
	echo "${A[@]}" >&6
done 3<"file1" 4<"file2" 5<"file3" 6>"outputfile"
---- edit ----

as a complete script:
Code:
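#!/bin/bash

(
	# Pad the argument list so the first input file is $3, matching its descriptor number
	set -- . . "$@"

	# Open every argument except the last (the output file) on descriptors 3, 4, ...
	for (( I = 3; I < $#; I++ )); do
		eval "exec $I<\"${!I}\""
	done

	# The last argument is the output file
	exec >"${!#}"

	until
		C=0 A=()
		# Read one line from each input; C counts the files that still had data
		for (( I = 3; I < $#; I++ )); do
			read -r -u "$I" "A[$I]" && (( C++ ))
		done
		[[ C -eq 0 ]]
	do
		echo "${A[@]}"
	done
)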
Code:
bash script.sh file1 file2 ... outputfile

Last edited by konsolebox; 05-19-2011 at 09:45 AM.
 
Old 05-19-2011, 09:30 AM   #15
Chirel
Member
 
Registered: Nov 2009
Posts: 55

Rep: Reputation: 19
Hi,

Try running dos2unix file*.txt before the paste.

To check, you can also use
Code:
paste -d'-_' fileA.txt fileB.txt fileC.txt > test.txt
You should get this result:
Code:
HeaderA-HeaderB_HeaderC
LineA1-LineB1_LineC1
LineA2-LineB2_LineC2
LineA3-LineB3_LineC3
LineA4-LineB4_LineC4


I just converted my files with unix2dos as a test, and the result has a \r left in front of every delimiter:
Code:
$ od -c test.txt
0000000   H   e   a   d   e   r   A  \r   -   H   e   a   d   e   r   B
0000020  \r   _   H   e   a   d   e   r   C  \r  \n   L   i   n   e   A
0000040   1  \r   -   L   i   n   e   B   1  \r   _   L   i   n   e   C
0000060   1  \r  \n   L   i   n   e   A   2  \r   -   L   i   n   e   B
0000100   2  \r   _   L   i   n   e   C   2  \r  \n   L   i   n   e   A
0000120   3  \r   -   L   i   n   e   B   3  \r   _   L   i   n   e   C
0000140   3  \r  \n   L   i   n   e   A   4  \r   -   L   i   n   e   B
0000160   4  \r   _   L   i   n   e   C   4  \r  \n
$
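If dos2unix isn't available, `tr -d '\r'` can strip the carriage returns instead. This is a sketch under that assumption; the `.unix` suffix for the cleaned copies is an arbitrary choice:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'HeaderA\r\nLineA1\r\n' > fileA.txt
printf 'HeaderB\r\nLineB1\r\n' > fileB.txt

# Write a cleaned copy of each file with every carriage return removed
for f in fileA.txt fileB.txt; do
    tr -d '\r' < "$f" > "$f.unix"
done

paste -d'-' fileA.txt.unix fileB.txt.unix > test.txt
od -c test.txt
```

The `od -c` dump of test.txt should no longer show any `\r` before the delimiters.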

Last edited by Chirel; 05-19-2011 at 09:36 AM.
 
  

