05-18-2011, 02:37 PM
|
#1
|
|
LQ Newbie
Registered: May 2011
Posts: 3
Rep: 
|
"Paste" output in Bash script not as expected...
Hi. I have several files that I am trying to 'paste' together in a bash script.
Input file format:
FileA.txt:
HeaderA
LineA1
LineA2
LineA3
LineA4
...
FileB.txt:
HeaderB
LineB1
LineB2
LineB3
LineB4
...
FileC.txt:
HeaderC
LineC1
LineC2
LineC3
LineC4
...
etc.
What I want is the output file to look like this:
output.txt:
HeaderA HeaderB HeaderC ...
LineA1 LineB1 LineC1 ...
LineA2 LineB2 LineC2 ...
LineA3 LineB3 LineC3 ...
LineA4 LineB4 LineC4 ...
... ... ...
So, this is my code:
>paste FileA.txt FileB.txt FileC.txt ... > output.txt
However, the result is coming out as this:
output.txt:
HeaderA HeaderB HeaderC ...
LineA1
LineB1
LineC1
...
LineA2
LineB2
LineC2
...
LineA3
LineB3
LineC3
...
LineA4
LineB4
LineC4
...
I can't figure out why it's behaving this way..
Any ideas?
Much appreciated,
J
|
|
|
|
05-18-2011, 02:54 PM
|
#2
|
|
Member
Registered: Feb 2011
Location: LA, US
Distribution: SLES
Posts: 375
Rep: 
|
I have no idea. I created three files exactly like the examples you provided, and here's what I got:
Code:
:~> paste fileA.txt fileB.txt fileC.txt > output.txt
:~> less output.txt
HeaderA HeaderB HeaderC
LineA1 LineB1 LineC1
LineA2 LineB2 LineC2
LineA3 LineB3 LineC3
LineA4 LineB4 LineC4
... ... ...
It looks to me like your example is very different from the real-world files you're working with. In your place, I'd first try it with this simplified exercise. If it works, that would validate that it's not a code problem, but a data problem, and you can start looking at your data format.
|
|
|
|
05-18-2011, 07:08 PM
|
#3
|
|
Member
Registered: May 2009
Distribution: Debian testing
Posts: 93
Rep:
|
This is quite curious. I didn't know about this command; I thought you would need a relatively complex script that reads one line from each file at a time, writes them to a file, and carries on that way.
And it worked for me as well.
I have no clue why it doesn't work for you. The only thing I can think of, and it's a remote possibility, is the subtle differences between Windows, Linux and Mac text files, which I thought didn't even exist anymore.
I don't remember the details, but back in the day you had to convert a Windows plain-text (.txt) file to Linux format and vice versa; otherwise you would get no "new lines" and/or extra new lines, depending on which system wrote the file and which one read it.
But I think those days are gone now and this would no longer be an issue. I can't really tell, though.
Last edited by the dsc; 05-18-2011 at 07:11 PM.
|
|
|
|
05-18-2011, 07:11 PM
|
#4
|
|
LQ 5k Club
Registered: Sep 2009
Distribution: Arch x86_64
Posts: 6,443
|
Quote:
Originally Posted by the dsc
But I think those days are gone now and this would no longer be an issue. I can't really tell, though.
|
What do you mean?
Linux and Windows still use different line endings.
At least Mac OS uses LF newlines just like Linux since Mac OS X.
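To see which convention a particular file uses, a quick check along these lines may help. It is only a sketch: the file names unix.txt and dos.txt are made up for the demonstration, and they are created in a scratch directory from mktemp.

```shell
#!/bin/sh
# Create two sample files (hypothetical names) in a scratch directory.
tmp=$(mktemp -d)
printf 'HeaderA\nLineA1\n'     > "$tmp/unix.txt"   # LF line endings
printf 'HeaderA\r\nLineA1\r\n' > "$tmp/dos.txt"    # CRLF line endings

cr=$(printf '\r')   # a literal carriage return character

# Count the lines that contain a carriage return; 0 means plain LF endings.
# grep exits non-zero when nothing matches, hence the "|| true".
unix_cr=$(grep -c "$cr" "$tmp/unix.txt" || true)
dos_cr=$(grep -c "$cr" "$tmp/dos.txt" || true)
echo "unix.txt: $unix_cr line(s) with CR"
echo "dos.txt:  $dos_cr line(s) with CR"

# od -c makes the bytes visible: CRLF endings show up as \r \n.
od -c "$tmp/dos.txt"

rm -rf "$tmp"
```

A non-zero count for a file suggests it was saved with Windows line endings.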
|
|
|
|
05-18-2011, 07:41 PM
|
#5
|
|
Senior Member
Registered: Jan 2010
Posts: 1,604
|
Hi,
I tried to replicate your "error". This is the closest I got:
Code:
$ cat fileA
headerA
lineA1
lineA2
lineA3
$ cat fileB
headerB
lineB1
lineB2
lineB3
$ cat fileC
headerC
lineC1
lineC2
lineC3
$ paste fileA fileB fileC
headerA headerB headerC
lineA1
lineB1
lineC1
lineA2
lineB2
lineC2
lineA3
lineB3
lineC3
Notice the positions of the blank lines in the text files.
I also ran the command with Windows files on Linux. The results were still correct.
Is it possible that you are trying to 'paste' Linux files in Windows? I'm not sure whether Windows would handle Linux files correctly in this case.
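One layout that produces this kind of staggered result is sketched below. The exact positions of the blank lines are an assumption (they are chosen so that only one file has text on any given row after the header), and the '|' delimiter is used instead of the default tab purely to make the empty fields visible.

```shell
#!/bin/sh
tmp=$(mktemp -d)

# Assumed layout: blank lines are staggered so that, after the header row,
# only one of the three files contributes text to each output row.
printf 'headerA\nlineA1\n\n\nlineA2\n\n\n' > "$tmp/fileA"
printf 'headerB\n\nlineB1\n\n\nlineB2\n\n' > "$tmp/fileB"
printf 'headerC\n\n\nlineC1\n\n\nlineC2\n' > "$tmp/fileC"

# paste still joins row N of every file; the empty fields are just invisible
# when the delimiter is a tab.
staggered=$(paste -d'|' "$tmp/fileA" "$tmp/fileB" "$tmp/fileC")
printf '%s\n' "$staggered"

rm -rf "$tmp"
```

With the default tab delimiter the same rows would look like single words on separate lines, because the empty columns contain nothing but tabs.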
|
|
|
|
05-18-2011, 10:00 PM
|
#6
|
|
Member
Registered: Jul 2004
Location: Chennai, India
Distribution: UBUNTU 5.10 since Jul-18,2006 on Intel 820 DC
Posts: 535
Rep:
|
OP's post
Quote:
|
>paste FileA.txt FileB.txt FileC.txt ... > output.txt
|
three dots after FileC.txt
Last edited by AnanthaP; 05-18-2011 at 10:01 PM.
|
|
|
|
05-18-2011, 11:05 PM
|
#7
|
|
Member
Registered: May 2009
Distribution: Debian testing
Posts: 93
Rep:
|
Quote:
Originally Posted by MTK358
What do you mean?
Linux and Windows still use different line endings.
At least Mac OS uses LF newlines just like Linux since Mac OS X.
|
I didn't know that. I just assumed because I think I've opened a few txt files from windows over the years without noticing anything strange, but perhaps the effect is only in the other direction. Or maybe there's some workaround in some text editors like kwrite.
|
|
|
|
05-18-2011, 11:11 PM
|
#8
|
|
Member
Registered: May 2009
Distribution: Debian testing
Posts: 93
Rep:
|
Quote:
Originally Posted by AnanthaP
OP's post
three dots after FileC.txt
|
With me it just interprets three dots as another file (and the OP probably just used that to omit more files, not as an actual command) and ends in error: "paste: ...: No such file or directory"
|
|
|
|
05-19-2011, 06:49 AM
|
#9
|
|
LQ Newbie
Registered: May 2011
Posts: 3
Original Poster
Rep: 
|
Thank you for the responses.
"With me it just interprets three dots as another file (and the OP probably just used that to omit more files, not as an actual command) and ends in error: 'paste: ...: No such file or directory'"
That's correct, I used the ellipses to stand in for the remaining files. I actually have about 20 data files that go in this way. As to the environment I'm using, it's Bash on a Windows XP machine. Sorry, I should have included that in my original post. The problem seems to lie in the format of the data files; other 'dummy' files that I created work fine using this method. I'll have to investigate some more.
|
|
|
|
05-19-2011, 07:12 AM
|
#10
|
|
Senior Member
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
|
If you have an awk that accepts a regular expression as the record separator RS (gawk and mawk do), you can write a paste replacement that supports any newline convention, removes leading (except on the first line) and trailing whitespace on each line, and ignores empty lines. Supply the file names, separated by a pipe (|), in the files variable at the beginning:
Code:
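awk -v "files=file1|file2|fileN" -v "separator=\t" '
BEGIN {
    # A record ends at any newline or CR plus the whitespace around it,
    # so LF, CRLF and CR files are all split the same way.
    RS = "[\t\v\f ]*[\n\r][\t\n\v\f\r ]*"
    n = split(files, file, "|")
    while (1) {
        ok = 0
        line = ""
        for (i = 1; i <= n; i++) {
            # getline returns -1 on error, so test for > 0 explicitly.
            if ((getline field < file[i]) > 0)
                ok = 1
            else
                field = ""
            line = line (i > 1 ? separator : "") field
        }
        # Stop before printing once every file is exhausted; this avoids
        # a dangling row of separators at the end of the output.
        if (!ok) exit(0)
        print line
    }
}'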
The above uses TAB (\t) as the field separator, but you can change that too.
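For an awk where RS is limited to a single character, a narrower variant is possible. The sketch below is not the script above: it only strips one trailing carriage return per line and leaves other whitespace and empty lines alone. The function name crpaste and the sample files a.txt and b.txt are made up for illustration.

```shell
#!/bin/sh
# crpaste FILE...: paste-like merge that drops a trailing CR from each line.
# (hypothetical helper name; the separator is a tab, set through -v)
crpaste() {
    awk -v sep='\t' '
    BEGIN {
        n = ARGC - 1
        while (1) {
            got = 0
            line = ""
            for (i = 1; i <= n; i++) {
                field = ""
                # getline returns 1 for a line, 0 at EOF, -1 on error.
                if ((getline field < ARGV[i]) > 0) {
                    got = 1
                    sub(/\r$/, "", field)   # remove the CR of a CRLF ending
                }
                line = line (i > 1 ? sep : "") field
            }
            if (!got) exit      # every file is exhausted
            print line
        }
    }' "$@"
}

# Small demonstration with CRLF sample files in a scratch directory.
tmp=$(mktemp -d)
printf 'HeaderA\r\nLineA1\r\n' > "$tmp/a.txt"
printf 'HeaderB\r\nLineB1\r\n' > "$tmp/b.txt"
crpaste "$tmp/a.txt" "$tmp/b.txt"
```

Because the program consists only of a BEGIN block, awk never reads the file arguments on its own; they are consumed solely through getline.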
|
|
|
|
05-19-2011, 08:13 AM
|
#11
|
|
Guru
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 6,316
|
Well, I would agree that paste is the tool for the job, but Nominal got me thinking (as usual). I have assumed that printing should stop once the first file's data is consumed:
Code:
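#!/usr/bin/awk -f
BEGIN {
    # Read the first file line by line; "> 0" stops on EOF and on errors.
    while ((getline < ARGV[1]) > 0) {
        for (i = 2; i <= ARGC - 1; i++) {
            # Use an empty field once a later file runs out of lines,
            # instead of repeating its last line.
            if ((getline add < ARGV[i]) <= 0)
                add = ""
            $0 = $0 "\t" add
        }
        print
    }
}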
The nice thing here is you can call it like so:
Code:
./script.awk file*.txt
|
|
|
|
05-19-2011, 09:11 AM
|
#12
|
|
LQ Newbie
Registered: May 2011
Posts: 3
Original Poster
Rep: 
|
Thank you for the work-around.
To try to figure out what is going on, I took SL00b's advice and actually created the simplified version of the input files:
Code:
fileA.txt
HeaderA
LineA1
LineA2
LineA3
LineA4
fileB.txt
HeaderB
LineB1
LineB2
LineB3
LineB4
fileC.txt
HeaderC
LineC1
LineC2
LineC3
LineC4
and when I run
Code:
>paste fileA.txt fileB.txt fileC.txt > test.txt
I get this:
Code:
HeaderA
HeaderB
HeaderC
LineA1
LineB1
LineC1
LineA2
LineB2
LineC2
LineA3
LineB3
LineC3
LineA4 LineB4 LineC4
For giggles, I put a delimiter in the paste command:
Code:
>paste -d* fileA.txt fileB.txt fileC.txt > test.txt
And, similarly to before:
Code:
HeaderA
*HeaderB
*HeaderC
LineA1
*LineB1
*LineC1
LineA2
*LineB2
*LineC2
LineA3
*LineB3
*LineC3
LineA4*LineB4*LineC4
It almost looks like it's trying to use the serial option of 'paste', except for the last line.
I then ran it again with the serial option:
Code:
>paste -sd* fileA.txt fileB.txt fileC.txt > test.txt
And it appears that this actually works correctly:
Code:
HeaderA
*LineA1
*LineA2
*LineA3
*LineA4
HeaderB
*LineB1
*LineB2
*LineB3
*LineB4
HeaderC
*LineC1
*LineC2
*LineC3
*LineC4
I'm really scratching my head.
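One explanation that fits all three results is that the input files have Windows (CRLF) line endings, that their last line has no terminator at all, and that test.txt is being viewed in a program that draws every carriage return as a line break. The sketch below tests that guess on shortened copies of the files; the sed and tr pipeline is only a stand-in for such a viewer, not a claim about any particular editor.

```shell
#!/bin/sh
tmp=$(mktemp -d)

# Assumed input: CRLF line endings, and no terminator after the last line.
printf 'HeaderA\r\nLineA1\r\nLineA2' > "$tmp/fileA.txt"
printf 'HeaderB\r\nLineB1\r\nLineB2' > "$tmp/fileB.txt"
printf 'HeaderC\r\nLineC1\r\nLineC2' > "$tmp/fileC.txt"

cr=$(printf '\r')   # a literal carriage return character

# paste only removes the LF, so each field keeps its CR: "HeaderA\r*HeaderB\r*..."
# Emulate the viewer: treat CR+LF as one break (drop the CR before the LF),
# then show every remaining lone CR as a break of its own.
viewed=$(paste -d'*' "$tmp/fileA.txt" "$tmp/fileB.txt" "$tmp/fileC.txt" |
         sed "s/${cr}\$//" | tr '\r' '\n')
printf '%s\n' "$viewed"

rm -rf "$tmp"
```

Under these assumptions only the final row comes out on a single line, since it is the only row whose fields carry no carriage return.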
|
|
|
|
05-19-2011, 09:22 AM
|
#13
|
|
Member
Registered: Feb 2011
Location: LA, US
Distribution: SLES
Posts: 375
Rep: 
|
I'd say something is goofy with the implementation of the paste command in an XP environment.
I read your post saying you were running it in Bash on XP, so I created the same three text files on my XP desktop, ran the paste command from a Cygwin client, and received the following:
$ paste fileA.txt fileB.txt fileC.txt
HeaderA HeaderC
LineA1 LineC1
LineA2 LineC2
LineA3 LineC3
LineA4 LineC4
... ...
Same data, same command, different environment, and for some reason fileB.txt was randomly ignored. I double and triple-checked the files, and they're all the same.
Since this simplified test doesn't work, that says "code problem" to me.
|
|
|
|
05-19-2011, 09:29 AM
|
#14
|
|
Senior Member
Registered: Oct 2005
Distribution: Gentoo, Slackware, LFS
Posts: 2,046
Rep: 
|
In bash, you can perhaps do it like this:
Code:
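#!/bin/bash
for (( ;; )); do
    C=0 A=()
    # Read one line from each file; C counts the files that still had data.
    # -r keeps backslashes in the data intact.
    read -r -u 3 "A[0]" && (( C++ ))
    read -r -u 4 "A[1]" && (( C++ ))
    read -r -u 5 "A[2]" && (( C++ ))
    [[ C -eq 0 ]] && break
    echo "${A[@]}" >&6
done 3<"file1" 4<"file2" 5<"file3" 6>"outputfile"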
---- edit ----
or:
Code:
until
    C=0 A=()
    read -r -u 3 "A[0]" && (( C++ ))
    read -r -u 4 "A[1]" && (( C++ ))
    read -r -u 5 "A[2]" && (( C++ ))
    [[ C -eq 0 ]]    # the loop ends once no file delivered a line
do
    echo "${A[@]}" >&6
done 3<"file1" 4<"file2" 5<"file3" 6>"outputfile"
---- edit ----
as a complete script:
Code:
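#!/bin/bash
(
    # Pad the argument list so the first input file becomes $3; each file's
    # position then doubles as its file descriptor number.
    set -- . . "$@"
    for (( I = 3; I < $#; I++ )); do
        # Expands to e.g.: exec 3<"${3}"
        eval "exec $I<\"\${$I}\""
    done
    # The last argument is the output file.
    exec >"${!#}"
    until
        C=0 A=()
        for (( I = 3; I < $#; I++ )); do
            read -r -u "$I" "A[$I]" && (( C++ ))
        done
        [[ C -eq 0 ]]
    do
        echo "${A[@]}"
    done
)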
Code:
bash script.sh file1 file2 ... outputfile
Last edited by konsolebox; 05-19-2011 at 09:45 AM.
|
|
|
|
05-19-2011, 09:30 AM
|
#15
|
|
Member
Registered: Nov 2009
Posts: 55
Rep:
|
Hi,
Try dos2unix file*.txt before the paste.
To check, you can also use
Code:
paste -d'-_' fileA.txt fileB.txt fileC.txt > test.txt
You should get this result:
Code:
HeaderA-HeaderB_HeaderC
LineA1-LineB1_LineC1
LineA2-LineB2_LineC2
LineA3-LineB3_LineC3
LineA4-LineB4_LineC4
I just ran unix2dos on my files as a test, and the result is:
Code:
$ od -c test.txt
0000000 H e a d e r A \r - H e a d e r B
0000020 \r _ H e a d e r C \r \n L i n e A
0000040 1 \r - L i n e B 1 \r _ L i n e C
0000060 1 \r \n L i n e A 2 \r - L i n e B
0000100 2 \r _ L i n e C 2 \r \n L i n e A
0000120 3 \r - L i n e B 3 \r _ L i n e C
0000140 3 \r \n L i n e A 4 \r - L i n e B
0000160 4 \r _ L i n e C 4 \r \n
$
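If dos2unix is not installed, stripping the carriage returns with tr before pasting should have the same effect. This is a minimal sketch: the sample files are created here just for the demonstration, and the ".unix" suffix for the cleaned copies is an arbitrary choice.

```shell
#!/bin/sh
tmp=$(mktemp -d)

# Sample input with CRLF line endings, like the files dumped above.
printf 'HeaderA\r\nLineA1\r\n' > "$tmp/fileA.txt"
printf 'HeaderB\r\nLineB1\r\n' > "$tmp/fileB.txt"
printf 'HeaderC\r\nLineC1\r\n' > "$tmp/fileC.txt"

# Write a cleaned copy of each file with every carriage return removed.
for f in "$tmp"/file?.txt; do
    tr -d '\r' < "$f" > "$f.unix"
done

# The cleaned copies paste into clean tab-separated rows.
fixed=$(paste "$tmp/fileA.txt.unix" "$tmp/fileB.txt.unix" "$tmp/fileC.txt.unix")
printf '%s\n' "$fixed"

rm -rf "$tmp"
```

Running od -c on the new output should then show only \t between fields and \n at the end of each row.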
Last edited by Chirel; 05-19-2011 at 09:36 AM.
|
|
|
|