LinuxQuestions.org


powah 01-31-2007 12:41 AM

join every three lines of a text file
 
How can I join every three lines of a text file into one line?
Any programming language is OK.
Thanks.
For example:
Input:
Sep. 30, 2006 Oct. 04, 2006
PIZZA PIZZA #222
$8.75
Oct. 01, 2006 Oct. 04, 2006
BURGER KING #8818
$9.10
Oct. 02, 2006 Oct. 03, 2006
NEW GREAT WALL RESTAURANT
$38.10

Output:
Sep. 30, 2006 Oct. 04, 2006 PIZZA PIZZA #222 $8.75
Oct. 01, 2006 Oct. 04, 2006 BURGER KING #8818 $9.10
Oct. 02, 2006 Oct. 03, 2006 NEW GREAT WALL RESTAURANT $38.10

bartonski 01-31-2007 02:17 AM

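This can be done with the Unix text utility 'paste'.

Use the -d option with the delimiter list space, space, newline, together with the -s option. With -s, paste joins all lines of the file serially, and -d gives the delimiters to put between them; paste cycles through that list, so with space, space, newline every third line ends with a newline.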

Imagine that the file /tmp/foo contains your input:

Sep. 30, 2006 Oct. 04, 2006
PIZZA PIZZA #222
$8.75
Oct. 01, 2006 Oct. 04, 2006
BURGER KING #8818
$9.10
Oct. 02, 2006 Oct. 03, 2006
NEW GREAT WALL RESTAURANT
$38.10

e.g.:

Quote:

>paste -s -d '
' /tmp/foo
Yields

Sep. 30, 2006 Oct. 04, 2006 PIZZA PIZZA #222 $8.75
Oct. 01, 2006 Oct. 04, 2006 BURGER KING #8818 $9.10
Oct. 02, 2006 Oct. 03, 2006 NEW GREAT WALL RESTAURANT $38.10

The thing that's hard to convey via text is that the delimiters are two space characters and a newline, surrounded by single quotes.
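For anyone curious how paste -s uses the delimiter list, here is a rough Python sketch. The function name paste_serial is made up for illustration, and it only handles a single input with literal delimiter characters (no \n-style escapes). It joins consecutive lines with the delimiters in turn and wraps back to the start of the list when it runs out, which is why space, space, newline gives one output line per three input lines:

```python
from itertools import cycle


def paste_serial(lines, delims):
    # Rough emulation of `paste -s -d DELIMS` on one input (hypothetical helper).
    # delims holds literal delimiter characters; escape sequences typed as
    # backslash + letter are NOT interpreted here.
    delim_cycle = cycle(delims)  # wrap around when the list runs out
    out = []
    for i, line in enumerate(lines):
        if i > 0:
            out.append(next(delim_cycle))  # delimiter between line i-1 and line i
        out.append(line)
    out.append("\n")  # paste -s ends each file's output with a newline
    return "".join(out)


records = [
    "Sep. 30, 2006 Oct. 04, 2006",
    "PIZZA PIZZA #222",
    "$8.75",
    "Oct. 01, 2006 Oct. 04, 2006",
    "BURGER KING #8818",
    "$9.10",
]

# Two spaces and a newline: three input lines per output line.
print(paste_serial(records, "  \n"), end="")
```

With only one space before the newline (a list of length two), the same cycling joins the lines in pairs instead, so both spaces really matter.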

makyo 01-31-2007 06:47 AM

Hi.

Good solution. Some versions of paste recognize \n for newline, and \s for space:
Code:

paste -s -d "\s\s\n" /tmp/foo
which can make the command slightly easier to read.

(<rant on poor documentation> This is not mentioned in the man or info pages for paste, nor given as an example. </rant>) ... cheers, makyo

bartonski 02-01-2007 08:23 AM

I tried it with my version of paste (version 5.2.1 from GNU coreutils, the default version that ships with Fedora Core 4).

Quote:

paste -s -d "\s\s\n" /tmp/foo
Gives

Quote:

Sep. 30, 2006 Oct. 04, 2006sPIZZA PIZZA #222s$8.75
Oct. 01, 2006 Oct. 04, 2006sBURGER KING #8818s$9.10
Oct. 02, 2006 Oct. 03, 2006sNEW GREAT WALL RESTAURANTs$38.10

In other words, the \s delimiter is parsed as a literal 's' (yeah, I'm anal-retentive; I looked at it in od).
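A sketch of why "\s\s\n" came out as s, s, newline: paste treats the -d argument as a list of single delimiters, with a few backslash escapes. POSIX specifies \n (newline), \t (tab), \\ (backslash) and \0 (empty string). The Python helper below (parse_delimiters is a hypothetical name, not paste's actual source) assumes that any other escaped character is simply taken literally, which is what the od check above showed for \s; other implementations may recognise additional escapes.

```python
def parse_delimiters(spec):
    # Split a paste-style -d argument into individual delimiters
    # (hypothetical helper, not paste's actual source).
    # Escapes listed by POSIX: \n newline, \t tab, \\ backslash, \0 empty string.
    known = {"n": "\n", "t": "\t", "\\": "\\", "0": ""}
    delims = []
    i = 0
    while i < len(spec):
        ch = spec[i]
        if ch == "\\" and i + 1 < len(spec):
            nxt = spec[i + 1]
            # Assumption: an unknown escape such as \s yields the character itself.
            delims.append(known.get(nxt, nxt))
            i += 2
        else:
            # Ordinary character (or a trailing lone backslash, kept as-is here).
            delims.append(ch)
            i += 1
    return delims


print(parse_delimiters(r"\s\s\n"))  # ['s', 's', '\n']
print(parse_delimiters(r"  \n"))    # [' ', ' ', '\n']
```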

Quote:

paste -s -d "  \n" /tmp/foo
Does the right thing.

I agree about the documentation. paste is part of GNU coreutils, and the full documentation can be found by typing 'info paste'... sort of. You have to root around in the coreutils info pages to find the section that covers character classes and special characters. I found it once ;)

makyo 02-01-2007 09:42 AM

Hi, bartonski.

Good catch, thanks for checking it. The version of paste that I use is the same. Apparently reading unfamiliar text allowed me to visually skip the 's' insertions. I should have pasted the results.

I looked over info paste, and it's not different from man paste (of course, my skimming is somewhat suspect, at least recently :) ). I see that info coreutils paste does present a few examples, but does not appear to discuss \n.

Upon reflection, I am not sure why I even tried the \s, since that doesn't make any sense. The only use that I can think of now is for matching a whitespace character. Loose connection someplace :) ... cheers, makyo
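Right, \s is a regular-expression escape rather than a paste one. A quick Python illustration (the sample string is just two lines of the input above) of \s matching single whitespace characters, and of its complement \S matching non-whitespace:

```python
import re

sample = "PIZZA PIZZA #222\n$8.75\n"

# \s matches a single whitespace character (space, tab, newline, ...).
spaces = re.findall(r"\s", sample)
print(spaces)  # [' ', ' ', '\n', '\n']

# \S is the complement: one non-whitespace character; \S+ grabs whole words.
words = re.findall(r"\S+", sample)
print(words)  # ['PIZZA', 'PIZZA', '#222', '$8.75']

# A common use: collapse any run of whitespace (including newlines) to one space.
print(re.sub(r"\s+", " ", sample).strip())  # PIZZA PIZZA #222 $8.75
```

Collapsing every run of whitespace like this joins all lines into one, not every three, so it only demonstrates what \s means.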

whk 02-01-2007 12:34 PM

Quote:

I can think of now is for matching a whitespace character
paste -s -d " \n" /tmp/foo
(with two spaces between " and \n)
This website should have some limited tab options.
Just my humble opinion.

makyo 02-01-2007 06:00 PM

Hi, whk.

Instead of using plain text in forum threads:

paste -s -d " \n" /tmp/foo

or QUOTE brackets:
Quote:

paste -s -d " \n" /tmp/foo
I think if one uses CODE brackets, the spaces are easier to see, because the text is set in a monospace font and the whitespace is kept:
Code:

paste -s -d "  \n" /tmp/foo
How does that look to you? ... cheers, makyo

( edit 1: clarification )

ghostdog74 02-01-2007 07:42 PM

If you have Python, here's an alternative:
Code:

#!/usr/bin/env python3
# Join every three lines of file.txt into one space-separated line.
with open("file.txt") as f:
    data = [line.strip() for line in f]  # get rid of newlines

# Step through the list three lines at a time.
for start in range(0, len(data), 3):
    print(" ".join(data[start:start + 3]))

Output:
Code:

#/home: ./test.py
Sep. 30, 2006 Oct. 04, 2006 PIZZA PIZZA #222 $8.75
Oct. 01, 2006 Oct. 04, 2006 BURGER KING #8818 $9.10
Oct. 02, 2006 Oct. 03, 2006 NEW GREAT WALL RESTAURANT $38.10
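A variation on the same idea, as a sketch: instead of loading the whole file first, this pulls three lines at a time with itertools.islice, so memory use stays small for big files, and a short final group (if the line count isn't a multiple of three) is still printed rather than lost. The helper name join_in_groups is hypothetical.

```python
from itertools import islice


def join_in_groups(lines, n=3, sep=" "):
    # Hypothetical helper: yield one string per group of n input lines.
    # Lines are consumed lazily, so the whole file need not fit in memory.
    stripped = (line.strip() for line in lines)  # drop newlines and edge whitespace
    while True:
        group = list(islice(stripped, n))
        if not group:  # input exhausted
            return
        yield sep.join(group)  # a final short group is kept, not dropped


print(list(join_in_groups(["a\n", "b\n", "c\n", "d\n", "e\n"])))  # ['a b c', 'd e']
```

To run it on the original data, pass it an open file, e.g. with open("file.txt") as f: then loop over join_in_groups(f) and print each record.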


bartonski 02-01-2007 11:40 PM

Quote:

Originally Posted by makyo
Upon reflection, I am not sure why I even tried the \s, since that doesn't make any sense. The only use that I can think of now is for matching a whitespace character. Loose connection someplace :) ... cheers, makyo

Hmm. Makes me wonder what \S would give you. No space delimiter? ;) ;)


All times are GMT -5.