[SOLVED] Joining Three Lines of Text into One Line with Delimiters Between the Original Lines

tronayne · 03-15-2016, 02:37 PM

I have a file that contains

Code:

First_name Last_name
House_Number Street_Name Street_Type
City State Zip

I want the output to be a CSV file with tab delimiters between the three fields.

The names can include Jr, II, Sr, maybe a middle name too.

The addresses can be 12345 Nicholson Hill Road or 12345 Cedar (or mine, 12345 North US Hwy 23).

The City State and Zip are just that.

There is a blank line in the file between each record (I can simply remove those).

I want the three to be joined with a tab character as a delimiter.

I've been fiddling with paste and can't quite get that to work; I've been fiddling with sed, no joy there either. I'm just about ready to write it in C (and this is a one-off job).

I know I've done this before, just gotten too danged old to remember how.

Anybody have a clue?

schneidz · 03-15-2016, 02:44 PM

man tr ?
maybe awk would be better.
i think i would do it in c also.

titopoquito · 03-15-2016, 02:51 PM

+1 for tr. Not tested because I'm bringing my son to bed:

Code:

cat text | tr "\n" "\t"
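
A note on the tr approach: translating every newline to a tab also removes the breaks between records, so the whole file would come out as one long line. Below is a minimal Python sketch of the difference. It assumes the blank lines have already been removed and that the file holds complete three-line records; both function names are hypothetical and only there for illustration.

```python
def newlines_to_tabs(text):
    # Same effect as the tr command above: every newline becomes a tab.
    return text.replace("\n", "\t")


def join_every_three(text):
    # Group the lines three at a time and join each group with tabs,
    # keeping one newline at the end of every record.
    lines = text.splitlines()
    records = [lines[i:i + 3] for i in range(0, len(lines), 3)]
    return "\n".join("\t".join(record) for record in records) + "\n"


sample = (
    "Joe Bloggs\n5 Somewhere street\nSome Zip\n"
    "Mary Smith\n27 Other Street\nAnother Zip\n"
)

print(repr(newlines_to_tabs(sample)))    # one long line, no newline left
print(join_every_three(sample), end="")  # one tab-delimited record per line
```

So the tr idea probably needs a second step that turns every third tab back into a newline, or a tool that reads the lines in groups of three.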

riwi · 03-15-2016, 03:10 PM

Code:

#!/usr/bin/perl
use warnings;
my $line1;
my $line2;
my $line3;
my $outline;
open(in_file,"<./list.txt") or die "no file found";
open(out_file,">./list_perl.txt") or die "cannot create file";
while (my $line1=<in_file>) {
  $line2=<in_file>;
  $line3=<in_file>;
  $line1 =~ s/\R//g;
  $line2 =~ s/\R//g;
  $line3 =~ s/\R//g;
  $outline= $line1 . "\t" . $line2 . "\t" . $line3 . "\n";
  print out_file $outline;
  }
close(in_file);
close(out_file);

input

Code:

First_name Last_name
House_Number Street_Name Street_Type
City State Zip
First_name Last_name
House_Number Street_Name Street_Type
City State Zip
First_name Last_name
House_Number Street_Name Street_Type
City State Zip

output:

Code:

First_name Last_name    House_Number Street_Name Street_Type    City State Zip
First_name Last_name    House_Number Street_Name Street_Type    City State Zip
First_name Last_name    House_Number Street_Name Street_Type    City State Zip

I like perl better because it is usually easier to read and much faster on large text file handling.

GazL · 03-15-2016, 03:23 PM

sed implementation:

Code:

#!/usr/bin/sed -f

/^$/d
N
N
s/\n/\t/g

edit: enhanced version here
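
For anyone who does not read sed, the control flow of the script above can be mirrored roughly in Python. This is a sketch, not a translation of sed's exact semantics: the function name is hypothetical, and padding a short final record with empty fields is an assumption made here (GNU sed would simply print whatever is left when N runs out of input).

```python
def join_triplets(lines):
    # Walk one iterator so the inner next() calls consume the same stream.
    it = iter(lines)
    out = []
    for first in it:
        first = first.rstrip("\n")
        if first == "":
            # Like /^$/d: drop an empty line and start the next cycle.
            continue
        # Like N, N: pull in the next two lines (empty if the input ends early).
        second = next(it, "").rstrip("\n")
        third = next(it, "").rstrip("\n")
        # Like s/\n/\t/g: the three lines become one tab-delimited line.
        out.append("\t".join((first, second, third)))
    return out


sample = [
    "Joe Bloggs\n", "5 Somewhere street, Somewhere.\n", "Some Zip\n",
    "\n",
    "Mary Smith\n", "27  Other Street, Otherplace.\n", "Another Zip\n",
]
for record in join_triplets(sample):
    print(record)
```

The empty-line check only runs at the start of a record, the same as in the sed script, so an empty line in the middle of a record would be joined in as an empty field.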

dugan · 03-15-2016, 04:35 PM

Python implementation:

Code:

#!/usr/bin/env python3

import fileinput
import itertools

field_width = 3  # lines per record

# Cycle 0, 1, 2, 0, 1, 2, ... alongside the input lines.
field_indexes = itertools.cycle(range(field_width))
fields = []
for index, line in zip(field_indexes, fileinput.input()):
    fields.append(line.strip())
    if index == field_width - 1:
        # Last line of the record: print it tab-delimited and start over.
        print('\t'.join(fields))
        fields = []

Save it as "tabit" and run it with:

Code:

./tabit list.txt

allend · 03-15-2016, 06:01 PM

An awk solution.

Code:

awk '/.+/{x=$0;getline;x=x"\t"$0;getline;x=x"\t"$0;print x}' input.txt

Richard Cranium · 03-15-2016, 07:20 PM

Quote:

Originally Posted by allend

An awk solution.

Code:

awk '/.+/{x=$0;getline;x=x"\t"$0;getline;x=x"\t"$0;print x}' input.txt

The few times that I used awk, I thought that it was pretty powerful.

Unfortunately, I could never remember how the clever stuff that I had written worked after a couple of days.

https://news.ycombinator.com/item?id=682302

Richard Cranium · 03-15-2016, 07:28 PM

Quote:

Originally Posted by riwi

I like perl better because it is usually easier to read and much faster on large text file handling.

When I used Perl in my day job, it felt as if the Devil was breaking wind into my face with almost every line that I read.

A large subset of the coding world love Perl. There's nothing wrong with that (and I've seen some pretty amazing stuff written in Perl).

I'd just rather not, and probably because the "There's more than one way to do it" is a horrible thing to inflict upon the reader of your code for anything remotely complicated.

(Please don't take the above as a comment about the code that you posted. I just hate Perl.)

tronayne · 03-16-2016, 06:00 AM

Quote:

Originally Posted by allend

An awk solution.

Code:

awk '/.+/{x=$0;getline;x=x"\t"$0;getline;x=x"\t"$0;print x}' input.txt

Works like a champ, thank you (and I understand AWK, just hadn't got quite there yet).

Thanks to everybody, cripes there are so many ways to skin a cat -- thought I knew a couple, but, wow.

All is well that ends.

PrinceCruise · 03-16-2016, 10:41 AM

This thread is gold.

Regards.

kjhambrick · 03-16-2016, 12:03 PM

tronayne --

I know you've solved this one but being an awk junkie, myself, I had to send you another one <G>

Don't delete the Blank Lines !!!

This 'style' is a common file format and it's actually covered in 'the awk book' ( "The AWK Programming Language / Edition 1" by A,K,W )

Here's another using a newline as a Field Separator ( FS = "\n" ) and an empty string ( "NULL" ) as a Record Separator ( RS = "" ), which makes awk treat each blank-line-separated block as one record

It will work as long as the Blank Lines are truly Blank and not spaces.

Code:

$ gawk 'BEGIN{ FS = "\n" ; RS = "" }{ print $1 "\t" $2 "\t" $3 }' <<your file>>

Test Data (<G> thanks GazL <G>):

Code:

$ cat test.txt
Joe Bloggs
5 Somewhere street, Somewhere.
Some Zip

Mary Smith
27  Other Street, Otherplace.
Another Zip

Note that the last line in the input file may be a blank line or not ( does not matter )...

Here's the output:

Code:

$ gawk 'BEGIN{ FS = "\n" ; RS = "" }{ print $1 "\t" $2 "\t" $3 }' test.txt
Joe Bloggs      5 Somewhere street, Somewhere.  Some Zip
Mary Smith      27  Other Street, Otherplace.   Another Zip

-- kjh
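
For comparison, the same paragraph-mode idea (blank lines separate records, newlines separate fields) can be sketched in Python. This is only an illustration and not what awk does internally. The function name is hypothetical, and two choices here are assumptions that differ from the one-liner above: lines holding only spaces or tabs are treated as blank, and every line of a block is joined rather than just the first three.

```python
def paragraphs_to_tsv(text):
    records = []
    current = []  # lines of the record being collected
    for line in text.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            # A blank (or whitespace-only) line closes the current record.
            records.append("\t".join(current))
            current = []
    if current:
        # The last record may not be followed by a blank line.
        records.append("\t".join(current))
    return records


test_data = (
    "Joe Bloggs\n5 Somewhere street, Somewhere.\nSome Zip\n"
    "\n"
    "Mary Smith\n27  Other Street, Otherplace.\nAnother Zip\n"
)
for record in paragraphs_to_tsv(test_data):
    print(record)
```

Collecting lines until a blank one appears means the record length does not have to be fixed at three, which is the main difference from the approaches that count lines.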

bassmadrigal · 03-16-2016, 12:13 PM

Quote:

Originally Posted by kjhambrick

This 'style' is a common file format and it's actually covered in 'the awk book' ( "The AWK Programming Language / Edition 1" by A,K,W )

I feel like they missed a golden opportunity to order their names so their initials could come out as AWK instead of alphabetically.

Alfred Aho
Peter Weinberger
Brian Kernighan

schneidz · 03-16-2016, 12:16 PM

Quote:

Originally Posted by bassmadrigal

I feel like they missed a golden opportunity to order their names so their initials could come out as AWK instead of alphabetically.

Alfred Aho
Peter Weinberger
Brian Kernighan

those are the original designers of the awk parsing language. i guess book publishers automatically alphabetize multi-authors (or maybe they are just not knowledgeable about it) ?

tronayne · 03-16-2016, 12:32 PM

Quote:

Originally Posted by kjhambrick

tronayne --

I know you've solved this one but being an awk junkie, myself, I had to send you another one <G>

Don't delete the Blank Lines !!!

Hey, that's even slicker -- was looking through The AWK Programming Language this morning, hadn't noticed that (got busy with doing other things and just getting back to it).

BTW, you can go to Brian Kernighan's web site, http://www.cs.princeton.edu/~bwk/, and download the source for a Linux version of AWK (New AWK, the one from the book, with updates and fixes). Sometimes that page is a little iffy, sometimes it's not (if you can't get to it, I'll e-mail it to you if you want).

Get it, unzip it (be careful: create a directory, cd into it, then unpack it); you get the source and all the examples from the book. Type make, wait a while, copy a.out to /usr/local/bin as nawk.

I use it instead of gawk. It doesn't have embedded GNU "features," and it works just fine.

Thanks for your interest.