LinuxQuestions.org

danielbmartin 03-11-2012 08:10 PM

Distributing data among files
 
Have:
An input file which contains a 1, 2, or 3 in column 1 of each line.

Want:
Three output files such that ...
output file #1 contains all input lines with a 1 in column 1.
output file #2 contains all input lines with a 2 in column 1.
output file #3 contains all input lines with a 3 in column 1.

This code doesn't work.
Code:

# File Identifiers
InFile='/home/daniel/Desktop/Voters/dbm262inp.txt'
OutFile1='/home/daniel/Desktop/Voters/dbm262o1.txt'
OutFile2='/home/daniel/Desktop/Voters/dbm262o2.txt'
OutFile3='/home/daniel/Desktop/Voters/dbm262o3.txt'

for (( i=1; i<=3; i++ ))
do
  OutFile="OutFile"$i
  grep '^$i ' $InFile > $OutFile
done

Please correct the bash if you can.

A secondary issue: Even if this code worked, it reads the input file three times. Can it be made more efficient by making only one pass? awk?

Daniel B. Martin
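
Editor's sketch of a corrected loop, offered as one possible fix rather than the definitive one. Two things appear to break the original. First, the single quotes around '^$i ' stop the shell from expanding $i, so grep searches for the literal text. Second, OutFile="OutFile"$i stores the string OutFile1 (not the path held in the variable OutFile1), so the output lands in a file literally named OutFile1 in the current directory. In bash, ${!OutFile} (indirect expansion) would look up the named variable; the version below sidesteps that by building the output name from the loop counter. The directory and file names are hypothetical stand-ins, not the poster's real paths.

```shell
# Hypothetical scratch directory and sample data, for illustration only.
workdir=./loop-demo
mkdir -p "$workdir"
InFile="$workdir/input.txt"
printf '%s\n' '1 alpha' '2 beta' '3 gamma' '1 delta' > "$InFile"

for i in 1 2 3
do
  # Double quotes let the shell expand $i inside the pattern;
  # the output file name is built directly from the loop counter.
  grep "^$i " "$InFile" > "$workdir/out$i.txt"
done
```

This still reads the input once per digit, so it addresses only the first question, not the single-pass one.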

AnanthaP 03-11-2012 08:35 PM

Sure.

awk can redirect to multiple files.

Please check out this URL.
http://www.gnu.org/software/gawk/man...ml#Redirection

Your code might look something like this:
$1 == 1 { print $0 > "file1.txt" }
$1 == 2 { print $0 > "file2.txt" }

and so on.

Ok
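
Editor's sketch of the idea above as a complete command, assuming the digit is a separate whitespace-delimited first field as in the original question. The directory and file names are hypothetical. Note that inside awk, > truncates a file only when it is first opened; later prints to the same name append, so each output file collects all of its lines in a single pass.

```shell
# Hypothetical scratch directory and sample data, for illustration only.
workdir=./redirect-demo
mkdir -p "$workdir"
printf '%s\n' '1 a' '2 b' '1 c' > "$workdir/in.txt"

# One pass over the input: each rule sends matching lines to its own file.
# The target expression is parenthesized so every awk parses it the same way.
awk -v dir="$workdir" '
  $1 == 1 { print $0 > (dir "/file1.txt") }
  $1 == 2 { print $0 > (dir "/file2.txt") }
' "$workdir/in.txt"
```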

firstfire 03-11-2012 10:04 PM

Hi.

Code:

$ cat infile.txt
1 text1a
2 text2a
3 text3a
1 text1b
2 text2b
$ awk '$1 ~ /^[0-9]+$/ {print >"outfile-"$1".txt"}' infile.txt
$ cat outfile-1.txt
1 text1a
1 text1b
$ cat outfile-2.txt
2 text2a
2 text2b
$ cat outfile-3.txt
3 text3a

For details about redirection in awk read `info gawk redirection'.

Hope that helps.

danielbmartin 03-12-2012 10:16 AM

Quote:

Originally Posted by firstfire (Post 4624386)
Code:

awk '$1 ~ /^[0-9]+$/ {print >"outfile-"$1".txt"}' infile.txt
Thank you, firstfire, for this appealing one-line solution. I've been unable to adapt it to my existing program. Please point out where I went wrong. This is the input file:
Code:

$ cat '/home/daniel/Desktop/Voters/dbm262inp.txt'
1Alabama
2Alaska
4Arizona
5Arkansas
1California
2Colorado
3Connecticut
4Delaware
3Florida
3Georgia
3Hawaii
3Idaho
4Illinois
4Indiana

This is the code which produces no output (or none that I can find).
Code:

# File Identifications 
InFile='/home/daniel/Desktop/Voters/dbm262inp.txt'
OutFile1='/home/daniel/Desktop/Voters/dbm262o1.txt'
OutFile2='/home/daniel/Desktop/Voters/dbm262o2.txt'
OutFile3='/home/daniel/Desktop/Voters/dbm262o3.txt'
OutFile4='/home/daniel/Desktop/Voters/dbm262o4.txt'
OutFile5='/home/daniel/Desktop/Voters/dbm262o5.txt'

awk '$1 ~ /^[0-9]+$/ {print >$OutFile$1}' $InFile

Thank you.

Daniel B. Martin
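
Editor's sketch of a quick diagnostic for the command above, using a hypothetical scratch file with a few of the posted lines. It suggests why nothing is written: with awk's default whitespace splitting, $1 is the whole token (for example 1Alabama), and the pattern /^[0-9]+$/ requires $1 to consist of digits only, so no line matches. Separately, $OutFile sits inside single quotes, so the shell never expands it and awk sees only its own, unset, variable named OutFile.

```shell
# Hypothetical scratch directory holding a few of the posted input lines.
workdir=./diag-demo
mkdir -p "$workdir"
printf '%s\n' '1Alabama' '2Alaska' '4Arizona' > "$workdir/sample.txt"

# Default splitting: each line is a single field, so $1 is e.g. "1Alabama".
awk '{ print NF ": " $1 }' "$workdir/sample.txt"

# The original pattern wants $1 to be digits only, so it matches no lines.
awk '$1 ~ /^[0-9]+$/ { n++ } END { print "matched lines:", n + 0 }' "$workdir/sample.txt"
```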

firstfire 03-12-2012 11:43 AM

Hi.

Here is a one-liner:
Code:

awk -F '' '$1 ~ /[0-9]/ {print >"out-"$1}'
I assume here that the number (a digit, actually) is the first character on each line and that the resulting files share a common prefix. The -F option sets the field separator to the empty string, in which case each character is considered a separate field (so $1 is the first character of the line).

Inside a shell script:
Code:

#!/bin/bash
PREFIX='/tmp/out-'
awk -v prefix="$PREFIX" -F '' '$1 ~ /[0-9]/ {print >prefix$1}'
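
Editor's sketch of an alternative that does not depend on the empty field separator. As far as I know, splitting every character into its own field with -F '' is documented for gawk but left unspecified by POSIX, so other awk implementations may behave differently. The variant below swaps in substr() to take the first character instead; it is a different technique from the one posted above, and the directory and file names are hypothetical.

```shell
# Hypothetical scratch directory and sample data, for illustration only.
workdir=./substr-demo
mkdir -p "$workdir"
printf '%s\n' '1Alabama' '2Alaska' '4Arizona' '1California' '3Connecticut' > "$workdir/in.txt"

# For every line that starts with a digit, take that digit with substr()
# and use it to build the output file name. The shell value is passed in
# with -v, and the redirection target is parenthesized for portability.
awk -v prefix="$workdir/out-" '
  /^[0-9]/ {
    digit = substr($0, 1, 1)
    print > (prefix digit ".txt")
  }
' "$workdir/in.txt"
```

Lines that do not start with a digit are skipped silently, the same as in the one-liner above.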


danielbmartin 03-12-2012 01:41 PM

Quote:

Originally Posted by firstfire (Post 4624888)
Inside a shell script:
Code:

#!/bin/bash
PREFIX='/tmp/out-'
awk -v prefix="$PREFIX" -F '' '$1 ~ /[0-9]/ {print >prefix$1}'


Nice!

It was necessary to make minor modifications to suit my program. This is what works.
Code:

# File Identifications 
InFile='/home/daniel/Desktop/Voters/dbm262inp.txt'
OutFile1='/home/daniel/Desktop/Voters/dbm262o1.txt'
OutFile2='/home/daniel/Desktop/Voters/dbm262o2.txt'
OutFile3='/home/daniel/Desktop/Voters/dbm262o3.txt'
OutFile4='/home/daniel/Desktop/Voters/dbm262o4.txt'
OutFile5='/home/daniel/Desktop/Voters/dbm262o5.txt'

# Method of LQ member firstfire
PREFIX='/home/daniel/Desktop/Voters/dbm262o'
awk -v prefix="$PREFIX" -F '' '$1 ~ /[0-9]/ {print >prefix$1".txt"}' $InFile

Thank you!

Daniel B. Martin


All times are GMT -5.