Extracting certain lines from a text and outputting to new text files?

paradeboy · 03-11-2012, 02:23 PM

Dear forum,

I know about the awk command, but I am having a hard time applying it to a text file I have.

This text file has the following format:

Quote:

Name Test Score
Jennifer 1 60
Jennifer 2 79
Jennifer 3 30
Jennifer 4 50
Jennifer 5 70
Bob 1 30
Bob 2 60
Bob 3 20
Bob 4 90
Bob 5 80
Joe 1 80
Joe 2 60
Joe 3 60
Joe 4 70
Joe 5 70
...

I would like to make a new text file for each test (5 in total), so that the new files look like the following:

Text file 1 for Test 1 scores

Quote:

Name Test Score
Jennifer 1 60
Bob 1 30
Joe 1 80
...

Text file 2 for Test 2 scores

Quote:

Name Test Score
Jennifer 2 79
Bob 2 60
Joe 2 60
...

etc.

Is there a way to do this using just awk, or would something else be needed? Can awk do "sequence extraction"? I'm not sure that's the right term. What I mean is writing out every nth line (data lines 1, 6, 11, 16, ... to text file 1; lines 2, 7, 12, 17, ... to text file 2; and so on, not counting the header) rather than searching for the value 2 in the second column and writing the matching lines to text file 2. Hope this makes sense.

Thanks for your help! =)

colucix · 03-11-2012, 03:42 PM

Hi and welcome to LinuxQuestions!

Checking the value in the second field would be the most straightforward method; anyway, here we go:

Code:

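# Run as: awk -f <this script> inputfile

BEGIN {

  # Read the header line and copy it into 1.txt .. 5.txt
  getline

  for ( i = 1; i <= 5; i++ )
    print > ( i ".txt" )

}

NR > 1 {

  # Data lines start at NR == 2: lines 2, 7, 12, ... go to 1.txt,
  # lines 3, 8, 13, ... go to 2.txt, and so on
  file = ( (NR - 2) % 5 + 1 ) ".txt"

  print > file

}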

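The BEGIN section "initializes" the files by printing the header into each of them (the plain getline reads the first line of the input file). The NR > 1 pattern restricts the main rule to the data lines, and a simple formula then computes the file name from the current record number. You can easily adapt this code for any number of tests by changing the 5 in both places, provided the input file has the same format: every person has one line per test, always in the same order. Hope this helps.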
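For reference, here is a minimal sketch of the same record-number idea with the number of tests passed in on the command line instead of hard-coded, and the header handled by an NR == 1 rule rather than getline in BEGIN. The file name scores.txt and the awk variable n are illustrative choices, and the sketch assumes every person has exactly n lines in the same test order.

```shell
#!/bin/sh
# Sample input from the question; "scores.txt" is a hypothetical file name.
cat > scores.txt <<'EOF'
Name Test Score
Jennifer 1 60
Jennifer 2 79
Jennifer 3 30
Jennifer 4 50
Jennifer 5 70
Bob 1 30
Bob 2 60
Bob 3 20
Bob 4 90
Bob 5 80
Joe 1 80
Joe 2 60
Joe 3 60
Joe 4 70
Joe 5 70
EOF

# n = number of tests per person. Assumes every person has exactly n
# consecutive lines, always in the same test order.
awk -v n=5 '
NR == 1 {
    # Header: write it to 1.txt .. n.txt, then move on to the data.
    for (i = 1; i <= n; i++)
        print > (i ".txt")
    next
}
{
    # Data lines start at NR == 2, so (NR - 2) % n cycles through 0 .. n-1.
    print > (((NR - 2) % n + 1) ".txt")
}' scores.txt
```

Run in an empty directory, it creates 1.txt to 5.txt next to scores.txt; for three tests per person, -v n=3 would be the only change.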

paradeboy · 03-11-2012, 04:26 PM

Thank you very much for your help colucix!

danielbmartin · 03-13-2012, 07:30 PM

Here's another take on solving this "data distribution" problem.

This code assumes your input file has a name of the form
"/home/daniel/Desktop/LQfiles/dbm266inp.txt".
The script file is dbm266.bin and the input file is dbm266inp.txt.
Use your own path name and program name, but keep the "inp.txt" suffix:
the output file names are built by replacing it with "out".

Most of the code below is setup and comments.
The real work is done by a single line: the awk command.

Code:

#   Daniel B. Martin   Mar12
#
#   To execute this program, launch a terminal session and enter:
#   bash /home/daniel/Desktop/LQfiles/dbm266.bin
#
#   This program was inspired by:
#   http://www.linuxquestions.org/questions/linux-newbie-8/extracting-certain-lines-from-a-text-and-outputting-to-new-text-files-933921/


# Input file identification
InFile='/home/daniel/Desktop/LQfiles/dbm266inp.txt'
echo
echo "The input file is:"
echo "$InFile"

# Output file identification
# PF = Prefix: the input file name with its trailing "inp.txt"
# replaced by "out", e.g. .../dbm266inp.txt -> .../dbm266out
PF="${InFile%inp.txt}out"
echo; echo "Output will be written to files with names of the form:"
echo "${PF}x.txt where x is any alphanumeric."


# This awk deals out the input records to one or more output files
# according to the value of field 2. The header line (field 2 = "Test")
# goes to its own file, ${PF}Test.txt.
awk -v pf="$PF" '{ print > (pf $2 ".txt") }' "$InFile"

echo; echo "Normal end of job."; echo
exit

Daniel B. Martin

grail · 03-14-2012, 12:02 AM

Here is the same idea as a one-liner (NR > 1 skips the header line):

Code:

awk 'NR > 1 { print > ($2 ".txt") }' file
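Neither of the field-based versions writes the header into the per-test files: the one-liner above skips it, and Daniel's script sends it to a separate ...Test.txt file. A minimal sketch that copies the header into each output file the first time that file is written, assuming field 2 holds the test number; the file name scores.txt and the variable names hdr, out and seen are illustrative. Because the file name comes from the data rather than from the line number, the input lines do not need to be in a fixed order (the sample deliberately lists Joe 2 before Joe 1).

```shell
#!/bin/sh
# Sample input: a subset of the question's data, deliberately not in
# test order. "scores.txt" is a hypothetical file name.
cat > scores.txt <<'EOF'
Name Test Score
Jennifer 1 60
Jennifer 2 79
Bob 1 30
Bob 2 60
Joe 2 60
Joe 1 80
EOF

awk '
NR == 1 { hdr = $0; next }      # remember the header line
{
    out = $2 ".txt"             # output file chosen by field 2 (test number)
    if (!(out in seen)) {       # first line for this test:
        print hdr > out         #   start the file with the header
        seen[out] = 1
    }
    print > out
}' scores.txt
```

This produces 1.txt and 2.txt, each starting with the "Name Test Score" header line.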