LinuxQuestions.org


hotrodmacman 10-16-2007 09:57 PM

sed/awk/grep for multiple line data
 
I've been banging my head up against sed/awk/grep for quite some time now trying to parse a text data file that comes out like so:

There are 4 columns:
DEV, Queue Number, NAME, Assignment

DEVCONFIG*143 143 INCOMING3 lpr -P INCOM
ING3
DEVCONFIG*144 144 PACKHP rsh ps22 11
DEVCONFIG*145 145 CT.SET3 lpr -P SETS3
DEVCONFIG*146 146 PACKHP.2 rsh ps21 9
DEVCONFIG*147 147 MOLDHP rsh ps6 7
DEVCONFIG*148 148 SHIP5 rsh ps1 8
DEVCONFIG*149 149 PACKHP.1 rsh ps6 8
DEVCONFIG*150 150 MEC2 rsh ps1 11
DEVCONFIG*151 151 CT.SET4 lpr -P SETS4
DEVCONFIG*152 152 PRINTRONIX1 rsh ps1 13
DEVCONFIG*153 153 SHIP1 rsh ps1 14 ^[[H^[[2J
DEVCONFIG*154 154 RMHP rsh ps16 9
DEVCONFIG*155 155 PBINV cat >/data/p
ublic/pbinvo
ices.txt
DEVCONFIG*156 156 CT.SET5 lpr -P SETS5
DEVCONFIG*157 157 SETHP rsh ps25 16
DEVCONFIG*158 158 PEGHP rsh ps5 13
DEVCONFIG*159 159 POLYEHP rsh ps6 2
DEVCONFIG*160 160 EMAIL /data/script
s/email.pl
DEVCONFIG*161 161 PACKZB.4 lpr -P PACKZ
B.4
DEVCONFIG*162 162 HR rsh ps13 2
DEVCONFIG*163 163 ARHP rsh ps10 4
DEVCONFIG*164 164 ARVHP lpr -P ARVHP
DEVCONFIG*165 165 ARHP.5 rsh ps13 8
DEVCONFIG*166 166 ITADMIN lpr -P IT_AD
MIN

Functionally, I need to "unwrap" the data in column 4 (Assignment) so that it prints all on one line (e.g. lpr -P IT_ADMIN), get rid of the empty spaces left in columns 1-3, and remove the control codes (^[[H^[[2J) that print out after some of the lines.

This is just an excerpt from the file, the data in column 4 varies depending on printer assignment. The control codes are pagination codes that are put in the output by a LIST command on the data table in question.

Unfortunately, I do not know a way to pre-format this output. Is there anyone here who is familiar with jQL queries for jBASE, or who can give some quick and dirty PROC snippets that will do it?

I'm reasonably sure there is a way to do this with sed/awk; however, I have been unable to find any relevant examples, or a document that explains regular expressions well enough that I understand what is going on.

Thanks in advance!


--
Lance B.

yongitz 10-16-2007 10:38 PM

Hi! Can you give a clear example of your expected output? Anyway, try the code below and see if it gets the job done.


Code:

awk '{print $4,$5,$6}' FILE

matthewg42 10-16-2007 10:46 PM

You can use [code] tags to improve the readability of this sort of data (it uses a fixed width font and preserves whitespace).

I didn't quite understand what you want. I get the bit about un-word-wrapping lines like this:
Code:

DEVCONFIG*155 155 PBINV cat >/data/p
ublic/pbinvo
ices.txt

...but I didn't understand the part about the spaces in the first three columns and control codes.

ghostdog74 10-16-2007 11:29 PM

one way out of many...
Code:

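awk 'BEGIN { RS = "\nDEV" }    # a record starts at each line beginning with DEV (multi-character RS needs gawk or mawk)
# the first record still carries its own DEV prefix: re-print its fields joined by single spaces
NR==1{ for(i=1;i<=NF;i++){printf "%s ", $i};print "" }
NR>1{
          printf "DEV"           # put back the prefix that RS consumed
          for(i=1;i<=NF;i++){
              if ($i !~ /[\^\[]/) {     # skip fields containing ^ or [ (the control codes)
                    printf "%s ", $i    # data is passed as an argument, never as the format string
              }
          }
          print " "
}' "file"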

output:
Code:

# ./test.sh
DEVCONFIG*143 143 INCOMING3 lpr -P INCOM ING3
DEVCONFIG*144 144 PACKHP rsh ps22 11
DEVCONFIG*145 145 CT.SET3 lpr -P SETS3
DEVCONFIG*146 146 PACKHP.2 rsh ps21 9
DEVCONFIG*147 147 MOLDHP rsh ps6 7
DEVCONFIG*148 148 SHIP5 rsh ps1 8
DEVCONFIG*149 149 PACKHP.1 rsh ps6 8
DEVCONFIG*150 150 MEC2 rsh ps1 11
DEVCONFIG*151 151 CT.SET4 lpr -P SETS4
DEVCONFIG*152 152 PRINTRONIX1 rsh ps1 13
DEVCONFIG*153 153 SHIP1 rsh ps1 14
DEVCONFIG*154 154 RMHP rsh ps16 9
DEVCONFIG*155 155 PBINV cat >/data/p ublic/pbinvo ices.txt
DEVCONFIG*156 156 CT.SET5 lpr -P SETS5
DEVCONFIG*157 157 SETHP rsh ps25 16
DEVCONFIG*158 158 PEGHP rsh ps5 13
DEVCONFIG*159 159 POLYEHP rsh ps6 2
DEVCONFIG*160 160 EMAIL /data/script s/email.pl
DEVCONFIG*161 161 PACKZB.4 lpr -P PACKZ B.4
DEVCONFIG*162 162 HR rsh ps13 2
DEVCONFIG*163 163 ARHP rsh ps10 4
DEVCONFIG*164 164 ARVHP lpr -P ARVHP
DEVCONFIG*165 165 ARHP.5 rsh ps13 8
DEVCONFIG*166 166 ITADMIN lpr -P IT_AD MIN

I leave it to you to do the rest.

PAix 10-16-2007 11:55 PM

Quote:

#!/bin/sh

# Read until line complete. Anything after the first line is part of field 4 - Assignment

cat hotroddata | awk '
$1 !~ /DEVCONFIG/ { if ((fourzerotwo=="") && (fourzeroone==""))
fourzeroone = $0
if ((fourzeroone!="") && (fourzerotwo==""))
fourzerotwo = $0 }

$1 ~ /DEVCONFIG/ { print prime, fourzero fourzeroone fourzerotwo;
fourzero=fourzeroone=fourzerotwo="";
prime = ($1 " " $2 " " $3);
fourzero = $4 " " $5 " " $6 " " $7 " " $8 " " $9 }

END { if ((fourzerotwo != "") || (fourzeroone != ""))
print prime, fourzero fourzeroone fourzerotwo; }
' | less ## > output file

## I'm sure you will know what to do with the last line when you have it licked.
I regret that it is late in the UK and I have to be up in the morning so my solution is only partial due to lack of time on my part.

I copied your input data by cutting and pasting it and called it "hotroddata". So we can see your input data from your post and here is my output data which falls far short of your mark - at the moment.

DEVCONFIG*143 143 INCOMING3 lpr -P INCOM ING3ING3
DEVCONFIG*144 144 PACKHP rsh ps22 11
DEVCONFIG*145 145 CT.SET3 lpr -P SETS3
DEVCONFIG*146 146 PACKHP.2 rsh ps21 9
DEVCONFIG*147 147 MOLDHP rsh ps6 7
DEVCONFIG*148 148 SHIP5 rsh ps1 8
DEVCONFIG*149 149 PACKHP.1 rsh ps6 8
DEVCONFIG*150 150 MEC2 rsh ps1 11
DEVCONFIG*151 151 CT.SET4 lpr -P SETS4
DEVCONFIG*152 152 PRINTRONIX1 rsh ps1 13
DEVCONFIG*153 153 SHIP1 rsh ps1 14 ^[[H^[[2J
DEVCONFIG*154 154 RMHP rsh ps16 9
DEVCONFIG*155 155 PBINV cat >/data/p ublic/pbinvoublic/pbinvo
DEVCONFIG*156 156 CT.SET5 lpr -P SETS5
DEVCONFIG*157 157 SETHP rsh ps25 16
DEVCONFIG*158 158 PEGHP rsh ps5 13
DEVCONFIG*159 159 POLYEHP rsh ps6 2
DEVCONFIG*160 160 EMAIL /data/script s/email.pls/email.pl
DEVCONFIG*161 161 PACKZB.4 lpr -P PACKZ B.4B.4
DEVCONFIG*162 162 HR rsh ps13 2
DEVCONFIG*163 163 ARHP rsh ps10 4
DEVCONFIG*164 164 ARVHP lpr -P ARVHP
DEVCONFIG*165 165 ARHP.5 rsh ps13 8
DEVCONFIG*166 166 ITADMIN lpr -P IT_AD MINMIN


On the basis that all lines began with DEVCONFIG and that I was likely to have an uncommitted print at the end I proceeded.

Question: did you want the first line of your data to actually look like this:

DEVCONFIG*143143INCOMING3 lpr -P INCOMING3ING3

I think this is what Matthew was confused about, and what I had missed: do you want the first three columns concatenated without white space?
If that's correct, then removing the spaces in the assignment to "prime" should fix that, no bother.

I took a blunt-instrument approach to ensuring that the elements of column 4 (fields 5, 6, 7, 8, etc.) didn't lose their proper spaces. This, of course, caused a bit of a problem in that it put spaces where the data was originally broken over lines.

I didn't see the multiple spaces as a problem, as I would just pipe the output through tr to single-up multiple spaces. The space prior to the broken line (see AD MINMIN; whoops, that extra MIN came from sloppy thinking in the END block) I would get rid of by writing an awk function to process things slightly more elegantly.

The control sequences ^[[H^[[2J I would get rid of in my function by discarding any ^ character and anything to the right of it.
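
For what it's worth, the two clean-up steps described above (squeezing the spacing and dropping the control codes) could be sketched roughly like this. It is only a sketch under assumptions: strip_noise is a made-up helper name, and it assumes the codes are either real ESC bytes (0x1B) or the literal two characters ^[ as pasted in post #1. Instead of discarding everything to the right of the ^, it removes only sequences of the form ESC [ digits letter, which covers ^[[H and ^[[2J.

```shell
# strip_noise: hypothetical helper (the name is not from the thread).
# Reads lines on stdin, writes cleaned lines on stdout.
strip_noise() {
    esc=$(printf '\033')                       # a real ESC byte (0x1B)
    tr -s ' \t' '  ' |                         # map tabs to spaces, then squeeze runs of spaces
    sed -e "s/${esc}\[[0-9;]*[A-Za-z]//g" \
        -e 's/\^\[\[[0-9;]*[A-Za-z]//g' \
        -e 's/ *$//'                           # real ESC codes, caret-notation codes, trailing spaces
}
```

Usage would be something like strip_noise < hotroddata > cleaned. It does not join the wrapped lines; it only tidies each line.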

I would imagine you will be well fixed by the time I get to have a look in again tomorrow evening. Good luck. A shame my sed is dusty (I think sed would have been more appropriate for sorting the lines) and my awk is a couple of years off speed and in need of exercising.

How to preserve indentation of code in this editor without compromising it?

PAix

bigearsbilly 10-17-2007 09:31 AM

example input and output may help

PAix 10-17-2007 10:22 PM

Hi Lance, Too late to be of use I guess,

Output file:
Quote:

DEVCONFIG*143 143 INCOMING3 lpr -P INCOMING3
DEVCONFIG*144 144 PACKHP rsh ps22 11
DEVCONFIG*145 145 CT.SET3 lpr -P SETS3
DEVCONFIG*146 146 PACKHP.2 rsh ps21 9
DEVCONFIG*147 147 MOLDHP rsh ps6 7
DEVCONFIG*148 148 SHIP5 rsh ps1 8
DEVCONFIG*149 149 PACKHP.1 rsh ps6 8
DEVCONFIG*150 150 MEC2 rsh ps1 11
DEVCONFIG*151 151 CT.SET4 lpr -P SETS4
DEVCONFIG*152 152 PRINTRONIX1 rsh ps1 13
DEVCONFIG*153 153 SHIP1 rsh ps1 14
DEVCONFIG*154 154 RMHP rsh ps16 9
DEVCONFIG*155 155 PBINV cat >/data/public/pbinvoices.txt
DEVCONFIG*156 156 CT.SET5 lpr -P SETS5
DEVCONFIG*157 157 SETHP rsh ps25 16
DEVCONFIG*158 158 PEGHP rsh ps5 13
DEVCONFIG*159 159 POLYEHP rsh ps6 2
DEVCONFIG*160 160 EMAIL /data/scripts/email.pl
DEVCONFIG*161 161 PACKZB.4 lpr -P PACKZB.4
DEVCONFIG*162 162 HR rsh ps13 2
DEVCONFIG*163 163 ARHP rsh ps10 4
DEVCONFIG*164 164 ARVHP lpr -P ARVHP
DEVCONFIG*165 165 ARHP.5 rsh ps13 8
DEVCONFIG*166 166 ITADMIN lpr -P IT_ADMIN

The input file was shown in post #1, but a mailing showed it to have been substantially mangled by the forum browser window. It should have had multiple spaces between columns 1, 2, 3 and 4 (DEV, Queue Number, NAME, Assignment). The continuation lines of column 4 have significant leading space(?) fill, but don't rule out the possibility of tabs. I took my new input file from the mailing, which you may have seen, and called it hotroddata2. So here is my final code, which may not be overly clever but does the job just fine.

Quote:

#!/bin/sh

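cat hotroddata2 | tr -s "\t " " " | awk '
$1 ~ /DEVCO/  { if (NR != 1)
                    print ""         # finish the previous record
                printf "%s", $0      # start the new record, no newline yet
              }

# continuation line: drop the single leading space and append to the record
$1 !~ /DEVCO/ { printf "%s", substr($0, 2) }

END { print "" }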
' | awk '
{ split($0, newlin, "^");
print newlin[1] }
'
## From this point redirect or pipe stdout to the
## file or app of your choice

The file is passed into tr to squeeze multiple spaces (or tabs) into a single space.
This essentially cleans up the spacing in each DEVCO line and leaves the other lines with a single leading space that can be easily discarded.
The result of the tr is piped into the first of two awk scripts, where each DEVCO line first prints the newline that ends the previous record, and then the current line is printed without a newline.
Non-DEVCO lines are stripped of their leading space by printing the complete line beginning with the second character, without a newline. This has the effect of concatenating the column 4 assignment data onto its original DEVCO record.
The END pattern prints a newline to complete the final record.

This script's output is piped into the second awk script, where lines are split on a delimiter of “^”, which appears to be the lead character for the printer control codes that need to be removed. Only the part of the line prior to the “^” character is printed. It does mean that there is potentially a hanging space at the end of some lines that had printer control codes. Small price to pay.
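
The same steps can probably be folded into a single awk pass. The following is a rough sketch rather than a tested replacement: unwrap_devconfig is a made-up name, and it assumes that every record starts with DEVCONFIG at the beginning of a line, that any other non-empty line is a continuation of column 4, and that the control codes start with either a real ESC byte or the pasted ^[ form. Like the pipeline above, it joins continuation lines with no separator, so a wrap that happened to fall exactly on a space would lose that space.

```shell
# unwrap_devconfig: hypothetical helper (the name is not from the thread).
# Reads the LIST output on stdin, writes one record per line on stdout.
unwrap_devconfig() {
    awk -v esc="$(printf '\033')" '
    {
        i = index($0, esc)                # real ESC byte: keep only what precedes it
        if (i) $0 = substr($0, 1, i - 1)
        sub(/\^\[.*$/, "")                # caret notation (^[): same treatment
        gsub(/[ \t]+/, " ")               # squeeze runs of spaces and tabs
        sub(/^ /, ""); sub(/ $/, "")      # trim one leading and one trailing space
    }
    $0 == "" { next }                     # nothing left on this line
    /^DEVCONFIG/ {
        if (rec != "") print rec          # flush the previous record
        rec = $0
        next
    }
    rec != "" { rec = rec $0 }            # continuation: append with no separator
    END { if (rec != "") print rec }
    '
}
```

For example, unwrap_devconfig < hotroddata2 > unwrapped.txt. Lines that appear before the first DEVCONFIG record (page headers and the like) are silently dropped, and the trailing space left behind by a stripped control code is trimmed.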

hotrodmacman 10-18-2007 12:05 PM

Solved
 
A big thanks to all who contributed, especially to ghostdog74 for his awk script! I was able to get things going with the following script. It could most likely be cleaner, but it works. I wind up with output like:

3,DOC.SUPHP,rsh ps27 11
50,BALLOONZB,lpr -P BALLOONZB
97,CMI,lpr -P cmi
139,CT.SET2,lpr -P SETS2
4,SHIP7,rsh ps1 6
51,SALESHP.2,rsh ps19 12
98,SHIPZB4,rsh ps15 14
140,RECEIVING,rsh ps17 13
5,CT.SET7,lpr -P SETS7
52,INCOMING4,lpr -P INCOMING4
99,RDHP,lpr -P RDHP
141,SHIPSUPHP,lpr -P SHIPSUPHP
6,CUSTINTHP,rsh ps19 10
53,ZEBRA.TEST,rsh ps3 13
142,INCOMING2,lpr -P INCOMING2

Thanks again!

~Lance

#####################################
# Program to update printer mappings
# Lance Berrier
# October 2007
#####################################
WORKING_DIR="/scratch/printing"
PRINTER_FILE="printer.file.txt"
SPOOL_DIR="/usr/jspooler"
SPOOL_FILE="spoolers.txt"
TEMP_OUT="tmp.txt"
OUT_FILE="spoolers.csv"

cd $SPOOL_DIR
LIST jspool_log WITH QNUM = "DEVCONFIG]" QNUM QNAME QDEV > $WORKING_DIR/$SPOOL_FILE
cd $WORKING_DIR


sed -i.bak -e 's/PAGE.*$//g' -e 's/jspool_log.*$//g' -e 's/\x1B.*$//g' -e '/^$/d' $SPOOL_FILE

######################################
# individual sed statement, because I
# can't get it to work in the other
# one. (It deletes the last two lines
# of the file.)
######################################

sed -i.bak -e 'N;$!P;$!D;$d' $SPOOL_FILE


######################################
# ghostdog74's awk to get rid of the
# wrapping.
######################################

awk 'BEGIN{RS="\nDEV"}
NR==1{ for(i=1;i<=NF;i++){printf "%s ", $i};print "" }
NR>1{
    printf "DEV"
    for(i=1;i<=NF;i++){
        if ($i !~ /[\^\[]/) {
            printf "%s ", $i
        }
    }
    print " "
}' $SPOOL_FILE > $TEMP_OUT

#####################################
# Format the output into something
# useable
#####################################

awk 'BEGIN{OFS=","}
{
    if ($4=="cat")
    {
        print $2,$3,$4" "$5$6$7$8
    }
    else if ($NF ~ /pl$/)
    {
        print $2,$3,$4$5$6$7$8
    }
    else
    {
        print $2,$3,$4" "$5" "$6$7$8
    }
}' $TEMP_OUT > $OUT_FILE

rm -f $TEMP_OUT
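
The three-way branch in the last awk is there because the wrapped pieces come back as separate fields and have to be glued together case by case. If the records were already unwrapped with no space at the join (as in PAix's output), the CSV step might be written more generically, roughly as below. This is a sketch under assumptions: records_to_csv is a made-up name, the input is expected to have one DEVCONFIG record per line with no leading whitespace, and no CSV quoting is done, so an assignment containing a comma would need extra handling.

```shell
# records_to_csv: hypothetical helper (the name is not from the thread).
# Input:  DEVCONFIG*155 155 PBINV cat >/data/public/pbinvoices.txt
# Output: 155,PBINV,cat >/data/public/pbinvoices.txt
records_to_csv() {
    awk '
    /^DEVCONFIG/ {
        qnum = $2                         # Queue Number
        name = $3                         # NAME
        rest = $0
        # strip the first three fields; what remains is the Assignment
        sub(/^[^ \t]+[ \t]+[^ \t]+[ \t]+[^ \t]+[ \t]*/, "", rest)
        print qnum "," name "," rest
    }'
}
```

Used as records_to_csv < unwrapped.txt > spoolers.csv, where unwrapped.txt is a placeholder for wherever the unwrapped records were saved.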

hotrodmacman 10-18-2007 12:06 PM

PAix!

I wish I had seen your last post a little earlier! I tried it out and it worked great!

Thanks again!

~Lance


All times are GMT -5.