sed/awk/grep for multiple line data
I've been banging my head up against sed/awk/grep for quite some time now trying to parse a text data file that comes out like so:
There are 4 columns: DEV, Queue Number, NAME, Assignment

DEVCONFIG*143  143  INCOMING3    lpr -P INCOM
                                 ING3
DEVCONFIG*144  144  PACKHP       rsh ps22 11
DEVCONFIG*145  145  CT.SET3      lpr -P SETS3
DEVCONFIG*146  146  PACKHP.2     rsh ps21 9
DEVCONFIG*147  147  MOLDHP       rsh ps6 7
DEVCONFIG*148  148  SHIP5        rsh ps1 8
DEVCONFIG*149  149  PACKHP.1     rsh ps6 8
DEVCONFIG*150  150  MEC2         rsh ps1 11
DEVCONFIG*151  151  CT.SET4      lpr -P SETS4
DEVCONFIG*152  152  PRINTRONIX1  rsh ps1 13
DEVCONFIG*153  153  SHIP1        rsh ps1 14
^[[H^[[2J
DEVCONFIG*154  154  RMHP         rsh ps16 9
DEVCONFIG*155  155  PBINV        cat >/data/p
                                 ublic/pbinvo
                                 ices.txt
DEVCONFIG*156  156  CT.SET5      lpr -P SETS5
DEVCONFIG*157  157  SETHP        rsh ps25 16
DEVCONFIG*158  158  PEGHP        rsh ps5 13
DEVCONFIG*159  159  POLYEHP      rsh ps6 2
DEVCONFIG*160  160  EMAIL        /data/script
                                 s/email.pl
DEVCONFIG*161  161  PACKZB.4     lpr -P PACKZ
                                 B.4
DEVCONFIG*162  162  HR           rsh ps13 2
DEVCONFIG*163  163  ARHP         rsh ps10 4
DEVCONFIG*164  164  ARVHP        lpr -P ARVHP
DEVCONFIG*165  165  ARHP.5       rsh ps13 8
DEVCONFIG*166  166  ITADMIN      lpr -P IT_AD
                                 MIN

Functionally I need to "unwrap" the data in column 4 (assignment) so that it prints all on one line, i.e. lpr -P IT_ADMIN, get rid of the empty spaces left in columns 1-3, and remove the control codes that print out after some of the lines ^[[H^[[2J.

This is just an excerpt from the file, the data in column 4 varies depending on printer assignment. The control codes are pagination codes that are put in the output by a LIST command on the data table in question. Unfortunately, I do not know a way to pre-format this output. Is there anyone here familiar with jQL queries for jBASE, or that can give some quick and dirty PROC snippets that will do it?

I'm reasonably sure there is a way to do this with sed/awk, however I have been unable to find any relevant examples or a document that explains regular expressions such that I understand what is going on.

Thanks in advance!
-- Lance B.
Hi! Can you give a clear example of your expected output? Anyway, try the code below and see if it gets the job done.
Code:
awk '{print $4,$5,$6}' FILE
You can use [code] tags to improve the readability of this sort of data (it uses a fixed width font and preserves whitespace).
I didn't quite understand what you want. I get the bit about un-word-wrapping lines like this:
Code:
DEVCONFIG*155 155 PBINV cat >/data/p
one way out of many...
Code:
awk 'BEGIN{RS="\nDEV"}
Code:
# ./test.sh
Quote:
I regret that it is late in the UK and I have to be up in the morning, so my solution is only partial due to lack of time on my part. I copied your input data by cutting and pasting it and called it "hotroddata". So we can see your input data from your post, and here is my output data, which falls far short of your mark at the moment.

DEVCONFIG*143 143 INCOMING3 lpr -P INCOM ING3ING3
DEVCONFIG*144 144 PACKHP rsh ps22 11
DEVCONFIG*145 145 CT.SET3 lpr -P SETS3
DEVCONFIG*146 146 PACKHP.2 rsh ps21 9
DEVCONFIG*147 147 MOLDHP rsh ps6 7
DEVCONFIG*148 148 SHIP5 rsh ps1 8
DEVCONFIG*149 149 PACKHP.1 rsh ps6 8
DEVCONFIG*150 150 MEC2 rsh ps1 11
DEVCONFIG*151 151 CT.SET4 lpr -P SETS4
DEVCONFIG*152 152 PRINTRONIX1 rsh ps1 13
DEVCONFIG*153 153 SHIP1 rsh ps1 14 ^[[H^[[2J
DEVCONFIG*154 154 RMHP rsh ps16 9
DEVCONFIG*155 155 PBINV cat >/data/p ublic/pbinvoublic/pbinvo
DEVCONFIG*156 156 CT.SET5 lpr -P SETS5
DEVCONFIG*157 157 SETHP rsh ps25 16
DEVCONFIG*158 158 PEGHP rsh ps5 13
DEVCONFIG*159 159 POLYEHP rsh ps6 2
DEVCONFIG*160 160 EMAIL /data/script s/email.pls/email.pl
DEVCONFIG*161 161 PACKZB.4 lpr -P PACKZ B.4B.4
DEVCONFIG*162 162 HR rsh ps13 2
DEVCONFIG*163 163 ARHP rsh ps10 4
DEVCONFIG*164 164 ARVHP lpr -P ARVHP
DEVCONFIG*165 165 ARHP.5 rsh ps13 8
DEVCONFIG*166 166 ITADMIN lpr -P IT_AD MINMIN

I proceeded on the basis that all lines began with DEVCONFIG and that I was likely to have an uncommitted print at the end.

Question: did you want the first line of your data to actually look like this?

DEVCONFIG*143143INCOMING3 lpr -P INCOMING3ING3

This is what I think Mathew was confused about, and what I had missed: should the first three columns be concatenated without white space? If that's correct, then removing the spaces in the assignment to "prime" should fix that no bother. I took a blunt instrument approach to ensuring that the elements of column 4 (fields 5, 6, 7, 8 etc.) didn't lose their proper spaces.

This of course caused a bit of a problem, in that it put spaces where the data was originally broken over lines. I didn't see the multiple spaces as a problem, as I would just pipe the output through tr to single-up multiple spaces. The space prior to the broken line (see AD MINMIN; whoops, that extra MIN came from sloppy thinking in the END tag) I would get rid of by writing an awk function to process things slightly more elegantly. The control sequences ^[[H^[[2J I would get rid of in my function by discarding any ^ character and anything to the right of it.

I would imagine you will be well fixed by the time I get to have a look in again tomorrow evening. Good luck. A shame my sed is dusty (I think sed would have been more appropriate for sorting the lines) and my awk is a couple of years off speed and in need of exercising.

How to preserve indentation of code in this editor without compromising it?

PAix
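The two clean-up steps described in that post (squeezing runs of spaces with tr, and discarding a ^ plus everything to its right) might look roughly like the sketch below. This is only an illustration, not PAix's actual code: the function name clean_line is made up, and it assumes the control codes appear in the file as the printed characters ^[[H^[[2J. If the file holds real escape bytes instead, the pattern would have to match that byte rather than a caret.

```shell
#!/bin/sh
# clean_line is a hypothetical helper name, not taken from the thread.
# Assumption: control codes show up as the literal text "^[[H^[[2J".
clean_line() {
    # 1. drop any spaces before the first "^", the "^" itself, and the rest of the line
    # 2. squeeze each run of spaces down to a single space
    sed 's/ *\^.*$//' | tr -s ' '
}

printf '%s\n' 'DEVCONFIG*153   153  SHIP1   rsh ps1 14 ^[[H^[[2J' | clean_line
```

Doing the sed step first means the spaces in front of the control code are removed along with it, so no trailing space is left behind.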
example input and output may help
Hi Lance, Too late to be of use I guess,
Output file: Quote:
Quote:
This essentially cleans up the spacing in each DEVCO line and leaves other lines with a single leading space that can be easily discarded. The result of the tr is piped into the first of two awk scripts, where DEVCO lines print the newline for any previous record and the current line is printed without a newline. Non-DEVCO lines are stripped of their leading space by printing the complete line beginning with the second character, again without a newline. This has the effect of concatenating the column 4 assignment data onto its original DEVCO record. The END pattern prints a newline to complete the final record. This script's output is piped into the second awk script, where lines are split on a delimiter of "^", which appears to be the lead character for the printer control codes that need to be removed. Only the part of the line prior to the "^" character is printed. It does mean that there is potentially a hanging space at the end of some lines that had printer control codes. Small price to pay.
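Since the script itself did not survive in this post, here is a minimal sketch of a pipeline matching the description above. It is a reconstruction under assumptions, not the original code: the function names sample_input and unwrap and the sample data are made up for illustration, continuation lines are assumed to be indented with spaces, and the control codes are assumed to be the printed characters ^[[H^[[2J. The last stage uses index() rather than a "^" field separator, which gives the same "keep everything before the first ^" result without depending on how a given awk treats ^ as a separator.

```shell
#!/bin/sh
# Hypothetical sample data modelled on the excerpt in the first post.
sample_input() {
    cat <<'EOF'
DEVCONFIG*153  153  SHIP1        rsh ps1 14 ^[[H^[[2J
DEVCONFIG*155  155  PBINV        cat >/data/p
                                 ublic/pbinvo
                                 ices.txt
DEVCONFIG*166  166  ITADMIN      lpr -P IT_AD
                                 MIN
EOF
}

unwrap() {
    # Stage 1: squeeze runs of spaces; continuation lines keep one leading space.
    tr -s ' ' |
    # Stage 2: a DEVCO line closes the previous record and starts a new one;
    # any other line is appended from its second character onward.
    awk '
        /^DEVCO/ { if (NR > 1) printf "\n"; printf "%s", $0; next }
                 { printf "%s", substr($0, 2) }
        END      { printf "\n" }
    ' |
    # Stage 3: keep only the text before the first "^" (the control codes).
    awk '{ i = index($0, "^"); if (i) $0 = substr($0, 1, i - 1); print }'
}

sample_input | unwrap
```

As the post says, a line that carried control codes keeps one trailing space with this approach.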
Solved
A big thanks to all who contributed, especially to ghostdog74 for his awk script! I was able to get things going with the following script. It could most likely be cleaner, but it works. I wind up with output like:
3,DOC.SUPHP,rsh ps27 11
50,BALLOONZB,lpr -P BALLOONZB
97,CMI,lpr -P cmi
139,CT.SET2,lpr -P SETS2
4,SHIP7,rsh ps1 6
51,SALESHP.2,rsh ps19 12
98,SHIPZB4,rsh ps15 14
140,RECEIVING,rsh ps17 13
5,CT.SET7,lpr -P SETS7
52,INCOMING4,lpr -P INCOMING4
99,RDHP,lpr -P RDHP
141,SHIPSUPHP,lpr -P SHIPSUPHP
6,CUSTINTHP,rsh ps19 10
53,ZEBRA.TEST,rsh ps3 13
142,INCOMING2,lpr -P INCOMING2

Thanks again!
~Lance

Code:
#####################################
# Program to update printer mappings
# Lance Berrier
# October 2007
#####################################

WORKING_DIR="/scratch/printing"
PRINTER_FILE="printer.file.txt"
SPOOL_DIR="/usr/jspooler"
SPOOL_FILE="spoolers.txt"
TEMP_OUT="tmp.txt"
OUT_FILE="spoolers.csv"

cd $SPOOL_DIR
LIST jspool_log WITH QNUM = "DEVCONFIG]" QNUM QNAME QDEV > $WORKING_DIR/$SPOOL_FILE
cd $WORKING_DIR

sed -i.bak -e 's/PAGE.*$//g' -e 's/jspool_log.*$//g' -e 's/\x1B.*$//g' -e '/^$/d' $SPOOL_FILE

######################################
# individual sed statement, because I
# can't get it to work in the other
# one.
######################################
sed -i.bak -e 'N;$!P;$!D;$d' $SPOOL_FILE

######################################
# ghostdog74's awk to get rid of the
# wrapping.
######################################
awk 'BEGIN{RS="\nDEV"}
NR==1{
  for(i=1;i<=NF;i++){printf $i" "};print ""
}
NR>1{
  printf "DEV"
  for(i=1;i<=NF;i++){
    if ($i !~ /[\^\[]/) {
      printf $i" "
    }
  }
  print " "
}' $SPOOL_FILE > $TEMP_OUT

#####################################
# Format the output into something
# useable
#####################################
awk 'BEGIN{OFS=","}
{
  if ($4=="cat") {
    print $2,$3,$4" "$5$6$7$8
  } else if ($NF ~ /pl$/) {
    print $2,$3,$4$5$6$7$8
  } else {
    print $2,$3,$4" "$5" "$6$7$8
  }
}' $TEMP_OUT > $OUT_FILE

rm -f $TEMP_OUT
PAix!
I wish I had seen your last post a little earlier! I tried it out and it worked great! Thanks again! ~Lance