LinuxQuestions.org
Old 10-16-2007, 09:57 PM   #1
hotrodmacman
LQ Newbie
 
Registered: May 2006
Location: North Carolina
Distribution: Zenwalk
Posts: 3

Rep: Reputation: 0
sed/awk/grep for multiple line data


I've been banging my head up against sed/awk/grep for quite some time now trying to parse a text data file that comes out like so:

There are 4 columns:
DEV, Queue Number, NAME, Assignment

DEVCONFIG*143 143 INCOMING3 lpr -P INCOM
ING3
DEVCONFIG*144 144 PACKHP rsh ps22 11
DEVCONFIG*145 145 CT.SET3 lpr -P SETS3
DEVCONFIG*146 146 PACKHP.2 rsh ps21 9
DEVCONFIG*147 147 MOLDHP rsh ps6 7
DEVCONFIG*148 148 SHIP5 rsh ps1 8
DEVCONFIG*149 149 PACKHP.1 rsh ps6 8
DEVCONFIG*150 150 MEC2 rsh ps1 11
DEVCONFIG*151 151 CT.SET4 lpr -P SETS4
DEVCONFIG*152 152 PRINTRONIX1 rsh ps1 13
DEVCONFIG*153 153 SHIP1 rsh ps1 14 ^[[H^[[2J
DEVCONFIG*154 154 RMHP rsh ps16 9
DEVCONFIG*155 155 PBINV cat >/data/p
ublic/pbinvo
ices.txt
DEVCONFIG*156 156 CT.SET5 lpr -P SETS5
DEVCONFIG*157 157 SETHP rsh ps25 16
DEVCONFIG*158 158 PEGHP rsh ps5 13
DEVCONFIG*159 159 POLYEHP rsh ps6 2
DEVCONFIG*160 160 EMAIL /data/script
s/email.pl
DEVCONFIG*161 161 PACKZB.4 lpr -P PACKZ
B.4
DEVCONFIG*162 162 HR rsh ps13 2
DEVCONFIG*163 163 ARHP rsh ps10 4
DEVCONFIG*164 164 ARVHP lpr -P ARVHP
DEVCONFIG*165 165 ARHP.5 rsh ps13 8
DEVCONFIG*166 166 ITADMIN lpr -P IT_AD
MIN

Functionally I need to "unwrap" the data in column 4 (assignment) so that it prints all on one line, i.e. lpr -P IT_ADMIN, get rid of the empty spaces left in columns 1-3, and remove the control codes that print out after some of the lines ^[[H^[[2J.

This is just an excerpt from the file, the data in column 4 varies depending on printer assignment. The control codes are pagination codes that are put in the output by a LIST command on the data table in question.

Unfortunately, I do not know a way to pre-format this output. Is there anyone here familiar with jQL queries for jBASE, or that can give some quick and dirty PROC snippets that will do it?

I'm reasonably sure there is a way to do this with sed/awk, however I have been unable to find any relevant examples or a document that explains regular expressions such that I understand what is going on.

Thanks in advance!


--
Lance B.
 
Old 10-16-2007, 10:38 PM   #2
yongitz
Member
 
Registered: Nov 2005
Location: Davao City, Philippines
Distribution: RHEL, CentOS, Ubuntu, Mint
Posts: 139

Rep: Reputation: 20
Hi! Can you give a clear example of your expected output? Anyway, try the code below and see if it gets the job done.


Code:
awk '{print $4,$5,$6}' FILE
 
Old 10-16-2007, 10:46 PM   #3
matthewg42
Senior Member
 
Registered: Oct 2003
Location: UK
Distribution: Kubuntu 12.10 (using awesome wm though)
Posts: 3,530

Rep: Reputation: 63
You can use [code] tags to improve the readability of this sort of data (it uses a fixed width font and preserves whitespace).

I didn't quite understand what you want. I get the bit about un-word-wrapping lines like this:
Code:
DEVCONFIG*155 155 PBINV cat >/data/p
ublic/pbinvo
ices.txt
...but I didn't understand the part about the spaces in the first three columns and control codes.
 
Old 10-16-2007, 11:29 PM   #4
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,697
Blog Entries: 5

Rep: Reputation: 241
one way out of many...
Code:
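# A multi-character RS is a gawk/mawk extension; each record runs up to the next "\nDEV".
awk 'BEGIN{RS="\nDEV"}
# The first record still carries its leading "DEV".
NR==1{ for(i=1;i<=NF;i++){printf "%s ", $i};print "" }
NR>1{
	  printf "DEV"    # restore the prefix consumed by RS
	  for(i=1;i<=NF;i++){
	       # skip any field containing ^ or [ (the control codes)
	       if ($i !~ /[\^\[]/) {
		    printf "%s ", $i
	       }
	  }
	  print " "
}' "file"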
output:
Code:
# ./test.sh
DEVCONFIG*143 143 INCOMING3 lpr -P INCOM ING3
DEVCONFIG*144 144 PACKHP rsh ps22 11
DEVCONFIG*145 145 CT.SET3 lpr -P SETS3
DEVCONFIG*146 146 PACKHP.2 rsh ps21 9
DEVCONFIG*147 147 MOLDHP rsh ps6 7
DEVCONFIG*148 148 SHIP5 rsh ps1 8
DEVCONFIG*149 149 PACKHP.1 rsh ps6 8
DEVCONFIG*150 150 MEC2 rsh ps1 11
DEVCONFIG*151 151 CT.SET4 lpr -P SETS4
DEVCONFIG*152 152 PRINTRONIX1 rsh ps1 13
DEVCONFIG*153 153 SHIP1 rsh ps1 14
DEVCONFIG*154 154 RMHP rsh ps16 9
DEVCONFIG*155 155 PBINV cat >/data/p ublic/pbinvo ices.txt
DEVCONFIG*156 156 CT.SET5 lpr -P SETS5
DEVCONFIG*157 157 SETHP rsh ps25 16
DEVCONFIG*158 158 PEGHP rsh ps5 13
DEVCONFIG*159 159 POLYEHP rsh ps6 2
DEVCONFIG*160 160 EMAIL /data/script s/email.pl
DEVCONFIG*161 161 PACKZB.4 lpr -P PACKZ B.4
DEVCONFIG*162 162 HR rsh ps13 2
DEVCONFIG*163 163 ARHP rsh ps10 4
DEVCONFIG*164 164 ARVHP lpr -P ARVHP
DEVCONFIG*165 165 ARHP.5 rsh ps13 8
DEVCONFIG*166 166 ITADMIN lpr -P IT_AD MIN
I leave it to you to do the rest.
 
Old 10-16-2007, 11:55 PM   #5
PAix
Member
 
Registered: Jul 2007
Location: United Kingdom, W Mids
Distribution: SUSE 11.0 as of Nov 2008
Posts: 195

Rep: Reputation: 40
Quote:
#!/bin/sh

# Read until line complete. Anything after the first line is part of field 4 - Assignment

cat hotroddata | awk '
$1 !~ /DEVCONFIG/ { if ((fourzerotwo=="") && (fourzeroone==""))
fourzeroone = $0
if ((fourzeroone!="") && (fourzerotwo==""))
fourzerotwo = $0 }

$1 ~ /DEVCONFIG/ { print prime, fourzero fourzeroone fourzerotwo;
fourzero=fourzeroone=fourzerotwo="";
prime = ($1 " " $2 " " $3);
fourzero = $4 " " $5 " " $6 " " $7 " " $8 " " $9 }

END { if ((fourzerotwo != "") || (fourzeroone != ""))
print prime, fourzero fourzeroone fourzerotwo; }
' | less ## > output file

## I'm sure you will know what to do with the last line when you have it licked.
~
I regret that it is late in the UK and I have to be up in the morning so my solution is only partial due to lack of time on my part.

I copied your input data by cutting and pasting it and called it "hotroddata". So we can see your input data from your post and here is my output data which falls far short of your mark - at the moment.

DEVCONFIG*143 143 INCOMING3 lpr -P INCOM ING3ING3
DEVCONFIG*144 144 PACKHP rsh ps22 11
DEVCONFIG*145 145 CT.SET3 lpr -P SETS3
DEVCONFIG*146 146 PACKHP.2 rsh ps21 9
DEVCONFIG*147 147 MOLDHP rsh ps6 7
DEVCONFIG*148 148 SHIP5 rsh ps1 8
DEVCONFIG*149 149 PACKHP.1 rsh ps6 8
DEVCONFIG*150 150 MEC2 rsh ps1 11
DEVCONFIG*151 151 CT.SET4 lpr -P SETS4
DEVCONFIG*152 152 PRINTRONIX1 rsh ps1 13
DEVCONFIG*153 153 SHIP1 rsh ps1 14 ^[[H^[[2J
DEVCONFIG*154 154 RMHP rsh ps16 9
DEVCONFIG*155 155 PBINV cat >/data/p ublic/pbinvoublic/pbinvo
DEVCONFIG*156 156 CT.SET5 lpr -P SETS5
DEVCONFIG*157 157 SETHP rsh ps25 16
DEVCONFIG*158 158 PEGHP rsh ps5 13
DEVCONFIG*159 159 POLYEHP rsh ps6 2
DEVCONFIG*160 160 EMAIL /data/script s/email.pls/email.pl
DEVCONFIG*161 161 PACKZB.4 lpr -P PACKZ B.4B.4
DEVCONFIG*162 162 HR rsh ps13 2
DEVCONFIG*163 163 ARHP rsh ps10 4
DEVCONFIG*164 164 ARVHP lpr -P ARVHP
DEVCONFIG*165 165 ARHP.5 rsh ps13 8
DEVCONFIG*166 166 ITADMIN lpr -P IT_AD MINMIN


On the basis that all lines began with DEVCONFIG and that I was likely to have an uncommitted print at the end I proceeded.

Question: did you want the first line of your data to actually look like this:

DEVCONFIG*143143INCOMING3 lpr -P INCOMING3ING3

This is what I think Matthew was confused about, and what I had missed: do the first three columns want concatenating without white space?
If that's correct, then removing the spaces in the assignment to "prime" should fix that no bother.

I took a blunt instrument approach to ensuring that elements of column 4 (5, 6 ,7, 8 etc) didn't lose their proper spaces. This of course caused a bit of a problem in that it put spaces where the data was originally broken over lines.

The multiple spaces I didn't see as a problem, as I would just pipe the output through tr to single-up multiple spaces. The space prior to the broken line (see AD MINMIN; whoops, that extra MIN came from sloppy thinking in the END block) I would get rid of by writing an awk function to process things slightly more elegantly.

The control sequences ^[[H^[[2J I would get rid of in my function by discarding any ^ character and anything to the right of it.
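A note on the idea above: ^[[H and ^[[2J are the caret-notation rendering of the terminal sequences ESC [ H (cursor home) and ESC [ 2 J (clear screen), so discarding everything from the first ^ onward works on pasted text but would also eat a genuine ^ in the data. The following is only a minimal sketch of a narrower filter; the function name strip_ctrl is made up for illustration, and it assumes the codes are simple ESC [ ... letter sequences, either as real ESC bytes or as the literal ^[ text seen in the post.

```shell
#!/bin/sh
# strip_ctrl: hypothetical helper, not from the thread.
# Reads stdin, removes ESC[...X sequences (real ESC byte or the
# literal caret form "^[[...X"), then trims trailing whitespace.
strip_ctrl() {
    esc=$(printf '\033')    # the real ESC byte, produced portably
    sed -e "s/${esc}\[[0-9;]*[A-Za-z]//g" \
        -e 's/\^\[\[[0-9;]*[A-Za-z]//g' \
        -e 's/[[:space:]]*$//'
}
```

Usage would be something like strip_ctrl < hotroddata > cleaned, with the rest of the processing reading from the cleaned file.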

I would imagine you will be well fixed by the time I get to have a look in again tomorrow evening. Good luck. A shame my sed is dusty (I think sed would have been more appropriate for sorting the lines) and my awk is a couple of years off speed and in need of exercising.

How to preserve indentation of code in this editor without compromising it?

PAix
 
Old 10-17-2007, 09:31 AM   #6
bigearsbilly
Senior Member
 
Registered: Mar 2004
Location: england
Distribution: FreeBSD, Debian, Mint, Puppy
Posts: 3,314

Rep: Reputation: 175
example input and output may help
 
Old 10-17-2007, 10:22 PM   #7
PAix
Member
 
Registered: Jul 2007
Location: United Kingdom, W Mids
Distribution: SUSE 11.0 as of Nov 2008
Posts: 195

Rep: Reputation: 40
Hi Lance, Too late to be of use I guess,

Output file:
Quote:
DEVCONFIG*143 143 INCOMING3 lpr -P INCOMING3
DEVCONFIG*144 144 PACKHP rsh ps22 11
DEVCONFIG*145 145 CT.SET3 lpr -P SETS3
DEVCONFIG*146 146 PACKHP.2 rsh ps21 9
DEVCONFIG*147 147 MOLDHP rsh ps6 7
DEVCONFIG*148 148 SHIP5 rsh ps1 8
DEVCONFIG*149 149 PACKHP.1 rsh ps6 8
DEVCONFIG*150 150 MEC2 rsh ps1 11
DEVCONFIG*151 151 CT.SET4 lpr -P SETS4
DEVCONFIG*152 152 PRINTRONIX1 rsh ps1 13
DEVCONFIG*153 153 SHIP1 rsh ps1 14
DEVCONFIG*154 154 RMHP rsh ps16 9
DEVCONFIG*155 155 PBINV cat >/data/public/pbinvoices.txt
DEVCONFIG*156 156 CT.SET5 lpr -P SETS5
DEVCONFIG*157 157 SETHP rsh ps25 16
DEVCONFIG*158 158 PEGHP rsh ps5 13
DEVCONFIG*159 159 POLYEHP rsh ps6 2
DEVCONFIG*160 160 EMAIL /data/scripts/email.pl
DEVCONFIG*161 161 PACKZB.4 lpr -P PACKZB.4
DEVCONFIG*162 162 HR rsh ps13 2
DEVCONFIG*163 163 ARHP rsh ps10 4
DEVCONFIG*164 164 ARVHP lpr -P ARVHP
DEVCONFIG*165 165 ARHP.5 rsh ps13 8
DEVCONFIG*166 166 ITADMIN lpr -P IT_ADMIN
The input file was shown in post #1, but a mailing showed it to have been substantially mangled by the forum browser window. It should have had multiple spaces between columns 1, 2, 3 and 4 (DEV, Queue Number, NAME, Assignment). The continuation lines of column 4 have significant leading space(?) fill, but don't rule out the possibility of tabs. I took my new input file from the mailing, which you may have seen, and called it hotroddata2. So here is my final code, which may not be overly clever but does the job just fine.

Quote:
#!/bin/sh

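cat hotroddata2 | tr -s "\t " " " | awk '
# A DEVCO line starts a new record: close the previous one first.
$1 ~ /DEVCO/ { if (NR!=1)
print ""

printf "%s", $0
}

# Continuation line: drop the single leading space left by tr and append.
$1 !~ /DEVCO/ { printf "%s", substr($0, 2) }

END { print "" }
' | awk '
# Keep only the text before the first "^" (start of the control codes).
{ split($0, newlin, "^");
print newlin[1] }
'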
## From this point redirect or pipe stdout to the
## file or app of your choice
The file is passed into tr to strip multiple spaces (or tabs).
This essentially cleans up the spacing in each DEVCO line and leaves other lines with a single leading space that can be easily discarded.
The result of the tr is piped into the first of two awk scripts, where DEVCO lines print the newline for any previous records and the current line is printed without newline.
Non-DEVCO lines are stripped of their leading space by printing the complete line beginning with the second character, without newline. This has the effect of concatenating column 4 assignment data onto its original DEVCO record.
The END pattern prints a newline to complete the final record.

This script output is piped into the second awk script, where lines are split on a delimiter of “^”, which appears to be the lead character for the printer control codes that need to be removed. Only the part of the line prior to the “^” character is printed. It does mean that there is potentially a hanging space at the end of some lines that had printer control codes. Small price to pay.
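The joining step described above can also be sketched as a single awk pass without tr. This is an illustration rather than a replacement for the script in the post: the function name unwrap is hypothetical, and it assumes, as the approach above does, that every wrap falls in the middle of a token, so continuation text is appended with no space. Control codes are not handled here.

```shell
#!/bin/sh
# unwrap: hypothetical helper, not from the thread.
# Joins continuation lines onto the DEVCONFIG record they belong to.
unwrap() {
    awk '
    $1 ~ /^DEVCONFIG/ {
        if (rec != "") print rec    # flush the previous record
        $1 = $1                     # rebuild $0 with single spaces
        rec = $0
        next
    }
    {
        sub(/^[ \t]+/, "")          # drop the leading fill
        sub(/[ \t]+$/, "")          # and any trailing blanks
        rec = rec $0                # append with no separator
    }
    END { if (rec != "") print rec }
    '
}
```

Usage would be unwrap < hotroddata2. A wrap that happened to fall exactly on a space (for example between "ps1" and "14") would lose that space, which is the trade-off of this assumption.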

Last edited by PAix; 10-17-2007 at 11:53 PM.
 
Old 10-18-2007, 12:05 PM   #8
hotrodmacman
LQ Newbie
 
Registered: May 2006
Location: North Carolina
Distribution: Zenwalk
Posts: 3

Original Poster
Rep: Reputation: 0
Solved

A big thanks to all who contributed, especially to ghostdog74 for his awk script! I was able to get things going with the script at the end of this post. It could most likely be cleaner, but it works. I wind up with output like:

3,DOC.SUPHP,rsh ps27 11
50,BALLOONZB,lpr -P BALLOONZB
97,CMI,lpr -P cmi
139,CT.SET2,lpr -P SETS2
4,SHIP7,rsh ps1 6
51,SALESHP.2,rsh ps19 12
98,SHIPZB4,rsh ps15 14
140,RECEIVING,rsh ps17 13
5,CT.SET7,lpr -P SETS7
52,INCOMING4,lpr -P INCOMING4
99,RDHP,lpr -P RDHP
141,SHIPSUPHP,lpr -P SHIPSUPHP
6,CUSTINTHP,rsh ps19 10
53,ZEBRA.TEST,rsh ps3 13
142,INCOMING2,lpr -P INCOMING2

Thanks again!

~Lance

#####################################
# Program to update printer mappings
# Lance Berrier
# October 2007
#####################################
WORKING_DIR="/scratch/printing"
PRINTER_FILE="printer.file.txt"
SPOOL_DIR="/usr/jspooler"
SPOOL_FILE="spoolers.txt"
TEMP_OUT="tmp.txt"
OUT_FILE="spoolers.csv"

cd $SPOOL_DIR
LIST jspool_log WITH QNUM = "DEVCONFIG]" QNUM QNAME QDEV > $WORKING_DIR/$SPOOL_FILE
cd $WORKING_DIR


sed -i.bak -e 's/PAGE.*$//g' -e 's/jspool_log.*$//g' -e 's/\x1B.*$//g' -e '/^$/d' $SPOOL_FILE

######################################
# individual sed statement, because I
# can't get it to work in the other
# one.
######################################

sed -i.bak -e 'N;$!P;$!D;$d' $SPOOL_FILE


######################################
# ghostdog74's awk to get rid of the
# wrapping.
######################################

awk 'BEGIN{RS="\nDEV"}
NR==1{ for(i=1;i<=NF;i++){printf $i" "};print "" }
NR>1{
printf "DEV"
for(i=1;i<=NF;i++){
if ($i !~ /[\^\[]/) {
printf $i" "
}
}
print " "
}' $SPOOL_FILE > $TEMP_OUT

#####################################
# Format the output into something
# useable
#####################################

awk 'BEGIN{OFS=","}
{
if ($4=="cat")
{
print $2,$3,$4" "$5$6$7$8
}
else if ($NF ~ /pl$/)
{
print $2,$3,$4$5$6$7$8
}
else
{
print $2,$3,$4" "$5" "$6$7$8
}
}' $TEMP_OUT > $OUT_FILE

rm -f $TEMP_OUT
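
The last awk step in the script above special-cases "cat" and ".pl" commands because the wrapped pieces were still separated by spaces. If the records have already been joined onto one line each (as in the output shown in post #7), a simpler formatter may be enough. The sketch below is only an illustration under that assumption; the name to_csv is hypothetical, and it does no CSV quoting, so a command containing a comma would still need handling.

```shell
#!/bin/sh
# to_csv: hypothetical helper, not from the thread.
# Input:  DEVCONFIG*N  N  NAME  command...   (one record per line)
# Output: N,NAME,command...
to_csv() {
    awk 'NF >= 4 {
        cmd = $0
        # remove the first three whitespace-delimited fields
        sub(/^[ \t]*[^ \t]+[ \t]+[^ \t]+[ \t]+[^ \t]+[ \t]+/, "", cmd)
        sub(/[ \t]+$/, "", cmd)     # trim trailing blanks
        print $2 "," $3 "," cmd
    }'
}
```

Because the command is taken as "everything after the third field", its internal spacing is kept as-is and no per-command rules are needed.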
 
Old 10-18-2007, 12:06 PM   #9
hotrodmacman
LQ Newbie
 
Registered: May 2006
Location: North Carolina
Distribution: Zenwalk
Posts: 3

Original Poster
Rep: Reputation: 0
PAix!

I wish I had seen your last post a little earlier! I tried it out and it worked great!

Thanks again!

~Lance
 
  


All times are GMT -5.
