need to convert a file like this. basically just need to identify a line that starts with "ID", read that line into a variable, then output. continue to loop until eof.
$X,$X
$X,data from line 2
$X,data from line 3
$X,data from line 4
then reset variable when "ID" is found again, then repeat, etc.
xxxxx, yyyyy, zzzzz here are just random data, but each is a whole line.
(input file)
ID = xyz name = abc
xxxxxxxxxxxxxxxxxxxxxxxx
yyyyyyyyyyyyyyyyyyyyyyyy
zzzzzzzzzzzzzzzzzzzzzzzz
ID = THE name = band
xxxxxxxxxxxxxxxxxxxxxxxx
yyyyyyyyyyyyyyyyyyyyyyyy
zzzzzzzzzzzzzzzzzzzzzzzz
(desired output file)
ID = xyz name = abc,ID = xyz name = abc
ID = xyz name = abc,xxxxxxxxxxxxxxxxxxxxxxxx
ID = xyz name = abc,yyyyyyyyyyyyyyyyyyyyyyyy
ID = xyz name = abc,zzzzzzzzzzzzzzzzzzzzzzzz
ID = THE name = band,ID = THE name = band
ID = THE name = band,xxxxxxxxxxxxxxxxxxxxxxxx
ID = THE name = band,yyyyyyyyyyyyyyyyyyyyyyyy
ID = THE name = band,zzzzzzzzzzzzzzzzzzzzzzzz
cat in
ID = xyz name = abc
1xxxxxxxxxxxxxxxxxxxxxxx
1yyyyyyyyyyyyyyyyyyyyyyy
1zzzzzzzzzzzzzzzzzzzzzzz
ID = THE name = band
2xxxxxxxxxxxxxxxxxxxxxxx
2yyyyyyyyyyyyyyyyyyyyyyy
2zzzzzzzzzzzzzzzzzzzzzzz
awk '/^ID/ {id=$0} {printf "%s,%s\n",id,$0}' <in
ID = xyz name = abc,ID = xyz name = abc
ID = xyz name = abc,1xxxxxxxxxxxxxxxxxxxxxxx
ID = xyz name = abc,1yyyyyyyyyyyyyyyyyyyyyyy
ID = xyz name = abc,1zzzzzzzzzzzzzzzzzzzzzzz
ID = THE name = band,ID = THE name = band
ID = THE name = band,2xxxxxxxxxxxxxxxxxxxxxxx
ID = THE name = band,2yyyyyyyyyyyyyyyyyyyyyyy
ID = THE name = band,2zzzzzzzzzzzzzzzzzzzzzzz
i forgot to say that the input file has lines before "^ID" and those need to be skipped (# of lines is random). input file also has blank lines in random places, so i need to also skip any blank lines.
as example:
(input file)
Code:
junk filler random data yada yada yada
junk filler // nothing random data yada yada yada
junk filler random data \\ sky is blue yada yada yada
junk filler **(texas should have won) random data yada yada yada
ID = xyz name = abc
xxxxxxx xxxxxxx = xxxxxxxxxx
yyy = yyyyyyyyyy = yyyyyyyyyyy
zzzzzzzzz = zzzzzzzz(YSYR) zzzzzzz
ID = THE name = band
xxxxx (YSTSTS) xxxxxxxxxxxxxxxxxxx
yyyyyyyyyyyy\=(YHSGST) yyyyyyyyyyyy
zzzzzz = 09/22/11 zzzzzzzzzzzzzzzzzz
using (awk '/^ID/ {id=$0}$0 = id","$0' file) this is what i get
Code:
,junk filler random data yada yada yada
,junk filler // nothing random data yada yada yada
,junk filler random data \\ sky is blue yada yada yada
,junk filler **(texas should have won) random data yada yada yada
ID = xyz name = abc,ID = xyz name = abc
ID = xyz name = abc,xxxxxxx xxxxxxx = xxxxxxxxxx
ID = xyz name = abc,yyy = yyyyyyyyyy = yyyyyyyyyyy
ID = xyz name = abc,zzzzzzzzz = zzzzzzzz(YSYR) zzzzzzz
ID = xyz name = abc,
ID = THE name = band,ID = THE name = band
ID = THE name = band,xxxxx (YSTSTS) xxxxxxxxxxxxxxxxxxx
ID = THE name = band,yyyyyyyyyyyy\=(YHSGST) yyyyyyyyyyyy
ID = THE name = band,zzzzzz = 09/22/11 zzzzzzzzzzzzzzzzzz
Last edited by Linux_Kidd; 11-03-2011 at 12:37 PM.
Add condition "print only if id is set" to grail's awk command:
Code:
awk '/^ID/ {id=$0} id && $0 = id","$0' file
i still get wrong output (it doesn't skip the blank lines)
Code:
[root@host ~]$ more test3.txt
junk filler random data yada yada yada
junk filler // nothing random data yada yada yada
junk filler random data \\ sky is blue yada yada yada
junk filler **(texas should have won) random data yada yada yada
ID = xyz name = abc
xxxxxxx xxxxxxx = xxxxxxxxxx
yyy = yyyyyyyyyy = yyyyyyyyyyy
zzzzzzzzz = zzzzzzzz(YSYR) zzzzzzz
ID = THE name = band
xxxxx (YSTSTS) xxxxxxxxxxxxxxxxxxx
yyyyyyyyyyyy\=(YHSGST) yyyyyyyyyyyy
zzzzzz = 09/22/11 zzzzzzzzzzzzzzzzzz
[root@host ~]$ awk '/^ID/ {id=$0} id && $0 = id","$0' test3.txt |more
ID = xyz name = abc,ID = xyz name = abc
ID = xyz name = abc,xxxxxxx xxxxxxx = xxxxxxxxxx
ID = xyz name = abc,yyy = yyyyyyyyyy = yyyyyyyyyyy
ID = xyz name = abc,zzzzzzzzz = zzzzzzzz(YSYR) zzzzzzz
ID = xyz name = abc,
ID = THE name = band,ID = THE name = band
ID = THE name = band,xxxxx (YSTSTS) xxxxxxxxxxxxxxxxxxx
ID = THE name = band,yyyyyyyyyyyy\=(YHSGST) yyyyyyyyyyyy
ID = THE name = band,zzzzzz = 09/22/11 zzzzzzzzzzzzzzzzzz
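One possible way around this, as a sketch only: the one-liner still prints blank lines because on an empty line the assignment `$0 = id","$0` produces the non-empty string "ID ...,", which awk treats as true. Testing NF as well (it is 0 for empty or whitespace-only lines with the default field separator) and printing explicitly avoids that. The function name skip_blank_and_preamble is made up for illustration.

```shell
# Sketch: prefix every non-blank line with the most recent "ID" line,
# ignoring everything before the first ID. The wrapper name is hypothetical.
skip_blank_and_preamble() {
    awk '/^ID/ { id = $0 } id != "" && NF { print id "," $0 }' "$@"
}
```

It could be run as `skip_blank_and_preamble test3.txt`, or fed from a pipe with no file argument.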
i was trying something like this, but can't get the ELSE part to work as desired (that is, know if "ID" was already found by using variable)
Code:
#!/bin/awk -f
BEGIN {
    OFS=",";
}
{
    if ( $1 == "ID" ) {
        id=$0;
        print $0,$0;
    }
    else {
        if (id contains "ID") {
            print id,$0;
        }
    }
}
Last edited by Linux_Kidd; 11-03-2011 at 01:01 PM.
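A note on why the ELSE branch in the script above misbehaves, with a hedged sketch of one way to write it. awk has no `contains` operator, so `id contains "ID"` is parsed as string concatenation of id, an unset variable named contains, and "ID". That result is never empty, so the test is always true, even before any ID has been seen. Checking that id is non-empty (and that the line has at least one field) is one way to get the intended behaviour. The wrapper name prefix_lines is an assumption for illustration.

```shell
# Sketch: same if/else structure as the script above, with a test that
# only passes once an ID line has been stored. Wrapper name is hypothetical.
prefix_lines() {
    awk '
    BEGIN { OFS = "," }
    {
        if ($1 == "ID") {
            id = $0              # remember the current ID line
            print $0, $0
        } else if (id != "" && NF > 0) {
            print id, $0         # data line after an ID; blank lines are skipped
        }
    }
    ' "$@"
}
```

`id ~ /^ID/` or `index(id, "ID") > 0` would also work as the test if a literal "contains ID" check is preferred.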
i did some playing around with awk script, came up with this. it ignores all lines up until it finds $1=ID, and also ignores empty lines. seems to work ok. i will need to manually chop out of the output file a few lines at the end since my input file has no definitive marker, but no big deal. any way to simplify?
Is there more junk mixed in through the file after ID is discovered for the first time?
Are there always blank lines between IDs?
It helps if you can describe your input data more if we are to mold the solution.
it's an output file from CA Top Secret (mainframe report). i cannot find definitive patterns of blank lines or definitive markers between the junk and the 1st "ID". after the 1st ID the data flows ID-data-data-data ID-data-data ID-data-data-data-data-data, etc, with some blank lines in there. let me see if i can sanitize a portion of my real file (it's a large file) and i will post it.
thnx.
This seems overly complicated. Let me see if I understand the file structure:
1. Any amount of crap but definitely not the letters ID prior to the first invocation of ID
2. Once ID is found there will be lines of data to be prepended with the ID and a comma
3. Blank lines may also occur after ID is found (you may need to confirm whether blank means nothing but a newline or could possibly be white space as well)
So based on the above, the idea is that ALL lines must be printed regardless of their data, but any line after an ID string is found must have the ID string and a comma inserted (correct?)
grail,
here is raw source (sanitized fubar). i dont need anything until the 1st occurrence of "ID", no blank lines, and i dont need "TSP0320I LIST FUNCTION SUCCESSFUL" near the end or anything after that, etc. notice i also skip the following (or similar crud) that is wedged between pages and/or IDs.
1COMPUTER ASSOCIATES ***** T S S C O M M A N D P R O C E S S O R ***** TSSSSSDB PAGE 2
CA-POT RET/VS 2.0 12/04/2010 11.36.04
my real source file is ~30k lines and has many many ID's with each ID having random # of lines associated with ID, etc. i dunno if the TS report can be created in different ways, but this is the raw source i have to work with.
can you get this into a one line awk? if so hats off to you. my code in post #11 does the job, but i like simpler if you can achieve that. thnx.
output is OFS="|"
and should look like this (the dots just mean continue on, etc)
output file
Code:
ID = TESTTEST|ID = TESTTEST
ID = TESTTEST|TYPE = MASTER SIZE = 4352 BITS
ID = TESTTEST|FACILITY = *ALL*
ID = TESTTEST|CREATED = 07/25/01 LAST MOD = 08/29/09 09:46
ID = TESTTEST|PROFILED = PRFGTYU
.
.
.
ID = AKIM|ID = AKIM
ID = AKIM|TYPE = CENTRAL SIZE = 512 BITS
ID = AKIM|FACILITY = *ALL*
ID = AKIM|CREATED = 08/29/09 LAST MOD = 10/05/09 15:14
ID = AKIM|PROFILED = PRFGTYU PRFBATCH PROFGEN
.
.
.
input file
Code:
1// JOB TSSLIST *** TSS INIT COMMANDS *** DATE 12/04/2010, CLOCK 11/36/05
// EXEC TSSSSSDB
1S23D PHASE TSSSSSDB IS TO BE FETCHED FROM CAISLIF.PRODUCT
1COMPUTER ASSOCIATES ***** T S S C O M M A N D P R O C E S S O R ***** TSSSSSDB PAGE 1
CA-POT RET/VS 2.0 12/04/2010 11.36.04
*------------------------------------------------------------------------------*
TSS LIST(BASICS) DATA(ALL)
TSS LIST(BASICS) DATA(ALL)
ID = TESTTEST NAME = MASTER SECURITY
TYPE = MASTER SIZE = 4352 BITS
FACILITY = *ALL*
CREATED = 07/25/01 LAST MOD = 08/29/09 09:46
PROFILED = PRFGTYU
ATTRIBUTES = TTY1
LAST USED = 08/30/09 16:19 CPU(VPEA) FAC(ICDG ) COUNT(16381)
VSESLIB = FJSWSRS. FARTSY.
VSESLIB = PRO1. PRO2.
DATASET = *****
VOLUMES = *ALL*(G)
DCT = *ALL*
FCT = *ALL*
JCT = *ALL*
MODE = WARN
OTRAN = DITT
PANEL = REXX
PPT = *ALL*
TERMINAL = *ALL* K
TST = *ALL*
XA VSELIB = VSE.LIBRARY.BLABBER OWNER(IRMST )
ACCESS = SOME
XA VSELIB = VSE.FARTSY.LIBRARY.BLABBER OWNER(IRMST )
ACCESS = SOME
XA VSELIB = VSE.PRO1.LIBRARY.PRO1 OWNER(IRMST )
ACCESS = SOME
XA VSELIB = VSE.PRO2.LIBRARY.PRO2 OWNER(IRMST )
ACCESS = SOME
XA VSELIB = VSE.SYSRES.LIBRARY.IJSYSRS OWNER(IRMST )
ACCESS = SOME
XA DATASET = VSE OWNER(IRMST )
ACCESS = SOME
XA MODE = FAIL OWNER(IRMST )
XA OTRAN = *ALL* OWNER(IRMST )
ACCESS = SOME
XA OTRAN = MD OWNER(IRMST )
ACCESS = SOME
XA OTRAN = TSS OWNER(IRMST )
ACCESS = SOME
----------- SEGMENT CIPS
OPIDENT = AST
BASICS = AKIM -SC SAM -SC CIPS (D) CLIMENT(V)
CLEN -SC COMM(V) ERTT -SC GNIT -SC
IAUPTRUD-SC JOTC -SC JTAO -SC JVOL -SC
BLABBER (Z) TESTY (D)
ID = AKIM NAME = VALUE ADD
1COMPUTER ASSOCIATES ***** T S S C O M M A N D P R O C E S S O R ***** TSSSSSDB PAGE 2
CA-POT RET/VS 2.0 12/04/2010 11.36.04
TYPE = CENTRAL SIZE = 512 BITS
FACILITY = *ALL*
CREATED = 08/29/09 LAST MOD = 10/05/09 15:14
PROFILED = PRFGTYU PRFBATCH PROFGEN
ATTRIBUTES = TTY1,VSECATBT,VSERDDIR,VSESYSAD,VSEMCON
LAST USED = 10/05/09 15:14 CPU(VPEA) FAC(ICDG ) COUNT(00177)
----------- SEGMENT CIPS
OPIDENT = OPD
----------- SEGMENT IESIS
IESFL1 = BAS,COD,VSAT
IESFL2 = BQS,ESC,CSU,CSD,OSPD,XSM
IESINIT = IPSEABH
IESTYPE = USERTYPE2,NEW,SELECT
IESVCAT = TESTCAP
----------- ADMINISTRATION AUTHORITIES
RESOURCE = XAUTH,INFO
ACCESS = SOME
ECID = *ALL*
FACILITIES = *ALL*
LIST DATA = *ALL*,PROFILED,PASSFOO
MISC1 = SUSPEND
MISC8 = LISTSTC,LISTRDT,REMASUSP,MCS
ID = SAM NAME = SAM WALBERG
TYPE = CENTRAL SIZE = 512 BITS
FACILITY = *ALL*
CREATED = 04/17/07 LAST MOD = 08/01/09 11:50
PROFILED = PRFGTYU PRFBATCH PROFGEN
ATTRIBUTES = TTY1,VSECATBT,VSERDDIR,VSESYSAD,VSEMCON
LAST USED = 09/28/09 01:59 CPU(VPEA) FAC(BATCH ) COUNT(04165)
----------- SEGMENT CIPS
OPIDENT = OP5
----------- SEGMENT IESIS
IESFL1 = BAT,COD,VSAM
IESFL2 = BQA,ESC,COU,CMD,OLPD,XRM
IESINIT = IESEADM
IESTYPE = USERTYPE14,NEW,SELECT
IESVCAT = TESTCAP
----------- ADMINISTRATION AUTHORITIES
RESOURCE = *ALL*
ACCESS = SOME
ECID = *ALL*
FACILITIES = *ALL*
LIST DATA = *ALL*,PROFILED,PASSFOO
MISC1 = *ALL*
MISC2 = *ALL*
MISC3 = *ALL*
MISC8 = LISTSTC,LISTRDT,REMASUSP,MCS,LISTSDT
MISC9 = *ALL*
ID = RRQMTTTT NAME = RQAMT TEST USER
TYPE = USER SIZE = 512 BITS
FACILITY = TEST
DEPT ECID = TESTY DEPARTMENT = TEST USERS
CREATED = 07/13/06 LAST MOD = 08/24/06 15:29
PROFILED = PRFRRQMT PRFBATCH
LAST USED = 07/14/06 13:33 CPU(VSEB) FAC(BATCH ) COUNT(00012)
XA VSESLIB = DB3LIPS.TESTBTCH OWNER(IRMST )
ACCESS = READ
XA OTRAN = CEDF OWNER(IRMST )
ACCESS = EXECUTE
----------- SEGMENT CIPS
OPIDENT = BBJDS
TSP0320I LIST FUNCTION SUCCESSFUL
*------------------------------------------------------------------------------*
1COMPUTER ASSOCIATES ***** T S S C O M M A N D P R O C E S S O R ***** TSSSSSDB PAGE 444
CA-POT RET/VS 2.0 12/04/2010 11.36.04
TSS INPUT STATEMENTS READ 1
TSS COMMANDS PROCESSED 1
TSS BATCH ENVIRONMENT ERRORS 0
TSS COMMAND ERRORS 0
1EOJ TSSLIST DATE 12/04/2010, CLOCK 11/36/20, DURATION 00/00/15
Last edited by Linux_Kidd; 11-04-2011 at 12:25 PM.
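For the requirements listed in this post, an awk-only approach might look like the sketch below. It rests on several assumptions that the thread does not confirm: that page-break banners always start with "1COMPUTER ASSOCIATES" or (after optional spaces) "CA-", that the NAME part of the ID line can be dropped as the sample output suggests, and that "----------- SEGMENT" style lines should be kept as data. The function name tss_prefix is made up.

```shell
# Sketch under the assumptions stated above; not a tested solution for the
# full 30k-line report. The wrapper name is hypothetical.
tss_prefix() {
    awk '
    /TSP0320I/ { exit }                           # end-of-listing marker: stop here
    /^1COMPUTER ASSOCIATES/ || /^ *CA-/ { next }  # assumed page-break banner lines
    $1 == "ID" && $2 == "=" {
        id = $1 " " $2 " " $3                     # keep only "ID = value"
        print id "|" id
        next
    }
    id != "" && NF {                              # non-blank line after the first ID
        sub(/^[ \t]+/, ""); sub(/[ \t]+$/, "")    # trim leading/trailing blanks
        print id "|" $0
    }
    ' "$@"
}
```

Something like `tss_prefix report.txt > outfile` would then write the "|"-separated result; both file names are placeholders.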
This reply's command is a pipeline of tr, two sed commands, and awk, writing to outfile. The tr converts all newline conventions to standard Unix newlines.
The first sed removes whitespace around equals signs. Also, if there are multiple consecutive whitespaces, followed by some term (which may contain nonconsecutive whitespaces) and an equals sign, it splits the line at the consecutive whitespace. This makes sure each fact is on its own line.
The second sed removes leading and trailing whitespace, all whitespace before a close parenthesis, and combines multiple consecutive whitespaces into one. (Because the first sed introduces new newlines, I find it is easiest to flatten the data stream by using a separate sed command. It makes it easier to develop such long pipe stanzas.)
The awk part picks the ID values (also printing them alone), and for any line containing an equals sign, prints the id and the line. If you do not need the ID alone, just omit the first print.
If the input may contain pipes, I recommend prepending s/|/!/g; to the first sed pattern.
If you prefer the whitespace around = and |, add | sed -e 's/\([|=]\)/ \1 /g' just before the >outfile.
To see which input lines are ignored/omitted by the above command, replace the end, starting at awk, with grep -v -e '=' -e '^[\t\v\f ]*$'
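A pipeline along the lines described above might look like the following sketch. It is not the original command, only an illustration of the four stages as described, and it assumes GNU sed (for -E and for \n in the replacement text). The function name normalize_tss and the exact regular expressions are assumptions.

```shell
# Sketch only: tr | sed | sed | awk, following the description above.
# Assumes GNU sed. Reads stdin, writes "ID|key=value" lines to stdout.
normalize_tss() {
    tr '\r' '\n' |
    sed -E -e 's/[[:space:]]*=[[:space:]]*/=/g' \
           -e 's/[[:space:]]{2,}([^[:space:]=]+([[:space:]][^[:space:]=]+)*=)/\n\1/g' |
    sed -E -e 's/^[[:space:]]+//' \
           -e 's/[[:space:]]+$//' \
           -e 's/[[:space:]]+\)/)/g' \
           -e 's/[[:space:]]+/ /g' |
    awk -F= '
        $1 == "ID" { split($2, w, " "); id = w[1]; print id }  # ID value, printed alone
        id != "" && /=/ { print id "|" $0 }   # any line with "=" gets the ID prefix
    '
}
```

It would be used as `normalize_tss < infile > outfile`. Note that tr turns a CR LF pair into two newlines; the extra blank lines are harmless here because the awk stage only prints lines containing an equals sign.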