TsubasaJM 08-23-2011 04:52 AM

Manipulating data per paragraph
Hi there,

I'm still pretty new to bash, but I have some experience manipulating data line by line with the 'while read' construct and, for example, extracting data with awk.

Now I have an LDIF file in which the data is organized in paragraphs (records separated by a blank line). I want to extract different attributes from each LDIF record, which is harder because there is no 'read paragraph', of course. :)

For example, say I have an LDIF file like this:

dn: uid=test1,cn=example,cn=dom
time: 20110822105940
modifyTimestamp: 20110822085944Z

dn: uid=test2,cn=example,cn=dom
time: 20110822105941
modifyTimestamp: 20110822085945Z

dn: uid=test3,cn=example,cn=dom
time: 20110822105942
modifyTimestamp: 20110822085946Z

I want to be able to run checks and comparisons on the attributes of every record, such as comparing time and modifyTimestamp. It has to cover all of the records (skipping none), and I would like it generic enough to use in a for or while loop and, say, grep the data out of it.

I'm sure there are cool solutions for this in sed or awk and I hope I can learn from it how it's done and I can try to use it to my own liking.

Thanks in advance!

grail 08-23-2011 10:10 AM

Well, using awk you can set the record separator to the empty string, which makes each paragraph a record. You will then need to choose whether to split the data yourself or use the field separator; it depends on how uniform your data is. As a demo:

awk '{print NR}' RS="" file
With the example above, this prints the numbers 1 to 3, one per record.
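
To see what each record looks like once it is split up, here is a minimal sketch that also sets the field separator to a newline, so every line of a paragraph becomes one field. The temporary file and the variable names (ldif_file, record_summary) are just assumptions for the demo; the data is the example from the first post.

```shell
# Demo data copied from the first post, written to a temporary file.
ldif_file=$(mktemp)
cat > "$ldif_file" <<'EOF'
dn: uid=test1,cn=example,cn=dom
time: 20110822105940
modifyTimestamp: 20110822085944Z

dn: uid=test2,cn=example,cn=dom
time: 20110822105941
modifyTimestamp: 20110822085945Z

dn: uid=test3,cn=example,cn=dom
time: 20110822105942
modifyTimestamp: 20110822085946Z
EOF

# RS="" switches awk to paragraph mode (blank lines separate records).
# FS="\n" makes each line of the paragraph a separate field ($1, $2, ...).
record_summary=$(awk 'BEGIN { RS = ""; FS = "\n" }
    { print NR ": " NF " lines, first is " $1 }' "$ldif_file")

printf '%s\n' "$record_summary"
rm -f "$ldif_file"
```

Here NF is the number of lines in the record and $1 is its dn line, so each attribute can be reached by its position if the records are uniform.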

chrism01 08-23-2011 09:50 PM

Depends how complex these checks are, but I'd seriously consider using Perl; it's very good at this sort of thing.

TsubasaJM 08-24-2011 05:31 AM

Thanks all for replying.

grail: the awk option sounds quite useful but now I need to figure out how to get data out of it on a per record basis. So how would I use awk to enter a sort of while or for loop (or split it so I can do stuff on each record individually) and grab the time and modifyTimestamp attributes from each record?

chrism01: thanks for the tip but I suck even more at perl than I suck at bash ;)

grail 08-24-2011 06:13 AM

Here is my bible. Have a look at field separators and the split function.
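
As a rough sketch of how the field separator and split() could fit together for the records in this thread: awk walks the lines of each paragraph, splits each one on ": " and prints the three attributes on a single line, which a normal 'while read' loop can then pick up. The variable names (ldif_file, pairs, rec_time, modified) and the '|' output delimiter are assumptions for illustration. It also assumes plain "name: value" lines with no ": " or '|' inside the values, so folded or base64 ("::") LDIF lines are not handled.

```shell
# Demo data copied from the first post, written to a temporary file.
ldif_file=$(mktemp)
cat > "$ldif_file" <<'EOF'
dn: uid=test1,cn=example,cn=dom
time: 20110822105940
modifyTimestamp: 20110822085944Z

dn: uid=test2,cn=example,cn=dom
time: 20110822105941
modifyTimestamp: 20110822085945Z

dn: uid=test3,cn=example,cn=dom
time: 20110822105942
modifyTimestamp: 20110822085946Z
EOF

# One paragraph = one record, one line = one field.
# split() breaks each "name: value" line into kv[1] (name) and kv[2] (value).
pairs=$(awk 'BEGIN { RS = ""; FS = "\n" }
{
    dn = ""; rec_time = ""; modified = ""
    for (i = 1; i <= NF; i++) {
        split($i, kv, ": ")
        if (kv[1] == "dn")              dn = kv[2]
        if (kv[1] == "time")            rec_time = kv[2]
        if (kv[1] == "modifyTimestamp") modified = kv[2]
    }
    print dn "|" rec_time "|" modified
}' "$ldif_file")

# Back in the shell: one line per record, ready for per-record checks.
printf '%s\n' "$pairs" | while IFS='|' read -r dn rec_time modified; do
    echo "$dn: time=$rec_time modifyTimestamp=$modified"
done
rm -f "$ldif_file"
```

The comparison itself can go either inside the awk block or inside the while loop, whichever is more comfortable; looking attributes up by name rather than by position means the order of lines inside a record does not matter.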

