LinuxQuestions.org > Forums > Non-*NIX Forums > Programming
03-15-2012, 02:56 PM #1
StupidNewbie
Member
Registered: Dec 2007
Posts: 71

Help with Sed/Grep/Awk for file parsing


Hi everyone,

I need some help with a program I'm working on. Say I want to read in a file that looks like:

Code:
site:nameofsite
username:nameofuser
ipaddress:ipofsite
password:somehashedvalue
site:nameofsite2
username:nameofuser2
ipaddress:ipofsite2
password:somehashedvalue2
site:nameofsite3
username:nameofuser3
ipaddress:ipofsite3
password:somehashedvalue3
I can run grep and do something like:

Code:
cat file | grep 'site' | cut -d ':' -f2
and get "nameofsite" for each site. However, if I then put:

Code:
grep 'username' | cut -d ':' -f2
I get nothing. In fact, the program hangs and I have to ctrl+c to get out of it. The output I am trying to get is:

Code:
nameofsite
nameofuser
ipofsite
somehashedvalue
nameofsite2
nameofuser2
ipofsite2
somehashedvalue2
nameofsite3
nameofuser3
ipofsite3
somehashedvalue3
so that I can assign those items to variables. I have tried using sed to replace everything before the ':' with nothing (i.e. sed 's/.*://'), but unfortunately the file I'm parsing is a bit more complicated than the one above. I am using this one as a simplified example, because I feel there must be a way to make grep go back and search again from the top of the file for a new string.
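For the simplified key:value file above, here is a minimal sketch that reads the file once and strips only up to the first colon, so the values come out in their original order. It assumes a POSIX shell and sed; `sample.txt` and `strip_key` are hypothetical names used only for illustration.

```shell
#!/bin/sh
# Hypothetical sample in the simplified key:value format from the question.
cat > sample.txt <<'EOF'
site:nameofsite
username:nameofuser
ipaddress:ipofsite
password:somehashedvalue
EOF

# Hypothetical helper: delete everything up to and including the FIRST colon.
# [^:]* cannot cross a colon, so later colons in a value survive, unlike the
# greedy 's/.*://', which runs all the way to the LAST colon on the line.
strip_key() {
    sed 's/^[^:]*://' "$@"
}

strip_key sample.txt

# Reading the pairs into shell variables, still in file order: with IFS=:,
# read puts the first field in "key" and the rest of the line in "value".
while IFS=: read -r key value; do
    printf '%s=%s\n' "$key" "$value"
done < sample.txt
```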

Does anyone have any idea how to make that happen?

Thanks in advance!!!
 
03-15-2012, 03:08 PM #2
colucix
LQ Guru
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Quote:
Originally Posted by StupidNewbie
Code:
grep 'username' | cut -d ':' -f2
I get nothing.
The command is simply missing the file name. Without one, grep reads from standard input (the keyboard), so it waits until you end the input with Ctrl-D, whereas Ctrl-C interrupts the whole process. Anyway, why don't you show us a piece of the real input? Maybe we can give some more help.
 
03-15-2012, 03:14 PM #3
David the H.
Bash Guru
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852

Code:
grep 'username' | cut -d ':' -f2
There is no filename or stdin input given here for grep to read, so it sits there waiting for you to give it some.


Code:
grep 'username' inputfile | cut -d ':' -f2
Notice that this also avoids the Useless Use Of Cat that your first grep command is guilty of.
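Each separate grep that is given the file name reads it again from the top, but its output is grouped by key rather than interleaved. A minimal sketch that keeps the original order with a single pass, assuming grep supports -E; the content written to `inputfile` is hypothetical sample data.

```shell
#!/bin/sh
# Hypothetical sample file in the simplified format from the first post.
printf '%s\n' site:s1 username:u1 ipaddress:ip1 password:h1 \
              site:s2 username:u2 ipaddress:ip2 password:h2 > inputfile

# One grep reads the file once, top to bottom, so matching lines come out
# in their original order; -E enables the (a|b|c) alternation.
# cut -f2- keeps field 2 AND everything after it, so a value that itself
# contains ':' is not truncated (plain -f2 would stop at the next colon).
grep -E '^(site|username|ipaddress|password):' inputfile | cut -d ':' -f2-
```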
 
03-16-2012, 12:34 PM #4
StupidNewbie
Member
Registered: Dec 2007
Posts: 71
Original Poster

Thanks for the replies. David, I'm not actually using cat. I am using tail -500 logfile.log | grep 'stuff' | cut -d ... (it's a log file).

Unfortunately I can't get a sample of the exact output because it is on a private system, but it basically takes this format:

[mm/dd/yyyy hh:mm:ss] Creating a connection config for: SITE
[mm/dd/yyyy hh:mm:ss] Set parameter: PARAM
[mm/dd/yyyy hh:mm:ss] Set parameter: PARAM
[mm/dd/yyyy hh:mm:ss] Set parameter: PARAM
[mm/dd/yyyy hh:mm:ss] Set url = URL
[mm/dd/yyyy hh:mm:ss] Creating a connection config for: SITE
[mm/dd/yyyy hh:mm:ss] Set parameter: PARAM
[mm/dd/yyyy hh:mm:ss] Set parameter: PARAM
[mm/dd/yyyy hh:mm:ss] Set parameter: PARAM
[mm/dd/yyyy hh:mm:ss] Set url = URL
[mm/dd/yyyy hh:mm:ss] Creating a connection config for: SITE
[mm/dd/yyyy hh:mm:ss] Set parameter: PARAM
[mm/dd/yyyy hh:mm:ss] Set parameter: PARAM
[mm/dd/yyyy hh:mm:ss] Set parameter: PARAM
[mm/dd/yyyy hh:mm:ss] Set url = URL

For each site I need SITE, the three PARAMs and the URL. The desired output would be:

SITE
PARAM
PARAM
PARAM
URL

SITE
PARAM
PARAM
PARAM
URL

SITE
PARAM
PARAM
PARAM
URL

Actually, it doesn't even need to be output; I just need those things separated out so that I can manipulate them. I can get all of the sites, all of one PARAM or another, or all of the URLs using grep and cut/sed, but I can't get them in the order I want, because once the file has been "grepped" once, grep doesn't start again from the top. I hope this isn't too vague. I would love to post the actual log file, but I just can't.
 
03-16-2012, 12:39 PM #5
Reuti
Senior Member
Registered: Dec 2004
Location: Marburg, Germany
Distribution: openSUSE 15.2
Posts: 1,339

If it’s always the last column:
Code:
$ awk '{ print $NF }' file
 
03-16-2012, 12:54 PM #6
danielbmartin
Senior Member
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Try this...
Code:
tail -500 logfile.log | tr = : | sed 's/.*://'
Daniel B. Martin
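Applied to a DN-style parameter like the one described in the next post, the `tr = : | sed 's/.*://'` approach keeps only the text after the last colon, because tr turns every '=' into ':' and `.*` is greedy. A small sketch with a hypothetical, shortened principal line (timestamp omitted):

```shell
#!/bin/sh
# Hypothetical shortened principal line, timestamp omitted.
line='Set parameter: java.naming.security.principal=CN=user,OU=a,DC=b'

# tr turns every '=' into ':', and the greedy '.*' in sed then runs to the
# LAST colon, so only the final DN component survives: prints "b".
printf '%s\n' "$line" | tr = : | sed 's/.*://'

# Anchoring on a fixed marker instead keeps the whole DN:
# prints "CN=user,OU=a,DC=b".
printf '%s\n' "$line" | sed 's/.*principal=//'
```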
 
03-16-2012, 02:14 PM #7
StupidNewbie
Member
Registered: Dec 2007
Posts: 71
Original Poster

Thanks, guys! Both of these look to have potential, but neither of them worked quite like I'd expected. The reason is that some of the PARAMs have special characters in them (for example, one of them is a DN string like cn=username,ou=a,ou=b,ou=c,dc=a,dc=b,dc=c).

With awk, I was able to get everything except one of the PARAMs, which happens to have spaces in it (I assume because awk splits fields on spaces by default, so the last field is only the last word?).
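A minimal sketch of the difference suspected above, using a hypothetical log line whose value contains spaces. By default awk splits fields on runs of blanks; with `-F': '` it splits at each colon-plus-space instead (a value that itself contained ": " would still be cut at its last occurrence).

```shell
#!/bin/sh
# Hypothetical log line whose value contains spaces.
line='[03/16/2012 12:00:00] Set parameter: some value with spaces'

# Default FS: fields are separated by blanks, so $NF is the last word.
# prints: spaces
printf '%s\n' "$line" | awk '{ print $NF }'

# FS of ": ": the timestamp colons are followed by digits, not a space,
# so the only split is after "parameter".
# prints: some value with spaces
printf '%s\n' "$line" | awk -F': ' '{ print $NF }'
```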

With tr (translate), I was able to get SITE and only one of the PARAMs, I'm guessing because some of the PARAMs contain colons (or = signs, which tr turns into colons). I've come up with some stuff I can post without giving away too much info. This is the exact format the log file follows (punctuation and everything):

Code:
[mm/dd/yyyy hh:mm:ss] Creating a connection config for: SITE1
[mm/dd/yyyy hh:mm:ss] Set parameter: some.stuff.i.dont.need
[mm/dd/yyyy hh:mm:ss] Set parameter: java.naming.security.principal=CN=user,OU=a,OU=b,DC=a,DC=b,DC=c,DC=d,DC=e,DC=f
[mm/dd/yyyy hh:mm:ss] Set parameter: java.naming.security.credentials=somehashedvalue
[mm/dd/yyyy hh:mm:ss] Set parameter: some.more.stuff.i.dont.need
[mm/dd/yyyy hh:mm:ss] Set java.naming.provider.url = http://www.example.com/
Creating a connection config for: SITE2
[mm/dd/yyyy hh:mm:ss] Set parameter: some.stuff.i.dont.need
[mm/dd/yyyy hh:mm:ss] Set parameter: java.naming.security.principal=CN=user,OU=a,OU=b,OU=c,DC=a,DC=b,DC=c,DC=d,DC=e
[mm/dd/yyyy hh:mm:ss] Set parameter: java.naming.security.credentials=somehashedvalue
[mm/dd/yyyy hh:mm:ss] Set parameter: some.more.stuff.i.dont.need
[mm/dd/yyyy hh:mm:ss] Set java.naming.provider.url = http://www.example2.com/
Creating a connection config for: SITE3
[mm/dd/yyyy hh:mm:ss] Set parameter: some.stuff.i.dont.need
[mm/dd/yyyy hh:mm:ss] Set parameter: java.naming.security.principal=CN=user,OU=a,OU=b,OU=c,OU=d,DC=a,DC=b,DC=c,DC=d
[mm/dd/yyyy hh:mm:ss] Set parameter: java.naming.security.credentials=somehashedvalue
[mm/dd/yyyy hh:mm:ss] Set parameter: some.more.stuff.i.dont.need
[mm/dd/yyyy hh:mm:ss] Set java.naming.provider.url = http://www.example3.com/
What I need is the following:

SITE1
CN=user,OU=a,OU=b,DC=a,DC=b,DC=c,DC=d,DC=e,DC=f
somehashedvalue
http://www.example.com/

SITE2
CN=user,OU=a,OU=b,OU=c,DC=a,DC=b,DC=c,DC=d,DC=e
somehashedvalue
http://www.example2.com/

SITE3
CN=user,OU=a,OU=b,OU=c,OU=d,DC=a,DC=b,DC=c,DC=d
somehashedvalue
http://www.example3.com/

Note that the OU structure differs from site to site, so unfortunately that line doesn't have a fixed number of fields. Likewise, there are a couple of random lines in the middle of each block that I don't need, although I can probably grep them out once I get everything else right. The format isn't uniform, but if I can get close I might be able to figure out the rest on my own. I'm still playing with awk and tr to see if I can get this to work, but in the meantime, if you are able to get the output above from the log sample above it, I should be in business!
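One way to get the desired output in a single pass is a small awk program that keys on fixed markers in each wanted line. This is a sketch under assumptions, not the approach the thread settled on: it assumes the markers shown in the sample ("Creating a connection config for: ", "principal=", "credentials=", "provider.url = ") are stable, and `testfile` and `extract_configs` are hypothetical names.

```shell
#!/bin/sh
# The sample log from this post, saved under a hypothetical name.
cat > testfile <<'EOF'
[mm/dd/yyyy hh:mm:ss] Creating a connection config for: SITE1
[mm/dd/yyyy hh:mm:ss] Set parameter: some.stuff.i.dont.need
[mm/dd/yyyy hh:mm:ss] Set parameter: java.naming.security.principal=CN=user,OU=a,OU=b,DC=a,DC=b,DC=c,DC=d,DC=e,DC=f
[mm/dd/yyyy hh:mm:ss] Set parameter: java.naming.security.credentials=somehashedvalue
[mm/dd/yyyy hh:mm:ss] Set parameter: some.more.stuff.i.dont.need
[mm/dd/yyyy hh:mm:ss] Set java.naming.provider.url = http://www.example.com/
Creating a connection config for: SITE2
[mm/dd/yyyy hh:mm:ss] Set parameter: some.stuff.i.dont.need
[mm/dd/yyyy hh:mm:ss] Set parameter: java.naming.security.principal=CN=user,OU=a,OU=b,OU=c,DC=a,DC=b,DC=c,DC=d,DC=e
[mm/dd/yyyy hh:mm:ss] Set parameter: java.naming.security.credentials=somehashedvalue
[mm/dd/yyyy hh:mm:ss] Set parameter: some.more.stuff.i.dont.need
[mm/dd/yyyy hh:mm:ss] Set java.naming.provider.url = http://www.example2.com/
Creating a connection config for: SITE3
[mm/dd/yyyy hh:mm:ss] Set parameter: some.stuff.i.dont.need
[mm/dd/yyyy hh:mm:ss] Set parameter: java.naming.security.principal=CN=user,OU=a,OU=b,OU=c,OU=d,DC=a,DC=b,DC=c,DC=d
[mm/dd/yyyy hh:mm:ss] Set parameter: java.naming.security.credentials=somehashedvalue
[mm/dd/yyyy hh:mm:ss] Set parameter: some.more.stuff.i.dont.need
[mm/dd/yyyy hh:mm:ss] Set java.naming.provider.url = http://www.example3.com/
EOF

# Hypothetical helper: one pass over the log, printing the four wanted
# values per block in file order. Each rule deletes everything up to and
# including a fixed marker, so the commas, equals signs and the colon in
# "http://" inside the remaining value are left untouched, however many
# OU=/DC= components the DN has.
extract_configs() {
    awk '
        /Creating a connection config for: / {
            if (blocks++) print ""    # blank line between blocks
            sub(/.*Creating a connection config for: /, ""); print; next
        }
        /java\.naming\.security\.principal=/ {
            sub(/.*java\.naming\.security\.principal=/, ""); print; next
        }
        /java\.naming\.security\.credentials=/ {
            sub(/.*java\.naming\.security\.credentials=/, ""); print; next
        }
        /java\.naming\.provider\.url = / {
            sub(/.*java\.naming\.provider\.url = /, ""); print; next
        }
        # Any other line (e.g. the "dont.need" parameters) is ignored.
    ' "$@"
}

extract_configs testfile
```

Because the helper also reads standard input when given no file, it should fit into the existing pipeline, e.g. `tail -500 logfile.log | extract_configs`.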

Thanks again for all the help.
 
03-16-2012, 03:21 PM #8
StupidNewbie
Member
Registered: Dec 2007
Posts: 71
Original Poster

I got it, guys! I ended up piping a bunch of sed commands together after using awk to print the last field, with ": " (a colon followed by a space) as the field separator. I used a little bit of each of your replies combined with a bit of my own tweaking!

Here is my final command (I tested it on the real log file and with a little tweaking I got it to work like I planned). This assumes that "testfile" has the format given above:

Code:
cat testfile | awk -F ": " '{ print $NF }' | grep -v 'need' | sed 's/.*java.naming.security//' | sed 's/.*principal=//' | sed 's/.*credentials=//' | sed 's/.*provider.url = //'
THANKS!!!

Last edited by StupidNewbie; 03-16-2012 at 03:23 PM.
 
03-17-2012, 11:17 AM #9
David the H.
Bash Guru
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852

There's generally no need to mix and match grep, sed, and awk. sed can do everything grep can do and more, and awk is a full text-processing scripting language that can completely replace the other two, and then some.

grep and sed can also be handed multiple expressions at once, using the "-e" option.

Also, don't forget that "." is a regex operator, meaning "match any character", so you have to escape it or use a bracket expression if you want to match a literal period.

Code:
sed -e '/need/d' -e 's/.*for: //' -e 's/.*java\.naming\.security//' -e 's/.*principal=//' -e 's/.*credentials=//' -e 's/.*provider\.url = //' infile.txt
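A small illustration of the point about "." above, using two hypothetical lines in a scratch file `dots.txt`. `grep -c` prints the number of matching lines.

```shell
#!/bin/sh
# Two hypothetical lines: one with literal dots, one with other characters
# in the same positions.
printf '%s\n' 'java.naming.provider.url' 'javaXnamingYprovider.url' > dots.txt

grep -c 'java.naming' dots.txt      # 2: an unescaped '.' matches any character
grep -c 'java\.naming' dots.txt     # 1: '\.' matches only a literal period
grep -c 'java[.]naming' dots.txt    # 1: a bracket expression works as well
```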
It's possible to compact the command even more if you use extended regular expressions (the -r option in sed). Then you can use parentheses to group a list of alternate values to match (separated by "|").

Code:
sed -r -e '/need/d' -e 's/.*(for: |java\.naming\.security|principal=|credentials=|provider\.url = )//' infile.txt
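As a usage example, here is the same extended-regex command run on one block of the sample log, spelled with -E instead of -r. GNU sed accepts -E as a synonym for -r, and BSD/macOS sed uses -E for extended regular expressions, so this spelling should work on both; `extract_block` is a hypothetical wrapper name.

```shell
#!/bin/sh
# One block of the sample log from earlier in the thread.
cat > infile.txt <<'EOF'
[mm/dd/yyyy hh:mm:ss] Creating a connection config for: SITE1
[mm/dd/yyyy hh:mm:ss] Set parameter: some.stuff.i.dont.need
[mm/dd/yyyy hh:mm:ss] Set parameter: java.naming.security.principal=CN=user,OU=a,OU=b,DC=a,DC=b,DC=c,DC=d,DC=e,DC=f
[mm/dd/yyyy hh:mm:ss] Set parameter: java.naming.security.credentials=somehashedvalue
[mm/dd/yyyy hh:mm:ss] Set parameter: some.more.stuff.i.dont.need
[mm/dd/yyyy hh:mm:ss] Set java.naming.provider.url = http://www.example.com/
EOF

# Hypothetical wrapper around the -r command above, using -E instead.
# The greedy .* makes each match extend through whichever listed marker
# ends furthest to the right on the line (e.g. "principal=" rather than
# the earlier "java.naming.security").
extract_block() {
    sed -E -e '/need/d' \
        -e 's/.*(for: |java\.naming\.security|principal=|credentials=|provider\.url = )//' "$@"
}

extract_block infile.txt
```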

Here are a few useful sed references:
http://www.grymoire.com/Unix/Sed.html
http://sed.sourceforge.net/grabbag/
http://sed.sourceforge.net/sedfaq.html
http://sed.sourceforge.net/sed1line.txt

Here are a few useful awk references:
http://www.grymoire.com/Unix/Awk.html
http://www.gnu.org/software/gawk/man...ode/index.html
http://www.pement.org/awk/awk1line.txt
http://www.catonmat.net/blog/awk-one...ined-part-one/

A couple of regular expressions tutorials:
http://mywiki.wooledge.org/RegularExpression
http://www.grymoire.com/Unix/Regular.html
 
03-18-2012, 01:42 PM #10
StupidNewbie
Member
Registered: Dec 2007
Posts: 71
Original Poster

Thanks, David. Even though I got this to work, I will give that a shot too. I tried using sed by itself before, and it became so cluttered and cryptic that I couldn't keep track of what I was replacing. There were also some quirks, like sed not properly interpreting braces {} to make a pattern repeat a specific number of times. That became an issue with the OU string, since both OU= and DC= repeat an unknown number of times in each line. Anyway, I will give your code a shot and see if it looks cleaner and works the same way. Thanks!
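The brace quirk is most likely the regex flavour: sed uses basic regular expressions by default, where interval braces must be escaped as `\{n\}`, while with -E (extended regexes) plain `{n}` works and `+` covers an unknown number of repetitions. A small sketch with a hypothetical DN value; anchoring on a marker such as `principal=`, as in the commands above, avoids counting repetitions at all.

```shell
#!/bin/sh
# Hypothetical DN value with two OU= components.
dn='CN=user,OU=a,OU=b,DC=a,DC=b'

# Basic regex (sed's default): group and interval must be escaped.
# prints: CN=user,DC=a,DC=b
printf '%s\n' "$dn" | sed 's/\(OU=[^,]*,\)\{2\}//'

# Extended regex (-E): plain (){2} works, and + matches one or more
# repetitions when the count is unknown. Both print: CN=user,DC=a,DC=b
printf '%s\n' "$dn" | sed -E 's/(OU=[^,]*,){2}//'
printf '%s\n' "$dn" | sed -E 's/(OU=[^,]*,)+//'
```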
 
03-18-2012, 01:52 PM #11
Reuti
Senior Member
Registered: Dec 2004
Location: Marburg, Germany
Distribution: openSUSE 15.2
Posts: 1,339

NB: Your first post showed that you were posting from Ubuntu, and the later ones from Mac OS X. On a Mac, the bundled sed is the BSD version and has no -r option (it uses -E for extended regular expressions instead). If you are running the command there, you can compile GNU sed, as I did for exactly that purpose.
 
  


All times are GMT -5.
