LinuxQuestions.org
Old 06-21-2011, 11:29 AM   #1
logicalfuzz
Member
 
Registered: Aug 2005
Distribution: Arch Linux
Posts: 291

Rep: Reputation: 47
Parsing file with multiple delimiters


I am using Nessus to scan systems, and it generates a report file without a single consistent delimiter. Below is a typical line from the file (it is very long, so pasting it into a text editor may help):
Code:
xx.xx.xx.xx|cpq-wbem (2301/tcp)|49272|REPORT|Synopsis :;;The remote web server is affected by multiple vulnerabilities.;;Description :;;According to its self-reported version number, the HP System;Management Homepage install on the remote host is earlier than 6.2. ;Such versions are reportedly affected by the following ;vulnerabilities :;;  - Session renegotiations are not handled properly, which;    could be exploited to insert arbitrary plaintext in a;    man-in-the-middle attack. (CVE-2009-3555);;  - An attacker may be able to upload files using a POST ;    request with 'multipart/form-data' content even if the ;    target script doesn't actually support file uploads per;    se. (CVE-2009-4017);;  - PHP's 'proc_open' function can be abused to bypass ;    'safe_mode_allowed_env_vars' and ;    'safe_mode_protected_env_vars' directives. ;    (CVE-2009-4018);;  - PHP does not properly protect session data as relates;    to interrupt corruption of '$_SESSION' and the ;    'session.save_path' directive. (CVE-2009-4143);;  - The application allows arbitrary URL redirections.;    (CVE-2010-1586 and CVE-2010-3283);;  - An information disclosure vulnerability exists in;    Apache's mod_proxy_ajp, mod_reqtimeout, and ;    mod_proxy_http relating to timeout conditions. Note ;    that this issue only affects SMH on Windows. ;    (CVE-2010-2068);;  - An as-yet unspecified information disclosure ;    vulnerability may allow an authorized user to gain;    access to sensitive information, which in turn could;    be leveraged to obtain root access on Linux installs;    of SMH. (CVE-2010-3009);;  - There is an as-yet unspecified HTTP response splitting;    issue. (CVE-2010-3011);;  - There is an as-yet unspecified XSS issue. 
;    (CVE-2010-3012);;  - An as-yet unspecified vulnerability could lead to ;    remote disclosure of sensitive information.;    (CVE-2010-3284);;See also :;;http://www.securityfocus.com/archive/1/513684/30/0/threaded;http://www.securityfocus.com/archive/1/513771/30/0/threaded;http://www.securityfocus.com/archive/1/513840/30/0/threaded;http://www.securityfocus.com/archive/1/513917/30/0/threaded;http://www.securityfocus.com/archive/1/513918/30/0/threaded;http://www.securityfocus.com/archive/1/513920/30/0/threaded;;Solution :;;Upgrade to HP System Management Homepage 6.2.0 or later.;;Risk factor :;;High / CVSS Base Score : 9.0;(CVSS2#AV:N/AC:L/Au:S/C:C/I:C/A:C);CVSS Temporal Score : 7.4;(CVSS2#E:F/RL:OF/RC:C);Public Exploit Available : true;;;Plugin output :;;  Product           : HP System Management Homepage;  Version source    : Server: CompaqHTTPServer/9.9 HP System Management Homepage/6.1.0.102;  Installed version : 6.1.0.102;  Fixed version     : 6.2.0.12;;CVE : CVE-2009-3555, CVE-2009-4017, CVE-2009-4018, CVE-2009-4143, CVE-2010-1586, CVE-2010-2068, CVE-2010-3009, CVE-2010-3011, CVE-2010-3012, CVE-2010-3283, CVE-2010-3284;BID : 36935, 37079, 37138, 37390, 43208, 43269, 43334, 43423, 43462, 43463;Other references : OSVDB:60438, OSVDB:60451, OSVDB:61208, OSVDB:64146, OSVDB:64725, OSVDB:65654, OSVDB:68025, OSVDB:68124, OSVDB:68125, OSVDB:68216, OSVDB:68217, CWE:310;
Now, a constant pattern in the line is (note the pipe '|' as the delimiter):
IP ADDRESS|PROTOCOL|NESSUS ID|someNOTE|EVERYTHING-ELSE

EVERYTHING-ELSE is in turn split as:
Synopsis:;;Description:;;Solution:;;Risk factor:;;Plugin output

The fields of interest (for me) are IP ADDRESS, Synopsis, Description, Solution and Risk factor.

I tried the following command to parse this:
Code:
sed s/\:\;\;/\|/g report.nsr | grep Risk\ Factor\ \|High | awk 'BEGIN {FS = "[:][;][;]"} {printf("%s|%s|%s|%s",$1, $6, $7, $10)}'
But this does not give the desired results, because ':;;' is not a consistent delimiter throughout the line.

Does anyone have an idea how to grab everything that occurs between the strings 'Synopsis', 'Description', 'Solution' and 'Risk factor' in the above example? In other words, I may be asking how to use multiple delimiters, in order of occurrence. Is there a tool that supports multiple delimiters?
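For reference, a minimal sketch of why the pipeline above misbehaves, run on a hypothetical shortened line (the IP, port and plugin ID are made up for illustration). Once sed has rewritten every ':;;' as '|', awk's FS pattern has nothing left to match, and grep's case-sensitive 'Risk Factor' does not match the report's 'Risk factor':

```shell
# Hypothetical, shortened report line (IP, port and plugin ID are made up).
line='10.0.0.1|www (80/tcp)|11111|REPORT|Synopsis :;;Summary.;;Risk factor :;;High / CVSS Base Score : 9.0;;'

# Step 1: sed turns every ":;;" into "|".
after_sed=$(printf '%s\n' "$line" | sed 's/:;;/|/g')

# Step 2: grep is case-sensitive, so "Risk Factor" never matches "Risk factor".
matches=$(printf '%s\n' "$after_sed" | grep -c 'Risk Factor |High' || true)

# Step 3: no ":;;" is left, so awk sees the whole line as a single field.
fields=$(printf '%s\n' "$after_sed" | awk 'BEGIN { FS = "[:][;][;]" } { print NF }')

printf 'matching lines: %s, awk fields: %s\n' "$matches" "$fields"
```

On this sample the grep stage passes nothing on, so the awk stage at the end of the original pipeline would have no input at all.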
 
Old 06-21-2011, 11:56 AM   #2
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1978
GNU awk accepts multiple delimiters if you use a regular expression as the value of FS or RS. Suppose you want to grab the Synopsis:
Code:
$ awk 'BEGIN{RS="[|;:]+"}/Synopsis/{getline; print}' file                                                             
The remote web server is affected by multiple vulnerabilities.
...and the trick is done!
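To see what the regular-expression RS actually does, it can help to print every record with its number. This is only a sketch on a hypothetical shortened line (not real scan data), and it assumes an awk that accepts a regex as RS, as GNU awk does:

```shell
# Hypothetical, shortened report line (not taken from a real scan).
sample='10.0.0.1|www (80/tcp)|12345|REPORT|Synopsis :;;Short summary.;;Solution :;;Upgrade now.;;'

# Every run of '|', ';' or ':' ends a record; show each record with its number.
records=$(printf '%s' "$sample" | awk 'BEGIN { RS = "[|;:]+" } { printf "%d=[%s]\n", NR, $0 }')
printf '%s\n' "$records"
```

The label 'Synopsis ' ends up as record 5 and its text as record 6, which is why matching the label and calling getline once yields the summary.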
 
Old 06-22-2011, 06:22 AM   #3
logicalfuzz
Member
 
Registered: Aug 2005
Distribution: Arch Linux
Posts: 291

Original Poster
Rep: Reputation: 47
Quote:
Originally Posted by colucix
GNU awk accepts multiple delimiters if you use a regular expression as the value of FS or RS. Suppose you want to grab the Synopsis:
Code:
$ awk 'BEGIN{RS="[|;:]+"}/Synopsis/{getline; print}' file                                                             
The remote web server is affected by multiple vulnerabilities.
...and the trick is done!
Thanks colucix!
The above works great for 'Synopsis'; however, 'Description' has too many obstacles. Maybe using an 'END' would work? I am still reading through the awk reference manual.
 
Old 06-22-2011, 06:34 AM   #4
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1978
Hmm... I see: the Description section has different delimiters. What information do you want to extract? Can you post an example of the desired output based on the input above?
 
Old 06-22-2011, 07:23 AM   #5
logicalfuzz
Member
 
Registered: Aug 2005
Distribution: Arch Linux
Posts: 291

Original Poster
Rep: Reputation: 47
Quote:
Originally Posted by colucix
What information do you want to extract? Can you post an example of the desired output based on the input above?
Here's what I want to extract:

IP Address: xx.xx.xx.xx

Synopsis: The remote web server is affected by multiple vulnerabilities.

Description:According to its self-reported version number, the HP System;Management Homepage install on the remote host is earlier than 6.2. ;Such versions are reportedly affected by the following ;vulnerabilities :;; - Session renegotiations are not handled properly, which; could be exploited to insert arbitrary plaintext in a; man-in-the-middle attack. (CVE-2009-3555);; - An attacker may be able to upload files using a POST ; request with 'multipart/form-data' content even if the ; target script doesn't actually support file uploads per; se. (CVE-2009-4017);; - PHP's 'proc_open' function can be abused to bypass ; 'safe_mode_allowed_env_vars' and ; 'safe_mode_protected_env_vars' directives. ; (CVE-2009-4018);; - PHP does not properly protect session data as relates; to interrupt corruption of '$_SESSION' and the ; 'session.save_path' directive. (CVE-2009-4143);; - The application allows arbitrary URL redirections.; (CVE-2010-1586 and CVE-2010-3283);; - An information disclosure vulnerability exists in; Apache's mod_proxy_ajp, mod_reqtimeout, and ; mod_proxy_http relating to timeout conditions. Note ; that this issue only affects SMH on Windows. ; (CVE-2010-2068);; - An as-yet unspecified information disclosure ; vulnerability may allow an authorized user to gain; access to sensitive information, which in turn could; be leveraged to obtain root access on Linux installs; of SMH. (CVE-2010-3009);; - There is an as-yet unspecified HTTP response splitting; issue. (CVE-2010-3011);; - There is an as-yet unspecified XSS issue. ; (CVE-2010-3012);; - An as-yet unspecified vulnerability could lead to ; remote disclosure of sensitive information.; (CVE-2010-3284);;See also :;;http://www.securityfocus.com/archive...30/0/threaded;

Solution:Upgrade to HP System Management Homepage 6.2.0 or later.

Risk factor: High
 
Old 06-22-2011, 09:06 AM   #6
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1978
Here is some example code:
Code:
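BEGIN {
  # Any run of pipes, colons or semicolons ends a record (regex RS, as in GNU awk).
  RS = "[|:;]+"
}

# The first record of the input is the IP address.
NR == 1 {
  print "IP Address:", $1
  print ""
}

# Each label is its own record; its text follows in the next record.
/Synopsis/ {
  printf "Synopsis: "
  getline
  print
  print ""
}

/Description/ {
  printf "Description: "
  # Print records up to the Solution label, stopping at end of input too.
  while ((getline) > 0 && $0 !~ /Solution/) {
    if ($0 ~ /^http/) {
      # URLs were split at their colon: print the scheme, then the rest.
      printf "%s:", $0
      getline
    }
    print
  }
  print ""
}

# After the loop above, $0 holds the Solution label, so this rule fires next.
/Solution/ {
  printf "Solution: "
  getline
  print
  print ""
}

/Risk factor/ {
  printf "Risk factor: "
  getline
  print $1    # first word only, e.g. High
}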
First it retrieves the IP address from the first record. Then it parses the Synopsis, assuming it is on a single line. Next it prints the Description until the Solution label is encountered; it assumes that Solution always follows Description. Finally it gets the Solution and the Risk factor, each also assumed to be on a single line. Running this code on your input file generates:
Code:
$ awk -f test.awk file
IP Address: xx.xx.xx.xx

Synopsis: The remote web server is affected by multiple vulnerabilities.

Description: According to its self-reported version number, the HP System
Management Homepage install on the remote host is earlier than 6.2. 
Such versions are reportedly affected by the following 
vulnerabilities 
  - Session renegotiations are not handled properly, which
    could be exploited to insert arbitrary plaintext in a
    man-in-the-middle attack. (CVE-2009-3555)
  - An attacker may be able to upload files using a POST 
    request with 'multipart/form-data' content even if the 
    target script doesn't actually support file uploads per
    se. (CVE-2009-4017)
  - PHP's 'proc_open' function can be abused to bypass 
    'safe_mode_allowed_env_vars' and 
    'safe_mode_protected_env_vars' directives. 
    (CVE-2009-4018)
  - PHP does not properly protect session data as relates
    to interrupt corruption of '$_SESSION' and the 
    'session.save_path' directive. (CVE-2009-4143)
  - The application allows arbitrary URL redirections.
    (CVE-2010-1586 and CVE-2010-3283)
  - An information disclosure vulnerability exists in
    Apache's mod_proxy_ajp, mod_reqtimeout, and 
    mod_proxy_http relating to timeout conditions. Note 
    that this issue only affects SMH on Windows. 
    (CVE-2010-2068)
  - An as-yet unspecified information disclosure 
    vulnerability may allow an authorized user to gain
    access to sensitive information, which in turn could
    be leveraged to obtain root access on Linux installs
    of SMH. (CVE-2010-3009)
  - There is an as-yet unspecified HTTP response splitting
    issue. (CVE-2010-3011)
  - There is an as-yet unspecified XSS issue. 
    (CVE-2010-3012)
  - An as-yet unspecified vulnerability could lead to 
    remote disclosure of sensitive information.
    (CVE-2010-3284)
See also 
http://www.securityfocus.com/archive/1/513684/30/0/threaded
http://www.securityfocus.com/archive/1/513771/30/0/threaded
http://www.securityfocus.com/archive/1/513840/30/0/threaded
http://www.securityfocus.com/archive/1/513917/30/0/threaded
http://www.securityfocus.com/archive/1/513918/30/0/threaded
http://www.securityfocus.com/archive/1/513920/30/0/threaded

Solution: Upgrade to HP System Management Homepage 6.2.0 or later.

Risk factor: High
Obviously, if the input file has a different format (that is, different assumptions have to be made) or you want to extract other information, you will have to modify it slightly, but it seems to work nicely using pipe, colon and semicolon as record separators.
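One side effect of using the colon as a record separator is that every URL is cut in two at 'http:', which is what the /^http/ test in the Description block compensates for. A small sketch on made-up example.com URLs (assuming, again, an awk with regex RS support) shows both the split and the rejoin:

```shell
# Hypothetical "See also" fragment; the example.com URLs are placeholders.
urls='See also :;;http://example.com/a;http://example.com/b;;'

# With ':' in RS, each URL arrives as two records: "http" and "//host/path".
split_urls=$(printf '%s' "$urls" | awk 'BEGIN { RS = "[|:;]+" } { print NR ": " $0 }')

# Rejoin: remember the scheme, fetch the next record, glue them with ':'.
rejoined=$(printf '%s' "$urls" | awk 'BEGIN { RS = "[|:;]+" }
  /^http/ { scheme = $0; getline; print scheme ":" $0 }')

printf '%s\n\n%s\n' "$split_urls" "$rejoined"
```

The same splitting applies to any other text that contains a colon or semicolon, so fields such as the CVSS vector would need similar care if you wanted them intact.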
 
Old 06-22-2011, 11:17 AM   #7
logicalfuzz
Member
 
Registered: Aug 2005
Distribution: Arch Linux
Posts: 291

Original Poster
Rep: Reputation: 47
colucix! This is awesome! Thanks a ton for this!

AWK looks so very powerful...

One last question: is it possible to get a pipe or a dollar symbol as the separator instead of a newline? That would let me save it as a CSV or similar. The input file I have to work on is actually pretty huge.
 
Old 06-22-2011, 11:24 AM   #8
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1978
Quote:
Originally Posted by logicalfuzz
colucix! This is awesome! Thanks a ton for this!
You're welcome!
Quote:
Originally Posted by logicalfuzz
One last question: is it possible to get a pipe or a dollar symbol as the separator instead of a newline? That would let me save it as a CSV or similar. The input file I have to work on is actually pretty huge.
Yes. Use the ORS (Output Record Separator) built-in variable. You can define it in the BEGIN section:
Code:
BEGIN {

  RS = "[|:;]+"
  ORS = "|"
  
}
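Two things to keep in mind with ORS: it only affects print (printf output is unchanged), and each print "" in the script then emits a bare separator. Also, NR == 1 matches only the very first record of the whole file, so a report covering many hosts would show just the first IP address. For a large multi-line report, a per-line approach may be easier to turn into delimited output. The following is only a sketch under stated assumptions: every finding sits on one physical line, the first pipe-separated field is the IP address, the fifth holds the text, and the labels are spelled exactly as in the sample ('Synopsis :;;' and so on). The function name between and the two sample lines are made up for illustration.

```shell
# Two hypothetical, shortened report lines standing in for the real file.
report='10.0.0.1|www (80/tcp)|11111|REPORT|Synopsis :;;First summary.;;Description :;;Line one;line two.;;Solution :;;Upgrade.;;Risk factor :;;High / CVSS Base Score : 9.0;;
10.0.0.2|ssh (22/tcp)|22222|REPORT|Synopsis :;;Second summary.;;Description :;;Only line.;;Solution :;;Patch.;;Risk factor :;;Medium / CVSS Base Score : 5.0;;'

csv=$(printf '%s\n' "$report" | awk -F'|' '
# Return the text found between the literal strings start and stop.
function between(text, start, stop,    s, rest, e) {
    s = index(text, start)
    if (s == 0) return ""
    rest = substr(text, s + length(start))
    e = index(rest, stop)
    return (e == 0) ? rest : substr(rest, 1, e - 1)
}
# Only handle lines that have the expected five pipe-separated fields.
NF >= 5 {
    body = $5
    synopsis    = between(body, "Synopsis :;;",    ";;Description :")
    description = between(body, "Description :;;", ";;Solution :")
    solution    = between(body, "Solution :;;",    ";;Risk factor :")
    # Keep only the first word of the risk section, e.g. "High".
    split(between(body, "Risk factor :;;", ";;"), word, " ")
    print $1 "|" synopsis "|" description "|" solution "|" word[1]
}')
printf '%s\n' "$csv"
```

This prints one pipe-delimited line per finding. The semicolons inside the Description are left untouched, so they can still be turned into line breaks later if needed.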