LinuxQuestions.org
Support LQ: Use code LQ3 and save $3 on Domain Registration
Go Back   LinuxQuestions.org > Forums > Linux Forums > Linux - Newbie
User Name
Password
Linux - Newbie This Linux forum is for members that are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-to's this is the place!

Notices

Reply
 
Search this Thread
Old 10-16-2009, 05:56 AM   #1
shayno90
Member
 
Registered: Oct 2009
Distribution: Debian Lenny 2.6.26 Ubuntu Lucid Lynx 10.04 Windows 7
Posts: 177
Blog Entries: 1

Rep: Reputation: 21
Need a second pair of eyes to check this code for CSV file


Hi, just got code to work that pulls out the fields I need from a file, 6 fields to be specific. Next I have to convert 3 of these fields.


'Mop-21050905','auth','info','info','26','2009-10-01 05:00:01','snort','snort[9861]: [1:254:7] DNS SPOOF query response with TTL of 1 min and no authority [Classification: Potentially Bad Traffic] [Priority: 2]: {UDP} 20867222222:53 -> 9522496106:58846',7119397

'Mop-21050905','auth','info','info','26','2009-10-01 07:59:09','snort','snort[9861]: [1:7861:1] POLICY Google Desktop activity [Classification: Potential Corporate Privacy Violation] [Priority: 1]: {TCP} 9522496106:43418 -> 7412553136:80',7123988

'Mop-21050905','auth','info','info','26','2009-10-01 08:02:22','snort','snort[9861]: [1:13864:1] POLICY Microsoft Watson error reporting attempt [Classification: Potential Corporate Privacy Violation] [Priority: 1]: {TCP} 9522496106:34663 -> 655553190:80',7124307

Using this code

awk -F"," '{ split($8,tmp,"{"); print $6 "," tmp[2] }' filename.txt | sed -e "s/} /,/g" -e "s/ -> /,/g" | tr ":" "," | tr -d "'"

gives

2009-10-09 12,45,18,ICMP,66102999,9522496106
2009-10-09 12,45,20,UDP,202103951,1096,9522496106,1434
2009-10-09 12,45,20,UDP,202103951,1096,9522496106,1434

which is what I want, however I cannot convert the datetime field (1st field) since the colon is removed from the time. It was removed in order to separate the IP address from its port number. (used to be 202103951:1096 now is 202103951,1096)

I need to use a sed command to put the time back to 12:45:18 and not 12,45,18
Tried using
sed -e 's/.^/\d+[0-9]/\d+[0-9]/\d+[0-9]/:/' but no luck
Any ideas using sed which will fit with my original command?

Last edited by shayno90; 10-23-2009 at 08:46 AM.
 
Old 10-16-2009, 07:29 AM   #2
rizhun
Member
 
Registered: Jun 2005
Location: England
Distribution: Ubuntu, SLES, AIX
Posts: 268

Rep: Reputation: 47
Quick & Dirty:

Code:
$ sed -e 's/,/:/' -e 's/,/:/' your_data_file.txt
 
Old 10-16-2009, 09:31 AM   #3
shayno90
Member
 
Registered: Oct 2009
Distribution: Debian Lenny 2.6.26 Ubuntu Lucid Lynx 10.04 Windows 7
Posts: 177
Blog Entries: 1

Original Poster
Rep: Reputation: 21
Thanks rizhun, it worked when I piped it onto the original command

awk -F"," '{ split($8,tmp,"{"); print $6 "," tmp[2] }' filename | sed -e "s/} /,/g" -e "s/ -> /,/g" | tr ":" "," | tr -d "'" | sed -e 's/,/:/' -e 's/,/:/' > filename.txt

Gives

2009-10-01 08:06:08,TCP,62.149.128.161,110,95.224.96.106,1047
2009-10-01 08:06:08,TCP,62.149.128.161,110,95.224.96.106,1047
2009-10-01 08:06:17,TCP,62.149.128.167,110,95.224.96.106,1053
2009-10-01 08:06:17,TCP,62.149.128.167,110,95.224.96.106,1053

Now I have another question, using this new file how can I get the 1st field of datetime to convert in each line using date -d "$1" +%s, maybe xarg or awk?
e.g.
xargs -a filename.txt -I date -d "$1" +%s

Last edited by shayno90; 10-16-2009 at 05:14 PM.
 
Old 10-16-2009, 09:52 AM   #4
rizhun
Member
 
Registered: Jun 2005
Location: England
Distribution: Ubuntu, SLES, AIX
Posts: 268

Rep: Reputation: 47
You didn't specify whether the epoch date was an addition or replacement of the existing date, so I'm going to presume it's an addition:

Code:
cat datafile.txt | while read dataline
do
  date=$(echo ${dateline} | awk -F"," '{ print $1 }')
  epochdate=$(date -d "${date}" +%s)
  echo "${epochdate},${dataline}" >> newdatafile.txt
done
 
Old 10-20-2009, 06:30 AM   #5
shayno90
Member
 
Registered: Oct 2009
Distribution: Debian Lenny 2.6.26 Ubuntu Lucid Lynx 10.04 Windows 7
Posts: 177
Blog Entries: 1

Original Poster
Rep: Reputation: 21
hi rizhun, tried your code but I want to replace the original datetime with the new datetime after the datetime code is executed. Also the datetime code doesn't change the datetime for each field but only outputs the same integer as below

1255993200,2009-10-01 05:00:01,UDP,208.67.222.222,53,95.224.96.106,58846
1255993200,2009-10-01 07:59:09,TCP,95.224.96.106,43418,74.125.53.136,80
1255993200,2009-10-01 13:21:01,ICMP,83.237.90.142,95.224.96.106

So I want the datetime to look like this date (using this -d "2009-10-09 11:58:59" +%s should be date -d "$1" +%s)

1255993289,UDP,208.67.222.222,53,95.224.96.106,58846
1255389290,TCP,95.224.96.106,43418,74.125.53.136,80
1283202828,ICMP,83.237.90.142,95.224.96.106

So if you or anyone else can suggest how to change it, would be great.

Last edited by shayno90; 10-21-2009 at 03:48 AM.
 
Old 10-21-2009, 03:57 AM   #6
rizhun
Member
 
Registered: Jun 2005
Location: England
Distribution: Ubuntu, SLES, AIX
Posts: 268

Rep: Reputation: 47
Hi shayno90,

There was a type-o in my code, try this:

Code:
# original code with fixed type-o
cat datafile.txt | while read dataline
do
  date=$(echo ${dataline} | awk -F"," '{ print $1 }')
  epochdate=$(date -d "${date}" +%s)
  echo "${epochdate},${dataline}" >> newdatafile.txt
done

# awk to remove unwanted second column
awk 'BEGIN{FS=",";OFS=","}{$2="";gsub(FS "+",FS)}1' newdatafile.txt > finished-data-file.txt
 
Old 10-21-2009, 04:57 AM   #7
shayno90
Member
 
Registered: Oct 2009
Distribution: Debian Lenny 2.6.26 Ubuntu Lucid Lynx 10.04 Windows 7
Posts: 177
Blog Entries: 1

Original Poster
Rep: Reputation: 21
Thanks rizhun the code outputted the data correctly
Now I am trying to convert the 2 ip address fields into integers for example

echo 194.169.240.130 | perl -ne 's/(\d+)\.(\d+)\.(\d+)\.(\d+)/$1<<24|$2<<16|$3<<8|$4/e;print' (this gives the integer below)
3265917058

So I need to convert and substitute fields 2 and 4 from ip addresses to integers

1255993200,ICMP,86.47.191.211,95.224.96.106 ----------->1255993200,ICMP,3265917058,2261917067
1255993200,TCP,95.224.96.106,36491,65.55.53.190,80 ----------->1255993200,TCP,13259170432,36491,3234817345,80
1255993200,UDP,208.67.222.222,53,95.224.96.106,59329 ----------->1255993200,UDP,5875912358,53,3290917028,59329

Could changing your code like this work

cat finishedprotocol.txt | while read dataline
do
ipaddress=$(echo ${dataline} | awk -F"," '{ print $3, print$5 }')
newipaddress=$(ipaddress perl -ne 's/(\d+)\.(\d+)\.(\d+)\.(\d+)/$1<<24|$2<<16|$3<<8|$4/e;print')
echo "${newipaddress},${dataline}" >> newerprotocol.txt
done

# awk to remove unwanted third and fifth columns
awk 'BEGIN{FS=",";OFS=","}{$3="";gsub(FS "+",FS)}3{$5="";gsub(FS "+",FS)}5' newerprotocol.txt > newestprotocol.txt

A big problem is that protocols TCP AND UDP have 6 fields so code would only work for them while protocol ICMP has only 4 fields (due to no ports for it)
i.e only the first ip address (field 3) will be converted and not the 2nd ip address (field 4)

Any workaround for this?

Last edited by shayno90; 10-21-2009 at 05:02 AM.
 
Old 10-21-2009, 05:49 AM   #8
rizhun
Member
 
Registered: Jun 2005
Location: England
Distribution: Ubuntu, SLES, AIX
Posts: 268

Rep: Reputation: 47
Hmmm...

Try the following (not tested):

Code:
cat finishedprotocol.txt | while read dataline
do

  # find out which protocol we're dealing with
  datatype=$(echo ${dataline} | awk -F"," '{ print $2 }')

  # only 1 ip for icmp
  [[ ${datatype} == "ICMP" ]] && {
    # grab the ip
    ipaddress=$(echo ${dataline} | awk -F"," '{ print $3 }')
    # convert the ip
    newipaddress=$(echo ${ipaddress} | perl -ne 's/(\d+)\.(\d+)\.(\d+)\.(\d+)/$1<<24|$2<<16|$3<<8|$4/e;print')
    # remove old ip column
    newdataline=$(echo ${dataline} | awk 'BEGIN{FS=",";OFS=","}{$3="";gsub(FS "+",FS)}1')
    # print new 'dataline' to new text file
    echo "${newipaddress},${newdataline}" >> newerprotocol.txt
    # start from the top of the loop
    continue
  }

  # anything except icmp; grab ip's
  ipaddone=$(echo ${dataline} | awk -F"," '{ print $3 }')
  ipaddtwo=$(echo ${dataline} | awk -F"," '{ print $5 }')

  # convert
  newipone=$(echo ${ipaddone} | perl -ne 's/(\d+)\.(\d+)\.(\d+)\.(\d+)/$1<<24|$2<<16|$3<<8|$4/e;print')
  newiptwo=$(echo ${ipaddtwo} | perl -ne 's/(\d+)\.(\d+)\.(\d+)\.(\d+)/$1<<24|$2<<16|$3<<8|$4/e;print')

  # remove old columns
  newdataline=$(echo ${dataline} | awk 'BEGIN{FS=",";OFS=","}{$3=$5="";gsub(FS "+",FS)}1')

  # print new 'dataline' to new text file
  echo "${newipone},${newiptwo},${newdataline}" >> newerprotocol.txt

done
 
Old 10-21-2009, 06:15 AM   #9
shayno90
Member
 
Registered: Oct 2009
Distribution: Debian Lenny 2.6.26 Ubuntu Lucid Lynx 10.04 Windows 7
Posts: 177
Blog Entries: 1

Original Poster
Rep: Reputation: 21
Thanks again rizhun just made a small change to the code to convert the 2nd ip address (4th field) of icmp also like so

cat finishedprotocol.txt | while read dataline
do

# find out which protocol we're dealing with
datatype=$(echo ${dataline} | awk -F"," '{ print $2 }')

# convert 2 ip addresses for icmp
[[ ${datatype} == "ICMP" ]] && {
# grab the ip
ipaddress1=$(echo ${dataline} | awk -F"," '{ print $3 }')
ipaddress2=$(echo ${dataline} | awk -F"," '{ print $4 }')
# convert the ip
newipaddress1=$(echo ${ipaddress1} | perl -ne 's/(\d+)\.(\d+)\.(\d+)\.(\d+)/$1<<24|$2<<16|$3<<8|$4/e;print')
newipaddress2=$(echo ${ipaddress2} | perl -ne 's/(\d+)\.(\d+)\.(\d+)\.(\d+)/$1<<24|$2<<16|$3<<8|$4/e;print')
# remove old ip column
newdataline=$(echo ${dataline} | awk 'BEGIN{FS=",";OFS=","}{$3=$4"";gsub(FS "+",FS)}1')
# print new 'dataline' to new text file
echo "${newipaddress1},${newipaddress2},${newdataline}" >> newerprotocol.txt
# start from the top of the loop
continue
}

# anything except icmp; grab ip's
ipaddone=$(echo ${dataline} | awk -F"," '{ print $3 }')
ipaddtwo=$(echo ${dataline} | awk -F"," '{ print $5 }')

# convert
newipone=$(echo ${ipaddone} | perl -ne 's/(\d+)\.(\d+)\.(\d+)\.(\d+)/$1<<24|$2<<16|$3<<8|$4/e;print')
newiptwo=$(echo ${ipaddtwo} | perl -ne 's/(\d+)\.(\d+)\.(\d+)\.(\d+)/$1<<24|$2<<16|$3<<8|$4/e;print')

# remove old columns
newdataline=$(echo ${dataline} | awk 'BEGIN{FS=",";OFS=","}{$3=$5="";gsub(FS "+",FS)}1')

# print new 'dataline' to new text file
echo "${newipone},${newiptwo},${newdataline}" >> newestprotocol.txt

done

output is like this

3670927203,1608540266,1255993200,UDP,4244,1434
3494108894,1608540266,1255993200,UDP,53,58846
1608540266,1249719688,1255993200,TCP,43418,80
1608540266,1094137278,1255993200,TCP,34663,80
1049985185,1608540266,1255993200,TCP,110,1047

however only the UDP and TCP datalines have printed not the ICMP datalines and the positon of the protocol and port fields have moved to the end of the line, maybe I haven't included the ICMP to print in the "newestprotocol" file?)
any ideas why this occurred?

Last edited by shayno90; 10-21-2009 at 06:17 AM.
 
Old 10-21-2009, 06:24 AM   #10
rizhun
Member
 
Registered: Jun 2005
Location: England
Distribution: Ubuntu, SLES, AIX
Posts: 268

Rep: Reputation: 47
You've got 2 different output files in your code:

newerprotocol.txt
and
newestprotocol.txt

Change one so they are both the same.

Any column re-arranging can be done by using 'awk' to print each field into a variable, then 'echo' those variables out in the order you require.
 
Old 10-21-2009, 07:12 AM   #11
shayno90
Member
 
Registered: Oct 2009
Distribution: Debian Lenny 2.6.26 Ubuntu Lucid Lynx 10.04 Windows 7
Posts: 177
Blog Entries: 1

Original Poster
Rep: Reputation: 21
Ya sorted the output by changing the file name, now trying to rearrange the field's positions but code is just hanging

cat newerprotocol.txt | while read dataline
do

field1=$(echo ${dataline} | awk -F"," '{ print $1 }')
field2=$(echo ${dataline} | awk -F"," '{ print $4 }')
field3=$(echo ${dataline} | awk -F"," '{ print $2 }')
field4=$(echo ${dataline} | awk -F"," '{ print $5 }')
field5=$(echo ${dataline} | awk -F"," '{ print $3 }')
field6=$(echo ${dataline} | awk -F"," '{ print $6 }')

echo "${field1},${field2},${field3},${field4},${field5},${field6}" >> newestprotocol.csv

done

The 4th and 6th field still appear in the ICMP lines so just wondering what else I'm missing in this code? Also the code appears to hang when it reaches "done" and won't execute until you hit the enter key, any thoughts on this?

Last edited by shayno90; 10-23-2009 at 09:52 AM.
 
Old 10-23-2009, 09:53 AM   #12
shayno90
Member
 
Registered: Oct 2009
Distribution: Debian Lenny 2.6.26 Ubuntu Lucid Lynx 10.04 Windows 7
Posts: 177
Blog Entries: 1

Original Poster
Rep: Reputation: 21
Another issue:

The 4th and 6th field still appear in the ICMP lines so just wondering what else I'm missing in this code?

Also the previous code in the previous reply appears to hang when it reaches "done" and won't execute until you hit the enter key, any thoughts on this?
 
Old 10-23-2009, 03:11 PM   #13
lutusp
Member
 
Registered: Sep 2009
Distribution: Fedora
Posts: 835

Rep: Reputation: 101Reputation: 101
Quote:
Originally Posted by shayno90 View Post
Hi, just got code to work that pulls out the fields I need from a file, 6 fields to be specific. Next I have to convert 3 of these fields.


'Mop-21050905','auth','info','info','26','2009-10-01 05:00:01','snort','snort[9861]: [1:254:7] DNS SPOOF query response with TTL of 1 min and no authority [Classification: Potentially Bad Traffic] [Priority: 2]: {UDP} 20867222222:53 -> 9522496106:58846',7119397

'Mop-21050905','auth','info','info','26','2009-10-01 07:59:09','snort','snort[9861]: [1:7861:1] POLICY Google Desktop activity [Classification: Potential Corporate Privacy Violation] [Priority: 1]: {TCP} 9522496106:43418 -> 7412553136:80',7123988

'Mop-21050905','auth','info','info','26','2009-10-01 08:02:22','snort','snort[9861]: [1:13864:1] POLICY Microsoft Watson error reporting attempt [Classification: Potential Corporate Privacy Violation] [Priority: 1]: {TCP} 9522496106:34663 -> 655553190:80',7124307

Using this code

awk -F"," '{ split($8,tmp,"{"); print $6 "," tmp[2] }' filename.txt | sed -e "s/} /,/g" -e "s/ -> /,/g" | tr ":" "," | tr -d "'"

gives

2009-10-09 12,45,18,ICMP,66102999,9522496106
2009-10-09 12,45,20,UDP,202103951,1096,9522496106,1434
2009-10-09 12,45,20,UDP,202103951,1096,9522496106,1434

which is what I want, however I cannot convert the datetime field (1st field) since the colon is removed from the time. It was removed in order to separate the IP address from its port number. (used to be 202103951:1096 now is 202103951,1096)

I need to use a sed command to put the time back to 12:45:18 and not 12,45,18
Tried using
sed -e 's/.^/\d+[0-9]/\d+[0-9]/\d+[0-9]/:/' but no luck
Any ideas using sed which will fit with my original command?

Code:
awk -F"," '{ split($8,tmp,"{"); print $6 "," tmp[2] }' filename.txt | sed -e "s/} /,/g" -e "s/ -> /,/g" | tr ":" "," | tr -d "'"
Please, will you people stop writing these one-line wonders! They are a waste of time, they identify rank amateur coders, they start out incomprehensible and unworkable and go downhill from there, and they beg to be thrown away.

Do it this way:

Code:
rn=1
cat data.txt | while read line
do
   echo "Record $rn:"
   fn=1
   echo "$line" | tr ',' '\n' | while read field
   do
       echo -e "\tField $fn: $field"
       ((fn++))
   done
   ((rn++))
done
Now it must be obvious that you can use the same coding pattern to break the fields down in the same way, down to individual characters if need be, to accomplish any earthly objective.

The advantages of this script is that you can understand it, change it, and use it again tomorrow for some other data processing task.

The advantage of the one-liner is that you can baffle onlookers with your imagined intellectual prowess, but as to accomplishing anything useful, forget it.

When you write a script, you have some options:

1. Use sed
2. Use awk
3. Use one-liners
4. Use find followed by either xargs or "-exec ... '{}' \;"
5. Use your head.

I recommend option 5.

While using your head, I invite you to find the flaw in my script, that will prevent it from working with just any arbitrary comma-separated database.

(pause ...)

Okay, time's up. If there are any embedded commas within the data fields, my script will fail. But this can be dealt with by using a smart parser that is only slightly more complex than the above script. Fair warning.
 
Old 10-27-2009, 11:46 AM   #14
shayno90
Member
 
Registered: Oct 2009
Distribution: Debian Lenny 2.6.26 Ubuntu Lucid Lynx 10.04 Windows 7
Posts: 177
Blog Entries: 1

Original Poster
Rep: Reputation: 21
Quote:
Originally Posted by lutusp View Post
Code:
awk -F"," '{ split($8,tmp,"{"); print $6 "," tmp[2] }' filename.txt | sed -e "s/} /,/g" -e "s/ -> /,/g" | tr ":" "," | tr -d "'"
Please, will you people stop writing these one-line wonders! They are a waste of time, they identify rank amateur coders, they start out incomprehensible and unworkable and go downhill from there, and they beg to be thrown away.

Do it this way:

Code:
rn=1
cat data.txt | while read line
do
   echo "Record $rn:"
   fn=1
   echo "$line" | tr ',' '\n' | while read field
   do
       echo -e "\tField $fn: $field"
       ((fn++))
   done
   ((rn++))
done
Now it must be obvious that you can use the same coding pattern to break the fields down in the same way, down to individual characters if need be, to accomplish any earthly objective.

The advantages of this script is that you can understand it, change it, and use it again tomorrow for some other data processing task.

The advantage of the one-liner is that you can baffle onlookers with your imagined intellectual prowess, but as to accomplishing anything useful, forget it.

When you write a script, you have some options:

1. Use sed
2. Use awk
3. Use one-liners
4. Use find followed by either xargs or "-exec ... '{}' \;"
5. Use your head.

I recommend option 5.

While using your head, I invite you to find the flaw in my script, that will prevent it from working with just any arbitrary comma-separated database.

(pause ...)

Okay, time's up. If there are any embedded commas within the data fields, my script will fail. But this can be dealt with by using a smart parser that is only slightly more complex than the above script. Fair warning.


First of all I was given this code and it works!!!!!

Secondly, it is tough to understand but at least I understand it now more since it has been discussed and work on over a period of weeks (it is clear you are unaware of this!)

Thirdly, your code doesn't work for me so and isn't explained clearly

Fourthly you condescending attitude doesn't belong in a forum like this and you would be better off spending your time in complaining on some blog or twitter!!

Last edited by shayno90; 10-27-2009 at 11:48 AM.
 
Old 10-27-2009, 12:59 PM   #15
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,695
Blog Entries: 5

Rep: Reputation: 240Reputation: 240Reputation: 240
Quote:
Originally Posted by lutusp View Post
[CODE]

Do it this way:

Code:
rn=1
cat data.txt | while read line
do
   echo "Record $rn:"
   fn=1
   echo "$line" | tr ',' '\n' | while read field
   do
       echo -e "\tField $fn: $field"
       ((fn++))
   done
   ((rn++))
done
no need to use cat. Also can be written another way which is slightly more efficient. Instead of reading the lines and doing tr for each line,
do the tr first
Code:
tr ',' '\n' < "data.txt"  | while ....
do

done
in fact, tr is also not needed since one can set IFS to get fields in just read into array all with bash.

that said, the whole thing above can also be written with just awk, whose file operations are more efficient than a bash while loop.
 
  


Reply

Tags
awk, csv, extract, fields, file, sed


Thread Tools Search this Thread
Search this Thread:

Advanced Search

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
replace line in CSV file and rename file connected to that name wademac Linux - Newbie 3 07-15-2009 01:09 PM
Need a fresh pair of eyes to review my backup+restore scripts BassKozz Linux - Newbie 6 06-01-2009 04:51 AM
I am just not seeing this error kinda need an extra pair of eyes please (C++) jus71n742 Programming 3 03-01-2009 04:24 PM
Comparing two csv files and write different record in third CSV file irfanb146 Linux - Newbie 3 06-30-2008 09:15 PM
i need a second pair of eyes spooge Linux - Networking 1 03-16-2005 06:25 PM


All times are GMT -5. The time now is 09:15 AM.

Main Menu
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
identi.ca: @linuxquestions
Facebook: linuxquestions Google+: linuxquestions
Open Source Consulting | Domain Registration