![]() |
Need a second pair of eyes to check this code for CSV file
Hi, just got code to work that pulls out the fields I need from a file, 6 fields to be specific. Next I have to convert 3 of these fields.
'Mop-21050905','auth','info','info','26','2009-10-01 05:00:01','snort','snort[9861]: [1:254:7] DNS SPOOF query response with TTL of 1 min and no authority [Classification: Potentially Bad Traffic] [Priority: 2]: {UDP} 20867222222:53 -> 9522496106:58846',7119397 'Mop-21050905','auth','info','info','26','2009-10-01 07:59:09','snort','snort[9861]: [1:7861:1] POLICY Google Desktop activity [Classification: Potential Corporate Privacy Violation] [Priority: 1]: {TCP} 9522496106:43418 -> 7412553136:80',7123988 'Mop-21050905','auth','info','info','26','2009-10-01 08:02:22','snort','snort[9861]: [1:13864:1] POLICY Microsoft Watson error reporting attempt [Classification: Potential Corporate Privacy Violation] [Priority: 1]: {TCP} 9522496106:34663 -> 655553190:80',7124307 Using this code awk -F"," '{ split($8,tmp,"{"); print $6 "," tmp[2] }' filename.txt | sed -e "s/} /,/g" -e "s/ -> /,/g" | tr ":" "," | tr -d "'" gives 2009-10-09 12,45,18,ICMP,66102999,9522496106 2009-10-09 12,45,20,UDP,202103951,1096,9522496106,1434 2009-10-09 12,45,20,UDP,202103951,1096,9522496106,1434 which is what I want, however I cannot convert the datetime field (1st field) since the colon is removed from the time. It was removed in order to separate the IP address from its port number. (used to be 202103951:1096 now is 202103951,1096) I need to use a sed command to put the time back to 12:45:18 and not 12,45,18 Tried using sed -e 's/.^/\d+[0-9]/\d+[0-9]/\d+[0-9]/:/' but no luck:( Any ideas using sed which will fit with my original command? |
Quick & Dirty:
Code:
$ sed -e 's/,/:/' -e 's/,/:/' your_data_file.txt |
Thanks rizhun, it worked when I piped it onto the original command
awk -F"," '{ split($8,tmp,"{"); print $6 "," tmp[2] }' filename | sed -e "s/} /,/g" -e "s/ -> /,/g" | tr ":" "," | tr -d "'" | sed -e 's/,/:/' -e 's/,/:/' > filename.txt Gives 2009-10-01 08:06:08,TCP,62.149.128.161,110,95.224.96.106,1047 2009-10-01 08:06:08,TCP,62.149.128.161,110,95.224.96.106,1047 2009-10-01 08:06:17,TCP,62.149.128.167,110,95.224.96.106,1053 2009-10-01 08:06:17,TCP,62.149.128.167,110,95.224.96.106,1053 Now I have another question, using this new file how can I get the 1st field of datetime to convert in each line using date -d "$1" +%s, maybe xarg or awk? e.g. xargs -a filename.txt -I date -d "$1" +%s |
You didn't specify whether the epoch date was an addition or replacement of the existing date, so I'm going to presume it's an addition:
Code:
cat datafile.txt | while read dataline |
hi rizhun, tried your code but I want to replace the original datetime with the new datetime after the datetime code is executed. Also the datetime code doesn't change the datetime for each field but only outputs the same integer as below
1255993200,2009-10-01 05:00:01,UDP,208.67.222.222,53,95.224.96.106,58846 1255993200,2009-10-01 07:59:09,TCP,95.224.96.106,43418,74.125.53.136,80 1255993200,2009-10-01 13:21:01,ICMP,83.237.90.142,95.224.96.106 So I want the datetime to look like this date (using this -d "2009-10-09 11:58:59" +%s should be date -d "$1" +%s) 1255993289,UDP,208.67.222.222,53,95.224.96.106,58846 1255389290,TCP,95.224.96.106,43418,74.125.53.136,80 1283202828,ICMP,83.237.90.142,95.224.96.106 So if you or anyone else can suggest how to change it, would be great. |
Hi shayno90,
There was a type-o in my code, try this: Code:
# original code with fixed type-o |
Thanks rizhun the code outputted the data correctly :)
Now I am trying to convert the 2 ip address fields into integers for example echo 194.169.240.130 | perl -ne 's/(\d+)\.(\d+)\.(\d+)\.(\d+)/$1<<24|$2<<16|$3<<8|$4/e;print' (this gives the integer below) 3265917058 So I need to convert and substitute fields 2 and 4 from ip addresses to integers 1255993200,ICMP,86.47.191.211,95.224.96.106 ----------->1255993200,ICMP,3265917058,2261917067 1255993200,TCP,95.224.96.106,36491,65.55.53.190,80 ----------->1255993200,TCP,13259170432,36491,3234817345,80 1255993200,UDP,208.67.222.222,53,95.224.96.106,59329 ----------->1255993200,UDP,5875912358,53,3290917028,59329 Could changing your code like this work cat finishedprotocol.txt | while read dataline do ipaddress=$(echo ${dataline} | awk -F"," '{ print $3, print$5 }') newipaddress=$(ipaddress perl -ne 's/(\d+)\.(\d+)\.(\d+)\.(\d+)/$1<<24|$2<<16|$3<<8|$4/e;print') echo "${newipaddress},${dataline}" >> newerprotocol.txt done # awk to remove unwanted third and fifth columns awk 'BEGIN{FS=",";OFS=","}{$3="";gsub(FS "+",FS)}3{$5="";gsub(FS "+",FS)}5' newerprotocol.txt > newestprotocol.txt A big problem is that protocols TCP AND UDP have 6 fields so code would only work for them while protocol ICMP has only 4 fields (due to no ports for it) i.e only the first ip address (field 3) will be converted and not the 2nd ip address (field 4) :( Any workaround for this? |
Hmmm...
Try the following (not tested): Code:
cat finishedprotocol.txt | while read dataline |
Thanks again rizhun just made a small change to the code to convert the 2nd ip address (4th field) of icmp also like so
cat finishedprotocol.txt | while read dataline do # find out which protocol we're dealing with datatype=$(echo ${dataline} | awk -F"," '{ print $2 }') # convert 2 ip addresses for icmp [[ ${datatype} == "ICMP" ]] && { # grab the ip ipaddress1=$(echo ${dataline} | awk -F"," '{ print $3 }') ipaddress2=$(echo ${dataline} | awk -F"," '{ print $4 }') # convert the ip newipaddress1=$(echo ${ipaddress1} | perl -ne 's/(\d+)\.(\d+)\.(\d+)\.(\d+)/$1<<24|$2<<16|$3<<8|$4/e;print') newipaddress2=$(echo ${ipaddress2} | perl -ne 's/(\d+)\.(\d+)\.(\d+)\.(\d+)/$1<<24|$2<<16|$3<<8|$4/e;print') # remove old ip column newdataline=$(echo ${dataline} | awk 'BEGIN{FS=",";OFS=","}{$3=$4"";gsub(FS "+",FS)}1') # print new 'dataline' to new text file echo "${newipaddress1},${newipaddress2},${newdataline}" >> newerprotocol.txt # start from the top of the loop continue } # anything except icmp; grab ip's ipaddone=$(echo ${dataline} | awk -F"," '{ print $3 }') ipaddtwo=$(echo ${dataline} | awk -F"," '{ print $5 }') # convert newipone=$(echo ${ipaddone} | perl -ne 's/(\d+)\.(\d+)\.(\d+)\.(\d+)/$1<<24|$2<<16|$3<<8|$4/e;print') newiptwo=$(echo ${ipaddtwo} | perl -ne 's/(\d+)\.(\d+)\.(\d+)\.(\d+)/$1<<24|$2<<16|$3<<8|$4/e;print') # remove old columns newdataline=$(echo ${dataline} | awk 'BEGIN{FS=",";OFS=","}{$3=$5="";gsub(FS "+",FS)}1') # print new 'dataline' to new text file echo "${newipone},${newiptwo},${newdataline}" >> newestprotocol.txt done output is like this 3670927203,1608540266,1255993200,UDP,4244,1434 3494108894,1608540266,1255993200,UDP,53,58846 1608540266,1249719688,1255993200,TCP,43418,80 1608540266,1094137278,1255993200,TCP,34663,80 1049985185,1608540266,1255993200,TCP,110,1047 however only the UDP and TCP datalines have printed not the ICMP datalines and the positon of the protocol and port fields have moved to the end of the line, maybe I haven't included the ICMP to print in the "newestprotocol" file?) any ideas why this occurred? |
You've got 2 different output files in your code:
newerprotocol.txt and newestprotocol.txt Change one so they are both the same. Any column re-arranging can be done by using 'awk' to print each field into a variable, then 'echo' those variables out in the order you require. |
Ya sorted the output by changing the file name, now trying to rearrange the field's positions but code is just hanging
cat newerprotocol.txt | while read dataline do field1=$(echo ${dataline} | awk -F"," '{ print $1 }') field2=$(echo ${dataline} | awk -F"," '{ print $4 }') field3=$(echo ${dataline} | awk -F"," '{ print $2 }') field4=$(echo ${dataline} | awk -F"," '{ print $5 }') field5=$(echo ${dataline} | awk -F"," '{ print $3 }') field6=$(echo ${dataline} | awk -F"," '{ print $6 }') echo "${field1},${field2},${field3},${field4},${field5},${field6}" >> newestprotocol.csv done The 4th and 6th field still appear in the ICMP lines so just wondering what else I'm missing in this code? Also the code appears to hang when it reaches "done" and won't execute until you hit the enter key, any thoughts on this? |
Another issue:
The 4th and 6th field still appear in the ICMP lines so just wondering what else I'm missing in this code? Also the previous code in the previous reply appears to hang when it reaches "done" and won't execute until you hit the enter key, any thoughts on this? |
Quote:
Code:
awk -F"," '{ split($8,tmp,"{"); print $6 "," tmp[2] }' filename.txt | sed -e "s/} /,/g" -e "s/ -> /,/g" | tr ":" "," | tr -d "'"Do it this way: Code:
rn=1The advantages of this script is that you can understand it, change it, and use it again tomorrow for some other data processing task. The advantage of the one-liner is that you can baffle onlookers with your imagined intellectual prowess, but as to accomplishing anything useful, forget it. When you write a script, you have some options: 1. Use sed 2. Use awk 3. Use one-liners 4. Use find followed by either xargs or "-exec ... '{}' \;" 5. Use your head. I recommend option 5. While using your head, I invite you to find the flaw in my script, that will prevent it from working with just any arbitrary comma-separated database. (pause ...) Okay, time's up. If there are any embedded commas within the data fields, my script will fail. But this can be dealt with by using a smart parser that is only slightly more complex than the above script. Fair warning. |
Quote:
First of all I was given this code and it works!!!!! Secondly, it is tough to understand but at least I understand it now more since it has been discussed and work on over a period of weeks (it is clear you are unaware of this!) Thirdly, your code doesn't work for me so and isn't explained clearly Fourthly you condescending attitude doesn't belong in a forum like this and you would be better off spending your time in complaining on some blog or twitter!! |
Quote:
do the tr first Code:
tr ',' '\n' < "data.txt" | while ....that said, the whole thing above can also be written with just awk, whose file operations are more efficient than a bash while loop. |
| All times are GMT -5. The time now is 08:49 AM. |