Need a second pair of eyes to check this code for CSV file
Distribution: Debian Lenny 2.6.26 Ubuntu Lucid Lynx 10.04 Windows 7
Hi, just got code to work that pulls out the fields I need from a file, 6 fields to be specific. Next I have to convert 3 of these fields.
'Mop-21050905','auth','info','info','26','2009-10-01 05:00:01','snort','snort[9861]: [1:254:7] DNS SPOOF query response with TTL of 1 min and no authority [Classification: Potentially Bad Traffic] [Priority: 2]: {UDP} 20867222222:53 -> 9522496106:58846',7119397
which is what I want, however I cannot convert the datetime field (1st field) since the colon is removed from the time. It was removed in order to separate the IP address from its port number. (used to be 202103951:1096 now is 202103951,1096)
I need to use a sed command to put the time back to 12:45:18 and not 12,45,18
Tried using
sed -e 's/.^/\d+[0-9]/\d+[0-9]/\d+[0-9]/:/' but no luck
Any ideas using sed which will fit with my original command?
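One possible sed approach, as a minimal sketch only: it assumes the damaged timestamp sits at the start of each line in the form YYYY-MM-DD HH,MM,SS (as in the sample datetime above), and it rewrites just the two commas inside the time, so the comma between an IP address and its port is left alone. The function name restore_time is made up for illustration.

```shell
# Hypothetical helper: put the colons back into a leading timestamp.
# Assumes each line starts with "YYYY-MM-DD HH,MM,SS"; other commas are untouched.
restore_time() {
    sed -E 's/^([0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}),([0-9]{2}),([0-9]{2})/\1:\2:\3/'
}

# Example: the time loses its commas, the IP,port pair keeps its comma.
printf '%s\n' '2009-10-01 12,45,18,TCP,202103951,1096' | restore_time
```

Since it only reads stdin and writes stdout, it could be appended to the end of an existing pipeline. Older GNU sed versions spell the extended-regex flag -r instead of -E.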
Now I have another question: using this new file, how can I convert the datetime in the 1st field of each line using date -d "$1" +%s, maybe with xargs or awk?
e.g.
xargs -a filename.txt -I date -d "$1" +%s
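A plain read loop is one way to do this without xargs. The sketch below is only an illustration: it assumes the datetime is the first comma-separated field, that GNU date (with its -d option) is available, and that the timestamps should be treated as UTC (drop -u for local time). The function name datetime_to_epoch is invented here.

```shell
# Hypothetical helper: replace the first comma-separated field (a timestamp)
# with seconds since the epoch, leaving the rest of the line as it was.
# Relies on GNU date's -d option; -u (UTC) is an assumption about the data.
datetime_to_epoch() {
    while IFS=, read -r stamp rest; do
        printf '%s,%s\n' "$(date -u -d "$stamp" +%s)" "$rest"
    done
}

# Example with one line of made-up data.
printf '%s\n' '2009-10-01 05:00:01,UDP,20867222222,53' | datetime_to_epoch
```

On a file it would be run as datetime_to_epoch < filename.txt > converted.txt (converted.txt is a placeholder name). It starts one date process per line, so it will be slow on large files.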
Original Poster
Hi rizhun, I tried your code, but I want to replace the original datetime with the new datetime after the datetime code is executed. Also, the datetime code doesn't change the datetime for each field; it only outputs the same integer, as below.
# awk to remove unwanted third and fifth columns
awk 'BEGIN{FS=",";OFS=","}{$3="";gsub(FS "+",FS)}3{$5="";gsub(FS "+",FS)}5' newerprotocol.txt > newestprotocol.txt
A big problem is that TCP and UDP lines have 6 fields, so the code would only work for them, while ICMP lines have only 4 fields (ICMP has no ports),
i.e. only the first IP address (field 3) will be converted and not the 2nd IP address (field 4).
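One way around the differing field counts is to branch on the protocol field, so each layout gets its own rule. This is a rough sketch, not the code from this thread: it assumes field 2 holds the protocol, ICMP lines carry their addresses in fields 3 and 4, TCP/UDP lines carry them in fields 3 and 5, and the addresses are still in dotted-quad form. The names convert_ips and ip2int are invented for the example.

```shell
# Hypothetical helper: convert dotted-quad IPs to integers in place,
# choosing the field positions from the protocol in field 2 (assumed layout).
convert_ips() {
    awk -F, -v OFS=, '
        # Turn a dotted quad into its 32-bit integer value; leave anything else alone.
        function ip2int(ip,    o) {
            if (split(ip, o, "[.]") != 4) return ip
            return sprintf("%.0f", o[1] * 16777216 + o[2] * 65536 + o[3] * 256 + o[4])
        }
        $2 == "ICMP"               { $3 = ip2int($3); $4 = ip2int($4) }
        $2 == "TCP" || $2 == "UDP" { $3 = ip2int($3); $5 = ip2int($5) }
        { print }
    '
}

# Example with made-up lines: one ICMP (4 fields), one TCP (6 fields).
printf '%s\n' '2009-10-01 05:00:01,ICMP,10.0.0.1,10.0.0.2' \
              '2009-10-01 05:00:02,TCP,192.168.1.10,1096,10.0.0.2,80' | convert_ips
```

Because the fields are overwritten where they stand, the protocol and port columns keep their positions and no old IP column has to be removed afterwards.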
Original Poster
Thanks again rizhun. I just made a small change to the code so that it also converts the 2nd IP address (4th field) for ICMP, like so:
cat finishedprotocol.txt | while read dataline
do
    # find out which protocol we're dealing with
    datatype=$(echo ${dataline} | awk -F"," '{ print $2 }')
    # convert 2 ip addresses for icmp
    [[ ${datatype} == "ICMP" ]] && {
        # grab the ip
        ipaddress1=$(echo ${dataline} | awk -F"," '{ print $3 }')
        ipaddress2=$(echo ${dataline} | awk -F"," '{ print $4 }')
        # convert the ip
        newipaddress1=$(echo ${ipaddress1} | perl -ne 's/(\d+)\.(\d+)\.(\d+)\.(\d+)/$1<<24|$2<<16|$3<<8|$4/e;print')
        newipaddress2=$(echo ${ipaddress2} | perl -ne 's/(\d+)\.(\d+)\.(\d+)\.(\d+)/$1<<24|$2<<16|$3<<8|$4/e;print')
        # remove old ip column
        newdataline=$(echo ${dataline} | awk 'BEGIN{FS=",";OFS=","}{$3=$4"";gsub(FS "+",FS)}1')
        # print new 'dataline' to new text file
        echo "${newipaddress1},${newipaddress2},${newdataline}" >> newerprotocol.txt
        # start from the top of the loop
        continue
    }
However, only the UDP and TCP datalines have printed, not the ICMP datalines, and the position of the protocol and port fields has moved to the end of the line. (Maybe I haven't included the ICMP lines to print in the "newestprotocol" file?)
Any ideas why this occurred?
The 4th and 6th fields still appear in the ICMP lines, so I'm just wondering what else I'm missing in this code. Also, the code appears to hang when it reaches "done" and won't execute until you hit the Enter key. Any thoughts on this?
Original Poster
Another issue:
The 4th and 6th field still appear in the ICMP lines so just wondering what else I'm missing in this code?
Also the previous code in the previous reply appears to hang when it reaches "done" and won't execute until you hit the enter key, any thoughts on this?
Please, will you people stop writing these one-line wonders! They are a waste of time, they identify rank amateur coders, they start out incomprehensible and unworkable and go downhill from there, and they beg to be thrown away.
Do it this way:
Code:
rn=1
cat data.txt | while read line
do
    echo "Record $rn:"
    fn=1
    echo "$line" | tr ',' '\n' | while read field
    do
        echo -e "\tField $fn: $field"
        ((fn++))
    done
    ((rn++))
done
Now it must be obvious that you can use the same coding pattern to break the fields down in the same way, down to individual characters if need be, to accomplish any earthly objective.
The advantage of this script is that you can understand it, change it, and use it again tomorrow for some other data processing task.
The advantage of the one-liner is that you can baffle onlookers with your imagined intellectual prowess, but as to accomplishing anything useful, forget it.
When you write a script, you have some options:
1. Use sed
2. Use awk
3. Use one-liners
4. Use find followed by either xargs or "-exec ... '{}' \;"
5. Use your head.
I recommend option 5.
While using your head, I invite you to find the flaw in my script that will prevent it from working with just any arbitrary comma-separated database.
(pause ...)
Okay, time's up. If there are any embedded commas within the data fields, my script will fail. But this can be dealt with by using a smart parser that is only slightly more complex than the above script. Fair warning.
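For what it's worth, a quote-aware splitter along those lines might look like the sketch below. It is not a complete CSV parser: it assumes a quote character simply switches "inside quotes" on and off, it drops the quote characters themselves, and it does not handle escaped or doubled quotes. The function name split_csv_line and its second argument (the quote character, defaulting to a double quote) are invented for this example.

```shell
# Hypothetical helper: print the fields of one CSV line, one per output line,
# treating commas inside quotes as data. Usage: split_csv_line LINE [QUOTECHAR]
split_csv_line() (
    rest=$1
    quote=${2:-\"}
    field=
    in_quotes=0
    while [ -n "$rest" ]; do
        ch=${rest%"${rest#?}"}      # first character of what is left
        rest=${rest#?}              # everything after it
        if [ "$ch" = "$quote" ]; then
            in_quotes=$((1 - in_quotes))        # toggle quoted state, drop the quote
        elif [ "$ch" = "," ] && [ "$in_quotes" -eq 0 ]; then
            printf '%s\n' "$field"              # unquoted comma ends the field
            field=
        else
            field=$field$ch
        fi
    done
    printf '%s\n' "$field"                      # last field has no trailing comma
)

# Example with single-quoted fields, one of which contains a comma.
split_csv_line "'auth','DNS SPOOF, no authority',7119397" "'"
```

The output could be piped into the same inner "while read field" loop in place of the tr call. Walking the line one character at a time is slow in shell, so this is meant to show the idea rather than to process large files.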
First of all, I was given this code and it works!!!!!
Secondly, it is tough to understand, but at least I understand it more now, since it has been discussed and worked on over a period of weeks (it is clear you are unaware of this!)
Thirdly, your code doesn't work for me and isn't explained clearly.
Fourthly, your condescending attitude doesn't belong in a forum like this, and you would be better off spending your time complaining on some blog or Twitter!!
rn=1
cat data.txt | while read line
do
    echo "Record $rn:"
    fn=1
    echo "$line" | tr ',' '\n' | while read field
    do
        echo -e "\tField $fn: $field"
        ((fn++))
    done
    ((rn++))
done
There is no need to use cat. It can also be written another way that is slightly more efficient: instead of reading the lines and running tr on each one,
do the tr first.
Code:
tr ',' '\n' < "data.txt" | while ....
do
done
In fact, tr is not needed either, since you can set IFS and let read split the fields straight into an array, all within bash.
That said, the whole thing above can also be written with just awk, whose file operations are more efficient than a bash while loop.
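As a rough illustration of the awk route, the sketch below prints the same Record/Field listing as the nested loop for simple data. It assumes plain comma-separated input with no quoted or embedded commas, so it shares that limitation; print_fields is a made-up name.

```shell
# Hypothetical helper: list every field of every record, numbered,
# reading from the named files or from stdin. Splits on every comma.
print_fields() {
    awk -F, '{
        printf "Record %d:\n", NR
        for (i = 1; i <= NF; i++)
            printf "\tField %d: %s\n", i, $i
    }' "$@"
}

# Example with two made-up records on stdin.
printf 'a,b\nc\n' | print_fields
```

With a file it would be called as print_fields data.txt. A single awk process reads the whole file, instead of one tr and one subshell per line.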