04-23-2013, 10:17 AM
|
#1
|
LQ Newbie
Registered: Apr 2013
Posts: 13
Rep:
|
Bash script help - removing certain rows from .csv file
Hello Everyone,
I am trying to find a way to take a .csv file with 7 columns and a ton of rows (over 600,000) and remove the entire row if the cell in the fourth column is blank.
Just to give you a little background on why I am doing this (just in case there is an easier way), I am pulling information from a PCAP into a .csv file, and I only want to view the rows from the .csv file that list something in the http.host (fourth column) entry (e.g. google.com). If that entry is blank because it is not an http.host website, then I would like to remove the row. Doing this would seriously cut down on the number of rows I have to review to make sure my users are not visiting sites that they should not be.
So far my script looks like this:
#!/bin/bash
echo -n "What is the name of your PCAP file? "
read in_pcap
echo -n "What is the name of your CSV file? "
read out_csv
tshark -r "$in_pcap" -T fields -e frame.number -e ip.src -e ip.dst -e http.host -e frame.time -e frame.time_relative -E header=y -E separator=, > "$out_csv"
_____
I ran the script on a current PCAP and it works like a charm getting the information I need from a pcap file into a csv file. Unfortunately, I am running into the aforementioned blank-cell situation, as not every entry lists a value in the http.host cell. In fact, of the over 600,000 rows, I am guessing there are only several hundred that I need. So adding to the script above (or creating a new script if need be) to remove rows with a blank entry in the fourth column would be the perfect solution; however, I am not sure how to do that. If a loop is the solution, the condition for the loop to stop would be for each of the 7 columns to be blank, a.k.a. the row after the last of the 600,000+ entries.
Can anyone help me edit my current script and/or write a new script to loop over (or otherwise remove) blank entries?
Thanks in advance!
|
|
|
04-23-2013, 12:10 PM
|
#2
|
Senior Member
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,894
|
With this InFile ...
Code:
aaa1,aaa2,aaa3,aaa4,aaa5,aaa6,aaa7
bbb1,bbb2,bbb3,bbb4,bbb5,bbb6,bbb7
ccc1,ccc2,ccc3,,ccc5,ccc6,ccc7
ddd1,ddd2,ddd3,ddd4,ddd5,ddd6,ddd7
eee1,eee2,eee3,,eee5,eee6,eee7
fff1,fff2,fff3,fff4,fff5,fff6,fff7
... this awk ...
Code:
awk -F, '{if ($4!="") print}' "$InFile" >"$OutFile"
... produced this OutFile ...
Code:
aaa1,aaa2,aaa3,aaa4,aaa5,aaa6,aaa7
bbb1,bbb2,bbb3,bbb4,bbb5,bbb6,bbb7
ddd1,ddd2,ddd3,ddd4,ddd5,ddd6,ddd7
fff1,fff2,fff3,fff4,fff5,fff6,fff7
Daniel B. Martin
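The filter above can be tried on a small sample before pointing it at the real 600,000-row file. The following is a minimal sketch, not part of the original answer: the function name filter_col4 and the sample rows are made up for illustration, and the extra NR == 1 test (keep the first line unconditionally) is an assumption about wanting the header to survive even if its fourth field were ever empty.

```shell
#!/bin/bash
# Keep only rows whose 4th comma-separated field is non-empty.
# filter_col4 is a hypothetical helper name used for this sketch.
# NR == 1 keeps the header line no matter what it contains.
filter_col4() {
    awk -F, 'NR == 1 || $4 != ""'
}

# Invented sample data in the same shape as the tshark output
# (header line first, http.host in column 4).
sample='frame.number,ip.src,ip.dst,http.host,frame.time_relative
1,10.0.0.5,10.0.0.1,,0.000000
2,10.0.0.5,93.184.216.34,example.com,0.512000
3,10.0.0.1,10.0.0.5,,0.734000'

filtered=$(printf '%s\n' "$sample" | filter_col4)
printf '%s\n' "$filtered"
```

With this sample, only the header and the example.com row should remain. Note that splitting on every comma is a simplification: if any field before the fourth one ever contained a comma, the columns would shift and the test would look at the wrong field.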
|
|
|
04-23-2013, 02:35 PM
|
#3
|
LQ Guru
Registered: Sep 2009
Location: Perth
Distribution: Arch
Posts: 10,030
|
Daniel's solution is correct and of course you can pipe your tshark command into it instead of creating an intermediate file.
It can also be shortened to:
Code:
tshark -r "$in_pcap" -T fields -e frame.number -e ip.src -e ip.dst -e http.host -e frame.time -e frame.time_relative -E header=y -E separator=, | awk -F, '$4 != ""' > output.file
|
|
|
04-24-2013, 02:56 PM
|
#4
|
LQ Newbie
Registered: Apr 2013
Posts: 13
Original Poster
Rep:
|
Quote:
Originally Posted by grail
Daniel's solution is correct and of course you can pipe your tshark command into it instead of creating an intermediate file.
It can also be shortened to:
Code:
tshark -r "$in_pcap" -T fields -e frame.number -e ip.src -e ip.dst -e http.host -e frame.time -e frame.time_relative -E header=y -E separator=, | awk -F, '$4 != ""' > output.file
|
Thanks a ton for this answer!
I went from 600,000+ rows to analyze to fewer than 3,100, and it did everything I wanted it to.
I do have one problem though. If I run this line:
tshark -r test.pcap -T fields -e frame.number -e ip.src -e ip.dst -e http.host -e frame.time -e frame.time_relative -E header=y -E separator=, | awk -F, '$4 != ""' > test.csv
Everything works perfectly fine.
However if I run my script mentioned above with the variables $in_pcap and $out_csv, I get the following error message:
./pcapAnalyze: line 9: $out_csv: ambiguous redirect
tshark: Output fields were specified with "-e", but "-Tfields" was not specified.
I wanted to make sure the command ran on its own without the variables to limit the number of things that could go wrong. After I verified that the command worked perfectly with the awk command at the end, I simply replaced the hard-coded test.pcap and test.csv files with the $in_pcap and $out_csv variables, and I got that error message. The only thing that I changed was implementing the variables... Why is it doing that?
Thanks again!
|
|
|
04-24-2013, 04:07 PM
|
#5
|
Senior Member
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,894
|
Quote:
Originally Posted by MrTuxor
./pcapAnalyze: line 9: $out_csv: ambiguous redirect
|
In my (limited) experience ambiguous redirect means you failed to identify the output file out_csv. Check that -- might be only a keying error.
Daniel B. Martin
|
|
|
04-24-2013, 04:59 PM
|
#6
|
LQ Newbie
Registered: Sep 2011
Posts: 16
Rep:
|
Quote:
Originally Posted by MrTuxor
However if I run my script mentioned above with the variables $in_pcap and $out_csv, I get the following error message:
./pcapAnalyze: line 9: $out_csv: ambiguous redirect
tshark: Output fields were specified with "-e", but "-Tfields" was not specified.
|
Just an observation, since I am not familiar with tshark: your command has -T fields, but the error output says "-Tfields" with no space. Did you accidentally remove the space between -T and fields? Fixing that might resolve the redirect issue too, if that error was caused by the incorrect tshark command.
|
|
|
04-25-2013, 12:23 AM
|
#7
|
LQ Guru
Registered: Sep 2009
Location: Perth
Distribution: Arch
Posts: 10,030
|
If you alter the shebang to be:
Code:
#!/bin/bash -xv
This will show you whether your variables are being set to what you think they are.
Also, the point above about spacing may need to be looked at.
|
|
|
04-25-2013, 09:57 AM
|
#8
|
LQ Newbie
Registered: Apr 2013
Posts: 13
Original Poster
Rep:
|
Quote:
Originally Posted by grail
If you alter the shebang to be:
Code:
#!/bin/bash -xv
This will show you whether your variables are being set to what you think they are.
Also, the point above about spacing may need to be looked at.
|
I tried the -xv after the shebang and received the following:
echo -n "What is the name of the PCAP file? "
+ echo -n 'What is the name of the PCAP file? '
What is the name of the PCAP file? read $in_pcap
+ read
And it hung up there. I am not sure why.
Quote:
Originally Posted by Gener@l
just an observation since I am not familiar with tshark, but your command is -T fields, but the error output says "-Tfields" with no space. Did you accidentally remove the space between -T and fields? That might resolve the redirect issue too if that error was caused by the incorrect tshark command.
|
I tried to move the -T fields part of the command (after double-checking its spacing) to the end of the command, just before the first -E switch, but it produced another error message saying:
Line 9: $out_csv: ambiguous redirect
tshark: The File "-e" doesn't exist
I hesitate to think that the command is wrong because it only errors out when I change the actual names of the files (pcap and .csv) to variables… If I hard-code the names of the pcap and csv files into the script and comment out the echo/variable lines, it works perfectly.
Quote:
Originally Posted by danielbmartin
In my (limited) experience ambiguous redirect means you failed to identify the output file out_csv. Check that -- might be only a keying error.
Daniel B. Martin
|
What you are saying makes sense and I was thinking the same thing so I tried the command both with and without the .csv extension (i.e. test; test.csv) and both failed with the same ambiguous error.
Thanks again for the help!
|
|
|
04-25-2013, 11:35 AM
|
#9
|
LQ Newbie
Registered: Apr 2013
Posts: 13
Original Poster
Rep:
|
Hey Everyone,
Almost ashamed to admit it but I got it to work... The dollar signs after the read commands at the beginning of the script were throwing it off. As soon as I removed them (but kept them in the tshark command) it ran perfectly!
Thanks for all the help!
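For anyone hitting the same errors, here is a minimal sketch of why a dollar sign after read breaks things. It is an illustration only: input is fed from a here-string instead of the keyboard, and the names buggy_value, reply_value, redirect_result and fixed_value are invented for this demo. The assumption is that the failing script used read $in_pcap and left the variables unquoted in the tshark line, which would also be consistent with tshark complaining about -T (an empty, unquoted $in_pcap disappears, so -r probably took the next option as its file name).

```shell
#!/bin/bash
# Feed the "answers" from a here-string so the sketch runs without a terminal.
unset in_pcap REPLY

# Buggy: in_pcap is unset, so $in_pcap expands to nothing and this is a
# bare `read`, which stores the line in bash's default variable REPLY.
read $in_pcap <<< "test.pcap"
buggy_value="${in_pcap:-}"      # in_pcap was never assigned
reply_value="${REPLY:-}"        # the input ended up here instead

# An empty, unquoted variable used as a redirect target is what bash
# reports as "ambiguous redirect". The subshell keeps the error contained.
out_csv=""
if ( echo data > $out_csv ) 2>/dev/null; then
    redirect_result="ok"
else
    redirect_result="failed"
fi

# Fixed: give read the variable NAME, with no dollar sign.
read in_pcap <<< "test.pcap"
fixed_value="$in_pcap"

echo "buggy: in_pcap='$buggy_value' REPLY='$reply_value' redirect=$redirect_result"
echo "fixed: in_pcap='$fixed_value'"
```

The dollar sign asks the shell for the variable's current value, while read needs the variable's name so it knows where to store the input. That is why the $ belongs in the tshark line but not after read.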
|
|
|
04-25-2013, 12:09 PM
|
#10
|
LQ Guru
Registered: Sep 2009
Location: Perth
Distribution: Arch
Posts: 10,030
|
Well I am glad you got it working, even if I do not understand the following:
Quote:
The dollar signs after the read commands at the beginning of the script were throwing it off
|
In your example I see no dollar signs after any of the read commands?
Also, in answer to your previous question:
Quote:
And it hung up there. I am not sure why.
|
This would be the same as your normal program where it would pause and wait for you to enter the PCAP value so it can be stored in the variable.
In future you may find it cleaner to simply use read and not the additional echo:
Code:
echo -n "What is the name of your PCAP file? "
read in_pcap
# becomes
read -p "What is the name of your PCAP file? " in_pcap
|
|
|
04-25-2013, 12:47 PM
|
#11
|
LQ Newbie
Registered: Apr 2013
Posts: 13
Original Poster
Rep:
|
Quote:
Originally Posted by grail
Well I am glad you got it working, even if I do not understand the following:
In your example I see no dollar signs after any of the read commands?
Also, in answer to your previous question:
This would be the same as your normal program where it would pause and wait for you to enter the PCAP value so it can be stored in the variable.
In future you may find it cleaner to simply use read and not the additional echo:
Code:
echo -n "What is the name of your PCAP file? "
read in_pcap
# becomes
read -p "What is the name of your PCAP file? " in_pcap
|
I was moving parts of the command around and must have at some point decided to add $’s to the variables after the read command. Unfortunately that was after I posted my example and I forgot to update that on the forum #LessonLearned.
Thanks again!
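Pulling the thread together, the whole job might be sketched as below. This is only one possible arrangement, under a few assumptions: the function name pcap_to_http_csv is hypothetical, and the file names are taken as command-line arguments rather than prompted for, so the script can be run unattended. The tshark options and the awk filter are the ones already posted above.

```shell
#!/bin/bash
# Sketch only: pcap_to_http_csv is a hypothetical helper name.
# Usage: ./pcapAnalyze capture.pcap out.csv
pcap_to_http_csv() {
    local in_pcap=$1 out_csv=$2
    # Quoting the variables keeps an empty or space-containing name as a
    # single argument, so tshark never mistakes the next option for a file
    # name and the redirect target is never ambiguous.
    tshark -r "$in_pcap" -T fields \
        -e frame.number -e ip.src -e ip.dst -e http.host \
        -e frame.time -e frame.time_relative \
        -E header=y -E separator=, |
        awk -F, '$4 != ""' > "$out_csv"
}

# Run the conversion only when both file names were supplied.
if [ "$#" -eq 2 ]; then
    pcap_to_http_csv "$1" "$2"
else
    echo "usage: $0 <in.pcap> <out.csv>" >&2
fi
```

If prompting is preferred, the two arguments could instead be filled in with read -p as suggested earlier in the thread, keeping the variable names free of dollar signs.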
|
|
|
All times are GMT -5. The time now is 09:35 PM.