LinuxQuestions.org
Old 04-23-2013, 09:17 AM   #1
MrTuxor
LQ Newbie
 
Registered: Apr 2013
Posts: 13

Rep: Reputation: Disabled
Bash script help - removing certain rows from .csv file


Hello Everyone,

I am trying to find a way to take a .csv file with 7 columns and a ton of rows (over 600,000) and remove the entire row if the cell in the fourth column is blank.

Just to give you a little background on why I am doing this (just in case there is an easier way), I am pulling information from a PCAP into a .csv file, and I only want to view the rows from the .csv file that list something in the http.host (fourth column) entry (e.g. google.com). If that entry is blank because it is not an http.host website, then I would like to remove the row. Doing this would seriously cut down on the number of rows I have to review to make sure my users are not visiting sites that they should not be.

So far my script looks like this:
#!/bin/bash

echo -n "What is the name of your PCAP file? "
read in_pcap

echo -n "What is the name of your CSV file? "
read out_csv

tshark -r "$in_pcap" -T fields -e frame.number -e ip.src -e ip.dst -e http.host -e frame.time -e frame.time_relative -E header=y -E separator=, > "$out_csv"

_____
I ran the script on a current PCAP and it works like a charm, getting the information I need from the PCAP file into a CSV file. Unfortunately, I am running into the aforementioned blank-cell situation, as not every row lists a value in the http.host cell. In fact, of the 600,000+ rows, I am guessing there are only several hundred that I need. So adding to the script above (or creating a new script if need be) to remove every row with a blank entry in the fourth column would be the perfect solution; however, I am not sure how to do that. Assuming a loop is the solution, the condition for the loop to stop would be all 7 columns being blank, a.k.a. the row after the last of the 600,000+ entries.

Can anyone help me edit my current script and/or write a new script to loop over (or otherwise remove) blank entries?

Thanks in advance!
 
Old 04-23-2013, 11:10 AM   #2
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Rep: Reputation: 660
With this InFile ...
Code:
aaa1,aaa2,aaa3,aaa4,aaa5,aaa6,aaa7
bbb1,bbb2,bbb3,bbb4,bbb5,bbb6,bbb7
ccc1,ccc2,ccc3,,ccc5,ccc6,ccc7
ddd1,ddd2,ddd3,ddd4,ddd5,ddd6,ddd7
eee1,eee2,eee3,,eee5,eee6,eee7
fff1,fff2,fff3,fff4,fff5,fff6,fff7
... this awk ...
Code:
awk -F, '{if ($4!="") print}' $InFile >$OutFile
... produced this OutFile ...
Code:
aaa1,aaa2,aaa3,aaa4,aaa5,aaa6,aaa7
bbb1,bbb2,bbb3,bbb4,bbb5,bbb6,bbb7
ddd1,ddd2,ddd3,ddd4,ddd5,ddd6,ddd7
fff1,fff2,fff3,fff4,fff5,fff6,fff7
Daniel B. Martin
 
Old 04-23-2013, 01:35 PM   #3
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3191
Daniel's solution is correct and of course you can pipe your tshark command into it instead of creating an intermediate file.

It can also be shortened to:
Code:
tshark -r "$in_pcap" -T fields -e frame.number -e ip.src -e ip.dst -e http.host -e frame.time -e frame.time_relative -E header=y -E separator=, | awk -F, '$4 != ""' > output.file
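The filter can be tried without a capture file. The following is only a sketch: the sample rows are made up to stand in for tshark output, and filter_http_host is a hypothetical helper name, not part of tshark or awk.

```shell
#!/bin/bash
# Keep only CSV rows whose 4th comma-separated field (http.host) is non-empty.
# filter_http_host is a hypothetical helper name chosen for this sketch.
filter_http_host() {
    awk -F, '$4 != ""'
}

# Made-up sample rows standing in for tshark output (row 2 has no host).
sample='1,10.0.0.1,10.0.0.2,example.com,t1,0.0
2,10.0.0.1,10.0.0.3,,t2,0.1
3,10.0.0.4,10.0.0.2,example.org,t3,0.2'

# Prints rows 1 and 3 only.
printf '%s\n' "$sample" | filter_http_host
```

Because the test looks at the fourth field only and awk prints the whole record unchanged, commas that appear in later fields (such as a date written with a comma) do not affect which rows are kept.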
 
Old 04-24-2013, 01:56 PM   #4
MrTuxor
LQ Newbie
 
Registered: Apr 2013
Posts: 13

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by grail
Daniel's solution is correct and of course you can pipe your tshark command into it instead of creating an intermediate file.

It can also be shortened to:
Code:
tshark -r "$in_pcap" -T fields -e frame.number -e ip.src -e ip.dst -e http.host -e frame.time -e frame.time_relative -E header=y -E separator=, | awk -F, '$4 != ""' > output.file
Thanks a ton for this answer!

I went from 600,000+ rows to analyze to less than 3100 and it did everything I wanted it to.

I do have one problem though. If I run this line:
tshark -r test.pcap -T fields -e frame.number -e ip.src -e ip.dst -e http.host -e frame.time -e frame.time_relative -E header=y -E separator=, | awk -F, '$4 != ""' > test.csv
Everything works perfectly fine.

However if I run my script mentioned above with the variables $in_pcap and $out_csv, I get the following error message:
./pcapAnalyze: line 9: $out_csv: ambiguous redirect
tshark: Output fields were specified with "-e", but "-Tfields" was not specified.

I wanted to make sure the command ran on its own without the variables to limit the amount of things that could go wrong. After I verified that the command worked perfect with the awk command at the end, I simply replaced the hard coded test.pcap and test.csv files with the $in_pcap and $out_csv variables, and I got that error message. The only thing that I changed was implementing the variable... Why is it doing that?

Thanks again!
 
Old 04-24-2013, 03:07 PM   #5
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Rep: Reputation: 660
Quote:
Originally Posted by MrTuxor
./pcapAnalyze: line 9: $out_csv: ambiguous redirect
In my (limited) experience ambiguous redirect means you failed to identify the output file out_csv. Check that -- might be only a keying error.

Daniel B. Martin
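
One common way to get this message is an unquoted redirect target that expands to nothing. A minimal sketch, assuming bash is installed and that out_csv was never set (the thread does not show the exact script that failed, so this is an illustration rather than a diagnosis):

```shell
#!/bin/bash
# Make sure the variable is not inherited by the child shells below.
unset out_csv

# Unquoted and empty: bash cannot tell which file is meant.
unquoted_msg=$(LC_ALL=C bash -c 'echo data > $out_csv' 2>&1)

# Quoted and empty: bash tries to open a file with an empty name instead.
quoted_msg=$(LC_ALL=C bash -c 'echo data > "$out_csv"' 2>&1)

printf 'unquoted: %s\n' "$unquoted_msg"
printf 'quoted:   %s\n' "$quoted_msg"
```

The first message ends in "ambiguous redirect" and the second in "No such file or directory", so seeing the former suggests both an empty (or multi-word) value and a missing pair of quotes around the redirect target.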
 
Old 04-24-2013, 03:59 PM   #6
Gener@l
LQ Newbie
 
Registered: Sep 2011
Posts: 16

Rep: Reputation: Disabled
Quote:
Originally Posted by MrTuxor
However if I run my script mentioned above with the variables $in_pcap and $out_csv, I get the following error message:
./pcapAnalyze: line 9: $out_csv: ambiguous redirect
tshark: Output fields were specified with "-e", but "-Tfields" was not specified.
just an observation since I am not familiar with tshark, but your command is -T fields, but the error output says "-Tfields" with no space. Did you accidentally remove the space between -T and fields? That might resolve the redirect issue too if that error was caused by the incorrect tshark command.
 
Old 04-24-2013, 11:23 PM   #7
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3191
If you alter the shebang to be:
Code:
#!/bin/bash -xv
This will show you if your variables are being set to what you are thinking.

Also, the point above about spacing may need to be looked at.
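
As a rough illustration of what the -x trace reveals, the sketch below runs a harmless echo in place of tshark, assuming in_pcap was never set. The trace line shows the command after expansion, so a missing value is easy to spot.

```shell
#!/bin/bash
# in_pcap is deliberately left unset to mimic a variable that was never filled.
unset in_pcap

# -x prints each command after expansion (to stderr) before running it.
# echo stands in for tshark so nothing is actually captured or read.
trace=$(bash -xc 'echo tshark -r $in_pcap -T fields' 2>&1)

printf '%s\n' "$trace"
```

In the trace, nothing appears between -r and -T, which is the sign that the variable was empty. The same tracing can be switched on for part of a script with set -x and off again with set +x.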
 
Old 04-25-2013, 08:57 AM   #8
MrTuxor
LQ Newbie
 
Registered: Apr 2013
Posts: 13

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by grail
If you alter the shebang to be:
Code:
#!/bin/bash -xv
This will show you if your variables are being set to what you are thinking.

Also, the point above about spacing may need to be looked at.
I tried the -xv after the shebang and received the following:
echo -n "What is the name of the PCAP file? "
+ echo -n 'What is the name of the PCAP file? '
What is the name of the PCAP file? read $in_pcap
+ read
And it hung up there. I am not sure why.

Quote:
Originally Posted by Gener@l
just an observation since I am not familiar with tshark, but your command is -T fields, but the error output says "-Tfields" with no space. Did you accidentally remove the space between -T and fields? That might resolve the redirect issue too if that error was caused by the incorrect tshark command.
I double-checked the spacing and then tried moving the -T fields part of the command to the end, just before the first -E switch, but it produced another error message saying:
Line 9: $out_csv: ambiguous redirect
tshark: The file "-e" doesn't exist

I hesitate to think that the command is wrong because it only errors out when I change the actual name of the files (pcap and .csv) to variables… If I hard code the name of the pcap and csv file into the script and comment out the echo/variable lines, it works perfect.

Quote:
Originally Posted by danielbmartin
In my (limited) experience ambiguous redirect means you failed to identify the output file out_csv. Check that -- might be only a keying error.

Daniel B. Martin
What you are saying makes sense and I was thinking the same thing so I tried the command both with and without the .csv extension (i.e. test; test.csv) and both failed with the same ambiguous error.

Thanks again for the help!
 
Old 04-25-2013, 10:35 AM   #9
MrTuxor
LQ Newbie
 
Registered: Apr 2013
Posts: 13

Original Poster
Rep: Reputation: Disabled

Hey Everyone,

Almost ashamed to admit it but I got it to work... The dollar signs after the read commands at the beginning of the script were throwing it off. As soon as I removed them (but kept them in the tshark command) it ran perfectly!
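
A minimal sketch of why the dollar sign matters, assuming bash is installed and in_pcap starts out unset. With the dollar sign, the shell expands the (empty) variable before read runs, so read receives no variable name and stores the input in its default variable REPLY:

```shell
#!/bin/bash
# Start from a clean slate so the child shells do not inherit a value.
unset in_pcap REPLY

# Wrong: "$in_pcap" expands to nothing, so this is just "read".
wrong=$(printf 'test.pcap\n' | bash -c 'read $in_pcap; echo "in_pcap=[$in_pcap] REPLY=[$REPLY]"')

# Right: read is given the variable NAME and fills it.
right=$(printf 'test.pcap\n' | bash -c 'read in_pcap; echo "in_pcap=[$in_pcap]"')

printf '%s\n%s\n' "$wrong" "$right"
```

In the first case in_pcap stays empty. That would be consistent with the earlier errors, since an empty unquoted $in_pcap leaves tshark's -r taking the next word as its file name and an empty unquoted $out_csv gives an ambiguous redirect, though the thread does not show the failing script itself.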

Thanks for all the help!
 
Old 04-25-2013, 11:09 AM   #10
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3191
Well I am glad you got it working, even if I do not understand the following:
Quote:
The dollar signs after the read commands at the beginning of the script were throwing it off
In your example I see no dollar signs after any of the read commands?

Also, in answer to your previous question:
Quote:
And it hung up there. I am not sure why.
This would be the same as your normal program where it would pause and wait for you to enter the PCAP value so it can be stored in the variable.

In future you may find it cleaner to simply use read and not the additional echo:
Code:
echo -n "What is the name of your PCAP file? "
read in_pcap

# becomes

read -p "What is the name of your PCAP file? " in_pcap
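
Building on the read -p form, a script could also refuse an empty answer before going any further. This is only a sketch: ask_file is a hypothetical helper name, and the -r option (which stops read from treating backslashes specially) is an addition not mentioned in the post.

```shell
#!/bin/bash
# ask_file is a hypothetical helper name used only for this sketch.
# It prompts with $1, reads one line, and rejects an empty answer.
ask_file() {
    local answer
    read -r -p "$1" answer || return 1
    [ -n "$answer" ] || return 1
    printf '%s\n' "$answer"
}

# Example use inside a script (commented out so nothing waits for input here):
# in_pcap=$(ask_file "What is the name of your PCAP file? ") || exit 1
```

Note that bash shows the -p prompt only when input comes from a terminal, so nothing is printed when the answer is piped in.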
 
Old 04-25-2013, 11:47 AM   #11
MrTuxor
LQ Newbie
 
Registered: Apr 2013
Posts: 13

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by grail
Well I am glad you got it working, even if I do not understand the following:

In your example I see no dollar signs after any of the read commands?

Also, in answer to your previous question:

This would be the same as your normal program where it would pause and wait for you to enter the PCAP value so it can be stored in the variable.

In future you may find it cleaner to simply use read and not the additional echo:
Code:
echo -n "What is the name of your PCAP file? "
read in_pcap

# becomes

read -p "What is the name of your PCAP file? " in_pcap
I was moving parts of the command around and must have at some point decided to add $’s to the variables after the read command. Unfortunately that was after I posted my example and I forgot to update that on the forum #LessonLearned.

Thanks again!
 
  

