LinuxQuestions.org
Forums > Non-*NIX Forums > Programming
04-30-2010, 12:52 PM #1
Phier
Member
Registered: Nov 2009
Posts: 31
Reputation: 0
Bash - Searching strings for array elements...


...and returning the index of the found element in its array.

I have:

Code:
for ((i=0; i < ${#array1[@]}; i++)); do
    # Read line i+1 of the file test
    if [[ $(sed -n "$((i+1))p" test) == *"${array2[0]}"* ]]; then
        :   # stuff
    fi
done

I want to find the index of the array2 element that is found in the line, and move on to the next element of array2 only if the current one isn't found. I don't know the size of array2, so that hard-coded [0] has got to go.

Any suggestions? Efficiency is always nice.
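A minimal sketch, assuming bash: the hypothetical helper find_index takes the line followed by the array elements and prints the 0-based position of the first element contained in the line, so the size of array2 never needs to be known. Positions equal array indices only for non-sparse arrays, and the sample IPs are placeholders.

```shell
#!/usr/bin/env bash
# find_index LINE ELEM...  (hypothetical helper)
# Prints the 0-based position of the first ELEM that occurs in LINE;
# returns 1 and prints nothing if none does.
find_index() {
    local line=$1 elem i=0
    shift
    for elem in "$@"; do
        # Quoting $elem makes the match literal rather than a glob pattern
        if [[ $line == *"$elem"* ]]; then
            echo "$i"
            return 0
        fi
        i=$((i + 1))
    done
    return 1
}

array2=(198.51.100.23 216.239.59.104)
line='2010-05-03 14:35:23,ICMP Echo Reply,Possible harmful network activity detected,3,216.239.59.104'
if idx=$(find_index "$line" "${array2[@]}"); then
    echo "matched array2[$idx]"
else
    echo "no element of array2 found"
fi
```

A plain substring test also finds 10.0.0.1 inside 110.0.0.15, so for IPs it is safer to match against the whole last field.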
 
04-30-2010, 12:59 PM #2
grail
Guru
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,655
Reputation: 1978
I am not sure I understand what you are asking.

Perhaps an example of what is in array1 and array2?
 
04-30-2010, 01:10 PM #3
Phier
Original Poster
test contains, for example:
2010-04-30 11:05:10,ICMP Echo Reply,Possible harmful network activity detected,3,<IP address>
...

array1:

2010-04-30 11:05:10
...

array2:

<IP address>

There's one array1 element for each line of test, and only a handful of IP addresses.

I put all the unique IP addresses into array2, and create another array, country, which contains the countries associated with those IP addresses, in the same order.

I want to step through each line of test and, if it contains array2[0], append country[0] to that line; but I want to do this dynamically for every element of array2 rather than with a hard-coded index 0.
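Since array2 and the country array are parallel (same order), one way to avoid searching for the index at all is to turn them into an associative array keyed by IP once, then look up each line's last field directly. A sketch assuming bash 4+ and the layout shown above with the IP as the last field; the IPs and the country name "Country2" are placeholders, and country_of and annotate are hypothetical names.

```shell
#!/usr/bin/env bash
# Requires bash 4+ for associative arrays.
array2=(216.239.59.104 198.51.100.23)   # unique IPs (placeholders)
country=("United States" "Country2")     # countries, same order as array2

# Build the IP -> country table once
declare -A country_of
for i in "${!array2[@]}"; do
    country_of[${array2[i]}]=${country[i]}
done

# annotate LINE: print LINE with the country of its last field appended
annotate() {
    local line=$1
    local ip=${line##*,}          # everything after the last comma
    printf '%s,%s\n' "$line" "${country_of[$ip]}"
}

annotate '2010-04-30 11:05:10,ICMP Echo Reply,Possible harmful network activity detected,3,198.51.100.23'
```

An IP missing from the table simply gets an empty country (a trailing comma), which makes gaps easy to spot.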
 
04-30-2010, 01:50 PM #4
catkin
LQ 5k Club
Registered: Dec 2008
Location: Tamil Nadu, India
Distribution: Servers: Debian Squeeze and Wheezy. Desktop: Slackware64 14.0. Netbook: Slackware 13.37
Posts: 8,557
Blog Entries: 28
Reputation: 1178
Have I rightly understood that you:
  1. Have a file with lines of the format <timestamp>,<other stuff>,<IP address>
  2. Have data mapping each <IP address> to a country.
  3. Want to add the appropriate country to the end of each file line.
 
04-30-2010, 02:18 PM #5
Phier
Original Poster
Yeah. I'm probably going about it the wrong way.
sed in a for loop is icky.


Scratch that, I have it. I was using sed when I could more easily do what I wanted with a simple echo.

Thanks anyway!

Last edited by Phier; 04-30-2010 at 02:29 PM.
 
04-30-2010, 02:39 PM #6
catkin
Quote:
Originally Posted by Phier
Yeah. I'm probably going about it the wrong way.
sed in a for loop is icky
Wait up! I haven't got that far yet, still trying to understand the requirement (but looking at your code to get a better handle on it). Efficiency and style can come later.

Is my statement of the requirement complete? I can't square it with your outermost loop being on the date array and then sedding the file to get the matching line. From the code sample you gave (before editing it out -- somewhat confusing, better to leave it in and add something like "EDIT: ignore that") there doesn't seem to be any reason for having the date array ... ?

Does this pseudo-code meet the requirement?
Code:
for each line of the file
{
    get the IP address
    search the "IP_address to country" mapping data to find the country
    write the line with the country suffixed
}
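The pseudo-code above translates fairly directly into bash. A sketch assuming bash 4+, that the IP is the last comma-separated field, and that the IP-to-country mapping is already in an associative array; the single mapping entry and the "unknown" fallback are placeholders, and add_country is a hypothetical name. Reading with while read also avoids re-running sed for every line.

```shell
#!/usr/bin/env bash
# Placeholder mapping; a real script would fill it from the whois/codesfile lookups.
declare -A country_of=(
    [216.239.59.104]="United States"
)

# add_country: read CSV lines on stdin, write each with its country appended
add_country() {
    local line ip
    while IFS= read -r line; do                 # for each line of the file
        [[ -z $line ]] && continue              # skip blank lines
        ip=${line##*,}                          # get the IP address (last field)
        # search the mapping ("unknown" is a placeholder for unmapped IPs)
        # and write the line with the country suffixed
        printf '%s,%s\n' "$line" "${country_of[$ip]:-unknown}"
    done
}

add_country <<'EOF'
2010-05-03 14:35:23,ICMP Echo Reply,Possible harmful network activity detected,3,216.239.59.104
EOF
```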
 
05-01-2010, 12:59 AM #7
grail
I agree with catkin that, after your explanation, the date array is of no value, and I would have said the issue revolves around how you match the IPs in array2 with the "country" entries in array3.
 
05-03-2010, 10:56 AM #8
Phier
Original Poster
Hopefully this explains it better:

Code:
Datetime=( $(awk -F ',' '{print $1}' csvfile) )
IP=( $(awk -F "," '{print $5}' csvfile | uniq) )

for ((i=0; i < ${#IP[@]}; i++)); do
        Country[i]=$(grep $(/usr/local/bin/jwhois ${IP[i]} |grep -i country:|uniq| awk '{print $2}')" -" codesfile|awk -F '- ' '{print $2}')
done

for ((i=0; i < ${#Datetime[@]}; i++)); do
        for ((j=0; j < ${#IP[@]}; j++)); do
                if [[ $(eval "sed -n '$(($i+1))'p csvfile") == *${IP[j]}* ]]
                then
                        echo $(eval "sed -n '$(($i+1))'p csvfile"),${Country[j]} >> newcsvfile
                        break
                fi
        done
done

csvfile contains Snort alerts, e.g.:
2010-05-03 14:35:23,ICMP Echo Reply,Possible harmful network activity detected,3,216.239.59.104

I get the country code by running jwhois on each unique IP, and find the corresponding country name using codesfile, which contains, for example:

US - United States

Finally, I want to create newcsvfile, which for the above alert would contain:

2010-05-03 14:35:23,ICMP Echo Reply,Possible harmful network activity detected,3,216.239.59.104,United States

The above seems to work on the test files I've used, but the real csvfile keeps growing, so I want to make it as efficient as possible.
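Because codesfile is small and fixed, it could be read once into an associative array instead of being grepped for every IP. A sketch assuming bash 4+ and that every useful line has the form "CODE - Country Name"; load_codes and country_name are hypothetical names, and the demo writes the sample line to a temporary file standing in for codesfile.

```shell
#!/usr/bin/env bash
# load_codes FILE: fill the global associative array country_name
# from "CODE - Country Name" lines (hypothetical helper, bash 4+).
load_codes() {
    local line
    while IFS= read -r line; do
        [[ $line == *" - "* ]] || continue          # skip lines not in "CODE - Name" form
        country_name[${line%% - *}]=${line#* - }    # key: text before " - ", value: text after
    done < "$1"
}

declare -A country_name
tmp=$(mktemp)                                       # throwaway stand-in for codesfile
printf '%s\n' 'US - United States' > "$tmp"
load_codes "$tmp"
rm -f "$tmp"
echo "${country_name[US]}"
```

After loading, each code is resolved with a plain array lookup, e.g. "${country_name[$code]}", with no extra processes.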
 
05-03-2010, 11:37 AM #9
grail
Well, I am not sure why the code was pasted twice in your post (unless I am missing a small change?).

My first tip (just a helper, not required):
Code:
IP=( $(awk -F "," '{print $5}' csvfile | uniq) )

#could be

IP=( $(awk -F, '!_[$5]++{print $5}' csvfile) )
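To see why the uniq version can keep duplicates: uniq only collapses adjacent repeats, while awk's seen-array idiom (!seen[$5]++) prints a field only the first time it appears and keeps first-seen order. A sketch with made-up rows:

```shell
#!/usr/bin/env bash
# Made-up rows: the same IP appears on lines 1 and 3, not next to each other
sample='t1,a,b,3,10.0.0.1
t2,a,b,3,10.0.0.2
t3,a,b,3,10.0.0.1'

echo "uniq:"
awk -F, '{print $5}' <<<"$sample" | uniq          # 10.0.0.1, 10.0.0.2, 10.0.0.1
echo "awk:"
awk -F, '!seen[$5]++ {print $5}' <<<"$sample"     # 10.0.0.1, 10.0.0.2
```

sort -u would also remove every duplicate, but it reorders the IPs.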
As for:
Quote:
Country[i]=$(grep $(/usr/local/bin/jwhois ${IP[i]} |grep -i country:|uniq| awk '{print $2}')" -" codesfile|awk -F '- ' '
I am not familiar with jwhois, but I would lay odds you do not need multiple awks here. Again, if you could provide some output, we might be able to tidy that up too.
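A sketch of the kind of tidying meant here, with two loud assumptions: that jwhois output contains a line whose first word is "country:" (in any letter case) followed by the code, and that the first such line is the right one. country_code and country_from_code are hypothetical helpers; each replaces part of the grep | uniq | awk chain with a single awk.

```shell
#!/usr/bin/env bash
# country_code: read whois output on stdin, print the code from the first
# line whose first field is "country:" (case-insensitive), then stop.
# ASSUMPTION: the whois output uses a "country:" field.
country_code() {
    awk 'tolower($1) == "country:" { print $2; exit }'
}

# country_from_code CODE FILE: print the name from the first "CODE - Name" line
country_from_code() {
    awk -v cc="$1" -F ' - ' '$1 == cc { print $2; exit }' "$2"
}

# Intended use, with the jwhois path from the original script:
#   code=$(/usr/local/bin/jwhois "$ip" | country_code)
#   name=$(country_from_code "$code" codesfile)
```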

Also I think you can get rid of a lot of collateral in the last for loop:
1. The Datetime array serves no purpose (see the next steps).
2. The "if" is not required: all the data up to this point has come from csvfile, so testing whether the IP will be there is a waste.
3. As you have identified the unique IPs, the single IP loop is enough, with a solitary sed statement straight into the file, i.e. echoing the result is not required; just use your Country variable as part of the replacement in sed.
4. The break is not required (not quite sure why you have it there?).

So I would have just gone with:
Code:
for ((j=0; j < ${#IP[@]}; j++))
do
    sed "s/\(${IP[j]}\)\$/\1,${Country[j]}/" csvfile >> newcsvfile
done
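One way to avoid starting one sed per IP is to generate a single sed script with one s command per IP and run sed once. A sketch under assumptions: IP and Country are the parallel arrays from the thread (placeholder values here), build_sed_script is a hypothetical name, and country names contain no / or & characters, which sed would treat specially in the replacement.

```shell
#!/usr/bin/env bash
IP=(216.239.59.104 198.51.100.23)        # placeholders
Country=("United States" "Country2")

# Emit one sed "s" command per IP. [.] matches a literal dot, the leading
# comma stops 10.0.0.1 from matching inside 110.0.0.1, and $ anchors the
# match at the end of the line.
build_sed_script() {
    local j
    for ((j = 0; j < ${#IP[@]}; j++)); do
        printf 's/,%s$/&,%s/\n' "${IP[j]//./[.]}" "${Country[j]}"
    done
}

# One sed process and one pass over the input, however many IPs there are:
#   sed -f <(build_sed_script) csvfile > newcsvfile
build_sed_script
```

Lines that already end in a country no longer end in the IP, so they pass through unchanged.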
 
05-03-2010, 01:08 PM #10
Phier
Original Poster
Quote:
Originally Posted by grail
Code:
for ((j=0; j < ${#IP[@]}; j++))
do
    sed "s/\(${IP[j]}\)\$/\1,${Country[j]}/" csvfile >> newcsvfile
done
Wouldn't that run sed multiple times on the file?

As the file grows, I'd like to only run jwhois on new IPs that haven't been jwhois-ed before.
 
05-04-2010, 03:36 AM #11
grail
Quote:
Wouldn't that run sed multiple times on the file?
Yes

Quote:
As the file grows, i'd like to only run jwhois on new IPs that haven't been jwhois-ed before.
I see where you are coming from here, but I would put it to you: what stops a more recent line from using a previous IP?
Remember that the sed will only modify lines that end with the IP, i.e. if a line already ends with a comma and a country, it will be ignored.
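A small check of that claim, as a sketch: annotate_ip is a hypothetical wrapper around a sed substitution anchored with $, and applying it a second time to its own output changes nothing, because the annotated line no longer ends in the IP. The sample line is the alert from earlier in the thread.

```shell
#!/usr/bin/env bash
# annotate_ip IP COUNTRY: filter stdin, appending ",COUNTRY" to lines that
# end in ",IP" (hypothetical helper; [.] makes each dot match literally)
annotate_ip() {
    sed "s/,${1//./[.]}\$/&,$2/"
}

sample='2010-05-03 14:35:23,ICMP Echo Reply,Possible harmful network activity detected,3,216.239.59.104'
once=$(printf '%s\n' "$sample" | annotate_ip 216.239.59.104 "United States")
twice=$(printf '%s\n' "$once" | annotate_ip 216.239.59.104 "United States")
echo "$once"
[ "$once" = "$twice" ] && echo "second run changed nothing"
```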
 
05-07-2010, 10:36 PM #12
Phier
Original Poster
I don't follow you. The sed will make the new file bigger than the original, because it runs ${#IP[@]} times. Do you know how I can avoid this?

I have this:

Code:
for ((i=0; i < $(wc -l < csvfile); i++)); do
        line=$(sed -n "$((i+1))p" csvfile)
        # Lines with 6 fields already carry an "IP,Country" pair
        if [[ $(echo "$line" | awk -F "," '{print NF}') == "6" ]]
        then
                KnownIPs+=( "$(echo "$line" | awk -F "," '{print $5","$6}')" )
        fi
done

# One "IP,Country" pair per element (country names may contain spaces);
# skipping empty lines stops an empty KnownIPs from yielding a pattern that matches everything
mapfile -t UniqueKnownIPs < <( printf "%s\n" "${KnownIPs[@]}" | awk 'NF && x[$0]++ == 0 { print $0 }' )


for ((i=0; i < $(wc -l < csvfile); i++)); do
        found=0
        line=$(sed -n "$((i+1))p" csvfile)
        if [[ $(echo "$line" | awk -F "," '{print NF}') == "6" ]]
        then
                echo "printing line as is"
                echo "$line" >> newcsvfile
        else
                for j in "${UniqueKnownIPs[@]}"; do
                        knownIP=${j%%,*}
                        # The leading comma stops 10.0.0.1 matching the end of 110.0.0.1
                        if [[ "$line" == *",$knownIP" ]]
                        then
                                echo "IP found in file"
                                found=1
                                Country=${j#*,}     # the country is already stored in the pair
                                echo "$line,$Country" >> newcsvfile
                                break
                        fi
                done
                if [[ $found -eq 0 ]]
                then
                        echo "looking up IP"
                        newIP=$(echo "$line" | awk -F "," '{print $5}')
                        Country=$(/usr/local/bin/jwhois "$newIP" | grep -i country: | uniq | awk '{print $2}')
                        UniqueKnownIPs+=( "$newIP,$Country" )
                        echo "$line,$Country" >> newcsvfile
                fi
        fi

done
It's probably messy and unnecessary, but I can count the number of scripts I've written on one hand, and it seems to work.

Oh, the reason I can't simply skip previously seen IPs on the grounds that they will already have countries beside them is that updates are added to the same csvfile, so you might have:

IP1,Country1
IP2,Country2
IP3
IP2

and on subsequent runs of the script it would be silly to look up Country2 again.
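One way to get both properties asked for here (no repeated lookups, no sed per line) is a sketch like the following, assuming bash 4+ and the five-field alert layout with an optional sixth country field: the first pass seeds a cache from lines that already carry a country, and the second pass annotates the rest, calling the lookup only for IPs not yet in the cache. lookup_country is a hypothetical placeholder for the jwhois pipeline, and annotate_csv writes to stdout so the caller decides where newcsvfile goes.

```shell
#!/usr/bin/env bash
# HYPOTHETICAL placeholder: substitute the real jwhois | awk pipeline here
lookup_country() {
    echo "LOOKUP($1)"
}

# annotate_csv FILE: write FILE to stdout with countries on every line (bash 4+)
annotate_csv() {
    local -A cache
    local line ip rest
    # Pass 1: seed the cache from lines that already carry "IP,Country"
    while IFS=, read -r _ _ _ _ ip rest; do
        [[ -n $rest ]] && cache[$ip]=$rest
    done < "$1"
    # Pass 2: copy annotated (or IP-less) lines as-is, annotate the rest
    while IFS= read -r line; do
        IFS=, read -r _ _ _ _ ip rest <<<"$line"
        if [[ -n $rest || -z $ip ]]; then
            printf '%s\n' "$line"
        else
            # Look up only IPs never seen before, then remember the answer
            [[ -n ${cache[$ip]+set} ]] || cache[$ip]=$(lookup_country "$ip")
            printf '%s,%s\n' "$line" "${cache[$ip]}"
        fi
    done < "$1"
}
```

Run as annotate_csv csvfile > newcsvfile. With the IP1/IP2/IP3/IP2 example above, only IP3 would be looked up.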

Last edited by Phier; 05-08-2010 at 01:50 AM.
 
05-08-2010, 01:34 AM #13
grail
hmmm ... so are you saying that in your example:
Quote:
IP1,Country1
IP2,Country2
IP3
IP2
You have run the code once previously (hence the added Countries), and now, when you run it again, the assigned Country for IP2 has changed, so both IP2 values need to be updated?
 
05-08-2010, 11:23 AM #14
Phier
Original Poster
Further up, the script gets the IPs from an SQL dump and then adds the countries, so after the first run we've got:

IP1,Country1
IP2,Country2

And halfway through the second run we've got:

IP1,Country1
IP2,Country2
IP3
IP2

IP2's country won't change, but we've seen it before, so we don't have to look it up again.
 
05-08-2010, 11:45 AM #15
grail
If we are not going to look it up how will the new IP2 get its corresponding Country?

My understanding is this:

1. We have a file in the following format:
Quote:
2010-04-30 11:05:10,ICMP Echo Reply,Possible harmful network activity detected,3,1.1.1.1
2. Using jwhois, you retrieve the Country information associated with each IP
3. A sed runs over the file mentioned in step 1 to append the string ",Country" after each relevant IP
4. New data is added to the original file

Obviously from this we can deduce that steps 2 - 4 may continue until the file reaches some arbitrary limit.

Now, I am saying that step 3 will only change the newly added data, as those will be the only lines without the relevant string at the end.

Do I understand correctly that your issue is that the sed will still 'look' at each line even though it may already have the string at the end?

Or is it another concern?
 
  


All times are GMT -5.