Hey guys,
I could really use some help cleaning up the readability of this Ruby script. I could not figure out a cleaner way to read the files for my situation. I know there is fh.each_line, but I'm reading blocks within the file rather than independent lines. First, here is the script:
http://thorium-ini.ini.cmu.edu/unique_contact.rb (md5sum 1742e0c0fc2022ca06cd99f07342d1c3)
Code:
#!/usr/bin/ruby
begin
  ip = Hash.new
  ARGV.each do |dir|
    files = Array.new # Going to get all of the files in this dir which is all of the
    files = Dir.glob("#{dir}/**/*") # 5 minute intervals in the day
    files.each do |file| # for each file get the number of connections
      next if File.directory?(file)
      puts "#{file}"
      fh = File.open(file, "r") # open the file for reading
      fh.readline # skip the first four lines
      fh.readline
      fh.readline
      fh.readline
      while(1)
        ipaddr = fh.readline # the first line of the block is the IP address
        ip[ipaddr] = Hash.new if(!ip[ipaddr])
        while(1) # while we are inside the block of IP's they had contact with
          (dir, contact, count, loc) = fh.readline.split # read the line in and split it
          break if(dir=="IP:") # "IP:" denotes the end of the block
          ip[ipaddr][contact]=1 # The value doesn't matter, as long as the key exists
        end
        fh.readline # skip blank line after block
        break if(fh.eof?) # go to the next file if EOF
      end
      fh.close
    end
  end
  of = File.open("degree_timespan", "w")
  ip.each do |key, value|
    of.puts "#{key.chop} #{ip[key].length}"
  end
  of.close
end
To test it, create a directory called "test" and put these two files in it:
http://thorium-ini.ini.cmu.edu/test1
http://thorium-ini.ini.cmu.edu/test2
Code:
ce0fc04cd448d645f0e1a259a4b40d16 /var/www/localhost/htdocs/test1
5532b17a93b51fbb5abd5b640245af8e /var/www/localhost/htdocs/test2
Then you can run it, and check your output with mine:
Code:
gnychis@x60s ~/school/thesis/host_analysis/degree $ ./unique_contact.rb test
test/test1
test/test2
gnychis@x60s ~/school/thesis/host_analysis/degree $ cat degree_timespan
0.245.42.237 3
0.245.42.236 1
0.245.42.235 2
There must be a better way to read in the file than the code I wrote.
To help you better understand the input, here is one block (the *** ... *** notes are my annotations, not part of the file):
Code:
0.245.42.235 *** TOP LEVEL HASH KEY ***
<- 224.170.4.11 1 (inter) *** HOST IT HAS CONTACTED ***
<- 224.170.4.12 1 (inter) *** HOST IT HAS CONTACTED ***
IP: 0.245.42.235 RA_Out: 0 ER_Out: 0 RA_In: 0 ER_In: 1 RA: 0 ER: 1 Total: 1 *** THIS LINE MARKS THE END OF THE BLOCK ***
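To make the block layout above concrete, here is a rough sketch of how it could be read with each_line and one small state variable instead of nested readline loops. This is only an illustration under the assumptions stated in this post (four header lines, then blocks of an IP line, contact lines, an "IP:" summary line, and a blank line). The name parse_contacts and the placeholder header lines in the sample are made up for the example and are not from the real data files.

```ruby
require 'set'
require 'stringio'

# Hypothetical helper: reads one file-like object and records, per host IP,
# the set of hosts it had contact with. Pass the same hash in again to
# accumulate results across several files.
def parse_contacts(io, contacts = nil)
  contacts ||= Hash.new { |hash, host| hash[host] = Set.new }
  current = nil                          # host whose block we are inside, if any

  io.each_line.with_index do |line, index|
    next if index < 4                    # assumed: the first four lines are headers
    fields = line.split
    next if fields.empty?                # blank separator line between blocks

    if current.nil?
      current = fields.first             # first line of a block is the host IP
      contacts[current]                  # touch the key so hosts with no contacts still appear
    elsif fields.first == "IP:"
      current = nil                      # summary line closes the block
    else
      contacts[current] << fields[1]     # second field is the contacted host
    end
  end

  contacts
end

# Made-up sample in the layout described above (header lines are placeholders).
sample = [
  "header 1", "header 2", "header 3", "header 4",
  "0.245.42.235",
  "<- 224.170.4.11 1 (inter)",
  "<- 224.170.4.12 1 (inter)",
  "IP: 0.245.42.235 RA_Out: 0 ER_Out: 0 RA_In: 0 ER_In: 1 RA: 0 ER: 1 Total: 1",
  "",
].join("\n")

degrees = parse_contacts(StringIO.new(sample))
degrees.each { |host, seen| puts "#{host} #{seen.size}" }   # prints "0.245.42.235 2"
```

Using a Set instead of a hash of dummy values says directly that only the distinct contacts matter, and File.open(file) { |fh| parse_contacts(fh, degrees) } would close each file automatically.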
Thanks!
George