Old 05-04-2004, 12:51 PM   #1
TheDarktrooper
LQ Newbie
 
Registered: May 2004
Posts: 5

Rep: Reputation: 0
Help needed in writing awk scripts


OK, basically I'm just starting out writing awk scripts in Linux (as the title would suggest) and frankly, I have no idea what I'm doing.

I'm trying to write a simple script that takes a web page as a parameter (e.g. index.html) and returns a list of all the links on that page to other sites (so ending in .html or .htm), with a count after each one showing how many times that link appears on the page.

So far I've managed to sort and display all the links on the page with:

BEGIN{FS = "\""}

{c=split($0, s); for(n=1; n<=c; ++n) print s[n] | "sort | uniq | grep http | egrep '(html)|(htm)'" }

END{}

which splits up the page's source around the "s (which surround the links), then sorts the fields, removes duplicates, and keeps only the ones that look like links.

The thing is, I really have no idea where to go from here. What I think I have to do is use an array to count the number of instances of each link, then print each array entry's count after the corresponding link (I don't want to use uniq -c because that puts the count in front), but I'm not sure how to go about that. My rough, untested attempt is below.
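Here's my best guess at what the counting version might look like (completely untested, and the even-numbered-field trick assumes each link is quoted and sits on a single line):

Code:
# run as: awk -f countlinks.awk index.html
BEGIN { FS = "\"" }    # split each line on double quotes

{
    # with FS set to a quote, the even-numbered fields are the
    # quoted chunks, which is where the link targets live
    for (n = 2; n <= NF; n += 2)
        if ($n ~ /^http/ && $n ~ /\.html?$/)
            count[$n]++    # link is the array index, value is the tally
}

END {
    # for-in order is unspecified, so pipe the output through sort
    for (link in count)
        print link, count[link] | "sort"
}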

So any help you could give would be appreciated.
 
Old 05-04-2004, 07:49 PM   #2
TheOther1
Member
 
Registered: Feb 2003
Location: Atlanta, GA
Distribution: RHAS 2.1, RHEL3, RHEL4, SLES 8.3, SLES 9, SLES9_64, SuSE 9.3 Pro, Ubuntu, Gentoo
Posts: 335

Rep: Reputation: 32
Sorry, my awk is terrible, so might I make another suggestion? I would think Perl would be your best bet for this sort of thing. Use HTML::Parser to pull out the links. Part of that package is HTML::LinkExtor, which does EXACTLY what you want: it extracts links from an HTML document. There is even a demo script that just prints them, but you can easily build a hash using the link as the key and increment the value each time you see the same link (rough sketch at the bottom of this post). When done, you can sort the hash or just print each key and value using a for loop:
Code:
for my $key ( keys %hash ) {
    my $value = $hash{$key};
    print "$key => $value\n";
}
or a while loop:
Code:
while ( my ($key, $value) = each(%hash) ) {
    print "$key => $value\n";
}
There are tons of Perl modules for dealing with HTML available on CPAN.
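For the counting part, something like this is roughly what I mean (untested, and the .html/.htm filter is just there to mirror your awk version; adjust to taste):

Code:
#!/usr/bin/perl
use strict;
use warnings;
use HTML::LinkExtor;

my %count;

# the callback is invoked once per link-bearing tag
my $parser = HTML::LinkExtor->new(sub {
    my ($tag, %attr) = @_;
    return unless $tag eq 'a' and defined $attr{href};
    # only count links ending in .htm or .html
    $count{ $attr{href} }++ if $attr{href} =~ /\.html?$/;
});

$parser->parse_file($ARGV[0]);    # e.g. ./countlinks.pl index.html

# print each link followed by its count
for my $key ( sort keys %count ) {
    print "$key $count{$key}\n";
}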
 