LinuxQuestions.org
Old 05-10-2005, 07:01 PM   #1
jeffreybluml
Member
 
Registered: Mar 2004
Location: Minnesota
Distribution: Fedora Core 1, Mandrake 10
Posts: 405

Rep: Reputation: 30
daunting task - read wml input, insert variables into URL, DL page, parse, write file


Okay, nobody laugh when they read my "code" below. Okay?

I need a guru's help here. It took me seven hours to get where I'm at, and it's about the sloppiest thing you'll ever see. I have NO programming experience, and I know no languages. Now, on with it...

I will include all the parts of "code" I'm using, in the proper order, at the end...

I've set up a wap site (on my home fedora box) for my mobile phone, and I'm trying to incorporate a page that looks up phone numbers from dexonline.com and strips all the html formatting, leaving only the actual search results in a .inc file which is then included in the resulting wap page. What I'm doing is this...

First, I read three inputs from the wap page, name, city, and state. I couldn't figure out how to pass these into bash (or anything else), so what I did was set up a link on the page that incorporates the variables (client side) into the URL of the link and requests it from my webserver. I'm using a "template" URL to dexonline, into which I stick the three variables. But first, I request it from my server, and then I read the URL out of my server's access log. To do this, I have to have the phone request the bad URL, arrive at the "Not Found" page, and then go back to the previous page to continue on to the results page. The results page contains an exec cmd which initiates the whole process after the URL is created. This (sadly) was the only way I could get the URL with the variables out of the wap page and into a bash environment to be used next.

Next, after grepping the URL out of the access log and stripping it of the non-URL stuff (time stamp, etc) and inserting the remaining dexonline stuff, I output the address to a file (called address) and cat it in a for loop to get it into a bash variable so I can then use wget to DL the page and save it as phone.html on my server.
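
One way to tighten this step, sketched under assumptions rather than tested against the real server: skip the intermediate address file and the for loop, and capture the URL with command substitution. `extract_url` is a made-up helper name, and the pattern assumes Apache's usual log layout, where the request appears as "GET <path> HTTP/x.y". Taking only the newest match with `tail -n 1` would also keep older searches in the log from being picked up.

```shell
# Hypothetical helper: reads access-log lines on stdin and prints the
# dexonline URL built from the newest servlet request (nothing if none match).
extract_url() {
  grep 'GET /servlet' \
    | tail -n 1 \
    | sed -E 's|.*"GET ([^ ]+) HTTP/[0-9.]+".*|http://dexonline.com\1|'
}

# Usage sketch (paths taken from the script in this post):
# address=$(sudo tail -n 20 /var/log/httpd/access_log | extract_url)
# [ -n "$address" ] && wget -O /var/www/html/files/phone.html "$address"
```

Quoting "$address" matters here because the URL contains characters such as ? and & that the shell would otherwise treat specially.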

Still with me? Sorry...

Okay, next I grep through the phone.html page for the results (which relies entirely on the format of dexonline's page remaining the same) by keying off of some formatting common only to the actual results. I then pipe these to sed, remove and replace some formatting so it complies with wml, and output it to the .inc file on the server, which gets called from the wap results page.
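
The clean-up stage could probably be folded into a single sed call. The sketch below is only an illustration: `to_wml` is a hypothetical name, and the tag list (span, b) is a guess at what needs stripping from the listing lines, not a description of dexonline's actual markup.

```shell
# Hypothetical filter: reads HTML fragments on stdin and prints text that is
# closer to what a WML page accepts.
to_wml() {
  sed -E \
    -e 's#<br ?/?>#<br/>#g' \
    -e 's#</?(span|b)( [^>]*)?>##g'
}

# Example:
# printf '<b>(218) 555-0100</b><br>\n' | to_wml
```

The first expression closes every line break as `<br/>`, since WML is XML and rejects a bare `<br>`. The second drops opening and closing span and b tags, with or without attributes, while leaving `<br/>` alone.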

Good grief, huh?

Okay, here's the wml code and the phonesearch "script" that is executed...


search.wml
Code:
<?xml version="1.0"?>
<!DOCTYPE wml PUBLIC "-//WAPFORUM//DTD WML 1.1//EN"
"http://www.wapforum.org/DTD/wml_1.1.xml">
 <wml>
<head>
<meta forua="true" http-equiv="Cache-Control" content="max-age=0"/>
</head>
<card title="Phone_Search">
<p>
 Enter info, then click LOAD<br/>
Page will say Not Found,<br/>
Click Back, then click<br/>
SHOW RESULTS<br/>
</p>
<p>
Find: <input name="find" type="text"/><br/>
City: <input name="city" type="text"/><br/>
State: <input name="state" type="text"/>
</p>
<p>
<anchor title="go">
LOAD
<go href="http://theblumls.com/servlet/ActionServlet;?pid=blistings&amp;queryType=&amp;centerCity=&amp;centerState=&amp;centerLabel=Last&amp;PREVIOUS_PAGE=bsearch&amp;from=&amp;queryText=$(find)&amp;distance=10&amp;centerAddress=Enter+street+address&amp;cityText=$(city)&amp;state=$(state)&amp;surroundingAreas=true"/>
</anchor>
</p>
<p>
<a href="/files/wait.wml">SHOW RESULTS</a>
<br/>
<a href="/files/wapmain.wml">Main</a>
</p>
</card>
</wml>
/bin/phonesearch
Code:
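#!/bin/bash
# 1) Rebuild the dexonline URL from every "servlet" request in the last 20
#    access-log lines (drops the timestamp, the 404 status tail and all spaces).
sudo tail -n 20 /var/log/httpd/access_log | grep servlet \
  | sed 's/.*GET/http\:\/\/dexonline.com/' \
  | sed 's/HTTP\/1.1\"\ 404\ 573//' \
  | sed 's/\ //g' > /var/www/html/files/address
# 2) Fetch each URL in the address file; phone.html is overwritten on every pass.
#    The URL is quoted so the shell does not interpret its ? characters.
for i in $(cat /var/www/html/files/address); do
  wget -o /var/www/html/wget_log_phonesearch -O /var/www/html/files/phone.html -r "$i"
done
# 3) The first listing name starts results.inc ...
grep -m 1 listingname /var/www/html/files/phone.html \
  | sed 's/<span\ class=\"listingname\">//' \
  | sed 's/<\/span>/<br\/>/' > /var/www/html/files/results.inc
# 4) ... then the result rows are appended: <br> becomes <br/>, grep's "--"
#    separators become <br/>, and "& " is removed.
grep -m 30 -A 10 '<tr><td>&nbsp;</td></tr>' /var/www/html/files/phone.html \
  | grep -B 3 '<b>' \
  | sed 's/<br/<br\//g' | sed 's/--/<br\/>/g' | sed 's/&\ //g' >> /var/www/html/files/results.inc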
results.wml
Code:
<?xml version="1.0"?>
<!DOCTYPE wml PUBLIC "-//WAPFORUM//DTD WML 1.1//EN"
"http://www.wapforum.org/DTD/wml_1.1.xml">
<wml>
<head>
<meta forua="true" http-equiv="Cache-Control" content="max-age=0"/>
</head>
<card id="results" title="results">
<p>
<!--#include virtual="/files/results.inc" -->
</p>
</card>
</wml>
This works so far, but there are a few issues:

1) as I said, completely dependent on dexonline's html formatting, but I see no way around that
2) I can only return the results from the first page without performing all the tasks over again, which would get to be a bit much
3) HATE that I have to request the bad URL first, then go back a page before continuing to results
4) there's room for problems, like when I tail the access log. If a bunch of pages from the webserver were requested at the same time, the tail might not catch the bad URL since it only goes back 20 lines, but if I go too many lines back it may catch more than one previous search URL if they've been done close together
5) There are almost always duplicate results, which gets annoying
6) Due to the formatting of the phone.html page, I can't get the name of each result to display above it. I can only get the first one. Not good if a return doesn't actually match what I searched
7) It kills me to know that there's got to be a MUCH simpler way to do this, I just don't have the experience...
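
For the duplicate results in item 5, one option is to drop repeated groups before they are written to results.inc. This is a sketch that assumes the groups still arrive separated by the "--" lines that grep prints between context blocks (so it would run before the sed step that rewrites "--"); `dedupe_groups` is a hypothetical name.

```shell
# Hypothetical filter: reads grep context output on stdin, where groups of
# lines are separated by "--", and prints each distinct group only once.
dedupe_groups() {
  awk '
    # Print the collected group unless an identical one was already printed.
    function emit() {
      if (rec != "" && !(rec in seen)) {
        seen[rec] = 1
        if (printed) print "--"
        printf "%s", rec
        printed = 1
      }
      rec = ""
    }
    $0 == "--" { emit(); next }   # separator line: the group is complete
    { rec = rec $0 "\n" }         # otherwise keep collecting lines
    END { emit() }                # the last group has no trailing separator
  '
}

# Usage sketch:
# grep -B 3 '<b>' /var/www/html/files/phone.html | dedupe_groups
```

Groups are compared as whole blocks, so two listings that share only an address line or a `<br>` line are both kept.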

I know, what a hack job, right? Ugh.

If anybody wants/needs to get at the wap site to test it, it's at http://theblumls.com/files/search.wml

So, can anybody help put this in a nice little package for me, and/or help with some of the remaining issues? Sorry to be so long here...

Thanks in advance,

Jeff

Last edited by jeffreybluml; 05-10-2005 at 07:03 PM.
 
Old 05-12-2005, 06:31 AM   #2
jeffreybluml
Member
 
Registered: Mar 2004
Location: Minnesota
Distribution: Fedora Core 1, Mandrake 10
Posts: 405

Original Poster
Rep: Reputation: 30
Anybody?

Update: I did change one thing after reading the man page for grep. This should help make it a little less dependent on the format of the page from dexonline. Rather than grep for the common formatting tags near the results, I'm now grepping straight for the returned phone numbers, like this:
Code:
grep -m 30 -B 3 '([[:alnum:]][[:alnum:]][[:alnum:]]) [[:alnum:]][[:alnum:]][[:alnum:]]-[[:alnum:]][[:alnum:]][[:alnum:]][[:alnum:]]'
That'll grab the number and lines above it. Much more reliable, I think.
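
Since `[[:alnum:]]` also matches letters, a tighter variant could use `[[:digit:]]` with repeat counts under `grep -E`. A sketch, with `phone_re` and `find_phones` as made-up names and the (NNN) NNN-NNNN layout assumed from the pattern above:

```shell
# Extended regex for numbers written as (NNN) NNN-NNNN, digits only.
phone_re='\([[:digit:]]{3}\) [[:digit:]]{3}-[[:digit:]]{4}'

# Hypothetical wrapper: prints up to 30 matching lines plus the 3 lines
# above each one; reads the named files, or stdin when none are given.
find_phones() {
  grep -E -m 30 -B 3 "$phone_re" "$@"
}

# Usage sketch:
# find_phones /var/www/html/files/phone.html
```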


Sure hope somebody responds with other suggestions though...hint hint.....

I think the biggest one would be removing the step wherein I have to (from the wap browser) request the non-existent URL from my server in order to get bash to be able to read it and attach it to the end of the wget command. If this can be done within the wml code or within the /bin/phonesearch script, that would really help. Isn't there any way to get those variables out on the server side so they can be inserted into the URL template?
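
One common way to get the form values to the server in a single request is a CGI script: the link points at the script, and the web server hands it everything after the "?" in the `QUERY_STRING` environment variable, so no 404 and no log scraping are needed. The sketch below assumes CGI is enabled in Apache; `get_param` and `build_url` are hypothetical names, and the values are passed through still URL-encoded.

```shell
# Print the raw (still URL-encoded) value of parameter $1 from query string $2.
get_param() {
  printf '%s\n' "$2" | tr '&' '\n' | sed -n "s/^$1=//p" | head -n 1
}

# Fill the dexonline template from search.wml using a query string such as
# "find=pizza&city=Duluth&state=MN".
build_url() {
  q_find=$(get_param find "$1")
  q_city=$(get_param city "$1")
  q_state=$(get_param state "$1")
  printf 'http://dexonline.com/servlet/ActionServlet;?pid=blistings&queryType=&centerCity=&centerState=&centerLabel=Last&PREVIOUS_PAGE=bsearch&from=&queryText=%s&distance=10&centerAddress=Enter+street+address&cityText=%s&state=%s&surroundingAreas=true\n' \
    "$q_find" "$q_city" "$q_state"
}

# Inside the CGI script itself (hypothetical usage):
# url=$(build_url "$QUERY_STRING")
# wget -q -O /var/www/html/files/phone.html "$url"
```

With something like this in place, the LOAD link in search.wml could point at the script, for example `/cgi-bin/phonesearch.cgi?find=$(find)&amp;city=$(city)&amp;state=$(state)` (the script path is an assumption). The script would then need to print a `Content-type: text/vnd.wap.wml` header and a blank line before the WML results.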



PLEASE help....

Moderators, could this perhaps be moved over to the general forum, or the newbie forum, to get more exposure? Is it lame of me to ask?

Thanks,

Jeff
 
  

