LinuxQuestions.org
10-04-2011, 01:39 PM   #16
colucix
Quote:
Originally Posted by messinwu
colucix - I think I found a bug of sorts in the code you wrote for me. The 'minlat' variable seems to not change sometimes; it is being re-used in the next block of text, thus resulting in incorrect calculations.
Yeah, sorry! I forgot to reset the min and max values. Here is a corrected version (the fix, highlighted in red in the original post, is the four reset lines after the print statement):
Code:
#!/usr/bin/awk -f

BEGIN {
  minlat = 90
  maxlat = -90
  minlon = 180
  maxlon = -180
  OFMT = "%.6f"
}

/<way /,/<\/way>/ {

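  # Collect every node of the current way; getline moves on to the next line.
  while ( $0 ~ /<nd ref=/ ) {

    c++
    # gensub() is a gawk extension and returns a string; adding 0 forces a
    # number, so the comparisons below are numeric rather than lexical
    # (this matters for negative longitudes)
    lat[c] = gensub(/.*="([^ ]+) .*/,"\\1",1) + 0
    lon[c] = gensub(/.*="[^ ]+ ([^"]+)".*/,"\\1",1) + 0

    if ( lon[c] < minlon ) minlon = lon[c]
    if ( lon[c] > maxlon ) maxlon = lon[c]
    if ( lat[c] < minlat ) minlat = lat[c]
    if ( lat[c] > maxlat ) maxlat = lat[c]

    # stop at end of input instead of looping forever on the last line
    if ( (getline) <= 0 ) break

  }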
  
  if ( $0 ~ /<tag k="name"/ ) {
    street=gensub(/.*v="([^"]+).*/,"\\1",1)
    print street ",", maxlon-minlon ",", maxlat-minlat
    minlat = 90
    maxlat = -90
    minlon = 180
    maxlon = -180
  }

}
 
10-04-2011, 01:49 PM   #17
messinwu (Original Poster)
Hi Rod - thanks for chiming in. You might be right. If they change the formatting, my script will break. Then I'll have to see what they changed, and change my script to accommodate. I'm really not sure what a "real" XML parser would do for me. Would it recognize that they changed the tags on the fly, etc? I looked at the links that sundial provided, and to be honest, they're all complete Greek to me. I understand almost nothing on those pages. I'm a real estate broker, not a computer programmer, which is why I came here to get a little help. I would love to do things the "right" way, but you have to understand that different people with different skillsets will require different levels of spoon-feeding.

Thus far, I've been fairly successful with using bash to do basic automated logins using curl for data scraping to assist with my specialized niche. I know there's other/better ways of doing some of the stuff my scripts do, but if they work, I'm happy. I'm completely open to any level of tutoring anyone wants to provide.

Wow, although I did get a fast and accurate answer to my problem, I'm not sure this is the right forum for me. Maybe I posted my question in the wrong category, since I'm not an experienced programmer. I do feel like I was "pounced" on by disapproving peers. Perhaps you have fallen into the 'you should use this very complex tool because it can do so much more than that barbaric simple tool' syndrome just the same...
 
10-04-2011, 01:51 PM   #18
messinwu (Original Poster)
colucix - thank you, I figured that out as well, and was about to post it. I want you to know that I greatly appreciate your kind assistance today.

Along that line though, why didn't the values get reset since they're at the beginning of the script? I guess it's because they're outside of the loop which identifies the lat/lon variables. I tried sticking the reset code in a couple other places, but it only seems to work being at the end like you have it.

Last edited by messinwu; 10-04-2011 at 01:54 PM.
 
10-04-2011, 01:54 PM   #19
colucix
You're welcome!
 
10-04-2011, 02:08 PM   #20
theNbomr
Sorry, I didn't intend to pounce, merely to point to ways to produce better code. People who reply here often see things which you would overlook, and see it as helpful to point them out. Please don't be offended by that; it is not the intended reaction.

With respect to XML parsing, I tried to explain that the way in which the XML is formatted should not be built into your parser. If the XML were generated as one long line of text, a proper parser would not break because of it. The way XML is often laid out for human visualization does not convey any information. In data like yours, whitespace between elements, and between the attributes inside a tag, is ignored; it serves only human readability and can change without affecting the content. What you can realistically expect to stay consistent is the content, not the layout. For instance, the following two fragments of your XML data are exactly equal with respect to their content:
Code:
  <nd ref="41.4415540 -97.0669980"/>
  <nd ref="41.4415510 -97.0676330"/>
  <nd ref="41.4415480 -97.0682330"/>
  <nd ref="41.4415450 -97.0688240"/>
  <tag k="highway" v="residential"/>
  <tag k="name" v="West 5th Street"/>
  <tag k="tiger:cfcc" v="A41"/>
  <tag k="tiger:county" v="Colfax, NE"/>
Code:
  <nd ref="41.4415540 -97.0669980"/><nd ref="41.4415510 -97.0676330"/><nd ref="41.4415480 -97.0682330"/><nd ref="41.4415450 -97.0688240"/>
<tag k="highway" 
v="residential"/><tag k="name"                                                          v="West 5th Street"/>
<tag 
k="tiger:cfcc" v="A41"/><tag 
k="tiger:county" v="Colfax, NE"/>
A proper parser should be unaffected by this. It is difficult to write such a parser.
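To make that concrete, here is a minimal Python sketch using the standard library's ElementTree parser. It is only an illustration: the enclosing `<way>` element (added so that each fragment is a well-formed document) and the helper name `content` are assumptions, not part of the original data or script.

```python
import xml.etree.ElementTree as ET

# The two fragments from this post, each wrapped in an assumed <way> element.
TIDY = """<way>
  <nd ref="41.4415540 -97.0669980"/>
  <nd ref="41.4415510 -97.0676330"/>
  <nd ref="41.4415480 -97.0682330"/>
  <nd ref="41.4415450 -97.0688240"/>
  <tag k="highway" v="residential"/>
  <tag k="name" v="West 5th Street"/>
  <tag k="tiger:cfcc" v="A41"/>
  <tag k="tiger:county" v="Colfax, NE"/>
</way>"""

MESSY = """<way>  <nd ref="41.4415540 -97.0669980"/><nd ref="41.4415510 -97.0676330"/><nd ref="41.4415480 -97.0682330"/><nd ref="41.4415450 -97.0688240"/>
<tag k="highway"
v="residential"/><tag k="name"                                 v="West 5th Street"/>
<tag
k="tiger:cfcc" v="A41"/><tag
k="tiger:county" v="Colfax, NE"/></way>"""


def content(xml_text):
    """Return (element name, attributes) for each child, in document order."""
    root = ET.fromstring(xml_text)
    return [(child.tag, dict(child.attrib)) for child in root]


if __name__ == "__main__":
    # Layout differs, parsed content does not.
    print(content(TIDY) == content(MESSY))  # prints True
```

A line-oriented script would see two lines of interest in the first `tag` of the second fragment and miss the match; the parser reports the same eight elements with the same attributes for both.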

--- rod.

Last edited by theNbomr; 10-04-2011 at 02:19 PM.
 
10-04-2011, 02:55 PM   #21
messinwu (Original Poster)
Hmm, I see your point. I will look into using an XML parser, but thus far everything I've found makes no sense to me. I guess I tend to learn best by example, so that's what I look for when searching Google.
 
10-04-2011, 06:56 PM   #22
theNbomr
Well, when I said 'It is difficult to write such a parser', I left out an additional point, which is that to write something of that complexity in bash would be just plain ridiculous (apologies to those who've already done so, and I'm sure someone must have). Given that, it would seem that using a more complete programming language is necessary. Since you say you're doing a bit of programming already, it probably makes sense that you can be more productive with a more powerful language anyway. As a sort of part-time programmer, I'd suggest looking at some other scripting language such as Perl or Python (this is where others will jump in to complete the list of 50 or so other candidates). sundialsvcs has given you examples of Perl modules which can be used to robustly parse XML, and I'm sure most languages will have similar modules available. You'll have to just choose a language. In most cases, it will be a bit painful at first, just like it was to learn bash, but by now you probably have a little bit to build on.
With respect to XML parsing specifically, there are some generalities to explain. In whatever language you use, the XML parser module (a language-agnostic description) will have a documented API (application programming interface): a collection of functions and/or variables that your program uses to extract data from your XML source. Happily, for XML, these tend to follow one of two somewhat standard forms. In the first (usually called event-driven or SAX-style parsing), the parser reads the XML data and, as it does so, calls bits of your code to hand off the chunks of data that your program wants. In each of these callbacks, you can do whatever is necessary with the data (like print it to a file). In the second (usually called tree or DOM-style parsing), the parser just swallows the whole thing, breaks it into component pieces, and then provides a collection of functions to navigate around in the XML data and extract specified pieces from it.
What is nice about this is that once you've done this with one language, you can apply what you know to almost any language that has an XML parser. Nicer still is that documentation for one parser applies fairly well to same-style parsers, even ones written for a different language.
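As a rough illustration of the two styles, here is a minimal Python sketch that pulls street names out of a small sample with both approaches, using only standard-library modules (`xml.sax` for the callback style, `xml.etree.ElementTree` for the tree style). The `<osm>` root element, the sample itself, and the names `NameCollector`, `names_by_callbacks` and `names_by_tree` are assumptions made up for this example.

```python
import xml.sax
import xml.etree.ElementTree as ET

# Assumed sample: one way, wrapped in a made-up <osm> root element.
SAMPLE = """<osm>
  <way>
    <nd ref="41.4415540 -97.0669980"/>
    <nd ref="41.4415450 -97.0688240"/>
    <tag k="highway" v="residential"/>
    <tag k="name" v="West 5th Street"/>
  </way>
</osm>"""


class NameCollector(xml.sax.ContentHandler):
    """Callback style: the parser calls startElement() for every opening tag."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        # Keep only <tag k="name" v="..."/> elements.
        if name == "tag" and attrs.get("k") == "name":
            self.names.append(attrs.get("v"))


def names_by_callbacks(xml_text):
    """Stream through the document, collecting names as the parser reports them."""
    handler = NameCollector()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.names


def names_by_tree(xml_text):
    """Tree style: load the whole document, then navigate the result."""
    root = ET.fromstring(xml_text)
    return [t.get("v") for t in root.iter("tag") if t.get("k") == "name"]


if __name__ == "__main__":
    print(names_by_callbacks(SAMPLE))
    print(names_by_tree(SAMPLE))
```

Both functions return the same list. The callback style never holds the whole file in memory, which matters for very large inputs; the tree style is usually easier to read when the file fits in memory.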
If you do choose to take the jump, there are plenty of people in these forums and elsewhere who can provide guidance along the way. Since you seem to be inclined to self start, you'll probably find that helpful people will gravitate to the questions you ask. Good luck.

--- rod.
 
10-04-2011, 11:15 PM   #23
ntubski
Just for fun, here is a shell-with-xmlstarlet solution:
Code:
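#!/bin/sh
# Usage: pass the XML file as the first argument.
# Step 1 (xml ed): give every <nd> separate lat and lon attributes split out of ref.
# Step 2 (xml sel): for each <way>, print the name and the lat/lon extents.
# Note: some distributions install the command as "xmlstarlet" rather than "xml".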
xml ed \
    -i //way/nd -t attr -n lat -v '' \
    -i //way/nd -t attr -n lon -v '' \
    -u //way/nd/@lat -x 'substring-before(../@ref, " ")' \
    -u //way/nd/@lon -x 'substring-after (../@ref, " ")' \
    "$1" \
    | \
    xml sel -T -t -m //way -v 'tag[@k="name"]/@v' -o ': ' \
    -v 'math:highest(nd/@lat) - math:lowest(nd/@lat)' \
    -o ', ' \
    -v 'math:highest(nd/@lon) - math:lowest(nd/@lon)' \
    --nl
XPath could really use some higher order functions: a function that operates on strings will ignore all but the first node when given a nodeset, making it pretty useless; so I had to break up the latitude and longitude into their own attributes in a separate step.
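For comparison, in a general-purpose language the split can happen per node in ordinary code, so no separate rewriting step is needed. Below is a minimal Python sketch of the same calculation using the standard library's ElementTree; it is not a translation of the xmlstarlet commands above, and the `<osm>` root element, the sample data layout, and the function name `way_extents` are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed sample: the data from earlier in the thread inside a made-up <osm> root.
SAMPLE = """<osm>
  <way>
    <nd ref="41.4415540 -97.0669980"/>
    <nd ref="41.4415510 -97.0676330"/>
    <nd ref="41.4415480 -97.0682330"/>
    <nd ref="41.4415450 -97.0688240"/>
    <tag k="highway" v="residential"/>
    <tag k="name" v="West 5th Street"/>
  </way>
</osm>"""


def way_extents(xml_text):
    """Return (name, latitude extent, longitude extent) for each named way."""
    root = ET.fromstring(xml_text)
    results = []
    for way in root.iter("way"):
        name_tag = way.find("tag[@k='name']")
        # Split each ref="lat lon" value on whitespace, node by node.
        coords = [nd.get("ref").split() for nd in way.findall("nd")]
        if name_tag is None or not coords:
            continue  # skip ways with no name or no nodes
        lats = [float(lat) for lat, _ in coords]
        lons = [float(lon) for _, lon in coords]
        results.append(
            (name_tag.get("v"), max(lats) - min(lats), max(lons) - min(lons))
        )
    return results


if __name__ == "__main__":
    for name, dlat, dlon in way_extents(SAMPLE):
        print(f"{name}: {dlat:.6f}, {dlon:.6f}")
```

For the sample, this prints `West 5th Street: 0.000009, 0.001826`, with latitude first and longitude second, in the same order as the xmlstarlet version.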

Last edited by ntubski; 10-04-2011 at 11:17 PM. Reason: grammar
 
10-05-2011, 12:07 PM   #24
theNbomr
Nice tip, ntubski. xmlstarlet looks like the definitive solution for the OP. I never considered the possibility that a bash-friendly tool already existed for XML parsing. I think I will have to give it a spin.

--- rod.
 
  

