LinuxQuestions.org
Old 10-04-2011, 08:49 AM   #1
messinwu
LQ Newbie
 
Registered: Oct 2011
Posts: 17

Rep: Reputation: Disabled
Challenging for loop


Hello everyone

I'm a novice bash programmer and I've spent countless hours trying to figure this out, but have been unable to do so.

I have a large text file, populated by OpenStreetMap XML data. I've already been able to manipulate the data somewhat, but now I need to assign values of sorted data from inside each block of text, which begins with <way and ends with </way>. I think a for loop needs to be utilized, but so far I've not been able to get it to work; it seems to handle only the first block of text.

Here's a sample of the blocks of text I'm working with:

Code:
<way id="14047734" user="balrog-kun" uid="20587" visible="true" version="2" changeset="4355871" timestamp="2010-04-07T16:22:53Z">
  <nd ref="41.4415530 -97.0597650"/>
  <nd ref="41.4415610 -97.0602370"/>
  <nd ref="41.4415610 -97.0604260"/>
  <nd ref="41.4415630 -97.0610820"/>
  <nd ref="41.4415650 -97.0617280"/>
  <nd ref="41.4415670 -97.0623990"/>
  <nd ref="41.4415680 -97.0630460"/>
  <nd ref="41.4415660 -97.0637100"/>
  <nd ref="41.4415600 -97.0643730"/>
  <nd ref="41.4415540 -97.0650030"/>
  <nd ref="41.4415550 -97.0656600"/>
  <nd ref="41.4415560 -97.0663220"/>
  <nd ref="41.4415540 -97.0669980"/>
  <nd ref="41.4415510 -97.0676330"/>
  <nd ref="41.4415480 -97.0682330"/>
  <nd ref="41.4415450 -97.0688240"/>
  <tag k="highway" v="residential"/>
  <tag k="name" v="West 5th Street"/>
  <tag k="tiger:cfcc" v="A41"/>
  <tag k="tiger:county" v="Colfax, NE"/>
  <tag k="tiger:name_base" v="5th"/>
  <tag k="tiger:name_direction_prefix" v="W"/>
  <tag k="tiger:name_type" v="St"/>
  <tag k="tiger:reviewed" v="no"/>
  <tag k="tiger:separated" v="no"/>
  <tag k="tiger:source" v="tiger_import_dch_v0.6_20070813"/>
  <tag k="tiger:tlid" v="136044731:136044732:136044733:136044734:136044735:136044736:136044737:136044738:136044739:136044740:136044741:136044742:136044743:136044744"/>
  <tag k="tiger:zip_left" v="68661"/>
  <tag k="tiger:zip_right" v="68661"/>
 </way>
 <way id="14051838" user="balrog-kun" uid="20587" visible="true" version="2" changeset="4355871" timestamp="2010-04-07T16:24:51Z">
  <nd ref="41.4424990 -97.0597570"/>
  <nd ref="41.4425060 -97.0604140"/>
  <nd ref="41.4425100 -97.0610700"/>
  <nd ref="41.4425150 -97.0617210"/>
  <nd ref="41.4425190 -97.0623910"/>
  <nd ref="41.4425180 -97.0629970"/>
  <nd ref="41.4425170 -97.0632130"/>
  <nd ref="41.4425070 -97.0637110"/>
  <nd ref="41.4424940 -97.0643640"/>
  <nd ref="41.4424920 -97.0644770"/>
  <nd ref="41.4424960 -97.0650030"/>
  <nd ref="41.4425010 -97.0656700"/>
  <nd ref="41.4425060 -97.0663240"/>
  <nd ref="41.4425100 -97.0669920"/>
  <nd ref="41.4425090 -97.0676190"/>
  <nd ref="41.4425080 -97.0682240"/>
  <nd ref="41.4425070 -97.0688140"/>
  <tag k="highway" v="residential"/>
  <tag k="name" v="West 6th Street"/>
  <tag k="tiger:cfcc" v="A41"/>
  <tag k="tiger:county" v="Colfax, NE"/>
  <tag k="tiger:name_base" v="6th"/>
  <tag k="tiger:name_direction_prefix" v="W"/>
  <tag k="tiger:name_type" v="St"/>
  <tag k="tiger:reviewed" v="no"/>
  <tag k="tiger:separated" v="no"/>
  <tag k="tiger:source" v="tiger_import_dch_v0.6_20070813"/>
  <tag k="tiger:tlid" v="136044713:136044714:136044715:136044716:136044717:136044718:136044719:136044720:136044721:136044722:136044723:136044724:136044725:136044726"/>
  <tag k="tiger:zip_left" v="68661"/>
  <tag k="tiger:zip_right" v="68661"/>
 <way id="14047724" user="balrog-kun" uid="20587" visible="true" version="2" changeset="4355871" timestamp="2010-04-07T16:22:51Z">
  <nd ref="41.4406220 -97.0650040"/>
  <nd ref="41.4415540 -97.0650030"/>
  <nd ref="41.4424960 -97.0650030"/>
  <nd ref="41.4434480 -97.0649950"/>
  <nd ref="41.4439650 -97.0649910"/>
  <nd ref="41.4443920 -97.0649820"/>
  <nd ref="41.4453630 -97.0649630"/>
  <tag k="highway" v="residential"/>
  <tag k="name" v="Denver Street"/>
  <tag k="tiger:cfcc" v="A41"/>
  <tag k="tiger:county" v="Colfax, NE"/>
  <tag k="tiger:name_base" v="Denver"/>
  <tag k="tiger:name_type" v="St"/>
  <tag k="tiger:reviewed" v="no"/>
  <tag k="tiger:separated" v="no"/>
  <tag k="tiger:source" v="tiger_import_dch_v0.6_20070813"/>
  <tag k="tiger:tlid" v="136036929:136036965:136036994:136037040:136037019"/>
  <tag k="tiger:zip_left" v="68661"/>
  <tag k="tiger:zip_right" v="68661"/>
 </way>
What I need the loop to do is examine each block of text, sort all the latitudes, sort all the longitudes, identify the highest and lowest of each, and then echo the difference between the highest and lowest for both latitude and longitude, next to the street name which is found in the block on the line which looks like this: <tag k="name" v="West 5th Street"/>, separated by commas.

I've been able to do this so far, by assigning variables to the highest and lowest of each, then doing the calculation of the differences using bc (since we're dealing with a decimal).

If bash is not an ideal language for this task, can you suggest another? Bash is the only language I've ever dealt with, and as I said, I'm a novice.

Can anyone help?
 
Old 10-04-2011, 09:39 AM   #2
colucix
Moderator
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1957
Here is an awk solution:
Code:
BEGIN {
  minlat = 90
  maxlat = -90
  minlon = 180
  maxlon = -180
  sorted = 1
}

/<way /,/<\/way>/ {

  while ( $0 ~ /<nd ref=/ ) {
  
    c++
    lat[c] = gensub(/.*="([^ ]+) .*/,"\\1",1)
    lon[c] = gensub(/.*="[^ ]+ ([^"]+)".*/,"\\1",1)
    rec[c] = $0
    
    if ( lon[c] < minlon ) minlon = lon[c]
    if ( lon[c] > maxlon ) maxlon = lon[c]
    if ( lat[c] < minlat ) minlat = lat[c]
    if ( lat[c] > maxlat ) maxlat = lat[c]
    
    getline
    
    sorted = 0
  
  }
  
  if ( sorted == 0 ) {
  
    n = asort(rec)
  
    for (i = 1; i <= n; i++)
      print rec[i]
      
    sorted = 1   
    
  }
  
  if ( $0 ~ /<tag k="name"/ )
    print $0, maxlon-minlon ",", maxlat-minlat
  else
    print

}
Note that in your example there is a missing </way> after the second <way ...>. If you prefer bash, please show us what you have done so far, so that we can correct or complete your attempt.
 
Old 10-04-2011, 09:46 AM   #3
messinwu
LQ Newbie
 
Registered: Oct 2011
Posts: 17

Original Poster
Rep: Reputation: Disabled
Thanks for helping, colucix! Can you tell me how to execute this script? Is it bash? C+? You mentioned awk, but I'm used to awk being a one-line command....
 
Old 10-04-2011, 09:50 AM   #4
Nylex
LQ Addict
 
Registered: Jul 2003
Location: London, UK
Distribution: Slackware
Posts: 7,464

Rep: Reputation: Disabled
AWK is a programming language. You can put the code given above in a file with

Code:
#!/bin/awk -f
at the top and then make the script executable and run it. Edit: You'll need to give the correct path to awk on your system, obviously.

Last edited by Nylex; 10-04-2011 at 09:55 AM.
 
Old 10-04-2011, 09:56 AM   #5
colucix
Moderator
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1957
awk is a complete programming language. You can take a look at the official GNU awk guide for details. To run this code you can save it in a file, e.g. test.awk, then run awk using the -f option. Suppose your input file is OpenStreetMap.xml:
Code:
awk -f test.awk OpenStreetMap.xml > Modified_OpenStreetMap.xml
Alternatively, you can add a sha-bang as the very first line of test.awk:
Code:
#!/usr/bin/awk -f
add executable permissions to the script and run:
Code:
./test.awk OpenStreetMap.xml > Modified_OpenStreetMap.xml
That is, the sha-bang line causes the rest of the file to be interpreted by awk.

Edit: beaten by nylex!
 
Old 10-04-2011, 10:08 AM   #6
messinwu
LQ Newbie
 
Registered: Oct 2011
Posts: 17

Original Poster
Rep: Reputation: Disabled
Awesome! I got it working by adding #!/usr/bin/awk -f to the very top and made it executable. Then ran the command on the file with
Code:
grep '"name"' | cut -f4,5 -d'"' | sed 's/"\/>/,/g'
and now I get the desired output to continue my project...

Code:
West 5th Street, -0.009059, 2.3e-05
West 6th Street, -0.009067, 0.000974
Denver Street, -0.009067, 0.004741
However, can you tell me why one calculation results in "2.3e-05" output? Is there something that can be changed in the awk code to allow for more decimal places (if that's the problem)?
 
Old 10-04-2011, 10:13 AM   #7
Nylex
LQ Addict
 
Registered: Jul 2003
Location: London, UK
Distribution: Slackware
Posts: 7,464

Rep: Reputation: Disabled
Look at this, as well as the next section.
 
Old 10-04-2011, 10:18 AM   #8
messinwu
LQ Newbie
 
Registered: Oct 2011
Posts: 17

Original Poster
Rep: Reputation: Disabled
Here's the bash code I was trying to make work. Honestly, this is not the most functional version, as I've modified the code dozens of times trying to get it to work.

Code:
for b in "$(echo $(cat bound.txt) | sed 's/<way/\n/g')"; do
	name="$(echo "$b" | sed 's/>/\n/g' | grep '"name"' | cut -f4 -d'"')" ;
	#for a in "$(echo "$b" | sed 's/>/\n/g' | grep ref=)" ; do 
	lat1="$(echo "$b" | awk -F'ref=' '{print $2$4$6$8$10}' | awk -F'"' '{print $2" "$4" "$6" "$8}' | awk '{print $1" "$3" "$5" "$7}')" ;
	long1=$(echo "$a" | cut -f2 -d'"' | awk '{print $2}' | head -1) ;
	lat2=$(echo "$a" | cut -f2 -d'"' | awk '{print $1}' | tail -1) ;
	long2=$(echo "$a" | cut -f2 -d'"' | awk '{print $2}' | tail -1) ;
	diff=$(echo $(echo "$(echo "$lat2 - $lat1" | bc) * 100000000" | bc) | cut -f1 -d'.') ;
		if [[ $diff -gt 5000 ]]; then
			echo "$name" "runs East & West"
		else
			echo "$name" "runs North & South"
		fi
done
At this point, this calculation doesn't work, but it once did (determined the correct direction of the street), but only for the last block of text in the file. As you can see, I'm trying to figure whether each street is parallel or perpendicular to a subject street, then I'm going to use another script I have for calculating distance between two geocodes (median values of lat/long of each street) and echo the farthest North, South, East, and West around a given point.

Who knows, maybe I'm going about this wrong, but I'm trying to get street names which describe the boundaries of a given address and radius. I've already got the address geocoding and radius code in place, which is fed to OpenStreetMap's server to provide me with the above mentioned XML to parse through.

Once again, I apologize for the novice code above.

Last edited by messinwu; 10-04-2011 at 10:19 AM. Reason: forgot the word 'not' in first line
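A note on the loop above, offered as a likely explanation rather than a full diagnosis: the command substitution after `in` is wrapped in double quotes, so `for` receives one single word containing the whole file and the body runs only once. In addition, the `for a in ...` line is commented out, so `$a` is never assigned and `long1`, `lat2` and `long2` end up empty. The sketch below shows the first effect with a hypothetical three-line string (`data`) standing in for the real bound.txt.

```shell
#!/bin/sh
# Hypothetical stand-in for the real data: three records, one per line.
data='way1
way2
way3'

# Quoted command substitution: the loop body runs once, with everything in $b.
quoted=0
for b in "$(printf '%s\n' "$data")"; do
  quoted=$((quoted + 1))
done

# Reading line by line: the loop body runs once per record.
perline=0
while IFS= read -r b; do
  perline=$((perline + 1))
done <<EOF
$data
EOF

echo "quoted for-loop iterations: $quoted"
echo "while-read iterations:      $perline"
```

The quoted loop reports 1 iteration and the while-read loop reports 3. Multi-line records such as the <way ...> blocks need more than a per-line read, which is one reason the awk approach in this thread is a better fit.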
 
Old 10-04-2011, 10:28 AM   #9
colucix
Moderator
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1957
Yes. You can try to add
Code:
OFMT="%.6f"
in the BEGIN section of the awk program, as suggested by Nylex. Keep in mind that OFMT controls the numeric output from the print statement. If you use printf you can refine the format of the entire output as you please.
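As a small illustration of the difference between OFMT and printf, here is a sketch that prints the value 0.000023 (the latitude difference reported for West 5th Street) in several ways. The variable names are made up for this example. One detail worth knowing: when a number is concatenated with a string, as in `maxlon-minlon ","`, awk converts it using CONVFMT rather than OFMT, so setting OFMT alone does not affect that field.

```shell
#!/bin/sh
# 0.000023 is the latitude difference seen for the first sample block.

# Default OFMT is "%.6g", which switches to exponent notation for small values.
default=$(awk 'BEGIN { x = 0.000023; print x }')

# OFMT changes how print renders a bare number.
with_ofmt=$(awk 'BEGIN { OFMT = "%.6f"; x = 0.000023; print x }')

# printf ignores OFMT and uses the format given in the call.
with_printf=$(awk 'BEGIN { x = 0.000023; printf "%.6f\n", x }')

# Concatenation converts the number through CONVFMT, so OFMT is not applied.
concat=$(awk 'BEGIN { OFMT = "%.6f"; x = 0.000023; print x "," }')

echo "default print:      $default"
echo "print with OFMT:    $with_ofmt"
echo "printf %.6f:        $with_printf"
echo "concatenated value: $concat"
```

The first line shows 2.3e-05 and the next two show 0.000023. Using printf for every numeric field avoids having to think about which of the two variables applies.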

As an aside: my script was meant to print out the whole input with some modifications (sorted lat/lon and differences in the k="name" line). But if your requirement is only what you've shown in your post above, here is a reduced version of the script:
Code:
#!/usr/bin/awk -f
BEGIN {
  minlat = 90
  maxlat = -90
  minlon = 180
  maxlon = -180
  OFMT="%.6f"
}

/<way /,/<\/way>/ {

  while ( $0 ~ /<nd ref=/ ) {
  
    c++
    lat[c] = gensub(/.*="([^ ]+) .*/,"\\1",1)
    lon[c] = gensub(/.*="[^ ]+ ([^"]+)".*/,"\\1",1)
    
    if ( lon[c] < minlon ) minlon = lon[c]
    if ( lon[c] > maxlon ) maxlon = lon[c]
    if ( lat[c] < minlat ) minlat = lat[c]
    if ( lat[c] > maxlat ) maxlat = lat[c]
    
    getline
  
  }
  
  if ( $0 ~ /<tag k="name"/ ) {
    street=gensub(/.*v="([^"]+).*/,"\\1",1)
    print street ",", maxlon-minlon ",", maxlat-minlat
  }

}
Keep in mind that using awk you don't need grep, cut or sed, since it can easily manage regular expressions, fields and substitutions.
 
2 members found this post helpful.
Old 10-04-2011, 10:38 AM   #10
messinwu
LQ Newbie
 
Registered: Oct 2011
Posts: 17

Original Poster
Rep: Reputation: Disabled
Sounds like I need to learn AWK. I didn't know it was a complete programming language like bash... I thought it was just a tool to be used inside of bash, like bc, sed, or cut.
 
Old 10-04-2011, 10:39 AM   #11
Nylex
LQ Addict
 
Registered: Jul 2003
Location: London, UK
Distribution: Slackware
Posts: 7,464

Rep: Reputation: Disabled
I was the same. There's a good tutorial here.

Edit: I think I'll leave this thread, as I'm not an AWK expert (I've only written one simple script!).

Last edited by Nylex; 10-04-2011 at 12:35 PM.
 
Old 10-04-2011, 11:02 AM   #12
sundialsvcs
Guru
 
Registered: Feb 2004
Location: SE Tennessee, USA
Distribution: Gentoo, LFS
Posts: 5,422

Rep: Reputation: 1158
Start with a real programming language. Don't send a shell to do a Camel's work.

Seriously... one of the single most-important features of Linux/Unix is the plethora of tools that are available, literally at your fingertips and at no cost. And it's not simply the language. Contributed software libraries contain thousands of ready-made and battle-tested tools. Therefore, your approach to any problem at all shouldn't begin with "how do I write this from scratch?" Instead, it should always assume that you are merely one of thousands of others who've already encountered almost exactly the same problem, and that a prefabricated solution already exists for it which you don't have to write.

Quote:
"The Comprehensive Perl Archive Network (CPAN) currently has 99,897 Perl modules in 23,477 distributions, written by 9,249 authors..."

Although I happen to be most-comfortable with Perl, "it isn't the particular language, but rather the entire approach." Pick your language. You've got at least a half-dozen to choose from. ("Bash scripting" is not one of them.) Every one of them has a contributed library in which problems like this one have been quite thoroughly and completely solved before. You cannot improve upon them with your isolated tho' well-meaning efforts, no matter how hard you try.

Quote:
Actum Ne Agas: Do Not Do A Thing Already Done.
Notice that, by dissing Bash as a solution, I'm not saying that "it can't be done in Bash." I'm simply asserting that, whether it can be done or not, Bash isn't the right tool for the job. Yeah, you can hammer a nail using a screwdriver. You can probably even drive a screw using a pocketknife. Your selection of tools and technique is the first, most important, and yet, most overlooked part of the process.

Quote:
Originally Posted by very important:
Please do not interpret this response as being intended to be in any way whatsoever "publicly demeaning or insulting to you." If any offense is taken, I cordially and sincerely wish to apologize to you in advance. Yes, I do mean to be quite direct and forceful in my language. But, not at your personal expense.

Last edited by sundialsvcs; 10-04-2011 at 11:13 AM.
 
Old 10-04-2011, 11:19 AM   #13
messinwu
LQ Newbie
 
Registered: Oct 2011
Posts: 17

Original Poster
Rep: Reputation: Disabled
sundialsvcs - thanks for your comments. I agree with you 100%, I always try to find existing code and modify it to my needs. In fact, that's why I've never had to ask a question in the open forum like this in the 5 years I've been using Linux. I always find my answer using Google, as someone else has had the same issue before me. However, after extensive searching, nobody's ever tried to do what I'm trying to do. So, I had no choice but to go at it from scratch. Correct me if I'm wrong.

I'm only using bash thus far because it's the most understandable one, at least from my perspective.
 
Old 10-04-2011, 12:34 PM   #14
messinwu
LQ Newbie
 
Registered: Oct 2011
Posts: 17

Original Poster
Rep: Reputation: Disabled
colucix - I think I found a bug of sorts in the code you wrote for me. The 'minlat' variable seems to not change sometimes; it is being re-used in the next block of text, thus resulting in incorrect calculations.

I had the script print maxlat and minlat instead of the calculations, and noticed the minlat is the same for several of the streets, when in fact those latitudes are not even listed in that respective list in that '<way ' block. It seems to be carried over from the previous block perhaps.

Can you assist a bit more?
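A possible explanation, based only on the scripts posted earlier in this thread: minlat, maxlat, minlon and maxlon are set once in BEGIN and never reset, so extremes from one <way ...> block carry over into the next. There also seems to be a second issue: gensub returns strings, so the comparisons are probably done as string comparisons, which gives the wrong order for negative longitudes and would explain the negative differences in the earlier output. The sketch below is one way to address both, not colucix's own fix. It resets the state at each <way line, forces numeric values with + 0, and prints at </way>, so it assumes every block is properly closed. It uses split instead of gensub so that it does not depend on gawk. The two-way sample is cut down from the data in the first post.

```shell
#!/bin/sh
# Two-way sample cut down from the data in the first post (an illustration,
# not the full file).
sample='<way id="14047734">
  <nd ref="41.4415530 -97.0597650"/>
  <nd ref="41.4415680 -97.0630460"/>
  <nd ref="41.4415450 -97.0688240"/>
  <tag k="name" v="West 5th Street"/>
</way>
<way id="14047724">
  <nd ref="41.4406220 -97.0650040"/>
  <nd ref="41.4453630 -97.0649630"/>
  <tag k="name" v="Denver Street"/>
</way>'

result=$(printf '%s\n' "$sample" | awk '
/<way / {                      # new block: forget the previous extremes
  seen = 0; name = ""
}
/<nd ref=/ {
  split($0, q, "\"")           # q[2] holds "lat lon"
  split(q[2], ll, " ")
  lat = ll[1] + 0              # + 0 forces numeric comparison below
  lon = ll[2] + 0
  if (!seen || lat < minlat) minlat = lat
  if (!seen || lat > maxlat) maxlat = lat
  if (!seen || lon < minlon) minlon = lon
  if (!seen || lon > maxlon) maxlon = lon
  seen = 1
}
/<tag k="name"/ {
  split($0, q, "\"")           # q[4] holds the street name
  name = q[4]
}
/<\/way>/ {                    # end of block: report name, lon span, lat span
  if (seen) printf "%s, %.6f, %.6f\n", name, maxlon - minlon, maxlat - minlat
}')

printf '%s\n' "$result"
```

On this sample the output is "West 5th Street, 0.009059, 0.000023" followed by "Denver Street, 0.000041, 0.004741". Note that the second block in the original example lacks its </way>, so with this sketch that street would not be printed until the input is repaired.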
 
Old 10-04-2011, 01:29 PM   #15
theNbomr
LQ 5k Club
 
Registered: Aug 2005
Distribution: OpenSuse, Fedora, Redhat, Debian
Posts: 5,395
Blog Entries: 2

Rep: Reputation: 903
messinwu, I advise you to heed well all of what sundialsvcs says. You may be falling into the 'every problem looks like a nail because my only tool is a hammer' syndrome. I think this is particularly important in this case, which uses XML. While the XML formatting (in terms of visual layout) may be as it is shown in your example today, XML can be reformatted to remove all whitespace without loss of information. The source of your XML may legitimately choose to change the formatting at any time. As such, the use of a proper XML parser would be dictated. Writing one from scratch is non-trivial, but it has been done for you by others in several different forms, and for various programming languages. It would serve you well to apply the expertise of others in developing your solution.

--- rod.
 
0 members found this post helpful.
  

