I'm a novice bash programmer and I've spent countless hours trying to figure this out, but have been unable to do so.
I have a large text file, populated by OpenStreetMap XML data. I've already been able to manipulate the data somewhat, but now I need to assign values of sorted data from inside each block of text, which begins with <way and ends with </way>. I think a for loop needs to be utilized, but so far I've not been able to get it to work; it seems to only handle the first block of text.
Here's a sample of the blocks of text I'm working with:
What I need the loop to do is examine each block of text, sort all the latitudes, sort all the longitudes, identify the highest and lowest of each, and then echo the difference between the highest and lowest for both latitude and longitude, next to the street name which is found in the block on the line which looks like this: <tag k="name" v="West 5th Street"/>, separated by commas.
I've been able to do this so far, by assigning variables to the highest and lowest of each, then doing the calculation of the differences using bc (since we're dealing with a decimal).
If bash is not an ideal language for this task, can you suggest another? Bash is the only language I've ever dealt with, and as I said, I'm a novice.
Can anyone help?
Code:
BEGIN {
    minlat = 90
    maxlat = -90
    minlon = 180
    maxlon = -180
    sorted = 1
}
/<way /,/<\/way>/ {
    while ( $0 ~ /<nd ref=/ ) {
        c++
        lat[c] = gensub(/.*="([^ ]+) .*/,"\\1",1)
        lon[c] = gensub(/.*="[^ ]+ ([^"]+)".*/,"\\1",1)
        rec[c] = $0
        if ( lon[c] < minlon ) minlon = lon[c]
        if ( lon[c] > maxlon ) maxlon = lon[c]
        if ( lat[c] < minlat ) minlat = lat[c]
        if ( lat[c] > maxlat ) maxlat = lat[c]
        getline
        sorted = 0
    }
    if ( sorted == 0 ) {
        n = asort(rec)
        for (i = 1; i <= n; i++)
            print rec[i]
        sorted = 1
    }
    if ( $0 ~ /<tag k="name"/ )
        print $0, maxlon-minlon ",", maxlat-minlat
    else
        print
}
Note that in your example there is a missing </way> after the second <way ...>. If you prefer bash, please show us what have you done so far, so that we can correct/complete your attempt.
Thanks for helping, colucix! Can you tell me how to execute this script? Is it bash? C+? You mentioned awk, but I'm used to awk being a one-line command....
awk is a complete programming language. You can take a look at the official GNU awk guide for details. To run this code, save it in a file, e.g. test.awk, then run awk with the -f option. Supposing your input file is OpenStreetMap.xml, the command is: awk -f test.awk OpenStreetMap.xml
Awesome! I got it working by adding #!/usr/bin/awk -f to the very top and making it executable. Then I ran the command on the file with
Code:
grep '"name"' | cut -f4,5 -d'"' | sed 's/"\/>/,/g'
and now I get the desired output to continue my project...
Code:
West 5th Street, -0.009059, 2.3e-05
West 6th Street, -0.009067, 0.000974
Denver Street, -0.009067, 0.004741
However, can you tell me why one calculation results in "2.3e-05" output? Is there something that can be changed in the awk code to allow for more decimal places (if that's the problem)?
Here's the bash code I was trying to make work. Honestly, this is not the most functional version, as I've modified the code dozens of times trying to get it to work.
Code:
for b in "$(echo $(cat bound.txt) | sed 's/<way/\n/g')"; do
    name="$(echo "$b" | sed 's/>/\n/g' | grep '"name"' | cut -f4 -d'"')" ;
    #for a in "$(echo "$b" | sed 's/>/\n/g' | grep ref=)" ; do
    lat1="$(echo "$b" | awk -F'ref=' '{print $2$4$6$8$10}' | awk -F'"' '{print $2" "$4" "$6" "$8}' | awk '{print $1" "$3" "$5" "$7}')" ;
    long1=$(echo "$a" | cut -f2 -d'"' | awk '{print $2}' | head -1) ;
    lat2=$(echo "$a" | cut -f2 -d'"' | awk '{print $1}' | tail -1) ;
    long2=$(echo "$a" | cut -f2 -d'"' | awk '{print $2}' | tail -1) ;
    diff=$(echo $(echo "$(echo "$lat2 - $lat1" | bc) * 100000000" | bc) | cut -f1 -d'.') ;
    if [[ $diff -gt 5000 ]]; then
        echo "$name" "runs East & West"
    else
        echo "$name" "runs North & South"
    fi
done
At this point the calculation doesn't work. It once did (it determined the correct direction of the street), but only for the last block of text in the file. As you can see, I'm trying to figure out whether each street is parallel or perpendicular to a subject street. Then I'm going to use another script I have for calculating the distance between two geocodes (median values of lat/long of each street) and echo the farthest North, South, East, and West around a given point.
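A note on why a loop like the one above handles only one chunk of text: in for b in "$(...)" the whole command substitution is wrapped in double quotes, so the shell hands the loop a single word and the body runs exactly once. Below is a minimal sketch of a line-by-line alternative. It assumes each tag sits on its own line and that the coordinates appear as <nd ref="LAT LON"/>, which is only inferred from the awk patterns earlier in this thread; the function name summarize_ways is made up for illustration.

```shell
#!/bin/sh
# summarize_ways (hypothetical helper): read OSM-style XML on stdin, one tag
# per line, and print "<street name>: <number of nd lines>" for every
# <way>...</way> block.
# The function body is a ( subshell ), so its variables stay local to it.
summarize_ways() (
    name=""
    count=0
    while IFS= read -r line; do
        case $line in
            *"<way "*)
                name=""; count=0 ;;          # a new block starts: reset state
            *"<nd ref="*)
                count=$((count + 1)) ;;      # one more node in this block
            *'<tag k="name"'*)
                name=${line#*v=\"}           # drop everything through v="
                name=${name%%\"*} ;;         # drop the closing quote and the rest
            *"</way>"*)
                printf '%s: %s\n' "$name" "$count" ;;
        esac
    done
)
```

It could be run as summarize_ways < bound.txt. The point is only the shape of the loop: state is reset when a block opens and reported when it closes, so every block gets processed, not just one.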
Who knows, maybe I'm going about this wrong, but I'm trying to get street names which describe the boundaries of a given address and radius. I've already got the address geocoding and radius code in place, which is fed to OpenStreetMap's server to provide me with the above mentioned XML to parse through.
Once again, I apologize for the novice code above.
Last edited by messinwu; 10-04-2011 at 09:19 AM.
Reason: forgot the word 'not' in first line
in the BEGIN section of the awk program, as suggested by Nylex. Keep in mind that OFMT controls the numeric output of the print statement. If you use printf instead, you can refine the format of the entire output as you please.
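To illustrate the difference between OFMT and printf, here is a small sketch. The value 0.000023 and the format strings are chosen only for demonstration and are not taken from the posts above; show_formats is a made-up name.

```shell
#!/bin/sh
# show_formats (hypothetical helper): print the same small number three ways.
show_formats() {
    awk 'BEGIN {
        x = 0.000023
        print x                 # default OFMT is "%.6g"  -> 2.3e-05
        OFMT = "%.6f"
        print x                 # print now honours OFMT  -> 0.000023
        printf "%.8f\n", x      # printf uses its own format -> 0.00002300
    }'
}
```

One caveat: when a number is concatenated with a string, as in maxlon-minlon "," in the script above, awk converts it using CONVFMT rather than OFMT. That is one reason printf tends to be the more predictable choice for formatted output.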
As an aside: my script was meant to print out the whole input with some modifications (sorted lat/lon, and the differences appended to the k="name" line). But if your requirement is only what you've shown in your post above, here is a reduced version of the script:
Sounds like I need to learn AWK, I didn't know it was a complete programming language like bash.... I thought it was just a tool to be used inside of bash, like bc, sed, or cut.
Seriously... one of the single most-important features of Linux/Unix is the plethora of tools that are available, literally at your fingertips and at no cost. And it's not simply the language. Contributed software libraries contain thousands of ready-made and battle-tested tools. Therefore, your approach to any problem at all shouldn't begin with "how do I write this from scratch?" Instead, it should always assume that you are merely one of thousands of others who've already encountered almost exactly the same problem, and that a prefabricated solution already exists for it which you don't have to write.
Quote:
"The Comprehensive Perl Archive Network (CPAN) currently has 99,897 Perl modules in 23,477 distributions, written by 9,249 authors..."
Although I happen to be most-comfortable with Perl, "it isn't the particular language, but rather the entire approach." Pick your language. You've got at least a half-dozen to choose from. ("Bash scripting" is not one of them.) Every one of them has a contributed library in which problems like this one have been quite thoroughly and completely solved before. You cannot improve upon them with your isolated tho' well-meaning efforts, no matter how hard you try.
Quote:
Actum Ne Agas: Do Not Do A Thing Already Done.
Notice that, by dissing Bash as a solution, I'm not saying that "it can't be done in Bash." I'm simply asserting that, whether it can be done or not, Bash isn't the right tool for the job. Yeah, you can hammer a nail using a screwdriver. You can probably even drive a screw using a pocketknife. Your selection of tools and technique is the first, most important, and yet, most overlooked part of the process.
Quote:
Originally Posted by very important:
Please do not interpret this response as being intended to be in any way whatsoever "publicly demeaning or insulting to you." If any offense is taken, I cordially and sincerely wish to apologize to you in advance. Yes, I do mean to be quite direct and forceful in my language. But, not at your personal expense.
Last edited by sundialsvcs; 10-04-2011 at 10:13 AM.
sundialsvcs - thanks for your comments. I agree with you 100%, I always try to find existing code and modify it to my needs. In fact, that's why I've never had to ask a question in the open forum like this in the 5 years I've been using Linux. I always find my answer using Google, as someone else has had the same issue before me. However, after extensive searching, nobody's ever tried to do what I'm trying to do. So, I had no choice but to go at it from scratch. Correct me if I'm wrong.
I'm only using bash thus far because it's the most understandable one, at least from my perspective.
colucix - I think I found a bug of sorts in the code you wrote for me. The 'minlat' variable seems to not change sometimes; it is being re-used in the next block of text, thus resulting in incorrect calculations.
I had the script print maxlat and minlat instead of the calculations, and noticed the minlat is the same for several of the streets, when in fact those latitudes are not even listed in that respective list in that '<way ' block. It seems to be carried over from the previous block perhaps.
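The carry-over described here is what you would expect if the minimum and maximum values are initialised only once, in BEGIN, and never reset when a new <way block starts. Separately, gensub returns strings, and awk compares a string against a number as strings, which may also explain the negative longitude differences in the output earlier in the thread. The following is a sketch of one possible fix, not colucix's own correction: it resets the extremes at every <way line and adds 0 to force numeric comparison. It assumes one tag per line, the <nd ref="LAT LON"/> layout inferred from the patterns above, and a closing </way> on every block; way_extents is a made-up name.

```shell
#!/bin/sh
# way_extents (hypothetical helper): for each <way>...</way> block print
# "name, longitude range, latitude range". Reads files given as arguments,
# or stdin if there are none. Uses only POSIX awk features.
way_extents() {
    awk '
    /<way / {                          # new block: reset all per-block state
        minlat = 90;  maxlat = -90
        minlon = 180; maxlon = -180
        name = ""
    }
    /<nd ref="/ {
        s = $0
        sub(/^[^"]*"/, "", s)          # drop everything through the first quote
        sub(/".*$/, "", s)             # drop the closing quote and the rest
        split(s, p, " ")
        lat = p[1] + 0                 # "+ 0" forces a numeric value, so the
        lon = p[2] + 0                 # comparisons below are numeric
        if (lat < minlat) minlat = lat
        if (lat > maxlat) maxlat = lat
        if (lon < minlon) minlon = lon
        if (lon > maxlon) maxlon = lon
    }
    /<tag k="name"/ {
        name = $0
        sub(/^.*v="/, "", name)        # keep only the text after v="
        sub(/".*$/, "", name)          # ... up to the closing quote
    }
    /<\/way>/ {
        printf "%s, %.6f, %.6f\n", name, maxlon - minlon, maxlat - minlat
    }
    ' "$@"
}
```

It could be run as way_extents OpenStreetMap.xml. Printing at </way> rather than at the name line means the result does not depend on whether the name tag comes before or after the nd lines.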
messinwu, I advise you to heed well all of what sundialsvcs says. You may be falling into the 'every problem looks like a nail because my only tool is a hammer' syndrome. I think this is particularly important in this case, which uses XML. While the XML formatting (in terms of visual layout) may be as it is shown in your example today, XML can be reformatted to remove all whitespace without loss of information. The source of your XML may legitimately choose to change the formatting at random. As such, the use of a proper XML parser would be dictated. Writing one from scratch is non-trivial, but it has been done for you by others in several different forms and for various programming languages. It would serve you well to apply the expertise of others in developing your solution.