Find URL in Debian package index via awk/sed (=find a line, then search from there)
Hello everybody,
I would like to create a small script that searches a huge file in a special way. I assume that sed and awk may be able to do this, but I have not the slightest clue how to start.
The task in general:
1. Search the text file for a line with the content "aaa".
2. From there, find the next line beginning with "bbb:".
3. Get the value behind "bbb:".
4. Search the text file for a line with the content "ccc:" plus the value from the last step.
5. From there, find the next line beginning with "ddd:".
6. Get the value behind "ddd:".
Steps 1-3 and 4-6 are the same, just with different values.
The task in my case: I want to find the download URL of a virtual Debian package. As this package is for a different architecture, I cannot use apt, dpkg, etc. (or can I?) First I download and extract the Packages index to get the text file I want to search in:
Code:
wget -N http://http.us.debian.org/debian/dists/lenny/main/binary-armel/Packages.bz2
Code:
nano Packages
Can sed/awk do this? In an "understandable" form? If so, could you also give some clues as to why you chose your way, and add some short explanations? This would help me with this recurring task, and also help me and others to get into sed/awk. Any help greatly appreciated. Maddes |
Hi,
it's difficult to give you advice on that without knowing what your file actually looks like. Why not search for bbb in the first place? And does the line with bbb follow immediately after the line with aaa, or are there some lines in between? Please clarify. |
Thanks for your reply.
I gave instructions in the "The task in my case" section on how to download the file and extract it, and also on what I currently do by hand to find the value I need. There are several hundred lines that start with "bbb:" (in my case "Depends:"), but I want to find the one following "aaa" (in my case "Package: linux-image-orion5x"). There can also be some other unimportant lines after "aaa" and before "bbb". Relevant excerpt from the file:
Code:
Package: linux-image-ixp4xx
I hope that makes it clearer now. Maddes |
Hi,
This is a sed script that worked:
Code:
#!/bin/sed -nf
Code:
chmod 774 sed-script
BTW, I use sed version 4.1.5 and bash version 3.2.39. I tried it with the following sample data. I copied your data and pasted it twice into a text file, with slight modifications to also test against some unexpected cases. However, since your file appears to be in a standardized format, I do not think this was necessary; but one never knows what might still come in the future. So here is the modified data:
Code:
Package: linux-image-ixp4xx |
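For readers following along, here is a minimal sketch of the "find a line, then search from there" idea in sed. It is not the script posted above; it assumes an address range is enough, and the sample file contents are made up for illustration.

```shell
# Hypothetical sample data; the field values are made up for illustration.
pkgfile=$(mktemp)
cat > "$pkgfile" <<'EOF'
Package: linux-image-ixp4xx
Priority: optional
Depends: linux-image-2.6-ixp4xx

Package: linux-image-orion5x
Priority: optional
Depends: linux-image-2.6-orion5x
Filename: pool/main/example/linux-image-orion5x_example_armel.deb
EOF

# -n suppresses automatic printing.
# The address range starts at the exact "Package:" line and ends at the
# first following line that begins with "Depends: ".
# Inside the range, s///p strips the prefix and prints only where the
# substitution succeeded, i.e. on the Depends line itself.
depends=$(sed -n '/^Package: linux-image-orion5x$/,/^Depends: /{
    s/^Depends: //p
}' "$pkgfile")

echo "$depends"
rm -f "$pkgfile"
```

One caveat with this sketch: if the matched package has no "Depends:" line at all, the range keeps running and picks up the "Depends:" line of a later package. |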
That sed script works on the original file too. Thanks crts.
Will try to understand it over the next few days and then report back. If anybody thinks there is another solution, or has comments on the current one, feel free to add your thoughts. Maddes |
Have the latest script version and the result is fine.
One immediate question: Can the strings "Package: linux-image-orion5x" and "Depends: " be replaced with variables, similar to $1 and $2 in shell scripts? E.g. to call it like ./sed_script "Package: linux-image-orion5x" "Depends: " <filename> Maddes |
Code:
#!/bin/bash
Code:
./bash_script 'Package: linux-image-orion5x' 'Depends: ' <filename> |
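A minimal sketch of how such a wrapper could pass its arguments into sed, and how the two lookups (steps 1-3 and 4-6) can be chained. The helper names `escape_re` and `get_field`, and the sample data, are hypothetical and not taken from the script above. Real "Depends:" lines may list several packages with version constraints; the sample keeps a single name for simplicity.

```shell
# Hypothetical sample data; names and paths are made up for illustration.
pkgfile=$(mktemp)
cat > "$pkgfile" <<'EOF'
Package: linux-image-2.6-orion5x
Depends: linux-image-2.6.26-2-orion5x
Filename: pool/main/example/linux-image-2.6-orion5x_example_armel.deb

Package: linux-image-orion5x
Depends: linux-image-2.6-orion5x
Filename: pool/main/example/linux-image-orion5x_example_armel.deb
EOF

# Escape characters that are special in a basic regex, plus the "/"
# that is used as the delimiter in the sed call below.
escape_re() {
    printf '%s\n' "$1" | sed 's|[][\.*^$/]|\\&|g'
}

# get_field START_LINE PREFIX FILE
# Prints what follows PREFIX on the first line beginning with PREFIX
# after the line that is exactly START_LINE.
get_field() {
    start=$(escape_re "$1")
    prefix=$(escape_re "$2")
    # Double quotes let the shell expand the variables before sed runs;
    # \$ keeps a literal "$" so the start pattern is anchored at line end.
    sed -n -e "/^${start}\$/,/^${prefix}/{" \
           -e "s/^${prefix}//p" \
           -e "}" "$3"
}

# Steps 1-3: virtual package -> name of the package it depends on.
dep=$(get_field 'Package: linux-image-orion5x' 'Depends: ' "$pkgfile")
# Steps 4-6: that package -> its Filename entry.
url=$(get_field "Package: $dep" 'Filename: ' "$pkgfile")

echo "$dep"
echo "$url"
rm -f "$pkgfile"
```

The escaping step matters because package names contain "." and paths contain "/", both of which sed would otherwise interpret. |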
Have slightly changed the script and it works great. Exactly what I wanted.
Still have to understand sed :)
Code:
root@debian5:~# ./get_package_detail 'linux-image-orion5x' 'Depends: ' Packages |
An awk solution
Recently I got much more into sed and awk, so here's the same solution in awk.
Note that awk is more of a programming language than a stream editor like sed, so it can also take longer to process the file.
Code:
#!/bin/sh |
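A minimal sketch of one way to express the same lookup in awk, using a flag variable. It is an illustration under assumptions, not the script from this post; the sample data is made up.

```shell
# Hypothetical sample data; the field values are made up for illustration.
pkgfile=$(mktemp)
cat > "$pkgfile" <<'EOF'
Package: linux-image-ixp4xx
Depends: linux-image-2.6-ixp4xx

Package: linux-image-orion5x
Priority: optional
Depends: linux-image-2.6-orion5x
EOF

depends=$(awk -v start='Package: linux-image-orion5x' -v prefix='Depends: ' '
    # Remember that the start line has been seen, then move on.
    $0 == start { found = 1; next }
    # After the start line, take the first line that begins with the prefix.
    found && index($0, prefix) == 1 {
        print substr($0, length(prefix) + 1)
        exit    # stop reading; the rest of the file is not needed
    }
' "$pkgfile")

echo "$depends"
rm -f "$pkgfile"
```

Because the comparison uses plain strings (`==` and `index`) instead of regular expressions, nothing in the package name needs escaping. |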
Better and faster sed solution
Additionally, it turned out that an empty line marks the end of a package entry inside the list, so the block can be found and processed much more easily with sed:
Code:
#!/bin/sh |
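A minimal sketch of the empty-line idea, assuming every package entry is terminated by a blank line. It is not the script from this post, and the sample data is made up. The range now ends at the blank line, so a field that is missing from the wanted entry yields nothing instead of a value from a later entry.

```shell
# Hypothetical sample data; the first entry deliberately has no Depends line.
pkgfile=$(mktemp)
cat > "$pkgfile" <<'EOF'
Package: linux-image-orion5x
Filename: pool/main/example/linux-image-orion5x_example_armel.deb

Package: other-package
Depends: libc6
Filename: pool/main/example/other-package_example_armel.deb
EOF

# Range: from the Package line to the next empty line (end of the entry).
# "q" on the empty line makes sed stop reading the rest of the file.
filename=$(sed -n '/^Package: linux-image-orion5x$/,/^$/{
    s/^Filename: //p
    /^$/q
}' "$pkgfile")

# Same range, different field: the entry has no Depends line, so the
# result is empty rather than "libc6" from the following entry.
depends=$(sed -n '/^Package: linux-image-orion5x$/,/^$/{
    s/^Depends: //p
    /^$/q
}' "$pkgfile")

echo "filename=$filename"
echo "depends=$depends"
rm -f "$pkgfile"
```
 |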
What you really appear to want is just to extract specific lines from individual entries so that you can process them with other commands. If you set awk's record and field separators properly, then this should be reasonably easy to do.
Code:
#!/usr/bin/awk -f
Code:
packagename=linux-image-orion5x
If we can absolutely guarantee that all field entries are always present and in the same order, then it can be even easier. You can just print out the field numbers that correspond to the lines you want. |
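A minimal sketch of the record/field separator approach, assuming "Package:" is always the first line of an entry. The variable names `pkg` and `want` and the sample data are made up for illustration; this is not the script from this post.

```shell
# Hypothetical sample data; the field values are made up for illustration.
pkgfile=$(mktemp)
cat > "$pkgfile" <<'EOF'
Package: other-package
Depends: libc6
Filename: pool/main/example/other-package_example_armel.deb

Package: linux-image-orion5x
Depends: linux-image-2.6-orion5x
Filename: pool/main/example/linux-image-orion5x_example_armel.deb
EOF

filename=$(awk -v pkg='linux-image-orion5x' -v want='Filename' '
    # RS = "" switches awk to paragraph mode: records are separated by
    # blank lines. FS = "\n" makes every line of an entry one field.
    BEGIN { RS = ""; FS = "\n" }
    # Assumption: the Package line is the first field of each entry.
    $1 == ("Package: " pkg) {
        for (i = 2; i <= NF; i++) {
            # Match "want: " at the start of the line, print the rest.
            if (index($i, want ": ") == 1)
                print substr($i, length(want) + 3)
        }
        exit
    }
' "$pkgfile")

echo "$filename"
rm -f "$pkgfile"
```

If the field order really is fixed, the loop can be dropped in favour of a plain field number, for example `print $3` for the third line of the entry in this sample. |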