LinuxQuestions.org
Old 03-26-2010, 06:59 PM   #1
maddes.b
LQ Newbie
 
Registered: Aug 2009
Location: Germany
Distribution: Debian, OpenWrt
Posts: 23

Rep: Reputation: 1
Find URL in Debian package index via awk/sed (=find a line, then search from there)


Hello everybody,

I would like to create a small script that searches a huge file in a special way.
I assume that sed and awk may be able to do this, but I haven't the slightest clue how to start.

The task in general:
1. Search the text file for a line with the content "aaa".
2. From there find the next line beginning with "bbb:"
3. Get the value behind "bbb:"
4. Search the text file for a line with the content "ccc:" plus value from last step
5. From there find the next line beginning with "ddd:"
6. Get the value behind "ddd:"

Steps 1-3 and 4-6 are the same just with different values.


The task in my case:
I want to find the download URL of a virtual debian package.
As this package is for a different architecture, I cannot use apt, dpkg, etc. (or can I?)

First I download and extract the Packages index to get the text file I want to search in:
Code:
wget -N http://http.us.debian.org/debian/dists/lenny/main/binary-armel/Packages.bz2
bunzip2 Packages.bz2
Then I do the following manual tasks:
Code:
nano Packages
#  CTRL+W, "Package: linux-image-orion5x"
#    Copy value from "Depends:" line, e.g. "linux-image-2.6.26-2-orion5x"
#  CTRL+W, "Package: " + value copied, e.g. "Package: linux-image-2.6.26-2-orion5x"
#    Copy value from "Filename:" line, e.g. "pool/main/l/linux-2.6/linux-image-2.6.26-2-orion5x_2.6.26-21_armel.deb"

Can sed/awk do this? In an "understandable" form?
If so, could you also explain why you chose your approach and add some short explanations?
This would help me with this recurring task, and also help me and others to get into sed/awk.

Any help greatly appreciated
Maddes
 
Old 03-28-2010, 04:40 AM   #2
crts
Senior Member
 
Registered: Jan 2010
Posts: 1,604

Rep: Reputation: 446
Hi,

it's difficult to give you advice on that without knowing what your file actually looks like. Why not search for bbb in the first place? And does the line with bbb follow immediately after the line with aaa, or are there some lines in between? Please clarify.
 
Old 03-28-2010, 06:16 AM   #3
maddes.b
LQ Newbie
 
Registered: Aug 2009
Location: Germany
Distribution: Debian, OpenWrt
Posts: 23

Original Poster
Rep: Reputation: 1
Thanks for your reply.
I gave instructions in "The task in my case" section about how to download the file and extract it.
And also what I currently do by hand to find the value I need.

There are several hundreds of lines that start with "bbb:" (in my case "Depends:"), but I want to find the one following "aaa" (in my case "Package: linux-image-orion5x").
And there can be some other unimportant lines after "aaa" and before "bbb".

Relevant excerpt from the file:
Code:
Package: linux-image-ixp4xx
Priority: optional
Section: admin
Installed-Size: 32
Maintainer: Debian Kernel Team <debian-kernel@lists.debian.org>
Architecture: armel
Source: linux-latest-2.6 (17+lenny1)
Version: 2.6.26+17+lenny1
Provides: linux-latest-modules-2.6.26-2-ixp4xx
Depends: linux-image-2.6.26-2-ixp4xx
Filename: pool/main/l/linux-latest-2.6/linux-image-ixp4xx_2.6.26+17+lenny1_armel.deb
Size: 2514
MD5sum: 3bdd134c4704a3d18e3a31c5471a9436
SHA1: aa422f4ad5f992a8fc4777d0796a57c1484e70f6
SHA256: 5c5a90e03e1f35bb803f383d1cadef04ba8aacc72229b2b9af5f8e1756918745
Description: Linux image on IXP4xx
 This package depends on the latest binary image for Linux kernel on IXP4xx
 based (Linksys NSLU2, etc) machines.
Tag: admin::kernel, role::dummy

Package: linux-image-orion5x
Priority: optional
Section: admin
Installed-Size: 32
Maintainer: Debian Kernel Team <debian-kernel@lists.debian.org>
Architecture: armel
Source: linux-latest-2.6 (17+lenny1)
Version: 2.6.26+17+lenny1
Provides: linux-latest-modules-2.6.26-2-orion5x
Depends: linux-image-2.6.26-2-orion5x
Filename: pool/main/l/linux-latest-2.6/linux-image-orion5x_2.6.26+17+lenny1_armel.deb
Size: 2526
MD5sum: a6c9721c85fa20012d8f8416a15c8236
SHA1: 8bf5552d2c2a4ee6e6eced35c1b5a2cc9815e8bf
SHA256: cab463cf0950a17a5962dc0fdcb6b21a85953395e51b22cefb7fcc8116773788
Description: Linux image on Orion
 This package depends on the latest binary image for Linux kernel on Orion
 5181, 5182 and 5281 based (QNAP TS-109/TS-209, etc) machines.
Tag: admin::kernel, qa::low-popcon, role::dummy

Package: linux-image-versatile
Priority: optional
Section: admin
Installed-Size: 32
Maintainer: Debian Kernel Team <debian-kernel@lists.debian.org>
Architecture: armel
Source: linux-latest-2.6 (17+lenny1)
Version: 2.6.26+17+lenny1
Provides: linux-latest-modules-2.6.26-2-versatile
Depends: linux-image-2.6.26-2-versatile
Filename: pool/main/l/linux-latest-2.6/linux-image-versatile_2.6.26+17+lenny1_armel.deb
Size: 2504
MD5sum: 9e50b372cd67bedb0999491fa19ad64c
SHA1: 8868af24e081d77ea15c0e32ceb6170e94863199
SHA256: d376774482acfd8f765790da030b0087b35bdb760cb1c19298f1d6ca5d84f989
Description: Linux image on Versatile
 This package depends on the latest binary image for Linux kernel on
 Versatile (PB, AB, Qemu) machines.
Tag: admin::kernel, role::dummy
So the result of the first search (steps 1-3) should be "linux-image-2.6.26-2-orion5x".

I hope that makes it clearer now.

Maddes

Last edited by maddes.b; 03-28-2010 at 06:31 AM.
 
Old 03-28-2010, 07:19 AM   #4
crts
Senior Member
 
Registered: Jan 2010
Posts: 1,604

Rep: Reputation: 446
Hi,
This is a sed script that worked:

Code:
#!/bin/sed -nf
/Package: linux-image-orion5x/ {
	x
	d
	}

/Package: linux-image-orion5x/ !{
	H
	}

/Depends:/ {
	H
	x
	/Package: linux-image-orion5x/ {
		s/\(.*\)\(Depends: \)\(.*\)/\3/
		p
		d
		}
	}
You will have to type (or better yet, paste) this script into a text file and save it. Afterwards, make the file executable with
Code:
chmod 774 sed-script
Let me know if it works for you.

BTW, I use sed version 4.1.5 and bash version 3.2.39.
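
For readers who, like the original poster, want to see why this works, here is the same script passed inline to sed -n from a small shell wrapper, with comments added. The comments are one reading of the script, not crts's own explanation, and the short sample is cut down from the data in this thread.

```shell
#!/bin/sh
# Cut-down sample data, assumed for illustration only.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Package: linux-image-ixp4xx
Depends: linux-image-2.6.26-2-ixp4xx

Package: linux-image-orion5x
Priority: optional
Depends: linux-image-2.6.26-2-orion5x
Depends: THIS WILL BE IGNORED
EOF

result=$(sed -n '
# Wanted Package line: x swaps pattern space and hold space, so the
# hold space now starts with this line; d drops the old hold contents
# and starts the next cycle.
/Package: linux-image-orion5x/ {
    x
    d
}
# Every other line is appended to the hold space.
/Package: linux-image-orion5x/ !{
    H
}
# Depends line: swap the collected lines into the pattern space.
# The hold space is left with only the current Depends line, which is
# why a later Depends line no longer sees the Package line.
/Depends:/ {
    H
    x
    # Only a collection that contains the wanted Package line counts.
    /Package: linux-image-orion5x/ {
        # The greedy .* keeps only the text after the last "Depends: ".
        s/\(.*\)\(Depends: \)\(.*\)/\3/
        p
        d
    }
}' "$sample")

echo "$result"
rm -f "$sample"
```

The key point is that the hold space survives from one input line to the next, while the pattern space is replaced by each new line.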

I tried it with the following sample data. I copied your data and pasted it twice into a textfile with slight modifications to also test against some unexpected cases. However, since your file suggests that it is in a standardized format I do not think that this was necessary; but one never knows what might be still to come in the future.

So here is the modified data:
Code:
Package: linux-image-ixp4xx
Priority: optional
Section: admin
Installed-Size: 32
Maintainer: Debian Kernel Team <debian-kernel@lists.debian.org>
Architecture: armel
Source: linux-latest-2.6 (17+lenny1)
Version: 2.6.26+17+lenny1
Provides: linux-latest-modules-2.6.26-2-ixp4xx
Depends: linux-image-2.6.26-2-ixp4xx
Filename: pool/main/l/linux-latest-2.6/linux-image-ixp4xx_2.6.26+17+lenny1_armel.deb
Size: 2514
MD5sum: 3bdd134c4704a3d18e3a31c5471a9436
SHA1: aa422f4ad5f992a8fc4777d0796a57c1484e70f6
SHA256: 5c5a90e03e1f35bb803f383d1cadef04ba8aacc72229b2b9af5f8e1756918745
Description: Linux image on IXP4xx
 This package depends on the latest binary image for Linux kernel on IXP4xx
 based (Linksys NSLU2, etc) machines.
Tag: admin::kernel, role::dummy

Package: linux-image-orion5x
Priority: optional
Section: admin
Installed-Size: 32
Maintainer: Debian Kernel Team <debian-kernel@lists.debian.org>
Architecture: armel
Source: linux-latest-2.6 (17+lenny1)
Version: 2.6.26+17+lenny1
Provides: linux-latest-modules-2.6.26-2-orion5x
Depends: linux-image-2.6.26-2-orion5x
Depends: THIS WILL BE IGNORED !!!
Filename: pool/main/l/linux-latest-2.6/linux-image-orion5x_2.6.26+17+lenny1_armel.deb
Size: 2526
MD5sum: a6c9721c85fa20012d8f8416a15c8236
SHA1: 8bf5552d2c2a4ee6e6eced35c1b5a2cc9815e8bf
SHA256: cab463cf0950a17a5962dc0fdcb6b21a85953395e51b22cefb7fcc8116773788
Description: Linux image on Orion
 This package depends on the latest binary image for Linux kernel on Orion
 5181, 5182 and 5281 based (QNAP TS-109/TS-209, etc) machines.
Tag: admin::kernel, qa::low-popcon, role::dummy

Package: linux-image-versatile
Priority: optional
Section: admin
Installed-Size: 32
Maintainer: Debian Kernel Team <debian-kernel@lists.debian.org>
Architecture: armel
Source: linux-latest-2.6 (17+lenny1)
Version: 2.6.26+17+lenny1
Provides: linux-latest-modules-2.6.26-2-versatile
Depends: linux-image-2.6.26-2-versatile
Filename: pool/main/l/linux-latest-2.6/linux-image-versatile_2.6.26+17+lenny1_armel.deb
Size: 2504
MD5sum: 9e50b372cd67bedb0999491fa19ad64c
SHA1: 8868af24e081d77ea15c0e32ceb6170e94863199
SHA256: d376774482acfd8f765790da030b0087b35bdb760cb1c19298f1d6ca5d84f989
Description: Linux image on Versatile
 This package depends on the latest binary image for Linux kernel on
 Versatile (PB, AB, Qemu) machines.
Tag: admin::kernel, role::dummy

Package: linux-image-ixp4xx
Priority: optional
Section: admin
Installed-Size: 32
Maintainer: Debian Kernel Team <debian-kernel@lists.debian.org>
Architecture: armel
Source: linux-latest-2.6 (17+lenny1)
Version: 2.6.26+17+lenny1
Provides: linux-latest-modules-2.6.26-2-ixp4xx
Depends: linux-image-2.6.26-2-ixp4xx
Filename: pool/main/l/linux-latest-2.6/linux-image-ixp4xx_2.6.26+17+lenny1_armel.deb
Size: 2514
MD5sum: 3bdd134c4704a3d18e3a31c5471a9436
SHA1: aa422f4ad5f992a8fc4777d0796a57c1484e70f6
SHA256: 5c5a90e03e1f35bb803f383d1cadef04ba8aacc72229b2b9af5f8e1756918745
Description: Linux image on IXP4xx
 This package depends on the latest binary image for Linux kernel on IXP4xx
 based (Linksys NSLU2, etc) machines.
Tag: admin::kernel, role::dummy

Package: linux-image-orion5x
Priority: optional
Section: admin
Installed-Size: 32
Maintainer: Debian Kernel Team <debian-kernel@lists.debian.org>
Architecture: armel
Source: linux-latest-2.6 (17+lenny1)
Version: 2.6.26+17+lenny1
Provides: linux-latest-modules-2.6.26-2-orion5x
Depends: linux-image-2.6.26-2-orion5x AND SOMETHING ELSE !!!
Filename: pool/main/l/linux-latest-2.6/linux-image-orion5x_2.6.26+17+lenny1_armel.deb
Size: 2526
MD5sum: a6c9721c85fa20012d8f8416a15c8236
Depends: THIS WILL BE IGNORED !!!	
SHA1: 8bf5552d2c2a4ee6e6eced35c1b5a2cc9815e8bf
SHA256: cab463cf0950a17a5962dc0fdcb6b21a85953395e51b22cefb7fcc8116773788
Description: Linux image on Orion
 This package depends on the latest binary image for Linux kernel on Orion
 5181, 5182 and 5281 based (QNAP TS-109/TS-209, etc) machines.
Tag: admin::kernel, qa::low-popcon, role::dummy

Package: linux-image-versatile
Priority: optional
Section: admin
Installed-Size: 32
Maintainer: Debian Kernel Team <debian-kernel@lists.debian.org>
Architecture: armel
Source: linux-latest-2.6 (17+lenny1)
Version: 2.6.26+17+lenny1
Provides: linux-latest-modules-2.6.26-2-versatile
Depends: linux-image-2.6.26-2-versatile
Filename: pool/main/l/linux-latest-2.6/linux-image-versatile_2.6.26+17+lenny1_armel.deb
Size: 2504
MD5sum: 9e50b372cd67bedb0999491fa19ad64c
SHA1: 8868af24e081d77ea15c0e32ceb6170e94863199
SHA256: d376774482acfd8f765790da030b0087b35bdb760cb1c19298f1d6ca5d84f989
Description: Linux image on Versatile
 This package depends on the latest binary image for Linux kernel on
 Versatile (PB, AB, Qemu) machines.
Tag: admin::kernel, role::dummy

Last edited by crts; 03-28-2010 at 08:16 AM.
 
Old 03-28-2010, 08:24 AM   #5
maddes.b
LQ Newbie
 
Registered: Aug 2009
Location: Germany
Distribution: Debian, OpenWrt
Posts: 23

Original Poster
Rep: Reputation: 1
That sed script works on the original file too. Thanks crts.
Will try to understand it over the next few days and then report back.

If anybody thinks there is another solution to it, or has comments on the current solution, feel free to add your thoughts.

Maddes

Last edited by maddes.b; 03-28-2010 at 09:13 AM.
 
Old 03-28-2010, 08:29 AM   #6
crts
Senior Member
 
Registered: Jan 2010
Posts: 1,604

Rep: Reputation: 446
Quote:
Originally Posted by maddes.b
That sed script works on the original file too. Thanks crts.
Will try to understand it over the next few days and then report back.

If anybody thinks there is another solution to it, or has comments on the current solution, feel free to add your thoughts.

Maddes
After I read your original post I realised that you do not want to print "Depends", so I have adjusted the script. I am not sure which version you have, but the current one prints just the value, without the preceding 'Depends: '.
 
Old 03-28-2010, 09:28 AM   #7
maddes.b
LQ Newbie
 
Registered: Aug 2009
Location: Germany
Distribution: Debian, OpenWrt
Posts: 23

Original Poster
Rep: Reputation: 1
Have the latest script version and the result is fine.

One immediate question:
Can the strings "Package: linux-image-orion5x" and "Depends: " be replaced with variables, similar to $1 and $2 in shell scripts?
e.g. to call it like ./sed_script "Package: linux-image-orion5x" "Depends: " <filename>

Maddes

Last edited by maddes.b; 03-28-2010 at 09:30 AM.
 
Old 03-28-2010, 10:48 AM   #8
crts
Senior Member
 
Registered: Jan 2010
Posts: 1,604

Rep: Reputation: 446
Quote:
Originally Posted by maddes.b
Have the latest script version and the result is fine.

One immediate question:
Can the strings "Package: linux-image-orion5x" and "Depends: " be replaced with variables, similar to $1 and $2 in shell scripts?
e.g. to call it like ./sed_script "Package: linux-image-orion5x" "Depends: " <filename>

Maddes
You will have to make it a bash script then:
Code:
#!/bin/bash

sed -n ' 
'/"$1"/' {
	x
	d
	}

'/"$1"/' !{
	H
	}

'/"$2"/' {
	H
	x
	'/"$1"/' {
		s/\(.*\)\('"$2"'\)\(.*\)/\3/
		p
		d
		}
	}' "$3"
When you call it, put the parameters in 'single quotes', like
Code:
./bash_script 'Package: linux-image-orion5x' 'Depends: ' <filename>
Otherwise they might get expanded unintentionally by the shell.
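
One caveat with splicing $1 and $2 straight into the sed program: sed treats them as regular expressions. The dots in a name like linux-image-2.6.26-2-orion5x match any character, and a slash inside an argument would end the /.../ address early. A possible workaround is to escape the arguments first. The sketch below is only an illustration: the helper name bre_escape and the decoy entry in the sample are made up.

```shell
#!/bin/sh
# Hypothetical helper: backslash-escape the characters that are special
# in a basic regular expression, plus the "/" used as address delimiter.
bre_escape() {
    printf '%s\n' "$1" | sed 's|[][\.*^$/]|\\&|g'
}

# Made-up sample: the first entry is a decoy that an unescaped
# "2.6.26" also matches, because "." stands for any character.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Package: linux-image-2x6x26-2-orion5x
Filename: pool/decoy.deb

Package: linux-image-2.6.26-2-orion5x
Filename: pool/main/l/linux-2.6/linux-image-2.6.26-2-orion5x_2.6.26-21_armel.deb
EOF

raw='Package: linux-image-2.6.26-2-orion5x'
esc=$(bre_escape "$raw")

# Count the matching lines for the raw and the escaped pattern.
raw_hits=$(sed -n "/$raw/p" "$sample" | grep -c '')
esc_hits=$(sed -n "/$esc/p" "$sample" | grep -c '')

# An argument containing slashes is safe to use once escaped.
key=$(bre_escape 'Filename: pool/main/')
file_line=$(sed -n "/^$key/p" "$sample")

echo "raw pattern matches $raw_hits lines, escaped pattern matches $esc_hits"
echo "$file_line"
rm -f "$sample"
```

The escaped values can then be passed to the bash script above in place of the raw strings.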
 
1 members found this post helpful.
Old 03-28-2010, 01:36 PM   #9
maddes.b
LQ Newbie
 
Registered: Aug 2009
Location: Germany
Distribution: Debian, OpenWrt
Posts: 23

Original Poster
Rep: Reputation: 1
Have slightly changed the script and it works great. Exactly what I wanted.
Still have to understand sed

Code:
root@debian5:~# ./get_package_detail 'linux-image-orion5x' 'Depends: ' Packages
linux-image-2.6.26-2-orion5x
root@debian5:~# ./get_package_detail 'linux-image-2.6.26-2-orion5x' 'Filename: ' Packages
pool/main/l/linux-2.6/linux-image-2.6.26-2-orion5x_2.6.26-21_armel.deb
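
The two calls can be chained to produce the download URL asked for in the first post. This is a rough sketch under several assumptions: the get_package_detail function below is a stand-in written in awk (the poster's modified script is not shown in the thread), the Depends line names exactly one package (real entries may list several, with version constraints), and Filename is relative to the mirror root used in the wget URL.

```shell
#!/bin/sh
# Assumed stand-in for the poster's get_package_detail script.
# $1 = package name, $2 = field prefix, $3 = Packages file
get_package_detail() {
    awk -v pkg="Package: $1" -v key="$2" '
        # Each Package line switches the block on or off.
        index($0, "Package: ") == 1 { inblock = ($0 == pkg) }
        # Inside the wanted block, print the text after the prefix.
        inblock && index($0, key) == 1 {
            print substr($0, length(key) + 1)
            exit
        }
    ' "$3"
}

# Cut-down index built from the values quoted in this thread.
index_file=$(mktemp)
cat > "$index_file" <<'EOF'
Package: linux-image-orion5x
Depends: linux-image-2.6.26-2-orion5x
Filename: pool/main/l/linux-latest-2.6/linux-image-orion5x_2.6.26+17+lenny1_armel.deb

Package: linux-image-2.6.26-2-orion5x
Filename: pool/main/l/linux-2.6/linux-image-2.6.26-2-orion5x_2.6.26-21_armel.deb
EOF

mirror='http://http.us.debian.org/debian'   # mirror root from the first post
real_pkg=$(get_package_detail 'linux-image-orion5x' 'Depends: ' "$index_file")
path=$(get_package_detail "$real_pkg" 'Filename: ' "$index_file")
url="$mirror/$path"

echo "$url"
rm -f "$index_file"
```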

Last edited by maddes.b; 03-28-2010 at 01:43 PM.
 
Old 06-26-2013, 05:40 PM   #10
maddes.b
LQ Newbie
 
Registered: Aug 2009
Location: Germany
Distribution: Debian, OpenWrt
Posts: 23

Original Poster
Rep: Reputation: 1
An awk solution

Recently I got much more into sed and awk, so here's the same solution in awk.
Note that awk is more of a programming language than a stream editor like sed, so it may take longer to process the file.

Code:
#!/bin/sh

awk --posix '
BEGIN {
	process = 0
}

/Package:[[:space:]]*/ {
	# before checking the new block: end the previous block if that was processed
	if (process == 1) {
		print "----------------------------------------"
		process = 0
	}

	# if wanted block then process that block
	if (/Package:[[:space:]]*'"${1}"'/) {
		print $0
		process = 1
	}
}

# when block should be processed, look for wanted pattern and print those lines
process == 1 {
	if (/'"${2}"'/) {
		print $0
	}
}

END {
	if (process == 1) {
		print "----------------------------------------"
	}
}' ${3:+"$3"}

Last edited by maddes.b; 07-04-2013 at 02:15 PM.
 
Old 06-26-2013, 05:42 PM   #11
maddes.b
LQ Newbie
 
Registered: Aug 2009
Location: Germany
Distribution: Debian, OpenWrt
Posts: 23

Original Poster
Rep: Reputation: 1
Better and faster sed solution

Additionally, it turned out that an empty line marks the end of a package inside the list, so the block can be found and processed much more easily with sed:

Code:
#!/bin/sh

sed -n '
/^Package:[[:space:]]*'"${1}"'/,/^$/ {
	/^Package:[[:space:]]*/p
	/'"${2}"'/p
	/^$/a ----------------------------------------
}' ${3:+"$3"}
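
If only the bare value is wanted, the same range address can be combined with a substitution. This is a small variation on the script above, not the poster's version. The sample is cut down from the thread, and the trailing $ in the first address is an added assumption, meant to keep longer package names from matching.

```shell
#!/bin/sh
# Cut-down sample data, assumed for illustration only.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Package: linux-image-ixp4xx
Depends: linux-image-2.6.26-2-ixp4xx

Package: linux-image-orion5x
Depends: linux-image-2.6.26-2-orion5x
Filename: pool/main/l/linux-latest-2.6/linux-image-orion5x_2.6.26+17+lenny1_armel.deb

Package: linux-image-versatile
Depends: linux-image-2.6.26-2-versatile
EOF

# The range runs from the matching Package line through the next empty
# line. Inside it, s///p strips the field name and prints only the
# lines on which the substitution succeeded.
value=$(sed -n '/^Package: linux-image-orion5x$/,/^$/ s/^Depends: //p' "$sample")

echo "$value"
rm -f "$sample"
```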

Last edited by maddes.b; 06-26-2013 at 05:43 PM.
 
Old 06-28-2013, 07:37 AM   #12
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Debian sid + kde 3.5 & 4.4
Posts: 6,823

Rep: Reputation: 1946
What you really appear to want is just to extract specific lines from individual entries so that you can process them with other commands. If you set awk's record and field separators properly, then this should be reasonably easy to do.

Code:
#!/usr/bin/awk -f

BEGIN{
    RS="\n\n"
    FS="\n"
}

{ 
    if ( $1 ~ pname )
    {
        for ( i=2 ; i<=NF ; i++ )
        {
            if ( $i ~ fnames )
            {			
                sub( /[^:]+: / , "" , $i )
                print $i
            }
        }
    }
}
Then to execute it in the shell:

Code:
packagename=linux-image-orion5x
fields='^(Depends|Filename)'

read -r -d '' package filename < <(
    /path/to/script.awk "pname=$packagename" "fnames=$fields" infile.txt
)

printf '%s\n' "$package" "$filename"

#results:
linux-image-2.6.26-2-orion5x
pool/main/l/linux-latest-2.6/linux-image-orion5x_2.6.26+17+lenny1_armel.deb
Now you can use the resulting variables as desired. Note that the fields variable is written as a regular expression, since that's the way awk treats them in tests. It also assumes there will be only a single match of the packagename.

If we can absolutely guarantee that all field entries are always present and in the same order then it can be even easier. You can just print out the field numbers that correspond to the lines you want.
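
A minimal sketch of that fixed-position idea, under the stated assumption that the fields always appear in the same order. The positions 10 (Depends) and 11 (Filename) are counted from the orion5x entry quoted earlier in this thread, and the first entry in the sample is shortened on purpose. RS="" (awk's paragraph mode) is used here instead of RS="\n\n" because a multi-character RS is not supported by every awk.

```shell
#!/bin/sh
sample=$(mktemp)
cat > "$sample" <<'EOF'
Package: linux-image-ixp4xx
Depends: linux-image-2.6.26-2-ixp4xx

Package: linux-image-orion5x
Priority: optional
Section: admin
Installed-Size: 32
Maintainer: Debian Kernel Team <debian-kernel@lists.debian.org>
Architecture: armel
Source: linux-latest-2.6 (17+lenny1)
Version: 2.6.26+17+lenny1
Provides: linux-latest-modules-2.6.26-2-orion5x
Depends: linux-image-2.6.26-2-orion5x
Filename: pool/main/l/linux-latest-2.6/linux-image-orion5x_2.6.26+17+lenny1_armel.deb
Size: 2526
EOF

# RS = "" splits records at blank lines; FS = "\n" makes each line a field.
result=$(awk -v pname='linux-image-orion5x' '
    BEGIN { RS = ""; FS = "\n" }
    $1 == ("Package: " pname) {
        # Assumed positions: field 10 is Depends, field 11 is Filename.
        for (i = 10; i <= 11; i++) {
            sub(/^[^:]+: /, "", $i)   # strip the "Name: " prefix
            print $i
        }
    }
' "$sample")

echo "$result"
rm -f "$sample"
```

If an entry has an extra or missing line before Depends, the numbers shift, so the regex test in the script above is the safer choice for real index files.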

Last edited by David the H.; 06-28-2013 at 07:38 AM.
 
1 members found this post helpful.
  

