LinuxQuestions.org
Old 11-21-2007, 05:07 PM   #1
frenchn00b
Senior Member
 
Registered: Jun 2007
Location: E.U., Mountains :-)
Distribution: Debian, Etch, the greatest
Posts: 2,561

Rep: Reputation: 57
Extract from stdout the string in [ ], in a nice way


Given a file like this (cat myfile.txt):
Code:
    * [82] Download the cdrom [83] Linux Ubuntu Forum
I would like a script, e.g.
Code:
mysuperscript.sh myfile.txt
that simply outputs all instances of text between [ ]:

Code:
82
83
I am using awk, but this is not really easy with it. I like awk since it is much easier than C, Perl, and Python... but I guess it is not made for this at all.

Does anyone have an idea?
Thank you
 
Old 11-21-2007, 06:11 PM   #2
theNbomr
LQ 5k Club
 
Registered: Aug 2005
Distribution: OpenSuse, Fedora, Redhat, Debian
Posts: 5,399
Blog Entries: 2

Rep: Reputation: 908
Not awk, but a one-liner at least:
Code:
cat myfile.txt | perl -e 'while (<>) { print "$_\n" for m/\[([0-9]+)\]/g; }'
--- rod.

Last edited by theNbomr; 11-21-2007 at 06:13 PM.
 
Old 11-22-2007, 02:31 AM   #3
bigearsbilly
Senior Member
 
Registered: Mar 2004
Location: england
Distribution: Mint, Armbian, NetBSD, Puppy, Raspbian
Posts: 3,515

Rep: Reputation: 239
my go

here's a little script,

Code:
#!/usr/bin/perl
use strict;
use warnings;

$/ = undef;                        # slurp mode: read the whole input as one string
my $slurp = <>;                    # slurp the file

my @L = $slurp =~ m/\[(.*?)\]/g;   # capture the text between each [ ] pair

$" = "\n";                         # separator used when an array is interpolated in quotes
print "@L\n";                      # print the matches, one per line
 
Old 11-22-2007, 07:50 AM   #4
makyo
Member
 
Registered: Aug 2006
Location: Saint Paul, MN, USA
Distribution: {Free,Open}BSD, CentOS, Debian, Fedora, Solaris, SuSE
Posts: 735

Rep: Reputation: 76
Hi.

With standard commands:
Code:
#!/usr/bin/env sh

# @(#) s1       Demonstrate extraction of bracket-bounded numeric string.

set -o nounset
echo

debug=":"
debug="echo"

## Use local command version for the commands in this demonstration.

echo "(Versions displayed with local utility \"version\")"
version >/dev/null 2>&1 && version bash grep sed

echo

echo '* [82] Download the cdrom [83] Linux Ubuntu Forum' >data1
FILE=data1

grep -E -o '\[[0-9]+\]' "$FILE" |
sed -e 's/\[//' -e 's/\]//'

exit 0
Producing:
Code:
% ./s1

(Versions displayed with local utility "version")
GNU bash 2.05b.0
grep (GNU grep) 2.5.1
GNU sed version 4.1.2

82
83
See man pages for details ... cheers, makyo
 
Old 11-22-2007, 08:00 AM   #5
radoulov
Member
 
Registered: Apr 2007
Location: Milano, Italia/Варна, България
Distribution: Ubuntu, Open SUSE
Posts: 212

Rep: Reputation: 38
Corrected (misread the question):

Code:
set -- $(grep -oE '\[[^]]+\]' filename)
printf "%s\n" "${@//[\[\]]}"
zsh:
Code:
set -- $(<filename)
print  "${(F)${(Mz)@#*\[*\]}//[\[\]]}"

Last edited by radoulov; 11-22-2007 at 08:11 AM.
 
Old 11-22-2007, 08:17 AM   #6
jschiwal
LQ Guru
 
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733

Rep: Reputation: 682
How's this for a cheat:
Code:
sed 's/[^][:digit:][]//g' test
It assumes that the only square brackets and digits in the input belong to the pattern to keep. On the plus side, sed is only 1/24th the size of perl, and 1/3rd the size of grep.

The grep solution gets my vote.
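
A minimal sketch of the assumption mentioned above, using a made-up input line (not from the thread) with a stray digit outside the brackets. The function names cheat and strict are hypothetical labels for the sed one-liner and for the grep/sed pipeline from post #4.

```shell
# Hypothetical sample: note the lone "2" that is not inside brackets.
line='* [82] Download 2 cdroms [83] Linux Ubuntu Forum'

# The sed "cheat": delete every character that is not a digit, "[" or "]".
cheat() {
    printf '%s\n' "$1" | sed 's/[^][:digit:][]//g'
}

# The grep approach: keep only bracketed digit runs, then strip the brackets.
strict() {
    printf '%s\n' "$1" | grep -E -o '\[[0-9]+\]' | sed -e 's/\[//' -e 's/\]//'
}

cheat "$line"     # the stray 2 survives, and so do the brackets
strict "$line"    # only the bracketed numbers, one per line
```

With this input the sed version lets the unbracketed digit through, which is probably why the grep pipeline is the safer choice when the surrounding text may contain numbers.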
 
Old 11-22-2007, 08:32 AM   #7
radoulov
Member
 
Registered: Apr 2007
Location: Milano, Italia/Варна, България
Distribution: Ubuntu, Open SUSE
Posts: 212

Rep: Reputation: 38
If the pattern is always numeric (without the brackets,
as the OP wants)

Code:
awk NF RS='[^0-9]]*'
Otherwise:

Code:
awk '$0=$2' RS="]" FS="["

Last edited by radoulov; 11-22-2007 at 08:39 AM.
 
Old 11-22-2007, 12:48 PM   #8
frenchn00b
Senior Member
 
Registered: Jun 2007
Location: E.U., Mountains :-)
Distribution: Debian, Etch, the greatest
Posts: 2,561

Original Poster
Rep: Reputation: 57
Amazing replies !! Amazing
 
Old 11-22-2007, 01:07 PM   #9
PAix
Member
 
Registered: Jul 2007
Location: United Kingdom, W Mids
Distribution: SUSE 11.0 as of Nov 2008
Posts: 195

Rep: Reputation: 40
Radoulov,
Code:
#1
awk '$0=$2' RS="]" FS="[" < test.txt
I see it works, but don't understand how. Can you explain at all please?

My difficulty: I see the primes (') which normally delimit an awk program. So it's essentially
Code:
#2
 '$0=$2'
If I run this on its own, I can see that it is the complete program and what it does in the default environment: it takes the second field, makes it the whole record, and prints it (the default action, in the same way that print implies print $0).
Now, because RS="]" and FS="[" are outside the program (I would normally have expected to see this type of assignment inside a BEGIN block)
Code:
#3
awk 'BEGIN { RS="]";  FS="[" }
$0 = $2' < test.txt
, I expect that they are being passed as parameters and would have expected to see
Code:
#4
awk  -v FS="[" -v RS="]" '
$0 = $2' < test.txt
, which without the -v will throw a syntax error.

Well, Radoulov, I have to thank you for the explanation thus far, but here comes the bit I couldn't work out; what makes putting the command line assignments(?) after the program and before the redirected stdin input work?

I hope that this discussion will also be of use to others that hadn't quite thought about the subtleties implicit in your VERY NEAT bit of code. Please share your magic with me/us.

Personally (before today) I would have written the code as shown in #3 above, had I been smart enough to work it out for myself. Honest, I would prefer to leave you guessing. The version at #4 I might have used if it had occurred to me that it was valid to pass values to the awk built-in variables. It hadn't, although I am otherwise familiar with parameter passing in this manner.

So a super thread at several levels with lots of learning content. Well done everyone.
Sorry Bigears, I'm just an awk kinda guy. Perl is just too rich for me. One day perhaps, but don't hold your breath.
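
For readers following along, here is a small sketch of what the terse program appears to be doing, assuming the sample line from post #1 as input. The names extract_terse and extract_verbose are hypothetical; the verbose form spells out the record and field splitting and replaces the implicit print with an explicit one.

```shell
sample='* [82] Download the cdrom [83] Linux Ubuntu Forum'

# Terse form from the thread: RS="]" ends a record at each "]",
# FS="[" makes the text after "[" field 2, and the pattern $0=$2
# is true (so the record is printed) when the assigned value is true.
extract_terse() {
    awk '$0=$2' RS="]" FS="["
}

# Spelled-out form: same splitting, set in BEGIN, with an explicit
# test and print instead of relying on the assignment's truth value.
extract_verbose() {
    awk 'BEGIN { RS = "]"; FS = "[" } $2 != "" { print $2 }'
}

printf '%s\n' "$sample" | extract_terse
printf '%s\n' "$sample" | extract_verbose
```

Both print 82 and 83 for this input. One difference worth keeping in mind: as far as I can tell, the terse form would skip a bracketed 0, because the assigned value is then false, whereas the verbose form tests for an empty field explicitly.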
 
Old 11-22-2007, 01:27 PM   #10
angrybanana
Member
 
Registered: Oct 2003
Distribution: Archlinux
Posts: 147

Rep: Reputation: 21
Another one line Perl solution.
Code:
perl -lne 'print for /\[(\d+)\]/g' file
 
Old 11-22-2007, 01:34 PM   #11
frenchn00b
Senior Member
 
Registered: Jun 2007
Location: E.U., Mountains :-)
Distribution: Debian, Etch, the greatest
Posts: 2,561

Original Poster
Rep: Reputation: 57
Code:
| grep -o 'http:[^"]*'
This one, for instance, I do not really understand:

^ is for the beginning of the line

-o, --only-matching
Show only the part of a matching line that matches PATTERN.

* means "whatever"

[ ] and ": no idea
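
A short sketch that may help here, assuming a grep that supports -o and a made-up HTML line (example.org is a placeholder, not from the thread). Inside [ ], a leading ^ does not mean "beginning of line"; it negates the set, so [^"]* means "any run of characters that are not a double quote". The function names are hypothetical.

```shell
# Hypothetical input line; the URL is a placeholder.
html='<a href="http://example.org/page">link</a>'

# http: matches literally, then [^"]* consumes characters up to,
# but not including, the closing double quote.
url_only() {
    printf '%s\n' "$1" | grep -o 'http:[^"]*'
}

# Outside brackets, ^ anchors the match to the start of the line.
# This counts how many of the given lines begin with http:
anchored_count() {
    printf '%s\n' "$@" | grep -c '^http:'
}

url_only "$html"                                   # URL without the quotes
anchored_count "$html" 'http://example.org/other'  # only the second line counts
```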
 
Old 11-22-2007, 01:36 PM   #12
frenchn00b
Senior Member
 
Registered: Jun 2007
Location: E.U., Mountains :-)
Distribution: Debian, Etch, the greatest
Posts: 2,561

Original Poster
Rep: Reputation: 57
Now my final script looks like this.

I am not sure whether it counts as "nice":
Code:
#!/bin/sh
cat do.html | grep MYSTRING1 | grep MYSTRING1 | awk '$0=$2' RS="]" FS="[" > "/tmp/.TMP1.txt"
for each in $(cat "/tmp/.TMP1.txt") ; do
	cat do.html | grep "^  $each." -A1 | grep "MYSTRING3" | grep -o 'http:[^"]*'
done
 
Old 11-22-2007, 01:37 PM   #13
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2405
Hi,

@PAix:

awk '$0=$2' RS="]" FS="[" infile

and

awk -v RS="]" -v FS="[" '$0=$2' infile

Both conform to the command-line syntax rules. From the Sed & Awk manual:
awk [-v var=value] [-Fre] [--] 'pattern { action }' var=value datafile(s)
 
Old 11-22-2007, 02:24 PM   #14
radoulov
Member
 
Registered: Apr 2007
Location: Milano, Italia/Варна, България
Distribution: Ubuntu, Open SUSE
Posts: 212

Rep: Reputation: 38
Hi,
@PAix:

Yes,
as already stated, the variable assignment variable=text
is part of the Awk syntax, and its usage may depend on
personal style.
But note that -v variable=text is not the same as variable=text,
for example:

Code:
zsh-4.3.4-dev-2% awk -vv=x 'BEGIN{print v}'
x
zsh-4.3.4-dev-2% awk 'BEGIN{print v}' v=x

zsh-4.3.4-dev-2%
or:

Code:
zsh-4.3.4-dev-2% awk -vv=x 'BEGIN{print v}{print v;exit}' v=z <(yes)
x
z
Another important point is that
older awks don't support the -v syntax.
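
To make the ordering point concrete, here is a minimal sketch, assuming a POSIX awk and two throwaway files created with mktemp. The names show_order, begin_sees, early and late are hypothetical. Operand assignments are applied when awk reaches them in the argument list, which is after BEGIN has run and just before the next file is opened.

```shell
tmpdir=$(mktemp -d)
printf 'a\n' > "$tmpdir/one"
printf 'b\n' > "$tmpdir/two"

# Each var=value operand takes effect just before the file that follows it.
show_order() {
    awk '{ print v ":" $0 }' v=first "$1" v=second "$2"
}

# BEGIN runs before any operand is processed, so only -v is visible there.
begin_sees() {
    awk -v early=x 'BEGIN { print "early=" early ", late=" late }' late=y
}

show_order "$tmpdir/one" "$tmpdir/two"
begin_sees
rm -rf "$tmpdir"
```

The same mechanism lets a single awk call use a different RS or FS for each input file, which -v cannot do.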

Anyway,
my Awk examples were kind of a joke:
one doesn't have to use Awk for such a trivial
task. As Ed Morton says, it's like
suggesting a pneumatic drill when it's possible
to break it up with a toothpick.

Last edited by radoulov; 11-23-2007 at 12:56 AM. Reason: spelling ..., sorry :)
 
Old 11-22-2007, 07:08 PM   #15
PAix
Member
 
Registered: Jul 2007
Location: United Kingdom, W Mids
Distribution: SUSE 11.0 as of Nov 2008
Posts: 195

Rep: Reputation: 40
Hi Radoulov,

Thank you for taking the time.
It took a moment, but yes, all understood: the assignment placed after the program only takes effect once input is being read.
My sed is weak (working on it) and my awk/nawk (Solaris) is probably a little dated.

Re the pneumatic drill/toothpick analogy: when you have a hammer, every problem is a nail!

Joke or not, it served a very useful update for me. It's appreciated.
 
  

