LinuxQuestions.org
Linux - Newbie This Linux forum is for members that are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-to's this is the place!

11-25-2014, 05:54 PM #1
Scottish_Jason
LQ Newbie
Registered: Jan 2014
Posts: 15
Bash Regular Expression help


Hey guys
I am new to regular expressions and could use a little help if possible. Below is some output that I want to parse/filter so I have been trying my hand at some regular expressions.

I am trying to return any data samples with:

Size: 5000KB to 30000KB
Bitrate: 320 or higher
Length: not zero

After filtering everything to these requirements, I then need to grab the previous line, which contains the link. I had this working earlier by using the "-B 1" switch to get the previous line, but once I changed my regex a bit it stopped working.

The best I have come up with so far for parsing the size is the following, but I am aiming for 5 meg to 30 meg rather than 10 meg to 30 meg:

#Size: [0-9][0-9][0-9][0-9][K0-9][KB] #1 meg to 100meg
#Size: [0-3][0-5][0-9][0-9][0-9]KB #10 meg to 30meg
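A note on the second pattern: it needs exactly five digits before KB, so nothing below 10000KB can match, and because its second digit is limited to 0-5 it skips 16000KB to 19999KB and 26000KB to 29999KB while still accepting up to 35999KB. A numeric range is easier to get right as an alternation of sub-ranges. The sketch below assumes grep -E (extended syntax) and sizes written as a plain number followed by KB, as in the sample data; size_re is just an illustrative variable name.

```shell
#!/usr/bin/env bash
# Sizes 5000KB..30000KB as three sub-ranges:
#   [5-9][0-9]{3}  -> 5000..9999
#   [12][0-9]{4}   -> 10000..29999
#   30000          -> exactly 30000
size_re='Size: ([5-9][0-9]{3}|[12][0-9]{4}|30000)KB'

# Try it on a few hand-made values around the edges of the range.
for line in 'Size: 4999KB' 'Size: 5000KB' 'Size: 17500KB' 'Size: 30000KB' 'Size: 30001KB'; do
    if grep -qE "$size_re" <<< "$line"; then
        echo "match:    $line"
    else
        echo "no match: $line"
    fi
done
```

With these values, only 5000KB, 17500KB and 30000KB are reported as matches.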

Code:
Sample data

[702] slsk://tarabusaw/E:/_GROOVESHARK/DMC/Commercial/2001/220/03 - DJ Luck & MC Neat Megamix - Les Adams.mp3
Size: 10607KB Bitrate: 192 Length: 0:00 Queue: 0 Speed: 22322 Free: Y filetype: mp3
Code:
Bash script to filter the data (currently not working)

file=$(  cat result.txt | grep 'Size: [0-3][0-5][0-9][0-9][0-9]KB' | grep 'Bitrate: [34][28][0-9]' | grep -v 'Length: 0:00' | grep -B 1 'slsk')
echo $file
Any help would be appreciated, guys.
 
11-25-2014, 07:20 PM #2
norobro
Member
Registered: Feb 2006
Distribution: Debian Sid
Posts: 792
First, you don't need to use cat. grep takes a file as input.

In your statement, the first grep only passes its matching lines (the Size: lines) on to the next grep, and so on, so the slsk lines are already gone by the time the last grep runs. You can try putting -B 1 with each grep.
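A small demonstration with two made-up entries in the same format as the sample data: the chained version prints nothing, while a single grep that selects the attribute line with -B 1 keeps the link line above it.

```shell
#!/usr/bin/env bash
# Two made-up entries in the sample-data layout (link line, then attribute line).
sample='[1] slsk://user/a.mp3
Size: 10607KB Bitrate: 320 Length: 3:10 Queue: 0
[2] slsk://user/b.mp3
Size: 2000KB Bitrate: 320 Length: 1:00 Queue: 0'

# Chained greps: the first one passes only the Size: line downstream,
# so the final grep never sees a slsk:// line and prints nothing.
printf '%s\n' "$sample" | grep 'Size: 10607KB' | grep -B 1 'slsk'

# One grep that both selects the attribute line and uses -B 1 keeps the link above it.
printf '%s\n' "$sample" | grep -B 1 'Size: 10607KB'
```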

Quote:
but I am aiming for 5 meg to 30meg rather than 10 meg to 30meg
As long as there is always a space between "Size:" and the first digit the first expression can be [\b,0-3].
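One caution about [\b,0-3]: in grep's basic and extended syntax a backslash inside a bracket expression is an ordinary character, so [\b,0-3] is simply the set of \, b, a comma and the digits 0 to 3. It does not match a space and does not act as a word boundary, so a four-digit size such as 7071KB still fails. (With -P, \b inside a character class means a backspace character instead.) A quick check:

```shell
#!/usr/bin/env bash
# grep -c prints how many input lines matched (0 or 1 here).
for s in 'Size: 7071KB' 'Size: 2071KB' 'Size: b071KB'; do
    printf '%-13s -> %s\n' "$s" "$(grep -c 'Size: [\b,0-3]' <<< "$s")"
done
```

The first line reports 0 and the other two report 1, including the nonsensical "b071KB".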


One regex would make things a lot cleaner:
Code:
file=$(grep -B 1 'Size: [\b,0-3][0-5][0-9][0-9][0-9]KB.*Bitrate: [34][28][0-9].*Length: 0:00' result.txt | grep -C 1 'slsk')
 
11-25-2014, 07:26 PM #3
Scottish_Jason
LQ Newbie
Registered: Jan 2014
Posts: 15
Original Poster
Replying to norobro (#2):
Hey, thanks a lot for the reply!

It appears that your regex only shows entries with a length of 0:00, instead of those not equal to 0:00.
Also, the second line (the Size: line) appears as well. I'm trying to dump just the slsk:// links, up to .mp3, but of course only those that match the criteria. Thanks again for the help; do you have any idea why this is happening?
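The extra line comes from -C 1, which tells grep to print one line of context before and after each match, so the Size: line next to each link is printed too. A sketch using entry [714] from the listing posted later in this thread: dropping -C 1 keeps only the link lines, and -o prints only the part of each line that matched, i.e. the link from slsk:// up to .mp3.

```shell
#!/usr/bin/env bash
# Entry [714] from the listing in post #10 (link line, then attribute line).
sample='[714] slsk://shoom55/f:/albums/cd1/03 - nng ft kallahan & mc neat - right before my eyes.mp3
Size: 5275KB Bitrate: 160 Length: 4:30 Queue: 26 Speed: 16184 Free: N filetype: mp3'

# With -C 1, grep also prints the neighbouring Size: line as context.
printf '%s\n' "$sample" | grep -C 1 'slsk'

# Without -C 1, only the matching line is printed.
printf '%s\n' "$sample" | grep 'slsk'

# -o prints only the matched text: the link from slsk:// up to .mp3.
printf '%s\n' "$sample" | grep -o 'slsk://.*\.mp3'
```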
 
11-25-2014, 07:37 PM #4
grail
LQ Guru
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,996
You may want to be careful there, as the current solution now includes Length = 0:00, which is exactly what was being asked to exclude.

Another thing to factor in is how well you know the data before running the script.
I ask because if, out of maybe thousands of lines, only a handful have 'slsk' in them, the script may be looking at the wrong information first (just a thought).

Another point: from memory, bitrate is normally one of a fixed set of values, i.e. I do not think you could have a bitrate of 100 (could be wrong, of course).
Assuming that is correct, [34][28][0-9] would accept values which cannot exist but may still be in the data... again, just a thought.
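For reference, [34][28][0-9] accepts exactly 320-329, 380-389, 420-429 and 480-489. The standard MPEG-1 Layer III bitrates are 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256 and 320 kbit/s, so among those only 320 falls in the pattern's ranges. A numeric comparison states "320 or higher" directly. The rates in this sketch are example values, not taken from the thread's data.

```shell
#!/usr/bin/env bash
# Compare the digit-class pattern with a plain numeric ">= 320" test.
for rate in 192 320 330 400; do
    if grep -qE '^[34][28][0-9]$' <<< "$rate"; then by_pattern=yes; else by_pattern=no; fi
    if (( rate >= 320 )); then by_number=yes; else by_number=no; fi
    printf '%4s   [34][28][0-9]: %-3s   >= 320: %s\n' "$rate" "$by_pattern" "$by_number"
done
```

330 and 400 are where the two disagree: the pattern rejects them, the numeric test accepts them.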
 
11-25-2014, 07:41 PM #5
Scottish_Jason
LQ Newbie
Registered: Jan 2014
Posts: 15
Original Poster
Replying to grail (#4):
Yes, you are correct: it displays only entries with a 0:00 length.
The concerns you raised are not really an issue here, though: the bitrate match seems to work consistently, and the slsk link is always on the line preceding the attribute line (Size etc.).

You wouldn't happen to have a solution?
 
11-25-2014, 08:35 PM #6
norobro
Member
Registered: Feb 2006
Distribution: Debian Sid
Posts: 792
Sorry, I overlooked the "-v".

This time I actually tried my code.

Try this:
Code:
file=$(grep -B 1 -P 'Size: [\b,0-3][0-5][0-9][0-9][0-9]KB.*Bitrate: [1][9][0-9].*Length: (?!0:00)' result.txt | grep 'slsk' -C 1)
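Putting the three criteria from the first post into one -P pattern might look like the sketch below. It assumes GNU grep built with PCRE support (for -P), fields in the order Size, Bitrate, Length as in the sample data, and sizes given in KB. The size alternatives cover 5000-9999, 10000-29999 and 30000; the bitrate alternatives cover 320-399, 400-999 and 1000 or more; the lookahead sits directly after "Length: " so it rejects 0:00. The final grep -o drops the Size: lines and the -- separators that -B adds, leaving only the links. list_links is an illustrative name.

```shell
#!/usr/bin/env bash
# Size 5000-30000 KB, bitrate >= 320, length not 0:00 (field order as in the sample data).
pattern='Size: (?:[5-9]\d{3}|[12]\d{4}|30000)KB\s+Bitrate: (?:3[2-9]\d|[4-9]\d\d|[1-9]\d{3,})\s+Length: (?!0:00\b)'

# Reads the files given as arguments (or stdin) and prints only the matching links.
list_links() {
    grep -B 1 -P "$pattern" "$@" | grep -o 'slsk://.*\.mp3'
}

# Example: list_links result.txt
```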
 
11-25-2014, 08:40 PM #7
Scottish_Jason
LQ Newbie
Registered: Jan 2014
Posts: 15
Original Poster
Replying to norobro (#6):
Hmm, I appear to get no results with that. I will walk through it step by step tomorrow when I am a bit more awake... Thanks for the help, guys.
 
11-25-2014, 08:47 PM #8
syg00
LQ Veteran
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,099
You need to show us more data.
Not the way I would have done it, but we can always learn. Thank you all, folks, including the OP.
 
11-25-2014, 09:04 PM #9
norobro
Member
Registered: Feb 2006
Distribution: Debian Sid
Posts: 792
@Scottish_Jason: note that I changed the bitrate expression to match the one line of data that you supplied.
 
11-25-2014, 10:06 PM #10
Scottish_Jason
LQ Newbie
Registered: Jan 2014
Posts: 15
Original Poster
Replying to syg00 (#8):

results.txt

Search: mc neat Results from: User: dudu77
[711] slsk://dudu77/c:/users/eduardo/desktop/slsk/musics..................................()/balkan neat/03-(dunkelbunt)_feat_raf_mc_and_fanfare_ciocarlia-asfalt_tango.mp3
Size: 7071KB Bitrate: 96 Length: 10:03 Queue: 35 Speed: 9421 Free: N filetype: mp3

[712] slsk://dudu77/c:/users/eduardo/desktop/slsk/musics..................................()/balkan neat/06-(dunkelbunt)_feat_raf_mc_and_fanfare_ciocarlia-the_chocolate_butterfly.mp3
Size: 5066KB Bitrate: 96 Length: 7:12 Queue: 35 Speed: 9421 Free: Y filetype: mp3

[713] slsk://dudu77/c:/users/eduardo/desktop/slsk/musics..................................()/balkan neat/09-(dunkelbunt)_feat_stblocket-rauk_cocek_(dunkelbunt_rmx_feat_raf_mc).mp3
Size: 7258KB Bitrate: 96 Length: 10:19 Queue: 35 Speed: 9421 Free: Y filetype: mp3

---------
Search: mc neat Results from: User: shoom55
[714] slsk://shoom55/f:/albums/cd1/03 - nng ft kallahan & mc neat - right before my eyes.mp3
Size: 5275KB Bitrate: 160 Length: 4:30 Queue: 26 Speed: 16184 Free: N filetype: mp3

---------
Search: mc neat Results from: User: KiLLaBeeZ
[715] slsk://KiLLaBeeZ/d:/music/[=-various_artists-=]/va-pure_rnb_2-(retail)-2cd-2001-h3x/208-dj_luck_and_mc_neat_feat_jj-aint_no_stoppin_us_now.mp3
Size: 7574KB Bitrate: 180 Length: 5:44 Queue: 343 Speed: 34843 Free: N filetype: mp3

Last edited by Scottish_Jason; 11-25-2014 at 10:18 PM.
 
11-25-2014, 11:01 PM #11
Scottish_Jason
LQ Newbie
Registered: Jan 2014
Posts: 15
Original Poster
Replying to norobro (#9):
Yes, I see that, but it still should have returned 192k samples in that case.

Edit: Actually, I am getting results now that I have changed the bitrate back to the previous one. Great!
The only problem left is that it displays both lines. While writing this, I think I remembered a switch that prints only one line? Will go and check.

Edit: Ohhh, -C 1... and it is already in there, hmm...

Last edited by Scottish_Jason; 11-25-2014 at 11:07 PM.
 
11-25-2014, 11:07 PM #12
norobro
Member
Registered: Feb 2006
Distribution: Debian Sid
Posts: 792
Try this:
Code:
grep -B 1 -P '[0-9][0-9][0-9][0-9][0-9K][KB][\s,B].*[0-9][0-9][\s,0-9].*(?!0:00)' result.txt | grep 'slsk' -C 1
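One reason this version lets so much through: the lookahead (?!0:00) follows .*, so it is first tested at the end of the line, where nothing follows, and it succeeds. A negative lookahead only excludes something when it is tied to a fixed position, such as directly after "Length: ". A quick check with the 0:00 line from the first post (GNU grep with -P assumed):

```shell
#!/usr/bin/env bash
# The attribute line from the first post, which has Length: 0:00.
line='Size: 10607KB Bitrate: 192 Length: 0:00 Queue: 0 Speed: 22322 Free: Y filetype: mp3'

# Lookahead after .*: tested at the end of the line, so it never rejects anything.
grep -cP '.*(?!0:00)' <<< "$line"

# Lookahead tied to "Length: ": here it does reject the 0:00 entry.
grep -cP 'Length: (?!0:00)' <<< "$line"
```

The first command counts 1 matching line and the second counts 0.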
 
11-25-2014, 11:24 PM #13
syg00
LQ Veteran
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,099
Hmmm, lots of possible corner cases.
If it must be done in bash, I'd probably extract the numeric values into an array, and do real arithmetic tests on the values. sed or grep can do the extraction easily.
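A minimal sketch of that idea in bash, assuming (as the thread says) that each attribute line directly follows its link line. Instead of sed it uses bash's =~ operator, whose capture groups land in the BASH_REMATCH array, and then tests the numbers with (( )). filter_links is an illustrative name.

```shell
#!/usr/bin/env bash
# Keep the link preceding each attribute line whose size is 5000..30000 KB,
# bitrate is >= 320 and length is not 0:00.
filter_links() {
    local prev='' line size rate len
    local attr_re='Size: ([0-9]+)KB[[:space:]]+Bitrate: ([0-9]+)[[:space:]]+Length: ([0-9:]+)'
    local link_re='slsk://.*\.mp3'
    while IFS= read -r line || [[ -n $line ]]; do
        if [[ $line =~ $attr_re ]]; then
            # BASH_REMATCH holds the capture groups of the last successful =~ match
            size=${BASH_REMATCH[1]} rate=${BASH_REMATCH[2]} len=${BASH_REMATCH[3]}
            # 10# forces base 10, so a number with a leading zero is not read as octal
            if (( 10#$size >= 5000 && 10#$size <= 30000 && 10#$rate >= 320 )) &&
               [[ $len != 0:00 ]] && [[ $prev =~ $link_re ]]; then
                printf '%s\n' "${BASH_REMATCH[0]}"   # the link, from slsk:// to .mp3
            fi
        fi
        prev=$line
    done
}

# Only run against the real file if it is there.
if [[ -r result.txt ]]; then
    filter_links < result.txt
fi
```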

A better option might be a language with regex support and proper logic idioms. Perl or awk might be a good start.
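For comparison, a sketch of the awk route. awk splits each line on whitespace, so on an attribute line the size, bitrate and length are simply fields 2, 4 and 6, and the tests become ordinary numeric comparisons. It assumes the same layout as the sample data; filter_links_awk is an illustrative name.

```shell
#!/usr/bin/env bash
# On an attribute line awk's default field splitting gives
# $1="Size:", $2="10607KB", $3="Bitrate:", $4="192", $5="Length:", $6="0:00".
filter_links_awk() {
    awk '
        /slsk:\/\// { link = substr($0, index($0, "slsk://")); next }  # remember the link
        $1 == "Size:" {
            size = $2; sub(/KB$/, "", size)      # "10607KB" -> "10607"
            if (size + 0 >= 5000 && size + 0 <= 30000 && $4 + 0 >= 320 && $6 != "0:00")
                print link
        }
    ' "$@"
}

# Usage: filter_links_awk result.txt
```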
 
11-25-2014, 11:28 PM #14
Scottish_Jason
LQ Newbie
Registered: Jan 2014
Posts: 15
Original Poster
Replying to norobro (#12):
Thanks again, but that didn't work either... by the looks of it, it just spat the whole text file out.
 
11-25-2014, 11:29 PM #15
Scottish_Jason
LQ Newbie
Registered: Jan 2014
Posts: 15
Original Poster
Replying to syg00 (#13):
I was thinking about doing it that way, but came to the conclusion it might be over my head. I am fairly new to bash and regex and have never used Perl etc. Only C+
 
  

