LinuxQuestions.org
Old 02-25-2014, 01:31 PM   #1
petemac117
LQ Newbie
 
Registered: Feb 2014
Posts: 8

Rep: Reputation: Disabled
Need help combining two awk commands


Hey guys,

I have to do some shell scripting for my job which I've never done before. I've been learning a lot and picking up bash fairly well, as frustrating as it can be.

Anyways, I have this command:

videoLength=$(echo $video | awk 'BEGIN{FS="__"}{print $2}' | awk 'BEGIN{FS="."}{print $1}');

which parses the following line and extracts the clip length ("15000" in this case)

2014_02_24-10_30_14_525__15000.mp4

The code works, the only problem is that the script I'm writing needs to perform this extraction somewhere on the order of 60,000 times per day of video files being checked. I'm thinking that it might run a bit faster if I can somehow combine the two awk statements into a single one.

Is there a simple way to do this? It would be trivial if the field separators were all the same, but with little awk experience I don't really know how to handle the mix of "_", "__" and "."

Any suggestions would be much appreciated.
 
Old 02-25-2014, 01:48 PM   #2
szboardstretcher
Senior Member
 
Registered: Aug 2006
Distribution: Arch 2014.02.01
Posts: 2,316
Blog Entries: 1

Rep: Reputation: 741
Assuming you are pulling out the duration, a simpler and less expensive way to do this is:

Code:
mediainfo --Output="General;%Duration%" sample_filename.mp4
Are you sure you just want to patch together your awks?

Last edited by szboardstretcher; 02-25-2014 at 01:50 PM.
 
Old 02-25-2014, 01:51 PM   #3
petemac117
LQ Newbie
 
Registered: Feb 2014
Posts: 8

Original Poster
Rep: Reputation: Disabled
I'm not sure I really understand what the code you posted does. Could you please explain it?

And no, I'm not set on just combining the awks, the real goal is just to make the script run faster.

Thanks for the quick reply
 
Old 02-25-2014, 01:56 PM   #4
szboardstretcher
Senior Member
 
Registered: Aug 2006
Distribution: Arch 2014.02.01
Posts: 2,316
Blog Entries: 1

Rep: Reputation: 741
Ahhh. I apologize. I really misread what you were looking for.

If you are getting a list of filenames like:
Code:
2014_02_24-10_30_14_525__15000.mp4
Then the fastest way to get that 15000 number out is this:

Code:
videoLength=$(echo $video | grep -o -P '(?<=__).*(?=\.)')
No need to run an awk program or two.
 
Old 02-25-2014, 02:34 PM   #5
Firerat
Senior Member
 
Registered: Oct 2008
Distribution: Debian Jessie / sid
Posts: 1,258

Rep: Reputation: 385
'Single' awk

Code:
videoLength=$( awk -F'[_.]' '{print $(NF-1)}' <<<"$video" )
Two step bash string manipulation

Code:
videoLength=${video/*__/}
videoLength=${videoLength%.mp4}
Honestly couldn't tell you which was faster
 
Old 02-25-2014, 02:45 PM   #6
petemac117
LQ Newbie
 
Registered: Feb 2014
Posts: 8

Original Poster
Rep: Reputation: Disabled
Alright these solutions seem to be exactly what I'm looking for (though I'm still a bit lost with all of the syntax). I'll give each of them a try and see how they work out in terms of speed.

Again, thanks for all of the quick replies. I've never used this forum before, and I was expecting to have to wait until tomorrow to get any solid answers. You guys are great.
 
Old 02-25-2014, 02:54 PM   #7
szboardstretcher
Senior Member
 
Registered: Aug 2006
Distribution: Arch 2014.02.01
Posts: 2,316
Blog Entries: 1

Rep: Reputation: 741
Helpful posts all around!
 
Old 02-25-2014, 03:03 PM   #8
Firerat
Senior Member
 
Registered: Oct 2008
Distribution: Debian Jessie / sid
Posts: 1,258

Rep: Reputation: 385
If you get stuck let us know,

Even if I'm not around, someone else will be able to explain in further detail how the grep, awk and bash solutions work.
 
Old 02-25-2014, 03:49 PM   #9
petemac117
LQ Newbie
 
Registered: Feb 2014
Posts: 8

Original Poster
Rep: Reputation: Disabled
Wow...

I ended up using this one:

Code:
videoLength=${video/*__/}
videoLength=${videoLength%.mp4}
...dropped the runtime for 5 days of video from just under 30 minutes to just under 1 minute. I didn't think it would be that big of a difference, but I guess when you run those 2 awk commands tens of thousands of times it really adds up.

Anyways, thanks a lot, Firerat, you just made my day! Just out of curiosity, could you explain how those two lines work? I take it the first one sets videoLength to everything after the "__" and the second one then sets it to everything before the ".mp4". I've never seen that syntax, but it works extremely well so I'll take it.

szboardstretcher thank you too, although when I just copied and pasted that code it spat back a bunch of errors. I couldn't really tell you why but oh well.

Thanks again, guys!
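
On how those two lines work: the reading above is right. Both are shell parameter expansions, so no external program is started. The following is a minimal sketch, not Firerat's exact code: it uses the ## form, which for this filename should give the same result as ${video/*__/} and is also accepted by plain POSIX shells. The sample filename is the one from the first post.

```shell
video='2014_02_24-10_30_14_525__15000.mp4'

# "##" removes the longest leading match of the pattern "*__", that is,
# everything up to and including the last "__". This leaves "15000.mp4".
videoLength=${video##*__}

# "%" removes the shortest trailing match of ".mp4". This leaves "15000".
videoLength=${videoLength%.mp4}

echo "$videoLength"
```

In the bash form ${video/*__/}, the part between the slashes is a pattern and the empty part after the second slash is the replacement, so the matched prefix is replaced with nothing.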
 
Old 02-25-2014, 03:58 PM   #10
szboardstretcher
Senior Member
 
Registered: Aug 2006
Distribution: Arch 2014.02.01
Posts: 2,316
Blog Entries: 1

Rep: Reputation: 741
You probably have copy/paste issues, but here is the working output of it FWIW:

Code:
[root@dev ~]# video=2014_02_24-10_30_14_525__15000.mp4
[root@dev ~]# echo $video
2014_02_24-10_30_14_525__15000.mp4
[root@dev ~]# videoLength=$(echo $video | grep -o -P '(?<=__).*(?=\.)')
[root@dev ~]# echo $videoLength
15000
 
Old 02-25-2014, 04:05 PM   #11
petemac117
LQ Newbie
 
Registered: Feb 2014
Posts: 8

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by szboardstretcher
You probably have copy/paste issues, but here is the working output of it FWIW:

Code:
[root@dev ~]# video=2014_02_24-10_30_14_525__15000.mp4
[root@dev ~]# echo $video
2014_02_24-10_30_14_525__15000.mp4
[root@dev ~]# videoLength=$(echo $video | grep -o -P '(?<=__).*(?=\.)')
[root@dev ~]# echo $videoLength
15000
I tried it again and it still spits back the following errors:

Code:
Usage: grep [-HhrilLnqvsoeFEABC] PATTERN [FILEs...]

/bin/testing: line 150: let: totalTime+=: syntax error: operand expected (error token is "=")
grep: invalid option -- P
BusyBox v1.13.3 (2014-01-27 12:56:22 EST) multi-call binary
I think the problem is that the script is running on the camera itself, which has a watered-down version of Linux (BusyBox).
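
The let error looks like a consequence of the grep failure: with videoLength empty, the line expands to "let totalTime+=" and the operand is missing. For a system whose grep lacks -P, a sed substitution with a basic regular expression is one possible stand-in. This is a sketch only, assuming the BusyBox sed on the camera supports \( \) groups; it has not been tried on BusyBox v1.13.3.

```shell
video='2014_02_24-10_30_14_525__15000.mp4'

# Match everything through the last "__", capture the digits that follow,
# then require a dot and an extension. Print only the captured digits.
videoLength=$(printf '%s\n' "$video" | sed -n 's/.*__\([0-9][0-9]*\)\.[^.]*$/\1/p')

echo "$videoLength"
```

Like the grep version, this still starts one extra process per filename, so it would not be expected to match the speed of the parameter expansions.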
 
Old 02-25-2014, 04:07 PM   #12
szboardstretcher
Senior Member
 
Registered: Aug 2006
Distribution: Arch 2014.02.01
Posts: 2,316
Blog Entries: 1

Rep: Reputation: 741
Ah yes. The busybox grep command.

Pretty much worthless. Well, at least the bash replacements worked!

And the bash replacements, which I admit I forgot about, will indeed take up FAR fewer resources since bash is already loaded, whereas grep would have to be loaded and run 60,000 times.

Last edited by szboardstretcher; 02-25-2014 at 04:08 PM.
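
The cost difference can be checked with a rough comparison like the one below. It is a sketch under assumptions: the function names with_awk and with_expansion and the iteration count are made up for illustration, and the count is kept small so the check finishes quickly.

```shell
video='2014_02_24-10_30_14_525__15000.mp4'
iterations=200   # hypothetical count; raise it for a more realistic run

# Variant 1: a command substitution plus two awk processes per filename,
# as in the command from the first post.
with_awk() {
    i=0
    while [ "$i" -lt "$iterations" ]; do
        videoLength=$(echo "$video" | awk 'BEGIN{FS="__"}{print $2}' | awk 'BEGIN{FS="."}{print $1}')
        i=$((i + 1))
    done
    echo "$videoLength"
}

# Variant 2: parameter expansion only, so no extra processes are started.
with_expansion() {
    i=0
    while [ "$i" -lt "$iterations" ]; do
        videoLength=${video##*__}
        videoLength=${videoLength%.mp4}
        i=$((i + 1))
    done
    echo "$videoLength"
}

with_awk
with_expansion
```

In bash, prefixing each call with the time keyword (time with_awk, then time with_expansion) shows how the two variants compare on a given machine.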
 
Old 02-25-2014, 04:13 PM   #13
metaschima
Member
 
Registered: Dec 2013
Distribution: Slackware
Posts: 572

Rep: Reputation: Disabled
So, is this a list of file names or are you operating on individual files? It is usually faster to operate on lists.
 
Old 02-25-2014, 04:27 PM   #14
petemac117
LQ Newbie
 
Registered: Feb 2014
Posts: 8

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by metaschima
So, is this a list of file names or are you operating on individual files? It is usually faster to operate on lists.
Right now it is operating on single files within a for loop:

Code:
for video in $(ls)
do

videoLength=${video/*__/}
videoLength=${videoLength%.mp4} 
let totalTime+=$videoLength;

done;
I'm pretty happy with the speed now but would it be faster to load these into a list and perform the operation on the list as a whole? It seems like how I have it is the most efficient since it has to increment the total time after each iteration, but I could be wrong.
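
One thing worth noting about the loop above is that $(ls) splits on whitespace and starts an extra process, whereas a glob lets the shell list the files itself. A possible variant is sketched below. The helper name sum_clip_lengths is hypothetical, and the sketch assumes every matching name ends in __<number>.mp4 with no leading zeros in the number, since a leading zero would be read as octal in shell arithmetic.

```shell
# Sum the clip lengths encoded in the filenames of one directory.
# sum_clip_lengths is a hypothetical helper name used for illustration.
sum_clip_lengths() {
    dir=$1
    totalTime=0
    for video in "$dir"/*__*.mp4; do
        [ -e "$video" ] || continue          # the glob matched nothing
        videoLength=${video##*__}            # drop everything through the last "__"
        videoLength=${videoLength%.mp4}      # drop the extension
        totalTime=$((totalTime + videoLength))
    done
    echo "$totalTime"
}
```

Calling it as sum_clip_lengths . would total the clips in the current directory.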
 
Old 02-25-2014, 05:34 PM   #15
metaschima
Member
 
Registered: Dec 2013
Distribution: Slackware
Posts: 572

Rep: Reputation: Disabled
I would make a list and use awk to calculate everything (including the total), which should be much faster.

Here's an example:
Code:
bash-4.2$ cat test.txt 
2014_02_24-10_30_14_525__15000.mp4
2014_02_24-10_30_14_525__15000.mp4
2014_02_24-10_30_14_525__15000.mp4
2014_02_24-10_30_14_525__15000.mp4
2014_02_24-10_30_14_525__15000.mp4
2014_02_24-10_30_14_525__15000.mp4
bash-4.2$ awk 'BEGIN{FS="__"}{ total+=substr($2,1,index($2,".")-1) }END{print total}' test.txt 
90000
bash-4.2$ awk 'BEGIN{FS="[_.]"}{ total+=$(NF-1) }END{ print total}' test.txt 
90000

Last edited by metaschima; 02-25-2014 at 05:47 PM.
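
The same idea can be used without an intermediate test.txt by piping the names straight into a single awk process. This is a sketch, with total_from_names as a hypothetical helper name; it assumes one filename per line and the naming scheme shown in the first post.

```shell
# Read one filename per line on stdin and print the summed clip length.
# With "_" and "." both acting as separators, the clip length is always
# the second-to-last field, $(NF-1).
total_from_names() {
    awk -F'[_.]' '{ total += $(NF-1) } END { printf "%d\n", total }'
}

printf '%s\n' \
    2014_02_24-10_30_14_525__15000.mp4 \
    2014_02_24-10_31_14_525__15000.mp4 | total_from_names
```

In the video directory, printf '%s\n' *.mp4 | total_from_names would feed every clip name to awk in one go, so only one external process is started for the whole list.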
 
  


All times are GMT -5.