LinuxQuestions.org
Linux - Newbie This Linux forum is for members that are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-to's this is the place!

Old 03-27-2010, 06:13 AM   #1
citygrid
LQ Newbie
 
Registered: Mar 2010
Posts: 10

Rep: Reputation: 0
Problem with RegEx using sed


I'm trying to isolate a number from a text file using sed. The text file looks like this:

-GARBAGE-GARBAGE-GARBAGE- Number of frames: 183933 frames Codec -GARBAGE-GARBAGE-GARBAGE-

I tried the following:
Code:
sed "s/^.*Number of frames: //g; s/ frames Codec.*$//g" "info.txt" > "frames.txt"
Strangely, it only seems to be stripping off the end, but not the beginning, like so:
-GARBAGE-GARBAGE-GARBAGE- Number of frames: 183933

I'm obviously not using the command correctly, so what am I doing wrong?

If anyone has alternatives using awk or grep, I'd be open to those as well, but for future reference I'm curious to know why my argument above is not working the way I expect it to.

Thanks in advance!
 
Old 03-27-2010, 06:31 AM   #2
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,126

Rep: Reputation: 4120
Almost always a bad specification - maybe there are two spaces somewhere, maybe a <tab>, ...
Use as little specific data as possible. If you just want the number, just specify numbers - something like (untested)
Code:
sed -r 's/[^[:digit:]]*([[:digit:]]+).*/\1/' "info.txt" > "frames.txt"
 
Old 03-27-2010, 06:56 AM   #3
citygrid
LQ Newbie
 
Registered: Mar 2010
Posts: 10

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by syg00
Almost always a bad specification - maybe there are two spaces somewhere, maybe a <tab>, ...
Thanks for replying.

I thought of that, of course, and went out in search of \t and double spaces, but there aren't any. The real mystery is that it's finding the expression. If I put in:
Code:
sed "s/Number of frames: /SOMEWORD/g; s/ frames Codec.*$//g" "info.txt" > "frames.txt"
it returns:
-GARBAGE-GARBAGE-GARBAGE- SOMEWORD183933

So it would seem that it has something to do with finding the beginning of the file (which is one line).

Any other ideas?

Quote:
Use as little specific data as possible. If you just want the number, just specify numbers - something like (untested)
Code:
sed -r 's/[^[:digit:]]*([[:digit:]]+).*/\1/' "info.txt" > "frames.txt"
This doesn't work, unfortunately, because "info.txt" contains a lot of numeric information about a video, such as number of frames, resolution, duration, audio bitrate, etc., so I wouldn't only be getting what I needed (which is the total number of frames).

Last edited by citygrid; 03-27-2010 at 06:58 AM. Reason: clarification
 
Old 03-27-2010, 07:05 AM   #4
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,126

Rep: Reputation: 4120
What happens if you try your original attempt without the anchor? And are you using GNU sed?
 
Old 03-27-2010, 07:40 AM   #5
citygrid
LQ Newbie
 
Registered: Mar 2010
Posts: 10

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by syg00
What happens if you try your original attempt without the anchor? And are you using GNU sed?
It returns
-GARBAGE-GARBAGE-GARBAGE-......€183933

I'm not very experienced with Linux, but I imagine I'm using GNU sed. I'm on Ubuntu and am typing the command into the terminal.
 
Old 03-27-2010, 08:31 AM   #6
citygrid
LQ Newbie
 
Registered: Mar 2010
Posts: 10

Original Poster
Rep: Reputation: 0
It's bizarre. It has something to do with the anchor not finding the beginning of the line/file, and I can't figure it out. Even putting this in directly:

Code:
sed "s/^.*183933//g; s/ frames Codec.*$//g" "info.txt" > "frames.txt"
didn't return an empty file, as I would expect, but still gave me the whole file up to and including the number.

Anyway, knowing that I could at least replace the expression "Number of frames: " let me do this:

Code:
sed "s/Number of frames: /\n/g; s/ frames Codec.*$//g" "info.txt" | head -2 | tail -1 > "frames.txt"
so I've solved my problem, albeit in a convoluted manner, but it still doesn't give me any insight into why the first expression doesn't work. If anyone can explain this, please let me know.

In any case, Syg00, thank you for taking the time to help me out!
 
Old 03-27-2010, 05:39 PM   #7
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,126

Rep: Reputation: 4120
That implies your leading search text occurs more than once (per line) in the data - try something like
Code:
sed -r 's/.*Number of frames: ([[:digit:]]+).*/\1/' "info.txt" > "frames.txt"
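For the awk and grep alternatives asked about at the start of the thread, here is a minimal sketch. It assumes the label "Number of frames:" appears once on the line and is followed by a space and the number; the sample line is a shortened stand-in for the real file, and `grep -o` is an extension found in GNU and BSD grep rather than plain POSIX.

```shell
# Shortened stand-in for the single line in info.txt (assumed layout).
line='-GARBAGE- Number of frames: 183933 frames Codec -GARBAGE-'

# grep -o prints only the matched text; the second grep keeps the trailing digits.
with_grep=$(printf '%s\n' "$line" | grep -o 'Number of frames: [0-9][0-9]*' | grep -o '[0-9][0-9]*$')

# awk: scan the whitespace-separated fields for "frames:" and print the field after it.
with_awk=$(printf '%s\n' "$line" | awk '{ for (i = 1; i < NF; i++) if ($i == "frames:") print $(i + 1) }')

echo "grep: $with_grep  awk: $with_awk"
```

Both variants key on the label rather than on the surrounding text, so they do not depend on matching everything from the start of the line.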
 
Old 03-27-2010, 08:43 PM   #8
crts
Senior Member
 
Registered: Jan 2010
Posts: 2,020

Rep: Reputation: 757
Hi,

I copied and pasted your data into a file and executed your command. It worked fine, i.e. I got 183933 as output. I am using sed version 4.1.5; my bash version is 3.2.39.
I noticed that you are using "double-quotes" instead of 'single-quotes' so maybe your sed instruction just fell victim to some expansion issues?
 
Old 03-27-2010, 09:17 PM   #9
citygrid
LQ Newbie
 
Registered: Mar 2010
Posts: 10

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by crts
Hi,
I copied and pasted your data into a file and executed your command. It worked fine, i.e. I got 183933 as output. I am using sed version 4.1.5; my bash version is 3.2.39.
I noticed that you are using "double-quotes" instead of 'single-quotes' so maybe your sed instruction just fell victim to some expansion issues?
Thanks for replying.

I did indeed try the single quote option, but it gave me the same output.

The funny thing is that when I pasted exactly what I wrote above, the regex worked for me, too. This led me to believe that there was something funky going on with the original output file from the video encoding program rather than the regular expression itself.

Anyway, I played around with it a bit, and found that if I resaved the text file as UTF-8 in gedit, then the original argument that I posted worked.

So in the end, it's simply a question of character coding, it seems.

Unfortunately, I don't know enough about the subject to understand why it monkeyed up the regex or how to fix the problem in the future, so if someone could enlighten me, I'd be much obliged.

Thanks again for your responses!
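A likely explanation, sketched under assumptions: in a UTF-8 locale, GNU sed's `.` does not match a byte sequence that is not valid UTF-8, so `^.*` cannot stretch from the start of the line across such a byte. That would also fit the unanchored attempt, where `.*` only removed the text after the last bad byte. The demo file below is made up; the real encoder output may contain different bytes. Running sed under `LC_ALL=C` treats every byte as one character, which avoids re-saving the file by hand.

```shell
# Hypothetical reproduction: a one-line file containing a byte (octal 200, hex 80)
# that is not valid UTF-8 on its own. The file name is made up for this demo.
printf 'GARBAGE \200 Number of frames: 183933 frames Codec GARBAGE\n' > info_demo.txt

# iconv exits non-zero when the input is not valid in the "from" encoding,
# so converting UTF-8 to UTF-8 works as a validity check.
if iconv -f UTF-8 -t UTF-8 info_demo.txt > /dev/null 2>&1; then
    valid_utf8=yes
else
    valid_utf8=no
fi

# In the C locale every byte counts as one character, so '.*' can span the bad byte.
frames=$(LC_ALL=C sed 's/^.*Number of frames: //; s/ frames Codec.*$//' info_demo.txt)

echo "valid UTF-8: $valid_utf8, frames: $frames"
rm -f info_demo.txt
```

If the file's real encoding is known, converting it with `iconv -f THAT_ENCODING -t UTF-8` before running sed should have the same effect as re-saving it as UTF-8 in gedit.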
 