I'm trying to isolate a number from a text file using sed. The text file looks like this:
-GARBAGE-GARBAGE-GARBAGE- Number of frames: 183933 frames Codec -GARBAGE-GARBAGE-GARBAGE-
I tried the following:
Code:
sed "s/^.*Number of frames: //g; s/ frames Codec.*$//g" "info.txt" > "frames.txt"
Strangely, it only seems to be stripping off the end, but not the beginning, like so: -GARBAGE-GARBAGE-GARBAGE- Number of frames: 183933
I'm obviously not using the command correctly, so what am I doing wrong?
If anyone has alternatives using awk or grep, I'd be open to those as well, but for future reference I'm curious to know why my argument above is not working the way I expect it to.
Almost always a bad specification - maybe there are two spaces somewhere, maybe a <tab>, ...
Use as little specific data as possible. If you just want the number, just specify numbers - something like (untested)
Code:
sed -r 's/.*([[:digit:]]*).*/\1/' "info.txt" > "frames.txt"
Quote:
Almost always a bad specification - maybe there are two spaces somewhere, maybe a <tab>, ...
Thanks for replying.
I thought of that, of course, and went out in search of \t and double spaces, but there aren't any. The real mystery is that it's finding the expression. If I put in:
Code:
sed "s/Number of frames: /SOMEWORD/g; s/ frames Codec.*$//g" "info.txt" > "frames.txt"
it returns:
-GARBAGE-GARBAGE-GARBAGE- SOMEWORD183933
So it would seem that it has something to do with finding the beginning of the file (which is one line).
Any other ideas?
Quote:
Use as little specific data as possible. If you just want the number, just specify numbers - something like (untested)
Code:
sed -r 's/.*([[:digit:]]*).*/\1/' "info.txt" > "frames.txt"
This doesn't work, unfortunately, because "info.txt" contains a lot of numeric information about a video, such as number of frames, resolution, duration, audio bitrate, etc., so I wouldn't only be getting what I needed (which is the total number of frames).
Last edited by citygrid; 03-27-2010 at 06:58 AM.
Reason: clarification
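For reference, a narrower pattern that keys on the label itself avoids picking up the other numbers in the file. Note also that the untested command quoted above would print an empty line as written: the leading .* is greedy and [[:digit:]]* is allowed to match nothing. The following is only a sketch of sed, grep, and awk alternatives. It assumes the label is spelled exactly "Number of frames:" and is followed by optional whitespace and then digits; the temporary directory and the variable names are made up for illustration.

```shell
# Recreate the sample line from the first post in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' '-GARBAGE-GARBAGE-GARBAGE- Number of frames: 183933 frames Codec -GARBAGE-GARBAGE-GARBAGE-' > "$tmp/info.txt"

# sed: capture the digits right after the label; -n plus the p flag
# prints only lines where the substitution actually matched.
frames_sed=$(sed -n 's/.*Number of frames:[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$tmp/info.txt")

# grep: -o prints only the matched text; the second grep keeps the trailing digits.
frames_grep=$(grep -o 'Number of frames:[[:space:]]*[0-9][0-9]*' "$tmp/info.txt" | grep -o '[0-9][0-9]*$')

# awk: split the line on the label, then take the first word that follows it.
frames_awk=$(awk -F'Number of frames: *' 'NF > 1 { split($2, a, " "); print a[1] }' "$tmp/info.txt")

echo "$frames_sed $frames_grep $frames_awk"
```

Because each command only prints something when the label is found, an empty result is a quick hint that the input does not contain what you think it contains.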
It's bizarre. It has something to do with the anchor not finding the beginning of the line/file, and I can't figure it out. Even putting this in directly:
Code:
sed "s/^.*183933//g; s/ frames Codec.*$//g" "info.txt" > "frames.txt"
didn't return an empty file, as I would expect, but still gave me the whole file up to and including the number.
Anyway, knowing that I could at least replace the expression "Number of frames: " let me do this:
Code:
sed "s/Number of frames: /\n/g; s/ frames Codec.*$//g" "info.txt" | head -2 | tail -1 > "frames.txt"
so I've solved my problem, albeit in a convoluted manner, but it still doesn't give me any insight into why the first expression doesn't work. If anyone can explain this, please let me know.
In any case, Syg00, thank you for taking the time to help me out!
Hi,
I copy&pasted your data into a file and executed your command. It worked fine, i.e. I got 183933 as output. I am using sed version 4.1.5, bash version is 3.2.39.
I noticed that you are using "double-quotes" instead of 'single-quotes' so maybe your sed instruction just fell victim to some expansion issues?
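A quick way to rule quoting in or out is to compare the two forms of the script directly. This is just a sketch; the variable names are made up. For this particular script the shell should leave the double-quoted version alone, because "$/" is not a parameter expansion and there are no backticks or backslash sequences that double quotes treat specially.

```shell
# The same sed script, once in double quotes and once in single quotes.
dq="s/^.*Number of frames: //g; s/ frames Codec.*$//g"
sq='s/^.*Number of frames: //g; s/ frames Codec.*$//g'

# If the shell had expanded anything inside the double quotes,
# the two strings would no longer be equal.
if [ "$dq" = "$sq" ]; then
    quoting_result="identical"
else
    quoting_result="different"
fi
echo "$quoting_result"
```

Single quotes are still the safer habit for sed scripts, since a script containing something like $1 or a backtick would be changed by the shell inside double quotes.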
Quote:
Hi,
I copy&pasted your data into a file and executed your command. It worked fine, i.e. I got 183933 as output. I am using sed version 4.1.5, bash version is 3.2.39.
I noticed that you are using "double-quotes" instead of 'single-quotes' so maybe your sed instruction just fell victim to some expansion issues?
Thanks for replying.
I did indeed try the single quote option, but it gave me the same output.
The funny thing is that when I pasted exactly what I wrote above, the regex worked for me, too. This led me to believe that there was something funky going on with the original output file from the video encoding program rather than the regular expression itself.
Anyway, I played around with it a bit, and found that if I resaved the text file as UTF-8 in gedit, then the original argument that I posted worked.
So in the end, it seems to be simply a question of character encoding.
Unfortunately, I don't know enough about the subject to understand why it monkeyed up the regex or how to fix the problem in the future, so if someone could enlighten me, I'd be much obliged.
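One plausible explanation, and it is only a guess because the thread never identifies the file's original encoding: the encoder wrote some bytes in the "garbage" that are not valid UTF-8. As far as I know, GNU sed running in a UTF-8 locale does not let . match a byte that is not part of a valid character, so ^.* cannot get from the start of the line past such a byte, while a pattern that does not have to cross it (like the plain "Number of frames: " replacement) still matches. The sketch below fakes that situation with a single stray byte, 0xE9, which is an assumption for illustration, and sidesteps the problem by running sed in the C locale, where every byte counts as a character.

```shell
tmp=$(mktemp -d)

# A line like the one in the thread, but with a stray 0xE9 byte
# (octal 351) in the leading garbage. On its own it is not valid UTF-8.
printf 'GARB\351GE Number of frames: 183933 frames Codec GARBAGE\n' > "$tmp/info.txt"

# Diagnostic: count lines containing bytes outside printable ASCII.
odd_lines=$(LC_ALL=C grep -c '[^ -~]' "$tmp/info.txt")

# In the C locale sed works byte by byte, so ".*" can cross the odd byte.
frames=$(LC_ALL=C sed 's/^.*Number of frames: //; s/ frames Codec.*$//' "$tmp/info.txt")

echo "lines with odd bytes: $odd_lines, frames: $frames"
```

Prefixing the original command with LC_ALL=C is the least invasive fix for a one-off. If the real source encoding is known, converting the file once with iconv (its -f and -t options name the source and target encodings) is roughly what resaving as UTF-8 in gedit did.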