LinuxQuestions.org


timl 09-18-2016 01:10 AM

strip out a random part of a file name
 
Hi,

I have a bunch of files I ripped from YouTube. All of them contain a random string before the extension, e.g.:
Quote:

Working Week - Sweet Nothing (Live)-3Hm1VFCYsxw.aac
Working Week - This Time-8kF0yjh-q-Y.aac
(I converted them to AAC.) As you can see, there is no fixed length or format to these file names. Looking at the examples, the complication is that the random string in the second file name contains two dashes; I thought there was consistently only one dash in this random string. I want to remove these strings, giving:
Quote:

Working Week - Sweet Nothing (Live).aac
Working Week - This Time.aac
It would be really good to use a command-line tool for this. Is there a way to locate and remove the last string before the dot?

I also thought about stripping out the prefix:
Quote:

Working Week -
and then removing all fields between the first dash and the dot. Any ideas?

Apologies for providing a blank template, but my sed/awk knowledge is very limited.

Cheers
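The prefix idea can be sketched in bash roughly as below. This assumes every file starts with exactly "Working Week - " and that the song title contains no dash of its own; `strip_by_prefix` is a hypothetical helper name, not an existing command.

```shell
#!/usr/bin/env bash
# Sketch of the prefix idea: strip a fixed prefix, cut the rest at its first
# "-", then put the prefix and the extension back.
# Assumes (hypothetically) every name starts with "Working Week - " and that
# the title itself contains no "-".
strip_by_prefix() {
    local f="$1"
    local prefix="Working Week - "
    local ext="${f##*.}"          # extension after the last dot, e.g. "aac"
    local rest="${f#"$prefix"}"   # name with the fixed prefix removed
    local title="${rest%%-*}"     # everything before the first remaining "-"
    printf '%s%s.%s\n' "$prefix" "$title" "$ext"
}
```

For example, `strip_by_prefix "Working Week - This Time-8kF0yjh-q-Y.aac"` should print `Working Week - This Time.aac`. A hypothetical title such as "Rock-a-Bye" would be cut at its first dash, which is exactly why the replies ask for a precise definition of the data.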

syg00 09-18-2016 01:27 AM

The only way you can solve these sorts of problems is to precisely define the data so that a regex can be built.
You've only half defined it.

Turbocapitalist 09-18-2016 01:45 AM

Yes, you'll need to precisely identify the pattern in order to move forward.

One tool that will help is the Perl-based version of rename. Not only does it accept Perl regular expressions, which are much more powerful and flexible than awk's, it also has the -n option to do a dry run. The dry run lets you practice before changing anything.

ondoho 09-18-2016 04:59 AM

First of all (!) I'd check whether the software used for ripping has options for how to create filenames.
Then, it seems to me that these last characters are the YouTube video IDs, not random.
Looking at some YT video links, they always seem to be 11 characters long,
so just remove the dash and the 11 characters before the extension?
I would use bash for that.
See here:
http://www.tldp.org/LDP/abs/html/str...ipulation.html
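A short sketch of that bash string-manipulation approach, assuming each name ends in a dash, an 11-character ID and an extension. `strip_yt_id` is a hypothetical function name; it leaves a name unchanged when the character 12 positions before the extension is not a dash, so unrelated files are not mangled.

```shell
#!/usr/bin/env bash
# Remove "-" plus an 11-character ID from just before the extension.
# Assumes the ID length is always 11 (taken from the thread, not verified).
strip_yt_id() {
    local f="$1"
    local ext="${f##*.}"    # text after the last dot, e.g. "aac"
    local base="${f%.*}"    # everything before the last dot
    # Leave the name alone if it is too short or has no "-" 12 chars from the end.
    if (( ${#base} <= 12 )) || [[ ${base:${#base}-12:1} != "-" ]]; then
        printf '%s\n' "$f"
        return
    fi
    # Keep all but the last 12 chars ("-" + 11-char ID); using ${#base}-12
    # avoids a negative length, which only bash 4.2+ accepts.
    printf '%s.%s\n' "${base:0:${#base}-12}" "$ext"
}
```

For example, `strip_yt_id "Working Week - Sweet Nothing (Live)-3Hm1VFCYsxw.aac"` should print `Working Week - Sweet Nothing (Live).aac`. Feeding its output to `mv -n` in a loop over `*.aac` would do the actual renaming.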

pan64 09-18-2016 05:11 AM

Yes, they told you what to do: you need to work out the rule you want to use and construct a regex for it.
You may try the tool rename, which is already available, so you only need to execute it; you do not need to write a script.
The only additional info I can give you is to use an online regexp tester like http://www.regexr.com/. You may try this regexp, although I'm not really sure it fits your needs:
Code:

^[^-]+- ([^-]+)-.*\.([^.]+)$
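To see what that regex captures, here is a sketch that applies it unchanged with `sed -E` (supported by GNU and BSD sed) and joins the two groups with a dot. `try_pan64_regex` is a hypothetical name. On the sample names, group 1 holds only the title, so the "Working Week - " prefix disappears as well.

```shell
#!/usr/bin/env bash
# The regex above, unchanged: group 1 = text between the first "- " and the
# next "-", group 2 = the extension after the last dot.
try_pan64_regex() {
    printf '%s\n' "$1" | sed -E 's/^[^-]+- ([^-]+)-.*\.([^.]+)$/\1.\2/'
}
```

To keep the artist, group 1 can be widened to `^([^-]+- [^-]+)-.*\.([^.]+)$`; either way, a title that contains its own dash will still be cut short.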

timl 09-18-2016 06:12 AM

Thanks for the suggestions, all. I need to look into the links and thoughts provided. It looks like a bit of work is involved, so I will see how I go.

Cheers

hyperhead 09-18-2016 06:34 AM

Hi

This worked for me

ls | grep Working | sed 's/[-][0-9].*[.]/./g'

However, if you have any file names where the YouTube ID starts with a letter, it won't work. It also assumes all the files start with the word Working, and it only prints the new names; it doesn't rename anything.

It's enough to get you started anyhow.
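A variant of that sed that does not depend on the ID starting with a digit, sketched under the same 11-character assumption used elsewhere in this thread: it matches a dash plus exactly 11 letters, digits, underscores or dashes right before the final extension. `strip_id_sed` is a hypothetical name, and like the original it only prints the new name.

```shell
#!/usr/bin/env bash
# Match "-" + exactly 11 ID characters (letters, digits, "_" or "-") right
# before the final extension, and keep only the extension (\1).
strip_id_sed() {
    printf '%s\n' "$1" | sed -E 's/-[A-Za-z0-9_-]{11}(\.[^.]+)$/\1/'
}
```

Looping over a glob, e.g. `for f in Working*.aac; do strip_id_sed "$f"; done`, avoids parsing `ls` output, which can break on unusual file names.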

keefaz 09-18-2016 08:28 AM

If the ID is always 11 characters long:
Code:

file="Working Week - Sweet Nothing (Live)-3Hm1VFCYsxw.aac"
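# drop the last 16 chars ("-" + 11-char ID + ".aac"), then re-add ".aac"
# (a negative length such as ::-16 needs bash 4.2 or later)
echo "${file::-16}.aac"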
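To apply the same trim to a whole directory, a sketch along these lines could work. It assumes every target name ends in a dash, an 11-character ID and ".aac" (16 characters in total), skips files that don't match that shape, and only prints the planned renames unless the hypothetical variable `DRY_RUN` is set to 0. `rename_aac_in_dir` is also a hypothetical name. It uses `${f:0:${#f}-16}`, which gives the same result as `${f::-16}` but also works on bash versions older than 4.2.

```shell
#!/usr/bin/env bash
# Trim "-" + 11-char ID + ".aac" from every matching file in a directory.
# DRY_RUN (hypothetical variable) defaults to 1: print only, rename nothing.
rename_aac_in_dir() {
    local dir="$1" f new
    local re='-[A-Za-z0-9_-]{11}\.aac$'   # assumed shape of the name's tail
    for f in "$dir"/*.aac; do
        [ -e "$f" ] || continue            # glob matched nothing
        [[ $f =~ $re ]] || continue        # not "-<11 chars>.aac": leave alone
        new="${f:0:${#f}-16}.aac"          # same as ${f::-16}.aac on bash 4.2+
        if [ "${DRY_RUN:-1}" = 1 ]; then
            printf '%s -> %s\n' "$f" "$new"
        else
            mv -n -- "$f" "$new"           # -n: never overwrite an existing file
        fi
    done
}
```

Run `rename_aac_in_dir .` first to check the list, then `DRY_RUN=0 rename_aac_in_dir .` to rename.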


syg00 09-18-2016 08:18 PM

cute - I always forget about poor old bash.

Sefyir 09-18-2016 09:45 PM

Others have shown you methods of filtering the information.
I'm going to suggest youtube-dl, as it lets you set the output file name with a template.

OUTPUT TEMPLATE parameters
https://github.com/rg3/youtube-dl/bl...utput-template

Example:
Code:

youtube-dl -o "%(title)s.%(ext)s"  --restrict-filenames 'https://www.youtube.com/watch?v=_HONxwhwmgU'
->Come_Together_-_John_Lennon_The_Beatles_Live_In_New_York_City.mp4

Use --restrict-filenames if you want to avoid nasty whitespace and special characters such as parentheses and quotes.
Run youtube-dl -h for more options.

timl 09-19-2016 01:40 AM

Yep, Sefyir, that suggestion works. I'll have a play around with string manipulation as well.


All times are GMT -5.