Script to sort files based on part of file name

valunthar · 06-24-2011, 04:13 PM

I'm trying to write a script that would organize files in a directory based on part of a file name, but I'm running into a few snags in figuring out some of the logic. So far the script creates a new folder based on the name of the file in question and moves the file into the new directory. However, the current solution would only exacerbate the problem, as it would put each file of a series into its own folder rather than grouping the series into one directory. Nor does it take into account folders that have already been created. Also, the filenames in this folder don't follow a specific naming convention and usually hold a lot of extraneous information, so tools such as sed would be of limited use.

Here's what I've come up with so far:

Code:

#!/bin/bash

for f in $1; do
  dir=$1/$f
  mkdir -p "$dir"
  mv "$f" "$dir"
done

Any help with ironing out these bugs would be greatly appreciated.

David the H. · 06-24-2011, 06:28 PM

I don't quite understand what you mean by organizing files within a directory. What does that mean?

You can display names differently according to different attributes, using various options in ls, or the sort command, or such. But you can't change their true alphanumeric sorting order unless you rename everything.

So are you trying to rename all the files? If so, show us an actual example of the existing file and directory structure, and what you want it to look like afterwards.

If you have some other purpose in mind, please elaborate.

In other words, what is your real goal?

sundialsvcs · 06-24-2011, 06:33 PM

Why do it in bash?

In a very short Perl script you can slurp in a list of files, sort it using a cmp function that considers only part of the name and ... presto.

In a variety of other high-level programming tools, all of which are available on your Linux box, you can do the same.
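
For comparison, sorting a listing on only part of the name can also be sketched in the shell with the field options of sort. This is an illustration only, not the Perl approach mentioned above: the function name sort_by_title is made up, and it assumes each name starts with a bracketed tag ending in "]" and contains no newlines.

```shell
#!/bin/bash
# Hypothetical helper: print the given names, one per line, ordered by
# the text that follows the first "]" rather than by the whole name.
sort_by_title() {
    # -t']' uses "]" as the field separator; -k2 sorts on everything
    # from the second field to the end of the line.
    # LC_ALL=C keeps the ordering independent of the current locale.
    printf '%s\n' "$@" | LC_ALL=C sort -t']' -k2
}
```

Calling it as sort_by_title * lists the current directory ordered by title instead of by release group. It only changes how the names are displayed; no files are moved or renamed.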

valunthar · 06-24-2011, 07:53 PM

Bash is the only language I'm familiar with, though I am open to suggestions for other languages if it will do the job. Would you happen to have any code suggestions?

Tinkster · 06-24-2011, 09:16 PM

How about you add some examples of your file names, and how they
are meant to relate to the directories you want to move them into?

Cheers,
Tink

valunthar · 06-24-2011, 11:25 PM

Some examples of file names would be

[releasegroup]series name-episode number.extension
[releasegroup]series name-episode number[md5 hash].extension
[releasegroup]series name-episode number[codec].extension
[releasegroup]series name-episode number[resolution].extension

I'm pretty sure that covers all of the permutations in that folder. Most file names contain spaces, while a few use underscores instead. Pretty much all of them use dashes to separate the series name and episode number, but there are a couple that don't use them at all.

As far as the folder structure is concerned I would like to move all of the files pertaining to a specific series into a folder that corresponds to the name of the series in question.

David the H. · 06-25-2011, 06:22 AM

The way this would normally be done is to loop through the filenames, extract the title part from the name, use that to create a directory, and then move the file into it.

But you can only automate this if there's some way to clearly and reliably differentiate the text you want from the text that you don't want. If there's no regular pattern to the filenames, then it's likely to be difficult to impossible to do this.

Perhaps if you showed us some actual examples of the variations in filenames that could be encountered, we could come up with something that would work in the majority of cases. For the above file patterns, for example, this might work:

Code:

IFS=$'\n'   # set the field separator to newline so that spaces in
            # filenames are ignored during word-splitting

for file ; do    # no "in" needed; for loops default to reading "$@"

	dir="${file#*]}"    # strip everything up to and including the first "]"
	dir="${dir%%-*}"    # strip everything from the first "-" onward
	dir="${dir// /_}"   # replace spaces with underscores in the dirname

	mkdir -p "$dir"
	mv -t "$dir" "$file"    # -t (target directory) is a GNU mv option

done

This is not a very robust extraction though. It will only work on that very specific pattern where the title comes between "]" and "-".
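
A somewhat more tolerant variant might look like the sketch below. It is only a sketch under assumptions: the function names series_name and sort_by_series are made up for illustration, the patterns are guessed from the four filename shapes listed earlier in the thread, and it still relies on a "-" separating the title from the episode number.

```shell
#!/bin/bash
# Hypothetical helper: print the series title guessed from one filename.
series_name() {
    local name=${1##*/}      # drop any leading directory path
    name=${name%.*}          # drop the extension
    name=${name//_/ }        # treat underscores as spaces
    name=${name#\[*\]}       # drop a leading [releasegroup] tag, if any
    name=${name%%\[*}        # drop trailing tags such as [md5] or [codec]
    name=${name%%-*}         # keep only the text before the first "-"
    # trim leading and trailing whitespace
    name=${name#"${name%%[![:space:]]*}"}
    name=${name%"${name##*[![:space:]]}"}
    printf '%s\n' "$name"
}

# Hypothetical helper: move each file into DEST/<series title>/.
# Usage: sort_by_series DEST FILE...
sort_by_series() {
    local dest=$1 file dir
    shift
    for file; do
        [[ -f $file ]] || continue     # skip directories and missing files
        dir=$(series_name "$file")
        [[ -n $dir ]] || continue      # skip names that yield no title
        mkdir -p -- "$dest/$dir"       # -p: no error if it already exists
        mv -- "$file" "$dest/$dir/"
    done
}
```

It could be run as sort_by_series . * from inside the directory to be sorted. Because underscores are converted to spaces before the title is extracted, "Some_Show" and "Some Show" end up in the same folder. Files that have no dash before the episode number, and titles that themselves contain a dash, will still be grouped wrongly, so it is worth looping over series_name alone and checking its output before moving anything.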

Frankly, if I were you I'd first work to get all the files renamed in a consistent pattern. Then sorting them would be a cinch.

sundialsvcs · 06-25-2011, 07:02 AM

Quote:

Originally Posted by valunthar

Bash is the only language I'm familiar with, though I am open to suggestions for other languages if it will do the job. Would you happen to have any code suggestions?

It definitely should not be, because on your Linux system you will find (or can easily install for free):

Perl
Python
PHP
Ruby
gprolog
R
... and many more.

There are so many "tools for the job" out there, which are specifically intended to do that job, it makes little sense IMHO to use Bash.

The most significant element of these languages, in practice, is the very large library of contributed code that accompanies each of them. For instance, look at http://search.cpan.org for Perl. (Go ahead... type something... type anything...) All of these languages have something similar.

And if you really love to do "programming in the half-shell," the Korn shell is just your ticket. (Minus the red sauce and good wine, unfortunately.)

When you start "digging deep" into the Windows system, well, you discover that it's not actually that deep. On the other hand, when you start poking around on a Linux box, even with what's installed there by default, you see that it's an Energizer® Bunny ... "it just keeps going, and going, and going."

David the H. · 06-26-2011, 03:56 AM

I have to disagree with the above. This is exactly the type of situation that shell scripting is designed for. After all, it only requires a simple loop and text pattern matching, along with directory creation and file renaming. Bash can do these just as well as any other language (assuming that there are actually usable patterns to match, as I talked about earlier).

Sure, you may be able to do it slightly better (read: faster or more efficiently) if you already know Perl or another language, but if you don't already know it, then why take hours or even days out of your time to study and write your own (likely poor) first attempts, when you can use a language you're already familiar with to do the same thing?

Suggesting learning other languages is a good thing, but suggesting learning one just to solve the problem at hand is not very helpful. Particularly when it can be accomplished without taking that extra step.