LinuxQuestions.org


anon091 04-01-2011 08:05 AM

separate out files based on name
 
I have a folder named Pictures that contains a bunch of .jpg files. My problem is that they all have randomly numbered names, and each one has a duplicate whose name is the same number followed by the letter a right before the .jpg.

for example, there would be 123.jpg and 123a.jpg, where 123a.jpg is just a resized version of 123.

What I'd like to do, but have NO clue how to, is have a script or something go through my Pictures folder, then copy the ones that end in a.jpg to a folder called Resized, and the ones that don't to a folder called Originals. That way my Pictures folder will stay intact, and I'll have copies of them all separated out.

I have to do this all through the CLI on a machine; maybe I don't even need a script and can just do it with a slick command?

colucix 04-01-2011 08:09 AM

Are they all in the Pictures main directory or are they placed in sub-directories?

anon091 04-01-2011 08:14 AM

sub-directories, and lots of them

colucix 04-01-2011 08:20 AM

Hence you need a command to search recursively inside the Pictures folder. You can try find, using the -regex option to match file names with numbers and w/ or w/o the trailing a. The -exec action might serve to move them. On the other hand, if you want to preserve the original directory structure inside Resized and Originals, you may want to write a more elaborate script.
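
As a rough illustration of the -regex idea, here is a minimal sketch that runs in a scratch directory. It assumes GNU find (for -regextype); the sample file names and the demo, resized and originals variables are made up for the example. Note that -regex is matched against the whole path, hence the leading .*/ in each pattern.

```shell
# Scratch demo; assumes GNU find and a throwaway directory from mktemp.
demo=$(mktemp -d)
mkdir -p "$demo/Pictures/trip"
touch "$demo/Pictures/123.jpg" "$demo/Pictures/123a.jpg" \
      "$demo/Pictures/trip/45a.jpg" "$demo/Pictures/trip/notes.txt"

# Digits followed by "a.jpg": the resized copies.
resized=$(find "$demo/Pictures" -regextype posix-extended -type f \
               -regex '.*/[0-9]+a\.jpg' | sort)

# Digits only before ".jpg": the originals.
originals=$(find "$demo/Pictures" -regextype posix-extended -type f \
                 -regex '.*/[0-9]+\.jpg' | sort)

printf 'resized:\n%s\noriginals:\n%s\n' "$resized" "$originals"
```

Once the two lists look right, an -exec cp action could be appended to each find, as in the later replies.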

kurumi 04-01-2011 08:21 AM

Code:

mv *a.jpg /destination

Nominal Animal 04-01-2011 08:24 AM

Quote:

Originally Posted by rjo98 (Post 4310668)
copy the ones that end in a.jpg to a folder called Resized, and ones that dont have that to a folder called Originals.

If you want them copied to the folder and not duplicate the folder structure,
Code:

find Pictures/ -name '*a.jpg' -exec cp -vi -- '{}' Resized/ ';'
find Pictures/ -name '*[^a].jpg' -exec cp -vi -- '{}' Originals/ ';'

If you want to duplicate the folder structure too, then
Code:

cd Pictures/
find ./ -type d -exec mkdir -p -- '../Resized/{}' '../Originals/{}' ';'
find ./ -name '*a.jpg' -exec cp -vi -- '{}' '../Resized/{}' ';'
find ./ -name '*[^a].jpg' -exec cp -vi -- '{}' '../Originals/{}' ';'


anon091 04-01-2011 08:26 AM

cool, thanks guys. I'll try out Nominal's suggestions, I never would have been able to come up with that myself haha.

anon091 04-08-2011 01:21 PM

I tried the folder structure one, and it says "missing argument to '-exec' "

colucix 04-08-2011 03:38 PM

At this point keep it simple and use a loop to do the task step by step, so that you have better control over what happens. For example, suppose you have the following directories:
Code:

/home/rjo98/Originals
/home/rjo98/Pictures
/home/rjo98/Resized

first move the re-sized pictures:
Code:

while read src
do
  dst=${src/Pictures/Resized}
  echo mkdir -p $(dirname $dst)
  echo mv $src $dst
done < <(find /home/rjo98/Pictures -name \*a.jpg)

then the original ones:
Code:

while read src
do
  dst=${src/Pictures/Originals}
  echo mkdir -p $(dirname $dst)
  echo mv $src $dst
done < <(find /home/rjo98/Pictures -name \*.jpg)

The echo statements are for testing purposes. Once you've verified that the resulting commands are what you're looking for, remove the echo and run again. Please note that the find command in the first loop searches for the *a.jpg files, while the second one searches for all the remaining .jpg files (which works because the first loop has already moved the *a.jpg files away). Hope this helps.

anon091 04-08-2011 03:41 PM

Thanks. How do I get it to run all those lines at once, put them in a .sh file then run that?

anon091 04-08-2011 03:43 PM

also, i was hoping to preserve the Pictures folder by just doing copies, in case something screwed up, but i guess that's why we echo first...

Ramurd 04-08-2011 03:44 PM

Quote:

I tried the folder structure one, and it says "missing argument to '-exec' "
I think that is because the && is caught by the shell and thus never passed to find; try escaping it (replace it with \&\&).

Edit: the thread lived, and had to put in the quote to show what I was referring to.

colucix 04-08-2011 03:47 PM

Quote:

Originally Posted by rjo98 (Post 4318409)
Thanks. How do I get it to run all those lines at once, put them in a .sh file then run that?

Just copy/paste them on the command line, modify the path in the find command according to the actual path of the Pictures directory and press enter. Be careful not to remove the echo statements before testing.

colucix 04-08-2011 03:50 PM

Quote:

Originally Posted by rjo98 (Post 4318411)
also, i was hoping to preserve the Pictures folder by just doing copies, in case something screwed up, but i guess that's why we echo first...

That's right. Anyway, you might substitute the mv command with cp, since the syntax in this case is the same.
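
A minimal sketch of that copy variant, run against a scratch directory so nothing real is touched. It assumes bash (for the process substitution and the ${var/pattern/replacement} expansion); the demo directory and sample file names are invented for the example. It also reads NUL-separated names from find, which is one way to cope with spaces in directory names.

```shell
#!/bin/bash
# Scratch demo: build a tiny Pictures tree in a throwaway directory.
demo=$(mktemp -d)
mkdir -p "$demo/Pictures/my trip"
touch "$demo/Pictures/my trip/7a.jpg" "$demo/Pictures/7.jpg"

# Copy (not move) every *a.jpg into a parallel Resized tree.
while IFS= read -r -d '' src
do
  dst="${src/Pictures/Resized}"      # swap the first "Pictures" in the path
  mkdir -p "$(dirname "$dst")"       # create the target directory if needed
  cp -- "$src" "$dst"                # the source file stays where it was
done < <(find "$demo/Pictures" -type f -name '*a.jpg' -print0)

find "$demo" -type f | sort
```

Because cp is used, the Pictures tree is left untouched and the loop can simply be run again after deleting the Resized tree if something looks wrong.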

anon091 04-08-2011 03:57 PM

getting somewhere with ramurd's suggestion...

now i get

mkdir: invalid option -- i

colucix 04-08-2011 04:13 PM

Quote:

Originally Posted by rjo98 (Post 4318423)
getting somewhere with ramurd's suggestion...

now i get

mkdir: invalid option -- i

What you're doing is really dangerous... do you have a backup of the Pictures folder? What if, instead of an error (which causes the command to abort), it leads to an unpredictable result? Again, keep it simple...

Ramurd 04-08-2011 04:20 PM

Quote:

Originally Posted by rjo98 (Post 4318423)
getting somewhere with ramurd's suggestion...

now i get

mkdir: invalid option -- i

Given that:
1) none of the mkdirs stated by Nominal should give an -- i option
2) Nominal's commands are quoted thoroughly

I expect you're not putting all the quotes in there, they're very important, otherwise you can get bad and unpredictable results.
Otherwise Nominal's commands are safe and not dangerous at all; in fact very simple as it's all in one command that suits all.

To test out the commands you actually get, you can try to put an "echo" in front of them (so mkdir ... becomes echo mkdir ...) and see if you get results that are not so logical... Again, I expect a few missing quotes around {}.

anon091 04-08-2011 04:24 PM

that's what i do get, the only i that i have after a - is for the cp, maybe something isn't copying and pasting right from the web to my CLI.

colucix 04-08-2011 04:27 PM

Quote:

Originally Posted by Ramurd (Post 4318453)
I expect you're not putting all the quotes in there, they're very important, otherwise you can get bad and unpredictable results.
Otherwise Nominal's commands are safe and not dangerous at all; in fact very simple as it's all in one command that suits all.

Have you tested it? I tried the exact command suggested by Nominal Animal and then I applied your modification: I got the same results (errors) as the OP.

anon091 04-08-2011 04:27 PM

does anyone see what I did wrong?


find /Pictures/ -name '*A.jpg' -exec mkdir -p "/Resized/`dirname '{}'`" \&\& cp -vi '{}' "/Resized/`dirname '{}'`" ';'

anon091 04-08-2011 04:40 PM

while read src
do
dst=${src/Pictures/Resized}
echo mkdir -p $(dirname $dst)
echo mv $src $dst
done < <(find /home/rjo98/Pictures -name \*a.jpg)


I'm trying to modify this, would i make it like this? I'm so confused right now. My pictures folder is /Pictures and i want them to go to /Resized if they are *A.jpg

while read src
do
dst=${src/Resized}
echo mkdir -p $(dirname $dst)
echo mv $src $dst
done < <(find /Pictures -name \*A.jpg)

colucix 04-08-2011 04:44 PM

Quote:

Originally Posted by rjo98 (Post 4318465)
does anyone see what I did wrong?


find /Pictures/ -name '*A.jpg' -exec mkdir -p "/Resized/`dirname '{}'`" \&\& cp -vi '{}' "/Resized/`dirname '{}'`" ';'

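The argument '{}' is not correctly interpreted inside the command substitution: the shell runs the backquoted command once, before find even starts, so dirname receives the literal string {} and not a file name. In addition, the escaped \&\& is not an operator any more; it is passed along as a plain argument, so mkdir receives cp and -vi as its own arguments, which is where "mkdir: invalid option -- i" comes from. Look at the difference between the results of the two commands: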
Code:

find /Pictures/ -name '*A.jpg' -exec dirname '{}' \;
and
Code:

find /Pictures/ -name '*A.jpg' -exec echo `dirname '{}'` \;
or even
Code:

find /Pictures/ -name '*A.jpg' -exec echo `ls -l '{}'` \;
which will result in a more explicit error.
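
One common way around this is to let find start a small shell for each file, so that dirname and && are evaluated per file rather than once up front. The following is only a sketch under assumptions: it works in a scratch directory, the sample file name is made up, and it copies into a sibling ../Resized tree rather than /Resized.

```shell
# Scratch demo in a throwaway directory.
demo=$(mktemp -d)
mkdir -p "$demo/Pictures/sub"
touch "$demo/Pictures/sub/9A.jpg"
cd "$demo/Pictures" || exit 1

# The inner sh runs once per file; the file name arrives as "$1",
# so dirname sees the real path and && is a real shell operator.
find . -type f -name '*A.jpg' -exec sh -c '
  dir=../Resized/$(dirname "$1")
  mkdir -p "$dir" && cp -- "$1" "$dir/"
' sh {} \;

find "$demo/Resized" -type f
```

The single quotes around the inner script keep the outer shell from expanding $1 and $(...) too early, which is the same problem the backquotes ran into.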

colucix 04-08-2011 04:52 PM

Quote:

Originally Posted by rjo98 (Post 4318481)
I'm trying to modify this, would i make it like this? I'm so confused right now. My pictures folder is /Pictures and i want them to go to /Resized if they are *A.jpg

while read src
do
dst=${src/Resized}
echo mkdir -p $(dirname $dst)
echo mv $src $dst
done < <(find /Pictures -name \*A.jpg)

Nope. The statement
Code:

${src/Pictures/Resized}
is a parameter substitution to replace the string Pictures with the string Resized in the src variable. You have to leave it untouched. The following should work:
Code:

while read src
do
  dst=${src/Pictures/Resized}
  echo mkdir -p $(dirname $dst)
  echo mv $src $dst
done < <(find /Pictures -name \*A.jpg)

Suppose the find command returns a single result:
Code:

/Pictures/subdir/subsubdir/123A.jpg
inside the loop it will result in
Code:

dst=/Resized/subdir/subsubdir/123A.jpg
echo mkdir -p /Resized/subdir/subsubdir
echo mv /Pictures/subdir/subsubdir/123A.jpg /Resized/subdir/subsubdir/123A.jpg

which is exactly what you're looking for.
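
To see where the substitution starts and stops, here is a small bash sketch with a made-up path in which "Pictures" appears twice. The slashes inside the braces are the delimiters: the text between the first and second slash is the pattern, and everything from the second slash to the closing brace is the replacement.

```shell
#!/bin/bash
# Hypothetical sample path, chosen so that "Pictures" occurs twice.
src=/home/user/Pictures/Pictures-2010/123a.jpg

# One slash after the variable name: only the first match is replaced.
first="${src/Pictures/Resized}"

# Two slashes: every match is replaced (usually not what you want here).
all="${src//Pictures/Resized}"

echo "$first"   # /home/user/Resized/Pictures-2010/123a.jpg
echo "$all"     # /home/user/Resized/Resized-2010/123a.jpg
```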

anon091 04-08-2011 04:54 PM

ok, thanks. I'll give it a shot, even though I dont understand the substitution, like how it knows where to stop

colucix 04-08-2011 04:58 PM

A little add-on. In case there are directory or file names with blank spaces, better to use double quotes:
Code:

while read src
do
  dst="${src/Pictures/Resized}"
  mkdir -p "$(dirname "$dst")"
  mv "$src" "$dst"
done < <(find /Pictures -name \*A.jpg)


anon091 04-08-2011 05:01 PM

Thanks, i'll try that instead.

but i have a question. so what if my original path was /here/are/the/Pictures and i wanted to move the ones with *a.jpg to /i/want/them/here

how would i put that in that dst line? maybe that will help me understand how it works.

colucix 04-08-2011 05:29 PM

Indeed, the substitution assumes that the two directories Pictures and Resized share the same parent directory. Suppose you have a string like this:
Code:

/home/user/Pictures/something/some.jpg
and you want to transform it to
Code:

/home/user/Resized/something/some.jpg
if the original string is assigned to a variable (called src), you can replace the substring Pictures with the string Resized using
Code:

${src/Pictures/Resized}
that is by using the following syntax:
Code:

${variable/pattern/replacement}
Obviously if the two directories are not in the same path, you can try a slightly different approach. For example:
Code:

while read src
do
  dst=/i/want/them/here"${src/*Pictures/}"
  mkdir -p "$(dirname "$dst")"
  mv "$src" "$dst"
done < <(find /here/are/the/Pictures -name \*a.jpg)

In this case the replacement string in the substitution is null (that is it removes the first part of the path until the Pictures directory). Then the new path is prefixed to the result. This avoids confusion between the slashes used as delimiters in the substitution syntax and the slashes contained in the destination path.
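
One detail worth checking before relying on this: with a leading *, the ${var/pattern/} form removes the longest match, so a sub-directory whose name also contains "Pictures" would be swallowed too. The sketch below (bash, with an invented path) compares it with the ${var#pattern} form, which strips the shortest matching prefix.

```shell
#!/bin/bash
# Hypothetical path with a second "Pictures" further down the tree.
src=/here/are/the/Pictures/Pictures-2010/5a.jpg

# Pattern substitution with an empty replacement: longest match is removed.
longest="${src/*Pictures/}"

# Prefix removal with a single #: shortest match is removed.
shortest="${src#*Pictures}"

dst=/i/want/them/here"$shortest"

echo "$longest"    # -2010/5a.jpg
echo "$shortest"   # /Pictures-2010/5a.jpg
echo "$dst"        # /i/want/them/here/Pictures-2010/5a.jpg
```

If no directory below Pictures has "Pictures" in its name, both forms give the same result.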

jschiwal 04-08-2011 05:47 PM

Quote:

Originally Posted by rjo98 (Post 4318282)
I tried the folder structure one, and it says "missing argument to '-exec' "

You will get this error if you don't add: \; to the end of the line.
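
For reference, a small sketch of the terminators that -exec accepts, using made-up files in a scratch directory. The forms \; and ';' are equivalent (both just keep the shell from treating the semicolon as a command separator), while + passes as many file names as possible to a single invocation.

```shell
# Scratch demo with two sample files.
demo=$(mktemp -d)
touch "$demo/1a.jpg" "$demo/2a.jpg"

# One echo per file: two output lines.
per_file=$(find "$demo" -name '*a.jpg' -exec echo found {} \; | wc -l)

# Same thing with the quoted form of the terminator.
quoted=$(find "$demo" -name '*a.jpg' -exec echo found {} ';' | wc -l)

# One echo for the whole batch: a single output line.
batched=$(find "$demo" -name '*a.jpg' -exec echo found {} + | wc -l)

echo "per file: $per_file, quoted: $quoted, batched: $batched"
```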

Nominal Animal 04-08-2011 06:34 PM

Quote:

Originally Posted by rjo98 (Post 4318282)
I tried the folder structure one, and it says "missing argument to '-exec' "

Sorry, there was a thinko in the command. It's fixed now. The correct commands are
Code:

cd Pictures/
find ./ -type d -exec mkdir -p -- '../Resized/{}' '../Originals/{}' ';'
find ./ -name '*a.jpg' -exec cp -vi -- '{}' '../Resized/{}' ';'
find ./ -name '*[^a].jpg' -exec cp -vi -- '{}' '../Originals/{}' ';'


grail 04-09-2011 02:06 AM

How about:
Code:

#!/bin/bash

RESIZE='Pictures/Resize'
ORIGINAL='Pictures/Original'

while read -r FILEPATH FILE
do
    if [[ $FILE =~ [Aa]\.[Jj][Pp][Gg]$ ]]  # anchored and case-insensitive, to agree with -iname below
    then
        TMP="$RESIZE${FILEPATH#*Pictures}"
        [[ -d $TMP ]] || mkdir -p "$TMP"
    else
        TMP="$ORIGINAL${FILEPATH#*Pictures}"
        [[ -d $TMP ]] || mkdir -p "$TMP"
    fi

    cp -v "$FILEPATH/$FILE" "$TMP/$FILE"
done< <(find Pictures/ -type f -iname '*[0-9a].jpg' -printf "%h %f\n")

This is presumed to be run from the directory above your Pictures directory. If you wish to make it a little more flexible, you could force the user to enter the directory path and set it within the script, like:
Code:

if (( $# != 1 ))
then
    echo "Usage: $0 <path to files>"
    exit 1
else
    DIR="$1"
fi

<snip>

done< <(find "$DIR" -type f -iname '*[0-9a].jpg' -printf "%h %f\n")


anon091 04-11-2011 10:39 AM

Nominal, I dont think your solution is working totally, I think its skipping some files/folders, as when i ran the command on my linux box, it only got 3.4GB of stuff, while if i do a search for *a.jpg on my windows machine connected to it, it's only halfway done copying the files to a windows machine and its already at 5GB.

could it be folder names or something screwing this up?

anon091 04-11-2011 10:42 AM

I just ran the command for not a's then did a
find ./ -name '*a.jpg'
from when i was in that folder which should return no results since it was supposed to only grab the not a.jpg's, and its returning a ton of results.

This is so confusing

grail 04-11-2011 11:16 AM

But if you are using cp instead of mv then the files are copied and the originals will still be in the same place. Unless you have of course changed that??

Also, how your Windows machine reads sizes could also be affected by the format, ie fat or ntfs.

anon091 04-11-2011 11:25 AM

Isnt that one find statement looking for all the jpg's that aren't *a.jpg though, so it should be skipping them in that copy?

It was over 12GB off, and the number of files was vastly different.

Nominal Animal 04-11-2011 12:20 PM

Quote:

Originally Posted by rjo98 (Post 4321087)
Nominal, I dont think your solution is working totally

Ah, you must be using VFAT or NTFS, or some other case-insensitive file system. You should then use '*[Aa].[Jj][Pp][Gg]' and '*[^Aa].[Jj][Pp][Gg]' instead, I think.

Let's recap. Start by changing to the folder that contains the Pictures folder. I'll write out each command you need to run following that explicitly.

First, remove and recreate the Resized and Originals directories by running
Code:

rm -rf Resized Originals
mkdir Resized Originals

Then, change to the Pictures/ folder and copy the directory structure only in ../Resized/ and ../Originals/ by running
Code:

cd Pictures
find . -type d -exec mkdir -p -- '../Resized/{}' '../Originals/{}' ';'

Then, still in the Pictures/ folder, copy the resized versions to the ../Resized/ tree by running
Code:

find . -type f -name '*[Aa].[Jj][Pp][Gg]' -exec cp -vi -- '{}' '../Resized/{}' ';'
Finally, copy the originals to the ../Originals/ tree by running
Code:

find . -type f -name '*[^Aa].[Jj][Pp][Gg]' -exec cp -vi -- '{}' '../Originals/{}' ';'
If you wish to move the files, you should replace -exec cp with -exec mv in the two commands above.

Quote:

Originally Posted by rjo98 (Post 4321087)
I think its skipping some files/folders, as when i ran the command on my linux box, it only got 3.4GB of stuff, while if i do a search for *a.jpg on my windows machine connected to it, it's only halfway done copying the files to a windows machine and its already at 5GB.

No, it's not skipping, it is doing exactly what you told it to do. It works perfectly with file and folder names with spaces or any special characters, even quotes. It's not that.

The reason it does something else than what you'd expect is that Windows ignores case in file names, but POSIX operating systems do not. In other words, you said you have file names '*a.jpg', but actually you have '*A.jpg', '*a.JPG', '*A.Jpg' etc. For Windows users it is very easy to miss, but everybody else expects case sensitive file names.

I'd appreciate it if you could confirm that your images have names like that, not just '*a.jpg' in lower case, so I don't feel like a bully for pointing this out.

Nominal Animal 04-11-2011 12:44 PM

If you want to find out the actual total size of files in Linux, you can run e.g.
Code:

find Pictures -type f -printf '%s\n' | awk '{ s+=$1 } END { printf("%.3f GB\n", s/1000000000.0) }'
To find out the number of files, you can run e.g.
Code:

find Pictures -type f -printf '\n' | wc -l
To verify that the number of original and resized images sums up to the total, you can run:
Code:

TOTAL=`find Pictures -type f -printf '\n' | wc -l`
ORIGINALS=`find Pictures -type f -name '*[^Aa].[Jj][Pp][Gg]' -printf '\n' | wc -l`
RESIZED=`find Pictures -type f -name '*[Aa].[Jj][Pp][Gg]' -printf '\n' | wc -l`
echo "You have $ORIGINALS original images, $RESIZED resized images, and $((TOTAL-ORIGINALS-RESIZED)) other files, in Pictures folder."

You can find the total size of the original images, the resized images, and all files via
Code:

find Pictures -type f -name '*[^Aa].[Jj][Pp][Gg]' -printf '%s\n' | awk '{ s+=$1 } END { printf("%16.0f bytes of original images\n", s) }'
find Pictures -type f -name '*[Aa].[Jj][Pp][Gg]' -printf '%s\n' | awk '{ s+=$1 } END { printf("%16.0f bytes of resized images\n", s) }'
find Pictures -type f -printf '%s\n' | awk '{ s+=$1 } END { printf("%16.0f bytes total\n", s) }'

(There is also the du command you can use, see man du.)

I personally don't trust Windows further than I can throw it, at least not the status bar or info sidebar stuff. If I remember correctly, the Properties dialog (from the context menu) tells you both the total number of bytes and the number of bytes required to store the files on-disk, though.

grail 04-11-2011 10:37 PM

Also, you could use -iname to ignore case in your find.
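
A minimal sketch of that idea, assuming a find that supports -iname (GNU and BSD find do; it is not part of POSIX). The sample files and the demo, resized and originals variables are invented for the example. Negating the second -iname selects the originals without needing a bracket expression.

```shell
# Scratch demo with mixed-case sample names.
demo=$(mktemp -d)
touch "$demo/1.jpg" "$demo/1a.jpg" "$demo/2.JPG" "$demo/2A.JPG" "$demo/readme.txt"

# Resized copies: name ends in a.jpg, in any mix of upper and lower case.
resized=$(find "$demo" -type f -iname '*a.jpg' | wc -l)

# Originals: any .jpg that is not one of the resized copies.
originals=$(find "$demo" -type f -iname '*.jpg' ! -iname '*a.jpg' | wc -l)

echo "resized: $resized, originals: $originals"
```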

