LinuxQuestions.org


Khandi 09-26-2012 12:12 PM

Copy files based on filename
 
Hi,

I'm new to these forums, so first of all I want to apologize for my English, and secondly I want to apologize if I am posting this question in the wrong forum. I am fairly new to Linux and am looking for ways to make my life more convenient.

The problem I am currently facing is the following:

I want to automatically sort files based on filename. The format of these names is <identifier>_<date>.xlsx

What I want to achieve is a script that copies the most recent file for each identifier to a specific folder. So for example I have the following files:

12345_20120901.xlsx
12345_20120902.xlsx
54321_20120901.xlsx
54321_20120902.xlsx

What I want is to have the files:
12345_20120902.xlsx
54321_20120902.xlsx

to be placed in a different folder, say /mnt/khandi/stuff/

What is the best way to achieve this with a bash script?

And again, I am very sorry if I post this in the wrong subforum and/or if it is too much to ask.

Thanks in advance for the help, it is really appreciated.

Khandi

zwitterion-241920 09-26-2012 02:07 PM

Welcome to LQ!
If English is not your native language, I accept your apologies. We can't expect everyone to be good at English, but do try to write correct English. :)
Accidentally posting in the wrong forum isn't a huge issue; a moderator will move your thread. (I think this is the right forum.)

I don't know how to have bash find the most recent date, but if you want to do this for today's date, you could do:
Code:

mv *$(date +%Y%m%d).xlsx /mnt/khandi/stuff
When bash reads this line, it first executes the commands between $( and ) and replaces these with their output.

Code:

date +%Y%m%d
tells the computer to output the date in year-month-day format with a four-digit year (a lowercase %y would give only two digits), so the command then becomes

Code:

mv *20120926.xlsx /mnt/khandi/stuff
The asterisk character is a wildcard that matches any number of any characters, so the shell replaces the pattern with the names of all matching files and the command then expands to:

Code:

mv 12345_20120926.xlsx 54321_20120926.xlsx /mnt/khandi/stuff
mv is the command to move files, from one place to another, so 12345_20120926.xlsx and 54321_20120926.xlsx are moved into /mnt/khandi/stuff.
Use cp instead of mv if you want to copy the files.
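To restrict the same idea to one identifier, the identifier can take the place of the leading asterisk. A minimal sketch, assuming the <identifier>_<date>.xlsx naming from the first post; the temporary directories, the sample files and the id variable are placeholders so the snippet can be tried without touching real data:

```bash
#!/usr/bin/env bash
set -eu

src=$(mktemp -d)    # placeholder for the directory with the spreadsheets
dest=$(mktemp -d)   # placeholder for /mnt/khandi/stuff

today=$(date +%Y%m%d)   # e.g. 20120926
id=12345                # hypothetical identifier to select

# Sample files: two dated today, one older.
touch "$src/12345_${today}.xlsx" "$src/54321_${today}.xlsx" "$src/12345_20120901.xlsx"

# Only names starting with this identifier and today's date match the pattern.
cp -- "$src/${id}_${today}"*.xlsx "$dest/"

ls "$dest"
```

With real data, src would be the directory that holds the spreadsheets and dest would be /mnt/khandi/stuff.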

Khandi 09-26-2012 02:42 PM

Menno,

Thank you very much for the help, I really appreciate it. I will look at your snippets tomorrow to see if it will work for what I have in mind. It does seem to be able to get me where I want to be, in a way.

Although as I see it, because it uses a wildcard for the identifier, it does not work on a specific identifier, right? For example, I would very much like to move the most recent file for a certain identifier based on the date (and if possible even the time; for example, 12345_201209071345_yadada.xlsx is also a valid filename). Sorry if I misrepresented the problem by limiting it to the date only and not mentioning the time as well.

In that line of thought, I was wondering: since the date in the filename is built up as <year><month><day><time>, the most recent file should always have the highest number (if I ignore the first 6 characters of the filename, a.k.a. <identifier_>). Do you perhaps know a way to pick out the file with the highest numerical value for each identifier? So identifier A has several files and I want to move/copy the most recent one, and I want the same to happen for identifier B.

I don't want to bother you too much since you have already been of great help, but nevertheless I wanted to ask you this in the hope that you know of a way for me to get closer to a solution to my problem.

Thanks for the help!

zwitterion-241920 09-26-2012 02:51 PM

I'm not a big expert on shell scripting, but this algorithm might work:

Code:

1: read filename
2: remove filename extension
3: read last 8 characters (the date) and store them in a variable, let's call it HIGHESTNUMBER
4: execute steps 1 and 2 for the next file, but only update HIGHESTNUMBER if those eight characters are greater than HIGHESTNUMBER
5: do step 4 until there are no more files left.
mv *$HIGHESTNUMBER.xlsx /mnt/khandi/stuff

We'll have to find a way to translate steps 1 through 5 to bash (or another language) and then it should work.
You might want to read the Advanced Bash Scripting Guide at tldp.org and the various tutorials at the UNIX grymoire.
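Steps 1 through 5 could be translated to bash roughly as follows. This is only a sketch, assuming bash 4 or later (for associative arrays) and names of the form <identifier>_<date>.xlsx; unlike the steps above it keeps one highest date per identifier instead of a single HIGHESTNUMBER, and the temporary directories and sample files are placeholders:

```bash
#!/usr/bin/env bash
set -eu

src=$(mktemp -d)    # placeholder for the directory with the spreadsheets
dest=$(mktemp -d)   # placeholder for /mnt/khandi/stuff

# Sample files following the naming pattern from the thread.
touch "$src/12345_20120901.xlsx" "$src/12345_20120902.xlsx" \
      "$src/54321_20120901.xlsx" "$src/54321_20120902.xlsx"

declare -A newest   # identifier -> file name with the highest date so far

for path in "$src"/*_*.xlsx; do
    name=${path##*/}          # strip the directory part
    id=${name%%_*}            # text before the first underscore
    stamp=${name#*_}          # text after the first underscore
    stamp=${stamp%%[!0-9]*}   # keep only the leading digits (date, maybe time)

    best=${newest[$id]:-}     # empty if this identifier has not been seen yet
    best_stamp=${best#*_}
    best_stamp=${best_stamp%%[!0-9]*}

    # Digit strings of equal length sort like the numbers they spell.
    if [[ -z $best || $stamp > $best_stamp ]]; then
        newest[$id]=$name
    fi
done

for id in "${!newest[@]}"; do
    cp -- "$src/${newest[$id]}" "$dest/"
done

ls "$dest"
```

Because the comparison is done on strings, it relies on every file of one identifier using the same date format.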

Khandi 09-26-2012 02:59 PM

Menno,

Thank you for the fresh perspective on the matter. It does seem I have quite some work ahead of me to figure this one out. But the pointers you gave me are really helpful. Tomorrow I have time to delve into the matter again and will most certainly give your train of thought a chance, to see if I can capture your steps in some sane scripting :D. Thanks for the links to those guides; they will most likely prove to be very useful. Whenever I have found a solution to my problem I will most certainly share it here for all to see so we can all share our knowledge a bit :)

Thanks, again, for your help!

zwitterion-241920 09-26-2012 03:51 PM

I think I've found something:
Code:

mv $(ls -t | grep '\.xlsx' | head -n 1) /mnt/khandi/stuff
In UNIX-like operating systems, every file has a timestamp that tells you when it was last modified. The command
Code:

ls -t
prints all files in the directory, the most recently modified file first. The '|' character makes the output of one command the input of the other.
The output of ls goes to grep, which only prints the lines that match a given pattern (.xlsx in this case), and then to head, which only prints the number of lines specified after -n. The stuff between $( and ) thus gives the name of the most recently modified .xlsx file or (for a greater -n value) files, which mv then moves to /mnt/khandi/stuff.
This also makes it redundant to put the date in the filename.

I've tested this in bash and sh, but I'm not sure if it works in csh, zsh, ksh or tcsh.

UPDATE: it also works in zsh
UPDATE: typo corrected, "ls -l" should have been "ls -t"
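Since an unquoted $(ls ...) gets split on whitespace, the pipeline breaks on file names that contain spaces. The same selection can also be sketched with a plain loop and the -nt ("newer than") test that bash and zsh provide. The temporary directories and sample files below are placeholders:

```bash
#!/usr/bin/env bash
set -eu

src=$(mktemp -d)    # placeholder for the directory with the spreadsheets
dest=$(mktemp -d)   # placeholder for /mnt/khandi/stuff

# Two sample files with explicit modification times (touch -t CCYYMMDDhhmm).
touch -t 201209011200 "$src/old report.xlsx"
touch -t 201209021200 "$src/new report.xlsx"

latest=
for f in "$src"/*.xlsx; do
    # -nt is true when the left file was modified more recently than the right.
    if [[ -z $latest || $f -nt $latest ]]; then
        latest=$f
    fi
done

mv -- "$latest" "$dest/"
ls "$dest"
```

This only ever moves one file; for the "greater -n value" case the ls pipeline is still the shorter route.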

Khandi 09-26-2012 04:31 PM

That is so awesome!! I will try this first thing tomorrow morning and let you know if it works out! Thanks a lot! Really helpful! Great! :D

pingu 09-29-2012 02:47 AM

Quote:

Originally Posted by mennohellinga (Post 4790122)
The command
ls -l
prints all files in the directory, the most recently modified file first.

You need to add a 't' to get the output sorted - "ls -lt".
This might differ between different versions of bash, at least on my Debian Lenny and OpenSuse 11 you need the 't'.

zwitterion-241920 09-29-2012 05:30 AM

Quote:

Originally Posted by pingu (Post 4792243)
You need to add a 't' to get the output sorted - "ls -lt".
This might differ between different versions of bash, at least on my Debian Lenny and OpenSuse 11 you need the 't'.

Actually, I need "ls -t". I've corrected the typo.
The -l option would output all sorts of other information that I would have to sed out, while ls -t already puts each file name on its own line when its output goes into a pipe.

I use
Code:

GNU bash, version 4.2.37(2)-release (x86_64-unknown-linux-gnu)
grep (GNU grep) 2.14

but I don't think the bash version is an issue because it also works in zsh 5.0.0 (x86_64-unknown-linux-gnu).

Wim Sturkenboom 09-29-2012 12:25 PM

There is one risk in using 'ls -t' for this problem. If the file has been 'touched' in some way or is not created on the date that is in the filename, the file's date does not reflect the datepart of the filename.

Code:

wim@i3-2120:~$ touch abc_19700101.xlsx
wim@i3-2120:~$ ls -l abc_19700101.xlsx
-rw-rw-r-- 1 wim wim 0 Sep 29 19:17 abc_19700101.xlsx
wim@i3-2120:~$

Where will the file go? And where should it go? ;)

pingu 09-29-2012 12:50 PM

Ah, yes - good point!
Reading OP's first post again, I'm not quite sure about the needs.
So, mr OP Khandi: what exactly is it you want to do?
* Put all files with same date in the filename in specific folder
* Put files created / changed on same date in specific folder
* Put the most recent files in specific folder

zwitterion-241920 09-29-2012 02:46 PM

Quote:

Originally Posted by Wim Sturkenboom (Post 4792630)
There is one risk in using 'ls -t' for this problem. If the file has been 'touched' in some way or is not created on the date that is in the filename, the file's date does not reflect the datepart of the filename.

Code:

wim@i3-2120:~$ touch abc_19700101.xlsx
wim@i3-2120:~$ ls -l abc_19700101.xlsx
-rw-rw-r-- 1 wim wim 0 Sep 29 19:17 abc_19700101.xlsx
wim@i3-2120:~$

Where will the file go? And where should it go? ;)

I think OP implements some sort of primitive version control system by clicking 'save as' every time he saves a modification. Also, I don't think most people will use any program other than OpenOffice/LibreOffice calc on .xlsx files.

Which made me realize this whole discussion is pointless: he needs a version control system (I can recommend Mercurial; see this slashdot thread or the Wikipedia article) and then he just executes
Code:

mv * /mnt/khandi/stuff
to move the most recent version to the mounted storage device.

Khandi 10-02-2012 08:17 AM

Again, thanks everyone for the kind advice. I will look into Mercurial to see if it has what I need.

Pingu:

So, mr OP Khandi: what exactly is it you want to do?
* Put all files with same date in the filename in specific folder
* Put files created / changed on same date in specific folder
* Put the most recent files in specific folder

What I want is to move the most recent file per identifier into a different folder.

12345_20120901.xlsx
12345_20120902.xlsx
54321_20120901.xlsx
54321_20120902.xlsx

12345 and 54321 being the identifiers. I do not have the possibility to change filenames. But they are not being modified at that location, so modification date is something I can work with. So I want the most recent file from 12345 AND the most recent one from 54321 moved to a single location. We are talking about roughly 800 different identifiers. That's why I want to be able to grab the most recent files in the most efficient way possible. The best thing would be to grab only the files modified between now and 48 hours in the past, to make sure everything stays tidy.
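The 48-hour window maps onto find's -mtime test, and the per-identifier choice can be made by sorting the names and keeping the last one for each identifier. A rough sketch, assuming names of the form <identifier>_<date>.xlsx without newlines in them and a find that supports -maxdepth (GNU and BSD find do); the temporary directories and sample files are placeholders:

```bash
#!/usr/bin/env bash
set -eu

src=$(mktemp -d)    # placeholder for the directory with the spreadsheets
dest=$(mktemp -d)   # placeholder for /mnt/khandi/stuff

# Three files modified just now and one whose modification time is long past.
touch "$src/12345_20120902.xlsx" "$src/12345_20120903.xlsx" "$src/54321_20120902.xlsx"
touch -t 201209010000 "$src/12345_20120901.xlsx"

# -mtime -2 keeps files modified less than 2*24 hours ago.
# sort puts each identifier's names in date order; awk remembers the last
# path seen for every identifier (the text before the first underscore).
find "$src" -maxdepth 1 -type f -name '*_*.xlsx' -mtime -2 |
    sort |
    awk -F/ '{ split($NF, part, "_"); last[part[1]] = $0 }
             END { for (id in last) print last[id] }' |
    while IFS= read -r path; do
        cp -- "$path" "$dest/"
    done

ls "$dest"
```

One pass of find, sort and awk handles all identifiers at once, which should be fine for roughly 800 of them.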

Thanks for all the help. Sorry I responded this late!

schneidz 10-02-2012 08:32 AM

Something like this might work:
Code:

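# List the unique identifiers (the part of each name before the first underscore)
ls -1 *_*.xlsx | cut -d_ -f1 | sort -u | while read -r id
do
 # ls sorts the names alphabetically, so the last one carries the latest date;
 # the braces keep the shell from reading "$id_" as a different variable
 cp "$(ls -1 "${id}"_*.xlsx | tail -n 1)" /mnt/khandi/stuff/
done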


Snark1994 10-03-2012 05:21 AM

I think what pingu was getting at was "How do you define 'most recent'?" Are we going by the modification date of the file (e.g. what you get by running 'ls -l filename') or the date in the name of the file?


All times are GMT -5.