LinuxQuestions.org

andre.fm 10-04-2010 04:44 PM

merge files by creation/modification date?
 
Hi,

This is my first post, so sorry if it's not in the right place, but I hope someone can help me.


I have a folder with hundreds of .txt files (logs of some Java application) that I have to merge into one single .txt file. This application produces a new log file every day:
day1: logFriday10September2010.txt
day2: logSaturday11September2010.txt
...
day8: logFriday17September2010.txt
...
and so on...

I could merge the files easily with "cat" and ">>". However, the problem is that I have to take the date (creation or modification) of each file into account.

If I simply use the cat command, the output file will receive, for example, all Fridays in a row, then all Saturdays, etc., and that way the date is not considered.

I've looked through the options of the find command. Since the files are not modified after creation, I tried this, for example:
$ find . -newer <some old file>
but that lists all files newer than <old file>, not the files in date order.

Is it possible to do this?

Thanks in advance.

GrapefruiTgirl 10-04-2010 05:18 PM

Hi, welcome to LQ!

This is a fine place for your question, though if it turns into a programming contest, we might move it to /Programming. ;)

Meanwhile - I don't fully understand the question; precisely how you want the logs grouped (selected) by date is what I don't understand. Can you show, using some `ls -l` of a dozen or so of these files, exactly how you want them selected for merge?

It shouldn't be too monumental a task, once we understand the exact requirement.

Thanks!

andre.fm 10-04-2010 05:31 PM

Quote:

Originally Posted by GrapefruiTgirl (Post 4117682)
Meanwhile - I don't fully understand the question; precisely how you want the logs grouped (selected) by date is what I don't understand. Can you show, using some `ls -l` of a dozen or so of these files, exactly how you want them selected for merge?

You're right, I've read my post again and it's confusing even for me :)

Anyway, here is an example:

$ ll -a
total 952
drwxr-xr-x 2 andre andre 4096 2010-10-02 18:13 ./
drwxr-xr-x 5 andre andre 4096 2010-10-02 18:13 ../
-rw-r--r-- 1 andre andre 19672 2010-09-11 00:51 AM-usa-cmu-1mb-1000-Fri10Sep2010-18h21m06s.txt
-rw-r--r-- 1 andre andre 83749 2010-09-18 00:59 AM-usa-cmu-1mb-1000-Fri17Sep2010-00h08m11s.txt
-rw-r--r-- 1 andre andre 21976 2010-09-25 00:58 AM-usa-cmu-1mb-1000-Fri24Sep2010-17h47m35s.txt
-rw-r--r-- 1 andre andre 83433 2010-09-14 00:52 AM-usa-cmu-1mb-1000-Mon13Sep2010-00h02m00s.txt
-rw-r--r-- 1 andre andre 20946 2010-09-28 01:21 AM-usa-cmu-1mb-1000-Mon27Sep2010-00h06m01s.txt
-rw-r--r-- 1 andre andre 83727 2010-09-12 00:52 AM-usa-cmu-1mb-1000-Sat11Sep2010-00h01m09s.txt
-rw-r--r-- 1 andre andre 83801 2010-09-19 00:59 AM-usa-cmu-1mb-1000-Sat18Sep2010-00h08m31s.txt
-rw-r--r-- 1 andre andre 84059 2010-09-26 00:58 AM-usa-cmu-1mb-1000-Sat25Sep2010-00h07m42s.txt
-rw-r--r-- 1 andre andre 83627 2010-09-13 00:52 AM-usa-cmu-1mb-1000-Sun12Sep2010-00h01m32s.txt
-rw-r--r-- 1 andre andre 13419 2010-09-19 04:49 AM-usa-cmu-1mb-1000-Sun19Sep2010-00h08m51s.txt
-rw-r--r-- 1 andre andre 61991 2010-09-27 01:05 AM-usa-cmu-1mb-1000-Sun26Sep2010-00h08m12s.txt
-rw-r--r-- 1 andre andre 5816 2010-09-16 02:33 AM-usa-cmu-1mb-1000-Thu16Sep2010-00h02m48s.txt
-rw-r--r-- 1 andre andre 46407 2010-09-17 00:58 AM-usa-cmu-1mb-1000-Thu16Sep2010-10h47m47s.txt
-rw-r--r-- 1 andre andre 16116 2010-09-30 17:19 AM-usa-cmu-1mb-1000-Thu30Sep2010-00h08m29s.txt
-rw-r--r-- 1 andre andre 83771 2010-09-15 00:53 AM-usa-cmu-1mb-1000-Tue14Sep2010-00h02m16s.txt
-rw-r--r-- 1 andre andre 25133 2010-09-29 01:43 AM-usa-cmu-1mb-1000-Tue28Sep2010-00h21m22s.txt
-rw-r--r-- 1 andre andre 83452 2010-09-16 00:53 AM-usa-cmu-1mb-1000-Wed15Sep2010-00h02m33s.txt
-rw-r--r-- 1 andre andre 21341 2010-09-30 01:08 AM-usa-cmu-1mb-1000-Wed29Sep2010-00h43m11s.txt


Now I want some file.txt that has the contents of:
1st - the contents of the oldest file (AM-usa-cmu-1mb-1000-Fri10Sep2010-18h21m06s.txt)
2nd - the contents of the 2nd oldest file (AM-usa-cmu-1mb-1000-Sat11Sep2010-00h01m09s.txt)
3rd - the contents of the 3rd oldest file ...

If I do for instance
$ cat *.txt >> file.txt
I get all messed up inside file.txt!

I would like to know if it's possible to take advantage of the creation date of the files and produce an output file with "cat" or something similar, with all the input files sorted by date.

Is it clearer now?

If you or someone else has an idea for how to do this, it would be wonderful, since I'm copy-pasting the files manually right now!
Thank you.

GrapefruiTgirl 10-04-2010 05:53 PM

Yes, more clear now.

Inspired by this post by grail I have an idea.

Code:

cat $(find .  -maxdepth 1 ! -name "\.*" -type f -printf "%T+ %p\n" | sort | sed 's/^.* //') >> BIGFILE

To break it down:

1) Find all files (except hidden ones) in current dir (no deeper), and show their date+time data:
find . -maxdepth 1 ! -name "\.*" -type f -printf "%T+ %p\n"

2) pipe it through sort to order the lines by their date+time prefix (the %T+ format is year-month-day+time, so a plain text sort puts the oldest first)

3) sed to remove all the date junk from the start of each filename

4) It's all in a $() so cat receives the whole sorted list as its arguments and concatenates each file, in order, into one giant file.
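
The four steps above can be tried safely in a scratch directory first. This is only a minimal sketch, assuming GNU find and GNU touch; the three file names and the `demo` and `out` variables are made up for illustration. It uses `[^ ]*` in sed rather than `.*`, so only the leading timestamp is stripped.

```shell
#!/bin/bash
# Scratch directory and output file (hypothetical names, for illustration only).
demo=$(mktemp -d)
out=$(mktemp)
cd "$demo" || exit 1

# Three small logs whose alphabetical order differs from their date order.
printf 'first\n'  > log-Fri10Sep2010.txt
printf 'second\n' > log-Sat11Sep2010.txt
printf 'third\n'  > log-Mon13Sep2010.txt

# Set each modification time to match the date in the name (GNU touch -d).
touch -d '2010-09-10 18:00' log-Fri10Sep2010.txt
touch -d '2010-09-11 18:00' log-Sat11Sep2010.txt
touch -d '2010-09-13 18:00' log-Mon13Sep2010.txt

# Steps 1-3: print "timestamp name", sort, then strip the timestamp.
ordered=$(find . -maxdepth 1 ! -name ".*" -type f -printf "%T+ %p\n" | sort | sed 's/^[^ ]* //')
echo "$ordered"

# Step 4: concatenate in that order.
# $ordered is deliberately unquoted so the shell splits it into file names;
# this is exactly why names containing spaces would break here.
cat $ordered > "$out"
cat "$out"
```

A plain `cat *.txt` would have put the Monday file before the Saturday one, because the glob expands alphabetically.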

No warranty included, but give that a go. Note that once you have run the thing, the output file exists in the same directory, so `find` will pick it up on the next run. If you then want to adjust something and try again, delete the output file first, or cat will complain that the "input file is output file".
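
One way to avoid having to delete the output file by hand is to tell find to skip it. The following is a sketch under assumptions, not a tested replacement: it assumes GNU find, the `merge` function and the two sample files are made up, and it writes with `>` instead of `>>` so a rerun overwrites the result rather than appending to it.

```shell
#!/bin/bash
# Scratch directory with two hypothetical input files.
demo=$(mktemp -d)
cd "$demo" || exit 1
printf 'a\n' > one.txt
printf 'b\n' > two.txt
touch -d '2010-09-10 12:00' one.txt
touch -d '2010-09-11 12:00' two.txt

merge() {
    # "! -name BIGFILE" keeps the output file out of the list of inputs,
    # so cat never sees its own output as an argument.
    cat $(find . -maxdepth 1 ! -name ".*" ! -name BIGFILE -type f \
              -printf "%T+ %p\n" | sort | sed 's/^[^ ]* //') > BIGFILE
}

merge
merge   # second run: no complaint from cat, and no duplicated content
cat BIGFILE
```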

Seems to work for me, but to make sure for yourself that the sorting order is good, run the `find` command piped into the `sort` first; if the sort order looks good, add the cat and the $() and the >> BIGFILE.

P.S. I did not test it on filenames with spaces in them, so if this poses a problem (and you detect it) do tell - we'd need a code change if there's a problem.
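
If spaces do turn out to be a problem, a NUL-separated pipeline is one possible code change. This is a hedged sketch that assumes GNU find, sort, sed and xargs (for `-printf '\0'`, `sort -z`, `sed -z`, `xargs -0 -r`); the sample file names and the `demo` and `out` variables are invented for the example. It sorts on `%T@` (seconds since the epoch) instead of `%T+`.

```shell
#!/bin/bash
# Scratch directory and an output file outside it (hypothetical names).
demo=$(mktemp -d)
out=$(mktemp)
cd "$demo" || exit 1

# File names with spaces; alphabetical order is the reverse of date order.
printf 'older\n' > 'z log.txt'
printf 'newer\n' > 'a log.txt'
touch -d '2010-09-10 12:00' 'z log.txt'
touch -d '2010-09-11 12:00' 'a log.txt'

# Each record is "epoch-seconds name" terminated by NUL instead of newline,
# so spaces (and even newlines) inside names survive the whole pipeline.
find . -maxdepth 1 ! -name ".*" -type f -name "*.txt" -printf "%T@ %p\0" \
    | LC_ALL=C sort -z -n \
    | sed -z 's/^[^ ]* //' \
    | xargs -0 -r cat > "$out"

cat "$out"
```

Because xargs passes each name to cat as a single argument, no word splitting happens at any point.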

P.P.S - I removed the /g from the sed command - make sure you do the same.

Good luck!

andre.fm 10-04-2010 06:37 PM

OK,

to me, officially, you are a genius :)
you have absolutely no idea of the effort and hours of work you're saving me from!

I tried the find first and it sorted the files perfectly.
With the cat, the output is just perfect :)

Again, thank you :)


PS: Is there a need to close this thread or give the subject as closed/solved?

GrapefruiTgirl 10-04-2010 06:41 PM

No, not a genius, but thank you anyway. :)

I'm glad that works for you.

And yes, if you're satisfied with the solution to the problem, you can mark the thread [SOLVED] using the Thread Tools menu atop the first post. It'll help people who are searching for solved threads specifically.

Cheerios!


All times are GMT -5.