merge files by creation/modification date?
This is my first post, so sorry if it's not in the right place, but I hope someone can help me.
I have a folder with hundreds of .txt files (logs of a Java application) that I have to merge into one single .txt file. This application produces a new log file every day:
and so on...
I could merge the files easily with "cat" and ">>"; however, the problem is that I have to take the date (creation or modification) of each file into account.
If I simply use the cat command, the output file will receive, for example, all Fridays in a row, then all Saturdays, etc., so the date is not considered at all.
I've looked at the options of the find command, since the files are not modified after creation. I tried this, for example:
$ find . -newer <some old file>
but that just lists all files newer than <old file>; it does not sort them by date.
Is it possible to do this?
Thanks in advance.
Hi, welcome to LQ!
This is a fine place for your question, though if it turns into a programming contest, we might move it to /Programming. ;)
Meanwhile - I don't fully understand the question; precisely how you want the logs grouped (selected) by date is what I don't understand. Can you show, using some `ls -l` of a dozen or so of these files, exactly how you want them selected for merge?
It should be a not too monumental task, once we understand the exact requirement.
Anyway, here is an example:
$ ll -a
drwxr-xr-x 2 andre andre 4096 2010-10-02 18:13 ./
drwxr-xr-x 5 andre andre 4096 2010-10-02 18:13 ../
-rw-r--r-- 1 andre andre 19672 2010-09-11 00:51 AM-usa-cmu-1mb-1000-Fri10Sep2010-18h21m06s.txt
-rw-r--r-- 1 andre andre 83749 2010-09-18 00:59 AM-usa-cmu-1mb-1000-Fri17Sep2010-00h08m11s.txt
-rw-r--r-- 1 andre andre 21976 2010-09-25 00:58 AM-usa-cmu-1mb-1000-Fri24Sep2010-17h47m35s.txt
-rw-r--r-- 1 andre andre 83433 2010-09-14 00:52 AM-usa-cmu-1mb-1000-Mon13Sep2010-00h02m00s.txt
-rw-r--r-- 1 andre andre 20946 2010-09-28 01:21 AM-usa-cmu-1mb-1000-Mon27Sep2010-00h06m01s.txt
-rw-r--r-- 1 andre andre 83727 2010-09-12 00:52 AM-usa-cmu-1mb-1000-Sat11Sep2010-00h01m09s.txt
-rw-r--r-- 1 andre andre 83801 2010-09-19 00:59 AM-usa-cmu-1mb-1000-Sat18Sep2010-00h08m31s.txt
-rw-r--r-- 1 andre andre 84059 2010-09-26 00:58 AM-usa-cmu-1mb-1000-Sat25Sep2010-00h07m42s.txt
-rw-r--r-- 1 andre andre 83627 2010-09-13 00:52 AM-usa-cmu-1mb-1000-Sun12Sep2010-00h01m32s.txt
-rw-r--r-- 1 andre andre 13419 2010-09-19 04:49 AM-usa-cmu-1mb-1000-Sun19Sep2010-00h08m51s.txt
-rw-r--r-- 1 andre andre 61991 2010-09-27 01:05 AM-usa-cmu-1mb-1000-Sun26Sep2010-00h08m12s.txt
-rw-r--r-- 1 andre andre 5816 2010-09-16 02:33 AM-usa-cmu-1mb-1000-Thu16Sep2010-00h02m48s.txt
-rw-r--r-- 1 andre andre 46407 2010-09-17 00:58 AM-usa-cmu-1mb-1000-Thu16Sep2010-10h47m47s.txt
-rw-r--r-- 1 andre andre 16116 2010-09-30 17:19 AM-usa-cmu-1mb-1000-Thu30Sep2010-00h08m29s.txt
-rw-r--r-- 1 andre andre 83771 2010-09-15 00:53 AM-usa-cmu-1mb-1000-Tue14Sep2010-00h02m16s.txt
-rw-r--r-- 1 andre andre 25133 2010-09-29 01:43 AM-usa-cmu-1mb-1000-Tue28Sep2010-00h21m22s.txt
-rw-r--r-- 1 andre andre 83452 2010-09-16 00:53 AM-usa-cmu-1mb-1000-Wed15Sep2010-00h02m33s.txt
-rw-r--r-- 1 andre andre 21341 2010-09-30 01:08 AM-usa-cmu-1mb-1000-Wed29Sep2010-00h43m11s.txt
Now I want some file.txt that has the contents of:
1st - the contents of the oldest file (AM-usa-cmu-1mb-1000-Fri10Sep2010-18h21m06s.txt)
2nd - the contents of the 2nd oldest file (AM-usa-cmu-1mb-1000-Sat11Sep2010-00h01m09s.txt)
3rd - the contents of the 3rd oldest file ...
If I do for instance
$ cat *.txt >> file.txt
I get all messed up inside file.txt!
I would like to know if it's possible to take advantage of the creation date of the files and make an output file with "cat" or something similar, taking as input all the files sorted by date.
Is it clearer now?
If you or someone else has an idea for how to do this, it would be wonderful, since I'm copy-pasting the files manually right now!
Yes, more clear now.
Inspired by this post by grail I have an idea.
1) Find all files (except hidden ones) in current dir (no deeper), and show their date+time data:
find . -maxdepth 1 ! -name "\.*" -type f -printf "%T+ %p\n"
2) Pipe it through sort to order the lines by the date+time data (the %T+ format sorts chronologically even as plain text)
3) Use sed to remove all the date junk from the start of each filename
4) It's all in a $(), so cat concatenates every file in the list, in order, into one giant file.
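Put together, the four steps might look like the sketch below. It assumes GNU find, sort and sed; the function names, the sed expression and the output name BIGFILE are illustrative placeholders, not necessarily the exact command used in this thread.

```shell
#!/usr/bin/env bash
# Sketch only: assumes GNU find (-printf), sort and sed.
# list_oldest_first and merge_oldest_first are hypothetical helper names.

# Steps 1-3: list regular, non-hidden files in the current directory,
# oldest modification time first, then strip the timestamp again.
list_oldest_first() {
    find . -maxdepth 1 ! -name "\.*" -type f -printf "%T+ %p\n" |
        sort |
        sed 's/^[^ ]* //'
}

# Step 4: hand the sorted list to cat and append to the output file ($1).
# The unquoted $() relies on word splitting, so names containing
# whitespace will break this version.
merge_oldest_first() {
    cat $(list_oldest_first) >> "$1"
}
```

To preview the order, run `list_oldest_first` on its own; once it looks right, `merge_oldest_first BIGFILE` does the merge.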
No warranty included, but give that a go. Note that once you have run the thing, the output file exists; if you then want to adjust something and run it again, delete the output file first, or cat will complain with an error along the lines of "input file is output file".
Seems to work for me, but to make sure for yourself that the sorting order is good, run the `find` command piped into the `sort` first; if the sort order looks good, add the cat and the $() and the >> BIGFILE.
P.S. I did not test it on filenames with spaces in them, so if this poses a problem (and you detect it) do tell - we'd need a code change if there's a problem.
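If filenames with spaces do turn out to be a problem, one possible code change is to pass NUL-terminated records through the pipeline instead of relying on word splitting. This is only a sketch: it assumes bash, GNU find and GNU sort (for `-z`), and `merge_by_mtime_safe` is a made-up helper name.

```shell
#!/usr/bin/env bash
# Whitespace-safe sketch: NUL-delimited records instead of word splitting.
# Assumes bash (read -d ''), GNU find (-printf) and GNU sort (-z).
# merge_by_mtime_safe is a hypothetical helper name.
# Usage: merge_by_mtime_safe DIR OUTFILE   (keep OUTFILE outside DIR)
merge_by_mtime_safe() {
    local dir=$1 out=$2 entry
    # %T@ prints the modification time as seconds since the epoch,
    # which sort -n can order numerically.
    find "$dir" -maxdepth 1 -type f ! -name '.*' -printf '%T@ %p\0' |
        sort -z -n |
        while IFS= read -r -d '' entry; do
            # Drop the leading "seconds " field and keep the path intact,
            # including any spaces it contains.
            cat -- "${entry#* }"
        done >> "$out"
}
```

The output file should live outside the directory being merged; otherwise find may pick it up as one of the inputs.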
P.P.S - I removed the /g from the sed command - make sure you do the same.
to me, officially, you are a genius :)
you have absolutely no idea of the effort and hours of work you're saving me from!
I tried the find first and it sorted the files perfectly.
With the cat, the output is just perfect :)
Again, thank you :)
PS: Is there a need to close this thread or mark the subject as closed/solved?
No, not a genius, but thank you anyway. :)
I'm glad that works for you.
And yes, if you're satisfied with the solution to the problem, you can mark the thread [SOLVED] using the Thread Tools menu atop the first post. It'll help people who are searching for solved threads specifically.