Clean log bash script?
I've got a very ugly series of pidgin logs, so I set out to write a bash script that runs through the output of ls, checks whether there is more than one log for a given day, and if so, concatenates them all onto the first one for that date. The name format of the logs runs as year-month-day.html.
I'm drawing a blank, though, and can't come up with how this would work. Any help or pseudocode would be greatly appreciated. |
Maybe you can just use the logrotate command, or create a custom log script in /etc/logrotate.d/pidgin
I'm just shooting out random thoughts here; they should be better than nothing. :( Code:
#!/bin/sh |
The logs are html webpages? It may be better to use tar to archive them. An HTML file contains a header and tags, so concatenating a number of them together wouldn't produce very useful output. Your description of the filename format can only match a single log file per day in the same directory, so merging logs from the same day conflicts with the format you gave unless you have a number of them, each in its own subdirectory.
You could use the find command to select the log files matching the filename pattern that are over 24 hours old (using -iname or -iregex and -ctime). The output of the find command could be used as the source of filenames for the tar command. Code:
find ~/.pidgin/logs/ -iregex ".*/[12][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9].*\.html" -ctime +1 | xargs tar cf pidginlog-$(date +'%F').tar Also, where are these logs? Are they logs of conversations located in your home directory? If the html logs are xhtml compliant, an xslt translation could extract the raw data from a number of files, which could be reconstituted into a single file. You could possibly do the same thing using sed, but you would need to examine the source of the html files to determine how to extract the information you want saved. When I was first responding to the post, I didn't realize that pidgin was the new name of gaim and thought that you meant pigeon as an adjective meaning a mess.:o |
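For the sed route mentioned above, here is a minimal sketch. The sample log content and the assumption that the useful text sits between <body> and </body> are hypothetical; inspect a real pidgin log before relying on these expressions. Code:

```shell
#!/bin/sh
# Hypothetical sample standing in for one day's log; a real pidgin
# log will have different markup, so adjust the expressions to match.
cat > sample.html <<'EOF'
<html><head><title>chat</title></head>
<body>
<p>(10:01) alice: hello</p>
<p>(10:02) bob: hi there</p>
</body></html>
EOF

# Print only the lines between <body> and </body>, then strip the
# remaining tags and drop any blank lines left behind.
sed -n '/<body[^>]*>/,/<\/body>/p' sample.html \
  | sed -e 's/<[^>]*>//g' -e '/^[[:space:]]*$/d'
# prints:
# (10:01) alice: hello
# (10:02) bob: hi there
```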
I hadn't thought about the HTML tags. Since I don't really have a choice in the existing ones being in HTML, the script would have to be something like:
## Some kind of conditional to find whether there's more than one log for a given day
## Remove the final line from the first file and the first line from every other one that matches that day
## Append the files to the first one for that date in descending order
Is there a way to use find, locate or something similar to return the number of results rather than the results themselves? The only other way I could come up with would be to write a bunch of conditionals centered around something like this: Code:
find . -name "$Year-$Month-$Day*.html" >> temp |
You have the right idea.
find /dir -iname "$Year-$Month-$Day*.html" | wc -l However, there may be more tags to deal with. Simply stripping off the first and last line may not do it. You can use sed to delete the first and last lines: sed '1d;$d' file >temp test: Code:
echo -e "line1\nline2\nline3\nline4" | sed '1d;$d' Here is a simple script I wrote to produce very simple html pages from txt files. I put it in my ~/bin/ directory. Code:
#!/bin/bash Particular tags may identify the information you want to extract. I do this to extract the file names I have backed up using the K3b program. The *.k3b project file is actually a zip file containing XML files. It is a simple matter to use sed in a one-liner (although a long one, with all of the pipes "|") to extract only the file names, convert \n to \000, and then pipe the file list to "| xargs -0 rm". In both of these examples, you need to study the format that you are working with. Especially when working with regular expressions, you need to be careful and exact. |
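Pulling the pieces of this thread together, here is a rough sketch of the merge itself. The directory layout (one subdirectory per contact), the sample file contents, and the idea that sed '1d;$d' is enough to trim the HTML framing are all assumptions for the demo; check them against your real logs before running anything like this on them. Code:

```shell
#!/bin/bash
# Sketch: merge multiple same-day logs onto the first one for that date.
# Builds a throwaway directory with two hypothetical same-day logs so
# the demo is self-contained.
logdir=$(mktemp -d)
mkdir -p "$logdir/alice" "$logdir/bob"
printf '<html><body>\nmorning chat\n</body></html>\n' > "$logdir/alice/2010-01-15.html"
printf '<html><body>\nevening chat\n</body></html>\n' > "$logdir/bob/2010-01-15.html"

# List each date that appears, once.
find "$logdir" -name '[12][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9].html' \
    -printf '%f\n' | sort -u \
  | while read -r name; do
      day=${name%.html}
      files=$(find "$logdir" -name "$day.html" | sort)
      # Only act when more than one log exists for that day.
      [ "$(printf '%s\n' "$files" | wc -l)" -gt 1 ] || continue
      first=$(printf '%s\n' "$files" | head -n 1)
      sed -i '$d' "$first"                      # drop first file's closing line
      printf '%s\n' "$files" | tail -n +2 | while read -r f; do
          sed '1d;$d' "$f" >> "$first"          # append the body only
          rm "$f"
      done
      echo '</body></html>' >> "$first"         # restore a closing line
  done

cat "$logdir"/alice/2010-01-15.html
```

Note that sed -i and find -printf are GNU extensions, and the closing line echoed back is only a guess at what a real log ends with.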