QuarQuar 10-25-2007 09:42 PM

Clean log bash script?
 
I've got a very ugly series of pidgin logs, so I set out to write a bash script that could run through the output of ls, check whether there is more than one log for a given day, and if there is, cat them all onto the first one for that date. The name format of the logs runs as year-month-day.html.

I'm kind of drawing a blank, though, and I can't work out how this should go. Any help or pseudocode would be greatly appreciated.

Micro420 10-25-2007 10:29 PM

Maybe you can just use the logrotate command, or create a custom log script in /etc/logrotate.d/pidgin
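
Something like this stanza in /etc/logrotate.d/pidgin might do it; the log path is only a guess on my part, so adjust it to wherever the logs actually live:

Code:

/var/log/pidgin-logs/*.html {
    weekly
    rotate 4
    compress
    missingok
    notifempty
}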

I'm just shooting out random thoughts here, but it should be better than nothing. :(
Code:

#!/bin/sh

LOGDIR=/var/log/pidgin-logs
# sed -n '1p' prints just the first line of the ls output --
# I think that's the option I was after
FIRST=$LOGDIR/`ls $LOGDIR | sed -n '1p'`

for LOGFILE in $LOGDIR/*
do
    [ "$LOGFILE" = "$FIRST" ] && continue  # don't append the target to itself
    cat "$LOGFILE" >> "$FIRST"
done


jschiwal 10-26-2007 12:44 AM

The logs are html webpages? It may be better to use tar to archive them. An HTML file contains a header and tags, so concatenating a number of them together wouldn't produce very useful output. Your description of the filename format can only match a single log file per day in the same directory, so merging logs from the same day conflicts with the format you gave, unless you have a number of them, each in its own subdirectory.

You could use the find command to select the log files that match the filename pattern and are over 24 hours old (using -iname or -iregex and -ctime). The output of the find command could be used as the source of filenames for the tar command.
Code:

find ~/.pidgin/logs/ -iregex ".*/[12][0-9][0-9][0-9]-[0-9][0-9]*-[0-9][0-9]*\.html" -ctime +1 | xargs tar cf pidginlog-$(date +'%F').tar
find ~/.pidgin/logs/ -iregex ".*/[12][0-9][0-9][0-9]-[0-9][0-9]*-[0-9][0-9]*\.html" -ctime +1 | xargs rm

I've guessed at the base location of the logs.

Also, where are these logs? Are they conversation logs located in your home directory?

If the html logs are XHTML compliant, an XSLT transformation could extract the raw data from a number of files, which could then be reconstituted into a single file. You could possibly do the same thing using sed, but you would need to examine the source of the html files to determine how to extract the information you want saved.
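
For the sed route, something along these lines might be a starting point. It's only a sketch, and it assumes the wrapper tags sit on lines of their own, which you'd have to verify against a real log:

Code:

# print everything between <body> and </body>, then strip the tags,
# leaving just the conversation text (assumes one tag per line)
sed -n '/<body/,/<\/body>/p' 2007-10-25.html | sed 's/<[^>]*>//g'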

When I first responded to the post, I didn't realize that pidgin was the new name of gaim, and thought that you meant pigeon as an adjective meaning a mess. :o

QuarQuar 10-27-2007 11:02 PM

I hadn't thought about the HTML tags. Since I don't really have a choice about the existing ones being in HTML, the script would have to do something like:

## Some kind of conditional to find if there's more than one for a given matching day
## Remove the final line from the first file and the first line from every other one that matches that day
## Append files to the first one for that date in descending order

Is there a way to use find, locate or something similar to return the number of results rather than the results themselves? The only other way I could come up with would be to write a bunch of conditionals centered around something like this:

Code:

find . -name "$Year-$Month-$Day*.html" > temp
wc -l < temp


jschiwal 10-27-2007 11:46 PM

You have the right idea. You can skip the temp file and pipe straight into wc:
find /dir -iname "$Year-$Month-$Day*.html" | wc -l

However, there may be more tags to deal with. Simply stripping off the first and last lines may not do it.
You can use sed to delete the first and last line:
sed '1d;$d' file > temp

test:
Code:

echo -e "line1\nline2\nline3\nline4" | cat -n
echo -e "line1\nline2\nline3\nline4" | sed '1d;$d'
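
If the logs really are that simple, the whole merge could be sketched like this. I'm assuming the names start with YYYY-MM-DD, everything sits in one directory, and only the first and last line of each file carry the html wrapper, so test it on copies first:

Code:

#!/bin/bash
# group the files by the YYYY-MM-DD prefix of their names
for day in $(ls [12][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]*.html | cut -c1-10 | sort -u)
do
    set -- "$day"*.html               # that day's files, sorted by name
    [ $# -le 1 ] && continue          # only one file: nothing to merge
    first=$1
    sed -i '$d' "$first"              # drop the closing tags from the first file (GNU sed)
    while [ $# -gt 2 ]
    do
        shift
        sed '1d;$d' "$1" >> "$first"  # middle files lose both wrapper lines
        rm "$1"
    done
    shift
    sed '1d' "$1" >> "$first"         # the last file keeps its closing tags
    rm "$1"
done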

You really need to study what the logs look like. Open the source in an editor. If they haven't had all of the extra white space stripped to reduce the file size, it may be trivial to extract the body and construct a composite html page, or to produce a text log file. On the other hand, if they aren't simple, you may need to resort to XML tools to translate the files into another form.

Here is a simple script I wrote to produce very simple html pages from txt files. I put it in my ~/bin/ directory.
Code:

#!/bin/bash
sed -e '1i\
<?xml version="1.0" encoding="UTF-8"?>\
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "DTD/xhtml1-strict.dtd">\
<html xmlns="http://www.w3.org/1999/xhtml">\
<head>\
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />\
<meta name="Generator" content="Kate, the KDE Advanced Text Editor" />\
<title>'"${1}"'</title>\
</head>\
<body>\
<pre>' -e '$a\
</pre>\
</body>\
</html>' "${1}" > "${1%.txt}.html"
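
If the script is saved as, say, ~/bin/txt2html (the name is your choice) and run as "txt2html notes.txt", it writes notes.html next to the original.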

I simply converted a text file to an html page in the Kate program and then used that as a model for a two-command sed program.

Particular tags may identify the information you want to extract.
I do this to extract the file names I have backed up with the K3b program. The *.k3b project file is actually a zip archive containing XML files. It is a simple matter to use sed in a one-liner (although a long one, with all of the pipes) to extract only the file names, convert \n to \000, and then pipe the file list to "xargs -0 rm". In both of these examples, you need to study the format you are working with. Especially with regular expressions, you need to be careful and exact.
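
As a sketch of that one-liner: the archive member name "maindata.xml" and the <url> tags are from memory, so check a real project file before trusting it:

Code:

# list the files referenced by a k3b project, then delete them
unzip -p backup.k3b maindata.xml \
  | sed -n 's!.*<url>\(.*\)</url>.*!\1!p' \
  | tr '\n' '\000' \
  | xargs -0 rm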

