Linux - General This Linux forum is for general Linux questions and discussion.
If it is Linux Related and doesn't seem to fit in any other forum then this is the place.


10-25-2007, 09:42 PM   #1
LQ Newbie
Registered: Oct 2007
Posts: 6

Rep: Reputation: 0
Clean log bash script?

I've got a very ugly series of Pidgin logs, so I set out to write a bash script that would run through the output of ls, check whether there was more than one log for a given day, and if so, cat them all onto the first one for that date. The logs are named in the format year-month-day.html.

I'm kind of drawing a blank, though, and can't work out how this would go. Any help or pseudocode would be greatly appreciated.
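A minimal sketch of the plan described above, assuming each log name starts with YYYY-MM-DD and ends in .html (the replies below point out that a bare year-month-day.html name allows only one log per day, so extra characters after the date are assumed). `merge_by_day` is a hypothetical name. It uses plain cat, so, as the replies warn, whole HTML documents end up stacked inside one file; the later logs are left in place so nothing is lost.

```shell
#!/bin/bash
# merge_by_day DIR: for every date, append each later log of that day to the
# first log of that day (in name order). Assumes (hypothetically) that names
# start with YYYY-MM-DD and end in .html; the appended files are left in place.
merge_by_day() {
    local dir=$1 f name day first="" prev_day=""
    # the glob expands in sorted order, so logs of the same date are adjacent
    for f in "$dir"/[12][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]*.html; do
        [ -e "$f" ] || continue          # no matches: the pattern stays unexpanded
        name=$(basename "$f")
        day=${name:0:10}                 # first 10 characters: YYYY-MM-DD
        if [ "$day" != "$prev_day" ]; then
            first=$f                     # first log seen for a new date
            prev_day=$day
        else
            cat "$f" >> "$first"         # later log for the same date
        fi
    done
}
```

Usage: `merge_by_day /path/to/logs`, then delete the extra files yourself once you've checked the result.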
10-25-2007, 10:29 PM   #2
Senior Member
Registered: Aug 2003
Location: Berkeley, CA
Distribution: Mac OS X Leopard 10.6.2, Windows 2003 Server/Vista/7/XP/2000/NT/98, Ubuntux64, CentOS4.8/5.4
Posts: 2,986

Rep: Reputation: 45
Maybe you can just use the logrotate command, or create a custom logrotate configuration in /etc/logrotate.d/pidgin.

I'm just throwing out random thoughts here, but they should be better than nothing.

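# sed -n '1p' prints only the first line of the listing, i.e. the first log file
FIRST="/var/log/pidgin-logs/$(ls /var/log/pidgin-logs/ | sed -n '1p')"
for LOGFILE in /var/log/pidgin-logs/*; do
    # append every other log to the first one (note: this ignores the dates)
    [ "$LOGFILE" != "$FIRST" ] && cat "$LOGFILE" >> "$FIRST"
done

Will this even work?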

Last edited by Micro420; 10-25-2007 at 10:32 PM.
10-26-2007, 12:44 AM   #3
LQ Guru
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733

Rep: Reputation: 671
The logs are HTML web pages? It may be better to use tar to archive them. An HTML file contains a header and tags, so concatenating several of them wouldn't produce very useful output. Your description of the filename format can only match a single log file per day in the same directory, so merging logs from the same day conflicts with the format you gave, unless you have several of them, each in its own subdirectory.

You could use the find command to select the log files that match the filename pattern and are over 24 hours old (using -iname or -iregex, and -ctime). The output of find can then be used as the list of filenames for the tar command.
# archive matching logs changed more than 24 hours ago (-ctime +0; +1 would mean 48 hours)
# GNU tar: --null -T - reads the NUL-separated file names from stdin
find ~/.pidgin/logs/ -iregex ".*/[12][0-9][0-9][0-9]-[0-9][0-9]*-[0-9][0-9]*\.html" -ctime +0 -print0 | tar -cf pidginlog-$(date +'%F').tar --null -T -
find ~/.pidgin/logs/ -iregex ".*/[12][0-9][0-9][0-9]-[0-9][0-9]*-[0-9][0-9]*\.html" -ctime +0 -print0 | xargs -0 rm
I've guessed at the base location of the logs.

Also, where are these logs? Are they logs of conversations, located in your home directory?

If the HTML logs are XHTML compliant, an XSLT transformation could extract the raw data from a number of files, and that data could be reassembled into a single file. You could possibly do the same thing with sed, but you would need to examine the source of the HTML files to determine how to extract the information you want to keep.
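As a rough illustration of the sed route (a crude tag stripper, not an XSLT transformation), here is a sketch that turns several HTML logs into one plain-text log. `html_logs_to_text` is a hypothetical name, and it assumes no tag spans more than one line; entities such as &amp; are left undecoded.

```shell
#!/bin/bash
# html_logs_to_text FILE...: print the text of each HTML log with all tags
# removed, preceded by a line naming the source file.
# Crude: assumes no tag spans lines; entities such as &amp; are left as-is.
html_logs_to_text() {
    local f
    for f in "$@"; do
        printf '=== %s ===\n' "$(basename "$f")"
        # delete everything from '<' to the next '>' on a line, then drop blank lines
        sed -e 's/<[^>]*>//g' -e '/^[[:space:]]*$/d' "$f"
    done
}
```

Usage: `html_logs_to_text 2007-10-25*.html > 2007-10-25.txt`. Check a few real logs first; if tags do span lines, an XML tool is the safer choice.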

When I first responded to the post, I didn't realize that Pidgin was the new name of Gaim, and I thought you meant "pigeon" as an adjective meaning a mess.

Last edited by jschiwal; 10-26-2007 at 01:04 AM.
10-27-2007, 11:02 PM   #4
LQ Newbie
Registered: Oct 2007
Posts: 6

Original Poster
Rep: Reputation: 0
I hadn't thought about the HTML tags. Since I don't really have a choice about the existing logs being HTML, the script would have to be something like:

## Some kind of conditional to find if there's more than one for a given matching day
## Remove the final line from the first file and the first line from every other one that matches day
## Append files to the first one for that date in descending order
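The outline above can be sketched as follows, assuming (as the outline does) that each log's opening markup is exactly its first line and its closing markup exactly its last line; the next reply warns this may not hold. `merge_day` is a hypothetical name. Middle files lose both lines, not just the first, so only one closing line survives. The result goes to stdout instead of being appended in place, so no file is read and written at the same time, and files are joined in the order given rather than in descending order.

```shell
#!/bin/bash
# merge_day FILE...: join several HTML logs for one day into one document on stdout.
# Assumes each log's opening markup is exactly its first line and its
# closing markup exactly its last line.
merge_day() {
    local n=$# i=1 f
    for f in "$@"; do
        if [ "$n" -eq 1 ]; then
            cat "$f"               # only one log: nothing to strip
        elif [ "$i" -eq 1 ]; then
            sed '$d' "$f"          # first file: keep header, drop closing line
        elif [ "$i" -eq "$n" ]; then
            sed '1d' "$f"          # last file: drop header, keep closing line
        else
            sed '1d;$d' "$f"       # middle files: drop both
        fi
        i=$((i + 1))
    done
}
```

Usage: `merge_day 2007-10-25*.html > merged-2007-10-25.html` (the output name deliberately does not match the input pattern).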

Is there a way to use find, locate or something similar to return the number of results rather than the results themselves? The only other way I could come up with would be to write a bunch of conditionals centered around something like this:

find . -name "$Year-$Month-$Day*.html" > temp
wc -l < temp
10-27-2007, 11:46 PM   #5
LQ Guru
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733

Rep: Reputation: 671
You have the right idea.
find /dir -iname "$Year-$Month-$Day*.html" | wc -l
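That count can drive the "more than one log for a given day" conditional from the earlier outline without a temp file. A minimal sketch; `day_needs_merge` is a hypothetical name, and -maxdepth 1 assumes all the logs sit directly in one directory.

```shell
#!/bin/bash
# day_needs_merge DIR YEAR MONTH DAY: succeed (exit 0) when DIR holds more than
# one log whose name starts with YEAR-MONTH-DAY and ends in .html.
# Assumes the logs sit directly in DIR (-maxdepth 1).
day_needs_merge() {
    local count
    count=$(find "$1" -maxdepth 1 -iname "$2-$3-$4*.html" | wc -l)
    [ "$count" -gt 1 ]
}
```

Usage: `if day_needs_merge /path/to/logs 2007 10 25; then echo "needs merging"; fi`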

However, there may be more tags to deal with; simply stripping off the first and last lines may not be enough.
You can use sed to delete the first and last line:
sed '1d;$d' file >temp

echo -e "line1\nline2\nline3\nline4" | cat -n
echo -e "line1\nline2\nline3\nline4" | sed '1d;$d'
You really need to study what the logs look like. Open the source in an editor. If the extra whitespace hasn't been stripped to reduce the file size, it may be trivial to extract the body and construct a composite HTML page, or to produce a text log file. On the other hand, if they aren't simple, you may need to resort to XML tools to transform the files into another form.
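For the simple case, here is a sketch of building a composite page from the body of each log. It assumes the opening `<body ...>` and closing `</body>` tags each sit on a line of their own, as they would in an unminified file; `composite_page` is a hypothetical name. The head section is taken from the first file only.

```shell
#!/bin/bash
# composite_page FILE...: build one HTML page on stdout from several logs.
# Assumes <body ...> and </body> each sit on a line of their own.
composite_page() {
    local f
    # everything up to and including the first file's <body> line
    sed -n '1,/<body[^>]*>/p' "$1"
    for f in "$@"; do
        # only the lines strictly between <body ...> and </body>
        sed -n '/<body[^>]*>/,/<\/body>/{/<body[^>]*>/d;/<\/body>/d;p;}' "$f"
    done
    printf '</body>\n</html>\n'
}
```

Usage: `composite_page 2007-10-25*.html > merged-2007-10-25.html`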

Here is a simple script I wrote to produce very simple html pages from txt files. I put it in my ~/bin/ directory.
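# txt2html: wrap a text file in a minimal XHTML page (note: <, > and & in the text are not escaped)
sed -e '1i\
<?xml version="1.0" encoding="UTF-8"?>\
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "DTD/xhtml1-strict.dtd">\
<html xmlns="http://www.w3.org/1999/xhtml">\
<head>\
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />\
<meta name="Generator" content="Kate, the KDE Advanced Text Editor" />\
<title></title>\
</head>\
<body>\
<pre>' -e '$a\
</pre>\
</body>\
</html>' "${1}" > "${1%.txt}.html"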
I simply converted a text file to an HTML page in Kate and then used that as a model for a two-command sed program.

Particular tags may identify the information you want to extract.
I do this to extract the names of the files I have backed up with K3b. A *.k3b project file is actually a zip archive of XML files. It is a simple matter to use sed in a one-liner (although a long one, with all of the pipes "|") to extract only the file names, convert \n to \000, and then pipe the file list to "| xargs -0 rm". In both of these examples you need to study the format you are working with; especially when using regular expressions, you need to be careful and exact.
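The K3b one-liner itself isn't shown, so here is only a sketch of the same pipeline shape. The `<file url="...">` element and attribute names are hypothetical placeholders, not K3b's real format; inspect your own project files and adjust the regular expression. `names0` is also a hypothetical name. It handles at most one match per line and does not decode XML entities.

```shell
#!/bin/bash
# names0: read XML on stdin and print the url="..." value of every
# hypothetical <file ...> element, NUL-terminated for xargs -0.
# Assumes at most one such element per line; entities are not decoded.
names0() {
    sed -n 's/.*<file [^>]*url="\([^"]*\)".*/\1/p' | tr '\n' '\0'
}
```

Usage: `unzip -p project.k3b | names0 | xargs -0 ls -l` as a dry run, and only then swap `ls -l` for `rm`.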

Last edited by jschiwal; 10-28-2007 at 08:33 PM. Reason: corrected sed command


All times are GMT -5.
