This still won't do. I also had some trouble understanding exactly what the OP wants. The panorama description is indeed helpful. Try running your command with the sample data from post #11. If you merge file1 and file2, the output should be
Code:
mercury
uranus
venus
jupiter
earth
mars
jupiter
saturn
uranus
neptune
mars
@bonzer21: What is wrong with the script in post #11? I tested it with the sample data and it seems to perform OK. To make it work with the new data, I fixed a quoting issue.
-W NUM --width=NUM
Output at most NUM (default 130) print columns.
-- excerpt from man diff, q.v.
If that does help you solve the problem, then it would be useful if you were to post the complete output you are expecting from the merging of the 2 sample log files ... cheers, makyo
@bonzer21:
One issue that I noticed with my script is that you will have to make sure that your logfiles won't have any leading or trailing empty lines. Otherwise they will just be concatenated. So you might have to preprocess them. Probably best to make sure that there are no blank lines at all.
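One way to do that preprocessing, as a rough sketch only: sed can drop every line that is empty or contains nothing but whitespace. The temporary file and its contents are made up for the demo.

```shell
#!/bin/bash
# Made-up sample with leading, embedded and trailing blank lines.
tmp=$(mktemp)
printf '\nline one\n\n  \nline two\n\n' > "$tmp"

# Delete every line that is empty or whitespace-only.
cleaned=$(sed '/^[[:space:]]*$/d' "$tmp")

printf '%s\n' "$cleaned"
rm -f "$tmp"
```

With GNU sed, adding -i would apply the same deletion to the file in place instead of printing the result.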
Your script works exactly as required, crts: it reconstructs the original log file. Thank you.
Although minor, I know from past experience that wc's output is less awkward (gives no filename) when its input is stdin, avoiding the need for awk altogether.
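To illustrate the difference, here is a small sketch; the temporary file is just a stand-in for a log file.

```shell
#!/bin/bash
# Stand-in file with three lines (contents are arbitrary).
tmp=$(mktemp)
printf 'a\nb\nc\n' > "$tmp"

# With a filename argument, wc prints the count followed by the name,
# so the number has to be cut out afterwards, e.g. with awk.
with_name=$(wc -l "$tmp" | awk '{print $1}')

# With the file on stdin, wc has no name to print and outputs the count only.
from_stdin=$(wc -l < "$tmp")

echo "$with_name $from_stdin"
rm -f "$tmp"
```

Some wc implementations pad the number with leading spaces, so a numeric comparison is safer than a string comparison when checking the result.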
Here's the whole story including crts's script, the input files and the final output:
Code:
$ cat script.sh
#!/bin/bash
# invoke as ./script.sh fileA fileB
count=0
lastOccurence=$(grep -n "$(head -n 1 "$2")" "$1" | sed -nr '$ {s/^([0-9]*):.*/\1/;p}')
while read line
do
if [[ $(grep -n "$line" "$1" | sed -nr '$ {s/^([0-9]*):.*/\1/;p}') == $lastOccurence ]]; then
(( count++ ))
(( lastOccurence++ ))
else
(( lastOccurence-- ))
break
fi
done < "$2"
if [[ $(wc -l < "$1") == $lastOccurence ]]; then
sed -e "1,$count d" "$2" >> "$1"
else
cat "$2" >> "$1"
fi
$ cat log1.txt
123.456.789.012 - "GET /" HTTP/1.1" 200 "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_8; en-gb)"
123.456.789.012 - "GET /favicon.ico" HTTP/1.1" 200 "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_8; en-gb)"
123.456.789.012 - "GET /images/head.jpg" HTTP/1.1" 200 "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_8; en-gb)"
123.456.789.012 - "GET /style.css" HTTP/1.1" 200 "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_8; en-gb)"
987.654.321.098 - "GET /" HTTP/1.1" 200 "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
987.654.321.098 - "GET /favicon.ico" HTTP/1.1" 200 "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
987.654.321.098 - "GET /images/head.jpg" HTTP/1.1" 200 "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
987.654.321.098 - "GET /style.css" HTTP/1.1" 200 "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
123.456.789.012 - "GET /info.html" HTTP/1.1" 200 "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_8; en-gb)"
123.456.789.012 - "GET /images/logo.gif" HTTP/1.1" 200 "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_8; en-gb)"
$ cat log2.txt
987.654.321.098 - "GET /images/head.jpg" HTTP/1.1" 200 "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
987.654.321.098 - "GET /style.css" HTTP/1.1" 200 "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
123.456.789.012 - "GET /info.html" HTTP/1.1" 200 "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_8; en-gb)"
123.456.789.012 - "GET /images/logo.gif" HTTP/1.1" 200 "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_8; en-gb)"
123.456.789.012 - "GET /style.css" HTTP/1.1" 200 "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_8; en-gb)"
987.654.321.098 - "GET /" HTTP/1.1" 200 "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
987.654.321.098 - "GET /" HTTP/1.1" 200 "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
123.456.789.012 - "GET /" HTTP/1.1" 200 "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_8; en-gb)"
123.456.789.012 - "GET /style.css" HTTP/1.1" 200 "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_8; en-gb)"
$ ./script.sh log1.txt log2.txt
$ cat log1.txt
123.456.789.012 - "GET /" HTTP/1.1" 200 "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_8; en-gb)"
123.456.789.012 - "GET /favicon.ico" HTTP/1.1" 200 "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_8; en-gb)"
123.456.789.012 - "GET /images/head.jpg" HTTP/1.1" 200 "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_8; en-gb)"
123.456.789.012 - "GET /style.css" HTTP/1.1" 200 "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_8; en-gb)"
987.654.321.098 - "GET /" HTTP/1.1" 200 "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
987.654.321.098 - "GET /favicon.ico" HTTP/1.1" 200 "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
987.654.321.098 - "GET /images/head.jpg" HTTP/1.1" 200 "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
987.654.321.098 - "GET /style.css" HTTP/1.1" 200 "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
123.456.789.012 - "GET /info.html" HTTP/1.1" 200 "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_8; en-gb)"
123.456.789.012 - "GET /images/logo.gif" HTTP/1.1" 200 "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_8; en-gb)"
123.456.789.012 - "GET /style.css" HTTP/1.1" 200 "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_8; en-gb)"
987.654.321.098 - "GET /" HTTP/1.1" 200 "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
987.654.321.098 - "GET /" HTTP/1.1" 200 "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
123.456.789.012 - "GET /" HTTP/1.1" 200 "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_8; en-gb)"
123.456.789.012 - "GET /style.css" HTTP/1.1" 200 "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_8; en-gb)"
Tinkster - your diff/awk worked fine until it saw a space! I'd adjust it myself, but I'm not that well versed in awk.
Heh. Fair enough. I wouldn't have gone down that alley
if I had read through the entire thread first, and seen
your actual log data rather than planet names ;}
Here's a 'diff-only' version:
Code:
$ diff --old-line-format='%L' --new-line-format='%L' --unchanged-line-format='%L' -W 200 log1.txt log2.txt
123.456.789.012 - "GET /" HTTP/1.1" 200 "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_8; en-gb)"
123.456.789.012 - "GET /favicon.ico" HTTP/1.1" 200 "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_8; en-gb)"
123.456.789.012 - "GET /images/head.jpg" HTTP/1.1" 200 "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_8; en-gb)"
123.456.789.012 - "GET /style.css" HTTP/1.1" 200 "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_8; en-gb)"
987.654.321.098 - "GET /" HTTP/1.1" 200 "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
987.654.321.098 - "GET /favicon.ico" HTTP/1.1" 200 "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
987.654.321.098 - "GET /images/head.jpg" HTTP/1.1" 200 "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
987.654.321.098 - "GET /style.css" HTTP/1.1" 200 "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
123.456.789.012 - "GET /info.html" HTTP/1.1" 200 "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_8; en-gb)"
123.456.789.012 - "GET /images/logo.gif" HTTP/1.1" 200 "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_8; en-gb)"
123.456.789.012 - "GET /style.css" HTTP/1.1" 200 "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_8; en-gb)"
987.654.321.098 - "GET /" HTTP/1.1" 200 "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
987.654.321.098 - "GET /" HTTP/1.1" 200 "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
123.456.789.012 - "GET /" HTTP/1.1" 200 "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_8; en-gb)"
123.456.789.012 - "GET /style.css" HTTP/1.1" 200 "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_8; en-gb)"
I had no idea diff could do this until today - reading
the man-page in a desperate attempt to find a separator
I could output for use w/ awk (so the spaces weren't an
issue for awk's defaults).
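For anyone who wants to see what those three options do on their own, here is a tiny sketch. It assumes GNU diff (the --*-line-format options are GNU extensions), and the two files are made up, not the thread's sample data.

```shell
#!/bin/bash
# Two tiny made-up files: the tail of the first overlaps the head of the second.
old=$(mktemp)
new=$(mktemp)
printf 'one\ntwo\nthree\n' > "$old"
printf 'two\nthree\nfour\n' > "$new"

# %L stands for the line's contents including its newline. Using it for all
# three line groups prints old-only, common and new-only lines with no
# markers. diff exits with status 1 when the files differ, hence "|| true".
merged=$(diff --old-line-format='%L' --new-line-format='%L' \
              --unchanged-line-format='%L' "$old" "$new" || true)

printf '%s\n' "$merged"
rm -f "$old" "$new"
```

Here the result is the four lines one, two, three, four: the shared lines are printed once, with the old-only line before them and the new-only line after.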
Fair enough, too. Did bonzer21 specify what he'd like to
happen in that case?
Cheers,
Tink
Nope, he did not. We also do not know how many log files are to be merged, or whether he made sure that every log's tail matches the next one's head. If he did, then I'd say that the diff solution is definitely the elegant way to go.
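A cheap sanity check for that precondition could look like the sketch below. The function name overlaps is made up, and the check is only a necessary condition: it tests whether the first line of the next log occurs anywhere in the previous one, not whether the whole tail lines up.

```shell
#!/bin/bash
# overlaps PREV NEXT
# Succeeds if the first line of NEXT occurs in PREV as a whole line (-x),
# compared as a fixed string rather than a regex (-F), with no output (-q).
overlaps() {
    grep -Fxq -- "$(head -n 1 "$2")" "$1"
}
```

It could be used as `overlaps log1.txt log2.txt || echo "no overlap found"` before attempting a merge.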
True that ... another "stuff-up" possibility that wasn't
discussed is what happens if user A has a habit of revisiting
pages in the same sequence, over and over again, and that
gets split into several files ... ;}
Logs w/o timing info seem quite pointless, really.
Rest assured, if I'd set this system up it would be timestamped to the millisecond. As it happens, I am applying approximate times to the logs I'm joining. They're not perfect, but they'll give an idea of relative times between each request.
I'll bring the example home a little more - the system I'm working with is embedded and proprietary. I can't view the whole log (I doubt the device even keeps the whole log), but I can view the last ~70 entries of it at any given time. It's as if the log were scrolling up like the credits of a movie but with varying speed, depending on how busy the web server is, and once an entry has scrolled off the top it's gone forever. In an effort to produce a much more useful, browsable log, I am dumping each "screen" of data every few seconds to text files. To avoid missing any entries, I'm capturing at a rate that is faster than the log is ever likely to scroll, which means that the text files I'm creating often heavily overlap.
As Tinkster pointed out, there will be times when there are duplicate entries in the log. If this happens to be the end of one capture and the start of another, it will be impossible for me to determine which are part of the overlap and which are new entries. Splitting my log example in a different place demonstrates this problem:
First capture:
Code:
123.456.789.012 - "GET /info.html" HTTP/1.1" 200 "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_8; en-gb)"
123.456.789.012 - "GET /images/logo.gif" HTTP/1.1" 200 "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_8; en-gb)"
123.456.789.012 - "GET /style.css" HTTP/1.1" 200 "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_8; en-gb)"
987.654.321.098 - "GET /" HTTP/1.1" 200 "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
987.654.321.098 - "GET /" HTTP/1.1" 200 "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
Second capture:
Code:
987.654.321.098 - "GET /" HTTP/1.1" 200 "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
987.654.321.098 - "GET /" HTTP/1.1" 200 "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
123.456.789.012 - "GET /" HTTP/1.1" 200 "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_8; en-gb)"
123.456.789.012 - "GET /style.css" HTTP/1.1" 200 "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_8; en-gb)"
There's no way to tell how many times the entry 987.654.321.098 - "GET /" HTTP/1.1" 200 "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)" would have appeared in the original log - it could be two, three, or four times. In such a case, I am forced to make an assumption. I'll opt for the lesser value.
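Opting for the lesser value amounts to assuming the largest possible overlap between two captures. A minimal sketch of that idea, not a tested replacement for the script earlier in the thread: the function name merge_captures is made up, and awk does the comparison of the first capture's tail against the second capture's head.

```shell
#!/bin/bash
# merge_captures FIRST SECOND
# Prints FIRST, then the part of SECOND that lies beyond the longest run of
# lines that is both the tail of FIRST and the head of SECOND.
merge_captures() {
    awk '
        FILENAME == ARGV[1] { a[++na] = $0; next }   # lines of the first capture
                            { b[++nb] = $0 }         # lines of the second capture
        END {
            # Try the largest possible overlap first and shrink until the
            # last k lines of a equal the first k lines of b.
            k = (na < nb) ? na : nb
            for (; k > 0; k--) {
                ok = 1
                for (i = 1; i <= k; i++)
                    # Appending "" forces a string comparison.
                    if ((a[na - k + i] "") != (b[i] "")) { ok = 0; break }
                if (ok) break
            }
            # k is now the overlap length, 0 if nothing matched.
            for (i = 1; i <= na; i++) print a[i]
            for (i = k + 1; i <= nb; i++) print b[i]
        }
    ' "$1" "$2"
}
```

Run on the two captures above, this keeps the repeated entry exactly twice, because the two-line overlap is the longest one that fits. If two captures do not overlap at all, the second is simply appended in full.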
crts - a very valid point. I was impressed to see that diff can do the job all by itself in this example, but I have been mindful that it is more of a comparison tool and you correctly show where it would slip up.