I have come a bit late to this party but would like to submit an option that seems to work with the data from post #22 (i.e. I get the same output):
Code:
awk '!f{getline line < "file2";f=1}$0 == line{f=0}1;END{while(getline < "file2")print}' file1
Hi,
using the simplified data from post #25, the command produces
Code:
mars
jupiter
saturn
neptune
deimos
but in this case file2 should just have been appended to file1. As the OP states, the logs will overlap most of the time. But there is still the possibility of 'clean' splits, i.e. no overlapping tails/heads.
So any solution should be able to recognize a clean split whenever one can be identified as such. Due to the nature of the problem that will not always be possible; only sometimes will a clean split be identifiable.
I'm happy to see that other responders were in sync with the OP and that issue was resolved. I was far out in left field on this ... cheers, makyo
PS For the problem I thought I was solving, the awk code in post #14 to do uniq without sorting (and a union across many files) can be replaced by the far more succinct:
I also did not understand it at first, but the tail of file1 and the head of file2 in post #25 do not overlap.
Code:
mars
jupiter
saturn     jupiter
neptune    saturn
           uranus
           deimos
If they were to overlap then the third item of file2 would have to be neptune. The constellation in file1 and file2, however, indicates a 'clean' split of the logfile.
An example which does qualify as overlap would be
Code:
mars
neptune
jupiter    jupiter
saturn     saturn
           uranus
           deimos
Think of it like 'sliding' the data of file2 up until you get the biggest possible overlap of the *tail* of file1 with the *head* of file2.
This is not possible with the data in post #25. If you slide up until jupiter and saturn match, then it would look like this
Code:
mars
jupiter    jupiter
saturn     saturn
neptune    uranus   <-- tail of file1 does not match head of file2
           deimos
neptune mismatches uranus in this case, hence the tail does not match the head.
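For what it's worth, the 'sliding' idea above can be sketched in awk roughly like this. It is only a sketch under a few assumptions: both files are small enough to hold in memory, lines must match exactly, and the helper name merge_overlap is made up for illustration. It tries the biggest possible overlap first and shrinks it until the tail of file1 equals the head of file2; if nothing matches, file2 is simply appended.

```shell
#!/bin/sh
# merge_overlap FILE1 FILE2  (hypothetical helper name)
# Print FILE1, then the part of FILE2 that is not already the tail of FILE1.
merge_overlap() {
    awk '
        # First file: keep every line in a[1..n].
        NR == FNR { a[++n] = $0; next }
        # Second file: keep every line in b[1..m].
        { b[++m] = $0 }
        END {
            # Start with the largest possible overlap and shrink it until the
            # last k lines of FILE1 equal the first k lines of FILE2.
            for (k = (n < m ? n : m); k > 0; k--) {
                same = 1
                for (i = 1; i <= k; i++)
                    if (a[n - k + i] != b[i]) { same = 0; break }
                if (same) break
            }
            # k == 0 here means no overlap was found (a "clean" split).
            for (i = 1; i <= n; i++) print a[i]
            for (i = k + 1; i <= m; i++) print b[i]
        }
    ' "$1" "$2"
}

# Demo with the overlapping example from this post.
dir=$(mktemp -d)
printf '%s\n' mars neptune jupiter saturn > "$dir/file1"
printf '%s\n' jupiter saturn uranus deimos > "$dir/file2"
merge_overlap "$dir/file1" "$dir/file2"
rm -rf "$dir"
```

Note that a repeated entry right at the boundary cannot be told apart from a real overlap, so this always assumes the largest overlap is the right one.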
When this thread started I also thought that this problem could be solved by a one- or two-liner. Right now, I am not sure about that. I guess it can be done with awk, but an awk script would still need a couple of lines. However, an awk solution *might* yield some advantage regarding execution time.
Hey crts, thanks for the explanation. I can see where I was getting lost now, although the requirement does seem a little obtuse (but each to their own).
So it would seem you need to check the reverse line sort of file1 against file2 prior to processing (ugly).
I'll put my thinking cap back on
I'll bring the example home a little more - the system I'm working with is embedded and proprietary, I can't view the whole log (I doubt the device even keeps the whole log), but I can view the last ~70 or so entries of it at any given time. It's as if the log were scrolling up like the credits of a movie but with varying speed, depending on how busy the web server is, and once an entry has scrolled off the top it's gone forever. In an effort to produce a much more useful, browsable log, I am dumping each "screen" of data every few seconds to text files. To avoid missing any entries, I'm capturing at a rate that is faster than the log is ever likely to scroll, which means that the text files I'm creating often heavily overlap.
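If it helps, here is a rough sketch of how a whole series of such "screen" dumps could be folded into one log in a single awk run. The assumptions are mine, not the OP's: the dumps are passed in capture order, each one fits in memory, entries match line for line, and the helper name merge_snapshots and the sample data are invented for the demo. Each new dump is compared against the tail of the log built so far, and only the lines after the largest matching head are added.

```shell
#!/bin/sh
# merge_snapshots FILE...  (hypothetical helper name)
# Fold overlapping dumps, given in capture order, into one continuous log.
merge_snapshots() {
    awk '
        # Add the buffered dump s[1..m] to the merged log out[1..n], skipping
        # the largest head of the dump that equals the current tail of the log.
        function absorb(    k, i, same) {
            for (k = (n < m ? n : m); k > 0; k--) {
                same = 1
                for (i = 1; i <= k; i++)
                    if (out[n - k + i] != s[i]) { same = 0; break }
                if (same) break
            }
            for (i = k + 1; i <= m; i++) out[++n] = s[i]
            m = 0
        }
        # First line of a later file: the previous dump is complete, merge it.
        FNR == 1 && NR > 1 { absorb() }
        # Buffer the current line of the current dump.
        { s[++m] = $0 }
        END {
            absorb()
            for (i = 1; i <= n; i++) print out[i]
        }
    ' "$@"
}

# Demo with three made-up dumps that overlap by two lines and by one line.
dir=$(mktemp -d)
printf '%s\n' mercury venus earth      > "$dir/snap1"
printf '%s\n' venus earth mars jupiter > "$dir/snap2"
printf '%s\n' jupiter saturn           > "$dir/snap3"
merge_snapshots "$dir/snap1" "$dir/snap2" "$dir/snap3"
rm -rf "$dir"
```

If the log ever scrolls by more than a full screen between two captures, the missing entries are gone and the two dumps will simply look like a clean split.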
Does the appliance give you shell access? Maybe you *can* find a better way of handling this ...
Code:
awk '!f{getline line < "file2";f=1}line == $0{c++;f=0}c && line != $0{c=0;close("file2")}1;END{close("file2");while(c--)getline < "file2";while(getline < "file2")print}' file1
Close. Try it with file1 and file2 from post #11. This data set resembles the real scenario a bit more closely.
file1
Code:
mercury
uranus
venus
jupiter
earth
mars
jupiter
saturn
file2
Code:
jupiter
saturn
uranus
neptune
mars
result
Code:
$ awk '!f{getline line < "file2";f=1}line == $0{c++;f=0}c && line != $0{c=0;close("file2")}1;END{close("file2");while(c--)getline < "file2";while(getline < "file2")print}' file1
mercury
uranus
venus
jupiter
earth
mars
jupiter
saturn
saturn <-- one saturn too many
uranus
neptune
mars