Linux - Newbie
This Linux forum is for members that are new to Linux.
Just starting out and have a question?
If it is not in the man pages or the how-to's this is the place!
Hi all,
This is my first post to the forum. I am very happy to see the contributions here, and I am eager to become part of the community.
I have the following question: I have two huge files to compare (almost 3 GB each). The files are simulation outputs. The format of the files is as below
(please also see attached figure if the following table is confusing)
File 1                        File 2
-----------------------       -----------------------
Time  sig_name  sig_val       Time  sig_name  sig_val
Given the two files in the above format, how can I print out the following table from an "efficient" file comparison? Efficiency is required because each file is over 3 GB.
No, there is no option to use SQL.
I believe this should be pretty straightforward with grep, sed, and awk?
Any expert opinions?
Yes, there IS an option to use SQL. If you have the data in a text file, write a quick little program to read it, and import it into a MySQL (or whatever) database/table. Perform your operations on it there. That's the quickest way to do it. You *CAN* stick something together with grep, etc., but for 3GB files, it'll take a good bit of time to work, although the script itself may be simple.
If this is something you have to do every now and then, writing the import routine would be the best option. Plus, you then have more flexibility in what you can look at, and how. If it's SQL, you can hit it with Excel/OpenOffice, get charts/graphs, and do ad-hoc reporting very easily.
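A minimal sketch of that import-and-query route, assuming whitespace-separated columns; the table names, file names, and column names here are hypothetical, and `LOAD DATA LOCAL INFILE` requires `local_infile` to be enabled on the server:

```sql
-- One table per simulation output (names are made up for illustration).
CREATE TABLE sim1 (t DOUBLE, sig_name VARCHAR(64), sig_val DOUBLE);
CREATE TABLE sim2 (t DOUBLE, sig_name VARCHAR(64), sig_val DOUBLE);

LOAD DATA LOCAL INFILE 'file1.txt' INTO TABLE sim1
    FIELDS TERMINATED BY ' ' (t, sig_name, sig_val);
LOAD DATA LOCAL INFILE 'file2.txt' INTO TABLE sim2
    FIELDS TERMINATED BY ' ' (t, sig_name, sig_val);

-- Count mismatches per signal with a join on time + signal name.
SELECT a.sig_name, COUNT(*) AS mismatches
FROM sim1 a
JOIN sim2 b ON a.t = b.t AND a.sig_name = b.sig_name
WHERE a.sig_val <> b.sig_val
GROUP BY a.sig_name;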
Not necessarily. Maybe he does not have administrative rights to install a SQL server (or any other software, for that matter). It may not be cost-effective for them to install SQL support for this single task. There could be any number of reasons why SQL is not an option.
I do agree that pipelining commands will take a long time given that the files are 3 GB in size. In that case, "efficiency" is really meaningless.
I would suggest writing a custom program for this if runtime, memory usage, etc. is a primary concern.
I don't want to write a script built on the standard utilities because I see some potential headaches in implementing it, and I don't like headaches. That said, a simple pipeline should give a starting point for a script: it removes all the lines that are identical, leaving one line per file wherever there is a difference.
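One way to get that "identical lines removed" stream with standard tools is the classic sort/uniq idiom. This is a sketch under assumptions: the sample file names and data below are made up, and the columns are taken to be Time, sig_name, sig_val separated by whitespace.

```shell
# Hypothetical sample files standing in for the real 3 GB outputs;
# columns are: Time sig_name sig_val.
printf '0.1 clk 1\n0.2 rst 0\n' > file1
printf '0.1 clk 1\n0.2 rst 1\n' > file2

# Lines that appear in both files sort together and are dropped by
# `uniq -u` (print only unique lines), leaving exactly one line per
# file wherever the two files disagree.
sort file1 file2 | uniq -u
```

For files this big, `sort` will spill to temporary files on disk, so make sure `TMPDIR` points somewhere with enough free space.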
The script then needs to parse the results two lines at a time, identify the signal with the inconsistency, add 1 to a counter for that signal, record the time the inconsistency occurred, and print out the fancy report. Parsing gives me a headache, and associative arrays give me a headache too.
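For what it's worth, that parsing step might be sketched in awk roughly like this. Assumptions: whitespace-separated Time/sig_name/sig_val columns, each (time, signal) pair appears at most once per file, and the differing lines are fed in sorted so the two versions of a record sit adjacent; the sample data and file names are made up.

```shell
# Hypothetical sample files standing in for the real 3 GB outputs;
# columns are: Time sig_name sig_val.
printf '0.1 clk 1\n0.2 rst 0\n' > file1
printf '0.1 clk 1\n0.2 rst 1\n' > file2

# Isolate the differing lines (one per file per difference), then walk
# them two at a time: when consecutive lines share the same time and
# signal name, count a mismatch for that signal and remember the time.
sort file1 file2 | uniq -u | awk '
    { key = $1 SUBSEP $2 }                 # time + signal name
    key == prev { n[$2]++; when[$2] = when[$2] " " $1 }
    { prev = key }
    END {
        for (s in n)
            printf "%s  mismatches: %d  at times:%s\n", s, n[s], when[s]
    }' > report
cat report
```

The associative arrays `n` and `when` do the counting and time-recording; formatting the "fancy report" from them is then just a matter of taste.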
I'd still go the SQL route myself, for the aforementioned reasons. Even if the OP doesn't have admin rights, an administrator can create one database in MySQL and grant full access to one user; the mysql statements will then work even for a 'regular' user and let them get things done.
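That one-time admin step might look like the following, where the database name, user name, and password are all hypothetical placeholders:

```sql
CREATE DATABASE simcmp;
CREATE USER 'simuser'@'localhost' IDENTIFIED BY 'changeme';
GRANT ALL PRIVILEGES ON simcmp.* TO 'simuser'@'localhost';
```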
I think it's the best way to go, since you could then have lots of flexibility in your reporting and access. Even a Windows user could hit it with the ODBC driver from Excel and perform their own reports. But this is up to you to figure out; it has to be what you are comfortable with.