Bash Script Help Request
I am new at writing bash scripts. Would anyone please assist me in writing one? I would like to compare two files and determine if lines are not present in one of them but are in the other. The lines are not sequential and I would like to ignore the first word in each line.
Example: FileA: Code:
created test1 Code:
same test1 Code:
grep -Fvf FileB FileA Code:
created test1 Code:
created test3 Thanks! |
A quick search on the interwebs for "linux diff specific column" came up with this answered question:
http://unix.stackexchange.com/questi...lumn-in-a-file With a little tweaking, the top answer solves your query, but I'll let you work out what that tweaking is. ;-) |
https://duckduckgo.com/?q=files+diff...xquestions.org
Ask if you have issues applying any solution from that search. |
Quote:
Also, the columns in the text file I want to compare have a space before them. Will that affect what is considered a column? Thanks for the help. Code:
$ awk 'NR==FNR{c[$2]++;next};c[$2] == 0' filea fileb |
A search for "awk comparing two files" came up with this interesting YouTube video looking at how to use FNR and NR:
https://www.youtube.com/watch?v=hnT4WTz9dR8
Given that your real-life data isn't the same as the example you gave, perhaps you should show some sample real-life data, anonymising anything sensitive? |
The columns are (white-)space separated; there MUST be a space.
The first given link tries to explain the code. It also says to run it twice. Code:
awk 'NR==FNR {c[$2]++; next}; !($2 in c)' filea fileb
The lookup is an implicit if clause, and no { action } means an implicit { print }.
Please give an example of input that does not work! |
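For anyone following along, here is a minimal sketch of that two-file idiom on made-up data. The file contents and the temporary directory are assumptions for illustration, not the OP's real files. Running it a second time with the file order swapped reports the lines found only in the other file, which is presumably why the linked answer says to run it twice.

```shell
# Hypothetical sample data: the first word differs, the second field is the key.
tmp=$(mktemp -d)
printf '%s\n' 'created test1' 'created test2' 'created test3' > "$tmp/filea"
printf '%s\n' 'same test1' 'same test3' 'same test4' > "$tmp/fileb"

# First file: NR==FNR is true, so remember field 2 and skip to the next line.
# Second file: print only the lines whose field 2 was never remembered.
only_in_a=$(awk 'NR==FNR {c[$2]++; next}; !($2 in c)' "$tmp/fileb" "$tmp/filea")
only_in_b=$(awk 'NR==FNR {c[$2]++; next}; !($2 in c)' "$tmp/filea" "$tmp/fileb")

printf 'only in filea: %s\n' "$only_in_a"
printf 'only in fileb: %s\n' "$only_in_b"
rm -rf "$tmp"
```

The file named first is only loaded into the array; the lines that get printed always come from the file named second.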
Quote:
However, given that there is added complexity to the OP's real-life data which wasn't reflected in the example he/she gave, the real solution will probably prove to be slightly more complex itself. |
I really appreciate your patience with me. I have pasted below an actual sample of data from the files being compared. I should have done so to begin with; I apologize! I have also posted the results below after executing the awk command.
FILEA Code:
create d 775 1000/63 11030238 share/ Code:
create d2775 1000/1016 4096 share/limited/docs Code:
create 775 1000/63 23 share/recipes.txt Code:
awk 'NR==FNR{c[$2]++;next};c[$2] == 0' fileb filea |
Before everything else, qombi, the "d" and "d2" could prove difficult. They appear to be part of a column that doesn't always have data, thus meaning for example that in
Code:
create d 775 1000/63 11030238 share/
More than that, the first line in FileB appears to have no space between the d2 and the 775. The first problem with data comparison like this is often to clean the data up and ensure that it is consistent. We can get around the variable-columns issue if the 775 always starts at the same character position on each line. Is this the case?
Or, if the only thing that changes between corresponding lines for the same file in FileA and FileB is "create" versus "same", then we can run sed before doing the comparison. Is the filename always unique and the rest of the line static (apart from the create/same), so that we can just compare the filenames and forget the rest? In other words, how much do we really need to check from corresponding lines to be sure that they are the same or that one doesn't exist? |
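One way the sed idea could look, as a rough sketch: drop the first word from every line, then compare what is left. The sample lines, the .norm file names, and the use of sort plus comm are assumptions for illustration. It only helps if the rest of the line really is identical between the two files.

```shell
# Hypothetical sample data modelled on the lines posted in the thread.
tmp=$(mktemp -d)
printf '%s\n' \
  'create d 775 1000/63 4096 share/' \
  'create 775 1000/63 23 share/recipes.txt' \
  'create 775 1000/63 99 share/new.txt' > "$tmp/filea"
printf '%s\n' \
  'same d 775 1000/63 4096 share/' \
  'same 775 1000/63 23 share/recipes.txt' > "$tmp/fileb"

# Strip the first word and the whitespace after it, then sort for comm.
strip='s/^[^[:space:]]*[[:space:]]*//'
sed "$strip" "$tmp/filea" | LC_ALL=C sort > "$tmp/filea.norm"
sed "$strip" "$tmp/fileb" | LC_ALL=C sort > "$tmp/fileb.norm"

# comm -23 keeps only the lines unique to the first file.
new_in_a=$(LC_ALL=C comm -23 "$tmp/filea.norm" "$tmp/fileb.norm")
printf '%s\n' "$new_in_a"
rm -rf "$tmp"
```

Swapping the two .norm arguments to comm would list the lines that exist only in fileb.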
I would like to compare fileb to filea and learn if any bolded information below has changed. filea Code:
create d 775 1000/63 11030238 share/ |
So you don't mind the first 9 chars. In that case you need to remove them, because that awk script works on full lines and also cannot really handle that d (whether it is present or missing).
As a simple solution you can use the cut command: Code:
cut -c 10- input_file > output_file |
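To show how the cut output could then be compared, here is a small sketch. The sample lines assume the fixed-width layout discussed in this thread (permissions starting at character 10), and the grep -Fvxf step is a suggestion of mine rather than something the post above specified.

```shell
# Hypothetical fixed-width sample data; character 10 is where "775" starts.
tmp=$(mktemp -d)
printf '%s\n' \
  'create d 775 1000/63 4096 share/' \
  'create   775 1000/63 99 share/new.txt' > "$tmp/filea"
printf '%s\n' \
  'same   d 775 1000/63 4096 share/' > "$tmp/fileb"

# Keep everything from character 10 onward in each file.
cut -c 10- "$tmp/filea" > "$tmp/filea.cut"
cut -c 10- "$tmp/fileb" > "$tmp/fileb.cut"

# -F fixed strings, -x whole-line match, -v invert, -f read patterns from a file
only_in_a=$(grep -Fvxf "$tmp/fileb.cut" "$tmp/filea.cut")
printf '%s\n' "$only_in_a"
rm -rf "$tmp"
```

Note that the reported lines no longer carry the first 9 characters, since cut has already removed them.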
Given that the file permissions start at the same character position (offset 10) each time, and you want to compare everything else in the line starting at that point, the original awk can be modified to work on a substring of each line:
Code:
awk 'NR==FNR{c[substr($0,10)]++;next};c[substr($0,10)] == 0' fileb filea |
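Here is that command run against a small made-up sample, as a sketch only. The data and the temporary directory are assumptions, and the lines are padded so that the permissions start at character 10 as described above.

```shell
# Hypothetical fixed-width sample data; only the third filea line is new.
tmp=$(mktemp -d)
printf '%s\n' \
  'create d 775 1000/63 4096 share/' \
  'create   775 1000/63 23 share/recipes.txt' \
  'create   775 1000/63 99 share/new.txt' > "$tmp/filea"
printf '%s\n' \
  'same   d 775 1000/63 4096 share/' \
  'same     775 1000/63 23 share/recipes.txt' > "$tmp/fileb"

# Pass 1 (fileb): count each line tail, starting at character 10.
# Pass 2 (filea): print the lines whose tail was never counted.
missing=$(awk 'NR==FNR{c[substr($0,10)]++;next};c[substr($0,10)] == 0' \
  "$tmp/fileb" "$tmp/filea")
printf '%s\n' "$missing"
rm -rf "$tmp"
```

Unlike the cut approach, this prints the whole original line, including the leading create column.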
Do you mean substr($0, 10, 3)?
|