Linux - Server: This forum is for the discussion of Linux software used in a server-related context.
I also have a set of files (e.g. file1.txt, file2.txt, ...).
What I would like to do is list the lines of master.txt whose fourth column (ddd, mmm, jjj) does not appear in any of those files. That is, search for ddd, mmm and jjj in all the .txt files and, if a value does not appear in any of them, output its line.
Assuming mmm and jjj do not appear in any .txt file, the output would be the result below.
Okay then, this might work. As I understand it, for each line in master.txt, I first need to find the 4th field, and then check whether it appears in any of the file*.txt files. Finally, if it doesn't appear in file*.txt, we print that line; otherwise we discard it.
I wasn't entirely clear on the details though, so try this small script and then check the contents of output1.txt and output2.txt. The two outputs come from grepping the files in file*.txt slightly differently, and hopefully one of them will provide the output that you want.
Code:
shopt -s nullglob
rm -f output*.txt
# Each line in master.txt is read and parsed individually
while IFS= read -r SHEEP ; do
    # Let COWS equal just the fourth field of the current line
    # (tr -s squeezes runs of spaces so that cut counts fields correctly).
    COWS="$(printf '%s\n' "$SHEEP" | tr -s ' ' | sed 's/^[ "]*//; s/[ "]*$//' | cut -f4 -d' ')"
    # If $COWS does not appear in any part of any files
    # in file*.txt, then add current line to output1.txt.
    # -F matches $COWS literally; /dev/null stops grep from reading
    # stdin (master.txt) when nullglob leaves no file*.txt at all.
    grep -qF -- "$COWS" file*.txt /dev/null || echo "$SHEEP" >>output1.txt
    # If $COWS does not appear in the fourth field of any file
    # in file*.txt, then add the current line to output2.txt
    cat /dev/null file*.txt | sed 's/^[ "]*//; s/[ "]*$//' | cut -f4 -d ' ' \
        | grep -qF -- "$COWS" || echo "$SHEEP" >>output2.txt
done < master.txt
Last edited by dru8274; 06-28-2012 at 02:12 AM.
Reason: fixup
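To make the difference between the two outputs of the script above concrete, here is a minimal sketch. The helper name compare_strategies and the sample line are made up for illustration and are not from the thread. It only assumes that the lookup files hold space-separated fields, and it leaves out the quote-stripping sed step because the sample has no quotes.

```shell
# compare_strategies KEY FILE...   (hypothetical helper, not from the thread)
# Reports whether KEY is found anywhere in the files, and whether it is
# found in the fourth space-separated field specifically.
compare_strategies() {
    key=$1
    shift
    # Strategy behind output1.txt: literal match anywhere in any file.
    if grep -qF -- "$key" "$@"; then anywhere=found; else anywhere=missing; fi
    # Strategy behind output2.txt: literal match within field 4 only.
    if cut -d' ' -f4 "$@" | grep -qF -- "$key"; then field4=found; else field4=missing; fi
    echo "anywhere: $anywhere, fourth field: $field4"
}

# Made-up sample: "ddd" is present, but in field 5 rather than field 4.
sample=$(mktemp)
printf '%s\n' 'aaa bbb ccc zzz ddd' > "$sample"
compare_strategies ddd "$sample"   # prints: anywhere: found, fourth field: missing
rm -f "$sample"
```

With data like this, a master.txt line whose fourth field is ddd would be written to output2.txt but not to output1.txt, which is why it is worth checking both files.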
Thank you Pan64, I've added your suggestions to the script. Nice compact regexps. So, version 2 of this script...
Code:
shopt -s nullglob
rm -f output*.txt
# Each line in master.txt is read and parsed individually
while IFS= read -r SHEEP ; do
    # Let COWS equal just the fourth field of the current line.
    COWS="$(printf '%s\n' "$SHEEP" | sed -n 's/^[ "]*\([^ ]\+ \+\)\{3\}\([^ ]\+\) \+[^ ].*$/\2/p')"
    if [[ -n $COWS ]] ; then
        # If $COWS does not appear in any part of any files in file*.txt, then add to output1.txt
        # (/dev/null keeps grep from reading master.txt on stdin if nullglob leaves no files)
        grep -qF -- "$COWS" file*.txt /dev/null || echo "$SHEEP" >>output1.txt
        # If $COWS isn't found in the fourth field of any file in file*.txt, then add to output2.txt
        grep -q '^[ "]*\([^ ]\+ \+\)\{3\}'"$COWS"' \+[^ ]\+[ "]\+$' file*.txt /dev/null || echo "$SHEEP" >>output2.txt
    fi
done < master.txt
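As a quick check of what the version 2 sed expression captures, the sketch below wraps it in a small function and runs it on two made-up lines. The function name fourth_field and both sample lines are assumptions for illustration only, and the \+ operator requires GNU sed.

```shell
# fourth_field is a hypothetical helper name, not from the thread.
# It applies the version 2 sed expression to one line and prints the
# captured field, or nothing when the line has fewer than five fields.
# The \+ operator is a GNU sed extension.
fourth_field() {
    printf '%s\n' "$1" |
        sed -n 's/^[ "]*\([^ ]\+ \+\)\{3\}\([^ ]\+\) \+[^ ].*$/\2/p'
}

fourth_field '"aaa bbb ccc ddd eee"'   # prints: ddd
fourth_field 'aaa bbb ccc ddd'         # prints nothing, so the line is skipped
```

Note that the expression only matches when at least one more field follows the fourth, which is why the script guards against an empty $COWS before grepping.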
You have provided some extra details, as reposted from here
Quote:
Originally Posted by ust
I have a master file; its content is as below.
#vi master.txt
"d"""""""" aaa bbb ccc ddd eee " " "
"""y""d"""" xxx yyy zzz mmm ooo " " "
""f""""""" ggg hhh iii jjj kkk " " "
I also have some text files, e.g.
#vi file1.txt
aaa
#vi file2.txt
bbb
#vi file3.txt
ccc
#vi file4.txt
ddd
I would like to check whether the fourth column (e.g. ddd, mmm, jjj) exists in these text files; if it does not exist, output that line. In this case the output would be the lines below (the first line is left out because ddd exists in file4.txt):
"""y""d"""" xxx yyy zzz mmm ooo " " "
""f""""""" ggg hhh iii jjj kkk " " "
The requirement is to check:
1) after the 9 " characters, and
2) the 4th column of the string, delimited by spaces
I'm pretty sure this cannot be done with a simple one-liner; some kind of loop is needed. So I'm going to give this one more shot, otherwise someone else may have a better answer.
First, some commands to create the master.txt and file*.txt files from your example. Then a short while loop that extracts the 4th field and prints the current line only if that field is absent from file*.txt. It should work.
Code:
# Recreate the sample files; the quoted 'EOF' keeps the text literal
cat >master.txt <<'EOF'
"d"""""""" aaa bbb ccc ddd eee " " "
"""y""d"""" xxx yyy zzz mmm ooo " " "
""f""""""" ggg hhh iii jjj kkk " " "
EOF
echo "aaa" >file1.txt
echo "bbb" >file2.txt
echo "ccc" >file3.txt
echo "ddd" >file4.txt
while IFS= read -r SHEEP ; do
    # Find the 4th field after the 9th double quote in the current line
    COWS="$(printf '%s\n' "$SHEEP" | sed -n 's/^\([^"]*"[^"]*\)\{8\}"\( \+[^ ]\+\)\{3\} \+\([^ ]\+\) .*$/\3/p')"
    # Print the current line, only if the 4th field isn't found in file*.txt
    # (-F makes grep treat $COWS as a literal string, not a regex)
    [[ -n $COWS ]] && ! grep -qF -- "$COWS" file*.txt && echo "$SHEEP"
done < master.txt
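For comparison, here is a rough alternative sketch of the same loop that lets awk split the columns instead of sed. The helper name list_missing is made up. It assumes the leading quote-laden token never contains spaces, so the wanted column is always the fifth whitespace-separated token; it does not verify the nine quotes. It also assumes each lookup file holds one value per line, which is why grep is given -x for a whole-line match.

```shell
# list_missing MASTER FILE...   (hypothetical helper, not from the thread)
# Prints each line of MASTER whose fifth whitespace-separated token is
# not a whole line in any FILE.
# The body runs in a subshell "( ... )" so its variables stay local.
list_missing() (
    master=$1
    shift
    while IFS= read -r line; do
        # Token 1 is the quote-laden prefix, so the wanted column is token 5.
        key=$(printf '%s\n' "$line" | awk '{ print $5 }')
        [ -n "$key" ] || continue
        # -F: literal string, -x: whole-line match (assumption), -q: quiet.
        # /dev/null keeps grep off stdin if no FILE is given.
        grep -qxF -- "$key" "$@" /dev/null || printf '%s\n' "$line"
    done < "$master"
)

# Demo with the sample data from the thread, in a scratch directory.
demo_dir=$(mktemp -d)
cat > "$demo_dir/master.txt" <<'EOF'
"d"""""""" aaa bbb ccc ddd eee " " "
"""y""d"""" xxx yyy zzz mmm ooo " " "
""f""""""" ggg hhh iii jjj kkk " " "
EOF
printf 'aaa\n' > "$demo_dir/file1.txt"
printf 'bbb\n' > "$demo_dir/file2.txt"
printf 'ccc\n' > "$demo_dir/file3.txt"
printf 'ddd\n' > "$demo_dir/file4.txt"
list_missing "$demo_dir/master.txt" "$demo_dir"/file*.txt
rm -r "$demo_dir"
```

On the sample data this should print the mmm and jjj lines. Drop the -x flag if a partial match inside a longer line should also count as found.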