cut column

ust · 06-27-2012, 06:02 AM

I have a file with the following content:

"""" aaa bbb ccc ddd"""
"""" xxx yyy xxx 444"""
"""" ooo ttt sss uuu"""

Can anyone advise how to cut out the column delimited by " and space, so that the output is as below?

output
======
ddd
444
uuu

Can anyone advise what I can do?

Thanks

sycamorex · 06-27-2012, 06:26 AM

One way would be to print only column 5 with awk (ddd""") and then remove the quotes with e.g. cut or sed.
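A minimal sketch of that two-step idea on the sample data, assuming the data sits in a file called data.dat (the name used later in this thread) and using sed for the quote removal:

Code:

```shell
cd "$(mktemp -d)"    # scratch directory so no real files are touched

# Sample data from the question
cat > data.dat << 'EOF'
"""" aaa bbb ccc ddd"""
"""" xxx yyy xxx 444"""
"""" ooo ttt sss uuu"""
EOF

# awk splits on blanks, so column 5 is e.g. ddd""" ;
# sed then deletes every double quote from it.
awk '{ print $5 }' data.dat | sed 's/"//g'
```

This relies on the leading """" containing no spaces, so that it counts as column 1.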

dru8274 · 06-27-2012, 06:43 AM

I think this might do the trick.

Code:

$ cat data.dat 
"""" aaa bbb ccc ddd"""
"""" xxx yyy xxx 444"""
"""" ooo ttt sss uuu"""
$ sed -n 's/^.* \([^ "]\+\)".*$/\1/p' data.dat 
ddd
444
uuu
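Two alternative sketches of the same extraction, assuming the data.dat layout above. The first treats " and space literally as the delimiters; the second is the same sed idea without \+, which is a GNU sed extension rather than POSIX:

Code:

```shell
cd "$(mktemp -d)"    # scratch directory for the demo

cat > data.dat << 'EOF'
"""" aaa bbb ccc ddd"""
"""" xxx yyy xxx 444"""
"""" ooo ttt sss uuu"""
EOF

# Any run of " and/or spaces is one separator. The leading """" makes $1
# empty and the trailing """ makes the last field empty, so the wanted
# word is $5 (equivalently $(NF-1)).
awk -F'[" ]+' '{ print $5 }' data.dat

# Portable variant of the sed above: [^ "][^ "]* means
# "one or more characters that are neither space nor quote".
sed -n 's/^.* \([^ "][^ "]*\)".*$/\1/p' data.dat
```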

ust · 06-27-2012, 08:55 PM

Thanks for the replies.

Maybe I did not express my requirement clearly; sorry, but I would like to change the requirement a bit.

I have a file like the one below (e.g. master.txt); it is delimited by " and spaces.

#vi master.txt
""""""""" aaa bbb ccc ddd eee " " "
""""""""" xxx yyy zzz mmm ooo " " "
""""""""" ggg hhh iii jjj kkk " " "

I also have a set of files (e.g. file1.txt, file2.txt ...).

What I would like to do is list the lines of master.txt whose fourth column (ddd, mmm, jjj) does not appear in any of these files. That means: search for ddd, mmm and jjj in all the .txt files, and if one does not appear, output its line.

Assuming mmm and jjj do not appear in any .txt file, the output should be:

""""""""" xxx yyy zzz mmm ooo " " "
""""""""" ggg hhh iii jjj kkk " " "

Can anyone advise how I can do this?

Thanks

dru8274 · 06-28-2012, 01:40 AM

Okay then, this might work. As I understand it, for each line in master.txt I first need to find the 4th field, then check whether it appears in any of the files matching file*.txt. Finally, if it doesn't appear in file*.txt, we print that line; otherwise we discard it.

I'm not entirely sure I understood you, though, so try this small script and then check the contents of output1.txt and output2.txt. They grep the files in file*.txt slightly differently, and hopefully one of them will give the output you want...

Code:

shopt -s nullglob
rm output*.txt 2>/dev/null

# Each line in master.txt is read and parsed individually

while IFS= read -r SHEEP ; do

    # let COWS equal just the fourth field of the current line.

    COWS="$(echo $SHEEP | sed 's/^[ "]*//; s/[ "]*$//' | cut -f4 -d' ' )"

    # If $COWS does not appear in any part of any file
    # in file*.txt, then add the current line to output1.txt

    grep -q "$COWS" file*.txt || echo "$SHEEP" >>output1.txt

    # If $COWS does not appear in the fourth field of any file
    # in file*.txt, then add the current line to output2.txt

    cat file*.txt | sed 's/^[ "]*//; s/[ "]*$//' | cut -f4 -d ' ' \
        | grep -q "$COWS" || echo "$SHEEP" >>output2.txt

done < master.txt
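One thing to watch with grep -q "$COWS": the value is treated as a regular expression and matched anywhere in a line, so a short 4th field such as mm would count as found in a file containing mmm. A quick sketch with a hypothetical file1.txt shows the difference that -F (fixed string) and -w (whole words) make; note that -w is supported by GNU and BSD grep but is not in POSIX.

Code:

```shell
cd "$(mktemp -d)"            # scratch directory so no real files are touched
printf 'mmm\n' > file1.txt   # hypothetical file containing the word mmm
COWS=mm                      # hypothetical 4th field that is NOT a listed word

# Plain grep: $COWS is a regex matched anywhere in a line,
# so "mm" counts as found because it occurs inside "mmm".
grep -q "$COWS" file*.txt && echo "plain grep: found"

# -F: fixed string, no regex metacharacters; -w: whole words only;
# --: stop option parsing in case $COWS starts with a dash.
grep -qwF -- "$COWS" file*.txt || echo "grep -wF: not found"
```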

pan64 · 06-28-2012, 03:02 AM

I don't like those pipe chains (like echo|sed|cut or cat|sed|cut|grep); please try to simplify them.

Code:

#instead of
#COWS="$(echo $SHEEP | sed 's/^[ "]*//; s/[ "]*$//' | cut -f4 -d' ' )"
COWS="$(echo $SHEEP | sed 's/^[ "]*\([^ ]\+ \+\)\{3\}\([^ ]\+\) \+[^ ].*$/\2/')"

#also instead of
# cat file*.txt | sed 's/^[ "]*//; s/[ "]*$//' | cut -f4 -d ' ' | grep -q "$COWS" 

grep -Eq '^[ "]*([^ ]+ +){3}'"$COWS"' +[^ ].*$' file*.txt || echo ...

This is not really tested, so it may not work, but "theoretically" it should.
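A small self-contained check of these two expressions on made-up data (the file1.txt content is hypothetical). It closes the command substitution and uses grep -E, since ( ), + and {3} are extended-regex syntax; GNU sed is assumed because of \+.

Code:

```shell
cd "$(mktemp -d)"    # scratch directory for the demo

# Sample master.txt in the layout from this thread
cat > master.txt << 'EOF'
""""""""" aaa bbb ccc ddd eee " " "
""""""""" xxx yyy zzz mmm ooo " " "
""""""""" ggg hhh iii jjj kkk " " "
EOF
# Hypothetical file1.txt whose 4th field is ddd
echo '""""""""" qqq rrr sss ddd ttt " " "' > file1.txt

while IFS= read -r SHEEP ; do
    # 4th field after the leading quotes (GNU sed, because of \+)
    COWS="$(echo $SHEEP | sed 's/^[ "]*\([^ ]\+ \+\)\{3\}\([^ ]\+\) \+[^ ].*$/\2/')"
    # ( ), + and {3} are ERE syntax, so this grep needs -E
    grep -Eq '^[ "]*([^ ]+ +){3}'"$COWS"' +[^ ].*$' file*.txt || echo "$SHEEP"
done < master.txt > missing.txt

cat missing.txt    # the mmm and jjj lines
```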

dru8274 · 06-28-2012, 07:03 AM

Thank you Pan64, I've added your suggestions to the script. Nice compact regexps. So, version 2 of this script...

Code:

shopt -s nullglob
rm output*.txt 2>/dev/null

# Each line in master.txt is read and parsed individually

while IFS= read -r SHEEP ; do

    # let COWS equal just the fourth field of the current line.
    COWS="$(echo $SHEEP | sed -n 's/^[ "]*\([^ ]\+ \+\)\{3\}\([^ ]\+\) \+[^ ].*$/\2/p')"

    if [[ $COWS != "" ]] ; then

        # If $COWS does not appear in any part of any files in file*.txt, then add to output1.txt
        grep -q "$COWS" file*.txt || echo "$SHEEP" >>output1.txt

        # If $COWS isn't found in the fourth field of any file in file*.txt, then add to output2.txt
        grep -q '^[ "]*\([^ ]\+ \+\)\{3\}'"$COWS"' \+[^ ]\+[ "]\+$' file*.txt || echo "$SHEEP" >>output2.txt
    fi

done < master.txt


dru8274 · 07-02-2012, 10:42 PM

You have provided some extra details, as reposted from here

Quote:

Originally Posted by ust

I have a master file with the following content:
#vi master.txt
"d"""""""" aaa bbb ccc ddd eee " " "
"""y""d"""" xxx yyy zzz mmm ooo " " "
""f""""""" ggg hhh iii jjj kkk " " "

I also have some text files, e.g.:
#vi file1.txt
aaa
#vi file2.txt
bbb
#vi file3.txt
ccc
#vi file4.txt
ddd

I would like to check whether the fourth column (e.g. ddd, mmm, jjj) exists in these text files; if it does not, output that line. In this case the output should be the following (as ddd exists in file4.txt):
"""y""d"""" xxx yyy zzz mmm ooo " " "
""f""""""" ggg hhh iii jjj kkk " " "

The requirement is to check:
1) only the part after the 9th ", and
2) the 4th column of that string, delimited by spaces.

I'm pretty sure this can't be done with a simple one-liner; some kind of loop is needed. So I'll give this one more shot; otherwise, someone else may have a better answer.

First, some commands to create master.txt and the file*.txt files from your example. Then a short while loop that extracts the 4th field and prints the current line only if that field is absent from file*.txt. It should work.

Code:

cat >master.txt << EOF
"d"""""""" aaa bbb ccc ddd eee " " "
"""y""d"""" xxx yyy zzz mmm ooo " " "
""f""""""" ggg hhh iii jjj kkk " " "
EOF

echo "aaa" >file1.txt
echo "bbb" >file2.txt
echo "ccc" >file3.txt
echo "ddd" >file4.txt

while IFS= read -r SHEEP ; do

    # Find the 4th field in the current line
    COWS="$(echo $SHEEP | sed -n 's/^\([^"]*"[^"]*\)\{8\}"\( \+[^ ]\+\)\{3\} \([^ ]\+\) .*$/\3/p' )"

    # Print the current line, only if 4th field isn't found in file*.txt
    [[ $COWS != "" ]] && ! grep -q "$COWS" file*.txt && echo "$SHEEP"

done < master.txt
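The "after the 9th quote, then 4th column" rule can also be checked on its own, without a loop. A minimal sketch, assuming every line has at least nine leading double quotes as in the example:

Code:

```shell
cd "$(mktemp -d)"    # scratch directory for the demo

cat > master.txt << 'EOF'
"d"""""""" aaa bbb ccc ddd eee " " "
"""y""d"""" xxx yyy zzz mmm ooo " " "
""f""""""" ggg hhh iii jjj kkk " " "
EOF

# 1) Delete everything up to and including the 9th double quote.
#    Each \([^"]*"\) unit ends at exactly one quote, so \{9\} stops at the 9th.
# 2) What is left starts with " aaa bbb ccc ddd ...", so awk's $4 is the
#    4th space-delimited column.
sed 's/^\([^"]*"\)\{9\}//' master.txt | awk '{ print $4 }'
```

This only extracts the column (ddd, mmm, jjj); the lookup against file*.txt is still done separately.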

grail · 07-02-2012, 11:02 PM

Working on the new information, how about something like:

Code:

awk 'FILENAME != "master.txt"{f[$0];next}!($5 in f)' file*.txt master.txt
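The same one-liner spread out with comments and run on the sample files from ust's last post, as a sketch of how it works:

Code:

```shell
cd "$(mktemp -d)"    # scratch directory for the demo

cat > master.txt << 'EOF'
"d"""""""" aaa bbb ccc ddd eee " " "
"""y""d"""" xxx yyy zzz mmm ooo " " "
""f""""""" ggg hhh iii jjj kkk " " "
EOF
echo "aaa" > file1.txt
echo "bbb" > file2.txt
echo "ccc" > file3.txt
echo "ddd" > file4.txt

# file*.txt must come before master.txt, so the array is filled first.
awk '
    FILENAME != "master.txt" { f[$0]; next }   # word files: store each whole line as an array key
    !($5 in f)                                 # master.txt: print the line when field 5 is not a key
' file*.txt master.txt
```

Because the quote prefix contains no spaces, awk's default splitting makes it field 1, so the 4th word is $5. The lookup compares whole lines, so dd would not count as found in a file containing ddd, unlike a substring grep -q.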