LinuxQuestions.org
Old 06-27-2012, 06:02 AM   #1
ust
Senior Member
 
Registered: Mar 2003
Location: fasdf
Distribution: Debian / Suse /RHEL
Posts: 1,129

Rep: Reputation: 30
cut column


I have a file whose content is as below:

"""" aaa bbb ccc ddd"""
"""" xxx yyy xxx 444"""
"""" ooo ttt sss uuu"""

Can anyone advise how to cut out the last column, where the fields are delimited by " and spaces? The output should be as below.

output
======
ddd
444
uuu

Can anyone advise what I can do?

thx
 
Old 06-27-2012, 06:26 AM   #2
sycamorex
LQ Veteran
 
Registered: Nov 2005
Location: London
Distribution: Slackware64-current
Posts: 5,535
Blog Entries: 1

Rep: Reputation: 999
One way would be to print only column 5 with awk (which gives ddd""") and then remove the quotes with e.g. cut or sed.
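A minimal sketch of that approach, assuming the sample lines are saved in a file called data.dat (a name chosen here for illustration) and wrapping the pipeline in a hypothetical helper named last_column. Note that tr is used for the quote removal instead of cut or sed; sed 's/"//g' would do the same job.

```shell
# Recreate the sample input (data.dat is an assumed file name).
cat > data.dat << 'EOF'
"""" aaa bbb ccc ddd"""
"""" xxx yyy xxx 444"""
"""" ooo ttt sss uuu"""
EOF

# last_column is a hypothetical helper, not something from the thread.
# Field 1 is the leading run of quotes, so the wanted value is field 5;
# tr -d then deletes the quote characters still attached to it.
last_column() {
    awk '{ print $5 }' "$1" | tr -d '"'
}

last_column data.dat
```

This relies on awk's default whitespace splitting, so it only holds while every line has exactly four words after the leading quotes.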
 
Old 06-27-2012, 06:43 AM   #3
dru8274
Member
 
Registered: Oct 2011
Location: New Zealand
Distribution: Debian
Posts: 105

Rep: Reputation: 36
I think this might do the trick.
Code:
$ cat data.dat 
"""" aaa bbb ccc ddd"""
"""" xxx yyy xxx 444"""
"""" ooo ttt sss uuu"""
$ sed -n 's/^.* \([^ "]\+\)".*$/\1/p' data.dat 
ddd
444
uuu
 
Old 06-27-2012, 08:55 PM   #4
ust
Senior Member
 
Registered: Mar 2003
Location: fasdf
Distribution: Debian / Suse /RHEL
Posts: 1,129

Original Poster
Rep: Reputation: 30
Thanks for the replies.

Maybe I did not express my requirement clearly, and sorry, but I would also like to change the requirement a bit.

I have a file as below (e.g. master.txt); it is delimited by " and spaces.

#vi master.txt
""""""""" aaa bbb ccc ddd eee " " "
""""""""" xxx yyy zzz mmm ooo " " "
""""""""" ggg hhh iii jjj kkk " " "

I also have a set of files (e.g. file1.txt, file2.txt, ...).

What I would like to do is list the lines of master.txt whose fourth column (ddd, mmm, jjj) does not appear in any of those files. That means: search for ddd, mmm and jjj in all the .txt files, and if a value does not appear anywhere, output its line.

Assume mmm and jjj do not appear in any .txt file; then the output should be the result below.

""""""""" xxx yyy zzz mmm ooo " " "
""""""""" ggg hhh iii jjj kkk " " "


Can anyone advise how I can do this?

thanks

Last edited by ust; 06-27-2012 at 09:46 PM.
 
Old 06-28-2012, 01:40 AM   #5
dru8274
Member
 
Registered: Oct 2011
Location: New Zealand
Distribution: Debian
Posts: 105

Rep: Reputation: 36
Okay then, this might work. As I understand it, for each line in master.txt, I first need to find the 4th field, and then check whether it appears in any of the files matching file*.txt. Finally, if it doesn't appear in file*.txt, we print that line; otherwise we discard it.

I wasn't crystal clear on the requirement though... so try this small script, and then check the contents of output1.txt and output2.txt. The two outputs come from grepping the files in file*.txt slightly differently, and hopefully one of them will provide the output that you want...

Code:
shopt -s nullglob
rm output*.txt 2>/dev/null

# Each line in master.txt is read and parsed individually

while IFS= read -r SHEEP ; do

    # let COWS equal just the fourth field of the current line.

    COWS="$(echo $SHEEP | sed 's/^[ "]*//; s/[ "]*$//' | cut -f4 -d' ' )"

    # If $COWS does not appear in any part of any files 
    # in file*.txt, then add current line  to output1.txt

    grep -q "$COWS" file*.txt || echo "$SHEEP" >>output1.txt

    # If $COWS does not appear in the fourth field of any file
    # in file*.txt, then add the current line to output2.txt

    cat file*.txt | sed 's/^[ "]*//; s/[ "]*$//' | cut -f4 -d ' ' \
        | grep -q "$COWS" || echo "$SHEEP" >>output2.txt

done < master.txt

Last edited by dru8274; 06-28-2012 at 02:12 AM. Reason: fixup
 
Old 06-28-2012, 03:02 AM   #6
pan64
Senior Member
 
Registered: Mar 2012
Location: Hungary
Distribution: debian i686 (solaris)
Posts: 4,500

Rep: Reputation: 1221
I do not like those pipe chains, such as echo|sed|cut or cat|sed|cut|grep; please try to simplify them.
Code:
#instead of
#COWS="$(echo $SHEEP | sed 's/^[ "]*//; s/[ "]*$//' | cut -f4 -d' ' )"
COWS="$(echo $SHEEP | sed 's/^[ "]*\([^ ]\+ \+\)\{3\}\([^ ]\+\) \+[^ ].*$/\2/')"

#also instead of
# cat file*.txt | sed 's/^[ "]*//; s/[ "]*$//' | cut -f4 -d ' ' | grep -q "$COWS" 

grep -Eq '^[ "]*([^ ]+ +){3}'"$COWS"' +[^ ].*$' file*.txt || echo ...
This is not really tested, so it may not work, but "theoretically" it should.
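A quick way to try the grep pattern is sketched below, assuming a grep that supports -E (the pattern uses extended syntax); the file name sample.txt and the helper name in_field4 are made up for this illustration. One caveat shows up when testing: the leading `[ "]*` may match nothing at all, in which case the run of quotes itself counts as the first of the three skipped fields, so a word in the third data column is accepted as well.

```shell
# One line in the format of master.txt (sample.txt is an assumed name).
printf '%s\n' '""""""""" aaa bbb ccc ddd eee " " "' > sample.txt

# in_field4 is a hypothetical helper wrapping the pattern from this post.
# It succeeds when grep finds the given word where the pattern allows it.
in_field4() {
    grep -Eq '^[ "]*([^ ]+ +){3}'"$1"' +[^ ].*$' sample.txt
}

in_field4 ddd && echo "ddd: match"       # the intended fourth column
in_field4 ccc && echo "ccc: match too"   # quote run counted as a field
in_field4 aaa || echo "aaa: no match"
```

If that ambiguity matters for the data at hand, a field-based check such as awk's `$5 == word` avoids it, because the quote run is then always field 1.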
 
Old 06-28-2012, 07:03 AM   #7
dru8274
Member
 
Registered: Oct 2011
Location: New Zealand
Distribution: Debian
Posts: 105

Rep: Reputation: 36
Thank you Pan64, I've added your suggestions to the script. Nice compact regexps. So, version 2 of this script...
Code:
shopt -s nullglob
rm output*.txt 2>/dev/null

# Each line in master.txt is read and parsed individually

while IFS= read -r SHEEP ; do

    # let COWS equal just the fourth field of the current line.
    COWS="$(echo "$SHEEP" | sed -n 's/^[ "]*\([^ ]\+ \+\)\{3\}\([^ ]\+\) \+[^ ].*$/\2/p')"

    if [[ $COWS != "" ]] ; then

        # If $COWS does not appear in any part of any files in file*.txt, then add to output1.txt
        grep -q "$COWS" file*.txt || echo "$SHEEP" >>output1.txt

        # If $COWS isn't found in the fourth field of any file in file*.txt, then add to output2.txt
        grep -q '^[ "]*\([^ ]\+ \+\)\{3\}'"$COWS"' \+[^ ]\+[ "]\+$' file*.txt || echo "$SHEEP" >>output2.txt
    fi

done < master.txt
Happy with solution ... mark as [SOLVED]
If you want to say thanks => click the "Add to Reputation" button.
 
Old 07-02-2012, 10:42 PM   #8
dru8274
Member
 
Registered: Oct 2011
Location: New Zealand
Distribution: Debian
Posts: 105

Rep: Reputation: 36
You have provided some extra details, as reposted from here
Quote:
Originally Posted by ust
I have a master file; its content is as below.
#vi master.txt
"d"""""""" aaa bbb ccc ddd eee " " "
"""y""d"""" xxx yyy zzz mmm ooo " " "
""f""""""" ggg hhh iii jjj kkk " " "

I also have some text files, e.g.:
#vi file1.txt
aaa
#vi file2.txt
bbb
#vi file3.txt
ccc
#vi file4.txt
ddd

I would like to check whether the fourth column (e.g. ddd, mmm, jjj) exists in these text files; if it does not exist, then output that line. In this case, the output is the below (as ddd exists in file4.txt):
"""y""d"""" xxx yyy zzz mmm ooo " " "
""f""""""" ggg hhh iii jjj kkk " " "

The requirement is to check
1) after the 9 " characters, and
2) the 4th column of the string, delimited by spaces
I'm pretty sure this cannot be done with a simple one-liner; some kind of loop is needed. So I'm gonna give this one more shot, otherwise someone else may have a better answer.

Firstly, some commands to create the master.txt and file*.txt files from your example. Then a short while loop that extracts the 4th field and prints the current line only if that field is absent from file*.txt. It should work.

Code:
cat >master.txt << EOF
"d"""""""" aaa bbb ccc ddd eee " " "
"""y""d"""" xxx yyy zzz mmm ooo " " "
""f""""""" ggg hhh iii jjj kkk " " "
EOF

echo "aaa" >file1.txt
echo "bbb" >file2.txt
echo "ccc" >file3.txt
echo "ddd" >file4.txt

while IFS= read -r SHEEP ; do

    # Find the 4th field in the current line
    COWS="$(echo "$SHEEP" | sed -n 's/^\([^"]*"[^"]*\)\{8\}"\( \+[^ ]\+\)\{3\} \([^ ]\+\) .*$/\3/p' )"

    # Print the current line, only if 4th field isn't found in file*.txt
    [[ $COWS != "" ]] && ! grep -q "$COWS" file*.txt && echo "$SHEEP"

done < master.txt
 
Old 07-02-2012, 11:02 PM   #9
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,430

Rep: Reputation: 1878
Working on the new information, how about something like:
Code:
awk 'FILENAME != "master.txt"{f[$0];next}!($5 in f)' file*.txt master.txt
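The one-liner may be easier to follow when spread out with comments. The sketch below is the same idea, assuming each file*.txt holds one word per line and that master.txt is split on whitespace, so the leading quote run is field 1 and the fourth data column is $5. The scratch directory, the array name seen and the wrapper function unmatched_lines are additions made here for illustration.

```shell
# Work in an empty scratch directory so file*.txt matches only these files.
cd "$(mktemp -d)"

cat > master.txt << 'EOF'
"d"""""""" aaa bbb ccc ddd eee " " "
"""y""d"""" xxx yyy zzz mmm ooo " " "
""f""""""" ggg hhh iii jjj kkk " " "
EOF
echo aaa > file1.txt
echo bbb > file2.txt
echo ccc > file3.txt
echo ddd > file4.txt

# unmatched_lines is a hypothetical wrapper around the awk program.
unmatched_lines() {
    awk '
        # While reading the word files, remember each whole line as a key.
        FILENAME != "master.txt" { seen[$0]; next }
        # For master.txt, print the line unless field 5 was remembered.
        !($5 in seen)
    ' file*.txt master.txt
}

unmatched_lines
```

The word files have to come before master.txt on the command line, since the lookup table must be complete before the first master line is tested.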
 
  

