LinuxQuestions.org


ProAm500 04-18-2013 02:46 AM

Need help with the cut command..or should I be using the grep command?
 
I'm a complete newb to Linux, working through some simple tasks using shell commands. I've been banging my head against the wall for a day and a half on this last problem and am hoping someone can help. I'm working with 2 files, each containing some lines of text.
File A contains:
Code:

cat ClassA
Smith, Joe    SR CIS
Smith, Karen  JR CS
Clark, Sue    SR CIS
Brown, Steve  JR CIS
Duck, Donald  SR CIS
Mouse, Mickey  SR CIS
Hall, Maureen  JR CS
Simon, Sam    JR CIS
Wells, John    JR CS

File B contains:
Code:

cat ClassB
Jones, Susan  JR CS
Smith, Joe    SR CIS
Smith, Karen  JR CS
Clark, Sue    SR CIS
Brown, Steve  JR CIS
Duck, Donald  SR CIS
Johnson, Bob  SR CS
Hall, Maureen  JR CS
Simon, Sam    JR CIS

The task, to be done in one command line, is: "Print just the names of the Junior CIS majors that are enrolled in both A and B. (They must be enrolled in both classes to be on the list. Only print their name once)"

I'm not sure how to match only the lines that contain both JR and CIS without displaying duplicate names. I think I'm supposed to use the cut command, or grep piped into cut, but I'm not sure where to start. Any help that gets me going in the right direction would be great. Thanks.

druuna 04-18-2013 03:06 AM

I would use grep (have a look at the -f option) to get the lines the two files have in common, then pipe them through awk to pick out the JR CIS entries and print the names.

Zzipo 04-18-2013 03:28 AM

Another option, without using awk, takes three steps.

First, for each file: (grep) create a new file (file1new and file2new) containing only the lines that match "JR CIS" -> O(n)

Second, for each new file: (sort) sort the lines by name (first column) -> O(nlog(n))

Third: (comm -1 -2 file1new file2new) print just the lines common to both. -> O(n)

So, O(nlog(n)) overall.
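The three steps above can be sketched as follows. This is only an illustration under assumptions: the sample data is copied from the first post, the scratch directory and the final cut (to keep just the names) are additions, and file1new and file2new are the names suggested in the steps.

```shell
#!/bin/sh
# Work in a scratch directory so nothing in the current directory is touched.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Sample data as posted at the top of the thread.
cat > ClassA <<'EOF'
Smith, Joe    SR CIS
Smith, Karen  JR CS
Clark, Sue    SR CIS
Brown, Steve  JR CIS
Duck, Donald  SR CIS
Mouse, Mickey  SR CIS
Hall, Maureen  JR CS
Simon, Sam    JR CIS
Wells, John    JR CS
EOF
cat > ClassB <<'EOF'
Jones, Susan  JR CS
Smith, Joe    SR CIS
Smith, Karen  JR CS
Clark, Sue    SR CIS
Brown, Steve  JR CIS
Duck, Donald  SR CIS
Johnson, Bob  SR CS
Hall, Maureen  JR CS
Simon, Sam    JR CIS
EOF

# Steps 1 and 2, once per file: keep the "JR CIS" lines, then sort them.
grep 'JR CIS' ClassA | sort > file1new
grep 'JR CIS' ClassB | sort > file2new

# Step 3: comm -12 suppresses columns 1 and 2, leaving only the lines
# present in both sorted files. The cut (an addition) keeps the names.
result=$(comm -12 file1new file2new | cut -d' ' -f1,2)
printf '%s\n' "$result"
```

Note that comm expects both inputs to be sorted, which is why the sort step cannot be skipped.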

catkin 04-18-2013 05:40 AM

How about grepping both files for "JR CS" and using uniq (with the magic --repeated option)?

In essence that works but needs a little refinement. The most elegant I got was three commands in a pipeline with no temporary files.

Zzipo 04-18-2013 06:10 AM

I didn't know about uniq.

But I don't know how you can do it without sorting first.

I reached this:
Code:

grep -h "JR CIS" test1file.txt test2file.txt | sort | uniq -d

But (uniq -d) doesn't work without running (sort) first.

Ok, this sequence is also in 3 steps and also O(nlog(n)).

Is there any way to do it in just two steps?
Because you said "grep and uniq".

Ah, when you said "JR CS" I think it was a typo for "JR CIS".
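A small demonstration of why the sort is needed: uniq only compares adjacent lines, and grep prints all matches from the first file before any from the second, so the duplicates never sit next to each other. This is a sketch using the sample data from the first post; the scratch directory and variable names are assumptions, and -h is the GNU/BSD grep option already used above.

```shell
#!/bin/sh
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Sample data as posted at the top of the thread.
cat > ClassA <<'EOF'
Smith, Joe    SR CIS
Smith, Karen  JR CS
Clark, Sue    SR CIS
Brown, Steve  JR CIS
Duck, Donald  SR CIS
Mouse, Mickey  SR CIS
Hall, Maureen  JR CS
Simon, Sam    JR CIS
Wells, John    JR CS
EOF
cat > ClassB <<'EOF'
Jones, Susan  JR CS
Smith, Joe    SR CIS
Smith, Karen  JR CS
Clark, Sue    SR CIS
Brown, Steve  JR CIS
Duck, Donald  SR CIS
Johnson, Bob  SR CS
Hall, Maureen  JR CS
Simon, Sam    JR CIS
EOF

# Without sort: grep emits Brown, Simon (ClassA) and then Brown, Simon
# (ClassB), so no two identical lines are adjacent and uniq -d prints nothing.
unsorted=$(grep -h 'JR CIS' ClassA ClassB | uniq -d)

# With sort: identical lines end up next to each other, so uniq -d
# reports each repeated line once.
sorted=$(grep -h 'JR CIS' ClassA ClassB | sort | uniq -d)

printf 'without sort: [%s]\n' "$unsorted"
printf 'with sort:\n%s\n' "$sorted"
```

Replacing "sort | uniq -d" with "sort -u" would not be equivalent: sort -u keeps one copy of every line, including lines that appear in only one file.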

shivaa 04-18-2013 06:15 AM

Do you want something like this?
Code:

~$ grep 'CIS' fileA | grep 'JR' | cut -d" " -f1,2

druuna 04-18-2013 06:25 AM

Well, I thought this looked like homework and provided a hint in my previous post.

Now that I see multiple full examples, have a look at this:
Code:

grep -f fileA fileB | awk '/JR CIS/ { print $1, $2 }'
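One caveat with grep -f: each line of the pattern file is treated as a regular expression and may match anywhere in a line. A hedged variation adds -F (fixed strings) and -x (whole-line match), both standard grep options, so only lines that are identical in the two files are kept. The sketch below uses the sample data and the ClassA/ClassB file names from the first post; the scratch directory and variable name are assumptions.

```shell
#!/bin/sh
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Sample data as posted at the top of the thread.
cat > ClassA <<'EOF'
Smith, Joe    SR CIS
Smith, Karen  JR CS
Clark, Sue    SR CIS
Brown, Steve  JR CIS
Duck, Donald  SR CIS
Mouse, Mickey  SR CIS
Hall, Maureen  JR CS
Simon, Sam    JR CIS
Wells, John    JR CS
EOF
cat > ClassB <<'EOF'
Jones, Susan  JR CS
Smith, Joe    SR CIS
Smith, Karen  JR CS
Clark, Sue    SR CIS
Brown, Steve  JR CIS
Duck, Donald  SR CIS
Johnson, Bob  SR CS
Hall, Maureen  JR CS
Simon, Sam    JR CIS
EOF

# -f ClassA : read the patterns from ClassA, one per line
# -F        : treat each pattern as a literal string, not a regex
# -x        : a pattern must match the entire line
# awk then keeps the JR CIS lines and prints the first two fields.
common=$(grep -Fxf ClassA ClassB | awk '/JR CIS/ { print $1, $2 }')
printf '%s\n' "$common"
```

With this data the plain -f version gives the same answer; the extra options only matter if a line contains regex metacharacters or is a substring of another line.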

ProAm500 04-18-2013 12:52 PM

Thanks guys, I'll give them all a shot. And yes, it is a question that's part of an assignment, but like I said, this is the last problem on it, and I've been agonizing over it for almost 2 days lol. I'm still a complete beginner with Linux shell commands, but I'm getting it, and having some fun with it too.

chrism01 04-18-2013 07:37 PM

Here are some good links to bookmark & read
http://rute.2038bug.com/index.html.gz
http://tldp.org/LDP/Bash-Beginners-G...tml/index.html
http://www.tldp.org/LDP/abs/html/

eklavya 04-19-2013 05:23 AM

I know you got your solution but it seems an interesting problem, that's why I tried it.
You said you are trying with grep & cut so I tried using that.
Code:

grep -f ClassA ClassB | grep "JR.* CIS" | cut -d' ' -f1,2

ProAm500 04-20-2013 04:48 PM

Thanks guys. I tried pretty much all of your solutions and different variations of them, and for most of them I kept getting the same answer in some form:
Code:

Brown, Steve
Simon, Sam

I believe the answer should be something like this, at least in some form:
Code:

Smith, Joe    SR CIS
Clark, Sue    SR CIS
Brown, Steve  JR CIS
Duck, Donald  SR CIS
Simon, Sam    JR CIS

My instructor handed it back and told me to give it another shot. Her hint was "so start with sorting the files, you need two grep commands to pull in only CIS and JR, you should only count the records once (the unique command) and then use the cut command to list only the names". I'm going to keep playing around with it and see what I get.
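One possible reading of that hint, as a single pipeline: sort both files together, filter with two greps, let uniq -d keep only the lines that occur twice (once per class), and cut out the name fields. This is a sketch of the hint as quoted, not necessarily the answer the instructor had in mind; the scratch directory and variable name are assumptions, and the data is the sample from the first post.

```shell
#!/bin/sh
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Sample data as posted at the top of the thread.
cat > ClassA <<'EOF'
Smith, Joe    SR CIS
Smith, Karen  JR CS
Clark, Sue    SR CIS
Brown, Steve  JR CIS
Duck, Donald  SR CIS
Mouse, Mickey  SR CIS
Hall, Maureen  JR CS
Simon, Sam    JR CIS
Wells, John    JR CS
EOF
cat > ClassB <<'EOF'
Jones, Susan  JR CS
Smith, Joe    SR CIS
Smith, Karen  JR CS
Clark, Sue    SR CIS
Brown, Steve  JR CIS
Duck, Donald  SR CIS
Johnson, Bob  SR CS
Hall, Maureen  JR CS
Simon, Sam    JR CIS
EOF

# sort        : merge both files so identical lines become adjacent
# grep CIS/JR : the two filters from the hint
# uniq -d     : print one copy of each line that is repeated,
#               i.e. students listed in both classes
# cut         : fields 1 and 2 (space-delimited) are "Last," and "First"
result=$(sort ClassA ClassB | grep 'CIS' | grep 'JR' | uniq -d | cut -d' ' -f1,2)
printf '%s\n' "$result"
```

The plain 'JR' pattern would also match a name containing those letters; none of the sample names do.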

shivaa 04-20-2013 11:17 PM

To find the unique JR CIS entries, try this:
Code:

~$ grep 'JR' ClassA ClassB | grep 'CIS' | cut -d':' -f2 | sort -u
OR
~$ grep 'JR CIS' ClassA ClassB | cut -d':' -f2 | sort -u

The same could be done easily with other commands such as awk, but given your instructor's hint, you should try this.
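For anyone wondering what the cut -d':' -f2 is for: when grep is given more than one file, it prefixes every matching line with the file name and a colon, and the cut strips that prefix again. A sketch showing the prefix and the -h alternative (a GNU/BSD grep option that suppresses it), using the sample data from the first post with the ClassA/ClassB names; the scratch directory and variable names are assumptions.

```shell
#!/bin/sh
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Sample data as posted at the top of the thread.
cat > ClassA <<'EOF'
Smith, Joe    SR CIS
Smith, Karen  JR CS
Clark, Sue    SR CIS
Brown, Steve  JR CIS
Duck, Donald  SR CIS
Mouse, Mickey  SR CIS
Hall, Maureen  JR CS
Simon, Sam    JR CIS
Wells, John    JR CS
EOF
cat > ClassB <<'EOF'
Jones, Susan  JR CS
Smith, Joe    SR CIS
Smith, Karen  JR CS
Clark, Sue    SR CIS
Brown, Steve  JR CIS
Duck, Donald  SR CIS
Johnson, Bob  SR CS
Hall, Maureen  JR CS
Simon, Sam    JR CIS
EOF

# With two files, each match is printed as "filename:line".
with_prefix=$(grep 'JR CIS' ClassA ClassB)

# cut -d':' -f2 keeps the text between the first and second colon,
# which here is the original line.
stripped=$(grep 'JR CIS' ClassA ClassB | cut -d':' -f2)

# -h tells grep not to print the file name in the first place.
no_prefix=$(grep -h 'JR CIS' ClassA ClassB)

printf '%s\n' "$with_prefix"
```

The two approaches agree on this data, but cut -d':' -f2 would truncate any line that itself contains a colon, which -h avoids.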

ProAm500 04-30-2013 12:54 PM

Just to wrap this up and put a bow on this topic: I used the following command and am still not 100% sure it's right, but I got full credit, so it works for me for now, lol. I'm working on the second part of the assignment and will have some questions, so be prepared. Thanks for the help, guys!

Code:

grep 'JR CIS' ClassA ClassB | cut -d':' -f2 | sort -u
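On the "not 100% sure" point: sort -u keeps one copy of every matching line, so this command lists JR CIS students found in either file, whereas sort | uniq -d keeps only lines found in both. With the sample data the two happen to agree. The sketch below uses trimmed files plus one hypothetical student ("Extra, Pat", not in the original data) who is enrolled only in ClassA, to show where they differ; the trailing cut for the names and the variable names are also assumptions.

```shell
#!/bin/sh
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Trimmed sample data. "Extra, Pat" is a hypothetical student added
# for illustration and appears only in ClassA.
cat > ClassA <<'EOF'
Brown, Steve  JR CIS
Simon, Sam    JR CIS
Extra, Pat    JR CIS
EOF
cat > ClassB <<'EOF'
Brown, Steve  JR CIS
Simon, Sam    JR CIS
EOF

# sort -u: one copy of every matching line, from either file.
either=$(grep 'JR CIS' ClassA ClassB | cut -d':' -f2 | sort -u | cut -d' ' -f1,2)

# sort | uniq -d: only lines that occur more than once, i.e. in both files.
both=$(grep -h 'JR CIS' ClassA ClassB | sort | uniq -d | cut -d' ' -f1,2)

printf 'in either class:\n%s\n' "$either"
printf 'in both classes:\n%s\n' "$both"
```

Here the sort -u version includes Extra, Pat, while the uniq -d version lists only Brown, Steve and Simon, Sam.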

danielbmartin 04-30-2013 04:35 PM

This awk ...
Code:

awk 'NR==FNR{if ($0~"JR CIS") a[$1$2];next}
  $1$2 in a{print $1,$2}' $InFile1 $InFile2 >$OutFile

... produces this result ...
Code:

Brown, Steve
Simon, Sam

... and this awk ...
Code:

awk 'NR==FNR{if ($0~"JR CIS") a[$1$2];next}
  $1$2 in a{print}' $InFile1 $InFile2 >$OutFile

... produces this result ...
Code:

Brown, Steve  JR CIS
Simon, Sam    JR CIS

Daniel B. Martin
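For readers new to the NR==FNR idiom, here is a commented sketch of the first awk command run against the sample data from the first post. NR counts records across all input files and FNR restarts at 1 for each file, so the two are equal only while the first file is being read. The array name "seen", the scratch directory, and the use of ClassA/ClassB in place of $InFile1/$InFile2 are assumptions made for illustration.

```shell
#!/bin/sh
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Sample data as posted at the top of the thread.
cat > ClassA <<'EOF'
Smith, Joe    SR CIS
Smith, Karen  JR CS
Clark, Sue    SR CIS
Brown, Steve  JR CIS
Duck, Donald  SR CIS
Mouse, Mickey  SR CIS
Hall, Maureen  JR CS
Simon, Sam    JR CIS
Wells, John    JR CS
EOF
cat > ClassB <<'EOF'
Jones, Susan  JR CS
Smith, Joe    SR CIS
Smith, Karen  JR CS
Clark, Sue    SR CIS
Brown, Steve  JR CIS
Duck, Donald  SR CIS
Johnson, Bob  SR CS
Hall, Maureen  JR CS
Simon, Sam    JR CIS
EOF

names=$(awk '
  # First file only: remember the name key of every JR CIS line.
  # Merely referencing seen[key] creates the array element.
  NR == FNR { if ($0 ~ /JR CIS/) seen[$1 $2]; next }

  # Second file: print the name if its key was stored from the first file.
  $1 $2 in seen { print $1, $2 }
' ClassA ClassB)
printf '%s\n' "$names"
```

Note that the second rule checks only the name, not the year or major on the second file's line, so it relies on a student's record being the same in both files.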

ProAm500 04-30-2013 05:42 PM

Quote:

Originally Posted by danielbmartin (Post 4942192)

That's what my command got.


All times are GMT -5.