LinuxQuestions.org


ShanxT 03-10-2009 01:57 AM

Using grep to filter phrase before a space
 
OK, I downloaded the latest KNOPPIX CD and thought it would be cool to run md5sum on it, even though BitTorrent already verifies the checksums.

So after running md5sum and saving the output to "sum1", I do a
Code:

diff KNO*.md5 sum1
and the files aren't the same, simply because the md5 checksum file that comes with KNOPPIX looks like this:
Code:

d642d524dd2187834a418710001bbf82 *KNOPPIX_V6.0.1CD-2009-02-08-EN.iso
and "sum1" looks like this:
Code:

d642d524dd2187834a418710001bbf82 KNOPPIX_V6.0.1CD-2009-02-08-EN.iso
Notice the missing asterisk.

So how do I use grep to take only the text before the first space, so that in both cases I get just the checksum value and not the *KNOP.. or KNOP.. part?

It obviously isn't a critically important question, I'm just learning my way around the commands.. So just playing around to see what's possible.. Can anyone help?

syg00 03-10-2009 02:24 AM

Have you considered adding the asterisk to sum1 (given that you created that file)? That saves creating new files to diff, and the regex for that using sed will be much simpler.

indeliblestamp 03-10-2009 02:27 AM

That asterisk is there because md5sum was run with the '-b' argument (binary mode). Since you already have the md5 checksum file, you can verify the ISO file directly by running
Code:

md5sum -c KNO*md5
And it should give you something like KNOPPIX_V6.0.1CD-2009-02-08-EN.iso: OK
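
To try the check without the full ISO at hand, here is a minimal sketch that builds a small stand-in file, writes its checksum with '-b', and verifies it with '-c'. The names demo.iso and demo.md5 are made up for illustration and are not part of the KNOPPIX download.

```shell
# Scratch directory so nothing in the current directory is touched.
workdir=$(mktemp -d)

# Hypothetical stand-in for the ISO; any file works for the demonstration.
printf 'stand-in for the ISO contents\n' > "$workdir/demo.iso"

# Write the checksum file in binary mode (-b), as the KNOPPIX file appears to be.
(cd "$workdir" && md5sum -b demo.iso > demo.md5)

# -c re-reads the file named in demo.md5 and compares the checksums.
verify_output=$(cd "$workdir" && md5sum -c demo.md5)
echo "$verify_output"

rm -rf "$workdir"
```

md5sum -c accepts checksum lines with or without the asterisk, so the difference that trips up diff does not matter here.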

SkyEye 03-10-2009 02:53 AM

You can use the cut command to get only the text you want from the files. However, that alone will not solve your problem, as diff only takes files as parameters (or STDIN, by using the "-" parameter).

Eg:
$ cut -d" " -f1 KNO*.md5

The above command will give you the output of d642d524dd2187834a418710001bbf82

What it does is take the first column (-f1) of the file (KNO*.md5), where the column delimiter (-d) is a space (" "). The command could alternatively be written $ cut -d\  -f1 KNO*.md5 (notice the extra space after the "\")
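
One way to combine cut with diff's "-" parameter, sketched under the assumption that the two files contain the lines quoted earlier in this thread. The scratch directory and the names official.md5 and sum1.hash are made up for the example.

```shell
# Recreate the two checksum lines from this thread in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'd642d524dd2187834a418710001bbf82 *KNOPPIX_V6.0.1CD-2009-02-08-EN.iso' > "$tmpdir/official.md5"
printf '%s\n' 'd642d524dd2187834a418710001bbf82 KNOPPIX_V6.0.1CD-2009-02-08-EN.iso' > "$tmpdir/sum1"

# Keep only the checksum column of sum1 in a helper file ...
cut -d' ' -f1 "$tmpdir/sum1" > "$tmpdir/sum1.hash"

# ... and pipe the checksum column of the official file into diff,
# which reads it from STDIN because of the "-" argument.
if cut -d' ' -f1 "$tmpdir/official.md5" | diff - "$tmpdir/sum1.hash" > /dev/null; then
    result="checksums match"
else
    result="checksums differ"
fi
echo "$result"

rm -rf "$tmpdir"
```

diff exits with status 0 when its two inputs are identical, which is what the if statement tests.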


If you want to do more with this, it's better to look into awk too.
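
For comparison, a rough sketch of the same extraction with grep (as the original question asked) and with awk. It assumes GNU grep, whose -o option prints only the matched part of a line, and it uses the checksum line quoted earlier in the thread as sample input.

```shell
# Sample input: the line from the KNOPPIX .md5 file quoted in this thread.
line='d642d524dd2187834a418710001bbf82 *KNOPPIX_V6.0.1CD-2009-02-08-EN.iso'

# grep -o prints only the match: the run of non-space characters at line start.
with_grep=$(printf '%s\n' "$line" | grep -o '^[^ ]*')

# awk splits each line on whitespace by default; $1 is the first field.
with_awk=$(printf '%s\n' "$line" | awk '{print $1}')

echo "$with_grep"
echo "$with_awk"
```

Both commands print only the checksum, so either could stand in for cut in the comparison above.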

ShanxT 03-12-2009 12:48 PM

syg00! Yes, I know I can manually add the asterisk! hehehe.. I'm just learning to manipulate the commands in linux, so wanted to know if it's possible to get a perfect diff without changing it manually..

@arungoodboy
I didn't know md5sum would check it automatically.. Thanks! I thought I would always have to at least visibly compare checksums.. Makes life much easier.. Who says linux isn't user-friendly?? :)

@SkyEye
Yes! That's exactly what I wanted! Thank you! And yes, you're right.. diff needs two file names as parameters.. thanks for pointing that out. Although I shall look into the 'cut' command. Looks very interesting.. I'm guessing in "cut -d\ -f1 KNO*.md5", the backslash is the escape character. I think 'grep' and 'cut' can work quite well with each other..
I started learning python mainly for scripting purposes, but I realised it's much more powerful than I thought.. I'll finish that first, but I'll definitely look into awk and sed...

SkyEye 03-13-2009 01:20 AM

@ShanxT, yes, the backslash is the escape character. As for sed and awk, it'd be great to learn them. I never got to learn sed and awk much, since I switched to Ruby when my scripting needs got complicated. I'm a happy camper, as there's an awful lot of SysAdmin tools in Ruby (probably the best collection). Glad to know you are learning Python too.


All times are GMT -5.