LinuxQuestions.org
Old 09-19-2010, 07:57 AM   #1
marietto
Member
 
Registered: Aug 2010
Posts: 96

Rep: Reputation: 17
Bash script to parse the Windows version written inside the license.rtf file


Hello,

Since I'm a total newbie at bash scripting, can someone help me write a bash script to parse the version of Windows written inside the license.rtf file? Thanks.

It starts with :

MICROSOFT SOFTWARE LICENSE TERMS

WINDOWS 7 ULTIMATE

These license terms are an agreement between Microsoft Corporation...

So I need to grab the version of Windows. In this case it is:

WINDOWS 7 ULTIMATE

Thanks in advance.
 
Old 09-19-2010, 08:04 AM   #2
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852

Rep: Reputation: 2037
So, what you need is simply the second line? That should be a piece of cake using sed or a similar tool. There are only two difficulties I see that may need to be considered first.

1. Windows text files use a different line ending than unix files. The file would need to be converted before it could be properly parsed. This is easy to do.

2. Rich text is a kind of markup language, similar to html. What you see in the display is not exactly the text the file itself contains. The parsing would have to take this into account.

Can we see the contents of the file as displayed in a regular text editor?
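
Both concerns can be checked from the shell before any parsing is attempted. The following is only a minimal sketch: it builds a small stand-in file (its contents are made up for illustration, not taken from the real license.rtf), counts the lines that end in a carriage return, and looks for the RTF signature at the start of the file.

```shell
#!/usr/bin/env bash
# Sketch: check a file for DOS line endings and an RTF header before parsing.
# The sample below is a hypothetical stand-in for license.rtf.
sample=$(mktemp)
printf '{\\rtf1\\ansi First line\\par\r\nSecond line\\par\r\n' > "$sample"

# Count the lines that end in a carriage return (DOS/Windows style).
crlf_lines=$(grep -c $'\r$' "$sample")

# RTF files begin with the literal characters {\rtf
if [ "$(head -c 5 "$sample")" = '{\rtf' ]; then
    is_rtf=yes
else
    is_rtf=no
fi

echo "lines ending in CR: $crlf_lines, RTF markup: $is_rtf"
rm -f "$sample"
```

A non-zero count means the carriage returns have to be handled, and an RTF signature means the control words have to be stripped as well.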
 
Old 09-19-2010, 08:18 AM   #3
marietto
Member
 
Registered: Aug 2010
Posts: 96

Original Poster
Rep: Reputation: 17
I need to grab the string "Windows 7 ULTIMATE" or whatever version it is. It could be

"Windows 7 STARTER"
"Windows 7 HOME BASIC"
"Windows 7 HOME PREMIUM"
"Windows 7 PROFESSIONAL"
"Windows 7 ENTERPRISE"
"Windows 7 ULTIMATE"

or

"Windows Vista HOME BASIC"
"Windows Vista HOME PREMIUM"
"Windows Vista ULTIMATE"
"Windows Vista BUSINESS"
"Windows Vista ENTERPRISE"


I took a screenshot of the license.rtf file:

http://www.flickr.com/photos/2668797...n/photostream/

Last edited by marietto; 09-19-2010 at 08:23 AM.
 
Old 09-19-2010, 08:23 AM   #4
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852

Rep: Reputation: 2037
Ok, that helps. Although it would be more convenient if you could copy the actual text here instead of making me work through a screenshot. We'd only need the text up to the line in question, and maybe one or two lines after it.

I see that the line you want is the third line. I'll work on that for now. But now the question becomes, does every version of the file conform to the same format? How flexible does the extraction parsing need to be?
 
Old 09-19-2010, 08:29 AM   #5
marietto
Member
 
Registered: Aug 2010
Posts: 96

Original Poster
Rep: Reputation: 17
These are the first lines of the file :

{\rtf1\ansi\ansicpg1252\deff0\deflang1033\deflangfe2052\deftab360{\fonttbl{\f0\fswiss\fprq2\fcharset 0 Tahoma;}}
{\*\generator Msftedit 5.41.21.2508;}\viewkind4\uc1\pard\nowidctlpar\sb120\sa120\b\f0\fs20 MICROSOFT SOFTWARE LICENSE TERMS\par
\pard\brdrb\brdrs\brdrw10\brsp20 \nowidctlpar\sb120\sa120 WINDOWS 7 ULTIMATE\par
\pard\nowidctlpar\sb120\sa120\b0 These license terms are an agreement between Microsoft Corporation (or based on where you live, one of its affiliates) and $
\pard\nowidctlpar\sb120\sa120\tx0\'b7\tab updates,\par
\'b7\tab supplements,\par
\'b7\tab Internet-based services, and\par
\'b7\tab support services\par
\pard\nowidctlpar\sb120\sa120 for this software, unless other terms accompany those items. If so, those terms apply.\par
\b By using the software, you accept these terms. If you do not accept them, do not use the software. Instead, return it to the retailer for a refund or cre$
\b As described below, using the software also operates as your consent to the transmission of certain computer information during activation, validation an$
\pard\brdrt\brdrs\brdrw10\brsp20 \nowidctlpar\sb120\sa120 If you comply with these license terms, you have the rights below for each license you acquire.\par
\pard\nowidctlpar\sb120\sa120\tx360 1.\tab OVERVIEW.\par
 
Old 09-19-2010, 08:44 AM   #6
marietto
Member
 
Registered: Aug 2010
Posts: 96

Original Poster
Rep: Reputation: 17
Whatever the version, the format is the same: the string is always on the third line, regardless of its length.
 
Old 09-19-2010, 08:59 AM   #7
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852

Rep: Reputation: 2037
Thank you. But please use [code][/code] tags around preformatted text in the future.

If you assume the text you want will always be on the third line, will always be in approximately the same format, and that it will always begin with the word WINDOWS (that's a lot of assumptions!), then this should do it.
Code:
sed -rn -e '1,3 s/.$//' -e '3 s/^.+(WINDOWS[^\]+)\\par$/\1/p' license.rtf
The first -e expression deletes the last character of each of the first three lines (the carriage return, assuming the file is in DOS format). The second -e then extracts everything from the word WINDOWS up to the final \par on the third line, and prints it only if the pattern matches.

If you need it to be more flexible than that, then you'll have to show us all the possible variations you may encounter.

And here are a few useful sed references:
http://www.grymoire.com/Unix/Sed.html
http://sed.sourceforge.net/sedfaq.html
http://sed.sourceforge.net/sed1line.txt

Edit: After thinking about it a bit, I can make it even shorter. By taking the dos-mode carriage return into account inside the main expression, I can eliminate the first one entirely. This should work no matter which format it's in:
Code:
sed -rn '3 s/^.+(WINDOWS[^\]+)\\par\r?$/\1/p' license.rtf
I'm still trying to figure out if there'd be an easy way to extract the text if it didn't start with the word "WINDOWS".
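
The claim that the shorter command works with either line-ending format can be tried on a small sample. This is a sketch under assumptions: GNU sed is required for -r and \r, the three sample lines are abbreviated from the excerpt posted earlier, extract is a hypothetical helper name, and [^\\] is used as a more defensive spelling of [^\] (both mean "any character except a backslash").

```shell
#!/usr/bin/env bash
# Sketch: run the line-3 extraction on the same text stored once with DOS
# (CRLF) and once with unix (LF) line endings. Assumes GNU sed.
extract() { sed -rn '3 s/^.+(WINDOWS[^\\]+)\\par\r?$/\1/p' "$1"; }

line1='{\rtf1\ansi{\fonttbl{\f0 Tahoma;}}'
line2='\b\f0\fs20 MICROSOFT SOFTWARE LICENSE TERMS\par'
line3='\pard\brdrb\brdrs\brdrw10\brsp20 \nowidctlpar\sb120\sa120 WINDOWS 7 ULTIMATE\par'

dos=$(mktemp)
unix=$(mktemp)
printf '%s\r\n' "$line1" "$line2" "$line3" > "$dos"    # CRLF endings
printf '%s\n'   "$line1" "$line2" "$line3" > "$unix"   # LF endings

from_dos=$(extract "$dos")
from_unix=$(extract "$unix")
printf 'DOS:  [%s]\nunix: [%s]\n' "$from_dos" "$from_unix"
rm -f "$dos" "$unix"
```

The brackets in the output make any stray trailing character easy to spot.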

Last edited by David the H.; 09-19-2010 at 09:13 AM. Reason: as above
 
1 members found this post helpful.
Old 09-19-2010, 10:09 AM   #8
marietto
Member
 
Registered: Aug 2010
Posts: 96

Original Poster
Rep: Reputation: 17
Right now it works,thanks.
 
Old 09-19-2010, 12:13 PM   #9
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852

Rep: Reputation: 2037
Just to follow up, I decided to try again with a slightly different tack, and came up with an even simpler version.
Code:
sed -rn '3 s/\\[^ ]+ ?|\r$//gp' license.rtf
This one simply removes from the line any string that starts with a \ and runs up to the next space, along with one optional trailing space, as well as any DOS carriage return at the end of the line. It also has the advantage that it can be used on any line that follows this pattern.
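
As a rough illustration of using the same substitution on other lines, here it is without the line address, wrapped in a hypothetical helper called strip_rtf. GNU sed is assumed, and the two input lines are copied from the excerpt posted earlier in the thread.

```shell
#!/usr/bin/env bash
# Sketch: apply the control-word-stripping substitution to any brace-free line.
# strip_rtf is a made-up helper name; assumes GNU sed (-r and \r).
strip_rtf() { sed -r 's/\\[^ ]+ ?|\r$//g'; }

title=$(printf '%s\r\n' '\pard\brdrb\brdrs\brdrw10\brsp20 \nowidctlpar\sb120\sa120 WINDOWS 7 ULTIMATE\par' | strip_rtf)
body=$(printf '%s\r\n' '\pard\nowidctlpar\sb120\sa120 for this software, unless other terms accompany those items. If so, those terms apply.\par' | strip_rtf)

printf '[%s]\n[%s]\n' "$title" "$body"
```

One caveat: a control word in the middle of the text is removed together with the space after it, so a line such as `1.\tab OVERVIEW.\par` comes out as `1.OVERVIEW.` rather than with a tab between the two parts.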

There's no easy way to work with lines that have {} braces, however, because they can be nested, and that's really hard to deal with in sed. For more complex operations like that, you may want to look at something like unrtf instead. I just tried it and it appears to work well, although you have to compensate for the header it adds when filtering out the line(s) you want.
Code:
unrtf --nopict --text e_file.txt 2>/dev/null | sed -n 9p
The 2>/dev/null is only there to get rid of the program info that's displayed, but since that appears on stderr, it shouldn't affect scripting much anyway. The program could really use a non-verbose option of some kind.
 
Old 09-19-2010, 02:56 PM   #10
marietto
Member
 
Registered: Aug 2010
Posts: 96

Original Poster
Rep: Reputation: 17
This also works:

mario@mario-desktop:/media/FREEDOS/sh$ cat license.rtf | grep "WINDOWS" | cut -d'\' -f9 | cut -d' ' -f "2 3 4"

WINDOWS 7 ULTIMATE

Now I have an easier problem:

I'm trying to parse the version of Windows XP written inside eula.txt. This is the beginning:

Microsoft Windows XP Home Edition

END-USER LICENSE AGREEMENT

IMPORTANT-READ CAREFULLY: This End-User

The text I need to grab is always on the first line. In this case it is:

Microsoft Windows XP Home Edition

I did :

ver_windows=$(cat /mnt/sda1/Windows/system32/eula.txt | grep "Microsoft Windows")
echo $ver_windows

Microsoft Windows XP Home Edition

if [ "$ver_windows" = "Microsoft Windows XP Home Edition" ]; then ver_windows_min="xp"
echo home
fi

Output: nothing.

I think that $ver_windows is not equal to "Microsoft Windows XP Home Edition". Why?
 
Old 09-19-2010, 03:50 PM   #11
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852

Rep: Reputation: 2037
Well, there are probably a hundred different ways to extract the string using different tools. In general though, it's more efficient to use a single program than a chain of piped commands. sed can do everything grep and cut can do, and more flexibly, so I recommend it.

As for your new request, first of all, there's no need to use cat with grep, since it can read the file directly. The majority of text-based tools can do this.
Code:
grep "Microsoft Windows" eula.txt
But I'll bet you anything that you're running across the line-ending problem again. You see, unix uses LF (line feed) for its line endings, while DOS uses CRLF (carriage return + line feed). When you grepped the file, you most likely grabbed the invisible CR along with the text, which is why the test fails. Try confirming it with this:
Code:
echo "$ver_windows" | cat -A
It will probably show you this: Microsoft Windows XP Home Edition^M$. ^M is the CR.
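
The effect is easy to reproduce without the real file. In this small sketch the value is typed in by hand with a trailing CR, standing in for what grep would return from a DOS-format file; the variable names expected, result and extra are only for illustration.

```shell
#!/usr/bin/env bash
# Sketch: a string that looks right on screen but carries a trailing CR.
expected="Microsoft Windows XP Home Edition"
ver_windows=$'Microsoft Windows XP Home Edition\r'   # hand-made stand-in

if [ "$ver_windows" = "$expected" ]; then
    result=match
else
    result="no match"
fi

# The length difference gives the hidden character away.
extra=$(( ${#ver_windows} - ${#expected} ))

# printf %q (a bash builtin) prints the value with invisible characters escaped.
printf '%q\n' "$ver_windows"
echo "comparison: $result, extra characters: $extra"
```

Note that the variable is quoted everywhere. Unquoted, the shell would split it into words and silently drop any leading or trailing spaces, which can hide exactly the kind of problem being looked for.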

Sed is again the better tool here. With its ability to change text and also target lines based on line number, it can do everything at once.
Code:
ver_windows=$(sed -n '1 s/\r$//p' /mnt/sda1/Windows/system32/eula.txt)
There are other options, such as using read to grab the first line from the file and then stripping off the invisible carriage return with a parameter expansion. Indeed, this would be the most efficient method, since it works entirely within bash.
Code:
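read -r ver_windows </mnt/sda1/Windows/system32/eula.txt   # -r keeps backslashes literal
ver_windows=${ver_windows%$'\r'}   # $'\r' is bash's ANSI-C quoting for a carriage return
# Note: the extquote shell option (on by default) must be enabled for $'\r'
# to work inside a double-quoted expansion.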
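
For anyone who wants to try the read approach without mounting a Windows partition, here is a sketch that uses a temporary stand-in for eula.txt (the three sample lines come from the excerpt above; the variable names are hypothetical). It adds IFS= so that read does not trim leading or trailing blanks, which keeps any hidden space visible instead of silently removing it.

```shell
#!/usr/bin/env bash
# Sketch: read the first line of a DOS-format file using only bash builtins.
eula=$(mktemp)   # temporary stand-in for eula.txt
printf 'Microsoft Windows XP Home Edition\r\n\r\nEND-USER LICENSE AGREEMENT\r\n' > "$eula"

# IFS= preserves leading/trailing blanks; -r keeps backslashes literal.
IFS= read -r first_line < "$eula"

# Drop one trailing carriage return, if there is one.
first_line=${first_line%$'\r'}

printf '[%s]\n' "$first_line"
rm -f "$eula"
```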
Or, since you seem to need to use Windows files a lot, consider simply converting those files over to unix mode before using them. There are several tools that can do this, such as tofrodos or flip, or the sed command I used above (remove the "1" to make it affect the whole file, drop the -n option and the p flag so that every line is written out, and add the -i option to edit in place).
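
A whole-file conversion with sed might look like the sketch below. It assumes GNU sed, since -i and \r are GNU extensions, and it works on a temporary stand-in so that nothing on a real Windows partition is modified.

```shell
#!/usr/bin/env bash
# Sketch: convert a DOS-format text file to unix line endings in place.
work=$(mktemp)   # temporary stand-in for a copy of eula.txt
printf 'Microsoft Windows XP Home Edition\r\n\r\nEND-USER LICENSE AGREEMENT\r\n' > "$work"

before=$(grep -c $'\r$' "$work" || true)   # lines ending in CR before conversion
sed -i 's/\r$//' "$work"                   # no -n and no p: every line is written back
after=$(grep -c $'\r$' "$work" || true)    # grep exits non-zero when the count is 0
lines=$(wc -l < "$work")

echo "lines with CR before: $before, after: $after, total lines: $lines"
rm -f "$work"
```

Working on a copy rather than on the original under /mnt/sda1 is probably the safer habit, since Windows itself expects the CRLF endings.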
 
Old 09-19-2010, 05:03 PM   #12
marietto
Member
 
Registered: Aug 2010
Posts: 96

Original Poster
Rep: Reputation: 17
Thanks for your help. I really don't understand.

ver_windows=$(sed -n '1 s/\r$//p' /mnt/sda1/Windows/system32/eula.txt)

--> Microsoft Windows XP Home Edition

echo $ver_windows |cat -A

--> Microsoft Windows XP Home Edition$

if [ "$ver_windows" = "Microsoft Windows XP Home Edition" ]; then ver_windows_min="xp"
echo home
fi

if [ "$ver_windows" = "Microsoft Windows XP Professional Edition" ]; then ver_windows_min="xp"
echo pro
fi

echo $ver_windows_min

---> nothing is displayed.

FIXED: there was a stupid hidden space at the end of the first line.
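
One way to guard against both the hidden space and the carriage return is to trim all trailing whitespace before comparing. This is only a sketch: the raw value is typed in by hand to imitate the first line of eula.txt, the edition variable is made up for illustration, and the two strings in the case statement are the ones used in the tests above.

```shell
#!/usr/bin/env bash
# Sketch: trim trailing whitespace, then map the version string to a short name.
raw=$'Microsoft Windows XP Home Edition \r'   # hand-made: hidden space plus CR

# ${raw##*[![:space:]]} is the run of whitespace at the end of the string;
# removing it as a suffix leaves the text without trailing space, tab or CR.
ver_windows=${raw%"${raw##*[![:space:]]}"}

case $ver_windows in
    "Microsoft Windows XP Home Edition")
        ver_windows_min="xp"; edition="home" ;;
    "Microsoft Windows XP Professional Edition")
        ver_windows_min="xp"; edition="pro" ;;
    *)
        ver_windows_min=""; edition="unknown" ;;
esac

echo "$ver_windows_min $edition"
```

A case statement also keeps the list of known versions in one place, instead of one if block per string.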

Last edited by marietto; 09-19-2010 at 05:38 PM.
 
  


All times are GMT -5.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Open Source Consulting | Domain Registration