[SOLVED] Bash script to parse the Windows version written inside the license.rtf file
Hello,
Since I'm a total newbie at bash scripting, can someone help me write a bash script to parse the version of Windows written inside the license.rtf file? Thanks.
It starts with:
MICROSOFT SOFTWARE LICENSE TERMS
WINDOWS 7 ULTIMATE
These license terms are an agreement between Microsoft Corporation...
So I need to grab the version of Windows, which in this case is: WINDOWS 7 ULTIMATE
So, what you need is simply the second line? That should be a piece of cake using sed or a similar tool. There are only two difficulties I see that may need to be considered first.
1. Windows text files use a different line ending than unix files. The file would need to be converted before it could be properly parsed. This is easy to do.
2. Rich text is a kind of markup language, similar to html. What you see in the display is not exactly the text the file itself contains. The parsing would have to take this into account.
Can we see the contents of the file as displayed in a regular text editor?
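To make the line-ending point concrete, here is a minimal sketch of how one might check for DOS line endings and strip them without modifying the file. The sample file and the has_crlf helper are made up for illustration; they are not part of any existing tool.

```shell
# Hypothetical sample: two lines written with DOS (CRLF) line endings.
sample=$(mktemp)
printf 'MICROSOFT SOFTWARE LICENSE TERMS\r\nWINDOWS 7 ULTIMATE\r\n' > "$sample"

# A literal carriage return, built portably with printf.
cr=$(printf '\r')

# has_crlf FILE: succeed if any line in FILE ends with a carriage return.
# (has_crlf is an illustrative helper name, not a standard command.)
has_crlf() {
    grep -q "${cr}\$" "$1"
}

if has_crlf "$sample"; then
    echo "DOS line endings found"
fi

# Strip every carriage return while reading; the file itself is untouched.
converted=$(tr -d '\r' < "$sample")
printf '%s\n' "$converted"
```

Using tr this way removes every carriage return, not only those at line ends, which is normally fine for plain text files.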
Ok, that helps. Although it would be more convenient if you could copy the actual text here instead of making me work through a screenshot. We'd only need the text up to the line in question, and maybe one or two lines after it.
I see that the line you want is the third line. I'll work on that for now. But now the question becomes, does every version of the file conform to the same format? How flexible does the extraction parsing need to be?
{\rtf1\ansi\ansicpg1252\deff0\deflang1033\deflangfe2052\deftab360{\fonttbl{\f0\fswiss\fprq2\fcharset 0 Tahoma;}}
{\*\generator Msftedit 5.41.21.2508;}\viewkind4\uc1\pard\nowidctlpar\sb120\sa120\b\f0\fs20 MICROSOFT SOFTWARE LICENSE TERMS\par
\pard\brdrb\brdrs\brdrw10\brsp20 \nowidctlpar\sb120\sa120 WINDOWS 7 ULTIMATE\par
\pard\nowidctlpar\sb120\sa120\b0 These license terms are an agreement between Microsoft Corporation (or based on where you live, one of its affiliates) and $
\pard\nowidctlpar\sb120\sa120\tx0\'b7\tab updates,\par
\'b7\tab supplements,\par
\'b7\tab Internet-based services, and\par
\'b7\tab support services\par
\pard\nowidctlpar\sb120\sa120 for this software, unless other terms accompany those items. If so, those terms apply.\par
\b By using the software, you accept these terms. If you do not accept them, do not use the software. Instead, return it to the retailer for a refund or cre$
\b As described below, using the software also operates as your consent to the transmission of certain computer information during activation, validation an$
\pard\brdrt\brdrs\brdrw10\brsp20 \nowidctlpar\sb120\sa120 If you comply with these license terms, you have the rights below for each license you acquire.\par
\pard\nowidctlpar\sb120\sa120\tx360 1.\tab OVERVIEW.\par
Thank you. But please use [code][/code] tags around preformatted text in the future.
If you assume the text you want will always be on the third line, will always be in approximately the same format, and that it will always begin with the word WINDOWS (that's a lot of assumptions!), then this should do it.
Code:
sed -rn -e '1,3 s/.$//' -e '3 s/^.+(WINDOWS[^\]+)\\par$/\1/p' license.rtf
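The first -e expression strips the final character from each of the first three lines, which converts DOS line endings to unix format (but would eat a real character if the file were already in unix format). The second -e then extracts everything from the word WINDOWS up to the final \par, if it exists, on the third line.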
If you need it to be more flexible than that, then you'll have to show us all the possible variations you may encounter.
Edit: After thinking about it a bit, I can make it even shorter. By taking the dos-mode carriage return into account inside the main expression, I can eliminate the first one entirely. This should work no matter which format it's in:
Code:
sed -rn '3 s/^.+(WINDOWS[^\]+)\\par\r?$/\1/p' license.rtf
I'm still trying to figure out if there'd be an easy way to extract the text if it didn't start with the word "WINDOWS".
Last edited by David the H.; 09-19-2010 at 09:13 AM.
Reason: as above
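For anyone who wants to try the shortened command without the real file, here is a rough, self-contained sketch. The three-line sample is abridged from the RTF posted earlier in the thread and is only a stand-in; GNU sed is assumed for -r, -i and the \r escape.

```shell
# Hypothetical three-line sample, abridged from the RTF posted above.
rtf=$(mktemp)
cat > "$rtf" <<'EOF'
{\rtf1\ansi\deff0{\fonttbl{\f0\fswiss Tahoma;}}
{\*\generator Msftedit 5.41.21.2508;}\viewkind4\uc1\pard\b\f0\fs20 MICROSOFT SOFTWARE LICENSE TERMS\par
\pard\brdrb\brdrs\brdrw10\brsp20 \nowidctlpar\sb120\sa120 WINDOWS 7 ULTIMATE\par
EOF

# Give every line a DOS (CRLF) ending, as the real file has (GNU sed).
sed -i 's/$/\r/' "$rtf"

# Line 3 only: keep the text from WINDOWS up to the closing \par,
# tolerating an optional trailing carriage return.
version=$(sed -rn '3 s/^.+(WINDOWS[^\]+)\\par\r?$/\1/p' "$rtf")
echo "$version"
```

Capturing the result in a variable (here called version, an arbitrary name) makes it easy to use later in a script.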
Just to follow up, I decided to try again with a slightly different track, and came up with an even more simplified version.
Code:
sed -rn '3 s/\\[^ ]+ ?|\r$//gp' license.rtf
This one simply removes from the line any string that starts with a \ and runs up to the next space, plus one optional trailing space, as well as any dos carriage return at the end of the line. It also has the advantage that it can be used on any line that follows this pattern.
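As a quick illustration that the same expression works on other lines too, here is a small sketch that wraps it in a function taking a file and a line number. The function name rtf_line_text and the two-line sample (copied from the RTF above) are just for demonstration; GNU sed is assumed.

```shell
# Hypothetical sample: two lines taken from the RTF posted above.
rtf=$(mktemp)
cat > "$rtf" <<'EOF'
\pard\brdrb\brdrs\brdrw10\brsp20 \nowidctlpar\sb120\sa120 WINDOWS 7 ULTIMATE\par
\'b7\tab updates,\par
EOF

# Give every line a DOS (CRLF) ending, as the real file has (GNU sed).
sed -i 's/$/\r/' "$rtf"

# rtf_line_text FILE N: print line N of FILE with backslash-led control
# words (and any trailing carriage return) removed.
# (rtf_line_text is an illustrative name, not an existing command.)
rtf_line_text() {
    sed -rn "$2"' s/\\[^ ]+ ?|\r$//gp' "$1"
}

rtf_line_text "$rtf" 1
rtf_line_text "$rtf" 2
```

Because of the p flag, a line is printed only if at least one substitution was made on it, so a line with no control words and no carriage return produces no output.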
There's no easy way to work with lines that have {} braces, however, because they can be nested, and that's really hard to deal with in sed. For more complex operations like that, you may want to look at something like unrtf instead. I just tried it and it appears to work well, although you have to compensate for the header it adds when filtering out the line(s) you want.
Code:
unrtf --nopict --text e_file.txt 2>/dev/null | sed -n 9p
The 2>/dev/null is only there to get rid of the program info that's displayed, but since that appears on stderr, it shouldn't affect scripting much anyway. The program could really use a non-verbose option of some kind.
Well, there are probably a hundred different ways to extract the string using different tools. In general though, it's more efficient to use a single program than a chain of piped commands. sed can do everything grep and cut can do, and more flexibly, so I recommend it.
As for your new request, first of all, there's no need to use cat with grep, since it can read the file directly. The majority of text-based tools can do this.
Code:
grep "Microsoft Windows" eula.txt
But I'll bet you anything that you're running across the line-ending problem again. You see, unix uses LF (line feed) for its line endings, while dos uses CRLF (carriage return + line feed). When you grepped the file, you most likely grabbed the invisible CR along with the text, which is why the test fails. Try confirming it with this:
Code:
echo "$ver_windows" | cat -A
It will probably show you this: Microsoft Windows XP Home Edition^M$. ^M is the CR.
Sed is again the better tool here. With its ability to change text and also target lines based on line number, it can do everything at once.
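A minimal sketch of what "everything at once" could look like, assuming the version string sits on the first line of the file and that GNU sed is available. The temporary file and its second line are made-up stand-ins for the real eula.txt.

```shell
# Hypothetical stand-in for eula.txt, written with DOS (CRLF) line endings.
eula=$(mktemp)
printf 'Microsoft Windows XP Home Edition\r\nsecond line of the file\r\n' > "$eula"

# On line 1 only: strip a trailing carriage return if present, then print.
# -n suppresses all other output, so only that one line comes out.
ver_windows=$(sed -n '1{s/\r$//;p}' "$eula")

# With the CR gone, a plain string comparison now succeeds.
if [ "$ver_windows" = "Microsoft Windows XP Home Edition" ]; then
    echo "match: $ver_windows"
fi
```

Since the substitution is optional here, the same command also works on a file that is already in unix format.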
There are other options, such as using read to grab the first line from the file, then strip off the invisible carriage return with a parameter substitution. Indeed, this would be the most efficient method, since it works entirely within bash.
Code:
read ver_windows </mnt/sda1/Windows/system32/eula.txt
ver_windows=${ver_windows%$'\r'}
# Note that the extquote shell option must be enabled for the above to work.
Or, since you seem to need to use Windows files a lot, consider simply converting those files over to unix mode before using them. There are several tools that can do this, such as tofrodos or flip, or the sed command I used above (just remove the "1" to make it affect the whole file, and use the -i option to edit in place).
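For the convert-first route, here is a rough sketch of an in-place conversion with GNU sed. It runs on a throwaway temporary file, which is an assumption made for safety; on real data you may want sed's -i.bak form so a backup copy is kept.

```shell
# Hypothetical Windows-style file with CRLF line endings.
winfile=$(mktemp)
printf 'first line\r\nsecond line\r\n' > "$winfile"

# Remove the carriage return at the end of every line, editing in place.
sed -i 's/\r$//' "$winfile"

# Verify: no carriage returns should remain anywhere in the file.
cr=$(printf '\r')
if grep -q "$cr" "$winfile"; then
    echo "still has carriage returns"
else
    echo "converted to unix line endings"
fi
```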