Not sure this is a newbie question, but here we go...
This is a script-related question.
I'm trying to extract a specific part of a line.
For example, given "this is a very long sentence with 9 words", I would like to use 'awk' (or any other tool, for that matter) to print out the following: "this is a very long". The first word and the last word never change (they are actually symbols like '*' or '|' in my case),
but the word count in between these 2 words changes from one line to another. I'm trying 'sed' now, but I'm not sure it will help.
It can be done easily with awk, but there must be some matching pattern so that the range of fields to print can be defined in the command. Please share a sample file so we can understand your requirement, and also share what you've tried so far.
Last edited by shivaa; 11-26-2012 at 09:17 AM.
Reason: Typo
Well, I found a delimiter I can use, although it is "³" (CTRL+, guys... it is a very small 3). According to what I've found, it is a Unicode character, and its details are:
char: Unicode character U+00B3
Name: Superscript Three
Chart: Latin-1 Supplement
Decimal: 179
Hexadecimal: U+00B3
and so I have 2 questions:
1. Is it possible to use it "as is", meaning actually use ' awk -F "³" ' to define the field selections? It's working for me, though I'm not sure it will work in every environment, because the character encoding might differ from one terminal to another.
2. If not, is it possible to define a Unicode character as a delimiter, meaning something like ' awk -F "U+00b3" ' or similar?
The actual line that I'm trying to parse is:
Rule No 1 ³active ³Server1 ³any ³oneway
and I need just the " Rule No 1 " part. Unfortunately, white-spaces are possible..:-/
The bash shell and most of the other GNU tools are fully UTF-8 compatible these days, as long as the environment is set up for it. You can just copy and paste the characters in.
One thing that used to be difficult, though, is getting the shell to generate non-ASCII text. But as of bash v4.2+ this has been solved. echo -e, printf %b, and the ANSI-C style $'..' quoting pattern all expand "\uNNNN" Unicode code points to their proper values.
Code:
$ echo -e $'\u00B3'
³
#to use it in a command
awk -F $'\u00B3' ....
Although, as shown above, this isn't needed in awk or sed, as they have a similar ability built in.
In earlier bash versions you have to encode the characters as multi-byte UTF-8 hex values (not the raw Unicode code point!), like so:
Code:
$ echo -e '\xC2\xB3'
³
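If neither bash 4.2+ nor hex escapes in echo can be relied on, one option is to build the delimiter with printf and octal escapes, which POSIX printf understands. This is only a minimal sketch: the variable names delim, line and first are made up for illustration, and the sample line is the one from the question, rebuilt from the delimiter variable so the script does not depend on the encoding of its own source file.

```shell
# U+00B3 encoded as UTF-8 is the two bytes 0xC2 0xB3, i.e. octal 302 263.
delim=$(printf '\302\263')

# Sample line from the question, assembled with the delimiter variable.
line="Rule No 1 ${delim}active ${delim}Server1 ${delim}any ${delim}oneway"

# Pass the delimiter to awk as the field separator and keep the first field.
first=$(printf '%s\n' "$line" | awk -F "$delim" '{ print $1 }')

# Brackets make the trailing space visible.
printf '[%s]\n' "$first"
```

This should print "[Rule No 1 ]", with the space that precedes the first delimiter still attached, so some trimming may be wanted afterwards.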
On another point, how is this string being stored and supplied? If it has already been stored in a shell variable, then it should be trivial to parse it out using built-in parameter substitution or some other kind of string manipulation.
Just make sure the extquote shell option is enabled (it is on by default in bash) so that you can use the ANSI-C quotes inside parameter substitutions.
Code:
$ string='Rule No 1 ³active ³Server1 ³any ³oneway'
$ echo "${string%% ³*}"
Rule No 1
$ shopt -s extquote
$ echo "${string%%$' \u00B3'*}"
Rule No 1
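If the lines arrive from a file or a pipe rather than a single variable, the same suffix removal can be applied inside a read loop. The following is a rough sketch under assumptions: the second sample rule and the names rules, name and names are invented for illustration, and the delimiter is generated with printf so that it also works outside bash.

```shell
delim=$(printf '\302\263')   # UTF-8 bytes of U+00B3

# Hypothetical input; real data would come from a file or a command.
rules="Rule No 1 ${delim}active ${delim}Server1 ${delim}any ${delim}oneway
Rule No 22 ${delim}inactive ${delim}Server2 ${delim}any ${delim}twoway"

names=$(printf '%s\n' "$rules" | while IFS= read -r line; do
    # Strip the longest suffix that starts with a space plus the delimiter.
    name=${line%% "$delim"*}
    printf '%s\n' "$name"
done)

printf '%s\n' "$names"
```

Because the pattern includes the space in front of the delimiter, the rule names come out without trailing whitespace: "Rule No 1" and "Rule No 22".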
Last edited by David the H.; 11-28-2012 at 12:16 PM.
Reason: small correction
Although, as I mentioned, with awk it's probably better to use its own built-in character interpretation instead of relying on the shell (see the post above mine by Turbocapitalist).
Please mark the thread as "solved".
Edit: after a couple of tests, awk apparently doesn't accept Unicode code points, but it can expand multi-byte strings, in the same manner as earlier versions of bash.
Code:
awk -F '\xC2\xB3'
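For completeness, here is a sketch of a full command built on the same idea. Note that it swaps the hex escapes for octal ones: as far as I know, hex escapes are an extension in some awk implementations, while octal escapes are part of POSIX awk, so this variant may be more portable. The sub() call that trims the trailing whitespace and the variable name rule are additions for illustration, not part of the original suggestion.

```shell
# Feed the sample line from the question to awk; printf expands the octal
# escapes \302\263 into the UTF-8 bytes of the delimiter.
rule=$(printf 'Rule No 1 \302\263active \302\263Server1 \302\263any \302\263oneway\n' |
    awk -F '\302\263' '{
        sub(/[ \t]+$/, "", $1)   # drop whitespace left before the delimiter
        print $1
    }')

# Brackets show that no trailing space remains.
printf '[%s]\n' "$rule"
```

With the sample line this should print "[Rule No 1]".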
Last edited by David the H.; 12-05-2012 at 06:59 PM.