using 'awk' to parse From word To word
Hi,
Not sure if this is a newbie question, but here we go... :) This is a script-related question. I'm trying to extract a specific part of a line. For example, given "this is a very long sentence with 9 words", I would like to use 'awk' (or any other tool, for that matter) to print the following: "this is a very long". The first and last words never change (they are actually symbols like '*' or '|' in my case), but the number of words between them changes from one line to another. I'm trying 'sed' now, but I'm not sure it will help. Any ideas? |
Do you need something like this?
Code:
echo "this is a very long sentence with 9 words" | cut -d" " -f1-5 |
It can be done easily with awk, but there must be some matching pattern so that the range of fields to print can be defined on the command line. Please share a sample file so we can understand your requirement, and also share what you've tried so far.
|
Can you give a few lines of sample data showing what you really have to work with?
|
Well, I found a delimiter I can use! It is "³" (CTRL+, guys... :)), a very small 3 which (according to what I've found) is a Unicode character. Its details are:
char: Unicode character U+00B3
Name: Superscript Three
Chart: Latin-1 Supplement
Decimal: 179
Hexadecimal: U+00B3
And so I have 2 questions:
1. Is it possible to use it "as is", meaning actually use ' awk -F "³" ' to define block selections? It's working for me, though I'm not sure it will work in every environment, because the ASCII translation might change from one terminal to another.
2. If not, is it possible to define a Unicode character as a delimiter, meaning something like ' awk -F "U+00b3" '?
The actual line that I'm trying to parse is:
Rule No 1 ³active ³Server1 ³any ³oneway
and I need just the " Rule No 1 " part. Unfortunately, white-spaces are possible.. :-/ Thanks. |
awk should work as-is. Or you can write out the character like this:
Code:
awk -F "\xb3" '{print $1}' |
The bash shell and most of the other gnu tools are fully utf-8 compatible these days, as long as the environment is set up for it. You can just cut&paste the values in.
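As a quick way to check what a tool will actually see, the bytes behind "³" can be dumped with od. This is only a sketch: the variable name `bytes` is made up for illustration, and the character is produced with printf octal escapes (302 263) on the assumption that the text is UTF-8 encoded.

```shell
# Emit the UTF-8 encoding of U+00B3 and dump it as hex bytes.
# Octal 302 263 is the same pair of bytes as hex C2 B3.
bytes=$(printf '\302\263' | od -An -tx1 | tr -d ' \n')
echo "$bytes"
```

If a pasted "³" dumps as a single b3 byte instead, the file or terminal is probably using Latin-1 rather than UTF-8.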
One thing that used to be difficult though is getting the shell to generate non-ASCII text. But as of bash v4.2+ this has been solved. echo -e, printf %b, and the ANSI-C style $'..' quoting pattern all expand "\uNNNN" Unicode code points to their proper values.
Code:
$ echo -e $'\u00B3'
In earlier bash shells you have to encode the characters as multi-byte UTF-8 hex values (not raw Unicode hex!), like so:
Code:
$ echo -e '\xC2\xB3'
Just enable the extquote shell option first to allow you to use the ANSI-C quotes inside parameter substitutions.
Code:
$ string='Rule No 1 ³active ³Server1 ³any ³oneway' |
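The kind of parameter substitution that post leads into can be sketched as follows. This is a minimal sketch, not the poster's exact code: the variable names `sep` and `rule` are made up for illustration, and the delimiter is built with printf from its UTF-8 bytes (octal 302 263, i.e. hex C2 B3) rather than with $'..' quoting, so that it does not depend on the bash version.

```shell
# Build the delimiter from its UTF-8 bytes (octal 302 263 = hex C2 B3).
sep=$(printf '\302\263')
string="Rule No 1 ${sep}active ${sep}Server1 ${sep}any ${sep}oneway"

# Remove the longest suffix that starts at the first delimiter.
rule=${string%%"$sep"*}

# Strip the trailing spaces left in front of the delimiter.
rule=${rule%"${rule##*[! ]}"}

printf '[%s]\n' "$rule"
```

The brackets in the final printf are only there to make any leftover whitespace visible.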
David, Thank you for the detailed answer!
It was very helpful! :) The script is working just fine with Code:
awk -F $'\u00B3' |
Glad it's working for you.
Although, as I mentioned, with awk it's probably better to use its own built-in character interpretation instead of relying on the shell (see the post above mine by Turbocapitalist). Please mark the thread as "solved".
Edit: after a couple of tests, awk apparently doesn't accept Unicode code points, but it can expand multi-byte strings, in the same manner as earlier versions of bash. Code:
awk -F '\xC2\xB3' |
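Putting the thread together, here is a minimal sketch of the whole extraction on the sample line, including the whitespace trimming the original question asked about. The variable names `line` and `first` are made up for illustration. The delimiter is written with octal escapes (\302\263), which are the same two bytes as \xC2\xB3; octal is used here only because POSIX awk defines octal escapes but not hex ones, so this form should not depend on a particular awk.

```shell
# Sample line from the thread, with "³" written as its UTF-8 bytes.
line=$(printf 'Rule No 1 \302\263active \302\263Server1 \302\263any \302\263oneway')

# Split on the two-byte delimiter, trim spaces and tabs around the
# first field, and print only that field.
first=$(printf '%s\n' "$line" |
  awk -F '\302\263' '{ gsub(/^[ \t]+|[ \t]+$/, "", $1); print $1 }')

printf '[%s]\n' "$first"
```

Replacing $1 with another field number would pick out the status, server, or direction columns in the same way.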