LinuxQuestions.org
Linux - Newbie This Linux forum is for members that are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-to's this is the place!

11-26-2012, 07:28 AM   #1
malony101
LQ Newbie
 
Registered: Nov 2012
Posts: 5

Rep: Reputation: Disabled
using 'awk' to parse From word To word


Hi,

Not sure it's a newbie question, but here we go...
This is a script related question.
I'm trying to extract a specific part of a line.
For example, given "this is a very long sentence with 9 words", I would like to use 'awk' (or any other tool, for that matter) to print just "this is a very long". The first and last words never change (in my case they are actually symbols like '*' or '|'), but the number of words between them varies from one line to another. I'm trying 'sed' now, but I'm not sure it will help.

Any ideas?
 
11-26-2012, 07:37 AM   #2
Velotrol
LQ Newbie
 
Registered: Apr 2011
Location: Rio, Brazil
Distribution: Gentoo
Posts: 15

Rep: Reputation: Disabled
Do you need something like this?
Code:
n=5
echo "this is a very long sentence with 9 words" | cut -d" " -f1-"$n"
Where n is the number of words to print. In a script you must assign a number to the variable n, and the value depends on your needs.
 
11-26-2012, 09:07 AM   #3
shivaa
Senior Member
 
Registered: Jul 2012
Location: Grenoble, Fr.
Distribution: Sun Solaris, RHEL, Ubuntu, Debian 6.0
Posts: 1,800
Blog Entries: 4

Rep: Reputation: 286
It can be done easily with awk, but there must be some matching pattern so that the range of fields to print can be defined in the command. Please share a sample file so we can understand your requirement properly, and also share what you've tried so far.

Last edited by shivaa; 11-26-2012 at 09:17 AM. Reason: Typo
 
11-26-2012, 02:18 PM   #4
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,307
Blog Entries: 3

Rep: Reputation: 3721
Can you give a few lines of sample data showing what you really have to work with?
 
11-28-2012, 02:53 AM   #5
malony101
LQ Newbie
 
Registered: Nov 2012
Posts: 5

Original Poster
Rep: Reputation: Disabled
Well, I found a delimiter I can use! Although it is "³" (CTRL+, guys... it is a very small 3 which, according to what I've found, is actually a Unicode character). Its details are:

char: Unicode character U+00B3
Name: Superscript Three
Chart: Latin-1 Supplement
Decimal: 179
Hexadecimal: U+00B3

and so I have 2 questions:

1. Is it possible to use it "as is"? Meaning, can I actually use ' awk -F "³" ' to split the line into fields? It's working for me, but I'm not sure it will work in every environment, because the character encoding might differ from one terminal to another.
2. If not, is it possible to specify the delimiter by its Unicode code point, meaning something like ' awk -F "U+00b3" '?

The actual line that I'm trying to parse is:

Rule No 1 ³active ³Server1 ³any ³oneway

and I need just the " Rule No 1 " part. Unfortunately, white-spaces are possible..:-/

Thanks.
 
11-28-2012, 08:10 AM   #6
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,307
Blog Entries: 3

Rep: Reputation: 3721
awk should work as-is. Or you can write out the character like this:

Code:
awk -F "\xb3" '{print $1}'
 
11-28-2012, 12:07 PM   #7
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852

Rep: Reputation: 2037
The bash shell and most of the other GNU tools are fully UTF-8 compatible these days, as long as the environment is set up for it. You can just cut and paste the values in.

One thing that used to be difficult, though, is getting the shell to generate non-ASCII text. But as of bash v4.2+ this has been solved: echo -e, printf %b, and the ANSI-C style $'..' quoting pattern all expand "\uNNNN" Unicode code points to their proper values.

Code:
$ echo -e $'\u00B3'
³

#to use it in a command
awk -F $'\u00B3' ....
Although as shown this is unneeded in awk or sed, as they also have similar ability built in.

In earlier bash shells you have to encode the characters as multi-byte UTF-8 hex values (not the raw Unicode code point in hex!), like so:

Code:
$ echo -e '\xC2\xB3'
³
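
As a rough illustration of where the two bytes come from, here is a minimal sketch that derives the UTF-8 bytes from a code point. The function name utf8_2byte is hypothetical, and it only handles the two-byte range U+0080 to U+07FF, which is an assumption that happens to cover U+00B3.

Code:
```shell
# Hypothetical helper: print the UTF-8 encoding of a code point as \xHH\xHH.
# Only the two-byte range U+0080..U+07FF is handled (assumption for this sketch).
utf8_2byte() {
    cp=$(( $1 ))
    if [ "$cp" -lt 128 ] || [ "$cp" -gt 2047 ]; then
        echo "utf8_2byte: only U+0080..U+07FF are handled" >&2
        return 1
    fi
    # Lead byte: 110xxxxx carries the top 5 bits; continuation byte: 10xxxxxx carries the low 6 bits.
    printf '\\x%02X\\x%02X\n' "$(( 0xC0 | (cp >> 6) ))" "$(( 0x80 | (cp & 0x3F) ))"
}

utf8_2byte 0xB3    # prints: \xC2\xB3
```

The second byte happens to equal the code point here (0xB3), which is probably why the single-byte form looks plausible at first glance.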
On another point, how is this string being stored and supplied? If it has already been stored in a shell variable, then it should be trivial to parse it out using built-in parameter substitution or some other kind of string manipulation.

Just make sure the extquote shell option is enabled first (bash enables it by default) so that you can use the ANSI-C quotes inside parameter substitutions.

Code:
$ string='Rule No 1 ³active ³Server1 ³any ³oneway'
$ echo "${string%% ³*}"
Rule No 1

$ shopt -s extquote
$ echo "${string%%$' \u00B3'*}"
Rule No 1
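
Since the original poster mentioned that stray whitespace is possible, the expansion above could be wrapped in a small function that also trims the result. This is only a sketch: the name rule_name is hypothetical, the literal "³" assumes the script file itself is saved as UTF-8, and only plain spaces are trimmed.

Code:
```shell
# Hypothetical helper: return the text before the first "³", with
# leading and trailing spaces removed. Uses only POSIX parameter expansion.
rule_name() {
    s=$1
    s=${s%%³*}                 # drop everything from the first "³" onward
    s=${s#"${s%%[! ]*}"}       # strip leading spaces
    s=${s%"${s##*[! ]}"}       # strip trailing spaces
    printf '%s\n' "$s"
}

rule_name ' Rule No 1 ³active ³Server1 ³any ³oneway'    # prints: Rule No 1
```

To process a whole file this way, the function could be called from a `while IFS= read -r line` loop, although awk is likely the better choice for large inputs.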

Last edited by David the H.; 11-28-2012 at 12:16 PM. Reason: small correction
 
12-03-2012, 02:33 AM   #8
malony101
LQ Newbie
 
Registered: Nov 2012
Posts: 5

Original Poster
Rep: Reputation: Disabled
David, Thank you for the detailed answer!
It was very helpful!
The script is working just fine with
Code:
awk -F $'\u00B3'
Thanks!!
 
12-05-2012, 06:20 PM   #9
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852

Rep: Reputation: 2037
Glad it's working for you.

Although, as I mentioned, with awk it's probably better to use its own built-in character interpretation instead of relying on the shell (see the post above mine by Turbocapitalist).


Please mark the thread as "solved".


Edit: after a couple of tests, awk apparently doesn't accept Unicode code points, but it can expand multi-byte strings, in the same manner as earlier versions of bash.

Code:
awk -F '\xC2\xB3'
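
For reference, a minimal sketch of the multi-byte approach applied to the sample line from earlier in the thread. The helper name first_field is hypothetical, and the whitespace trimming is an assumption about what the original poster wants. The delimiter is written in octal (\302\263, the same two bytes as \xC2\xB3) because octal escapes are the form POSIX specifies for awk, so this should be a bit more portable than the \x form.

Code:
```shell
# Hypothetical helper: print the text before the first "³" on each input line,
# with surrounding spaces/tabs trimmed.
# \302\263 is the UTF-8 encoding of U+00B3 (0xC2 0xB3) written in octal.
first_field() {
    awk -F '\302\263' '{ gsub(/^[ \t]+|[ \t]+$/, "", $1); print $1 }'
}

printf '%s\n' 'Rule No 1 ³active ³Server1 ³any ³oneway' | first_field
# prints: Rule No 1
```

This assumes the input really is UTF-8; if the data were Latin-1, the delimiter would be the single byte \263 instead.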

Last edited by David the H.; 12-05-2012 at 06:59 PM.
 
  


All times are GMT -5.