LinuxQuestions.org
Old 12-05-2008, 04:19 AM   #1
grishaoks
LQ Newbie
 
Registered: Dec 2008
Posts: 7

Rep: Reputation: 0
Bash scripts


Hello,

I am new to Bash scripts and I have a question.

I have a txt file and I need to take all the words there that begin or end with a regular English letter and copy all these words to another txt file.
Now I know how to read from a file line by line, the question is how can I go to a specific word or letter in that line?
As I see it, new word begins after "a blank space" and ends before "blank space". Maybe something with String?

Thanks in advance!

Greg
 
Old 12-05-2008, 06:02 AM   #2
ilikejam
Senior Member
 
Registered: Aug 2003
Location: Glasgow
Distribution: Fedora / Solaris
Posts: 3,109

Rep: Reputation: 96
Hi.

I think awk is your friend here, or maybe perl. Here's how I'd do it with awk:
Code:
awk '{for (i = 1; i <= NF; i++) {if ($i ~ /^[A-Za-z].*/) printf "%s", $i " "}; printf "%s", "\n"}' InputFile | sed 's/ $//' > OutputFile
awk splits each line into words (fields separated by spaces/tabs) by default and assigns them to the variables $1, $2, ... up to $NF, where NF is the number of words on the line. So you can loop over the fields ('for (i = 1; i <= NF; i++)'), check whether each one starts with a normal letter ('if ($i ~ /^[A-Za-z].*/)'), and print the word and a space if it does ('printf "%s", $i " "'). After the loop a newline is printed, and sed then strips the trailing space from each line ('sed 's/ $//'').
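
To see the field loop in action, here is a minimal sketch run against a made-up two-line sample. The sample text and the function name keep_letter_words are assumptions for illustration only; the pipeline inside is the same awk | sed idea as above, just spread over several lines.

```shell
# Hypothetical helper wrapping the awk | sed pipeline; reads text on stdin.
keep_letter_words() {
    awk '{
        for (i = 1; i <= NF; i++) {          # NF = number of fields on this line
            if ($i ~ /^[A-Za-z]/)            # field starts with a letter
                printf "%s ", $i             # print it followed by one space
        }
        printf "\n"                          # keep the original line breaks
    }' | sed 's/ $//'                        # drop the single trailing space
}

# Made-up sample input: some words start with digits or symbols.
printf '%s\n' 'hello 123abc world' '#tag foo 9lives bar' | keep_letter_words
```

With that sample, the first line should come out as "hello world" and the second as "foo bar".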

Apologies if that made absolutely no sense. I think the line does what you want, though.

Dave

Last edited by ilikejam; 12-05-2008 at 06:10 AM. Reason: Edge case in original awk line - if $NF started with a non-[a-zA-Z], there would be no newline before the next line.
 
Old 12-05-2008, 07:34 AM   #3
grishaoks
LQ Newbie
 
Registered: Dec 2008
Posts: 7

Original Poster
Rep: Reputation: 0
Thanks!

Thank you very much Dave, it made a lot of sense and helped me a lot!
Can you just tell me what you mean by stripping the trailing space on each line? What does it do?
In the end I made the code like this:

Code:
awk '{for (i = 1; i <= NF; i++) 
	{if (($i ~ /^[a-zA-Z].*/) && ($i ~ /[a-zA-Z]$.*/) && ("$i" -le 10)) 
		printf "%s", $i "\n"};
     }' INPUT FILE | sed 's/ $//' > OUTPUT FILE
I wanted the words to end with normal letters as well ($i ~ /[a-zA-Z]$.*/), and I also want the words to be no longer than 10 letters ("$i" -le 10), but I don't know why it still gives me words longer than 10 letters. Do you have any idea what I did wrong here?
 
Old 12-05-2008, 07:58 AM   #4
ilikejam
Senior Member
 
Registered: Aug 2003
Location: Glasgow
Distribution: Fedora / Solaris
Posts: 3,109

Rep: Reputation: 96
With the code I gave, every word is printed with a space after it, and a newline is printed at the end of each input line, so the line breaks stay in the same places as in the original file. Since every word, including the last one on a line, was printed with a space after it, that trailing space had to be removed.
I see you're going for a one-word-per-line format, so this no longer applies: you can simplify the command by removing the sed part and using 'print' instead of 'printf'.
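
A small sketch of what "stripping the trailing space" means, using a made-up line (the variable name line and its contents are assumptions). 'sed -n l' prints a line unambiguously and marks its end with a $, which makes the invisible space easy to spot.

```shell
# Made-up line that ends in a space, as the printf-based loop would produce.
line='foo bar '

# Before: the space sits between "bar" and the end-of-line marker.
printf '%s\n' "$line" | sed -n l                 # shows: foo bar $

# After: sed 's/ $//' removes one space at the end of the line.
printf '%s\n' "$line" | sed 's/ $//' | sed -n l  # shows: foo bar$
```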

Your regex for the 'ends in a letter' match is odd: the '$' anchor belongs at the very end, so it should be /[a-zA-Z]$/
My 'starts with a letter' regex had some unnecessary stuff in it too. /^[a-zA-Z]/ would do.
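
As a quick illustration of the two anchors, here is a sketch with a hypothetical helper, check_word (the name and the test words are made up), that reports whether a word matches each pattern:

```shell
# Hypothetical helper: report which anchored patterns a single word matches.
check_word() {
    printf '%s\n' "$1" | awk '{
        starts = ($1 ~ /^[a-zA-Z]/) ? "yes" : "no"   # ^ anchors at the start
        ends   = ($1 ~ /[a-zA-Z]$/) ? "yes" : "no"   # $ anchors at the end
        print $1 ": starts=" starts " ends=" ends
    }'
}

check_word 'abc'     # abc: starts=yes ends=yes
check_word '1abc'    # 1abc: starts=no ends=yes
check_word 'abc!'    # abc!: starts=yes ends=no
```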

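The length check doesn't work because '-le' is shell test syntax, not awk: awk reads it as a minus sign followed by a variable named 'le', and "$i" in quotes is just a literal string, so the condition never looks at the word's length. You can get the length of a string with length(string).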

So:
Code:
awk '{for (i = 1; i <= NF; i++)
	{if (($i ~ /^[a-zA-Z]/) && ($i ~ /[a-zA-Z]$/) && (length($i) <= 10))
		print $i
        }
     }' InputFile > OutputFile
should do the business.
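
For a quick check of the whole filter, here is a sketch that wraps it in a hypothetical function, filter_words, and feeds it a made-up line instead of a file. Note the sketch uses '<= 10' so that a word of exactly 10 letters counts as "no longer than 10 letters".

```shell
# Hypothetical wrapper around the final filter; reads text on stdin.
filter_words() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^[a-zA-Z]/ && $i ~ /[a-zA-Z]$/ && length($i) <= 10)
                print $i                     # one matching word per line
    }'
}

# Made-up sample: "abcdefghij" has exactly 10 letters, "abcdefghijk" has 11.
printf '%s\n' 'cat 4dogs bird! abcdefghij abcdefghijk ok' | filter_words
```

This should print cat, abcdefghij and ok, one per line: 4dogs fails the start test, bird! fails the end test, and abcdefghijk is too long.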

Last edited by ilikejam; 12-05-2008 at 08:05 AM.
 
Old 12-05-2008, 08:14 AM   #5
grishaoks
LQ Newbie
 
Registered: Dec 2008
Posts: 7

Original Poster
Rep: Reputation: 0
Yep, that does the trick. Well, thanks again for your help!
 
  

