LinuxQuestions.org


grishaoks 12-05-2008 04:19 AM

Bash scripts
 
Hello,

I am new to Bash scripts and I have a question.

I have a text file, and I need to copy every word in it that begins or ends with a regular English letter to another text file.
I know how to read from a file line by line; the question is how I can get at a specific word or letter in that line.
As I see it, a new word begins after a blank space and ends before the next blank space. Maybe something with String?

Thanks in advance!

Greg

ilikejam 12-05-2008 06:02 AM

Hi.

I think awk is your friend here, or maybe perl. Here's how I'd do it with awk:
Code:

awk '{for (i = 1; i <= NF; i++) {if ($i ~ /^[A-Za-z].*/) printf "%s", $i " "}; printf "%s", "\n"}' InputFile | sed 's/ $//' > OutputFile
awk operates on words (strings separated by spaces/tabs) by default, assigning each word in a line to a field variable ($1, $2, and so on), so you can loop through the fields ('for (i = 1; i <= NF; i++)'), check whether each one starts with a normal letter ('if ($i ~ /^[A-Za-z].*/)'), and print the word and a space if it does ('printf "%s", $i " "'). After the last word of each line, a newline is printed ('printf "%s", "\n"'). Once we're done, we need to strip the trailing space on each line ('sed 's/ $//'').
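To see the field splitting on its own, here is a minimal sketch. The sample line and the variable names are made up for illustration; only NF and the $i field syntax come from the explanation above.

```shell
# Hypothetical sample line, not taken from the original file.
line='alpha  42beta gamma!'

# Print each field with its index; NF is the number of fields on the line.
# Runs of spaces count as a single separator.
fields=$(printf '%s\n' "$line" | awk '{for (i = 1; i <= NF; i++) print i ": " $i}')
printf '%s\n' "$fields"
```

This should list three fields, one per line, which shows that awk has already done the "split on blank space" step before the loop starts.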

Apologies if that made absolutely no sense. I think the line does what you want, though.

Dave

grishaoks 12-05-2008 07:34 AM

Thanks!
 

Thank you very much Dave, it made a lot of sense and helped me a lot! :)
Can you just tell me what you mean by stripping the trailing space on each line? What does it do?
In the end I made the code like this:

Code:

awk '{for (i = 1; i <= NF; i++)
        {if (($i ~ /^[a-zA-Z].*/) && ($i ~ /[a-zA-Z]$.*/) && ("$i" -le 10))
                printf "%s", $i "\n"};
    }' INPUT FILE | sed 's/ $//' > OUTPUT FILE

I wanted the words to end with normal letters as well ($i ~ /[a-zA-Z]$.*/), and I also want the words to be no longer than 10 letters ("$i" -le 10), but I don't know why it still gives me words longer than 10 letters. Do you have any idea what I did wrong here?

ilikejam 12-05-2008 07:58 AM

With the code I gave, every word is printed with a space after it, and the line breaks stay in the same places as in the original file. Since every word, including the last one on each line, was printed with a space after it, that final space had to be removed.
I see you're going for a one-word-per-line format, so this no longer applies. You can simplify it by removing the sed part and using 'print' instead of 'printf'.
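A small sketch of what the trailing space looks like and what the sed step does to it. The helper name join_words and the sample line are assumptions for the demo, and the extra "|" is only there to make the end of the line visible.

```shell
# Hypothetical helper: re-emits each line with a space after every word.
join_words() {
    awk '{for (i = 1; i <= NF; i++) printf "%s ", $i; printf "\n"}'
}

line='one two three'   # made-up sample input

# The final sed appends "|" so the end of the line is visible.
with_space=$(printf '%s\n' "$line" | join_words | sed 's/$/|/')
stripped=$(printf '%s\n' "$line" | join_words | sed 's/ $//' | sed 's/$/|/')

printf '%s\n' "$with_space"   # one two three |
printf '%s\n' "$stripped"     # one two three|
```

With 'print $i' there is no space to strip, because print ends each word with a newline instead.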

Your regex for the 'ends in a letter' match is odd; it should be /[a-zA-Z]$/, since '$' already marks the end of the word and the '.*' after it adds nothing.
My 'starts with a letter' regex had some unnecessary stuff in it too; /^[a-zA-Z]/ would do.
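To check the two anchored patterns separately, something like the following can help. The function classify and the test words are hypothetical, chosen only to exercise each pattern.

```shell
# classify is a hypothetical helper that reports which anchored pattern one word matches.
classify() {
    printf '%s\n' "$1" | awk '{
        starts = ($1 ~ /^[a-zA-Z]/) ? "yes" : "no"   # first character is a letter
        ends   = ($1 ~ /[a-zA-Z]$/) ? "yes" : "no"   # last character is a letter
        print "starts=" starts " ends=" ends
    }'
}

classify 'hello'    # starts=yes ends=yes
classify '9lives'   # starts=no ends=yes
classify 'done.'    # starts=yes ends=no
```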

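You can get the length of a string with length(string). '-le' is a shell test operator, not awk syntax, so your condition never actually compared the word's length.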

So:
Code:

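# Print, one per line, every word that starts and ends with a letter
# and is no longer than 10 characters.
awk '{for (i = 1; i <= NF; i++)
        {if (($i ~ /^[a-zA-Z]/) && ($i ~ /[a-zA-Z]$/) && (length($i) <= 10))
                print $i
        }
    }' InputFile > OutputFile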

should do the business.
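For anyone who wants to try it without a real file, here is a rough end-to-end sketch. The sample words and the temporary files from mktemp are assumptions, and "no longer than 10 letters" is read here as length <= 10.

```shell
# Hypothetical sample input; the temporary file names are not from the thread.
input=$(mktemp)
output=$(mktemp)
printf '%s\n' 'hello 9lives done. world' 'x2y extraordinary ok' > "$input"

# Keep words that start and end with a letter and have at most 10 characters.
awk '{for (i = 1; i <= NF; i++)
        {if (($i ~ /^[a-zA-Z]/) && ($i ~ /[a-zA-Z]$/) && (length($i) <= 10))
                print $i
        }
    }' "$input" > "$output"

result=$(cat "$output")
printf '%s\n' "$result"
rm -f "$input" "$output"
```

Note that 'x2y' gets through, since only the first and last characters are checked; a word with digits or punctuation in the middle would need an extra test.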

grishaoks 12-05-2008 08:14 AM

Yep that does the trick :) Well, thanks again for your help!


All times are GMT -5.