LinuxQuestions.org

sobey 04-06-2016 09:23 AM

Seeking advice with shell script (homework)
 
My situation is the same as stated in the forum link below:

http://www.linuxquestions.org/questi...rk-4175485733/

Below is what I have compiled thus far in my research. I need to make this all one line and have it work, but so far I can not get any of it to work. What am I doing wrong?

I also ask for kindness. Yes, this is homework, and I am willing to do the work; I just need guidance on where to go with it, so please do not be rude.


Code:

#!/bin/bash
# Bash TestScript

#a. Remove punctuation
sed -e 's|^[[:punct:]]*||; s|[[:punct:]]*$||;' -i gasoline

#b. Make all characters lowercase
tr '[:upper:]' '[:lower:]' < gasoline > ScriptResults

#c. Put each word on a line by itself
tr ' ' '\n' < gasoline

#d. Remove blank lines
$ sed '/^$/d' gasoline > ScriptResults

#e. Sort the text to pull all lines containing the same word on adjacent lines
sort ScriptTest | uniq -c | sort -rn | head -n 12 | sed -E 's/^ *[0-9]+ //g'

#f. Remove duplicate words from the text
sed -ri ‘s/(.* )1/1/g’ ScriptResults

#g. List most used words in the file first
cat ScriptResults | tr ' '  '\012' | tr '[:upper:]' '[:lower:]' | tr -d '[:punct:]' | grep -v '[^a-z]' | sort | uniq -c | sort -rn | head

exit 0


BW-userx 04-06-2016 09:39 AM

You have just about the same thing, but you still have not posted your input data, only the name of a file called 'gasoline'. What is it you're actually working with here?

Habitual 04-06-2016 09:40 AM

Quote:

Originally Posted by sobey (Post 5527066)
I can not get any of it to work... What am I doing wrong?

First, describe exactly what "I can not get any of it to work" means.
In detail, step-by-painful-step.
Any output? Error code? Permission messages? "Does not exist" messages?
I/O errors? Something besides "I can not get any of it to work".

Welcome to LQ!

sobey 04-06-2016 09:41 AM

2. Create and save a new text file called Gasoline that consists of the following content:

Gas prices rose only half a penny a gallon in the past two weeks, continuing an unusual 20-week trend of mostly steady prices.

3. Create a script file called TestScript that completes the following tasks for the Gasoline file. Hint: Add one command at a time, save the TestScript file, run it, and debug it, before adding the next command.

a. Remove punctuation
b. Make all characters lowercase
c. Put each word on a line by itself
d. Remove blank lines
e. Sort the text to pull all lines containing the same word on adjacent lines
f. Remove duplicate words from the text
g. List most used words in the file first
h. Send the output of this script to a file named ScriptResults
4. Give the TestScript file execute permission and run it. Important: When you are done, leave the ScriptResults file in your home directory for grading.

Hints:
 This script can be written as one continuous line of several commands, where the output of one command is piped into the next command.
 Gasoline should only appear once, as input to the first command.
 ScriptResults should only appear once, as output of the last command.

sobey 04-06-2016 09:45 AM

I get an invalid command message. I got the same message with my last script, and it turned out to be caused by the placement of the switches and words, which is why I am asking for guidance. Am I on the right track? Is this too in-depth or too complicated?

grail 04-06-2016 10:32 AM

Based on the information you have provided, I would think the 'Hints' are actually requirements and therefore no hints at all.

a. Remove punctuation - You need to re-look at this one. Your current sed is trying to remove punctuation from either the start or end of the line (denoted by the ^ and $), so the comma in your example will not be removed

b, c and d look ok

e. Sort the text to pull all lines containing the same word on adjacent lines - I think you incorporated part of step 'f' into this one. On reading the statement a few times, it reads that you are only sorting the data here to be prepared to remove adjacent words, not actually do the removing here

f. Remove duplicate words from the text - the action here seems quite clear, but your sed does not appear to make any sense (perhaps some backslashes needed for capturing are missing?)

g. List most used words in the file first h - firstly, not sure if the 'h' at the end is a typo? (if not then we are missing part of this statement) I think I would need more information on this one.
If you are to in fact use a single piped together set of commands, the only way to know the most occurring words, which you removed in the previous step, is to count them, but any count used in a prior command to the pipe could be potentially lost. You could add a count to your uniq, but this then requires additional removal of the numbers at the end. Unless maybe the teacher wants to see the count (might be implied but it has not been said)
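The count-then-strip idea described above can be sketched as follows. This is only a sketch under assumptions: `rank_words` is a hypothetical helper name, the input is already one word per line, and the tie-breaking sort keys are my own choice rather than anything the assignment specifies.

```shell
#!/bin/bash
# Hypothetical helper: rank words by frequency, then drop the counts.
# Expects one word per line on standard input.
rank_words() {
    sort |                      # identical words end up on adjacent lines
    uniq -c |                   # collapse duplicates, prefixing each word with its count
    sort -k1,1nr -k2 |          # highest count first, ties in alphabetical order
    sed 's/^ *[0-9][0-9]* //'   # strip the leading padding and count
}

printf '%s\n' a prices gas a prices rose | rank_words
```

With that sample input the words that occur twice (a, prices) come out ahead of the words that occur once (gas, rose). If the teacher does want to see the counts, leaving off the final sed keeps them.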

Quote:

I get an invalid command message
Please show the full error

sobey 04-06-2016 11:06 AM

This information has been extremely helpful, give me some time to work on this and I will post my findings. Again thank you.

sobey 04-06-2016 09:10 PM

I worked on this and got responses from all except the very first command sequence, results below:


Code:

-bash-3.2$ sed -e 's|^[[:punct:]]*||; s|[[:punct:]]*$||;' -i Gasoline
-bash-3.2$ tr '[:upper:]' '[:lower:]' < Gasoline
gas prices rose only half a penny a gallon in the past two weeks, continuing an unusual 20-week trend of mostly steady prices
-bash-3.2$ tr ' ' '\n' < Gasoline
Gas
prices
rose
only
half
a
penny
a
gallon
in
the
past
two
weeks,
continuing
an
unusual
20-week
trend
of
mostly
steady
prices
-bash-3.2$ sed '/^$/d' Gasoline
Gas prices rose only half a penny a gallon in the past two weeks, continuing an unusual 20-week trend of mostly steady prices
-bash-3.2$ sort Gasoline | uniq -c | sort -rn | head -n 12 | sed -e 's/^ *[0-9]+ //g'
      1 Gas prices rose only half a penny a gallon in the past two weeks, continuing an unusual 20-week trend of mostly steady prices
-bash-3.2$ sed -ri .s/(.* )1/1/g. Gasoline
-bash: syntax error near unexpected token `('
-bash-3.2$ cat Gasoline | tr ' '  '\012' | tr '[:upper:]' '[:lower:]' | tr -d '[:punct:]' | grep -v '[^a-z]' | sort | uniq -c | sort -rn | head
      2 prices
      2 a
      1 weeks
      1 unusual
      1 two
      1 trend
      1 the
      1 steady
      1 rose
      1 penny
-bash-3.2$


sobey 04-06-2016 09:22 PM

The first command did not give me any results but the below command sequence works.

tr -d '[:punct:]' < Gasoline

combined with the other results I now need to combine all commands in one line.

Question, can I use the tr command to remove blank lines and remove duplicate words from text?
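A likely reason the first command showed nothing: with GNU sed, -i edits the file in place and prints no output, and the two anchored patterns only touch punctuation at the very start or end of a line. The sketch below compares the two approaches on a made-up sample line; `strip_edges` and `strip_all` are hypothetical names used only for illustration.

```shell
#!/bin/bash
# Anchored: only punctuation touching the start (^) or end ($) of the line is removed.
strip_edges() { sed -e 's|^[[:punct:]]*||; s|[[:punct:]]*$||'; }

# Unanchored: tr deletes every punctuation character, wherever it appears.
strip_all() { tr -d '[:punct:]'; }

line='Gas prices, rose: 20-week trend.'

printf '%s\n' "$line" | strip_edges   # Gas prices, rose: 20-week trend
printf '%s\n' "$line" | strip_all     # Gas prices rose 20week trend
```

Note that the hyphen counts as punctuation too, so "20-week" becomes "20week" with tr.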

grail 04-07-2016 12:00 AM

I would probably use sed to remove the blank lines and uniq for the duplicate words. You could use sort for the duplicate words, but part of the requirement seemed to be an indicator to show which words had been repeated the most, i.e. a count, so uniq can do this.
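To see where blank lines come from and how sed removes them, here is a small sketch. The sample text with a double space is invented for the demonstration, and `split_words` and `drop_blank` are hypothetical names.

```shell
#!/bin/bash
# Turn every space into a newline; two spaces in a row leave an empty line behind.
split_words() { tr ' ' '\n'; }

# Delete lines that have nothing between their start (^) and end ($).
drop_blank() { sed '/^$/d'; }

printf 'gas  prices rose\n' | split_words                # four lines, the second one empty
printf 'gas  prices rose\n' | split_words | drop_blank   # three lines: gas, prices, rose
```

The Gasoline text has single spaces only, so this step may not change anything there, but it keeps the pipeline safe for other input.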

BW-userx 04-07-2016 07:03 AM

Quote:

Originally Posted by sobey (Post 5527390)
The first command did not give me any results but the below command sequence works.

tr -d '[:punct:]' < Gasoline

combined with the other results I now need to combine all commands in one line.

Question, can I use the tr command to remove blank lines and remove duplicate words from text?

here is a little how to on using tr

The tr Command

sobey 04-07-2016 09:23 AM

Quote:

Originally Posted by grail (Post 5527442)
I would probably use sed to remove the blank lines and uniq for the duplicate words. You could use sort for the duplicate words, but part of the requirement seemed to be an indicator to show which words had been repeated the most, i.e. a count, so uniq can do this.

I like this thought, especially since the sed command sequence for the duplicate words gives an error. How would I combine them? Something like the command string below, perhaps?

sed '/^$/d' | uniq -u

I know the -u switch removes duplicate lines... Are lines and words the same thing?
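Once every word sits on its own line, a line and a word are the same thing as far as uniq is concerned. The switches behave differently, though, and -u drops every copy of a repeated line rather than keeping one. A short sketch with invented sample words (`words` is a hypothetical helper that stands in for the sorted, one-word-per-line text):

```shell
#!/bin/bash
# Sample input: already sorted, one word per line.
words() { printf '%s\n' a a gas prices prices rose; }

words | uniq       # one copy of every line:            a gas prices rose
words | uniq -u    # only lines that were not repeated: gas rose
words | uniq -d    # only lines that were repeated:     a prices
```

uniq only compares adjacent lines, which is why the text has to be sorted first.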

sobey 04-07-2016 09:28 AM

Quote:

Originally Posted by BW-userx (Post 5527557)
here is a little how to on using tr

The tr Command

This link will be helpful as I research this. I was trying to use the -s switch and a few others without much success. I think that maybe I am getting twisted on the terminology. For example, I am tasked to remove duplicate words, and a command to remove duplicate characters could be viewed as doing the same thing, but I am not piecing that together. If I am overthinking and/or overcomplicating this, then help me simplify it.

BW-userx 04-07-2016 09:51 AM

Quote:

Originally Posted by sobey (Post 5527620)
This link will be helpful as I research this. I was trying to use the -s switch and a few others without much success. I think that maybe I am getting twisted on the terminology. For example, I am tasked to remove duplicate words, and a command to remove duplicate characters could be viewed as doing the same thing, but I am not piecing that together. If I am overthinking and/or overcomplicating this, then help me simplify it.

I know string comparison works, but it seems your teacher has limited you to just sed and tr. Is that correct?

Code:

The -d option deletes every occurrence of each character listed in set1. tr works on
 individual characters, not strings, so the following removes every s, o, f and t
 (not just the word soft) from a copy of the text in a file named file11 and writes
 the modified text to a file named file12:


    cat file11 | tr -d 'soft' > file12


The quotation marks only protect the argument from the shell; tr still treats it as
 a set of individual characters. Rewritten without the quotation marks, the example
 above behaves the same way and removes every instance of the letters s, o, f and t.
 To delete a whole word, use sed instead.


Among the few remaining options is -c, which causes tr to work on the complement of
 the specified characters, that is, on the characters that are not in the given set.


tr provides some of the most basic functionality of the command line program sed, which
 is used to perform basic editing on streams of text supplied by a pipe. However,
it is often advantageous to use tr instead of sed because the former is simpler and
requires less typing and because it is easier to incorporate into scripts.

sed : is what you should use instead.
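It is worth checking on the command line how tr -d treats its argument: it works on a set of characters, quoted or not, and never matches a whole word. A minimal sketch with an invented sample string (`delete_chars` and `delete_word` are hypothetical names):

```shell
#!/bin/bash
# tr -d removes every s, o, f and t, one character at a time.
delete_chars() { tr -d 'soft'; }

# sed removes the four letters only where they appear together as "soft".
delete_word() { sed 's/soft//g'; }

printf 'softly frosty\n' | delete_chars   # ly ry
printf 'softly frosty\n' | delete_word    # ly frosty
```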

sobey 04-07-2016 10:18 AM

Quote:

Originally Posted by BW-userx (Post 5527631)
I know string comparison works, but it seems your teacher has limited you to just sed and tr. Is that correct?

The teacher has not given much help with any of this; her response so far is "google it". This is the second time during our class term that we have been tasked with scripting. She has given us access to a command-line Linux server called pyrite, where we do the work, but the whole class so far has been "watch the videos and google it", hence my dilemma. I am willing to do the work (I produced all of this so far); I just need more guidance and understanding. The sed command sequence that is supposed to remove duplicate words gave an error. How do I fix that?

Code:

-bash-3.2$ sed -ri .s/(.* )1/1/g. Gasoline
-bash: syntax error near unexpected token `('
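The dots in the echoed command most likely stand where typographic quotes (‘ ’) were pasted in. bash does not treat those as quoting characters, so it sees a bare parenthesis and reports the syntax error. The backreference also needs a backslash: \1, not 1. A hedged sketch of the corrected form, with an invented sample line and a hypothetical function name:

```shell
#!/bin/bash
# Straight single quotes stop bash from interpreting the parentheses,
# and \1 refers back to whatever the group (.* ) matched.
# -E turns on extended regular expressions; older GNU sed spells it -r.
dedupe_adjacent() { sed -E 's/(.* )\1/\1/g'; }

printf 'past two two weeks\n' | dedupe_adjacent   # past two weeks
```

This pattern only catches a repeat that directly follows the original on the same line and is not word-aware. Once each word is on its own line and sorted, uniq is the simpler tool for the job.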



All times are GMT -5.