This is my first time joining a forum and asking for help, so forgive me if I did something incorrectly. I'm in school for networking and am taking my first Ubuntu class. I am stuck on our first assignment, creating a bash shell script. For the assignment we downloaded a PDF containing a bunch of team names, and the instructions say:
Create a bash shell script, named uniq-teams.sh that uses pdftotext, sed, and perl -p -i -e (note: you may not need the -i) to create a sorted list of non-duplicated team names from the selected pages of the PDF file.
It must take the following parameters:
input pdf filename
start page
end page
If any of the parameters are missing, your program should print usage instructions.
Your script must remove the following:
Must remove blank/empty lines
Page breaks
Must remove duplicates
Must remove non-team lines, e.g. Area and Ballroom
Could anyone guide me? I'm a complete noob. I have Ubuntu running in PuTTY. So far I have entered vim uniq-teams.sh to bring up the editor and typed #!/bin/bash, and that's where I'm at. Please guide me in the right direction. Thank you so much.
Welcome to LQ, hope you like it here. Odd that your assignment wasn't accompanied by a lecture, introduction or assignment notes, BTW. What you want to do first is read some stuff.
A script is a way to automate actions, e.g. instead of typing each command by hand you just run the script.
Here, the pdftotext command is the main action (the other commands will filter its output).
Try to familiarize yourself with pdftotext, e.g. run some tests on PDF files and see how it works, then go ahead and filter the output it produces.
So one way to work out the script is to figure out each step (line) manually, one step at a time.
When you've figured out the options and got that step working, enter it into your script's file. If you set some debugging options in the second line of your script, it can help:
Code:
set -x -v
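As a rough illustration of what those debugging options do, here is a tiny script with tracing switched on. The team names are made up and merely stand in for real pdftotext output. With -v the shell echoes each script line as it reads it, and with -x it prints each command after expansion, prefixed with "+"; both go to stderr.

```shell
#!/bin/bash
# Sketch only: -v echoes script lines as they are read, -x prints each
# command after expansion with a leading "+". Both are written to stderr.
set -x -v

# Hypothetical sample data standing in for pdftotext output.
teams=$(printf 'Tigers\nLions\nTigers\n' | sort -u)
echo "$teams"

# Turn tracing back off once the step works.
set +x +v
```

Once a step behaves as expected, the set -x -v line can be removed or commented out.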
For some background information on each program, look at the manual pages for each one. They can be overwhelming so focus on the options that the instructor has provided, or look for options that provide the function asked for.
Code:
man pdftotext
man sed
man perlrun
man perlre
I wrote a short post on "sed" which has a few useful links though is aimed more at people that have already been using "sed" a bit.
"perl" is a more powerful scripting language than "bash" but it looks like the assignment is to use it for one-liners if the -p and -e options are being recommended. That's ok too. The -p option wraps a loop around what you have in -e: it reads the input one line at a time, runs your code on each line, and then prints the line. The biggest advantage of perl is its pattern matching. You can get the full reference with man perlre. However, a lot of guides and books are available. Some of those books may be in your school library.
Last edited by Turbocapitalist; 11-06-2016 at 09:20 AM.
Also, if you have already created a script, just post it and we will discuss it with you and help you improve it, but we won't write it for you.
Additionally, you may try www.shellcheck.net to check the script you wrote.
This class is all online, with an online textbook. I'm considering dropping it and taking it in person. I am very confused and am having no luck at all with what seems to be an easy assignment. I don't know, could anyone post their email so I can email them for further help, as my teacher takes a while to answer back? Appreciate the help, everyone.
When you're programming or scripting, beginner or not, take it and do it in steps.
The tools are pdftotext, sed, and perl -p -i -e.
First get your PDF file and use only pdftotext on it. Working on a copy of your PDF file, experiment on the command line until it does what you need, then take those commands and add them to your script file.
Next, learn sed.
Figure out how to get sed to extract what you need, on the command line, from the text file you created with pdftotext. When you get the output you need, take those commands and put them into your script file after the pdftotext commands. Now on to perl. I have no idea about perl and what it does, but I am thinking it is for formatting your output into the final result your teacher wants.
Just take the output of sed and | pipe it into perl, then let perl do what it is supposed to do in order to give you the end result.
A hint on what to use with perl was already given: -p -i -e.
Then, with what you've completed there, I am sure that if you post your work here there are others who can help you complete your task by figuring out how to put it all together within the script.
But if you can figure it all out on the command line, one step and then on to the next, then that is all you need to put into your script file.
Piping is a useful command-line tool: it sends the output of one program into the input of another so it can be manipulated further.
Done.
It is a process, one step at a time.
You're still going to have to do all of the steps regardless of whether the class is online or not.
Code:
userx@voided1.what~/Documents/Linux-how-tos/The Hacker's Manual 2015>> pdftotext -f 70 -l 71 'The Hacker'\''s Manual 2015.pdf'
userx@voided1.what~/Documents/Linux-how-tos/The Hacker's Manual 2015>> ls
'The Hacker'\''s Manual 2015.pdf' 'The Hacker'\''s Manual 2015.txt'
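Put together, the piping idea described above could look something like the sketch below. The function name filter_teams, the file name teams.pdf and the sample text are made up for illustration. The commented-out pdftotext line assumes poppler's pdftotext, where a lone - as the output file sends the text to stdout, and the \f escape assumes GNU sed as shipped with Ubuntu.

```shell
#!/bin/bash
# filter_teams (hypothetical name): read extracted text on stdin,
# write a sorted, de-duplicated list of team names on stdout.
filter_teams() {
    # 1) strip form feeds (page breaks), 2) drop blank lines,
    # 3) drop the non-team lines, then sort and remove duplicates.
    sed -e 's/\f//g' \
        -e '/^[[:space:]]*$/d' \
        -e '/Area/d' -e '/Ballroom/d' |
    sort -u
}

# Real use would be along these lines (file name and pages are placeholders):
#   pdftotext -f 70 -l 71 teams.pdf - | filter_teams

# Made-up stand-in for pdftotext output, including a form feed (\f).
printf 'Tigers\n\nArea 1\nLions\n\fTigers\nBallroom A\n' | filter_teams
```

Each stage can be tried on its own at the command line first, then chained with | once it does what you expect.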
Quote:
This class is all online, with an online textbook. I'm considering dropping it and taking it in person. I am very confused and am having no luck at all with what seems to be an easy assignment. I don't know, could anyone post their email so I can email them for further help, as my teacher takes a while to answer back? Appreciate the help, everyone.
Sorry, but do you realize how incredibly rude this is???
We are happy to help you, happy to explain things, or assist if you're stuck...but you've flat-out posted a homework question, shown us NO effort of your own (not even the beginnings of a script), and have been given links to many bash scripting tutorials, man pages, and other things to help get you going. Then wanting us to give you our personal email addresses, so you can mail us questions DIRECTLY because you don't want to WAIT for your teacher to answer, is a bit beyond the scope here. We will help you in the forums...but asking us to be your personal, FREE, one-on-one tutors is a bit much.
Please...can you show us what YOU have written/done/tried on your own so far, and tell us where you're stuck? BW-userx gave some solid advice...so start to THINK about things one step at a time:
Your assignment tells you that you need three things to function: an input file name, start page, and end page. That tells you that your script needs to take three command line arguments. Look up how to read command line arguments from the many resources you've been given.
It tells you that if they do NOT provide three, to give instructions to the user. Look up how to check if a variable is empty or not.
Read the instructions on pdftotext that tell you how to remove page breaks.
Do a search on how to remove duplicates...you can do it with sed or perl. A simple perl example:
Code:
perl -ne '$H{$_}++ or print $_'
Look up how to handle arrays in a bash script. You know you have a list of team names; that is one array. So read everything else in, and if it's NOT in that array, don't print it.
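As a minimal sketch of the first two points, assuming the three positional parameters the assignment lists: the helper name check_args and the exact usage wording are my own, not something the assignment specifies.

```shell
#!/bin/bash
# check_args (hypothetical helper): succeed only when the three parameters
# from the assignment (pdf file, start page, end page) are all present.
check_args() {
    # $# is the number of arguments; -z tests for an empty string.
    if [[ $# -lt 3 || -z $1 || -z $2 || -z $3 ]]; then
        echo "Usage: uniq-teams.sh <input.pdf> <start page> <end page>" >&2
        return 1
    fi
}

# In the real script this would be called with the script's own arguments:
#   check_args "$@" || exit 1
#   pdf=$1; first=$2; last=$3
```

Printing the usage text to stderr and returning non-zero lets the caller decide whether to exit.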
We can help...but we will not do this for you. It is time for you to show us some effort of your own.
Well.. I don't use perl, but here is my quick untested idea:
Code:
#!/bin/bash
# print usage instructions
usage() {
    echo "USAGE: uniq-teams.sh [options]"
    echo " -f <int> : first page to convert"
    echo " -l <int> : last page to convert"
    echo " -p <filename> : pdf filename to convert"
    echo " -t <filename> : text filename to output"
}
# grab option arguments into variables
while getopts ":f:l:p:t:" opt; do
    case $opt in
        f) first="$OPTARG" ;;
        l) last="$OPTARG" ;;
        p) pdf="$OPTARG" ;;
        t) text="$OPTARG" ;;
        # unknown option (?) or option given without its argument (:)
        \?|:) usage; exit 1 ;;
    esac
done
# if any of the four options is missing, print instructions
if [[ -z $first || -z $last || -z $pdf || -z $text ]]; then usage; exit 1; fi
# convert pdf to text (quote variables so names with spaces work)
pdftotext -f "$first" -l "$last" "$pdf" "$text"
# remove page breaks ^L first, so a line holding only ^L becomes blank
# (\f is a GNU sed escape)
sed -i 's/\f//g' "$text"
# remove blank lines
sed -i '/^$/d' "$text"
# remove Area and Ballroom lines
sed -i '/Area/d' "$text"
sed -i '/Ballroom/d' "$text"
# sort and remove duplicates
sort -u -o "$text.sorted" "$text"
I keep a 'bash scripting skeleton' for writing bash scripts which has some error handling, functions, temp files and that kind of thing. Has some decent ideas in it, if you'd like to take a look for reference.
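For readers who have not seen such a skeleton, here is a very small sketch of the kind of thing one might contain. This is not the skeleton referred to above; the names die and cleanup, the sample data and the choice of shell options are all assumptions for illustration.

```shell
#!/bin/bash
# Stop on errors, on unset variables, and on failures inside pipelines.
set -euo pipefail

# die (hypothetical helper): print a message to stderr and exit non-zero.
die() {
    echo "error: $*" >&2
    exit 1
}

# Create a temp file and make sure it is removed however the script ends.
tmpfile=$(mktemp) || die "could not create temp file"
cleanup() { rm -f "$tmpfile"; }
trap cleanup EXIT

# Work on the temp file instead of editing the input in place.
# The team names are made-up sample data.
printf 'Tigers\nLions\nTigers\n' > "$tmpfile"
sort -u "$tmpfile"
```

Working in a temp file keeps the original extracted text untouched while you experiment with filters.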