LinuxQuestions.org


wasup 08-15-2010 12:58 PM

Help with a bash script
 
Awesome forum!

I am trying to figure out a way to pull http links out of text files and then output the results in a log. The text files are in folders like this inside a source directory.

/source
./folder1
    folder1.txt
./folder2
    folder1.txt
./folder3
    folder1.txt

So basically I would like to get the output to look something like this.

folder1
http://example.com

folder2
http://example.com

folder3
http://example.com

I just can't wrap my head around how to do this.

Thanks in advance

David the H. 08-15-2010 01:25 PM

Sounds like a possible homework question, so just some basic advice for the moment.

Break it down into the steps you need to perform. Figure out how to do each one individually, then you can combine them at the end.

First you need to compile a list of filenames. Take a look at the find command.

Next, you need to figure out how to extract the links you need from each file. This depends on the exact format of the text, but the usual tools are grep, sed, or awk.

Finally, create a loop to process each file in the list and output the desired format to your log file.
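A rough skeleton of how the pieces might fit together (the paths and log name here are only placeholders, and the extraction step is deliberately left blank for you to fill in):

Code:

#!/bin/bash
# Skeleton only: walk every .txt file under the source directory,
# run the link extraction on each one, and collect everything in a log.
find /source -type f -name '*.txt' | while IFS= read -r file; do
    basename "$(dirname "$file")"    # the folder the file lives in
    # ...your grep/sed/awk extraction on "$file" goes here...
    echo
done > links.log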

wasup 08-16-2010 11:59 PM

David thanks for the help.

This gives the folder names:
ls ~/folder

This would give the paths of the text files:
find ~/folder -name '*txt'

This would extract the http links out of the txt files:
grep "http://" /folder/name.txt | sed 's/^.*http:/http:/' | sed 's/\s.*$//' | sort

The thing that stumps me is how to do the loops.

The output I am looking for would be...

folder
http://www.link.com

folder1
http://www.link.com

etc.

grail 08-17-2010 12:07 AM

Hey wasup

Quote:

ls ~/folder
There are a few potential issues with using ls (although possibly not here). The main one I can see is that ls makes no distinction between files and directories, so assuming you only want directories it will fail.
Quote:

find ~/folder -name '*txt'
This is a better start, and if you run it on the command line you will see that you also get the folder names (hint).
Quote:

grep "http://" /folder/name.txt | sed 's/^.*http:/http:/' | sed 's/\s.*$//' | sort
Here I will be a little harsher and ask: why? I realise you are newish, so the reason for asking is that sed can search just like grep can, which makes the grep a waste. Also, maybe you could show us a before and after of the line you want, as the seds seem quite over the top as well.
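To illustrate, sed can do the search and the extraction in one step, something like this (untested without seeing your actual input lines):

Code:

sed -n 's|.*\(http://[^[:space:]]*\).*|\1|p' /folder/name.txt | sort
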
Quote:

The thing that stumps me is how to do the loops.
Lastly, go to your favorite source for bash (I like this one) and look up either a for or a while loop.
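Just to get you started, one possible shape for the loop (untested, and the paths are only my guess at your layout):

Code:

for dir in ~/folder/*/; do
    basename "$dir"    # the folder name as the header
    sed -n 's|.*\(http://[^[:space:]]*\).*|\1|p' "$dir"*.txt | sort
    echo
done > links.log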

konsolebox 08-17-2010 01:12 AM

Hello. Try this one.
Code:

cd ~/folder; find . -type f -iname '*txt' -exec grep -o "http://[^[:blank:]\"']\+" {} \;
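
And if you also want to know which file each link came from, adding -H (GNU grep) makes grep print the filename in front of every match:

Code:

cd ~/folder; find . -type f -iname '*txt' -exec grep -oH "http://[^[:blank:]\"']\+" {} +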

