I think part of my problem is describing what I am trying to do. The script runs a process which dumps a lot of info to stderr. While the process is still running I want to strip data from the stderr, store it in variables, act on the variables, then repeat: strip, store, act.
I have a command that is the start of this, but I want to expand it.
The line I strip out has 3 parts: an HTTP address, a file location, and an MD5 hash. At present it strips the line, downloads the file to the desired location, and saves the MD5 in filename.md5.
I want to prevent re-downloading the same files by testing whether filename.md5 exists and, if so, comparing the hash.
Code:
$ ./process 2>&1 | sed -un 's/.*http/http/p' | grep --line-buffered -v "'" | awk '{ system("curl --create-dirs " $1 " -o " $2 " ; echo " $3 " > " $2".md5") }'
./process 2>&1 |
redirects stderr to stdout
sed -un 's/.*http/http/p' |
pulls out the line we need and strips any leading garbage; -u keeps sed unbuffered
grep --line-buffered -v "'" |
a catch for some bad http lines; again, unbuffered
awk '{ system("curl --create-dirs " $1 " -o " $2 " ; echo " $3 " > " $2".md5") }'
parses the line and passes the 1st segment to curl as the download address and the 2nd segment as the file location, then writes the 3rd field to the same location as the 2nd segment with a .md5 extension
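To show what the first two stages do, here is the sed stage run on a made-up sample line (the URL, path, and hash are hypothetical; real stderr lines from ./process will differ):

```shell
# A noisy stderr line is reduced to just the URL-onwards part.
# Note .*http is greedy, so a URL containing "http" twice would be mangled.
printf 'DEBUG noise http://example.com/a.bin /tmp/a.bin d41d8cd9\n' |
  sed -un 's/.*http/http/p'
# prints: http://example.com/a.bin /tmp/a.bin d41d8cd9
```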
Any suggestions on how to turn this into a proper script, and how to expand it to check whether filename.md5 exists and compare the hashes if it does?
Here is my attempt at doing it in the awk subsection. Is there a more elegant way?
Code:
./process 2>&1 | sed -un 's/.*http/http/p' | grep --line-buffered -v "'" | awk '{ system("if [ -e " $2".md5 ] ; then if [ $(cat " $2".md5) != " $3 " ] ; then curl --create-dirs " $1 " -o " $2 " ; echo " $3 " > " $2".md5 ; fi ; else curl --create-dirs " $1 " -o " $2 " ; echo " $3 " > " $2".md5 ; fi") }'
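For comparison, here is roughly the same logic as a standalone script using a while read loop instead of awk's system() call; an untested sketch, assuming each line coming out of the pipeline looks like "URL PATH MD5" separated by whitespace:

```shell
#!/bin/sh
# Sketch: read each "URL PATH MD5" line and skip downloads we already have.
./process 2>&1 |
  sed -un 's/.*http/http/p' |
  grep --line-buffered -v "'" |
  while read -r url path hash; do
      # Skip if a matching hash is already on disk
      if [ -e "$path.md5" ] && [ "$(cat "$path.md5")" = "$hash" ]; then
          continue
      fi
      curl --create-dirs "$url" -o "$path"
      printf '%s\n' "$hash" > "$path.md5"
  done
```

This also quotes $url, $path, and $hash, which the awk/system() version does not, so paths with spaces or shell metacharacters would not break the command line.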