
lpallard 09-12-2011 06:36 PM

Help for simple bash script - searching strings
OK, I've tried this one for too long. I'm just beginning in Bash scripting and I need help...

I have an application running on my router that monitors and lists the bandwidth used in/out of my WAN connection. This application has its own web page where I can see the amount of data that was uploaded and downloaded.

It's a simple PHP page. Nothing fancy. Just text.

What I want to do is write a simple bash script that I can run manually (or via cron, every hour for example) to extract the value that corresponds to my bandwidth usage.

To do so, I first tried using curl to "download" the content of the page, and with grep I can list the line where the value I am searching for is located. The problem is that I don't know how to extract the value from that line of text.

Some more info:

I am using this command to get the page content and extract the line of interest:


curl -k --silent https://localrouter/vnstat2/ | grep "This month"
The result would be something like:


<tr><td class="label_even">This month</td><td class="numeric_even">5.10 GB</td><td class="numeric_even">110.30 MB</td><td class="numeric_even">5.21 GB</td></tr>
The value I want is the last one (5.21 GB); the "GB" is not necessary. Please note that the position of the first character of this value can change, because the number of digits in the numbers before it (5.10 and 110.30) can change too... The number of digits of the value itself can also change. I have no control over this; the PHP script generates it.

Any bash, sed, awk, or whatever gurus out there?


macemoneta 09-12-2011 08:39 PM

Here's one way.

curl -k --silent https://localrouter/vnstat2/ | grep "This month" | sed -e 's/^.*numeric_even">//;s/<.*$//'
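Since the "GB" isn't needed, the same two-step sed can also drop the unit. A minimal sketch, assuming the units are KB, MB, GB or TB as in the sample row; `month_total` is a hypothetical helper name:

```bash
#!/bin/bash
# Hypothetical helper: reads the "This month" row on stdin and prints the
# last numeric_even cell without its unit (assumes KB/MB/GB/TB units).
month_total() {
    # 1st expression: the greedy .* eats everything up to the LAST numeric_even">
    # 2nd expression: drop the space, the unit and the closing tags
    sed -e 's/^.*numeric_even">//' -e 's/ *[KMGT]B<.*$//'
}

row='<tr><td class="label_even">This month</td><td class="numeric_even">5.10 GB</td><td class="numeric_even">110.30 MB</td><td class="numeric_even">5.21 GB</td></tr>'
printf '%s\n' "$row" | month_total    # prints 5.21

# Live use:
# curl -k --silent https://localrouter/vnstat2/ | grep "This month" | month_total
```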

lpallard 09-12-2011 09:31 PM

That's perfect! Just before you posted I was trying to achieve the same thing, but my command was wayyy longer and did not even produce the right result...

Thanks a lot!

I'm gonna keep the thread open for a little while because I'm not done with the script and I might need help later on...

Thanks again!

grail 09-12-2011 11:20 PM

And awk:

curl -k --silent https://localrouter/vnstat2/ | awk -F'[<>]+' '/This month/{print $(NF-3)}'
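When splitting on the angle brackets, it is easy to miscount which field holds the value. A quick check is to print every field with its number; `show_fields` is a hypothetical helper, and the separator is written as `[<>]+` (one or more brackets). For the sample row this shows the total in field 13 of 16, because the leading `<` and the trailing `>` each produce an empty field at the ends.

```bash
#!/bin/bash
# Print each field of a row, numbered, when splitting on runs of < and >.
show_fields() {
    awk -F'[<>]+' '{ for (i = 1; i <= NF; i++) printf "%d: %s\n", i, $i }'
}

row='<tr><td class="label_even">This month</td><td class="numeric_even">5.10 GB</td><td class="numeric_even">110.30 MB</td><td class="numeric_even">5.21 GB</td></tr>'
printf '%s\n' "$row" | show_fields
```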

lpallard 09-13-2011 06:01 AM

When it's time to process text streams like this, which tools are generally best suited: sed or awk? I understand that if I had to modify strings (replacing expressions, removing characters, etc.), sed, being a stream editor, would be best...

Do you guys have a good reference for learning these tools?

grail 09-13-2011 07:51 AM

It's generally not a good idea to say "always use this tool" or "always use that one"; rather, use the one best suited to the job, or sometimes the one you are most adept with.

As for references :-

David the H. 09-13-2011 09:37 AM

I'm afraid I don't understand what this means:

Please note the position of the first character of this value could change as the digits of the other numbers before (5.10 & 110.30) could very well change...
Also, is the output itself always uniform in format? And do you always only want the 3rd/last number?

Here are a couple of other solutions I thought of, assuming the above.

The first just loads the string into a variable, then strips off the unwanted parts, giving you the last number.

num=$( curl -k --silent https://localrouter/vnstat2/ | grep "This month" )
num=${num% [KMG]B*}
num=${num##*>}

echo "$num"

The second runs it through a second grep to extract all "nn.nn" style number strings, and loads the results into an array. All the numbers are thus available to you, if you need them.


nums=( $( curl -k --silent https://localrouter/vnstat2/ | grep "This month" | grep -Eo '[0-9]+\.[0-9]+' ) )

echo "${nums[2]}"

Edit: Regarding the last question, probably the most practical thing you can do first is to become familiar with regular expressions. This will give you more flexibility with all sorts of tools; regex is supported by more applications than you might think.

Then learn the basics of both sed and awk. Each has its own strengths and weaknesses. sed is line (actually stream) based, and can often more easily do substitutions, deletions, and regex pattern applications on individual lines and whole files. awk, on the other hand, is field-based, and is often easier to use when the text can be split into sections based on characters or patterns of characters. Beyond that, it's also a full scripting language capable of doing very complex text manipulations.

A lot of people forget that there are also a number of other, more specialized, tools available, like cut, tr, head, tail, paste, and fold. Many of these are faster and easier to use than sed and awk within their own areas of expertise.
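For example, the same value can be pulled out with tr, grep, tail and cut alone. A sketch, assuming the row has already been selected with grep "This month" and that values are written as "number unit" with KB/MB/GB/TB units; `last_value` is a hypothetical name:

```bash
#!/bin/bash
# Extract the last "number unit" cell of one table row using small tools only.
last_value() {
    tr '<>' '\n\n' |                   # every tag and every cell on its own line
        grep -E '^[0-9.]+ [KMGT]B$' |  # keep only the "number unit" cells
        tail -n 1 |                    # the last one is the monthly total
        cut -d' ' -f1                  # drop the unit
}

row='<tr><td class="label_even">This month</td><td class="numeric_even">5.10 GB</td><td class="numeric_even">110.30 MB</td><td class="numeric_even">5.21 GB</td></tr>'
printf '%s\n' "$row" | last_value    # prints 5.21
```

Each stage does one job, which makes the pipeline easy to debug: drop the later stages and look at what the earlier ones produce.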

And finally, there's the shell itself, which has many powerful string manipulation tools, like the parameter expansion and arrays I used above.
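For instance, bash can do the whole extraction on its own with the `=~` operator, which stores capture groups in the `BASH_REMATCH` array. A sketch, assuming the row ends in `</td></tr>` as in the sample; `month_total_bash` is a hypothetical name:

```bash
#!/bin/bash
# Hypothetical helper using only bash built-ins: [[ =~ ]] and BASH_REMATCH.
month_total_bash() {
    local row=$1
    # Keeping the regex in a variable avoids quoting problems inside [[ ]].
    # Only the last cell is followed by </td></tr>, so that is the one matched.
    local re='>([0-9.]+) [KMGT]B</td></tr>'
    if [[ $row =~ $re ]]; then
        printf '%s\n' "${BASH_REMATCH[1]}"   # text captured by the ( ) group
    else
        return 1                             # no match: signal failure
    fi
}

row='<tr><td class="label_even">This month</td><td class="numeric_even">5.10 GB</td><td class="numeric_even">110.30 MB</td><td class="numeric_even">5.21 GB</td></tr>'
month_total_bash "$row"    # prints 5.21
```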

grail 09-13-2011 10:37 AM

Just in case you want a ruby solution too:

curl -k --silent https://localrouter/vnstat2/ | ruby -ne 'puts $1 if $_ =~ /This month.*>([\d.]+)/'

lpallard 10-01-2011 05:13 PM

All these solutions were helpful! At least I've learned some more!

Now I'd like to write a script to do the following tasks:
  • Search a specific folder for sub-folders whose names contain certain strings;
  • Rename these folders a certain way (by removing some parts and reorganizing the rest of the folder name);
  • Enter each sub-folder and rename a specific file inside it (there should be only one file per sub-folder) the same way as its parent folder;
  • If need be, delete all other files from the sub-folder;
  • Move the renamed file to a certain location;
  • Delete the sub-folder...

OK an example:




So for this example, the sub-folder was renamed by removing "sub1" and "hello3", and the string "hello" was moved from the front of "just-a-subfolder" to the end.

Then the text file was renamed exactly like its parent folder, i.e. "just-a-subfolder-hello", keeping its extension.

Also all other files except the one we just renamed were deleted. Finally, "just-a-subfolder-hello.txt" will be moved to another location on the system, and folder "./test/just-a-subfolder-hello" will be deleted. Not the ./test folder!
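The steps listed above could be strung together roughly like this. It is only a sketch: the renaming rule in `new_name` is a hypothetical hard-coding of this one example (drop "sub1-" and "-hello3", move "hello" to the end), and `process_dir` assumes each matching sub-folder holds exactly one .txt file.

```bash
#!/bin/bash
# HYPOTHETICAL renaming rule, hard-coded for the example only:
#   sub1-hello-just-a-subfolder-hello3  ->  just-a-subfolder-hello
new_name() {
    local n=$1
    n=${n//sub1-/}            # drop "sub1-"
    n=${n//-hello3/}          # drop "-hello3"
    n=${n#hello-}             # drop a leading "hello-"...
    printf '%s\n' "$n-hello"  # ...then append "-hello" at the end
}

# Process ONE matching sub-folder ($1): rename its single .txt file after the
# folder's new name, move it into $2, then delete the sub-folder and leftovers.
process_dir() {
    local dir=$1 dest=$2 name
    local -a files
    name=$(new_name "$(basename "$dir")")
    shopt -s nullglob              # no match -> empty array, not a literal "*.txt"
    files=("$dir"/*.txt)
    shopt -u nullglob
    if [ "${#files[@]}" -ne 1 ]; then
        echo "skipping $dir: expected exactly one .txt file" >&2
        return 1
    fi
    mv -- "${files[0]}" "$dest/$name.txt" || return 1
    rm -rf -- "$dir"               # removes only this sub-folder, not its parent
}

# Possible driver (paths are placeholders); -print0 / read -d '' cope with spaces:
# find ./test -mindepth 1 -maxdepth 1 -type d -name '*sub1*' -print0 |
#     while IFS= read -r -d '' d; do process_dir "$d" /path/to/destination; done
```

The separate "rename the folder" step is skipped here: since the sub-folder is deleted at the end anyway, renaming and moving the file in a single `mv` is enough. `rm -rf` is only called on the matching sub-folder, so ./test itself stays in place.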

Does anybody have a suggestion for me? I kinda played around trying to write a script, but I have problems with recursive operations... I'd normally keep trying, but this time I am in a rush. I prefer bash because it does not require anything exotic, but if Perl, Ruby or any other language is better, please do not hesitate!


grail 10-02-2011 03:45 AM

Well the first suggestion is, what have you tried and where are you stuck?
If the following is correct:

All these solutions were helpful! At least I've learned some more!
Then you need to demonstrate what you have learned. The idea is not for others to do all the work for you.

lpallard 10-08-2011 12:14 PM

OK sorry about the long delay in replying, I had to drop this for a few days but I just returned and had a chance to play a bit more with this...

The task is rapidly outgrowing my capacity to code... There are too many scenarios with folder naming. I need to keep learning because I'm pretty bad :(

So far I have adopted the baby-steps approach. Starting with a handful of folders each containing a file, I wrote a script to recursively enter each folder whose name contains a certain string, then do something in that folder. For the first trial, I decided the script would create a new sub-folder in each folder matching the string search. It works.

Now the problem I am facing is dealing with folders that are named pretty randomly. In the example I described in post 9 above, the folder was named "sub1-hello-just-a-subfolder-hello3", but in real life there is no guaranteed pattern for the folder name, only a guarantee that the name will contain certain strings. The order is not known, and there could be more or fewer strings in the folder name. For example, "sub1" could be at the beginning, at the end, or somewhere else in the name; there could be no "hello" at all; and there will very likely be spaces or other stupid characters in the name... These folders are created by Windows users... They use all kinds of characters, and sometimes more characters than necessary... For the initial search of the folders, this should not pose any problems, because even if a folder were named like this:


tretretretre_8989789++_  sub1 -hello-just_a_subfolder HELLO! efdsfdsf hello...3
searching for the string "sub1" would still return the folder in the results. It's renaming the files based on the folder's name that poses a problem. Instead of starting with the untouched folder name and removing string after string until I get something clean like "sub1-hello3", I think it would be better to remove everything EXCEPT certain strings.

That would mean from:


tretretretre_8989789++_  sub1 -hello-just_a_subfolder HELLO! efdsfdsf hello...3
removing everything except "sub1" & "hello3" to get:


sub1 hello3
then use the result to rename the file. However, this would require adding spaces between the strings, so that I get "sub1 hello3" rather than "sub1hello3".
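Removing everything except a known list of strings could look like this sketch, which assumes the wanted strings are known in advance and that punctuation inside them can simply be dropped (so "hello...3" becomes "hello3"). `keep_only` is a hypothetical name; the kept words come out in the order they appear in the folder name, joined by single spaces.

```bash
#!/bin/bash
# keep_only NAME WORD...  prints only the listed WORDs found in NAME, space-separated.
keep_only() {
    local name=$1 pattern
    shift
    pattern=$(IFS='|'; printf '%s' "$*")      # e.g. "sub1|hello3"
    printf '%s\n' "$name" |
        tr -cd '[:alnum:] \n' |               # delete punctuation: "hello...3" -> "hello3"
        grep -Eow -- "$pattern" |             # whole-word matches only, one per line
        paste -sd' ' -                        # join the matches with spaces
}

keep_only 'tretretretre_8989789++_  sub1 -hello-just_a_subfolder HELLO! efdsfdsf hello...3' sub1 hello3
# prints: sub1 hello3
```

grep's `-w` keeps "sub1" from matching inside a longer word, and the match is case-sensitive, so "HELLO" is not picked up.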

My script so far, very primitive.




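cd /home/lpallard/test || exit 1

# -print0 with read -d '' keeps names with spaces or odd characters intact
find . -type d -name '*sub1*' -print0 | while IFS= read -r -d '' d; do
        # sed 's/x//g' deletes a substring; '/x/d' would delete the whole line
        new=$(basename "$d" | sed -e 's/String1toremove//g' -e 's/String2toremove//g' -e 's/String3toremove//g')
        for f in "$d"/*.txt; do
                [ -e "$f" ] || continue   # no .txt file: the glob stays literal
                mv -- "$f" "$d/$new.txt"
        done
done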

A boost, please? :)
Thanks guys!

lpallard 10-08-2011 12:22 PM

Looking at the real deal here (the actual folders & files), I believe it is simpler than I thought.

The current folders are more or less named like this:


StringA ### #### #### RandomStringA StringB RandomStringB
What I want is to:
- Keep StringA
- Keep the # (representing digits 0-9), but replace the spaces between the groups with dashes (so 512 6654 7878 becomes 512-6654-7878)
- Keep RandomStringA
- Delete StringB
- Delete RandomStringB
- Separate the resulting strings with spaces

so the file would be renamed to:


StringA ###-####-#### RandomStringA.txt
I will keep trying more stuff. I hope this post will clarify a bit.

grail 10-09-2011 02:25 AM

So I am struggling to understand where you are going with this :(

Post #11 would be easily solved: you seem to already know what you want to rename the file / folder to, so there is no need to extract anything; just use what you know.

As for post #12, if we assume that RandomStringA is unknown and that only two space-separated words (StringB and RandomStringB) follow it, you can use parameter substitution to remove the last 2 strings. Then you probably need something like sed to insert the dashes between the numbers.
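A minimal sketch of that suggestion, assuming StringB and RandomStringB are single words without spaces; `strip_and_dash` is a hypothetical name:

```bash
#!/bin/bash
# Drop the last two words, then turn "ddd dddd dddd" into "ddd-dddd-dddd".
strip_and_dash() {
    local x=$1
    # % removes the shortest suffix matching " * *", i.e. the last two words
    x=${x% * *}
    printf '%s\n' "$x" | sed -E 's/([0-9]{3}) ([0-9]{4}) ([0-9]{4})/\1-\2-\3/'
}

strip_and_dash 'StringA 512 6654 7878 RandomStringA StringB RandomStringB'
# prints: StringA 512-6654-7878 RandomStringA
```

If StringB or RandomStringB can themselves contain spaces, the " * *" pattern removes too little, and the assumption above no longer holds.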

lpallard 10-09-2011 08:02 AM

OK, I added post 12 because I thought #11 was confusing, but it may have had the opposite effect... If you got the idea from post 11, can we proceed from there?

Let's adopt the baby-steps approach so I can get the point.

At this point what I *think* I have to do is to remove certain strings (the garbage identified as RandomStringA & B) and reorganize the other portions of the filename.

Let's start with step 1: I tried to use sed to collect the numerals (##). It works, but I only got as far as extracting the last X digits when the numbers are either in front of the whole string or at the end...

Like "Hello 1978 1986" or "4521 2352 Hello".

This did not prove too useful at first, because I am not verifying the existence of the pattern but extracting from it (if it exists). What I need is something that will search for the existence of a pattern. Googling for this was not very successful.

So in my case, I need to search for the existence of the pattern "[0-9][0-9][0-9] [0-9][0-9][0-9][0-9] [0-9][0-9][0-9][0-9]"; if it exists, replace the spaces with dashes and append the result to StringA. To extract StringA I can use sed to collect the first X characters of the filename, or, similarly to the numeral search, search for a specific keyword.
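`grep -q` answers exactly the "does this pattern exist?" question: it prints nothing and only sets the exit status, so it can drive an `if`. A sketch with hypothetical helper names, assuming the digit groups are separated by single spaces:

```bash
#!/bin/bash
# Succeeds (exit 0) only if $1 contains "ddd dddd dddd" as separate words.
has_number_group() {
    printf '%s\n' "$1" | grep -Eq '(^| )[0-9]{3} [0-9]{4} [0-9]{4}( |$)'
}

# If the group exists, print it with dashes instead of spaces; else return 1.
dash_numbers() {
    has_number_group "$1" || return 1
    printf '%s\n' "$1" |
        sed -E 's/(^|.* )([0-9]{3}) ([0-9]{4}) ([0-9]{4})( .*|$)/\2-\3-\4/'
}

name='StringA 512 6654 7878 RandomStringA StringB'
if digits=$(dash_numbers "$name"); then
    echo "${name%% *} $digits"    # prints: StringA 512-6654-7878
else
    echo "no number group in: $name"
fi
```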

Am I confusing you?

grail 10-09-2011 10:08 AM

Assuming I do understand (could be a big if), let us use this example and see if we are on the same page:

# we store folder name in variable x
x='StringA 123 4567 4568 RandomStringA StringB RandomStringB'

# We need the first string
first=${x%% *}

# We want all the digits (we assume here there are none elsewhere with the same pattern)
digits=$(echo $x | sed -rn 's/[^ ]* ([0-9]{3}) ([0-9]{4}) ([0-9]{4}).*/\1-\2-\3/p')

Throw in some echoes for checking and let me know if we are on the same page.
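Following that suggestion, one way to check the pieces against real folder names before renaming anything is a dry run that only echoes what each name would become. A sketch, assuming the folders sit directly inside the directory passed as `$1` and follow the layout in the example; `preview_renames` is a hypothetical name, and `sed -E` is used in place of GNU's `-r` (same meaning).

```bash
#!/bin/bash
# Dry run: for each sub-folder of $1, print "old name -> StringA ###-####-####".
preview_renames() {
    local dir x first digits
    for dir in "$1"/*/; do
        [ -d "$dir" ] || continue              # no sub-folders: glob stays literal
        x=$(basename "$dir")
        first=${x%% *}                         # everything before the first space
        digits=$(printf '%s\n' "$x" |
            sed -En 's/[^ ]* ([0-9]{3}) ([0-9]{4}) ([0-9]{4}).*/\1-\2-\3/p')
        [ -n "$digits" ] || continue           # pattern absent: skip this folder
        printf '%s -> %s %s\n' "$x" "$first" "$digits"
    done
}

# Usage: preview_renames /home/lpallard/test
```

Nothing is moved or renamed; once the printed names look right, the `printf` can be swapped for the real `mv`.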

All times are GMT -5.