LinuxQuestions.org
Old 09-12-2011, 06:36 PM   #1
lpallard
Member
 
Registered: Nov 2008
Location: Milky Way
Distribution: Slackware (various releases)
Posts: 970

Rep: Reputation: 44
Help for simple bash script - searching strings


OK, I've been at this one for too long. I'm just beginning with Bash scripting and I need help...

I have an application running on my router which monitors and lists the data bandwidth in/out of my WAN connection. This application has its own web page that I can access to see the amount of data that was uploaded & downloaded.

It's a simple PHP page. Nothing fancy. Just text.

What I want to do is write a simple bash script that I can run manually (or via cron every hour, for example) to extract the value that corresponds to my bandwidth.

To do so, I first tried curl to "download" the content of the page, and with grep I can list the line where the value I am searching for is located. The problem is that I don't know how to extract the value from that line of text.

So more info:

I am using this command to get the page content and extract the line of interest:

Code:
curl -k --silent https://localrouter/vnstat2/ | grep "This month"
The result would be something like:

Code:
<tr><td class="label_even">This month</td><td class="numeric_even">5.10 GB</td><td class="numeric_even">110.30 MB</td><td class="numeric_even">5.21 GB</td></tr>
The value of interest is the last one on the line (5.21 GB). How do I extract this value? The "GB" is not necessary. Please note the position of the first character of this value could change as the digits of the other numbers before (5.10 & 110.30) could very well change... The number of digits of the value itself can also change. I have no control over this... The PHP script does it.

Any bash, sed, awk, or whatever gurus out there?

Thanks!

Last edited by lpallard; 09-12-2011 at 06:39 PM.
 
Old 09-12-2011, 08:39 PM   #2
macemoneta
Senior Member
 
Registered: Jan 2005
Location: Manalapan, NJ
Distribution: Fedora x86 and x86_64, Debian PPC and ARM, Android
Posts: 4,593
Blog Entries: 2

Rep: Reputation: 326
Here's one way.
Code:
curl -k --silent https://localrouter/vnstat2/ | grep "This month" | sed -e 's/^.*numeric_even">//;s/<.*$//'
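
For anyone who wants to try this without access to the router page, here is a minimal sketch that feeds the sample row from post #1 through the same two sed expressions. The variable names `row` and `total` are only illustrative, and the last line (dropping the unit) is an optional extra that assumes the value and unit are separated by a single space.

```shell
# Sample row copied from post #1, so no router is needed to try this.
row='<tr><td class="label_even">This month</td><td class="numeric_even">5.10 GB</td><td class="numeric_even">110.30 MB</td><td class="numeric_even">5.21 GB</td></tr>'

# 1st expression: the greedy ^.* swallows everything up to the LAST numeric_even">
# 2nd expression: deletes everything from the first remaining "<" to the end
total=$(printf '%s\n' "$row" | sed -e 's/^.*numeric_even">//' -e 's/<.*$//')
echo "$total"          # 5.21 GB

# Optional: drop the unit with a parameter expansion
echo "${total% *}"     # 5.21
```

Because the first `.*` is greedy, the command keeps working when the earlier numbers change length, which was the concern raised in post #1.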
 
1 members found this post helpful.
Old 09-12-2011, 09:31 PM   #3
lpallard
Member
 
Registered: Nov 2008
Location: Milky Way
Distribution: Slackware (various releases)
Posts: 970

Original Poster
Rep: Reputation: 44
That's perfect! Just before you posted I was trying to achieve the same thing, but my command was way longer and did not even produce the right result...

Thanks a lot!

I'm gonna keep the thread open for a little while because I'm not done with the script and I might need help later on...

Thanks again!
 
Old 09-12-2011, 11:20 PM   #4
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,428

Rep: Reputation: 1877
And awk:
Code:
curl -k --silent https://localrouter/vnstat2/ | awk -F'[<>]+' '/This month/{print $(NF-3)}'
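
Counting fields from the end is easy to get wrong, so it may help to list them first. The sketch below uses the sample row from post #1 and splits it on runs of `<` and `>`; the names `row` and `total` are just for illustration. With this exact row, the trailing `</td></tr>` contributes three fields after the value (`/td`, `/tr`, and an empty field after the final `>`), so the value is the fourth field from the end.

```shell
# Sample row copied from post #1.
row='<tr><td class="label_even">This month</td><td class="numeric_even">5.10 GB</td><td class="numeric_even">110.30 MB</td><td class="numeric_even">5.21 GB</td></tr>'

# Print every field with its index to see where the value lands.
printf '%s\n' "$row" | awk -F'[<>]+' '{ for (i = 1; i <= NF; i++) printf "%d:[%s]\n", i, $i }'

# For this row the monthly total is the 4th field from the end.
total=$(printf '%s\n' "$row" | awk -F'[<>]+' '/This month/ { print $(NF-3) }')
echo "$total"
```

If the real page adds or removes markup at the end of the row, the offset changes, so it is worth running the listing step against the live output once.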

Last edited by grail; 09-13-2011 at 06:50 PM.
 
1 members found this post helpful.
Old 09-13-2011, 06:01 AM   #5
lpallard
Member
 
Registered: Nov 2008
Location: Milky Way
Distribution: Slackware (various releases)
Posts: 970

Original Poster
Rep: Reputation: 44
When it's time to process text streams like this, which tools are generally best suited? sed or awk? I understand that if I had to modify strings (replacing expressions, removing characters, etc.), sed, being a stream editor, would be the best...

Do you guys have a good reference for learning these tools?
 
Old 09-13-2011, 07:51 AM   #6
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,428

Rep: Reputation: 1877
It is generally not a good idea to say "always use this tool or that one"; rather, use the one best suited to the job, or sometimes the one you are most adept with.

As for references :-

http://www.gnu.org/software/gawk/man...ode/index.html
http://www.grymoire.com/Unix/Sed.html
 
1 members found this post helpful.
Old 09-13-2011, 09:37 AM   #7
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Debian sid + kde 3.5 & 4.4
Posts: 6,823

Rep: Reputation: 1946
I'm afraid I don't understand what this means:
Quote:
Please note the position of the first character of this value could change as the digits of the other numbers before (5.10 & 110.30) could very well change...
Also, is the output itself always uniform in format? And do you always only want the 3rd/last number?

Here are a couple of other solutions I thought of, assuming the above.

The first just loads the string into a variable, then strips off the unwanted parts, giving you the last number.
Code:
num=$( curl -k --silent https://localrouter/vnstat2/ | grep "This month" )
num=${num% [KMG]B*}
num=${num##*>}

echo "$num"
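
To see what each expansion does, the two steps can be tried on the sample row from post #1 without touching the router. This is only an illustration; `step1` and `step2` are names made up for the sketch.

```shell
# Sample row copied from post #1.
num='<tr><td class="label_even">This month</td><td class="numeric_even">5.10 GB</td><td class="numeric_even">110.30 MB</td><td class="numeric_even">5.21 GB</td></tr>'

# "%" removes the SHORTEST matching suffix: here " GB</td></tr>",
# i.e. everything from the last " KB" / " MB" / " GB" onward.
step1=${num% [KMG]B*}

# "##" removes the LONGEST matching prefix: everything up to the last ">".
step2=${step1##*>}

echo "$step2"    # 5.21
```

No external program runs after curl and grep, which is the main attraction of doing the trimming in the shell.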
The second runs it through a second grep to extract all "nn.nn" style number strings, and loads the results into an array. All the numbers are thus available to you, if you need them.

Code:
nums=( $( curl -k --silent https://localrouter/vnstat2/ | grep "This month" | grep -Eo '[0-9]+\.[0-9]+' ) )

echo "${nums[2]}"
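
One thing to watch: the row mixes units (GB and MB), so the bare numbers are not directly comparable. A possible variation, sketched here on the sample row from post #1, keeps the unit with each value. The array name `cells` is an assumption of the sketch, and the pattern assumes every value looks like "nn.nn" followed by a space and KB, MB or GB.

```shell
# Sample row copied from post #1.
row='<tr><td class="label_even">This month</td><td class="numeric_even">5.10 GB</td><td class="numeric_even">110.30 MB</td><td class="numeric_even">5.21 GB</td></tr>'

# Read one match per line so that "5.21 GB" stays a single array element.
cells=()
while IFS= read -r cell; do
    cells+=("$cell")
done < <(printf '%s\n' "$row" | grep -Eo '[0-9]+\.[0-9]+ [KMG]B')

echo "${cells[2]}"        # 5.21 GB
echo "${cells[2]% *}"     # 5.21
```

Reading line by line avoids the word splitting that would otherwise break "5.21 GB" into two elements.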
Edit: Regarding the last question; probably the most practical thing you can do first is to become familiar with regular expressions. This will give you more flexibility with all sorts of tools. Regex is supported by more applications than you know.

Then learn the basics of both sed and awk. Each has its own strengths and weaknesses. sed is line (actually stream) based, and can often more easily do substitutions, deletions, and regex pattern applications on individual lines and whole files. awk, on the other hand, is field-based, and is often easier to use when the text can be split into sections based on characters or patterns of characters. It is also a full scripting language capable of doing very complex text manipulations.
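
A tiny side-by-side may make the difference concrete. The input line here is a made-up plain-text version of the row (tags already stripped), so treat it as an assumption of the sketch rather than real vnstat output.

```shell
# Hypothetical plain-text line, used only to contrast the two tools.
line='This month 5.10 GB 110.30 MB 5.21 GB'

# sed thinks in patterns: rewrite every " GB" on the line as "G".
sed_out=$(printf '%s\n' "$line" | sed 's/ GB/G/g')
echo "$sed_out"    # This month 5.10G 110.30 MB 5.21G

# awk thinks in fields: print the second-to-last whitespace-separated field.
awk_out=$(printf '%s\n' "$line" | awk '{ print $(NF-1) }')
echo "$awk_out"    # 5.21
```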

A lot of people forget that there are also a number of other, more specialized, tools available, like cut, tr, head, tail, paste, and fold. Many of these are faster and easier to use than sed and awk within their own areas of expertise.
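
As a rough illustration of the smaller tools, the same value can be pulled out of the sample row from post #1 with tr, grep, tail and cut alone. This is just one possible pipeline, and it assumes that the only pieces of the row containing digits are the three values.

```shell
# Sample row copied from post #1.
row='<tr><td class="label_even">This month</td><td class="numeric_even">5.10 GB</td><td class="numeric_even">110.30 MB</td><td class="numeric_even">5.21 GB</td></tr>'

total=$(printf '%s\n' "$row" |
    tr '<>' '\n\n' |     # break the row into one piece per line at every < or >
    grep '[0-9]' |       # keep only the pieces that contain a digit
    tail -n 1 |          # the last of those is the monthly total
    cut -d' ' -f1)       # drop the unit

echo "$total"    # 5.21
```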

And finally, there's the shell itself, which has many powerful string manipulation tools, like the parameter expansion and arrays I used above.

Last edited by David the H.; 09-13-2011 at 10:00 AM. Reason: as stated.
 
1 members found this post helpful.
Old 09-13-2011, 10:37 AM   #8
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,428

Rep: Reputation: 1877
Just in case you want a ruby solution too:
Code:
curl -k --silent https://localrouter/vnstat2/ | ruby -ne 'puts $1 if /This month.*>([\d.]+)/'

Last edited by grail; 09-13-2011 at 06:50 PM.
 
1 members found this post helpful.
Old 10-01-2011, 05:13 PM   #9
lpallard
Member
 
Registered: Nov 2008
Location: Milky Way
Distribution: Slackware (various releases)
Posts: 970

Original Poster
Rep: Reputation: 44
All these solutions were helpful! At least I've learned some more!

Now I'd like to write a script to do the following tasks:
  • Search a specific folder for sub-folders whose names contain certain strings;
  • Rename these folders a certain way (by removing some parts and reorganizing the rest of the name);
  • Enter the sub-folder and rename a specific file inside (there should be only one file per sub folder) the same way as its parent folder;
  • If need be, delete all other files from the subfolder;
  • Move the renamed file to a certain location;
  • Delete the subfolder...

OK an example:

./test/
----|
----./sub1-hello-just-a-subfolder-hello3
-------|
-------file1.txt
-------junk.dat
-------junk.src
-------junk.cpp

to

./test/
---|
---./just-a-subfolder-hello
-------|
-------just-a-subfolder-hello.txt

So for this example, the subfolder was renamed by removing "sub1" & "hello3", and the string "hello" was moved to the end, after "just-a-subfolder".

Then the text file was renamed exactly as its parent folder i.e. "just-a-subfolder-hello" while conserving its extension.

Also all other files except the one we just renamed were deleted. Finally, "just-a-subfolder-hello.txt" will be moved to another location on the system, and folder "./test/just-a-subfolder-hello" will be deleted. Not the ./test folder!
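
To make the steps concrete, here is a rough sketch that rebuilds the example tree in a temporary directory and processes it. Everything in it is an assumption made for illustration: the `clean_name` function hard-codes the renaming rule for this one example name, `dest` stands in for "another location on the system", and each sub-folder is assumed to hold exactly one .txt file. Since the folder is deleted at the end anyway, the sketch renames the file while moving it instead of renaming the folder first.

```shell
#!/bin/bash

# Hypothetical rename rule, hard-coded for the example name only.
clean_name() {
    local n=$1
    n=${n#sub1-}                  # drop the leading "sub1-"
    n=${n%-hello3}                # drop the trailing "-hello3"
    if [[ $n == hello-* ]]; then
        n="${n#hello-}-hello"     # move "hello" from the front to the end
    fi
    printf '%s\n' "$n"
}

# Build a throwaway copy of the example tree so nothing real is touched.
root=$(mktemp -d)
dest="$root/dest"
mkdir -p "$root/test/sub1-hello-just-a-subfolder-hello3" "$dest"
touch "$root/test/sub1-hello-just-a-subfolder-hello3/"{file1.txt,junk.dat,junk.src,junk.cpp}

for dir in "$root"/test/*sub1*/; do
    [ -d "$dir" ] || continue     # the glob matched nothing
    dir=${dir%/}
    new=$(clean_name "${dir##*/}")
    for f in "$dir"/*.txt; do
        [ -f "$f" ] || continue
        mv -- "$f" "$dest/$new.txt"   # rename and move in one step
    done
    rm -r -- "$dir"               # removes the junk files and the sub-folder
done

ls "$dest"    # just-a-subfolder-hello.txt
```

Looping over a glob instead of parsing `find` output sidesteps most problems with spaces in names, and `./test` itself is never touched because only its matching children are removed.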


Does anybody have a suggestion for me? I played around trying to write a script, but I have problems with recursive operations... I'd normally try again, but this time I am in a rush. I prefer bash because it does not require anything exotic, but if Perl, Ruby, or any other language is better, please do not hesitate!

Thanks!

Last edited by lpallard; 10-01-2011 at 05:14 PM.
 
Old 10-02-2011, 03:45 AM   #10
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,428

Rep: Reputation: 1877
Well the first suggestion is, what have you tried and where are you stuck?
If the following is correct:
Quote:
All these solutions were helpful! At least I've learned some more!
Then you need to demonstrate what you have learned. The idea is not for others to do all the work for you.
 
Old 10-08-2011, 12:14 PM   #11
lpallard
Member
 
Registered: Nov 2008
Location: Milky Way
Distribution: Slackware (various releases)
Posts: 970

Original Poster
Rep: Reputation: 44
OK sorry about the long delay in replying, I had to drop this for a few days but I just returned and had a chance to play a bit more with this...

The task is rapidly outgrowing my ability to code... There are too many scenarios with folder naming. I need to keep learning because I'm pretty bad.

So far I have adopted the baby-steps approach. Starting with a handful of folders each containing a file, I wrote a script to recursively enter each folder whose name contains a certain string, then do something in that folder. For the purpose of the first trial, I decided the script would create a new sub-folder in the folders matching the string search. It works.

Now the problem I am facing is how to deal with folders that are named pretty much randomly. In the example I described in post 9 above, the folder was named "sub1-hello-just-a-subfolder-hello3", but in real life there is no guaranteed pattern for the folder name, only a guarantee that the name will contain certain strings. The order is not known, and there could be more or fewer strings in the folder name. For example, "sub1" could be at the beginning, the end, or somewhere else in the name; there could be no "hello"; and there will very likely be spaces or other awkward characters in the name... These folders are created by Windows users... They use all kinds of characters, and sometimes far more of them than necessary... For the initial search of the folders, this should not pose any problem, as even if a folder were named like this:

Code:
tretretretre_8989789++_  sub1 -hello-just_a_subfolder HELLO! efdsfdsf hello...3
searching for the string "sub1" would still return the folder in the results. It's renaming the files based on the folder's name that poses a problem. Instead of starting with the untouched folder name and removing string after string until I get something clean like "sub1-hello3", I think it would be better to remove everything EXCEPT certain strings.

That would mean from:

Code:
tretretretre_8989789++_  sub1 -hello-just_a_subfolder HELLO! efdsfdsf hello...3
removing everything except "sub1" & "hello3" to get:

Code:
sub1 hello3
then use the result to rename the file. It would, however, require adding spaces between the strings so I don't get "sub1hello3" but "sub1 hello3" instead.
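
One way to sketch the "keep only certain strings" idea is to walk through a list of wanted keywords and collect the ones that occur in the folder name, joined by single spaces. The keyword list `keep` and the other variable names are hypothetical. Note that "hello3" does not literally appear in the sample name (it is written "hello...3" there), so this sketch looks for "hello" instead.

```shell
#!/bin/bash

name='tretretretre_8989789++_  sub1 -hello-just_a_subfolder HELLO! efdsfdsf hello...3'
keep=(sub1 hello)     # hypothetical list of strings worth keeping, in output order

result=''
for kw in "${keep[@]}"; do
    # Literal (quoted) substring test; the match is case-sensitive.
    if [[ $name == *"$kw"* ]]; then
        result+="${result:+ }$kw"   # add a space only if something is already there
    fi
done

echo "$result"    # sub1 hello
```

The output order follows the keyword list, not the order in the folder name, which gives predictable file names however the folder was typed.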

My script so far, very primitive.

Code:
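#!/bin/bash

clear

cd /home/lpallard/test || exit 1

# -name matches the folder's own name; read -r keeps backslashes intact
find . -type d -name '*sub1*' | while IFS= read -r d
do
	# s/...//g removes just the matched text (/.../d would delete the whole line)
	new=$(basename "$d" | sed -e 's/String1toremove//g' -e 's/String2toremove//g' -e 's/String3toremove//g')
	# no cd needed: search inside "$d" directly and quote every expansion
	find "$d" -type f -name '*.txt' | while IFS= read -r f
	do
		mv -- "$f" "$d/$new.txt"
	done
done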
A boost, please?
Thanks guys!

Last edited by lpallard; 10-08-2011 at 12:26 PM.
 
Old 10-08-2011, 12:22 PM   #12
lpallard
Member
 
Registered: Nov 2008
Location: Milky Way
Distribution: Slackware (various releases)
Posts: 970

Original Poster
Rep: Reputation: 44
Looking at the real deal here (the actual folders & files), I believe it is simpler than I thought.

The current folders are more or less named like this:

Code:
StringA ### #### #### RandomStringA StringB RandomStringB
What I want is to
-Keep StringA
-Keep the # (representing numbers 0-9) but add a dash (-) in between them (so from 512 6654 7878 to 512-6654-7878)
-Keep RandomStringA
-Delete StringB
-Delete RandomStringB
-Add spaces between resulting strings

so the file would be renamed

Code:
StringA ###-####-#### RandomStringA.txt
I will keep trying more stuff. I hope this post will clarify a bit.
 
Old 10-09-2011, 02:25 AM   #13
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,428

Rep: Reputation: 1877
So I am struggling to understand where you are going with this.

Post #11 would be easily solved, as you seem to already know what you want to rename the file / folder to, so there is no need to extract anything; just use what you know.

As for post #12, if we assume that RandomStringA is unknown and there are only 2 spaces prior to it you can use parameter substitution to remove the last 2 strings. Then you probably need something like sed to insert the dashes between the numbers.
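
As a rough sketch of that suggestion, using the numbers from post #12: a parameter expansion drops the last two words, and sed then joins the three digit groups with dashes. It assumes StringB and RandomStringB are each a single word without spaces; the variable names are only illustrative.

```shell
x='StringA 512 6654 7878 RandomStringA StringB RandomStringB'

# "% * *" removes the shortest suffix made of two space-separated words.
kept=${x% * *}

# Replace the spaces inside the "### #### ####" group with dashes.
new=$(printf '%s\n' "$kept" | sed -E 's/([0-9]{3}) ([0-9]{4}) ([0-9]{4})/\1-\2-\3/')

echo "$new.txt"    # StringA 512-6654-7878 RandomStringA.txt
```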
 
Old 10-09-2011, 08:02 AM   #14
lpallard
Member
 
Registered: Nov 2008
Location: Milky Way
Distribution: Slackware (various releases)
Posts: 970

Original Poster
Rep: Reputation: 44
OK I added post 12 because I thought #11 was confusing but it may have had the opposite effect... If you got the idea on post 11, then can we proceed from there?

Let's take baby steps so I can get the point.

At this point what I *think* I have to do is to remove certain strings (the garbage identified as RandomStringA & B) and reorganize the other portions of the filename.

Let's start with step 1: I tried to use sed to collect the numerals (##). It works, but I only got as far as extracting the last X digits when the numbers are either at the front of the whole string or at the end...

Like "Hello 1978 1986" or "4521 2352 Hello".

This did not prove too useful at first because I am not verifying the existence of the string but extracting from it (if it exists). What I need is something that will search for the existence of a pattern. Googling for this did not prove too successful.

So in my case, I need to search for the existence of a pattern of "[0-9][0-9][0-9] [0-9][0-9][0-9][0-9] [0-9][0-9][0-9][0-9]" and, if it exists, insert dashes instead of whitespace and append the result to StringA. To extract StringA I can use sed to collect the first X characters of the filename, or, similarly to the numeral search, search for a specific keyword.
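
Bash itself can test whether a pattern exists and capture the pieces in one go, using `[[ string =~ regex ]]` and the `BASH_REMATCH` array. The function name `dashed_number` below is made up for this sketch. It also assumes the digit groups are separated by exactly one space, and it does not check what surrounds the groups, so a longer run of digits could match as well.

```shell
#!/bin/bash

# Prints the dashed number and succeeds if the name contains a
# "### #### ####" group; prints nothing and fails otherwise.
dashed_number() {
    local re='([0-9]{3}) ([0-9]{4}) ([0-9]{4})'
    if [[ $1 =~ $re ]]; then
        printf '%s-%s-%s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}"
    else
        return 1
    fi
}

dashed_number 'StringA 512 6654 7878 RandomStringA'    # prints 512-6654-7878
dashed_number 'Hello 1978 1986' || echo 'no match'     # prints no match
```

Because the function's exit status says whether the pattern was found, it can be used directly in an `if` to decide whether a folder should be renamed at all.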

Am I confusing you?

Last edited by lpallard; 10-09-2011 at 08:12 AM.
 
Old 10-09-2011, 10:08 AM   #15
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,428

Rep: Reputation: 1877
Assuming I do understand (could be a big if), let us use this example and see if we are on the same page:
Code:
# we store folder name in variable x
x='StringA 123 4567 4568 RandomStringA StringB RandomStringB'

# We need the first string
first=${x%% *}

# We want all the digits (we assume here there are none elsewhere with the same pattern)
# "$x" is quoted so that runs of spaces in a real folder name are not collapsed
digits=$(echo "$x" | sed -rn 's/[^ ]* ([0-9]{3}) ([0-9]{4}) ([0-9]{4}).*/\1-\2-\3/p')
Throw in some echoes for checking and let me know if we are on the right page?
 
1 members found this post helpful.
  

