LinuxQuestions.org


catkin 06-22-2010 04:08 AM

bash: how to discard unwanted stdin?
 
Hello :)

Often in bash we read lines from stdin in a loop and implicitly discard the remaining stdin by terminating the loop. Is it possible to discard it without terminating the loop? It could lead to smaller code.

Here's an example which uses two loops; below it is the same algorithm, assuming unwanted stdin can be discarded:
Code:

found=
while read destination gateway _
do
    [[ $destination = default ]] && found=yes && break
done <<< "$( route )"

if [[ $found ]]; then
    found=
    ping -c 1 -q $gateway >/dev/null # populate ARP cache
    while read address _ hw_address _
    do
        [[ $address = $gateway ]]  && found=yes && break
    done <<< "$( arp )"
fi

Code:

found=
while read destination gateway _
do
    if [[ $destination = default ]]; then
        ping -c 1 -q $gateway >/dev/null # populate ARP cache
        <discard unwanted stdin>
        while read address _ hw_address _
        do
            [[ $address = $gateway ]] && found=yes && break
        done <<< "$( arp )"
    fi
done <<< "$( route )"


konsolebox 06-22-2010 04:23 AM

Quote:

Originally Posted by catkin (Post 4011126)
Code:

found=
while read destination gateway _
do
    if [[ $destination = default ]]; then
        ping -c 1 -q $gateway >/dev/null # populate ARP cache
        <discard unwanted stdin>
        while read address _ hw_address _
        do
            [[ $address = $gateway ]] && found=yes && break
        done <<< "$( arp )"
    fi
done <<< "$( route )"


If I get what you meant, perhaps that can be done this way:
Code:

while read destination gateway _
do
    if [[ $destination = default ]]; then
        ping -c 1 -q $gateway >/dev/null # populate ARP cache

        while read something; do continue; done  # add some conditions here.  there's a trick if you need the last input to the following loop

        while read address _ hw_address _
        do
            [[ $address = $gateway ]] && found=yes && break
        done <<< "$( arp )"
    fi
done <<< "$( route )"

or the other way around
Code:

while read -u 3 destination gateway _
do
    if [[ $destination = default ]]; then
        ping -c 1 -q $gateway >/dev/null # populate ARP cache

        while read -u 4 address _ hw_address _
        do
            [[ $address = $gateway ]] && found=yes && break
        done 4< <(exec arp)
    fi
done 3< <(exec route)
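
A minimal sketch of the same file descriptor technique, using printf in place of route and arp so it can be run anywhere. The function name nested_fd_demo and the sample input are made up for illustration; fd 3 and fd 4 are the same numbers used above.

```shell
#!/bin/bash
# Outer loop reads from fd 3 and inner loop from fd 4, so neither loop
# touches stdin (fd 0) or the other loop's input.
# nested_fd_demo and the printf input are illustrative assumptions.
nested_fd_demo() {
    local outer inner
    while read -r -u 3 outer; do
        echo "Outer: $outer"
        if [[ $outer = 2 ]]; then
            while read -r -u 4 inner; do
                echo "Inner: $inner"
                [[ $inner = B ]] && break
            done 4< <(printf 'A\nB\nC\n')
        fi
    done 3< <(printf '1\n2\n3\n')
}

nested_fd_demo
```

Because each loop has its own descriptor, leaving the inner loop early has no effect on what the outer loop reads next.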

What input from route are you trying to discard by the way?

catkin 06-22-2010 04:39 AM

Thanks konsolebox, the second method using unit/fd numbers is elegant :)

It works without the "exec"s in the process substitution subshells too, as in done 4< <(arp) instead of done 4< <(exec arp)

EDIT: I'm only interested in the default gateway line from route output; once that is found the rest is irrelevant.

konsolebox 06-22-2010 04:50 AM

Quote:

Originally Posted by catkin (Post 4011149)
It works without the "exec"s in the process substitution subshells too, as in done 4< <(arp) instead of done 4< <(exec arp)

It's really just optional. I just prefer not to summon another subprocess.
Quote:

EDIT: I'm only interested in the default gateway line from route output; once that is found the rest is irrelevant.
You might also find this pattern helpful (just in case):
Code:

while read LINE; do
    if [[ <expression to find line> ]]; then
        while
            < run the statements here that will process LINE >
            read LINE
        do
            continue  # or just : if you like
        done
    fi
done
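
A runnable sketch of that draining idea, simplified so the processing happens before the inner loop rather than inside its condition list. The function name first_match_then_drain and the sample input are assumptions for illustration only.

```shell
#!/bin/bash
# Keep the first line that starts with the wanted word, then read and
# throw away whatever input is left so nothing remains on stdin.
# first_match_then_drain is a hypothetical name used for this sketch.
first_match_then_drain() {
    local wanted=$1 line result=
    while read -r line; do
        if [[ $line = "$wanted"* ]]; then
            result=$line
            # Drain: consume the remaining lines without using them
            while read -r line; do :; done
        fi
    done
    echo "$result"
}

# Sample input standing in for route output
printf 'localnet *\ndefault 192.168.168.1\nloopback *\n' |
    first_match_then_drain default
```

After the inner loop has drained the input, the outer read fails at end of input and the outer loop ends on its own, so no break is needed.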


catkin 06-22-2010 09:58 AM

Quote:

Originally Posted by konsolebox (Post 4011155)
It's really just optional. I just prefer not to summon another subprocess.

Ah, I get you; with the bare (arp) a shell will be forked into the subshell and will then fork+exec to run arp, whereas using (exec arp), a shell will be forked into the subshell and will then exec arp thus saving a fork call. One more word in the code, one less (relatively expensive) system call in the execution. Nice.
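
One way to see the effect of exec in a subshell is to compare process IDs. This is only a sketch: it assumes sh is available, and it checks just the exec case, since some bash versions may already avoid the extra fork for a bare command in simple situations.

```shell
#!/bin/bash
# Inside the subshell, print the subshell's own PID, then exec a child
# shell that prints its PID. With exec, the child replaces the subshell
# process instead of being forked from it, so the two PIDs are the same.
pids=$( ( echo "$BASHPID"; exec sh -c 'echo $$' ) )

# Split the two output lines into two variables
{ read -r subshell_pid; read -r exec_pid; } <<< "$pids"

if [[ $subshell_pid = "$exec_pid" ]]; then
    echo "exec reused the subshell process (PID $subshell_pid)"
else
    echo "exec ran in a different process ($subshell_pid vs $exec_pid)"
fi
```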

Quote:

Originally Posted by konsolebox (Post 4011155)
You might also find this pattern helpful

Very helpful -- experimenting with it I found my OP was solving a problem that didn't exist; bash is smart enough to save and restore "here strings" :)

First the experimental/demonstration script:
Code:

#! /bin/bash

# How do nested read while loops behave with "here string"s?

while read LINE
do
    echo "Outer: LINE is '$LINE'"
    if [[ $LINE = 2 ]]; then
        while read LINE
        do
            echo "Inner: LINE is '$LINE'"
            [[ $LINE = B ]] && break
        done <<< "$( printf 'A\nB\nC\n' )"
    fi
done <<< "$( printf '1\n2\n3\n' )"

The output is
Code:

Outer: LINE is '1'
Outer: LINE is '2'
Inner: LINE is 'A'
Inner: LINE is 'B'
Outer: LINE is '3'

Thus showing that bash saved the outer "here string" when it started the inner loop and restored it when it exited the inner loop.

Taking advantage of this feature, a better version of the get_default_gateway_MAC.sh script (better = simpler and does not need process substitution so more portable):
Code:

found=
while read destination gateway _
do
    if [[ $destination = default ]]; then
        ping -c 1 -q $gateway >/dev/null    # Populate ARP cache
        while read address _ hw_address _
        do
            [[ $address = $gateway ]] && found=yes && break 2
        done <<< "$( exec arp )"
    fi
done <<< "$( exec route )"


konsolebox 06-22-2010 11:58 PM

Quote:

Originally Posted by catkin (Post 4011466)
Code:

<<< "$( exec route )"

With larger outputs we should also be careful with this method, since bash will most probably (as I've found with other shells as well) read all the output of route first and hold it in memory before the loop starts. It depends on the implementation, but it doesn't hurt to be careful.
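
A small sketch of the difference, under the assumption that seq is available; the function produce is a made-up stand-in for any command with a lot of output. Both loops stop at the same line, but the here string version has to collect everything produce writes before the loop can start, while the process substitution version reads from a pipe as the output arrives.

```shell
#!/bin/bash
# Hypothetical producer standing in for a command with large output
produce() { seq 1 100000; }

# Here string: $( ) captures the whole output before the loop runs
while read -r n; do
    [[ $n = 3 ]] && break
done <<< "$( produce )"
from_here_string=$n

# Process substitution: the loop reads from a pipe while produce runs
while read -r n; do
    [[ $n = 3 ]] && break
done < <( produce )
from_process_substitution=$n

echo "here string stopped at $from_here_string"
echo "process substitution stopped at $from_process_substitution"
```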

grail 06-23-2010 02:39 AM

So try not to smack me about too much, but as opposed to the question asked about discarding stdin I thought I would just give an alternative for the script:
Code:

def=$(route | awk '/default/{print $2}')

ping -c 1 -q $def >/dev/null

arp | awk -v chk=$def '$1 == chk{print "found"}'


catkin 06-23-2010 07:55 AM

Quote:

Originally Posted by konsolebox (Post 4012196)
With larger outputs we should also be careful with this method, since bash will most probably (as I've found with other shells as well) read all the output of route first and hold it in memory before the loop starts. It depends on the implementation, but it doesn't hurt to be careful.

Thanks for the perspective -- so better to use process substitution when the command output is large.

catkin 06-23-2010 08:21 AM

Quote:

Originally Posted by grail (Post 4012325)
So try not to smack me about too much, but as opposed to the question asked about discarding stdin I thought I would just give an alternative for the script:
Code:

def=$(route | awk '/default/{print $2}')

ping -c 1 -q $def >/dev/null

arp | awk -v chk=$def '$1 == chk{print "found"}'


Hello grail, nice to see you passing by the thread and thanks for the alternative :)

Is that alternative functionally equivalent, though? The intention is to get the default gateway's hardware address, not its IP address. Does it handle the case where there is no "default" in the routing table? Does it set $found to indicate the validity of the address?

The default gateway's hardware address will be used by a netbook's boot script as a "good enough" way of identifying the LAN so it can do LAN-specific initialisation. Being boot script code, performance matters, so the trade-off between coding complexity and fork+exec count leans towards accepting more code in exchange for fewer fork+execs.

Your alternative is tighter and easier to maintain than mine (natch!) but will take more resources to run, which matters when booting an Atom-based system!

grail 06-23-2010 09:12 AM

hmmmm ... I am guessing there must be a lot more to the code that I have not seen, as my output is the same, except I have output the word found instead of yes and not attached it to a variable, which I know you would have no trouble doing.

I would be curious, and I will preface this by saying I am no guru, about mine using more resources than yours.
Are we saying that the use of awk is a bigger hit than the 2 while loops being iterated over?

And in answer to your questions, again from the original script presented:
Quote:

The intention is to get the default gateway's hardware address, not its IP address.
I see the hardware address being obtained but not used, hence I only returned the true statement 'found'
Quote:

Does it handle the case where there is no "default" in the routing table?
To be honest, in its current state there is obviously no control structure, but again, a simple if can handle this.
Also I would point out that if your script hits a route with no 'default' it will exit and do nothing.
Quote:

Does it set $found to indicate the validity of the address?
Again back to my original statement that supplying the last line as an assignment to a variable, or better yet a control, perhaps 'if', can take care of all that.
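
A sketch of what such a guard might look like. The helper name default_gateway and the trimmed sample routing tables are assumptions for illustration, and the match is on the first column only rather than on /default/ anywhere in the line.

```shell
#!/bin/bash
# Hypothetical helper: print the gateway column of the first default
# route found on stdin, or nothing if there is no default route.
default_gateway() {
    awk '$1 == "default" { print $2; exit }'
}

# Made-up, trimmed samples of route output; the second has no default
with_default=$'localnet * 255.255.255.0\ndefault 192.168.168.1 0.0.0.0'
without_default=$'localnet * 255.255.255.0\nloopback * 255.0.0.0'

for sample in "$with_default" "$without_default"; do
    gateway=$(default_gateway <<< "$sample")
    if [[ -n $gateway ]]; then
        echo "default gateway is $gateway"
    else
        echo 'no default route found' >&2
    fi
done
```

In a real script the samples would be replaced by piping the output of route into the helper, and the else branch is where the "not found" case would be reported.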

I guess my initial point was that, instead of discarding values, maybe we can be smarter about which values we get in the first place?

I hope I am being constructive as I always value your opinions :) and very happy to learn

cheers
grail

catkin 06-24-2010 12:09 PM

Quote:

Originally Posted by grail (Post 4012652)
hmmmm ... I am guessing there must be a lot more to the code that I have not seen, as my output is the same, except I have output the word found instead of yes and not attached it to a variable, which I know you would have no trouble doing.

Here it is in context
Code:

    # Get the default gateway hardware address
    found=
    while read destination gateway _
    do
        if [[ $destination = default ]]; then
            ping -c 1 -q $gateway >/dev/null # populate ARP cache
            while read address _ hw_address _
            do
                [[ $address = $gateway ]] && found=yes && break
            done <<< "$( exec arp )"
        fi
    done <<< "$( exec route )"

    # LAN-specific actions, based on identification by default gateway hardware address
    if [[ $found ]]; then
        case $hw_address in
            '<some MAC address, not published on the Internet!>' )
                echo 'Mounting "home" LAN networked file systems'
                <some network file system mounts>
                sleep 1
                # Start the OpenSSH SSH daemon
                if [ -x /etc/rc.d/rc.sshd ]; then
                    echo 'Starting OpenSSH SSH daemon'
                    /etc/rc.d/rc.sshd start
                fi
                ;;
            * )
                echo "Gateway HW address $hw_address not configured in rc.local"
        esac
    else
        echo 'Gateway HW address not found' >&2
    fi

EDIT: the above code fails if the default gateway IP address is resolved to a name that is longer than the column width provided by the route command. The following modification solves this problem by not resolving names
Code:

    # Get the default gateway hardware address
    found=
    while read destination gateway _
    do
        if [[ $destination = '0.0.0.0' ]]; then
            ping -c 1 -q $gateway >/dev/null # populate ARP cache
            while read address _ hw_address _
            do
                [[ $address = $gateway ]] && found=yes && break
            done <<< "$( exec arp -n )"
        fi
    done <<< "$( exec route -n )"


Quote:

Originally Posted by grail (Post 4012652)
I would be curious, and I will preface this by saying I am no guru, about mine using more resources than yours.
Are we saying that the use of awk is a bigger hit than the 2 while loops being iterated over?

Yes :)

fork+exec is "expensive" and bash runs fast when processing small (say < 5 kB) strings. To be sure (and to get a feel for where awk's faster processing of large amounts of text begins to outweigh the fork+exec cost) testing would be necessary. The ping would have to be removed (very slow compared to the rest of the code) and to be measurable the code would have to be put in a loop; in the case of this code I don't think there are any buffering effects that would affect the results (as there would be when doing file I/O).
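
A rough sketch of such a test, with the ping left out and each variant run in a loop. The function names, the run count and the trimmed sample routing table are all assumptions; the numbers it prints will vary from machine to machine, so none are shown here.

```shell
#!/bin/bash
# Made-up, trimmed sample of route output so no network tools are needed
table=$'Destination Gateway Genmask\nlocalnet * 255.255.255.0\nloopback * 255.0.0.0\ndefault 192.168.168.1 0.0.0.0'

# Pure bash: no fork+exec, just a read loop over a here string
gateway_bash() {
    local destination gateway
    while read -r destination gateway _; do
        if [[ $destination = default ]]; then
            echo "$gateway"
            return 0
        fi
    done <<< "$table"
}

# awk: one fork+exec per call
gateway_awk() {
    awk '$1 == "default" { print $2; exit }' <<< "$table"
}

runs=200
echo "bash loop, $runs runs:"
time for ((i = 0; i < runs; i++)); do gateway_bash > /dev/null; done
echo "awk, $runs runs:"
time for ((i = 0; i < runs; i++)); do gateway_awk > /dev/null; done
```

Growing the sample table would show roughly where awk's faster text processing starts to outweigh its start-up cost.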

Quote:

Originally Posted by grail (Post 4012652)
And in answer to your questions, again from the original script presented:

I see the hardware address being obtained but not used, hence I only returned the true statement 'found'

To be honest, in its current state there is obviously no control structure, but again, a simple if can handle this.
Also I would point out that if your script hits a route with no 'default' it will exit and do nothing.

Again back to my original statement that supplying the last line as an assignment to a variable, or better yet a control, perhaps 'if', can take care of all that.

All true! You ave me bang to rights guvnor, an no mistake!

Doing nothing when the routing table has no default route is intended -- this is a "convenience" facility so it's not worth analysing rc.inet1.conf to see if there should be a default route or re-running DHCP client as might be appropriate on a mission-critical, robust system.

Quote:

Originally Posted by grail (Post 4012652)
I guess my initial point was that, instead of discarding values, maybe we can be smarter about which values we get in the first place?

I hope I am being constructive as I always value your opinions :) and very happy to learn

If there was a lot of data to search then I would go with being smarter about getting the values but for the small amount of data output by the route command (on a personal netbook) I believe that trawling through it line by line until getting what is needed is faster without requiring overly complex code.

Always a pleasure to debate with you; debate is a great way to learn different ways, to see other perspectives, to increase the breadth and depth of understanding and thus become better at what we do. Just so long as you remember that I'm always right! :D :D :D

grail 06-24-2010 08:51 PM

I was talking to a network guy here at work (I am still learning) and he said that, to the best of his knowledge, the 'default' will never be missing from the routing table (just passing on what was said :) )

To this end, and in case it helps with later testing, here is the same script with those changes:
Code:

def=$(route | awk '/default/{print $2}')

ping -c 1 -q $def >/dev/null

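# Keep the address in a variable so the echo in the fallback case can report it
hw_address=$(arp | awk -v chk="$def" '$1 == chk{print $3}')
case $hw_address in # I assume we are using case as there may be other addresses later??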
    '<some MAC address, not published on the Internet!>' )
        echo 'Mounting "home" LAN networked file systems'
        <some network file system mounts>
        sleep 1
        # Start the OpenSSH SSH daemon
        if [ -x /etc/rc.d/rc.sshd ]; then
            echo 'Starting OpenSSH SSH daemon'
            /etc/rc.d/rc.sshd start
        fi
        ;;
    * )
        echo "Gateway HW address $hw_address not configured in rc.local"
esac

Assuming my friend here is correct your last echo would never get executed, so I have left it off.

Quote:

Just so long as you remember that I'm always right!
For now ... lol

catkin 06-24-2010 09:30 PM

Quote:

Originally Posted by grail (Post 4014241)
I was talking to a network guy here at work (I am still learning) and he said that, to the best of his knowledge, the 'default' will never be missing from the routing table (just passing on what was said :) )

That is not correct as shown by this terminal session
Code:

root@CW8:~# route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
localnet        *               255.255.255.0   U     0      0        0 eth0
loopback        *               255.0.0.0       U     0      0        0 lo
default         192.168.168.1   0.0.0.0         UG    1      0        0 eth0
root@CW8:~# route del default
root@CW8:~# route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
localnet        *               255.255.255.0   U     0      0        0 eth0
loopback        *               255.0.0.0       U     0      0        0 lo

That is slightly artificial but could happen by accident. More realistically, we have had several examples here on LQ in which questioners' troubles were traced to a missing default gateway.

Regarding the case statement: yes, more cases are envisaged.


All times are GMT -5.