[SOLVED] Loop: use sed to remove lines and update the loop's final count

pedropt · 10-14-2019, 11:08 AM

It looks complicated, but it is not.
Here is the thing: I want to search a file and remove lines from it in place using sed, but the loop's end value has to be updated after sed has deleted some lines.

Imagine the loop initially starts at 1 and is supposed to end at 100, but then a pattern is found and sed has to delete 10 lines. The loop should then end at the new value, which is 100 - 10 = 90, so the loop's end variable is updated. This happens repeatedly inside the loop, because sed removes lines whenever it finds what I am searching for.

Now the problem is that I am stuck in an endless loop with no change, because sed is not updating the original file.

I created a similar thread before, but this one is very different.

The Code

Code:

# count the lines of file to loop
cntfr=$(wc -l "$cmlog" | awk '{print$1}')

# start sequence until variable cntfr

for i in $(seq "$cntfr")
do

# read the line and grab the ip 
# ip will be xxx.xxx.xxx.xxx:port , so this next code
# will remove everything after the : and will get only 
# the cleaned ip

var1=$(sed -n ${i}p < $cmlog | awk '{print$11}' | cut -f1 -d":")

# this one I learned here: turn something like
# 111.111.111.111 into 111.111.111.0/24
# so we can check further down whether the subnet exists in a file

ip2="${var1%.*}.0/24"

# fireips is a file containing all the blocked ips and 
# subnets in the firewall
# search for the subnet in fireips file
cksb=$(grep "$ip2" < "$path/fireips")

# case something was found
if [[ ! -z "$cksb" ]]
then

# remove with sed lines containing that IP from cmlog
sed -i -e '/"$var1"/d' $cmlog 

# inform user that ip subnet was found and was deleted

echo " - Existent Subnet : $ip2  for IP : $var1 - Cleaned"

fi

# the next line re-counts the lines of the changed file; if it no longer
# has the initial number of lines, update the value used by the loop above

cntfr=$(wc -l "$cmlog" | awk '{print$1}')
done

An example from cmlog can be found here
https://pastebin.com/apxg0DrB

What happens in this code is that grep finds a value, but then sed does not remove the lines from the file.

There is no error message that could give any clue.

pan64 · 10-14-2019, 11:25 AM

The shell will not expand $var1 inside ' ' (single quotes), so sed only sees the literal text "$var1". You have to use " " for that.
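
A quick way to see the difference, as a minimal sketch on a throwaway file. The file contents and the IP value are made up for illustration; they are not taken from the real cmlog.

```shell
#!/bin/bash
# Hypothetical sample data, not the real cmlog.
tmp=$(mktemp)
printf '%s\n' 'keep 10.0.0.1' 'drop 192.168.1.5' 'keep 10.0.0.2' > "$tmp"
var1='192.168.1.5'

# Single quotes: sed receives the literal characters "$var1" (quotes included),
# which appear nowhere in the file, so no line is deleted.
single=$(sed -e '/"$var1"/d' "$tmp" | wc -l)

# Double quotes: the shell expands $var1 before sed runs.
double=$(sed -e "/$var1/d" "$tmp" | wc -l)

echo "single quotes left $single lines, double quotes left $double lines"
rm -f "$tmp"
```

The sketch leaves out -i on purpose so the sample file is never modified; the quoting behaves the same with -i.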

pedropt · 10-14-2019, 11:44 AM

Yup, thank you.
That was it.
However, I have another issue.
The problem is the value in the loop sequence.
If, after sed finishes its job and the line count is refreshed inside the loop, the count is lower than $i, then I get an error:

sed: -e expression #1, char 0: no previous regular expression

The problem here is not sed; the problem is that sed has nothing to find.
I think I have to add an if statement at the beginning of the loop to check whether "cntfr" is lower than "i", and if so set i to "cntfr".
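
A likely reading of that error: once $i is past the last line, sed -n "${i}p" prints nothing, so var1 is empty and the delete expression collapses to //d. GNU sed treats an empty regex as "reuse the previous one", and there is none. Below is a minimal sketch of a guard, using a made-up three-line log with the address in field 11; it is one way to avoid the error, not the fix the thread settled on.

```shell
#!/bin/bash
# Made-up three-line log with the address in field 11.
cmlog=$(mktemp)
printf 'a b c d e f g h i j 10.1.2.%s:80 k\n' 3 4 5 > "$cmlog"

skipped=0
for i in 1 2 3 4 5              # 4 and 5 are past the end of the file
do
    var1=$(sed -n "${i}p" < "$cmlog" | awk '{print $11}' | cut -f1 -d":")
    if [[ -z "$var1" ]]
    then
        # nothing was read, so "/$var1/d" would become "//d"
        skipped=$((skipped + 1))
        continue
    fi
    echo "line $i -> $var1"
done
echo "skipped $skipped out-of-range line numbers"
rm -f "$cmlog"
```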

pedropt · 10-14-2019, 11:51 AM

Fixed .

Code:

cntfr=$(wc -l "$cmlog" | awk '{print$1}')
for i in $(seq "$cntfr")
do

if [[ "$cntfr" -le "$i" ]]
then
cntfr="$i"
else
var1=$(sed -n ${i}p < $cmlog | awk '{print$11}' | cut -f1 -d":")
ip2="${var1%.*}.0/24"
cksb=$(grep "$ip2" < "$path/fireips")
if [[ ! -z "$cksb" ]]
then
sed -i -e "/$var1/d" $cmlog 
echo " - Existent Subnet : $ip2  for IP : $var1 - Cleaned"
fi
cntfr=$(wc -l "$cmlog" | awk '{print$1}')
fi
done
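
Worth noting: $(seq "$cntfr") is expanded once, before the first iteration, so changing cntfr inside the loop does not shorten the list of values. The sketch below shows one alternative, assuming the goal is to visit every remaining line: a while loop that re-reads the line count on every pass and only advances i when nothing was deleted at the current position. The is_blocked function and the sample data are stand-ins for the grep against fireips, and sed -i is used in its GNU form as in the thread.

```shell
#!/bin/bash
# Stand-in for the fireips lookup: pretend only 10.1.2.0/24 is blocked.
is_blocked() { [[ $1 == "10.1.2.0/24" ]]; }

# Made-up log with the address in field 11.
cmlog=$(mktemp)
for ip in 10.1.2.3 10.9.9.9 10.1.2.3 10.1.2.7 10.8.8.8
do
    printf 'a b c d e f g h i j %s:80 k\n' "$ip"
done > "$cmlog"

i=1
while (( i <= $(wc -l < "$cmlog") ))     # the bound is re-evaluated on every pass
do
    var1=$(sed -n "${i}p" < "$cmlog" | awk '{print $11}' | cut -f1 -d":")
    if [[ -n "$var1" ]] && is_blocked "${var1%.*}.0/24"
    then
        # dots are left unescaped, as in the thread; line i always matches itself
        sed -i -e "/$var1/d" "$cmlog"
        # do not advance: the following line has moved up into position i
    else
        i=$((i + 1))
    fi
done

remaining=$(wc -l < "$cmlog")
echo "$remaining lines left"
rm -f "$cmlog"
```

Not advancing after a deletion matters because the line that followed the deleted one now sits at index i; a plain i+1 would step over it.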

scasey · 10-14-2019, 12:00 PM

This is mostly a style comment.
When doing a task like this, I will usually not use the -i option to sed, but rather direct the output to a new file. This would eliminate the problems with the line count changing while the loop is running.

The last line of the script would then mv the new file to the original file, replacing it. Just food for thought.
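
A minimal sketch of that pattern, with a made-up log and pattern; newlog is a hypothetical name for the temporary copy.

```shell
#!/bin/bash
# Made-up log and pattern, for illustration only.
cmlog=$(mktemp)
printf '%s\n' 'x 10.1.2.3:80' 'x 10.9.9.9:80' 'x 10.1.2.3:443' > "$cmlog"
var1='10.1.2.3'

newlog="$cmlog.new"                       # hypothetical name for the filtered copy
sed -e "/$var1/d" "$cmlog" > "$newlog"    # the original is only read here
mv "$newlog" "$cmlog"                     # replace the original in one step at the end

left=$(wc -l < "$cmlog")
echo "$left line(s) left"
rm -f "$cmlog"
```

While the filtering runs, the original file keeps its line numbers, so a loop that indexes into it is not disturbed.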

pan64 · 10-14-2019, 12:36 PM

Additionally, you can avoid pipe chains (like sed|awk|cut); they can usually be replaced by a single command, but that is probably not really important.
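
As an illustration, here is a sketch of the same extraction done by the three-command chain and by a single awk call. The sample line is made up and assumes the ip:port value is in field 11, as in the script above.

```shell
#!/bin/bash
# Made-up log with the address in field 11.
cmlog=$(mktemp)
printf 'a b c d e f g h i j 10.1.2.%s:80 k\n' 3 4 > "$cmlog"
i=2

# Three processes, as in the thread:
chain=$(sed -n "${i}p" < "$cmlog" | awk '{print $11}' | cut -f1 -d":")

# One awk process: pick line i, strip ":port" from field 11, print it, stop reading.
single=$(awk -v n="$i" 'NR == n { sub(/:.*/, "", $11); print $11; exit }' "$cmlog")

echo "$chain / $single"
rm -f "$cmlog"
```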

pedropt · 10-15-2019, 04:41 PM

Yup, but I prefer this way because I am dealing with logs of 70M for a single day, and if I skip a day and only get to check the logs after 2 or 3 days, they will be huge.
Also, when you have a single string to remove, your way would be simple; but when you have more than 1000 IPs to remove from a log because they are already blocked, so there is no need to see what they were doing, then the best and fastest approach for me is this one.

Firerat · 10-15-2019, 11:45 PM

Quote:

Originally Posted by pedropt

Yup, but I prefer this way because I am dealing with logs of 70M for a single day, and if I skip a day and only get to check the logs after 2 or 3 days, they will be huge.
Also, when you have a single string to remove, your way would be simple; but when you have more than 1000 IPs to remove from a log because they are already blocked, so there is no need to see what they were doing, then the best and fastest approach for me is this one.

But your script is slow.

sed | awk | cut ... is just silly.

Replace the wc -l, the for loop and the sed with a while read loop.

Replace the awk/cut with bash parameter expansion.

Replace testing the captured output of grep with testing the exit code of grep (a match is exit 0).

Code:

#!/bin/bash
# blind as I have no idea what the cmlog looks like,
# but I can guess from your awk ;)
shopt -s extglob              # needed for the +([0-9]) pattern below
newcmlog="$cmlog.new"         # assumed name for the filtered copy
: > "$newcmlog"               # start from an empty file
while read -r -a cmlogline
do
    grep -q "${cmlogline[10]/\.+([0-9]):*/.0\/24}" "$path/fireips" \
        && printf "Existent Subnet : %s for IP : %s - Cleaned\n" \
            ${cmlogline[10]/\.+([0-9]):*/.0\/24} \
            ${cmlogline[10]%:*} \
        || cat - <<<"${cmlogline[@]}" >> "${newcmlog}"
done < "$cmlog"
mv "${newcmlog}" "$cmlog"

OK, that is probably not easy to read.

This is a bit more step by step:

Code:

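#!/bin/bash
newcmlog="$cmlog.new"         # assumed name for the filtered copy
: > "$newcmlog"               # start from an empty file
while read -r -a cmlogline
do
    foo2="${cmlogline[10]%:*}"      # ip without :port
    foo1="${foo2%.*}.0/24"          # its /24 subnet
    grep -q "$foo1" "$path/fireips" \
        && printf "Existent Subnet : %s for IP : %s - Cleaned\n" \
            "${foo1}" \
            "${foo2}" \
        || cat - <<<"${cmlogline[@]}" >> "${newcmlog}"
done < "$cmlog"
mv "${newcmlog}" "$cmlog"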

Chained && / || conditionals can be messy: in A && B || C, C also runs when B fails. A traditional if/then/else is safer.

Code:

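#!/bin/bash
newcmlog="$cmlog.new"         # assumed name for the filtered copy
: > "$newcmlog"               # start from an empty file
while read -r -a cmlogline
do
    foo2="${cmlogline[10]%:*}"      # ip without :port
    foo1="${foo2%.*}.0/24"          # its /24 subnet
    if grep -q "$foo1" "$path/fireips"
    then
        printf "Existent Subnet : %s for IP : %s - Cleaned\n" "${foo1}" "${foo2}"
    else
        # keep the line (fields are rejoined with single spaces)
        printf '%s\n' "${cmlogline[*]}" >> "${newcmlog}"
    fi
done < "$cmlog"
mv "${newcmlog}" "$cmlog"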

Now, if the majority of the lines stay, it might be quicker to run a ( sed -i "/$var/d" ) than to cat each line into a new file.

You could "save up" your removals and run a single sed at the end.

Code:

#!/bin/bash
CheckLogs () {
removethese="" # zero length
while read -r -a cmlogline
do
    foo2="${cmlogline[10]%:*}"
    foo1="${foo2%.*}.0/24"
    if grep -q "$foo1" "$path/fireips"
    then
        # escape the dots and join the IPs with | for sed -E
        removethese+="${foo2//./\\.}|"
        # if you want to slow it down print stuff
    fi
done < "$cmlog"
# succeed only if something was collected; returning the length
# itself would wrap around at 256
[[ -n $removethese ]]
}
CleanLog () {
sed -i -E "/${removethese%|}/d" "$cmlog"
# the whole list is passed as one sed argument, so it is still
# bound by the kernel's argument size limits
}

CheckLogs && CleanLog
# CleanLog() runs only if CheckLogs() collected at least one IP
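
A side note on passing a count through a function's return value: the exit status is a single byte, so values wrap around at 256. The sketch below uses made-up function names purely to show that behaviour next to a plain emptiness test.

```shell
#!/bin/bash
# Hypothetical functions, only to show how an exit status behaves.
by_length()   { local s=$1; return "${#s}"; }   # status = length, truncated to one byte
by_nonempty() { [[ -n $1 ]]; }                  # status 0 when the string is not empty

long=$(printf '%256s' '')       # a string of exactly 256 spaces

short_status=0;    by_length "abc"     || short_status=$?
long_status=0;     by_length "$long"   || long_status=$?
nonempty_status=0; by_nonempty "$long" || nonempty_status=$?

echo "3 chars -> $short_status, 256 chars -> $long_status, non-empty test -> $nonempty_status"
```

With a list that happens to be a multiple of 256 characters long, a length-based status reads as 0, which is why testing for a non-empty string is the more robust signal.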

My motto: if you are going to use bash, use bash!

as a bonus

Code:

while read ip
do
    echo "${ip}"
    echo "${ip%.*}.0/24"
done < <(awk '{sub(/:.*/,"",$11);print $11}' "$cmlog")

this is probably not going to work

Code:

while read ip
do
    echo "${ip}"
    echo "${ip%.*}.0/24"
done < <(sed -E 's/.* ([0-9]{1,3}(\.[0-9]{1,3}){3}):[0-9]+.*/\1/' "$cmlog" )

Code:

while read ip
do
    echo "${ip%:*}"
    echo "${ip%.*}.0/24"
done < <(grep -Eo "([0-9]{1,3}(\.[0-9]{1,3}){3}):[0-9]+" "$cmlog" )

awk is probably the most useful; you could write the whole script in awk and do away with bash,
but unless you are doing some serious number crunching, bash is going to be much easier.
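
For comparison, a rough sketch of what the awk-only route might look like. It assumes fireips holds one subnet per line in the form a.b.c.0/24 and that the address is field 11 of cmlog; both sample files here are made up. It also matches subnets exactly rather than as a substring the way grep does, which is a deliberate simplification.

```shell
#!/bin/bash
# Made-up inputs for illustration.
fireips=$(mktemp)
cmlog=$(mktemp)
printf '%s\n' '10.1.2.0/24' '172.16.5.0/24' > "$fireips"
for ip in 10.1.2.3 10.9.9.9 172.16.5.77 10.8.8.8
do
    printf 'a b c d e f g h i j %s:80 k\n' "$ip"
done > "$cmlog"

# First file: remember every blocked subnet.
# Second file: rebuild the /24 of field 11 and print only lines that are not blocked.
kept=$(awk '
    NR == FNR { blocked[$1]; next }
    {
        ip = $11
        sub(/:.*/, "", ip)          # drop :port
        sub(/\.[0-9]+$/, "", ip)    # drop the last octet
        if (!((ip ".0/24") in blocked)) print
    }
' "$fireips" "$cmlog")

printf '%s\n' "$kept"
rm -f "$fireips" "$cmlog"
```

Both files are read once each, so the cost does not grow with the number of blocked addresses the way a per-line grep does.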

pan64 · 10-16-2019, 09:04 AM

Quote:

Originally Posted by pedropt

yup , but i prefer this way because

What I would like to say is: try to simplify your script. I did not suggest that you modify the logic; it just looks a bit complicated to me. Firerat already suggested a few ideas you can apply, and we can give you others if you are interested.
Anyway, it is your script and you do not need to modify it for me.

pedropt · 10-16-2019, 12:52 PM

Quote:

What I would like to say is: try to simplify your script. I did not suggest that you modify the logic; it just looks a bit complicated to me. Firerat already suggested a few ideas you can apply, and we can give you others if you are interested.
Anyway, it is your script and you do not need to modify it for me.

I have to admit, you are right, pan64; it is great code indeed by Firerat.

It is an interesting way to deal with the problem, and as I see it, it is faster.
I will implement it in my main script.

My baby, a tool that makes life easier for every network administrator.
I built this tool to check things remotely on a server.
https://i.postimg.cc/zDQbrkXx/natm.jpg

1277 lines of code so far, and 2 or 3 options are not yet finished the way I want them.
I write when I have some free time, and when I have doubts or run into hard stuff I come here.