LinuxQuestions.org
Old 08-04-2018, 01:03 PM   #1
blason
Member
 
Registered: Feb 2016
Posts: 122

Rep: Reputation: Disabled
Need help on achieving logic in bash script


Hi there,

I am working on a script that I cannot get to work, and I need the community's help.

The script below should do two things.

I pass the domains in $1.
These domains are parsed, sanity-checked and written to a temp file.
I already have a file, /var/www/html/today.txt. The temp file is concatenated with today.txt and the unique entries are written to the final file.

This part is working fine.

Now I am introducing one more function to avoid double work. While entering the domains, the script prompts for a TAG, which may be different on every run. The same procedure follows, and a new .yaml file should be created at /var/www/html/doms.yaml.

Where I am stuck is sorting out the unique entries so that only the newly added domains get tagged with $tag and appended to /var/www/html/doms.yaml

like this.
"789.com": "VIRUS"
"ikm.net": "APT"
"itgb.net": "TROJAN"
"qaz.net": "PHISHING"
"ujm.com": "PHISHING"
"wsx.net": "PHISHING"

But I am getting this

"789.com": "PHISHING"
"ikm.net": "PHISHING"
"itgb.net": "PHISHING"
"qaz.net": "PHISHING"
"ujm.com": "PHISHING"
"wsx.net": "PHISHING"

One thing: "/opt/maldoms" will contain new domains, and a few domains may be repeated. Eventually I want unique entries in /var/www/html/doms.yaml, each keeping its own TAG. I am not sure where my logic is going wrong.

Can someone please help?


MY CURRENT SCRIPT
Quote:
#!/bin/bash
### ALL VARIABLES
CURRENT=`cat /var/www/html/today.txt | wc -l`
TODAYTEMP="/opt/todaytemp"
IN_TAGGING_DOMS="/opt/maldoms"
CP_TAGGING_DOMS="/opt/maldoms-ORIG"
OUT_TAGGING_DOMS="/opt/maldomOUT"
FINAL_TAG_DICT="/var/www/html/doms.yaml"
TODAYTAGDOMS="/opt/tempmaldom"

### VARIABLES ENDS HERE

#### TAGGING PROMPT
echo -n "Tag : "
read tag
##### TAGGING END

echo -e "Current Entries in database are \033[5;31;47m$CURRENT\033[0m"
cat $1 | grep -v -e '^$' | sed 's/http\:\/\///g' | sed 's/https\:\/\///g' | tr "/" " " | awk '{print $1}' | tr '[:upper:]' '[:lower:]' | sed -e 's/www\.//g' | sed -e 's/\*\.//g' | sed -e 's/\[//g' | sed -e 's/\]//g' | sed -e 's/\;1//g' | sed -e 's/\;419//g' | grep -v '|' | grep -v "hxxp:" | grep -v "hxxps:" | grep -v -E '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' | sort | uniq > $TODAYTEMP

## TAGGING CODES
NEWRESP=`echo $tag | tr '[:lower:]' '[:upper:]'`
> $FINAL_TAG_DICT
cat $TODAYTEMP >> $CP_TAGGING_DOMS
cat $CP_TAGGING_DOMS | sed -e 's/\;1//g' | sed -e 's/\;419//g' | tr ":" " " | awk '{print $1}' | sort | uniq > $OUT_TAGGING_DOMS

cat $OUT_TAGGING_DOMS | while read line;do echo ""$line":" ""$NEWRESP"";done >> $FINAL_TAG_DICT
> $IN_TAGGING_DOMS
#### TAGGING CODES ENDS

cat /var/www/html/today.txt >> $TODAYTEMP
cat $TODAYTEMP | sed -e 's/\;1//g' | sed -e 's/\;419//g' | tr ":" " " | awk '{print $1}' | sort | uniq > /var/www/html/today.txt
echo "Processing..pls wait..."
sleep 2
echo "Published current URLs in Database..."
sleep 1
AFTERPUB=`cat /var/www/html/today.txt | wc -l`
echo -e "After Publishing Entries in database are \033[5;31;47m$AFTERPUB\033[0m"
TOT=$(expr $AFTERPUB - $CURRENT)
echo -e "Total \033[5;31;47m$TOT\033[0m New Entries Added in DB"
 
Old 08-04-2018, 02:24 PM   #2
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,307
Blog Entries: 3

Rep: Reputation: 3721
There are several lines like this which could and should be boiled down to just AWK + sort:

Code:
cat $1 | grep -v -e '^$' | sed 's/http\:\/\///g' | sed 's/https\:\/\///g' | tr "/" " " | awk '{print $1}' | tr '[:upper:]' '[:lower:]' | sed -e 's/www\.//g' | sed -e 's/\*\.//g' | sed -e 's/\[//g' | sed -e 's/\]//g' | sed -e 's/\;1//g' | sed -e 's/\;419//g' | grep -v '|' | grep -v "hxxp:" | grep -v "hxxps:" | grep -v -E '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' | sort | uniq > $TODAYTEMP
Can you provide some more specific input data samples from the file being fed into $1 please?

Code:
cat $1 | awk '/^http[s]*:/ || /^gopher:/ {print tolower($2)}' FS='[/]+' | sort -u > $TODAYTEMP
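As a rough illustration of the "AWK + sort" idea, the sketch below folds most of the clean-up steps into one awk program. The function name `extract_domains` is made up for the example, and the rules (skip blank, `|` and `hxxp` lines; strip scheme, brackets, `*.`, `www.`, port, path and `;` suffixes; lowercase; skip bare IPv4 addresses) are only my reading of the long pipeline, so treat it as a starting point rather than a drop-in replacement.

```shell
#!/bin/bash
# Hypothetical helper: reads URLs/domains from the file named in $1 and
# prints one cleaned, lower-cased domain per line, de-duplicated.
extract_domains() {
    awk '
        /^[[:space:]]*$/ { next }              # skip blank lines
        index($0, "|")   { next }              # skip lines containing a pipe
        /hxxps?:/        { next }              # skip hxxp/hxxps entries
        {
            d = tolower($1)                    # first whitespace-separated field
            sub("^https?://", "", d)           # drop the scheme
            gsub(/\[|\]/, "", d)               # drop [ and ], so [.] becomes .
            sub(/^\*\./, "", d)                # drop a leading wildcard
            sub(/^www\./, "", d)               # drop a leading www.
            sub("[/:;].*$", "", d)             # drop port, path and ;1 / ;419 suffixes
            if (d ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/) next   # skip bare IPv4
            if (d != "") print d
        }
    ' "$1" | sort -u
}
```

In the original script it might be called as `extract_domains "$1" > "$TODAYTEMP"`, but it would need checking against real input first.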
 
Old 08-04-2018, 11:17 PM   #3
blason
Member
 
Registered: Feb 2016
Posts: 122

Original Poster
Rep: Reputation: Disabled
Hi there,

Thanks for the help.

Here is what I am feeding into the file. These are just URLs/domains. They get parsed and sanity-checked, then the unique domains are extracted.
Code:
http ://doc-japan[.]com
http ://oksir[.]com:80/application/language_5_june/6siX/
http ://uai.projetosvp[/url][.]http://com.br/Lpncg 
4http ://www.51wh[.]top/ewV4
http ://macrospazio[.]it/oJl
http ://barocatch.com/uGXYU6
http ://barocatch.com/uGXYU6
http ://cm2.com.br/oS
http ://dfinformatica.com.br/site/wp-includes/images/crystal/gT
http ://doc-japan.com
http ://ekuvshinova.com/udfQrgHr
http ://experimental.co.za/BAlc
http ://feitosaefujita.adv.br/MVgPzBH
http ://frankbruk.pl/2c41pAl
http ://kamin-sauna.com.ua/whVeJ8l

Last edited by astrogeek; 08-05-2018 at 01:20 PM. Reason: Noparse, code tags on links, broken with spaces
 
Old 08-05-2018, 12:12 AM   #4
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,307
Blog Entries: 3

Rep: Reputation: 3721
Ok. Then I'd guess that the first awk would be thus:

Code:
cat $1 | awk '$1~/http[s]*:/ {sub(/\[\.\]/,".",$2); print tolower($2)}' FS='[/]+' | sort -u
You might have a mistake in one redirect, a > should be >> to append instead of overwriting:

Code:
NEWRESP=$(echo $tag | tr '[:lower:]' '[:upper:]') >> $FINAL_TAG_DICT
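To make the difference between the two redirects concrete, here is a small self-contained sketch that uses a throwaway file from `mktemp` instead of the real doms.yaml. It only demonstrates `>` versus `>>`. As far as I can tell from the script in post #1, appending alone would probably not be enough there, because the loop still writes every domain from the cumulative list with the newest tag.

```shell
#!/bin/bash
dict=$(mktemp)   # throwaway stand-in for /var/www/html/doms.yaml

# Run 1 writes one entry; run 2 uses '>' and truncates the file first.
printf '"%s": "%s"\n' "uai.projetosvp" "PHISHING" > "$dict"
printf '"%s": "%s"\n' "123.net" "TROJAN" > "$dict"
overwritten=$(wc -l < "$dict" | tr -d ' ')

# Same two runs, but run 2 uses '>>' and keeps what is already there.
printf '"%s": "%s"\n' "uai.projetosvp" "PHISHING" > "$dict"
printf '"%s": "%s"\n' "123.net" "TROJAN" >> "$dict"
appended=$(wc -l < "$dict" | tr -d ' ')

echo "entries after '>': $overwritten, after '>>': $appended"
rm -f "$dict"
```

With `>` only the TROJAN line is left (1 entry); with `>>` both lines survive (2 entries).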
 
Old 08-05-2018, 01:28 AM   #5
blason
Member
 
Registered: Feb 2016
Posts: 122

Original Poster
Rep: Reputation: Disabled
Hey ok - the main objective for me is not to disturb the existing code. The script is used for two purposes. First, it takes those URLs/domains and puts them in a file, i.e. /var/www/html/today.txt.

Second, it prompts the data operator to enter a tag and builds a separate dict.yaml file with the appropriate TAG appended. This part is not working.

There is one TAG per run. For example:

First cycle

TAG: phishing

It takes all those domains and creates the yaml file. However, the next time, when a new TAG is entered, all the domains get tagged with the new one and phishing is wiped out.
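One way to avoid that, without touching the today.txt part, might be to stop rebuilding the yaml file from the cumulative list and append only the domains that are not already keys in it. The sketch below is a guess at that logic, not a tested patch for the real script: the function name `tag_new_domains` and its three arguments are invented for the example, and it assumes every line of the yaml file has the exact `"domain": "TAG"` shape shown in this thread.

```shell
#!/bin/bash
# Hypothetical helper.
#   $1 = file with one parsed domain per line (e.g. $TODAYTEMP)
#   $2 = tag entered by the operator
#   $3 = yaml dictionary to extend (e.g. $FINAL_TAG_DICT)
tag_new_domains() {
    local domains=$1 dict=$3 tag new
    tag=$(printf '%s' "$2" | tr '[:lower:]' '[:upper:]')
    touch "$dict"                 # create the file if missing, never truncate it
    new=$(mktemp)
    awk -F'"' -v tag="$tag" '
        # first file (the dictionary): remember the domains already tagged
        FILENAME == ARGV[1] { seen[$2] = 1; next }
        # second file (the new batch): print only domains not seen so far
        $0 != "" && !($0 in seen) {
            seen[$0] = 1          # also drops duplicates inside the batch
            printf "\"%s\": \"%s\"\n", $0, tag
        }
    ' "$dict" "$domains" > "$new"
    cat "$new" >> "$dict"         # append, so earlier tags are kept
    rm -f "$new"
}
```

In the script from post #1 this would stand in for the `> $FINAL_TAG_DICT` line and the `while read` loop, called as `tag_new_domains "$TODAYTEMP" "$tag" "$FINAL_TAG_DICT"` before today.txt is concatenated into `$TODAYTEMP`.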
 
Old 08-05-2018, 02:22 AM   #6
blason
Member
 
Registered: Feb 2016
Posts: 122

Original Poster
Rep: Reputation: Disabled
Final result would look like this

/var/www/html/today.txt =>

doc-japan.com
oksir.com
uai.projetosvp
com.br
4www.51wh.top
macrospazio.it
barocatch.com
cm2.com.br
dfinformatica.com.br
doc-japan.com
ekuvshinova.com
experimental.co.za
feitosaefujita.adv.br
frankbruk.pl
kamin-sauna.com.ua


While dict.yaml

"doc-japan.com": "TROJAN"
"oksir.com": "TROJAN"
"uai.projetosvp": "TROJAN"
"com.br": "TROJAN"
"4www.51wh.top": "TROJAN"
"macrospazio.it": "TROJAN"
"barocatch.com": "TROJAN"
"cm2.com.br": "TROJAN"
"dfinformatica.com.br": "TROJAN"
"doc-japan.com": "TROJAN"
"ekuvshinova.com": "TROJAN"
"experimental.co.za": "TROJAN"
"feitosaefujita.adv.br": "TROJAN"
"frankbruk.pl": "TROJAN"
"kamin-sauna.com.ua": "TROJAN"

The next time, new domains/URLs are entered.
Let's say I entered

test.net
123.com
456.net

Then
/var/www/html/today.txt would contain

doc-japan.com
oksir.com
uai.projetosvp
com.br
4www.51wh.top
macrospazio.it
barocatch.com
cm2.com.br
dfinformatica.com.br
doc-japan.com
ekuvshinova.com
experimental.co.za
feitosaefujita.adv.br
frankbruk.pl
kamin-sauna.com.ua
test.net
123.com
456.net

While TAG for
test.net
123.com
456.net is say PHISHING

then dict.yaml should contain

"doc-japan.com": "TROJAN"
"oksir.com": "TROJAN"
"uai.projetosvp": "TROJAN"
"com.br": "TROJAN"
"4www.51wh.top": "TROJAN"
"macrospazio.it": "TROJAN"
"barocatch.com": "TROJAN"
"cm2.com.br": "TROJAN"
"dfinformatica.com.br": "TROJAN"
"doc-japan.com": "TROJAN"
"ekuvshinova.com": "TROJAN"
"experimental.co.za": "TROJAN"
"feitosaefujita.adv.br": "TROJAN"
"frankbruk.pl": "TROJAN"
"kamin-sauna.com.ua": "TROJAN"
"test.net": "PHISHING"
"123.com": "PHISHING"
"456.net": "PHISHING"
 
Old 08-05-2018, 04:39 AM   #7
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,006

Rep: Reputation: 3191
Please explain how $1 is entered using post #6 as example input? (ie. how are you passing this to the script?)

Using post #6, please show what current output is as we assume it is not what you are asking for?

You will want to get out of bad habits like hiding while loops on a single line.
And whilst you have said:
Quote:
The main objective for me is not to disturb the existing codes
Are we to take it you are not interested in learning or you just want someone to add to the existing (perhaps poor) code?
 
1 members found this post helpful.
Old 08-05-2018, 03:07 PM   #8
keefaz
LQ Guru
 
Registered: Mar 2004
Distribution: Slackware
Posts: 6,552

Rep: Reputation: 872
Not sure if bash would be the first choice for this type of program, at first glance with perl you could reduce code considerably and use appropriate data types to hold values

Now bash 4 introduced associative arrays, so maybe you could use this feature and use whatever you want to be uniq as key

Following the logic of last post, you can use domains as keys and tag as values
Then adding domain / tag is a matter of adding new key/value pair to array
 
1 members found this post helpful.
Old 08-06-2018, 12:11 AM   #9
blason
Member
 
Registered: Feb 2016
Posts: 122

Original Poster
Rep: Reputation: Disabled
Well, to be frank, I am not a coding pro, nor do I have bash scripting expertise; I just write small scripts to automate most of my daily tasks.

@grail - It's just a file which is passed. My original script is saved under the name publish in /usr/sbin, hence I am running

root@xxxxxxx:~# publish domains.txt

I definitely want to learn, but I am not a pro. These scripts were originally written by someone else, and as I am just taking over and learning, I would not want to break anything. Though I can always copy the stuff and play with the copy.
 
Old 08-06-2018, 11:19 AM   #10
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,006

Rep: Reputation: 3191
Based on the list of data in post #3, would you please provide what the parsed output would look like?
For instance, what would you expect back from:
Code:
http://uai.projetosvp[/url][.]http://com.br/Lpncg
Here you have 2 addresses, but I don't see why you would want:
Code:
uai.projetosvp

OR

com.br
Or do you actually need both?

Do I also presume correctly that the additional space after http in post #3 is in error?
 
Old 08-06-2018, 12:35 PM   #11
blason
Member
 
Registered: Feb 2016
Posts: 122

Original Poster
Rep: Reputation: Disabled
Oops, that was a typo while entering it here. In fact the data is

http://uai.projetosvp
http://com.br/Lpncg

So it would parse as

uai.projetosvp
com.br
In one file

and in the same script I was thinking it would prompt for the TAG, e.g. TAG: phishing

so doms.yaml would look like
"uai.projetosvp": "PHISHING"
"com.br": "PHISHING"

This much is achieved. On the next run, if new domains are entered, they might have a different TAG, and this is where I am failing.
Let's say the next run will contain

http://123.net
http://fgb.com/test

So it will be parsed as
uai.projetosvp
com.br
123.net
fgb.com

And TAG is TROJAN
Eventually it should look like
"uai.projetosvp": "PHISHING"
"com.br": "PHISHING"
"123.net": "TROJAN"
"fgb.com": "TROJAN"

While with my script its happening like

"uai.projetosvp": "PHISHING"
"com.br": "PHISHING"
"123.net": "PHISHING"
"fgb.com": "PHISHING"
 
Old 08-06-2018, 02:08 PM   #12
keefaz
LQ Guru
 
Registered: Mar 2004
Distribution: Slackware
Posts: 6,552

Rep: Reputation: 872
Here is an example with associative array
Code:
#!/bin/bash

declare -A domains

# tag every domain that has no tag yet
# arguments:
#     tag
addTag() {
    for i in "${!domains[@]}"; do
        [[ ${domains[$i]} = "" ]] && domains[$i]=$1
    done
}

# add domain to array
# arguments:
#     domain
#     optional tag argument
addDomain() {
    local value=""
    [[ ! -z $2 ]] && value=$2
    
    [[ -v "domains[$1]" ]] || domains[$1]=$value
}

printTags() {
    for i in "${!domains[@]}"; do
        printf '"%s": "%s"\n' "$i" "${domains[$i]}"
    done
}

addDomain uai.projetosvp PHISHING
addDomain com.br PHISHING
addDomain 123.net
addDomain fgb.com

addTag TROJAN
printTags

Last edited by keefaz; 08-06-2018 at 02:09 PM.
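Building on the associative array idea, the array could also be loaded from an existing file before new domains are added, so that tags from earlier runs are kept between runs. This is only a sketch: `loadTags` and `saveTags` are names made up here, the file path is passed in rather than hard-coded, and it assumes bash 4.3 or later (for `[[ -v ]]` on array elements) and lines of the exact form `"domain": "TAG"`.

```shell
#!/bin/bash
declare -A domains

# Hypothetical: read "domain": "TAG" lines from file $1 into the array.
loadTags() {
    local domain tag
    [[ -f $1 ]] || return 0
    # splitting on the double quote puts the domain in field 2, the tag in field 4
    while IFS='"' read -r _ domain _ tag _; do
        [[ -n $domain ]] && domains[$domain]=$tag
    done < "$1"
    return 0
}

# Add domain $1 with tag $2 unless the domain is already known.
addDomain() {
    [[ -v "domains[$1]" ]] || domains[$1]=$2
}

# Hypothetical: write the array to file $1, sorted for a stable order
# (associative arrays have no defined iteration order).
saveTags() {
    local d
    for d in "${!domains[@]}"; do
        printf '"%s": "%s"\n' "$d" "${domains[$d]}"
    done | sort > "$1"
}
```

A run would then be roughly: `loadTags "$FINAL_TAG_DICT"`, one `addDomain "$line" "$NEWRESP"` per parsed domain, and finally `saveTags "$FINAL_TAG_DICT"`. Domains that were already in the file keep their old tag because `addDomain` skips known keys.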
 
  

