Load balancing heavily used and abused WordPress sites

Turbocapitalist · 08-10-2018, 08:13 AM

I have a question (two actually) about load balancing busy WordPress sites. Now that WordPress is > 30% of all web sites and ~ 60% of all CMS sites, perhaps there are some established best practices for load balancing both in response to 1) normal but heavy use and 2) concerted DDoS attacks targeting the backend resources (database I/O and CPU utilization) so as to be able to punch through with a cache miss for each query.

1) Almost all of the normal use is read-only queries of popular pages. I suppose the right way to handle that is to offload it to two or more Varnish instances on separate machines. Should there be just one WordPress instance behind several Varnish instances?

2) Most of the abuse seems to come from Windows botnets, and there's not much to single out any given host. But when they query in salvos, they overload the proxy/cache and bring the backend with WordPress itself to its knees. I suppose there might be a way to spin up a mirror of the WordPress site during heavy CPU loads. What should the arrangement be to increase resilience in that scenario?

business_kid · 08-11-2018, 02:29 PM

On the DDoS attacks, D.J. Bernstein has some interesting ideas on handling them. I've installed some of his software. He doesn't make it easy, but knows tcp intimately. View his packages as more 'theoretical implementations,' but they're fast, light and good. He offers money for hacks to his code - not many coders are that brave. Oh, from what I can gather, he's mad. http://cr.yp.to

scasey · 08-11-2018, 03:12 PM

Quote:

Originally Posted by business_kid

On the DDoS attacks, D.J. Bernstein has some interesting ideas on handling them. I've installed some of his software. He doesn't make it easy, but knows tcp intimately. View his packages as more 'theoretical implementations,' but they're fast, light and good. He offers money for hacks to his code - not many coders are that brave. Oh, from what I can gather, he's mad. http://cr.yp.to

Are you referring to the ucspi-tcp suite of programs specifically? I'm not sure I'd consider them theoretical...

I've been running qmail under tcpserver for years. I've refined the tcprules to the point where we're blocking around 70% of all (email) connection attempts ['tho as of right now, that number is at 83.3%]

Code:

SPAM Blocking Statistics:
Between 08/09/2018 at 20:00:07 and 08/11/2018 at 12:00:48 MST
Elapsed time: 40.00 hours

                           Count  Avg/hour  Percent  Per day
Total Messages:             4563       114              2737
Messages accepted:           660        16   14.46%      396
Messages denied:            3903        97   85.54%     2341
    Denied by us:           2907        72   63.71%     1744
    Denied using SORBS:        0         0    0.00%        0
    Denied using SBL-XBL:    414        10    9.07%      248
    Denied using SPAMCop:    582        14   12.75%      349

The breakdown of the "Denied by us" figure is:

Code:

 	
Breakdown of the 2907 messages blocked from spamming servers in the last 40.00 hours.
Last updated: Aug 11, 2018 12:01:01 MST

Reason/Country                 Msgs  Percent
Italy (IT)                     1428    49.1%
Known abusers                   753    25.9%
China (CN)                      219     7.5%
Spam Blocked                    128     4.4%
Misc. (44 Sources < 10 each)    108     3.7%
Invalid Contact Address          52     1.8%
Czech Republic (CZ)              36     1.2%
Russian Federation (RU)          29     1.0%
Vietnam (VN)                     27     0.9%
Mexico (MX)                      24     0.8%
Chile (CL)                       19     0.7%
Spain (ES)                       17     0.6%
Brazil (BR)                      16     0.6%
Romania (RO)                     15     0.5%
Malaysia (MY)                    12     0.4%
India (IN)                       11     0.4%

Note that these are email stats only...an IP blocked from delivering email could still access http.
(And I apologize for the formatting. It's copy/paste from a web page that analyzes qmail logs and summarizes them.)

Having all http requests go through tcpserver would take some thinking to set up, but once done the blocking of specific IPs can happen quickly and easily.

The block list can be added to on the fly, without having to restart any servers or processes. A very short example of what those entries look like:

Code:

60.248.53.:allow,RBLSMTPD="-11/13/06:Mail Not Accepted due to abuse Taiwan (TW) See: http://mydomain.com/nomail.pl?ip="
66.60.0-63.:allow,RBLSMTPD="-11/13/06:Mail Not Accepted due to abuse Argentina (AR) See: http://mydomain.com/nomail.pl?ip="
80.85.144-151.:allow,RBLSMTPD="-11/13/06:Mail Not Accepted due to abuse Russian Federation (RU) See: http://mydomain.com/nomail.pl?ip="
82.119.158.:allow,RBLSMTPD="-11/13/06:Mail Not Accepted due to abuse Russian Federation (RU) See: http://mydomain.com/nomail.pl?ip="
86.56.190.:allow,RBLSMTPD="-11/13/06:Mail Not Accepted due to abuse Austria (AT) See: http://mydomain.com/nomail.pl?ip="
89.37.96-103.:allow,RBLSMTPD="-11/13/06:Mail Not Accepted due to abuse Romania (RO) See: http://mydomain.com/nomail.pl?ip="
193.205.160-191.:allow,RBLSMTPD="-11/13/06:Mail Not Accepted due to abuse Italy (IT) See: http://mydomain.com/nomail.pl?ip="
202.136.208-223.:allow,RBLSMTPD="-11/13/06:Mail Not Accepted due to abuse China (CN) See: http://mydomain.com/nomail.pl?ip="

There are 60,000+ lines in our tcp block list, but since it's compiled into a binary database, finding a match takes milliseconds. That logic is built into the process. I've built perl scripts that accept the IP range, the country and the country code (CC), and add the matching entry.
The text following the hyphen is BOUNCED to the sending address with the sending IP appended. As can be seen, there's a link to report a false positive...that very seldom happens.
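
For anyone wanting to generate entries like these, here is a rough sketch in Python rather than Perl. It is not the actual script described above; the function name `make_rule` and its parameters are made up for illustration, and the report URL is just the placeholder from the example lines.

```python
from datetime import date


def make_rule(prefix: str, country: str, cc: str, added: date,
              url: str = "http://mydomain.com/nomail.pl?ip=") -> str:
    """Build one block-list line in the same shape as the examples above.

    `make_rule` and its arguments are hypothetical; only the output format
    is taken from the sample entries.
    """
    # The sample entries match on address prefixes that end with a dot.
    if not prefix.endswith("."):
        prefix += "."
    stamp = added.strftime("%m/%d/%y")
    # The text after the leading hyphen is what gets bounced to the sender.
    message = (f"-{stamp}:Mail Not Accepted due to abuse "
               f"{country} ({cc}) See: {url}")
    return f'{prefix}:allow,RBLSMTPD="{message}"'


if __name__ == "__main__":
    print(make_rule("60.248.53", "Taiwan", "TW", date(2006, 11, 13)))
```

Run as a script, this prints a line identical to the first sample entry. The generated lines would still need to be compiled into the binary database before tcpserver sees them.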

I presume one can configure a firewall to block IPs at the port level, but I don't know how to do that.
Would Fail2Ban help in that case? I've yet to get my head around that either.

Keep us posted on your progress, please...it could happen to any one of us.

Footnote: Sorry for not responding to Turbocapitalist's question about load balancing...configuring tcpserver to filter would eliminate the need for that, because connections are denied and never impact the server at all.

PPS: Not sure how one would identify which IPs need to be blocked...with email, it's the netblock containing the IP that sent the spam. Log analysis counting connections by IP over time, perhaps (again, kinda like Fail2Ban)
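
A minimal sketch of that kind of log analysis, assuming the access log has already been parsed into (timestamp, IP) pairs sorted by time. The function name, the 60-second window and the threshold of 100 are arbitrary placeholders, not recommendations, and this is not how Fail2Ban itself is implemented.

```python
from collections import defaultdict, deque


def find_heavy_hitters(events, window_seconds=60, threshold=100):
    """Return the IPs that made more than `threshold` connections
    within any sliding window of `window_seconds`.

    `events` is an iterable of (timestamp_in_seconds, ip) pairs,
    assumed to be sorted by timestamp.
    """
    recent = defaultdict(deque)  # ip -> timestamps still inside the window
    flagged = set()
    for ts, ip in events:
        q = recent[ip]
        q.append(ts)
        # Discard timestamps that have aged out of the window.
        while q and ts - q[0] >= window_seconds:
            q.popleft()
        if len(q) > threshold:
            flagged.add(ip)
    return flagged


if __name__ == "__main__":
    # Documentation-range addresses, purely illustrative.
    sample = sorted(
        [(t, "192.0.2.1") for t in range(10)]
        + [(t * 30, "198.51.100.7") for t in range(10)]
    )
    print(find_heavy_hitters(sample, window_seconds=10, threshold=5))
```

The flagged addresses (or the netblocks containing them) could then be fed into whatever block list is in use.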

business_kid · 08-12-2018, 04:13 AM

Bernstein's stuff is definitely not FHS compliant, but I'm glad to hear you're running it successfully. Personally, I admired his work, but if he sanitized it a little more, it would become mainstream.

He had some interesting ideas on DDoS, which I read up on. Basically, it was that once you realised you were being DDoS'ed, you send an ACK and drop it. The attacker then does more work than you. People criticized it as denying legitimate customers and, in effect, shutting down the site. It's a survival policy, and if he has some refinement of it in code, it certainly would be worth trying. I haven't been on cr.yp.to for a long time.

Turbocapitalist · 08-12-2018, 05:00 AM

Sending an ACK to everybody would just help the DDoS attack be even more successful. Remember, that the scenario in a distributed attack is that a very large number of separate hosts each take a small whack at the target. The amount of load from any given DDoS participant machine might even be a little less than from a legitimate visitor.

Are there any lists in addition to Emerging Threats to look at? It might be interesting to see if there is a lot of overlap between the lists and the attackers.
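
Checking the overlap could look roughly like this. It is only a sketch: it assumes each list has already been boiled down to plain CIDR blocks or single addresses (I'm not assuming anything about the actual file format of any particular list), and `blocklist_overlap` is a made-up name.

```python
import ipaddress


def blocklist_overlap(attacker_ips, blocklist_cidrs):
    """Split attacker IPs into those a block list covers and those it misses."""
    # strict=False lets entries such as "192.0.2.7/24" be read as networks.
    networks = [ipaddress.ip_network(c, strict=False) for c in blocklist_cidrs]
    covered, missed = set(), set()
    for raw in attacker_ips:
        addr = ipaddress.ip_address(raw)
        if any(addr in net for net in networks):
            covered.add(raw)
        else:
            missed.add(raw)
    return covered, missed


if __name__ == "__main__":
    # Documentation-range addresses, purely illustrative.
    attackers = ["192.0.2.10", "198.51.100.5", "203.0.113.9"]
    blocklist = ["192.0.2.0/24", "203.0.113.0/24"]
    covered, missed = blocklist_overlap(attackers, blocklist)
    print(f"{len(covered)} of {len(attackers)} attackers are on the list")
```

A linear scan over the networks is fine for a quick comparison; a list with tens of thousands of entries checked against a large log would want something smarter.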

scasey · 08-12-2018, 12:06 PM

Quote:

Originally Posted by business_kid

Bernstein's stuff is definitely not FHS compliant, but I'm glad to hear you're running it successfully. Personally, I admired his work, but if he sanitized it a little more, it would become mainstream.

I had to look up FHS compliant:

Quote:

Filesystem Hierarchy Standard (FHS) defines the directory structure and directory contents in Linux distributions

I suppose that's technically true of the links in the /service directory, but all of the code and configuration files appear to be compliant...using /usr/local/bin or /etc or /var as appropriate.

Quote:

He had some interesting ideas on DDoS, which I read up on. Basically, it was that once you realised you were being DDoS'ed, you send an ACK and drop it. The attacker then does more work than you. People criticized it as denying legitimate customers and, in effect, shutting down the site. It's a survival policy, and if he has some refinement of it in code, it certainly would be worth trying. I haven't been on cr.yp.to for a long time.

I'd like to read that, but the only paper I found had a charge. Not that that's a show-stopper.

My earlier post was not, of course, about DDoS, but about rejecting spam by source, so rejecting the email will hopefully cause problems for the sender.

I'd think that if one could set up the same kind of system with DJBs software to drop connections, it could be effective. I wouldn't know how to do that...and again I (we?) are drifting away from Turbocapitalist's original question/idea.

Turbocapitalist · 08-13-2018, 09:23 AM

I got a hint that, for the DDoS case, one way is to automate a static mirror of the site and switch to that when load gets excessive, but there should be other options.
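
The switching decision itself might be sketched like this. The thresholds are invented numbers, and how the switch is actually carried out (proxy config, DNS, something else) is left open; the only point is to use two thresholds so the site does not flap back and forth when the load hovers around one value.

```python
import os


def choose_backend(load_1min, current, high=8.0, low=4.0):
    """Pick 'static' or 'dynamic' with hysteresis.

    `high` and `low` are hypothetical load-average thresholds: go static
    at or above `high`, go back to dynamic only at or below `low`.
    """
    if load_1min >= high:
        return "static"
    if load_1min <= low:
        return "dynamic"
    return current  # in between: keep whatever is being served now


if __name__ == "__main__":
    load_1min, _, _ = os.getloadavg()  # Unix-only system load averages
    print(choose_backend(load_1min, current="dynamic"))
```

Something like this would run from cron or a small daemon on the backend, with the result used to flip whatever sits in front of WordPress.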

MariaDB can use Galera or similar clustering, but would the extra WordPress node have to stay online and just idle until needed? Or can it be spun up from another VM at the same provider?

Habitual · 08-13-2018, 03:34 PM

Quote:

Originally Posted by Turbocapitalist

1) Almost all of the normal use is read-only queries of popular pages. I suppose the right way to handle that is to offload it to two or more Varnish instances on separate machines. Should there be just one WordPress instance behind several Varnish instances?

When I had to "deal with it", I didn't want to "think about it. Just be over!"
Thought about Cloudflare?

Or https://wordpress.org/plugins/vcaching/ (by a "Senior backend developer" at 100 shops).
Seems like it may be worth checking out.

Last incident I had was 4040'ish hits on the books of 1 site.
in 3 seconds.
4 Errors I think
All the rest were 200s

What's normal use?
No one could tell me and the boss, well, all I could say is "IDK". Because I didn't.
What is normal any more?

Surf the site, we'll capture. Never got to it, and 4040'ish AND most of them were "200"...???
It got real concise in about .10 of a second.

Good Luck. I believe "you are not the first" and will meet with much success.
I suggest you not knee-jerk it and take your time finding a workable solution to the predicament.
See also https://codex.wordpress.org/High_Tra..._For_WordPress