The Problem
Every couple of weeks or so, my WordPress site would fail with “Error Establishing Database Connection.” I would restart the server and everything would work fine - for another couple of weeks. I was using Digital Ocean’s WordPress on LAMP one-click install. (Huge fan of Digital Ocean, BTW.)
The Short Answer
Apache is getting overloaded, crashing, and restarting, and amidst all the chaos, MySQL crashes too. If your website doesn’t get a lot of traffic, you probably have bots brute-force-attacking your site or hammering it with requests. The solution is to take measures to block the bot traffic and, more importantly, to adjust Apache’s configuration so it doesn’t allow itself to get overburdened and crash (taking MySQL down with it).
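In my case, MySQL wasn’t failing on its own - when Apache ate up all the memory, the kernel’s OOM killer took MySQL out. If you want to confirm that on your own box (a quick check, assuming Ubuntu’s default log locations), grep the system logs:

```sh
# Look for evidence that the kernel's OOM killer terminated mysqld
grep -i "out of memory" /var/log/syslog
grep -i "killed process" /var/log/syslog

# MySQL's own error log will also show abrupt shutdowns/restarts
sudo tail -n 50 /var/log/mysql/error.log
```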
This Post TL;DR
Basically, we have two problems to deal with:

Problem 1: Apache’s configuration is making it vulnerable to crashes in traffic spikes.

Problem 2: We’re using up a lot of resources dealing with bots.
Problem 1: Apache’s configuration is making it vulnerable to crashes in traffic spikes.
Run Apache2Buddy to diagnose the current configuration:

```sh
curl -sL https://raw.githubusercontent.com/richardforth/apache2buddy/master/apache2buddy.pl | perl
```
Make adjustments to Apache’s MPM-Prefork configuration based on Apache2Buddy’s report:

```apache
# /etc/apache2/mods-available/mpm_prefork.conf
# MaxRequestWorkers below is from my Apache2Buddy report - use your own number
<IfModule mpm_prefork_module>
        StartServers              5
        MinSpareServers           5
        MaxSpareServers          10
        MaxRequestWorkers        25
        MaxConnectionsPerChild    0
</IfModule>
```
Problem 2: We’re using up a lot of resources dealing with bots.
Install IP Geo Block to block a bunch of the traffic. And/or install Brute Force Login Protection to block IPs that fail too many logins.
In-Depth Explanation of Problem 1: Apache’s configuration causes it to crash under heavy traffic
Step 1: Use Apache JMeter to stress test the server, replicate the error
I used this article, https://www.digitalocean.com/community/tutorials/how-to-use-apache-jmeter-to-perform-load-testing-on-a-web-server, to install and set up Apache JMeter on my Mac. JMeter is a tool that automates requests to your site and records the results. It’s pretty easy to set up and hugely useful in diagnosis.
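(If you’re on a Mac with Homebrew installed, getting JMeter can be as simple as this; the article covers the manual install too:)

```sh
brew install jmeter
```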
The strategy now is to try to crash your site with JMeter. (Don’t do this to your live site - take a snapshot of your droplet and spin up a new droplet from that snapshot for pennies an hour.) I set my Thread Group to do 150 requests every 3 seconds, repeat “forever,” and to “Stop Thread” on a crash. Before you start the test, SSH into your server and start htop ($ htop - you might have to install it). You’ll see a representation of your CPU usage, memory usage, Swap memory, and processes. Now start the test and watch it go nuts. It might not fail right away, so go ahead and give it some time. If you have “Stop Thread” selected, the test will stop pinging once the server fails.
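Alongside htop, it’s useful to watch how many Apache workers spawn as the load ramps up. A rough sketch (assumes Ubuntu, where the worker processes are named apache2):

```sh
# Install htop if it's missing
sudo apt-get update && sudo apt-get install -y htop

# Refresh a count of running Apache worker processes every second
watch -n 1 'ps -C apache2 --no-headers | wc -l'
```

You’ll see the worker count climb toward MaxRequestWorkers as JMeter piles on requests, and memory climb with it.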
Step 2: [SOLUTION] Reduce MaxRequestWorkers (MaxClients) in Apache configs to conserve memory
Depending on your server’s RAM and your application’s size, you’ll be able to handle different numbers of Apache RequestWorkers. There’s a sweet little tool called ApacheBuddy (Apache2Buddy for Apache2 users) that will make a good guess as to what your server can handle.
Since I’m using Apache2, I’d run:

```sh
curl -sL https://raw.githubusercontent.com/richardforth/apache2buddy/master/apache2buddy.pl | perl
```

and it spits out a report.
At the time of this writing, the default setting for Digital Ocean’s one-click install has MaxRequestWorkers set to 150, which is way too high. My app takes ~30MB per Apache process, and I have ~900MB of available memory, so Apache has been given free rein to request 150 * ~30MB = ~4.5GB of memory. Ubuntu is pretty good at memory management, but funny things happen under stress. Apache2Buddy recommended I set MaxRequestWorkers to something like 22 - 25.
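You can sanity-check Apache2Buddy’s math yourself: divide the memory you can spare by the average size of an Apache process. Something like this (again assuming Ubuntu, where the processes are named apache2):

```sh
# How much memory is available?
free -m

# How big is each Apache process? Check the RSS column (in KB)
ps -ylC apache2 --sort=rss
```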
This is done by editing /etc/apache2/mods-available/mpm_prefork.conf, setting MaxRequestWorkers to the recommended value from the Apache2Buddy report, and restarting Apache. (Older versions of Apache2 use the name “MaxClients.” Same variable, different name. Just use whatever’s there.)
```apache
# /etc/apache2/mods-available/mpm_prefork.conf
# MaxRequestWorkers lowered from the default 150 to Apache2Buddy's
# recommendation for my droplet - use the number from your own report
<IfModule mpm_prefork_module>
        StartServers              5
        MinSpareServers           5
        MaxSpareServers          10
        MaxRequestWorkers        25
        MaxConnectionsPerChild    0
</IfModule>
```
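Then restart Apache so the change takes effect (this is the Ubuntu 14.04 incantation; newer systemd-based releases use sudo systemctl restart apache2):

```sh
sudo service apache2 restart
```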
Now when you run a stress test with JMeter, you’ll see the memory bar stay in a much safer range.
For more information on optimizing Apache, the Digital Ocean staff wrote a great article here: https://www.digitalocean.com/community/tutorials/how-to-optimize-apache-web-server-performance. Adjusting the above configuration should fix our problem, but the article is a great read.
Step 3: (Optional) Add Swap memory for more wiggle-room
Again the folks at Digital Ocean wrote a great article about adding a Swap file here: https://www.digitalocean.com/community/tutorials/how-to-add-swap-on-ubuntu-14-04. Our app will now stay within our memory constraints better, but the added Swap gives some extra security.
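The short version, from that article (this creates a 1GB swap file; size it to suit your droplet):

```sh
# Create and enable a 1GB swap file
sudo fallocate -l 1G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile

# Make it permanent across reboots
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
```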
In-Depth Explanation of Problem 2: Dealing with Bots
Step 1: Confirm hackers / bots
To confirm this, you just have to browse your Apache access logs. The key thing we’re looking for is hundreds or thousands of requests from the same IP, often many per second, sometimes pointed at nonsensical URLs. You’ll know it when you see it.
```sh
# Default access log location on Ubuntu's Apache2
tail -f /var/log/apache2/access.log
```
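To surface the worst offenders, a one-liner like this helps (it assumes the default combined log format, where the client IP is the first field):

```sh
# Top 10 IPs by request count
awk '{print $1}' /var/log/apache2/access.log | sort | uniq -c | sort -rn | head -10
```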
You might see stuff like this (illustrative entries, not my actual logs - note the same IP hammering xmlrpc.php several times a second):

```
203.0.113.45 - - [12/Mar/2016:06:25:01 +0000] "POST /xmlrpc.php HTTP/1.1" 200 674 "-" "Mozilla/5.0"
203.0.113.45 - - [12/Mar/2016:06:25:01 +0000] "POST /xmlrpc.php HTTP/1.1" 200 674 "-" "Mozilla/5.0"
203.0.113.45 - - [12/Mar/2016:06:25:02 +0000] "POST /xmlrpc.php HTTP/1.1" 200 674 "-" "Mozilla/5.0"
203.0.113.45 - - [12/Mar/2016:06:25:02 +0000] "POST /xmlrpc.php HTTP/1.1" 200 674 "-" "Mozilla/5.0"
203.0.113.45 - - [12/Mar/2016:06:25:03 +0000] "GET /wp-login.php HTTP/1.1" 200 1568 "-" "-"
...
```
Now, how to block them? The XMLRPC attack is well-known and documented. There are a number of methods; my preferred plugin is IP Geo Block. With IP Geo Block, you can block all XMLRPC connections, guard against “Zero Day Attacks” in /wp-content/plugins/... or /wp-content/themes/..., and even block all connections from anywhere outside whitelisted countries.
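If you’d rather not rely on a plugin, you can also slam the door on xmlrpc.php at the Apache level. A minimal sketch for Apache 2.4, assuming you don’t use pingbacks, Jetpack, or anything else that talks to WordPress over XML-RPC:

```apache
# In your site's VirtualHost (or .htaccess): deny all access to xmlrpc.php
<Files xmlrpc.php>
    Require all denied
</Files>
```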
There is also a WordPress plugin that is elegant in its simplicity: Brute Force Login Protection locks users out at the .htaccess level after a given number of login failures. The plugin as it exists in the WordPress directory does not work for Apache versions >= 2.4, but here you can download a copy that will work for all Apache versions. (Simply click “Clone or Download” -> “Download ZIP”.)
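The version incompatibility comes down to Apache changing its access-control syntax in 2.4: the old Order/Deny directives need mod_access_compat to work at all. Roughly, the two styles of .htaccess ban look like this (203.0.113.45 is a placeholder IP):

```apache
# Apache 2.2 style - what older .htaccess-based ban lists write
Order Allow,Deny
Deny from 203.0.113.45
Allow from all

# Apache 2.4 style - the equivalent using the newer Require directives
<RequireAll>
    Require all granted
    Require not ip 203.0.113.45
</RequireAll>
```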