Simple Tools

Simple and Elegant Tools for the Web Developer.

[SOLUTION] How to Debug Intermittent “Error Establishing Database Connection”

The Problem

Every couple of weeks or so, my WordPress site would fail with “Error Establishing Database Connection.” I would restart the server and everything would work fine - for another couple of weeks. I was using Digital Ocean’s WordPress on LAMP one-click install. (Huge fan of Digital Ocean, BTW.)

The Short Answer

Apache is getting overloaded, crashing, and restarting, and amid all the chaos MySQL crashes too. If your website doesn’t get a lot of traffic, you probably have bots brute-forcing or hammering your site. The solution is to block the bot traffic and, more importantly, adjust Apache’s configuration so it can’t overburden itself and crash (taking MySQL with it).

This Post TL;DR

Basically, we have two problems to deal with: Problem 1: Apache’s configuration is making it vulnerable to crashes in traffic spikes. Problem 2: We’re using up a lot of resources dealing with bots.

Problem 1: Apache’s configuration is making it vulnerable to crashes in traffic spikes.

Run Apache2Buddy to diagnose current configuration.

$ curl -sL https://raw.githubusercontent.com/richardforth/apache2buddy/master/apache2buddy.pl | perl

Make adjustments to Apache’s MPM-Prefork configurations based on Apache2Buddy’s report

# /etc/apache2/mods-available/mpm_prefork.conf

<IfModule mpm_prefork_module>
  StartServers             5
  MinSpareServers          5
  MaxSpareServers         10
  # MaxRequestWorkers    150
  MaxRequestWorkers       22
  MaxConnectionsPerChild   0
</IfModule>

Problem 2: We’re using up a lot of resources dealing with bots.

Install IP Geo Block to block a bunch of traffic, and/or install Brute Force Login Protection to block IPs that fail too many logins.

In-Depth Explanation of Problem 1: Apache’s configuration causes it to crash under heavy traffic

Step 1: Use Apache JMeter to stress test the server, replicate the error

I used this article, https://www.digitalocean.com/community/tutorials/how-to-use-apache-jmeter-to-perform-load-testing-on-a-web-server to install and set up Apache JMeter on my Mac. JMeter is a tool that will automate requests to your site and record the results. It’s pretty easy to set up and hugely useful in diagnosis.

The strategy now is to try to crash your site with JMeter. Don’t do this against your live server: take a snapshot of your droplet and spin up a new droplet from that snapshot for pennies per hour. I set my Thread Group to do 150 requests every 3 seconds, repeat “forever,” and “Stop Thread” on a crash.

Before you start the test, SSH into your server and start htop ($ htop; you might have to install it). You’ll see a representation of your CPU, memory usage, swap memory, and processes. Now start the test and watch it go nuts. It might not fail right away, so give it some time. If you have “Stop Thread” selected, the test will stop pinging once the server fails.

Step 2: [SOLUTION] Reduce MaxRequestWorkers (MaxClients) in Apache configs to conserve memory

Depending on your server’s RAM and your application’s size, you’ll be able to handle different numbers of Apache RequestWorkers. There’s a sweet little tool called ApacheBuddy (Apache2Buddy for Apache2 users) that will make a good guess as to what your server can handle.

Since I’m using Apache2, I ran $ curl -sL https://raw.githubusercontent.com/richardforth/apache2buddy/master/apache2buddy.pl | perl and it spat out a report.

At the time of this writing, the default for Digital Ocean’s One-Click install has MaxRequestWorkers set to 150, which is way too high. Each Apache process running my app takes ~30MB, and the server has ~900MB of available memory. With 150 workers, Apache is given free rein to request 150 * ~30MB = 4.5GB of memory. Ubuntu is pretty good at memory management, but funny things happen under stress. Apache2Buddy recommended setting MaxRequestWorkers to something like 22 - 25.
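
To make the arithmetic concrete, here is a minimal sketch of the back-of-the-envelope calculation. The function names are made up for illustration, and the ~30MB and ~900MB figures are the ones from my server, so substitute your own. Apache2Buddy’s recommendation of 22 - 25 comes in below this simple ceiling, presumably because it leaves headroom for MySQL and the rest of the system.

```shell
# Rough ceiling on Apache workers: available memory / memory per process (MB).
max_workers() {
  echo $(( $1 / $2 ))
}

# Worst-case memory demand if every worker is busy at once (MB).
worst_case_mb() {
  echo $(( $1 * $2 ))
}

max_workers 900 30     # 900MB available, ~30MB per process -> 30
worst_case_mb 150 30   # the default of 150 workers         -> 4500
worst_case_mb 22 30    # the recommended 22 workers         -> 660
```

At 22 workers the worst case fits inside the ~900MB that is actually available; at 150 it is about five times too big.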

This is done by editing /etc/apache2/mods-available/mpm_prefork.conf, setting MaxRequestWorkers to the value recommended in the Apache2Buddy report, and restarting Apache. (Older versions of Apache2 use the name “MaxClients.” Same variable, different name. Just use whatever’s there.)

# /etc/apache2/mods-available/mpm_prefork.conf

<IfModule mpm_prefork_module>
  StartServers             5
  MinSpareServers          5
  MaxSpareServers         10
  # MaxRequestWorkers    150
  MaxRequestWorkers       22
  MaxConnectionsPerChild   0
</IfModule>
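
The edit only takes effect after a restart. As a sketch, assuming an Ubuntu server with the stock apache2 package (the function names are my own, not part of Apache), you can confirm which value is active and restart only if the syntax check passes:

```shell
# Print the active (uncommented) MaxRequestWorkers value from a config file.
active_max_workers() {
  awk '$1 == "MaxRequestWorkers" { print $2 }' "$1"
}

# Check the config syntax first; restart only if the check passes.
restart_apache() {
  sudo apache2ctl configtest && sudo service apache2 restart
}
```

Run active_max_workers /etc/apache2/mods-available/mpm_prefork.conf to see the value Apache will pick up, then restart_apache to apply it.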

Now when you run a stress test with JMeter, you’ll see the memory bar stays within a much safer range.

For more information on optimizing Apache, the Digital Ocean staff wrote a great article here: https://www.digitalocean.com/community/tutorials/how-to-optimize-apache-web-server-performance. Adjusting the above configuration should fix our problem, but the article is a great read.

Step 3: (Optional) Add Swap memory for more wiggle-room

Again, the folks at Digital Ocean wrote a great article about adding a Swap file here: https://www.digitalocean.com/community/tutorials/how-to-add-swap-on-ubuntu-14-04. Our app will now stay within our memory constraints better, but the added Swap gives us an extra safety margin.
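
For reference, the core of that tutorial boils down to a handful of commands. This is a sketch rather than a replacement for the article: the 1G size and the /swapfile path are assumptions you should adjust, and the function names are my own.

```shell
# Create and enable a swap file. Size and path are illustrative defaults.
# chmod 600 keeps the swap file unreadable by other users.
add_swap() {
  size=${1:-1G}
  path=${2:-/swapfile}
  sudo fallocate -l "$size" "$path" &&
    sudo chmod 600 "$path" &&
    sudo mkswap "$path" &&
    sudo swapon "$path"
}

# Print total swap in kB, read from /proc/meminfo (or a file given as $1).
swap_total_kb() {
  awk '$1 == "SwapTotal:" { print $2 }' "${1:-/proc/meminfo}"
}
```

After add_swap, a non-zero result from swap_total_kb confirms the swap is active. Swap enabled this way does not survive a reboot; the article covers the /etc/fstab entry that makes it permanent.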

In-Depth Explanation of Problem 2: Dealing with Bots

Step 1: Confirm hackers / bots

To confirm this, you just have to browse your Apache access logs. The key thing we’re looking for is hundreds or thousands of requests from the same IP, often many per second, sometimes pointed at nonsensical URLs. You’ll know it when you see it.

$ cat /var/log/apache2/access.log

You might see stuff like this:

185.106.94.99 - [31/Dec/2015:04:21:18 -0500](403 307) -[POST] /xmlrpc.php  "Mozilla/4.0 (compatible: MSIE 7.0; Windows NT 6.0)" referred by:"-"
185.106.94.99 - [31/Dec/2015:04:21:18 -0500](403 239) -[POST] /xmlrpc.php  "Mozilla/4.0 (compatible: MSIE 7.0; Windows NT 6.0)" referred by:"-"
185.106.94.99 - [31/Dec/2015:04:21:20 -0500](403 245) -[POST] /xmlrpc.php  "Mozilla/4.0 (compatible: MSIE 7.0; Windows NT 6.0)" referred by:"-"
185.106.94.99 - [31/Dec/2015:04:21:27 -0500](403 241) -[POST] /xmlrpc.php  "Mozilla/4.0 (compatible: MSIE 7.0; Windows NT 6.0)" referred by:"-"
185.106.94.99 - [31/Dec/2015:04:21:33 -0500](403 376) -[POST] /xmlrpc.php  "Mozilla/4.0 (compatible: MSIE 7.0; Windows NT 6.0)" referred by:"-"
185.106.94.99 - [31/Dec/2015:04:21:42 -0500](403 281) -[POST] /xmlrpc.php  "Mozilla/4.0 (compatible: MSIE 7.0; Windows NT 6.0)" referred by:"-"
185.106.94.99 - [31/Dec/2015:04:21:42 -0500](403 437) -[POST] /xmlrpc.php  "Mozilla/4.0 (compatible: MSIE 7.0; Windows NT 6.0)" referred by:"-"
185.106.94.99 - [31/Dec/2015:04:21:43 -0500](403 456) -[POST] /xmlrpc.php  "Mozilla/4.0 (compatible: MSIE 7.0; Windows NT 6.0)" referred by:"-"
185.106.94.99 - [31/Dec/2015:04:21:49 -0500](403 253) -[POST] /xmlrpc.php  "Mozilla/4.0 (compatible: MSIE 7.0; Windows NT 6.0)" referred by:"-"
185.106.94.99 - [31/Dec/2015:04:21:58 -0500](403 615) -[POST] /xmlrpc.php  "Mozilla/4.0 (compatible: MSIE 7.0; Windows NT 6.0)" referred by:"-"
185.106.94.99 - [31/Dec/2015:04:22:02 -0500](403 478) -[POST] /xmlrpc.php  "Mozilla/4.0 (compatible: MSIE 7.0; Windows NT 6.0)" referred by:"-"
185.106.94.99 - [31/Dec/2015:04:22:03 -0500](403 417) -[POST] /xmlrpc.php  "Mozilla/4.0 (compatible: MSIE 7.0; Windows NT 6.0)" referred by:"-"
185.106.94.99 - [31/Dec/2015:04:22:04 -0500](403 294) -[POST] /xmlrpc.php  "Mozilla/4.0 (compatible: MSIE 7.0; Windows NT 6.0)" referred by:"-"
185.106.94.99 - [31/Dec/2015:04:22:06 -0500](403 238) -[POST] /xmlrpc.php  "Mozilla/4.0 (compatible: MSIE 7.0; Windows NT 6.0)" referred by:"-"
185.106.94.99 - [31/Dec/2015:04:22:10 -0500](403 259) -[POST] /xmlrpc.php  "Mozilla/4.0 (compatible: MSIE 7.0; Windows NT 6.0)" referred by:"-"
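
Scrolling through the raw log works, but a tally per IP makes the offenders obvious. A minimal sketch, assuming the client IP is the first whitespace-separated field of each line (true of the sample above); the function names are hypothetical:

```shell
# List the busiest client IPs in an access log, busiest first.
# $1 = log file, $2 = how many to show (default 10).
top_ips() {
  awk '{ print $1 }' "$1" | sort | uniq -c | sort -rn | head -n "${2:-10}"
}

# Count log lines that mention a given path, e.g. /xmlrpc.php.
hits_for_path() {
  grep -c -F -- "$2" "$1"
}
```

For example, top_ips /var/log/apache2/access.log 5 prints a request count next to each of the five busiest IPs, and hits_for_path /var/log/apache2/access.log /xmlrpc.php shows how many requests targeted xmlrpc.php.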

Now, how to block them? The XMLRPC attack is well known and well documented. There are a number of methods; my preferred one is the IP Geo Block plugin. With IP Geo Block, you can block all XMLRPC connections, guard against “Zero Day Attack” in /wp-content/plugins/... or wp-content/themes/..., and even block all connections from anywhere outside whitelisted countries.

There is also a WordPress Plugin that is elegant in its simplicity: Brute Force Login Protection locks users out at the .htaccess level after a given number of login failures. The plugin as it exists in the WordPress directory does not work for Apache versions >= 2.4 but here you can download a copy that will work for all Apache versions. (Simply click “Clone or Download” -> “Download ZIP”).