
This article describes how I got speld.nl to a 13ms response time.

De Speld runs on WordPress, and WordPress can be notoriously hard to speed up. Many sites rely on a whole range of plugins that will almost certainly slow the site down quite a bit. Hell, even a bare install will probably yield, at best, a loading time of around 400ms.

We (read: I) went through quite a cumbersome process to reach this speed and stability. speld.nl is quite the ‘viral’ website: popular articles can gain traction in a very short time span, reaching hundreds of thousands of people, mostly through campaigns on social media.

Info

Back in 2014, speld.nl was dissatisfied with its previous hosting provider because of the cost and had to switch providers quickly. I provided initial support by transferring their hosting to a managed hosting company, just to get things up and running again.

The price for this managed hosting solution was nothing out of the ordinary for a high-traffic website, but I felt it could be cheaper if I did it myself.

I proposed hosting the site at TransIP.nl, a local VPS provider I’ve had good experiences with.

The actual setup

I set out to design a platform that scales horizontally, allowing me to add or remove machines when the load requires it. This setup is very flexible and makes use of small, cheap consumer VPSs that each have their own specific role.

[diagram of the setup]

Here is a table of the machines:

Name     Host                eth0             eth1        TransIP  Role
LB       speld.nl            [redacted]       [redacted]  VPS x8   Loadbalance requests to the web instances
CDN      hooiberg.speld.nl   149.210.216.170  [redacted]  VPS x16  Deliver static content
DBM      dbm.eth1            [redacted]       [redacted]  VPS x8   MySQL database
app1     app1.eth1           [redacted]       [redacted]  VPS x4   Webserver instance #1
app2     app2.eth1           [redacted]       [redacted]  VPS x4   Webserver instance #2
app3     app3.eth1           [redacted]       [redacted]  VPS x4   Webserver instance #3
dev-mon  dev.speld.nl        [redacted]       [redacted]  VPS x2   Monitoring / Backup / Development environment

Software used

Debian 8, NGiNX, Varnish 4, HAProxy, vsftpd, MySQL, WordPress, PHP5, Python 2.7, icinga2, GitHub

SSL (TLS)

Initially speld.nl did not use SSL. Thankfully there are free SSL certificates out there through letsencrypt.org, backed by, among others, the Electronic Frontier Foundation. Apart from providing more security, Google claims to give websites that use SSL a small ranking boost, and that is always welcome.

HAProxy, our loadbalancer, sends incoming requests to one of the three web instances. To keep recurring requests going to the same web instance, it injects a cookie so that it knows where to send that visitor’s next request.

When using SSL, though, the incoming traffic is just opaque TCP traffic as far as HAProxy is concerned, and it is unable to inject this special cookie.

The only practical solution is something called ‘SSL termination’: the loadbalancer terminates the SSL connection, and from that point on the traffic towards the web instances is plain, unencrypted HTTP.
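
To make the idea concrete, here is a minimal sketch of what such an HAProxy configuration could look like. This is not the production config; the certificate path, ports and backend names are placeholders:

# Minimal haproxy.cfg sketch: terminate SSL on the loadbalancer, then
# balance plain HTTP to the web instances with a sticky cookie.
frontend www
    mode http
    bind *:80
    bind *:443 ssl crt /etc/haproxy/speld.nl.pem    # SSL termination happens here
    default_backend wordpress

backend wordpress
    mode http
    balance roundrobin
    # Inject a cookie so recurring requests stick to the same web instance
    cookie SRV insert indirect nocache
    # Tell WordPress further down the line that the original request was HTTPS
    http-request set-header X-Forwarded-Proto https if { ssl_fc }
    server app1 app1.eth1:80 check cookie app1
    server app2 app2.eth1:80 check cookie app2
    server app3 app3.eth1:80 check cookie app3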

This introduces a couple of problems:

  • WordPress needs to be convinced it is actually running on HTTPS. I achieved this by adding if ($_SERVER['HTTP_X_FORWARDED_PROTO'] == 'https') { $_SERVER['HTTPS'] = 'on'; } to wp-config.php.
  • Naturally, the WordPress theme had to be adjusted to reflect the new URL scheme. This also meant that any unencrypted references (http://) to resources (css/js/img) had to be changed. Sometimes these URLs are hardcoded into the theme and need to be changed accordingly.
  • Even then, WordPress still did not correctly adjust itself to the new URL scheme. I ended up using the plugin SSL Insecure Content Fixer in ‘Simple’ mode. I also had to pass HTTP_X_FORWARDED_PROTO through in nginx to make this plugin work; see the sketch after this list.
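
For reference, a sketch of what that nginx injection can look like. The socket path is a placeholder and the real PHP location block obviously contains more than this:

location ~ \.php$ {
    include fastcgi_params;
    # Hand the loadbalancer's X-Forwarded-Proto header to PHP so that
    # WordPress (and the SSL Insecure Content Fixer plugin) can see it
    fastcgi_param HTTP_X_FORWARDED_PROTO $http_x_forwarded_proto;
    fastcgi_pass unix:/var/run/php5-fpm.sock;
    [...]
}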

As for regularly renewing the certificates, I have a cron job that looks like this:

0 23 1 jul,oct,jan,apr * cd /root/letsencrypt/ && ./letsencrypt-auto certonly --webroot -w /home/despeld/public_html/ -d hooiberg.speld.nl && cat /etc/letsencrypt/live/hooiberg.speld.nl/cert.pem /etc/letsencrypt/live/hooiberg.speld.nl/privkey.pem > /etc/nginx/hooiberg.speld.nl.pem && service nginx reload >> /var/log/ssl_update.log

Syncing files between the web instances

We now have three different web instances. This introduces the problem of having to keep uploaded media files in sync between them.

The problem is this: an editor using web instance #1 uploads some media, and a second editor on web instance #2 then tries to use that media but cannot find the file, since it is only present on machine #1.

Initially I gave GlusterFS a try, in an attempt to create a storage cluster. Unfortunately, for reasons still unknown to me, it created an absurd amount of stress on the CPU, eventually grinding the website to a halt. GlusterFS is actually good software, but I must admit that due to time constraints I was pressed to find a quicker solution, and that solution was a CDN.

CDN

speld.nl uses a single machine as a ‘CDN’, which I am aware does not really qualify as a CDN. Nonetheless it caches aggressively through Varnish and rarely breaks a sweat serving thousands of static files.

From within WordPress, the W3 Total Cache plugin (W3TC) is responsible for automatically uploading media to the CDN machine via FTP. This triggers a mechanism on the CDN that in turn syncs the file to all the web instances. It’s a hack, but it has been working great for the past two years.
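
The exact trigger is not spelled out here, but the idea can be approximated with inotify and rsync: watch the upload directory on the CDN and push every new file to each web instance. The paths and hostnames below are illustrative, not the actual script:

#!/bin/bash
# Illustrative only: mirror newly uploaded media from the CDN to the
# web instances. Paths and hostnames are placeholders.
WATCH_DIR=/home/despeld/public_html/speld.nl/wp-content/uploads
INSTANCES="app1.eth1 app2.eth1 app3.eth1"

inotifywait -m -r -e close_write --format '%w%f' "$WATCH_DIR" |
while read -r file; do
    for host in $INSTANCES; do
        # --relative recreates the full path on the destination
        rsync -az --relative "$file" "$host:/"
    done
done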

To maximize CDN usage, most static files (.css/.js) under wp-content/ are uploaded to it as well. An nginx rule then redirects requests for them:

location / {
    rewrite ^/wp-content/(.*)$ https://hooiberg.speld.nl/speld.nl/wp-content/$1 redirect;
    [...]
}

Caching

Initially speld.nl was using the ‘disk cache’ function of the W3TC WordPress plugin. W3TC would write static HTML pages to disk and serve them from there. I experienced a couple of problems with this approach:

  • W3TC and nginx do not play nice together; it seems as if the plugin was mainly written for Apache. After a few hours of running nginx in debug mode I was able to create a working w3tc.conf.
  • For reasons still unknown to me, the caching function in W3TC would regularly disable itself, leaving the website running without caching. Sooner or later the site would then start serving 500 errors when it got too busy.
  • To make matters worse, the W3TC plugin uses a local config file, as opposed to saving its settings in the database. This meant that every web instance had its own config, and every web instance was prone to this random ‘paralysis’.

Varnish was eventually used and a custom VCL was created. It was tested intensively and dramatically decreased page response times; they now sit comfortably between 10 and 50ms no matter the load.
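
The production VCL is more elaborate than this, but the core of it can be sketched as follows. This is Varnish 4 syntax; the backend address, TTL and cookie handling are simplified assumptions, not the real config:

vcl 4.0;

# nginx sits behind Varnish on this machine (placeholder address/port)
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_recv {
    # Never cache the admin or login pages
    if (req.url ~ "^/wp-(admin|login)") {
        return (pass);
    }
    # Anonymous visitors don't need WordPress cookies; dropping them
    # makes their requests cacheable
    if (req.http.Cookie !~ "wordpress_logged_in") {
        unset req.http.Cookie;
    }
}

sub vcl_backend_response {
    # Keep pages in cache for a while; popular pages stay hot
    set beresp.ttl = 1h;
}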

Monitoring

Icinga2 is complex to set up but very powerful; I prefer it over Nagios. It monitors every server on the following points:

  • ping4
  • ping6
  • memory load
  • swap
  • diskspace
  • CPU load
  • Number of processes
  • Number of users logged in
  • aptitude package manager status
  • SSH access

In addition, the following checks are specific to the machines they apply to:

  • Check HTTP access
  • Check FTP access
  • Check each web instance’s availability
  • Validate haproxy request routing by cookie
  • Check if the CDN is nearing its bandwidth usage limit

If a check does not pass, I get a message about it via Telegram.
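
As an illustration, a check in Icinga2 is just an apply rule over hosts. The snippet below is a generic example in that style, not the actual production configuration:

// Example only: check HTTP on every host that sets the "http" variable
apply Service "http" {
  import "generic-service"

  check_command = "http"
  vars.http_vhost = "speld.nl"

  assign where host.vars.http
}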

Development

The development server is quite simple. It tries to be a copy of its production counterpart, and access is granted via SSH. Git is used for version control, and from here you can push to production.
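
Pushing to production follows the usual bare-repository pattern: a bare repo on the target machine with a post-receive hook that checks the pushed code out into the webroot. The snippet below is a generic sketch with placeholder paths, not the actual hook:

#!/bin/sh
# hooks/post-receive in a bare repository on the production side
# (placeholder paths; illustrative only)
GIT_WORK_TREE=/home/despeld/public_html/speld.nl git checkout -f master

With a remote pointing at that bare repository, deploying then comes down to a single git push from the development server.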

Conclusion

Hosting high-traffic WordPress sites is a pain, but not impossible. Make a proper Varnish config, use a CDN and you’re good to go :)