Deploy PHP applications with zero downtime and under (heavy) load

(This text was originally written in 2014.) In the following paragraphs I would like to share the experience we gained while implementing a „bulletproof“ deploy of our PHP application on one of the most visited Czech websites. If you run a small site that serves only a few requests per second, you probably won’t learn anything essential here. However, if you run a larger service with around 100 requests per second at peak, your application runs on multiple web servers like ours, you deploy often and enjoy it, and, like us, you don’t like losing requests, then I believe the following text might be interesting to you.

First, a short introduction. Our application is written in Nette and runs on multiple web servers. Each server has a user cache and, of course, an opcode cache (we used XCache for both; now it’s APCu and Zend OPcache). In addition, the application uses a cache shared by all servers (Redis). As I wrote above, we deploy a new version whenever it is needed; we don’t wait for low traffic. In the past, some requests always ended with an application error after a new deploy. Even at peak load it was only a few requests, but our goal was to reduce that number to zero.

In the past, we deployed our app in a way that most of you probably know and use. The web root is set in the Apache virtual host as a symbolic link that always points to the directory with the current application source. Deploying a new version then looks like this: the application files are uploaded to a new directory, the symlink is switched to point to that directory, some caches are cleared, and the deploy is done. If you have multiple servers, you have to repeat the procedure on each of them or sync them in some other way (we use rsync).

This scenario worked on our servers until we upgraded Debian 6 „Squeeze“ to Debian 7 „Wheezy“ in order to upgrade PHP to 5.4. Since then (and only under heavy load), Apache could no longer cope with the symlink change and started returning empty responses for several minutes. At first we suspected XCache, which we had started using after the upgrade to PHP 5.4, but we got the same behavior with other opcode caches and even with no opcode cache at all. We then searched the web and found other users with the same problem, but no solution. In the end, we worked around the problem temporarily: instead of switching the symlink, we changed the root directory by renaming directories. The new version of the application was uploaded to www_new, the current www was renamed to www_old, and www_new was renamed to www. Surprisingly, Apache can handle this, but unlike the symlink change it can’t be done atomically, so a few requests always ended with an error. That’s how we worked for about a year before we found the time to try another, better solution.

Our first idea was to restart Apache in such a way that requests in progress are completed and new ones are served with the changed configuration (a graceful restart, known in Debian as reload). For us, that would mean changing the document root to the directory with the new version of the application source code. Everything worked nicely when the graceful restart was done under low load. Under heavy load, however, Apache died for a few minutes after the restart. After that, we decided to let Apache go… We had been using nginx for static data on another server for a long time, so we chose to replace Apache with it everywhere.

With nginx, the symlink change started working as we expected, but another problem appeared. I can’t explain why the same behavior didn’t occur when we ran PHP as an Apache module, but PHP internally caches the realpath of every file it uses (this cache is also used for the __DIR__ and __FILE__ magic constants), and by default an entry is kept for 2 minutes. If you use PHP with nginx (via FastCGI, php-fpm) and the site root is a symlink, this behavior shows up immediately: after the root symlink is changed, you keep getting the old application paths for up to two minutes. Although this cache can be disabled in php.ini, that is not recommended because of the performance degradation. Unfortunately, the cache can’t be cleared globally; it can only be bypassed for a single request (the next request will use the cache again). All the tutorials I have found for running nginx with PHP over FastCGI pass the path to the PHP file with symbolic links unresolved. This is the line you most likely have in your configuration:

fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;

Suppose the web root is a symlink /var/www/app/www that points to the current source code directory /var/www/app/releases/v1, and the server receives a request for index.php. Nginx calls PHP and passes the PHP file path in the SCRIPT_FILENAME parameter as /var/www/app/www/index.php. PHP finds that the /var/www/app/www directory is a symlink and caches the resolved path to index.php as /var/www/app/releases/v1/index.php. That file is then processed and the result is returned. Now we upload the new version of the application to the /var/www/app/releases/v2 directory and switch the /var/www/app/www symlink to this directory. The server receives another request for index.php and calls PHP again with SCRIPT_FILENAME set to /var/www/app/www/index.php. PHP receives the request and checks whether it has this path in its cache. The cache says that /var/www/app/www/index.php is actually a link to /var/www/app/releases/v1/index.php, so PHP processes that file and returns the output of the older version of the application.
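
The scenario above can be reproduced outside PHP with a small shell sketch. It only illustrates the idea: the sandbox directory and file contents are made up for the demo, readlink -f stands in for the path resolution PHP performs, and GNU coreutils are assumed (for mv -T and readlink -f).

```shell
#!/bin/sh
set -eu

# Throwaway sandbox mirroring the layout from the example (paths are illustrative).
sandbox=$(cd "$(mktemp -d)" && pwd -P)
mkdir -p "$sandbox/releases/v1" "$sandbox/releases/v2"
echo 'v1' > "$sandbox/releases/v1/index.php"
echo 'v2' > "$sandbox/releases/v2/index.php"

# The web root is a symlink to the current release.
ln -s "$sandbox/releases/v1" "$sandbox/www"

# The path a web server would pass when it builds SCRIPT_FILENAME from the symlinked root.
symlinked_path="$sandbox/www/index.php"

# What that path resolves to at the time of the first request
# (comparable to the entry PHP would store in its realpath cache).
resolved_before=$(readlink -f "$symlinked_path")

# Deploy: switch the symlink to v2.
ln -s "$sandbox/releases/v2" "$sandbox/www_tmp"
mv -Tf "$sandbox/www_tmp" "$sandbox/www"

# The symlinked path is unchanged, but it now resolves to a different file.
resolved_after=$(readlink -f "$symlinked_path")

echo "passed path:     $symlinked_path"
echo "resolved before: $resolved_before"
echo "resolved after:  $resolved_after"
```

A process that remembered resolved_before for the unchanged symlinked path would keep serving the v1 file even though the symlink already points to v2, which is exactly the stale window described above.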

In order to prevent this behavior in an elegant way, it is best to pass the file to PHP with its path already resolved rather than as a symlink. Fortunately, nginx allows this. The feature is not very well publicized (the same could be done in Apache via mod_realdoc, which I think might be necessary if you run PHP as FastCGI) and it was hard for us to find, but in the end we succeeded. The whole thing is quite simple: you only need to replace the $document_root variable with $realpath_root. The entire configuration line then looks like this:

fastcgi_param SCRIPT_FILENAME $realpath_root$fastcgi_script_name;

To be safe, we also add this line:

fastcgi_param DOCUMENT_ROOT $realpath_root;

We need this line if the PHP code uses $_SERVER['DOCUMENT_ROOT']. The corresponding lines in the nginx configuration file fastcgi_params must either be commented out, or our lines must be placed after the include fastcgi_params line so that they take precedence.

Now the web server is ready for a new deploy. The last thing to keep in mind is that replacing a symbolic link in Linux (often done with the command ln -snf new current) isn’t an atomic operation: the old link is removed before the new one is created, so for some milli- or microseconds the symbolic link does not exist. For an atomic replacement, we have to use a small trick. We create a new temporary symbolic link and then move it over the original one we want to replace. The whole operation looks like this:

ln -s new current_tmp && mv -Tf current_tmp current

So that’s all the magic! With this, we are able to deploy new versions of the application even at peak load without losing a single request.
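
The one-liner above can be wrapped in a small helper. This is a minimal sketch, not our production script: the function name switch_release, the temporary link name and the demo directory are all hypothetical, and GNU mv is assumed for the -T option.

```shell
#!/bin/sh
set -eu

# switch_release TARGET LINK
# Points LINK at TARGET by creating a temporary symlink next to it and
# renaming it over LINK, so LINK never stops existing during the switch.
switch_release() {
    target=$1
    link=$2
    tmp="${link}_tmp.$$"    # hypothetical temp name; $$ keeps concurrent runs apart
    ln -s "$target" "$tmp"
    if ! mv -Tf "$tmp" "$link"; then
        rm -f "$tmp"        # do not leave a stray temporary link behind
        return 1
    fi
}

# Demo in a throwaway directory (layout is illustrative only).
demo=$(mktemp -d)
mkdir "$demo/v1" "$demo/v2"

switch_release "$demo/v1" "$demo/www"
first=$(readlink "$demo/www")

switch_release "$demo/v2" "$demo/www"
second=$(readlink "$demo/www")

echo "$first -> $second"
```

The -T option matters here: without it, mv would treat the existing link as a directory and move the temporary link into the old release instead of replacing the link itself.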

Summary and additional notes

  • don’t be afraid to replace Apache with nginx! If you have a single server for both PHP and static files, you will also notice a performance improvement. We serve static files from another server, so we don’t see much difference in performance, but our experience with nginx is still positive. Many people are afraid of losing their .htaccess files, but how often do you edit them? Nginx does not support anything like them (for performance reasons; if anyone knows of a specific measurement, we’d love to see it :-)), so everything is written directly into the configuration
  • set nginx to resolve symbolic links (we read on some forum that the system call needed to resolve a symbolic link costs some performance, but we did not observe anything like that on our servers). The FastCGI settings may look something like this:
    location ~ \.php$ {
        try_files     $uri =404;
    
        fastcgi_pass         unix:/var/run/php5-fpm.sock;
    
        include  fastcgi_params;
    
        fastcgi_param SCRIPT_FILENAME $realpath_root$fastcgi_script_name;
        fastcgi_param DOCUMENT_ROOT $realpath_root;
    }
    
  • change the symbolic link to the web root directory atomically by creating a temporary symbolic link to the directory with the new application version and moving this new symlink over the original one:
    ln -s app_new_version www_tmp && mv -Tf www_tmp www
    

    – if you have multiple web servers, you must run this command on each one of them! For example, we use rsync to synchronize files, but if we also synchronized the symbolic link to the root directory, it would not be replaced atomically!

  • to ensure that everything works, each application version has to have its own temp directory, where the caches specific to that version of the application are stored (templates, robot loader, …)
    – if you use a user cache (e.g. APC) on the server and the cached data depends on the application version (its structure may change with a new version of the application), prefix the cache keys with the application version or use a namespace if possible. Otherwise errors can occur in the window between switching the symlink and clearing the user cache!
    – database changes for a new version must be backward compatible with the older version (so that the old and new versions of the application can run side by side)
    – all of this has the additional advantage that if the new version doesn’t work well for some reason, you can go back to the original simply by switching the web root symlink to the directory with the older version of the application
  • problems with this kind of continuous deployment can occur with a shared cache. It often stores objects that we don’t want to regenerate with each new application version, and if the cache has to be cleared for the application to work properly (e.g. when the cache structure has changed), that can’t be done atomically with the application deploy. In the window between switching the symlink and clearing the cache, the application may not work correctly.
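
The per-version temp directory and the rollback from the list above can be sketched together. The layout (releases/<version>/temp), the activate helper and the version names are assumptions made for this demo, and GNU mv is assumed for the -T option.

```shell
#!/bin/sh
set -eu

# Hypothetical layout: every release carries its own temp directory for
# version-specific caches (templates, robot loader, ...).
app=$(mktemp -d)
for v in v1 v2; do
    mkdir -p "$app/releases/$v/temp"
done

# activate VERSION: atomically point the web root symlink at a release.
activate() {
    ln -s "$app/releases/$1" "$app/www_tmp"
    mv -Tf "$app/www_tmp" "$app/www"
}

activate v1

# Remember which release is live before deploying the next one.
previous=$(basename "$(readlink "$app/www")")

activate v2
live_after_deploy=$(basename "$(readlink "$app/www")")

# Suppose v2 misbehaves: rolling back is the same atomic switch.
activate "$previous"
live_after_rollback=$(basename "$(readlink "$app/www")")

echo "deployed: $live_after_deploy, rolled back to: $live_after_rollback"
```

Because the v1 directory and its temp directory were never touched by the deploy, the old version finds its own caches again after the rollback. The rollback is only safe if the database changes are backward compatible, as noted above.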
