The basics of client-side caching in plain words and examples: Last-Modified, ETag, Expires, Cache-Control: max-age and other headers

Many people think that CSS files included via link or @import are not cached by default. I have to disappoint you: CSS kept in a separate file is exactly what gets cached, and cached very well, I would even say excellently. This has been reliably verified in Internet Explorer 6 and up as well as in other browsers. It is worth noting that the much-loved Opera caches such files with truly wild enthusiasm and takes first place here. Incidentally, this very mechanism is what gives Opera a noticeable speed advantage over other browsers in many cases. But I should say right away that this "super" caching plays a cruel joke on Opera when AJAX is involved: while other browsers fetch fresh data, Opera serves the old cached copy. That, however, is a topic for a separate article.

CSS caching

BUT! There are still some annoying problems here. As a rule, they come from an incorrectly configured Apache server that sends the wrong headers. Headers are what let you control file caching. By default, of course, caching is always enabled, but there are times when you do not want files to be cached, and that is when the pros start their ritual dances around HTTP headers. If you are reading this article from start to finish, you are probably still a long way from managing HTTP headers yourself, and I assure you that you will not face such a task in the near future. Still, if you are curious to the core, here is briefly how it works.

The browser sends an HTTP request to the web server: hey, buddy, give me the CSS file; I already have a copy, last changed at such-and-such a time.
The server replies: no changes since then, my friend, go ahead and use your old CSS.
If the CSS has changed, the browser simply updates the copy in its cache.
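
The exchange above can be sketched in a few lines of JavaScript. This is only an illustration of the server's side of the conversation, not how Apache implements it: the function name respondConditional, the shape of the resource object and the lower-cased header keys are assumptions made for the example.

```javascript
// Hypothetical helper: decides how a server answers a conditional GET.
// `resource` is assumed to look like { body: string, lastModified: Date }.
function respondConditional(requestHeaders, resource) {
  const since = requestHeaders["if-modified-since"];
  // HTTP dates have one-second resolution, so compare whole seconds.
  const modifiedSec = Math.floor(resource.lastModified.getTime() / 1000);
  if (since !== undefined) {
    const sinceMs = Date.parse(since);
    if (!Number.isNaN(sinceMs) && modifiedSec <= Math.floor(sinceMs / 1000)) {
      // Nothing changed: the browser may keep using its cached copy.
      return { status: 304, headers: {}, body: "" };
    }
  }
  // First request, or the file changed: send the full body again.
  return {
    status: 200,
    headers: { "last-modified": resource.lastModified.toUTCString() },
    body: resource.body,
  };
}
```

The first call (no If-Modified-Since) yields 200 with a Last-Modified header; echoing that date back yields 304 until the file's modification time moves forward.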

Now, if you are not tired yet, a bit of scientific tedium from an experiment.

I'll say right away that the text below will be hard going for web beginners. It is mainly useful for those who actually have to deal with disabling and enabling the cache.

All experiments were carried out on real, paid hosting: a good provider that lets you change HTTP headers without being paranoid about getting hacked through an HTTP header :)

Browser modes

So, any browser works in one of two modes, depending on the header the server returns:

1. Default mode. The returned header is:

Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0

2. Caching-enabled mode. The returned header is:

Cache-Control: private, max-age=10800, pre-check=10800

Next, I describe the behavior of browsers

Firefox 3.5 and higher

In the first mode, Firefox caches external JavaScript files firmly and does not even check for updates unless you force a page refresh. CSS is validated with a conditional request carrying these headers:

If-Modified-Since: "current date" GMT
If-None-Match: "own hash code"

That is, the CSS is reloaded only if it has actually been updated.

In the second mode, Firefox stops refreshing the page altogether. Even if the content shown on the page has changed in the database, the browser does not display the change, even on a forced refresh, because it sends the request:

GET / HTTP/1.1
Host: xxx.com
If-Modified-Since: current GMT date

and gets the answer:

HTTP/1.1 304 Not Modified

Internet Explorer 8 (IE8)

In the first mode, Internet Explorer sends If-Modified-Since and If-None-Match requests for both JavaScript and CSS; that is, it reloads JavaScript and CSS only if they have actually been updated. The same is true on a forced page refresh.

In the second mode, Internet Explorer also sends If-Modified-Since and If-None-Match requests for both JavaScript and CSS. But it does not even try to load or update the page itself: it sends no request for it at all. Your JS and CSS will be updated, but the template and page content will not. Even a forced page refresh does not help.

Opera 10 and higher

In the first mode, whether Opera updates JS and CSS depends on the value of the Check images option in its settings. If the option is set to Always, Opera sends requests with If-Modified-Since and If-None-Match to check for JS and CSS updates. If it is set to, say, 5 hours, the check happens once every 5 hours or on a forced page refresh.

In the second mode, Opera does not check for JS and CSS updates (it makes no GET requests for them) and makes no GET request for the page itself either, so we see neither JS and CSS updates nor content updates, just as in the other browsers. On a forced refresh, however, Opera does better. Unlike IE and Firefox, Opera explicitly requests the page content without If-Modified-Since and If-None-Match. The JS and CSS requests on a forced refresh are sent with If-Modified-Since and If-None-Match.

Conclusions

Caching is a rather dangerous thing if you do not understand exactly how it works in different browsers and what the consequences are.
Caching should be enabled only if the page is rarely updated (that is, the site has no pages that change in real time), and even then it is essential to limit the caching period (for example, to a few hours or a day).
Firefox behaves, in my opinion, a little smarter than IE: even with caching disabled it does not constantly check for JavaScript updates, which seems logical, because JavaScript changes very rarely.
Opera lets you flexibly control the updating of images, JavaScript and CSS through the Check images setting, which is a plus. Opera also behaves better than IE and Firefox with caching enabled and a forced refresh: as a reminder, Opera fully reloads the page content in this case, while IE and Firefox leave you in blissful ignorance.

Good luck and profitable sites.

Properly configured caching provides huge performance gains, saves bandwidth, and reduces server costs, but many sites implement caching poorly, creating race conditions in which interdependent resources get out of sync.

The vast majority of caching best practices fall into one of two patterns:

Pattern #1: immutable content and a long max-age

Cache-Control: max-age=31536000

The content at the URL never changes, therefore:
The browser or CDN can cache the resource for a year without any problems
Cached content that is younger than the specified max-age can be used without consulting the server
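
The freshness rule in the last point can be written down as a small check. This is a simplified sketch: the function names are made up for the example, and real caches also take into account the Age header, Expires and other directives that are ignored here.

```javascript
// Returns the max-age value (in seconds) from a Cache-Control header, or null.
function parseMaxAge(cacheControl) {
  const match = /(?:^|,)\s*max-age\s*=\s*(\d+)/i.exec(cacheControl);
  return match ? Number(match[1]) : null;
}

// A stored response counts as fresh while its age is below max-age.
// Times are given in milliseconds since the epoch, as returned by Date.now().
function isFresh(cacheControl, storedAtMs, nowMs) {
  const maxAge = parseMaxAge(cacheControl);
  if (maxAge === null) return false;
  const ageSeconds = (nowMs - storedAtMs) / 1000;
  return ageSeconds < maxAge;
}
```

With max-age=31536000, a copy stored yesterday is still fresh and is used without any request, which is exactly what happens with "/cats-v1.jpg" in the dialogue below.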

Page : Hey, I need "/script-v1.js", "/styles-v1.css" and "/cats-v1.jpg" 10:24

Cache : I'm empty, how about you Server? 10:24

Server : OK, here they are. By the way, Cache, they can be used for up to a year, no longer. 10:25

Cache : THX! 10:25

Page : Hooray! 10:25

The next day

Page : Hey, I need "/script-v2.js", "/styles-v2.css" and "/cats-v1.jpg" 08:14

Cache : There is a picture with cats, the rest is not. Server? 08:14

Server : Easy, here are the new CSS & JS. Once again, Cache: they are good for no more than a year. 08:15

Cache : Super! 08:15

Page : Thanks! 08:15

Cache : Hmm, I haven't used "/script-v1.js" & "/styles-v1.css" for quite a while. It's time to delete them. 12:32

With this pattern, you never change the content at a given URL; you change the URL itself.

Every URL contains something that changes along with the content. This can be a version number, a modification date, or a hash of the content (the option I chose for my blog).

Most server-side frameworks have tools that make this easy (in Django I use ManifestStaticFilesStorage), and there are very small Node.js libraries that do the same thing, such as gulp-rev.
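
To make the idea concrete, here is a minimal Node.js sketch of naming a file after a hash of its contents. It is not how ManifestStaticFilesStorage or gulp-rev work internally; the function name hashedName and the 8-character SHA-256 prefix are arbitrary choices for the illustration.

```javascript
const crypto = require("crypto");
const path = require("path");

// Builds a name of the form "script-xxxxxxxx.js", where xxxxxxxx is derived
// from the file contents, so any change in content produces a new URL.
function hashedName(fileName, contents) {
  const hash = crypto
    .createHash("sha256")
    .update(contents)
    .digest("hex")
    .slice(0, 8);
  const ext = path.extname(fileName);         // e.g. ".js"
  const base = path.basename(fileName, ext);  // e.g. "script"
  return `${base}-${hash}${ext}`;
}
```

The same contents always give the same name, so unchanged files keep their URL and stay cached, while edited files get a new URL and are fetched again.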

However, this pattern is not suitable for things like articles and blog posts. Their URLs cannot be versioned and their content can change. Seriously, I make a lot of grammar and punctuation mistakes and need to be able to update the content quickly.

Pattern #2: mutable content that is always revalidated with the server

Cache-Control: no-cache

The content at the URL can change, so:
No locally cached version can be used without checking with the server first.

Page : Hey, I need the contents of "/about/" and "/sw.js" 11:32

Cache : Can't help you. Server? 11:32

Server : Here they are. Cache, keep them, but ask me before using them. 11:33

Cache : Yes sir! 11:33

Page : THX! 11:33

The next day

Page : Hey, I need the contents of "/about/" and "/sw.js" again 09:46

Cache : Wait a minute. Server, are my copies still good? My copy of "/about/" is from Monday, and "/sw.js" is from yesterday. 09:46

Server : "/sw.js" has not changed ... 09:47

Cache : Cool. Page, here's "/sw.js". 09:47

Server : ... but I have a new version of "/about/". Cache, take it, but like last time, remember to ask me first. 09:47

Cache : Understood! 09:47

Page : Fine! 09:47

Note: no-cache does not mean “do not cache”; it means the cached resource must be checked (revalidated) with the server before use. It is no-store that tells the browser not to cache at all. Likewise, must-revalidate does not mean “always revalidate”; it means the cached resource may be used only while it is younger than the specified max-age, and must be revalidated after that. Such are the joys of caching keywords.
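
The distinctions in this note can be summarized in code. The sketch below is a deliberately simplified reading of a Cache-Control value that covers only the directives just discussed; the function names and the wording of the returned strings are invented for the example.

```javascript
// Splits a Cache-Control value into a map of lower-cased directives.
function parseCacheControl(value) {
  const directives = {};
  for (const part of value.split(",")) {
    const [rawName, rawArg] = part.split("=");
    const name = rawName.trim().toLowerCase();
    if (name) directives[name] = rawArg === undefined ? true : rawArg.trim();
  }
  return directives;
}

// Describes, in words, what a browser cache may do with the response.
function cachePolicy(value) {
  const d = parseCacheControl(value);
  if (d["no-store"]) return "do not store";
  if (d["no-cache"]) return "store, but revalidate before every use";
  if (d["max-age"] !== undefined) {
    return `use without asking for ${Number(d["max-age"])} seconds, then revalidate`;
  }
  return "no explicit lifetime";
}
```

Note that no-store wins over everything else in this sketch, which is why the "default mode" header from the browser experiments earlier in the article disables caching entirely.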

In this pattern, you can add an ETag (a version ID of your choice) or a Last-Modified header to the response. The next time the client requests the content, it sends If-None-Match or If-Modified-Since respectively, allowing the server to say “Use what you have, your cache is up to date”, that is, to return HTTP 304.

If sending ETag or Last-Modified is not possible, the server always sends the entire content.

This pattern always requires a network request, so it is not as good as the first pattern, which can do without the network entirely.

It is not uncommon to lack the infrastructure for the first pattern while also being put off by the network requests of pattern #2. As a result, an intermediate option is used: a short max-age and mutable content. This is a bad compromise.

Using max-age with mutable content is usually the wrong choice.

And, unfortunately, it is widespread; GitHub Pages is one example.

Imagine:

/article/
/styles.css
/script.js

With a server-side header:

Cache-Control: must-revalidate, max-age=600

The content at the URLs changes
If the browser has a cached version less than 10 minutes old, it is used without consulting the server
Otherwise a network request is made, if possible with If-Modified-Since or If-None-Match

Page : Hey, I need "/article/", "/script.js" and "/styles.css" 10:21

Cache : I have nothing. How about you, Server? 10:21

Server : No problem, here they are. But remember, Cache: they can be used for the next 10 minutes. 10:22

Cache : Got it! 10:22

Page : THX! 10:22

Page : Hey, I need "/article/", "/script.js" and "/styles.css" again 10:28

Cache : Oops, sorry, I lost "/styles.css", but I have everything else, take it. Server, can you give me "/styles.css"? 10:28

Server : Easy, it has actually changed since you last took it. You can safely use it for the next 10 minutes. 10:29

Cache : No problem. 10:29

Page : Thanks! But something seems to have gone wrong! Everything is broken! What is going on? 10:29

This pattern can appear to work in testing, but it breaks things in a real project, and the breakage is very hard to track down. In the example above, the server had updated the HTML, CSS and JS, but the page was rendered with the old HTML and JS from the cache plus the updated CSS from the server. The version mismatch ruins everything.

Often, when we make significant changes to the HTML, we also change the CSS to reflect the new structure and the JavaScript to keep up with the content and styles. These resources are interdependent, but the cache headers cannot express that. As a result, users may end up with the latest version of one or two resources and an old version of the rest.

max-age is relative to the response time, so if all resources are requested as part of the same navigation they will expire at roughly the same time, but there is still a small chance of desync. If some of your pages do not include the JavaScript, or include different styles, the cache expiry times can get out of sync. Worse, the browser drops things from the cache all the time, and since it does not know that the HTML, CSS and JS are interdependent, it can happily drop one and keep the others. Taken together, these factors make mismatched versions quite likely.

For the user, the result can be a broken page layout or other problems, from small glitches to completely unusable content.
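
The failure in the dialogue above can be reproduced with a toy simulation. Everything here is a stand-in invented for the illustration: the "cache" and "server" are plain Maps, version numbers replace real file contents, and time is passed in by hand.

```javascript
const MAX_AGE_MS = 600 * 1000; // Cache-Control: max-age=600

// Serves from the toy cache while the entry is fresh, otherwise asks the toy server.
function fetchThroughCache(cache, server, url, nowMs) {
  const entry = cache.get(url);
  if (entry && nowMs - entry.storedAt < MAX_AGE_MS) {
    return entry.version; // still fresh: used without asking the server
  }
  const version = server.get(url);
  cache.set(url, { version, storedAt: nowMs });
  return version;
}

const server = new Map([["/article/", 1], ["/styles.css", 1], ["/script.js", 1]]);
const cache = new Map();
const minutes = (n) => n * 60 * 1000;

// First visit: everything is version 1.
for (const url of server.keys()) fetchThroughCache(cache, server, url, minutes(0));

// The site is redeployed: all three resources become version 2.
for (const url of server.keys()) server.set(url, 2);

// The browser drops the stylesheet from its cache.
cache.delete("/styles.css");

// Second visit, six minutes later.
const page = {};
for (const url of server.keys()) {
  page[url] = fetchThroughCache(cache, server, url, minutes(6));
}
console.log(page); // "/article/" and "/script.js" are still version 1, "/styles.css" is version 2
```

The page ends up mixing version 1 HTML and JS with version 2 CSS, even though every individual response obeyed its max-age.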

Fortunately, users have an emergency exit ...

Refreshing the page sometimes helps

If the page is loaded as part of a refresh, browsers always revalidate with the server, ignoring max-age. So if something is broken for the user because of max-age, a simple page refresh can fix everything. But of course, even once the problem is fixed, the bad impression lingers and the attitude towards your site will be somewhat different.

A service worker can extend the life of these bugs.

For example, you have a service worker like this:

const version = "2";

self.addEventListener("install", event => {
  event.waitUntil(
    caches.open(`static-${version}`)
      .then(cache => cache.addAll(["/styles.css", "/script.js"]))
  );
});

self.addEventListener("activate", event => {
  // … delete old caches …
});

self.addEventListener("fetch", event => {
  event.respondWith(
    caches.match(event.request)
      .then(response => response || fetch(event.request))
  );
});

This service worker:

caches script and styles
uses the cache on match, otherwise accesses the network

If we change the CSS or JS, we also bump the version number, which triggers an update. However, since addAll fetches through the HTTP cache first, max-age can put us in a race condition with mismatched CSS and JS versions.

Once they are cached, we will have incompatible CSS and JS until the next service worker update, and that is assuming we do not hit another race condition during that update.

You can bypass the HTTP cache in the service worker:

self.addEventListener("install", event => {
  event.waitUntil(
    caches.open(`static-${version}`)
      .then(cache => cache.addAll([
        new Request("/styles.css", { cache: "no-cache" }),
        new Request("/script.js", { cache: "no-cache" })
      ]))
  );
});

Unfortunately, the cache options are not yet supported in Chrome/Opera and have only just been added to the Firefox nightly build, but you can do it yourself:

self.addEventListener("install", event => {
  event.waitUntil(
    caches.open(`static-${version}`)
      .then(cache => Promise.all(
        ["/styles.css", "/script.js"].map(url => {
          // cache-bust using a random query string
          return fetch(`${url}?${Math.random()}`).then(response => {
            // fail on 404, 500 etc
            if (!response.ok) throw Error("Not ok");
            return cache.put(url, response);
          });
        })
      ))
  );
});

In this example, I'm busting the cache with a random number, but you can go further and add a hash of the content at build time (this is similar to what sw-precache does). This is effectively a JavaScript implementation of the first pattern, but it only benefits the service worker, not browsers and CDNs.

Service Workers and HTTP Cache work great together, don't make them fight!

As you can see, you can work around caching errors in your service worker, but it is better to fix the root of the problem. Setting up caching correctly not only makes the service worker's job easier, but also helps browsers that don't support service workers (Safari, IE/Edge) and lets you get the most out of your CDN.

Correct caching headers can also make it much easier to update the service worker.

const version = "23";

self.addEventListener("install", event => {
  event.waitUntil(
    caches.open(`static-${version}`)
      .then(cache => cache.addAll([
        "/",
        "/script-f93bca2c.js",
        "/styles-a837cb1e.css",
        "/cats-0e9a2ef4.jpg"
      ]))
  );
});

Here I cache the root page using pattern #2 (server revalidation) and all other resources using pattern #1 (immutable content). Each service worker update triggers a request for the root page, while the other resources are downloaded only if their URL has changed. This is good because it saves bandwidth and improves performance whether you are updating from the previous version or from a very old one.

This is a significant advantage over native apps, where either the entire binary is downloaded even for a small change or a complex binary diff is required. Here we can update a large web application with a relatively small download.

Service workers work best as an enhancement rather than a workaround, so work with the cache instead of fighting it.

When used carefully, max-age and mutable content can be very good.

max-age is very often the wrong choice for mutable content, but not always. For example, the original article has a max-age of three minutes. Race conditions are not a problem because nothing on the page depends on another resource that uses the same caching pattern (CSS, JS and images use pattern #1, immutable content).

This pattern means that I can happily write a popular article and my CDN (Cloudflare) will take the load off the server, as long as I am willing to wait three minutes for an updated article to reach users.

This pattern should be used with care. If I add a new section to an article and link to it from another article, I have created a dependency that can go out of sync. The user can click the link and get a copy of the article without the section they were looking for. To avoid this, I have to update the article, purge the cached version from Cloudflare, wait three minutes, and only then add the link to the other article. Yes, this pattern requires caution.

When used correctly, caching provides significant performance and bandwidth savings. Serve immutable content if you can easily change the URL, or otherwise use server revalidation. Mix max-age and mutable content only if you're brave enough and sure that your content has no dependencies that could get out of sync.

When including external CSS and JavaScript, we want to keep unnecessary HTTP requests to a minimum.

For this, .js and .css files are served with headers that ensure reliable caching.

But what if one of these files changes during development? All users have the old version in their cache, and until that cache expires there will be plenty of complaints about the server and client parts no longer working together.

Proper caching and versioning eliminates this problem completely and ensures reliable, transparent synchronization of style and script versions.

Easy ETag Caching

The easiest way to cache static resources is to use ETag.

It is enough to enable the corresponding server setting (in Apache it is enabled by default), and every file will be served with an ETag header: a hash that depends on the modification time, the file size and (on inode-based file systems) the inode.

The browser caches such a file and, on subsequent requests, sends an If-None-Match header containing the ETag of the cached document. On receiving such a header, the server can respond with code 304, and the document will then be taken from the cache.

It looks like this:

First request to the server (cache is empty)

GET /misc/pack.js HTTP/1.1
Host: site

In practice the browser adds a bunch of other headers such as User-Agent, Accept, etc. They are omitted for brevity.

Server response

The server responds with the document, code 200 and an ETag:

HTTP/1.x 200 OK
Content-Encoding: gzip
Content-Type: text/javascript; charset=utf-8
Etag: "3272221997"
Accept-Ranges: bytes
Content-Length: 23321
Date: Fri, 02 May 2008 17:22:46 GMT
Server: lighttpd

Next browser request

On the next request the browser adds If-None-Match with the cached ETag:

GET /misc/pack.js HTTP/1.1
Host: site
If-None-Match: "453700005"

Server response

The server checks and sees that the document has not changed, so it can return code 304 without sending the document again:

HTTP/1.x 304 Not Modified
Content-Encoding: gzip
Etag: "453700005"
Content-Type: text/javascript; charset=utf-8
Accept-Ranges: bytes
Date: Tue, 15 Apr 2008 10:17:11 GMT

Alternatively, if the document has changed, then the server simply sends 200 with a new ETag.
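
The server's decision in this exchange could look roughly like the following Node.js sketch. It makes several assumptions: the ETag is derived by hashing the body (whereas the text above describes servers that use modification time, size and inode), header keys are lower-cased, and only a single exact If-None-Match value is handled, not lists, "*" or weak validators.

```javascript
const crypto = require("crypto");

// Derives a quoted ETag from the response body (an assumption of this sketch).
function makeETag(body) {
  return `"${crypto.createHash("sha1").update(body).digest("hex")}"`;
}

// Answers 304 when the client's cached ETag matches the current one, 200 otherwise.
function respondWithETag(requestHeaders, body) {
  const etag = makeETag(body);
  if (requestHeaders["if-none-match"] === etag) {
    return { status: 304, headers: { etag }, body: "" };
  }
  return { status: 200, headers: { etag }, body };
}
```

A client that echoes back the ETag from the first response gets 304 until the body changes, at which point the ETag no longer matches and the full document is sent with the new value.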

The Last-Modified + If-Modified-Since pair works in a similar way:

the server sends the last modification date in the Last-Modified header (instead of ETag)
the browser caches the document and, on the next request for the same document, sends the date of the cached version in the If-Modified-Since header (instead of If-None-Match)
the server compares the dates and, if the document has not changed, sends only the 304 code with no content.

These methods work reliably and well, but the browser still has to make a request for every script or style.

Smart caching. Versioning

The general approach to versioning, in a nutshell:

A version (or modification date) is added to the name of every script. For example, http://site/my.js becomes http://site/my.v1.2.js
All scripts are hard-cached by the browser
When a script is updated, its version changes to a new one: http://site/my.v2.0.js
The address has changed, so the browser requests and caches the file again
The old version 1.2 gradually falls out of the cache
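
The first step of the list, adding the version to the name, is a one-line string operation. The helper name addVersion is made up for this sketch, and it assumes the URL ends with a simple file extension and has no query string.

```javascript
// Inserts ".v" plus the version before the file extension,
// e.g. "http://site/my.js" with version "1.2" becomes "http://site/my.v1.2.js".
function addVersion(url, version) {
  return url.replace(/\.([a-z0-9]+)$/i, `.v${version}.$1`);
}
```

Changing the version argument from "1.2" to "2.0" is all it takes to give the browser a new address to request.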

Hard caching

Hard caching is a kind of sledgehammer that completely eliminates requests to the server for cached documents.

To enable it, just add the Expires and Cache-Control: max-age headers.

For example, to cache for 365 days in PHP:

header("Expires: " . gmdate("D, d M Y H:i:s", time() + 86400 * 365) . " GMT");
header("Cache-Control: max-age=" . (86400 * 365));

You can also cache content for a long time using mod_headers in Apache; the directives are shown in the Apache configuration later in this article.

Having received such headers, the browser hard-caches the document for a long time. All further requests for the document are served directly from the browser cache, without contacting the server.
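
For readers who serve files from JavaScript rather than PHP, the same two headers can be computed as follows. This is an equivalent sketch, not part of the original recipe; the function name hardCacheHeaders is an assumption.

```javascript
const ONE_YEAR_SECONDS = 86400 * 365;

// Builds the Expires and Cache-Control headers for a response generated at `now`.
function hardCacheHeaders(now = new Date()) {
  const expires = new Date(now.getTime() + ONE_YEAR_SECONDS * 1000);
  return {
    "Expires": expires.toUTCString(),               // HTTP date format, in GMT
    "Cache-Control": `max-age=${ONE_YEAR_SECONDS}`, // same lifetime, in seconds
  };
}
```

Both headers express the same lifetime: Expires as an absolute date, max-age as a number of seconds counted from the response.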

Most browsers (Opera, Internet Explorer 6+, Safari) DO NOT cache documents whose address contains a question mark, because they consider them dynamic.

This is why we add the version to the filename. Of course, such addresses require a solution like mod_rewrite; we will look at it later in the article.

P.S. Firefox, however, does cache addresses with question marks.

Automatic name translation

Let's see how to automatically and transparently change versions without renaming the files themselves.

Versioned name -> file

The simplest part is converting the versioned name back to the original filename.

At the Apache level, this can be done with mod_rewrite:

RewriteEngine on
RewriteRule ^/(.*\.)v[0-9.]+\.(css|js|gif|png|jpg)$ /$1$2 [L]

This rule handles all css/js/gif/png/jpg files, stripping the version from the name.

For example:

/images/logo.v2.gif -> /images/logo.gif
/css/style.v1.27.css -> /css/style.css
/javascript/script.v6.js -> /javascript/script.js
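
The same mapping can be checked quickly in JavaScript. The regular expression below mirrors the rewrite rule under the assumption that a version is the letter v followed by digits and dots, as in the three examples above; the function name stripVersion is made up for the sketch.

```javascript
// Removes the version segment (".v" followed by digits and dots) that sits
// just before the extension of a css/js/gif/png/jpg path.
function stripVersion(path) {
  return path.replace(/^(\/.*\.)v[0-9.]+\.(css|js|gif|png|jpg)$/, "$1$2");
}
```

Paths without a version segment do not match the pattern and are returned unchanged.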

Besides stripping the version, you also need to add hard-caching headers to the files. The mod_headers directives are used for this:

Header add "Expires" "Mon, 28 Jul 2014 23:30:00 GMT"
Header add "Cache-Control" "max-age=315360000"

Put together, this gives the following Apache config:

RewriteEngine on
# strip the version and, at the same time, set a variable marking the file as versioned
RewriteRule ^/(.*\.)v[0-9.]+\.(css|js|gif|png|jpg)$ /$1$2 [L,E=VERSIONED_FILE:1]
# hard-cache versioned files
Header add "Expires" "Mon, 28 Jul 2014 23:30:00 GMT" env=VERSIONED_FILE
Header add "Cache-Control" "max-age=315360000" env=VERSIONED_FILE

Because of the way mod_rewrite works, the RewriteRule must be placed in the main configuration file httpd.conf or in files included from it, but never in .htaccess; otherwise the Header directives will run first, before the VERSIONED_FILE variable has been set.

The Header directives can go anywhere, even in .htaccess; it makes no difference.

Automatically add a version to the filename on an HTML page

How to put a version in a script name depends on your templating system and, in general, how you add scripts (styles, etc.).

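For example, when using the modification date as the version and the Smarty template engine, links can be generated by a version function that adds the version:

function smarty_version($args) {
    // modification time of the file on disk
    $stat = stat($GLOBALS["config"]["site_root"] . $args["src"]);
    $version = $stat["mtime"];
    // insert ".v" plus the modification time before the file extension
    echo preg_replace("!\.([a-z]+?)$!", ".v$version.\$1", $args["src"]);
}

On the page, a link such as /css/group.css then comes out as /css/group.vMTIME.css, where MTIME is the file's modification time.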

Optimization

To avoid unnecessary stat calls, you can store an array listing the current versions in a separate variable.

$versions["css"] = array("group.css" => "1.1", "other.css" => "3.0");

In this case, the current version from the array is simply substituted into the HTML.

You can combine both approaches: during development, use the modification date as the version so it is always up to date, and in production use the version from the array for performance.
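
The combined approach might look like this in Node.js. The table of released versions, the option names and the injectable getMtime function are all hypothetical; they only illustrate the choice between a lookup table in production and the file's modification time in development.

```javascript
const fs = require("fs");

// Hypothetical table of released versions, keyed by file path.
const releasedVersions = new Map([
  ["/css/group.css", "1.1"],
  ["/css/other.css", "3.0"],
]);

// Default lookup: modification time in whole seconds, read from the file system.
const mtimeOf = (file) => Math.floor(fs.statSync(file).mtimeMs / 1000);

// Production: take the version from the table (no file system access).
// Development, or a file missing from the table: use the modification time.
function versionFor(src, { production, siteRoot = "", getMtime = mtimeOf }) {
  if (production && releasedVersions.has(src)) return releasedVersions.get(src);
  return String(getMtime(siteRoot + src));
}
```

Passing getMtime explicitly is only a convenience for trying the function without real files; by default it reads the modification time from disk.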

Applicability

This caching method works everywhere, including JavaScript, CSS, images, Flash movies, etc.

It is useful whenever a document changes but the browser must always have the current version.