Methods of HTTP Caching

Transport of Cached Content

Intermediate Protocol Change

While a request traverses a cache hierarchy, a request that a client issued in protocol X (for example HTTP version 1.1) may be forwarded in protocol Y (for example HTTP version 1.0) on the hop from a cache to a parent cache or to the authoritative source. Caches operating as conventional web proxies are required by the HTTP specification to record such changes in the Via header that is returned to the client.

It is entirely up to the caching proxy to manage the implications this protocol change has for cache semantics. The client always receives a response in the protocol version of its original request, independent of the protocol support of servers higher up in the cache hierarchy.
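The following Python sketch illustrates how a proxy records the received protocol version in the Via header it forwards; the helper name and the cache pseudonyms are invented for the example:

```python
def forward_via(request_version, existing_via, pseudonym="cache1"):
    # A conventional proxy records the protocol of the request it received
    # in the Via header before forwarding (RFC 7230, section 5.7.1).
    entry = f"{request_version.removeprefix('HTTP/')} {pseudonym}"
    return f"{existing_via}, {entry}" if existing_via else entry

# A request arrives at the first cache as HTTP/1.1 ...
via = forward_via("HTTP/1.1", None)
# ... and is forwarded to the parent cache as HTTP/1.0:
via = forward_via("HTTP/1.0", via, pseudonym="parent")
print(via)  # -> 1.1 cache1, 1.0 parent
```

A client inspecting this header can thus see every hop and every protocol change the request went through.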

HTTP Keep-Alive

HTTP and HTTPS are protocols based on TCP. To perform a request and receive its response, a TCP connection must be established between client and server. Establishing and maintaining a TCP connection is a complex operation: a TCP handshake has to be performed, and counters and buffers have to be maintained by each TCP/IP networking stack involved in the connection. This affects not only the HTTP clients and servers in the cache hierarchy but also intermediate stateful gateways and firewalls that potentially perform connection tracking, shaping, redirecting or other transformations. Using HTTPS with TLS further increases the per-connection effort, because TLS performs its own handshake and maintains additional state on client and server concerned with the encryption and decryption of data.

If a TCP connection between a client and a server can be "kept alive" over a longer period of time and reused by the client for subsequent requests, fewer TCP connections have to be established and maintained by HTTP clients, servers, operating systems and intermediate gateways. In HTTP version 1.0, a connection can optionally be put into keep-alive mode by the client by sending the HTTP header Connection: keep-alive in the first request. Since HTTP version 1.1, all connections are implicitly in keep-alive mode, meaning they can be reused for further requests.
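As an illustration, the following self-contained Python sketch starts a local HTTP/1.1 server and issues two requests over one connection; the handler and the echoed paths are made up for the example:

```python
import threading
import http.client
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # implicit keep-alive since HTTP/1.1

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the example quiet
        pass

server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
conn.request("GET", "/first")
conn.getresponse().read()
sock_id = id(conn.sock)          # remember the underlying socket object
conn.request("GET", "/second")   # reuses the established TCP connection
conn.getresponse().read()
reused = (id(conn.sock) == sock_id)
server.shutdown()
print(reused)  # -> True
```

Only one TCP handshake is performed for both requests; the second request travels over the socket left open by the first.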

Keeping TCP connections alive is unreliable from the perspective of the HTTP client, because TCP connections can be terminated by the server and intermediate network nodes at any time.

In practice, user agents have established heuristics that decide how many connections to keep alive and after what period of inactivity between a client and a server a connection should time out and be terminated. The benefit of reusing existing connections outweighs the cost of maintaining unused connections until they expire.

Popular web browsers maintain between 2 and 8 keep-alive connections per proxy and per authoritative web server, with about two minutes as the default expiration timeout. In the Firefox web browser, whether to use keep-alive can be configured separately for direct and for proxy connections, and the direct-connection keep-alive setting has no effect on any connection that is performed via a configured proxy.

HTTP Chunked Transfer

Keep-alive can cause problems when requesting resources that do not specify a Content-Length response header, in particular large resources whose parts require a transmission time that exceeds the keep-alive timeout interval. This situation occurs, for example, when data is delivered from server to client in a "streaming" fashion: in the absence of an initially known content length, the client cannot decide from the stream of body data alone whether the current response is finished.

To address the problem of the missing initial content length, HTTP 1.1 offers the possibility to split response data into "chunks". A response announces that its data is encoded in chunks by setting the header Transfer-Encoding: chunked. The body is then transferred as a sequence of chunks, each chunk preceded by a line containing the hexadecimal representation of the number of bytes in that chunk. The chunked transfer is terminated explicitly by a chunk of length zero, optionally followed by a trailer.
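The chunk format can be sketched in a few lines of Python; the function names are invented for the illustration, and the optional trailer is omitted:

```python
def encode_chunked(chunks):
    # Each chunk on the wire: hexadecimal size, CRLF, the data, CRLF.
    out = b""
    for chunk in chunks:
        out += b"%x\r\n" % len(chunk) + chunk + b"\r\n"
    return out + b"0\r\n\r\n"  # a zero-length chunk terminates the body

def decode_chunked(data):
    body, pos = b"", 0
    while True:
        eol = data.index(b"\r\n", pos)
        size = int(data[pos:eol], 16)      # the size line is hexadecimal
        if size == 0:
            return body                    # terminating zero-length chunk
        body += data[eol + 2:eol + 2 + size]
        pos = eol + 2 + size + 2           # skip data and its trailing CRLF

wire = encode_chunked([b"Hello, ", b"streamed ", b"world!"])
print(decode_chunked(wire))  # -> b'Hello, streamed world!'
```

Because each chunk carries its own size and the end is marked explicitly, the server can start sending before the total length is known, and the client still knows exactly when the response is complete.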

HTTP Pipelining

HTTP version 1.1 introduced a feature called pipelining, where a single established TCP connection between HTTP client and server could be used to issue a series of multiple HTTP resource requests at once. In the request phase of the communication, a client would state a series of resource requests, and in the response phase the server would deliver the responses to these requests, one by one, in the requested order, across the same connection.
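The mechanism can be demonstrated with a raw socket against a local server (a sketch: Python's bundled http.server happens to process requests buffered on one connection sequentially, which suffices here; the paths are made up):

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        body = self.path.encode()  # echo the requested path as the body
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass

server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

sock = socket.create_connection(("127.0.0.1", server.server_port))
# Both requests are written back to back, before any response is read:
sock.sendall(b"GET /first HTTP/1.1\r\nHost: example\r\n\r\n"
             b"GET /second HTTP/1.1\r\nHost: example\r\n\r\n")
data = b""
while b"/second" not in data:      # read until both responses arrived
    data += sock.recv(4096)
sock.close()
server.shutdown()
# The server answers strictly in request order:
print(data.index(b"/first") < data.index(b"/second"))  # -> True
```

The strict response ordering visible in the last line is exactly what gives rise to the head-of-line blocking problem discussed below.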

In practice, enabling pipelining between user agents and servers on a general basis has led to more problems than benefits:

  • The pipeline mechanism operates in a "first requested, first served" manner. This leads to a class of "head of line" blocking problems, where loading an important resource is delayed because it was pipelined behind a less important resource that was requested earlier. Since a pipelined response must deliver the resources one after the other in the order they were requested, and since the web application has little or no influence on how the user agent orders and pipelines the requests, readying a complex web application consisting of interdependent resources can stall.
  • Requesting and serving pipelined requests is generally more complex than handling non-pipelined requests; it introduces additional code complexity, maintenance effort and timing challenges. Implementation quality varies across user agents, proxies, caches and servers, which causes reliability problems.
  • Pipelining conflicts with certain HTTP authentication schemes that grant access to resources depending on the identity of the authenticated client.

In Squid, NTLM and Kerberos authentication are not functional if pipelining is enabled, and Squid's default configuration does not enable it.

HTTP 1.1 pipelining is disabled or unavailable in most modern web browsers; it can be enabled in caching clients, and many popular web servers support it.

Pipelining has no specific meaning for the HTTP cache semantics described previously. If enabled, it can pose challenges to cache validation, because pipelined requests for the same resource could yield conflicting validation criteria, for example conflicting entity tags or different age values.

Implementations of pipelining HTTP clients, especially proxies, with errors in their handling of chunked request transfers form the basis for a class of information-disclosure attacks known as "HTTP request smuggling". See [Demir 2020] for a practical analysis.

Concatenated Resources

To avoid TCP connection overhead, web developers concatenate multiple resources into one and rely on the presentation system to extract and use the relevant portions of the combined content.

From a caching point of view, this method is questionable, because it imposes the semantics of a single resource onto what is in reality a multitude of resources. Any change to one logical entity within the combined resource requires a cache refresh for the entire resource, even if all other portions of the resource remain unmodified.
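The cache implication can be made concrete: if the bundle's validator is derived from the concatenated content, changing one module invalidates the whole bundle. The file names and the hash-based entity tag below are assumptions of this sketch:

```python
import hashlib

modules = {
    "menu.js":   b"/* menu code */",
    "search.js": b"/* search code */",
}

def bundle_etag(mods):
    # A typical validator: a hash over the concatenated bundle content.
    bundle = b"".join(mods[name] for name in sorted(mods))
    return hashlib.sha256(bundle).hexdigest()[:16]

etag_before = bundle_etag(modules)
modules["search.js"] = b"/* search code, small fix */"  # one module changes
etag_after = bundle_etag(modules)
# The entity tag of the entire bundle changes, so a cache must refetch
# everything, including the unmodified menu.js portion.
print(etag_before != etag_after)  # -> True
```

Served as separate resources, the unmodified menu.js would have kept its validator and remained fresh in the cache.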

HTTP Preconnection

Modern web browsers offer the possibility to request opening a connection to an authoritative source as early as possible using the "preconnect" resource hint. This can be accomplished by a response HTTP header Link:

Link: <//example.com>; rel="preconnect"

Alternatively, a response can supply an HTML document that contains in its <head> section:

<link
   rel="preconnect"
   href="//example.com">

It is up to the client to implement this behavior. If a user agent is configured to use a proxy for the connection to the webserver, using this technique will most likely contribute to the TCP connection load on the proxy server in HTTP protocol versions below 2 (HTTP version 2 mandates a maximum connection count of one between clients and proxies, see below).

HTTP Prefetching

Modern web browsers offer the possibility to instruct the user agent to request, retrieve and potentially cache a remote resource before it is actually used by any displayed document or executed script. This can be accomplished by a response HTTP header Link:

Link: <//example.com/next-page.html>; rel="prefetch"; as="document"

Alternatively, a response can supply an HTML document that contains in its <head> section:

<link
    rel="prefetch"
    href="//example.com/next-page.html"
    as="document">

It is up to the client to implement this behavior. A client-side cache can be expected to validate prefetch requests and to cache responses appropriately.