Varnish HTTP Accelerator Presentation Notes

varnish-cache-image

Poul-Henning Kamp discusses Varnish HTTP Cache at TYPO3 2010 in Frankfurt. In this post, I provide detailed notes on his presentation. These notes are for learning purposes and may contain inconsistencies; I will do my best to keep them sharp.

Varnish HTTP Accelerator Presentation Slides

Notes From Varnish HTTP Cache @ TYPO3 w. Poul-Henning Kamp

Content creation can be automated: you have a master copy and you want to distribute it as widely as possible. You really want it to be faster than the Linotype, and a step beyond the Heidelberg printing press. Varnish’s elevator pitch: “Varnish delivers content fast and reliably, reduces the load on your CMS database, does 100+ kreq/s on cheap hardware, can assist in content composition, can fix stupid mistakes, is Free & Open Source Software, and has commercial support.”

After 15 years of contributing to the FreeBSD project, Poul-Henning Kamp was approached by VG.no to write an HTTP accelerator. VG.no runs a slow CMS and needed a caching layer in front of it to speed up delivery. With Varnish, Kamp was able to replace 12 Squid cache servers with a mere 3 Varnish cache servers while greatly reducing response times.

Poul-Henning Kamp’s goals for Varnish started out simple:

  • Varnish is only an HTTP accelerator.
  • Better configuration.
  • Much faster.
  • A content-management-focused feature set.

“We don’t do FTP; we do HTTP and we do it damn well,” Kamp says. He continues, “This is not a cache to put on the client side, this is a cache to put on the server side. It’s important to understand one thing here: the controlling standard for HTTP is still RFC 2616, and if you read it really carefully you will find one place where it mentions a cache on the server side. Then you realize that a cache on the server side is actually just another web server.”

A cache on the client side has constraints: you cannot cache per-user content, private responses, encrypted traffic, and so on. Varnish, as a web server on the server side, can cache anything we want, including crypto!

Understanding Varnish and VCL

Varnish aims to keep configuration simple through VCL (the Varnish Configuration Language), which can also embed C code. It’s important to understand how Varnish processes a request, according to the state-machine diagram below.

varnish-state-machine

An example of vcl_recv code is shown below.

sub vcl_recv {
    /* Reject obviously bogus or malicious requests outright. */
    if (req.url ~ "\.\.|\.exe") {
        error 999 "Bugger off.";
    }
    /* Editors bypass the cache and are flagged so they don't pollute the stats. */
    if (client.ip ~ editor_ip) {
        set req.http.x-cms = "no stats";
        return(pass);
    }
    /* Static assets: strip cookies and auth headers, use the static backend. */
    if (req.url ~ "\.(jpg|png|gif|css)$") {
        unset req.http.cookie;
        unset req.http.authenticate;
        set req.backend = static_backend;
    }
    /* "Fix stupid mistakes": rewrite a misspelled link to the real page. */
    if (req.url == "hotstory.hmtl") {
        set req.url = "hotstory.html";
    }
}

Understanding Why Varnish Cache Language Is Epic

  • Compiled to binary/shlib via C-code
    • Runs full speed
  • You can have multiple VCLs loaded at the same time
    • Switch between them without restart (see the CLI sketch after this list)
    • Instantaneous
  • Allows you to do anything you might fancy
    • Inline-C code, ’nuff said.
    • Modules/shlib will make it easier (3.0 feature)
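As a rough illustration of the hot-swap bullet above, here is how loading and switching configurations looks on the management CLI. The vcl.load, vcl.use, and vcl.list commands are standard; the configuration name and file path are made up:

vcl.load newconfig /etc/varnish/newconfig.vcl
vcl.use newconfig
vcl.list

vcl.use switches all new traffic to the freshly compiled VCL immediately, without restarting the daemon; previously loaded VCLs stay available and can be switched back to just as quickly.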

Wikia has broken the 4,000-line mark with its VCL code, which is available online via SVN for all to study.

An example of how Wikia’s VCL uses multiple Varnish servers: there are two Varnish servers, one in Germany and one in England, and a client in Germany requests content hosted in the US. The client hits the German Varnish server first, but instead of fetching directly from the US, the German server routes the request via the England Varnish server, because England’s pipe to the United States is bigger. This speeds up load times, since Germany’s direct tube to the US is clogged; you can’t just dump stuff on it, it’s not a truck.

sub vcl_recv {
    /* Traffic arriving from the other Varnish server is fetched from the US
       backend; everything else goes via the England backend. */
    if (client.ip == "varnish1") {
        set req.backend = usa;
    } else {
        set req.backend = england;
    }
}
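For the snippet above to compile, the usa and england backends must be declared elsewhere in the VCL. A minimal sketch of what those declarations might look like, with made-up host names (the real values live in Wikia’s published configuration):

backend usa {
    .host = "origin-us.example.com";   /* hypothetical US origin */
    .port = "80";
}

backend england {
    .host = "varnish-uk.example.com";   /* hypothetical England Varnish server */
    .port = "80";
}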

Managing Varnish

Varnish runs as a daemon and exposes a command-line interface for real-time control. The design splits Varnish into a management process and a worker process: the manager can (re)start the worker, which allows privilege separation, while the worker process itself is heavily multithreaded.

varnish_architecture_shah_anand

Image Source: shah-anand.com (thx bro).

Looking over the Varnish architecture, we have one binary program that contains two processes: the management process and the cacher (worker) process. The management process takes your VCL code, runs it through the C compiler, and produces a shared object that the cacher process loads. Varnish also has a cluster-controller concept that could control, say, 10 different Varnish instances in separate geolocations; the cluster controller is still a concept and is not built out.

One thing to note: Varnish does not write .lock files; it has a shared memory segment where it places lock information and statistics, and other applications pull their data from that shared memory.
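That is also how the bundled varnish* tools work: they attach to the shared memory segment of a running instance rather than tailing a log file. A few examples, assuming a single local instance with default settings:

$ varnishlog        # stream the shared-memory log in real time
$ varnishncsa       # the same data rendered as NCSA/Apache-style access log lines
$ varnishstat -1    # dump the statistics counters once and exit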

Brief CLI Management

An example of CLI management is shown below. (I am a bit lost on this one because param.show returns an error on my instance.)

$ telnet localhost 6082
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
param.show
200 675
default_ttl 120 [seconds]
thread_pools 5 [pools]
thread_pool_max 1500 [threads]
thread_pool_min 1 [threads]
thread_pool_timeout 120 [seconds]
overflow_max 100 [%]
http_workspace 8192 [bytes]
sess_timeout 5 [seconds]
pipe_timeout 60 [seconds]
send_timeout 600 [seconds]
auto_restart on [bool]
[…]
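Parameters can also be changed on the fly from the same CLI session with param.set; the value here is only an illustration:

param.set default_ttl 3600

Most parameter changes take effect immediately for new requests and objects, with no restart of the daemon required.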

Performance & Speed

Varnish has been programmed for performance since day one; the aim was wire speed. Performance is not something you bolt on after the fact: you design for it from day one and don’t add things that bog it down. The Varnish code is written for today’s complex hardware. This isn’t designed for your dad’s computer anymore!

Use modern features:

  • Virtual Memory
  • sendfile(2), accept_filters(2), kqueue(2)
  • and every other trick in the book

The performance price list has changed: a CPU can execute 100,000,000+ instructions per second, while disk I/O lags far behind in response time. Objects are therefore kept in virtual memory (RAM) and the kernel decides what gets paged out. Classical logging to disk is horribly expensive. (Examples in the slides.)

Where does my traffic come from?

Below are a few commands that you can use to access information on a live Varnish instance.

$ varnishtop -i Rxheader
$ varnishtop -i Rxurl
$ varnishhist

Varnishhist is a real-time histogram that plots cache misses (#) and cache hits (|), with request processing time on the x axis and the number of requests on the y axis.
$ varnishstat

Varnishstat pulls real-time statistics from shared memory.

Content Management Features

  • Instant action purges/bans (regex or exact match)
  • TTL/Caching policy control in VCL (see the VCL sketch after this list)
  • Load/Situation mitigation in VCL
  • Header washing
  • Vary
  • Edge-side-includes (ESI)
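A rough sketch of how several of these features look in VCL, assuming Varnish 2.x-era syntax (purge_url, beresp, the esi statement) and made-up ACL and URL patterns; newer versions spell some of this differently:

acl purgers {
    "localhost";
    "192.168.0.0"/24;   /* hypothetical CMS network */
}

sub vcl_recv {
    /* Instant purge: let the CMS send PURGE requests when content changes. */
    if (req.request == "PURGE") {
        if (!client.ip ~ purgers) {
            error 405 "Not allowed.";
        }
        purge_url(req.url);
        error 200 "Purged.";
    }
}

sub vcl_fetch {
    /* Header washing: drop cookies the backend sets on static assets. */
    if (req.url ~ "\.(jpg|png|gif|css)$") {
        unset beresp.http.Set-Cookie;
    }
    /* TTL/caching policy control: cache article pages for one minute. */
    if (req.url ~ "^/article/") {
        set beresp.ttl = 60s;
    }
    /* Edge Side Includes: assemble the front page from cached fragments. */
    if (req.url == "/frontpage") {
        esi;
    }
}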
