Distribution of HTTP Requests per TCP Connection

To gauge how much benefit HTTP pipelining or the adoption of HTTP/2 could bring, I wanted to know the average number of HTTP requests made over a single TCP connection. I ran the following query:

SELECT round(avg(reqTotal/_connections)) average,
  round(NTH(50, quantiles(reqTotal/_connections))) median,
  round(NTH(75, quantiles(reqTotal/_connections))) p75,
  round(NTH(90, quantiles(reqTotal/_connections))) p90,
  round(NTH(99, quantiles(reqTotal/_connections))) p99
FROM [httparchive:runs.latest_pages];

Interestingly, the data is as follows:

The average is 4 requests per connection, which is the same as the 75th percentile. In fact, fully half of the pages make fewer than 3 requests per open connection.
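As a rough illustration of what the query computes, here is a small Python sketch that derives the same summary statistics from a handful of made-up (reqTotal, _connections) pairs. The sample values and the `percentile` helper are assumptions for illustration only. The nearest-rank percentile used here only approximates BigQuery's `quantiles`, which is itself an approximation.

```python
import math

# Hypothetical (reqTotal, _connections) pairs; not real HTTP Archive data.
pages = [(10, 10), (12, 6), (30, 10), (45, 9), (120, 10), (8, 4), (60, 20), (754, 61)]


def percentile(values, p):
    """Nearest-rank percentile: the smallest value with at least p% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


# Skip pages that report zero connections to avoid dividing by zero.
ratios = [reqs / conns for reqs, conns in pages if conns > 0]

summary = {
    "average": round(sum(ratios) / len(ratios)),
    "median": round(percentile(ratios, 50)),
    "p75": round(percentile(ratios, 75)),
    "p90": round(percentile(ratios, 90)),
    "p99": round(percentile(ratios, 99)),
}
print(summary)
```

Even in this toy sample a couple of heavy pages pull the average above the median, which is the same skew the real data shows.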

To see how this ratio is distributed across pages, we can bucket it to the nearest 10:

SELECT INTEGER(ROUND(reqs_per_conn/10)*10) as req_bucket, SUM(pages) as pages FROM (
  SELECT reqTotal/_connections as reqs_per_conn, COUNT(*) AS pages
  FROM [httparchive:runs.latest_pages] 
  GROUP BY reqs_per_conn
) 
GROUP BY req_bucket
ORDER BY req_bucket;
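The bucketing step in the query above can be sketched in Python as follows. The input values are made up, and I have assumed that halves round up (so a ratio of exactly 5 lands in bucket 10), which is how SQL `ROUND` conventionally behaves. Python's built-in `round()` rounds halves to the nearest even number, so the sketch uses `floor(x + 0.5)` instead.

```python
import math
from collections import Counter

# Hypothetical requests-per-connection values, one per page (not real data).
reqs_per_conn = [1.0, 2.5, 4.0, 4.9, 5.0, 12.4, 14.9, 15.0, 27.3]


def bucket(value):
    """Round a non-negative value to the nearest multiple of 10, with halves rounding up."""
    return int(math.floor(value / 10 + 0.5) * 10)


# Count how many pages fall into each bucket, like SUM(pages) ... GROUP BY req_bucket.
histogram = Counter(bucket(v) for v in reqs_per_conn)

for req_bucket in sorted(histogram):
    print(req_bucket, histogram[req_bucket])
```

Note that with this scheme bucket 0 covers ratios below 5, bucket 10 covers ratios from 5 up to just under 15, and so on.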

Some of the top sites make only one request per connection (reqTotal equals _connections), for example Craigslist (http://httparchive.org/viewsite.php?pageid=18240620) and www.wordpress.com (http://httparchive.org/viewsite.php?pageid=18240585), as evidenced by the following query:

SELECT rank, url, reqTotal, _connections
FROM [httparchive:runs.latest_pages]
WHERE reqTotal == _connections AND rank < 1000;

By contrast, the Rakuten.co.jp site serves 754 requests using only 61 connections, or roughly 12 requests per connection (http://httparchive.org/viewsite.php?pageid=18240705).

For HTTP/2, the metric really needs to be requests per hostname, since browsers will only use one connection per hostname.

SELECT
  round(NTH(50, quantiles(total_reqs/total_origins))) p50,
  round(NTH(75, quantiles(total_reqs/total_origins))) p75,
  round(NTH(90, quantiles(total_reqs/total_origins))) p90,
  round(NTH(99, quantiles(total_reqs/total_origins))) p99
FROM (
  SELECT pageid, count(hostname) as total_origins, sum(reqs) total_reqs FROM (
    SELECT pageid, HOST(url) as hostname, count(*) reqs
    FROM [httparchive:runs.latest_requests]
    GROUP BY pageid, hostname
  )
  GROUP BY pageid
)

Number of requests per origin:

Also, looking at origins by themselves:

SELECT
  NTH(50, quantiles(total_origins)) p50,
  NTH(75, quantiles(total_origins)) p75,
  NTH(90, quantiles(total_origins)) p90,
  NTH(99, quantiles(total_origins)) p99
FROM (
  SELECT pageid, count(hostname) as total_origins, sum(reqs) total_reqs FROM (
    SELECT pageid, HOST(url) as hostname, count(*) reqs
    FROM [httparchive:runs.latest_requests]
    GROUP BY pageid, hostname
  )
  GROUP BY pageid
)

So the “median site” fetches resources from 11 origins.
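To make the grouping in these queries concrete, here is a minimal Python sketch of the same two-level roll-up: count requests per (page, hostname), then count hostnames and total requests per page. The rows and URLs are invented for illustration. Like the queries, it treats each distinct hostname as an "origin", ignoring scheme and port.

```python
from collections import Counter, defaultdict
from urllib.parse import urlsplit

# Hypothetical (pageid, request URL) rows; not real HTTP Archive data.
requests = [
    (1, "http://example.com/"),
    (1, "http://example.com/app.js"),
    (1, "http://cdn.example.net/logo.png"),
    (1, "http://ads.example.org/pixel.gif"),
    (2, "http://example.org/"),
    (2, "http://example.org/style.css"),
]

# Inner query: count requests per (pageid, hostname).
reqs_by_host = defaultdict(Counter)
for pageid, url in requests:
    reqs_by_host[pageid][urlsplit(url).hostname] += 1

# Outer query: per page, the number of distinct hostnames and the total requests.
per_page = {
    pageid: {
        "total_origins": len(hosts),
        "total_reqs": sum(hosts.values()),
        "reqs_per_origin": sum(hosts.values()) / len(hosts),
    }
    for pageid, hosts in reqs_by_host.items()
}
print(per_page)
```

The percentile queries then simply summarize `total_origins`, `total_reqs`, or their ratio across all pages.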

Finally, since we’re on the subject, here is the distribution of total requests per page:

SELECT
  NTH(50, quantiles(total_reqs)) p50,
  NTH(75, quantiles(total_reqs)) p75,
  NTH(90, quantiles(total_reqs)) p90,
  NTH(99, quantiles(total_reqs)) p99
FROM (
  SELECT pageid, count(hostname) as total_origins, sum(reqs) total_reqs FROM (
    SELECT pageid, HOST(url) as hostname, count(*) reqs
    FROM [httparchive:runs.latest_requests]
    GROUP BY pageid, hostname
  )
  GROUP BY pageid
)

Websites use a lot of domains, but many of these are third-party domains. I think it’s good to look at the Max Requests on 1 Domain stat from HTTP Archive:

This shows that the average website loads 50+ resources from a single domain. Pipelining and HTTP/2 would bring a big benefit to these sites.
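For reference, a stat like this can be sketched for a single page as below. This assumes "Max Requests on 1 Domain" means the highest per-hostname request count on a page, which is my reading of the name rather than a confirmed definition. The URLs and the function name `max_requests_on_one_domain` are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical request URLs for one page; not real HTTP Archive data.
page_requests = [
    "http://example.com/",
    "http://example.com/app.js",
    "http://example.com/style.css",
    "http://cdn.example.net/logo.png",
    "http://ads.example.org/pixel.gif",
]


def max_requests_on_one_domain(urls):
    """Return (hostname, count) for the hostname that serves the most requests."""
    counts = Counter(urlsplit(u).hostname for u in urls)
    return counts.most_common(1)[0]


host, reqs = max_requests_on_one_domain(page_requests)
print(host, reqs)
```

Unlike the requests-per-origin ratio, this figure is not diluted by third-party hostnames that each serve only one or two resources.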