numDomElements data

Discussions about complexity based on the size of the DOM (the number of elements on a page) come up very frequently. A number of features aren’t implemented because of the cost they would impose on large trees. Inevitably this leads to questions about how big trees really are in practice, or on average. There are pages like the HTML Living Standard, Single Page edition, which has > 140k elements, but it’s widely accepted that this isn’t the norm. The question for me has always been how we can get decent information on the norms…

I noticed that the HTTP Archive includes a histogram of the number of DOM elements per page.

However, the histogram would seem to imply that 3200 is the max # of elements in the DOM tree in the top million sites… are there 0 pages in this dataset with > 3200 elements? If so, and supposing one did exist, would it show up as a new bucket with its own range, or would it just show up in a “> 3200” bucket?

As a start, a quick way to enumerate the “heaviest” sites with respect to number of DOM elements:

SELECT url, numDomElements 
FROM [runs.2016_05_01_pages] 
ORDER BY numDomElements desc


Row	url	numDomElements
1		120086
2		89844
3		84249
4		76956
5		74064
6		68784
7		64500
8		62185
9		60091
10		58149

Restricting to “top 1k” sites:

SELECT url, numDomElements 
FROM [runs.2016_05_01_pages] 
WHERE rank < 1000
ORDER BY numDomElements desc


Row	url	numDomElements
1		22482
2		19007
3		12563
4		11274
5		10219
6		9488
7		8541
8		8321
9		8208
10		8123

Closer to your actual question… Quantiles for element counts:

SELECT
  NTH(11, quantiles(numDomElements,101)) tenth,
  NTH(21, quantiles(numDomElements,101)) twentieth,
  NTH(31, quantiles(numDomElements,101)) thirtieth,
  NTH(41, quantiles(numDomElements,101)) fortieth,
  NTH(51, quantiles(numDomElements,101)) fiftieth,
  NTH(61, quantiles(numDomElements,101)) sixtieth,
  NTH(71, quantiles(numDomElements,101)) seventieth,
  NTH(81, quantiles(numDomElements,101)) eightieth,
  NTH(91, quantiles(numDomElements,101)) ninetieth,
  NTH(96, quantiles(numDomElements,101)) ninety_fifth,
  NTH(100, quantiles(numDomElements,101)) ninety_ninth
FROM [httparchive:runs.latest_pages]
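
For anyone puzzling over the indexing: as far as I understand the legacy BigQuery functions, QUANTILES with 101 buckets returns 101 boundary values (minimum first, maximum last), and NTH counts from 1, so the p-th percentile sits at position p + 1. Below is a minimal Python sketch of that mapping. The helper names and the sample data are made up for illustration, and this version is exact, whereas BigQuery's QUANTILES is approximate.

```python
def quantile_boundaries(values, buckets):
    """Return `buckets` boundary values: the minimum first, the maximum last.

    Exact nearest-rank version for illustration only; BigQuery's QUANTILES
    is approximate, so real results can differ slightly.
    """
    ordered = sorted(values)
    if not ordered or buckets < 2:
        raise ValueError("need at least one value and two buckets")
    last = len(ordered) - 1
    return [ordered[round(i * last / (buckets - 1))] for i in range(buckets)]


def nth(n, boundaries):
    """Pick the n-th boundary, counting from 1 (there is no zeroth entry)."""
    return boundaries[n - 1]


# Hypothetical data: one page for each element count 0, 1, 2, ..., 1000.
sample = list(range(1001))
bounds = quantile_boundaries(sample, 101)

minimum = nth(1, bounds)      # position 1 is the minimum (0th percentile)
median = nth(51, bounds)      # position 51 is the 50th percentile
ninetieth = nth(91, bounds)   # position 91 is the 90th percentile
maximum = nth(101, bounds)    # position 101 is the maximum
```

With 101 boundaries, position 50 is the 49th percentile rather than the median, which is a small difference but worth knowing when reading the numbers.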

And perhaps the initial question is why the histogram stops at 3200. I believe I don’t show any histogram buckets where the percentage is < 1%. A possible improvement in this situation might be to label the last bucket “> 2800” (rather than “2801-3200”), so it is clear that larger pages exist.
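
To make the “> 2800” idea concrete, here is a rough Python sketch of one way it could work. This is not how the HTTP Archive charts are actually built. The function name, the bucket width of 400 (guessed from the “2801-3200” label), and the sample data are all assumptions. The last bucket that reaches the 1% threshold becomes open-ended and absorbs everything above it.

```python
from collections import Counter


def bucket_histogram(element_counts, width=400, min_share=0.01):
    """Return (label, share) rows, ending with an open-ended overflow bucket.

    Buckets cover 1-400, 401-800, and so on (a count of 0 lands in the first).
    The highest bucket holding at least `min_share` of all pages is relabelled
    "> N" and absorbs every page above it, so the tail is never silently lost.
    """
    total = len(element_counts)
    if total == 0:
        return []
    per_bucket = Counter(max(n - 1, 0) // width for n in element_counts)
    # Highest bucket that is still large enough to be drawn on its own.
    visible = [b for b, c in per_bucket.items() if c / total >= min_share]
    last = max(visible) if visible else 0
    rows = []
    for b in range(last):
        lo, hi = b * width + 1, (b + 1) * width
        rows.append((f"{lo}-{hi}", per_bucket[b] / total))
    # Everything from the last visible bucket upwards goes into one row.
    tail = sum(c for b, c in per_bucket.items() if b >= last)
    rows.append((f"> {last * width}", tail / total))
    return rows


# Made-up sample: 200 pages, one of them far beyond 3200 elements.
pages = [100] * 100 + [500] * 80 + [3000] * 19 + [120086]
rows = bucket_histogram(pages)
for label, share in rows:
    print(f"{label:>10}  {share:.1%}")
```

The one very large page is below 1% on its own, so it would never get its own bar, but it is still counted in the final open-ended row.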

Yeah, this was effectively my question. Looking at the chart, it seems to imply that there just aren’t sites bigger than 3200 elements or something, which didn’t seem right. I guess I misunderstood, but maybe others would too.