Analyzing Largest Contentful Paint stats via Lighthouse audits

In a blog post, I shared some insights into the Largest Contentful Paint metric using data from Lighthouse audits. This post shares the SQL queries and data I used to prepare that article.

Extracting LCP Details from the Lighthouse table

The Lighthouse table is quite large, so the most efficient way to run SQL queries against it is to determine what you want to extract and then create a smaller table with just the data for your analysis. When I started this analysis, I read through the JSON of a Lighthouse audit to look for any useful data I could extract. I ran the following query against the May 2021 Lighthouse table, and saved the results in the table httparchive.scratchspace.2021_05_01_lighthouse_lcp_stats.

-- WARNING: this query will process 2.2TB of data
SELECT  
  url as pageURL,
  CAST(JSON_EXTRACT(report, "$.audits.largest-contentful-paint.numericValue") AS FLOAT64) as lcp_value,
  JSON_EXTRACT(report, "$.audits.largest-contentful-paint-element.displayValue") as displayValue,
  JSON_EXTRACT(report, "$.audits.largest-contentful-paint-element.details.items[0].node.nodeLabel") as nodeLabel,
  JSON_EXTRACT(report, "$.audits.largest-contentful-paint-element.details.items[0].node.path") AS nodePath,
  CAST(JSON_EXTRACT(report, "$.audits.largest-contentful-paint-element.details.items[0].node.boundingRect.height") AS INT64) AS height,
  CAST(JSON_EXTRACT(report, "$.audits.largest-contentful-paint-element.details.items[0].node.boundingRect.width") AS INT64) AS width,
  CAST(JSON_EXTRACT(report, "$.audits.preload-lcp-image.score") AS FLOAT64) as preloadLCP_score,
  CAST(JSON_EXTRACT(report, "$.audits.preload-lcp-image.numericValue") AS FLOAT64) as preloadLCP_numericValue,  
  JSON_EXTRACT(report, "$.audits.preload-lcp-image.details.items[0].url") AS lcp_imageURL
  
FROM `httparchive.lighthouse.2021_05_01_mobile`

The resulting scratchspace table is 1.8 GB.

How Large is the Largest Contentful Paint?

When Lighthouse identifies the LCP element, it paints a yellow rectangle around it. The dimensions of this rectangle can be used to calculate the area of the LCP element. In this example, the product of the LCP image’s height (191) and width (340) was 64,940 pixels. Since the Lighthouse test was run with an emulated Moto G4 user agent with a screen size of 640x360, we can also calculate that this particular LCP image took up 28% of the viewport.

In order to create the graph CDF and histograms below, I ran the following query -

SELECT 
  ROUND((height * width) / (640*360),2) AS pctScreen,
  SUM(freq) AS sites
FROM (
  SELECT 
    height, 
    width, 
    COUNT(*) AS freq
  FROM `httparchive.scratchspace.2021_05_01_lighthouse_lcp_stats`
  GROUP BY height, width
)
GROUP BY pctScreen
ORDER BY pctScreen ASC

Node Paths of LCP Elements
Lighthouse identifies the node path of LCP elements, and an example of one is:

1,HTML,1,BODY,8,DIV,2,SECTION,1,DIV,0,DIV,0,DIV,0,UL,0,LI,0,ARTICLE,1,DIV,0,DIV,0,A,0,IMG

I thought it would be interesting to extract the last segment in the node path and summarize them. In order to do that use used ARRAY_REVERSE to reverse the string and SPLIT to extract the last element. The query for this is:

SELECT 
  ARRAY_REVERSE(SPLIT(nodePath))[SAFE_OFFSET(0)] last_segment,
  COUNT(*) as freq
FROM 
  `httparchive.scratchspace.2021_05_01_lighthouse_lcp_stats`
GROUP BY last_segment
ORDER BY freq DESC

Image Weight for the LCP
There’s a Lighthouse audit that will identify the LCP image and recommend preloading it.

In order to summarize the distribution of LCP element sizes, I ran the following query


SELECT 
  ROUND(respBodySize/1024/16)*16 AS LCP_ImageKB,
  COUNT(*) as sites
FROM 
  `httparchive.scratchspace.2021_05_01_lighthouse_lcp_stats` AS lcp_stats
INNER JOIN
  `httparchive.summary_pages.2021_05_01_mobile` AS summary_pages
ON lcp_stats.pageURL = summary_pages.url
INNER JOIN
  `httparchive.summary_requests.2021_05_01_mobile` AS summary_requests
ON 
   summary_pages.pageid = summary_requests.pageid
WHERE 
   lcp_imageURL IS NOT NULL
   AND REPLACE(lcp_imageURL,"\"","") = summary_requests.url
GROUP BY 
   LCP_ImageKB
ORDER BY 
   LCP_ImageKB

This next query was used to summarize the LCP element as a percentage of Image Weight and Page Weight. In order to do this, we needed to JOIN the lighthouse stats with the summary_pages and summary_requests table so that we could get the size of the LCP element.

SELECT 
  percentile,
  ImageRequests,
  ROUND(ImageKB,2) AS ImageKB,
  ROUND(TotalKB,2) AS TotalKB,
  ROUND(LCP_ImageKB,2) AS LCP_ImageKB,
  ROUND(LCP_ImageKB/ImageKB*100,2) as pctLCPofImages, 
  ROUND(LCP_ImageKB/TotalKB*100,2) as pctLCPofTotal
  
FROM (
SELECT 
  "p25" as percentile,
  COUNT(*) as sites,
  APPROX_QUANTILES(reqImg, 1000)[SAFE_ORDINAL(251)] AS ImageRequests,
  APPROX_QUANTILES(bytesImg/1024, 1000)[SAFE_ORDINAL(251)] AS ImageKB,
  APPROX_QUANTILES(bytesTotal/1024, 1000)[SAFE_ORDINAL(251)] AS TotalKB,
  APPROX_QUANTILES(respBodySize/1024, 1000)[SAFE_ORDINAL(251)] AS LCP_ImageKB
FROM 
  `httparchive.scratchspace.2021_05_01_lighthouse_lcp_stats` AS lcp_stats
INNER JOIN
  `httparchive.summary_pages.2021_05_01_mobile` AS summary_pages
ON lcp_stats.pageURL = summary_pages.url
INNER JOIN
  `httparchive.summary_requests.2021_05_01_mobile` AS summary_requests
ON summary_pages.pageid = summary_requests.pageid
WHERE lcp_imageURL IS NOT NULL
  AND REPLACE(lcp_imageURL,"\"","") = summary_requests.url

UNION ALL

SELECT 
  "p50" as percentile,
  COUNT(*) as sites,
  APPROX_QUANTILES(reqImg, 1000)[SAFE_ORDINAL(501)] AS ImageRequests,
  APPROX_QUANTILES(bytesImg/1024, 1000)[SAFE_ORDINAL(501)] AS ImageKB,
  APPROX_QUANTILES(bytesTotal/1024, 1000)[SAFE_ORDINAL(501)] AS TotalKB,
  APPROX_QUANTILES(respBodySize/1024, 1000)[SAFE_ORDINAL(501)] AS LCP_ImageKB
FROM 
  `httparchive.scratchspace.2021_05_01_lighthouse_lcp_stats` AS lcp_stats
INNER JOIN
  `httparchive.summary_pages.2021_05_01_mobile` AS summary_pages
ON lcp_stats.pageURL = summary_pages.url
INNER JOIN
  `httparchive.summary_requests.2021_05_01_mobile` AS summary_requests
ON summary_pages.pageid = summary_requests.pageid
WHERE lcp_imageURL IS NOT NULL
  AND REPLACE(lcp_imageURL,"\"","") = summary_requests.url

UNION ALL

SELECT 
  "p75" as percentile,
  COUNT(*) as sites,
  APPROX_QUANTILES(reqImg, 1000)[SAFE_ORDINAL(751)] AS ImageRequests,
  APPROX_QUANTILES(bytesImg/1024, 1000)[SAFE_ORDINAL(751)] AS ImageKB,
  APPROX_QUANTILES(bytesTotal/1024, 1000)[SAFE_ORDINAL(751)] AS TotalKB,
  APPROX_QUANTILES(respBodySize/1024, 1000)[SAFE_ORDINAL(751)] AS LCP_ImageKB,
FROM 
  `httparchive.scratchspace.2021_05_01_lighthouse_lcp_stats` AS lcp_stats
INNER JOIN
  `httparchive.summary_pages.2021_05_01_mobile` AS summary_pages
ON lcp_stats.pageURL = summary_pages.url
INNER JOIN
  `httparchive.summary_requests.2021_05_01_mobile` AS summary_requests
ON summary_pages.pageid = summary_requests.pageid
WHERE lcp_imageURL IS NOT NULL
  AND REPLACE(lcp_imageURL,"\"","") = summary_requests.url

UNION ALL

SELECT 
  "p95" as percentile,
  COUNT(*) as sites,
  APPROX_QUANTILES(reqImg, 1000)[SAFE_ORDINAL(951)] AS ImageRequests,
  APPROX_QUANTILES(bytesImg/1024, 1000)[SAFE_ORDINAL(951)] AS ImageKB,
  APPROX_QUANTILES(bytesTotal/1024, 1000)[SAFE_ORDINAL(951)] AS TotalKB,
  APPROX_QUANTILES(respBodySize/1024, 1000)[SAFE_ORDINAL(951)] AS LCP_ImageKB
FROM 
  `httparchive.scratchspace.2021_05_01_lighthouse_lcp_stats` AS lcp_stats
INNER JOIN
  `httparchive.summary_pages.2021_05_01_mobile` AS summary_pages
ON lcp_stats.pageURL = summary_pages.url
INNER JOIN
  `httparchive.summary_requests.2021_05_01_mobile` AS summary_requests
ON summary_pages.pageid = summary_requests.pageid
WHERE lcp_imageURL IS NOT NULL
  AND REPLACE(lcp_imageURL,"\"","") = summary_requests.url

)
ORDER BY percentile ASC

Largest Contentful Paint by Popularity

The HTTP Archive now contains rank groupings, obtained from the Chrome User Experience Report. This can enable us to segment this analysis based on the popularity of sites. The rank grouping indicator buckets sites into the top 1K, 10K, 100K, 1 million and 10 million.

SELECT
  rank,
  percentile,
  sites,
  ImageRequests,
  ROUND(ImageKB,2) AS ImageKB,
  ROUND(TotalKB,2) AS TotalKB,
  ROUND(LCP_ImageKB,2) AS LCP_ImageKB,
  ROUND(LCP_ImageKB/ImageKB*100,2) as pctLCPofImages, 
  ROUND(LCP_ImageKB/TotalKB*100,2) as pctLCPofTotal
  
FROM (

SELECT 
  rank,
  "p25" as percentile,
  COUNT(*) as sites,
  APPROX_QUANTILES(reqImg, 1000)[SAFE_ORDINAL(251)] AS ImageRequests,
  APPROX_QUANTILES(bytesImg/1024, 1000)[SAFE_ORDINAL(251)] AS ImageKB,
  APPROX_QUANTILES(bytesTotal/1024, 1000)[SAFE_ORDINAL(251)] AS TotalKB,
  APPROX_QUANTILES(respBodySize/1024, 1000)[SAFE_ORDINAL(251)] AS LCP_ImageKB
FROM 
  `httparchive.scratchspace.2021_05_01_lighthouse_lcp_stats` AS lcp_stats
INNER JOIN
  `httparchive.summary_pages.2021_05_01_mobile` AS summary_pages
ON lcp_stats.pageURL = summary_pages.url
INNER JOIN
  `httparchive.summary_requests.2021_05_01_mobile` AS summary_requests
ON summary_pages.pageid = summary_requests.pageid
WHERE lcp_imageURL IS NOT NULL
  AND REPLACE(lcp_imageURL,"\"","") = summary_requests.url
GROUP BY rank

UNION ALL

SELECT 
rank,
"p50" as percentile,
  COUNT(*) as sites,
  APPROX_QUANTILES(reqImg, 1000)[SAFE_ORDINAL(501)] AS ImageRequests,
  APPROX_QUANTILES(bytesImg/1024, 1000)[SAFE_ORDINAL(501)] AS ImageKB,
  APPROX_QUANTILES(bytesTotal/1024, 1000)[SAFE_ORDINAL(501)] AS TotalKB,
  APPROX_QUANTILES(respBodySize/1024, 1000)[SAFE_ORDINAL(501)] AS LCP_ImageKB
FROM 
  `httparchive.scratchspace.2021_05_01_lighthouse_lcp_stats` AS lcp_stats
INNER JOIN
  `httparchive.summary_pages.2021_05_01_mobile` AS summary_pages
ON lcp_stats.pageURL = summary_pages.url
INNER JOIN
  `httparchive.summary_requests.2021_05_01_mobile` AS summary_requests
ON summary_pages.pageid = summary_requests.pageid
WHERE lcp_imageURL IS NOT NULL
  AND REPLACE(lcp_imageURL,"\"","") = summary_requests.url
GROUP BY rank
UNION ALL

SELECT 
  rank,
  "p75" as percentile,
  COUNT(*) as sites,
  APPROX_QUANTILES(reqImg, 1000)[SAFE_ORDINAL(751)] AS ImageRequests,
  APPROX_QUANTILES(bytesImg/1024, 1000)[SAFE_ORDINAL(751)] AS ImageKB,
  APPROX_QUANTILES(bytesTotal/1024, 1000)[SAFE_ORDINAL(751)] AS TotalKB,
  APPROX_QUANTILES(respBodySize/1024, 1000)[SAFE_ORDINAL(751)] AS LCP_ImageKB,
FROM 
  `httparchive.scratchspace.2021_05_01_lighthouse_lcp_stats` AS lcp_stats
INNER JOIN
  `httparchive.summary_pages.2021_05_01_mobile` AS summary_pages
ON lcp_stats.pageURL = summary_pages.url
INNER JOIN
  `httparchive.summary_requests.2021_05_01_mobile` AS summary_requests
ON summary_pages.pageid = summary_requests.pageid
WHERE lcp_imageURL IS NOT NULL
  AND REPLACE(lcp_imageURL,"\"","") = summary_requests.url
GROUP BY rank
UNION ALL

SELECT 
  rank,
  "p95" as percentile,
  COUNT(*) as sites,
  APPROX_QUANTILES(reqImg, 1000)[SAFE_ORDINAL(951)] AS ImageRequests,
  APPROX_QUANTILES(bytesImg/1024, 1000)[SAFE_ORDINAL(951)] AS ImageKB,
  APPROX_QUANTILES(bytesTotal/1024, 1000)[SAFE_ORDINAL(951)] AS TotalKB,
  APPROX_QUANTILES(respBodySize/1024, 1000)[SAFE_ORDINAL(951)] AS LCP_ImageKB
FROM 
  `httparchive.scratchspace.2021_05_01_lighthouse_lcp_stats` AS lcp_stats
INNER JOIN
  `httparchive.summary_pages.2021_05_01_mobile` AS summary_pages
ON lcp_stats.pageURL = summary_pages.url
INNER JOIN
  `httparchive.summary_requests.2021_05_01_mobile` AS summary_requests
ON summary_pages.pageid = summary_requests.pageid
WHERE lcp_imageURL IS NOT NULL
  AND REPLACE(lcp_imageURL,"\"","") = summary_requests.url
GROUP BY rank
  
)

ORDER BY rank, percentile

This last query was used to summarize the image format served. Like the previous query, this JOINs the summary_pages and summary_requests table so that we can identify the image format.

SELECT 
  format,
  COUNT(*) as sites
FROM 
  `httparchive.scratchspace.2021_05_01_lighthouse_lcp_stats` AS lcp_stats
INNER JOIN
  `httparchive.summary_pages.2021_05_01_mobile` AS summary_pages
ON lcp_stats.pageURL = summary_pages.url
INNER JOIN
  `httparchive.summary_requests.2021_05_01_mobile` AS summary_requests
ON summary_pages.pageid = summary_requests.pageid
WHERE lcp_imageURL IS NOT NULL
  AND REPLACE(lcp_imageURL,"\"","") = summary_requests.url

GROUP BY format
ORDER BY sites DESC

All of the query results and graphs from this blog post are available here - What can the HTTP Archive tell us about Largest Contentful Paint - Google Sheets

2 Likes