In a blog post, I shared some insights into the Largest Contentful Paint metric using data from Lighthouse audits. This post shares the SQL queries and data I used to prepare that article.
Extracting LCP Details from the Lighthouse table
The Lighthouse table is quite large, so the most efficient way to run SQL queries against it is to determine what you want to extract and then create a smaller table with just the data for your analysis. When I started this analysis, I read through the JSON of a Lighthouse audit to look for any useful data I could extract. I ran the following query against the May 2021 Lighthouse table, and saved the results in the table httparchive.scratchspace.2021_05_01_lighthouse_lcp_stats
.
-- WARNING: this query will process 2.2TB of data
SELECT
url as pageURL,
CAST(JSON_EXTRACT(report, "$.audits.largest-contentful-paint.numericValue") AS FLOAT64) as lcp_value,
JSON_EXTRACT(report, "$.audits.largest-contentful-paint-element.displayValue") as displayValue,
JSON_EXTRACT(report, "$.audits.largest-contentful-paint-element.details.items[0].node.nodeLabel") as nodeLabel,
JSON_EXTRACT(report, "$.audits.largest-contentful-paint-element.details.items[0].node.path") AS nodePath,
CAST(JSON_EXTRACT(report, "$.audits.largest-contentful-paint-element.details.items[0].node.boundingRect.height") AS INT64) AS height,
CAST(JSON_EXTRACT(report, "$.audits.largest-contentful-paint-element.details.items[0].node.boundingRect.width") AS INT64) AS width,
CAST(JSON_EXTRACT(report, "$.audits.preload-lcp-image.score") AS FLOAT64) as preloadLCP_score,
CAST(JSON_EXTRACT(report, "$.audits.preload-lcp-image.numericValue") AS FLOAT64) as preloadLCP_numericValue,
JSON_EXTRACT(report, "$.audits.preload-lcp-image.details.items[0].url") AS lcp_imageURL
FROM `httparchive.lighthouse.2021_05_01_mobile`
The resulting scratchspace table is 1.8 GB.
How Large is the Largest Contentful Paint?
When Lighthouse identifies the LCP element, it paints a yellow rectangle around it. The dimensions of this rectangle can be used to calculate the area of the LCP element. In this example, the product of the LCP image’s height (191) and width (340) was 64,940 pixels. Since the Lighthouse test was run with an emulated Moto G4 user agent with a screen size of 640x360, we can also calculate that this particular LCP image took up 28% of the viewport.
In order to create the graph CDF and histograms below, I ran the following query -
SELECT
ROUND((height * width) / (640*360),2) AS pctScreen,
SUM(freq) AS sites
FROM (
SELECT
height,
width,
COUNT(*) AS freq
FROM `httparchive.scratchspace.2021_05_01_lighthouse_lcp_stats`
GROUP BY height, width
)
GROUP BY pctScreen
ORDER BY pctScreen ASC
Node Paths of LCP Elements
Lighthouse identifies the node path of LCP elements, and an example of one is:
1,HTML,1,BODY,8,DIV,2,SECTION,1,DIV,0,DIV,0,DIV,0,UL,0,LI,0,ARTICLE,1,DIV,0,DIV,0,A,0,IMG
I thought it would be interesting to extract the last segment in the node path and summarize them. In order to do that use used ARRAY_REVERSE
to reverse the string and SPLIT
to extract the last element. The query for this is:
SELECT
ARRAY_REVERSE(SPLIT(nodePath))[SAFE_OFFSET(0)] last_segment,
COUNT(*) as freq
FROM
`httparchive.scratchspace.2021_05_01_lighthouse_lcp_stats`
GROUP BY last_segment
ORDER BY freq DESC
Image Weight for the LCP
There’s a Lighthouse audit that will identify the LCP image and recommend preloading it.
In order to summarize the distribution of LCP element sizes, I ran the following query
SELECT
ROUND(respBodySize/1024/16)*16 AS LCP_ImageKB,
COUNT(*) as sites
FROM
`httparchive.scratchspace.2021_05_01_lighthouse_lcp_stats` AS lcp_stats
INNER JOIN
`httparchive.summary_pages.2021_05_01_mobile` AS summary_pages
ON lcp_stats.pageURL = summary_pages.url
INNER JOIN
`httparchive.summary_requests.2021_05_01_mobile` AS summary_requests
ON
summary_pages.pageid = summary_requests.pageid
WHERE
lcp_imageURL IS NOT NULL
AND REPLACE(lcp_imageURL,"\"","") = summary_requests.url
GROUP BY
LCP_ImageKB
ORDER BY
LCP_ImageKB
This next query was used to summarize the LCP element as a percentage of Image Weight and Page Weight. In order to do this, we needed to JOIN the lighthouse stats with the summary_pages and summary_requests table so that we could get the size of the LCP element.
SELECT
percentile,
ImageRequests,
ROUND(ImageKB,2) AS ImageKB,
ROUND(TotalKB,2) AS TotalKB,
ROUND(LCP_ImageKB,2) AS LCP_ImageKB,
ROUND(LCP_ImageKB/ImageKB*100,2) as pctLCPofImages,
ROUND(LCP_ImageKB/TotalKB*100,2) as pctLCPofTotal
FROM (
SELECT
"p25" as percentile,
COUNT(*) as sites,
APPROX_QUANTILES(reqImg, 1000)[SAFE_ORDINAL(251)] AS ImageRequests,
APPROX_QUANTILES(bytesImg/1024, 1000)[SAFE_ORDINAL(251)] AS ImageKB,
APPROX_QUANTILES(bytesTotal/1024, 1000)[SAFE_ORDINAL(251)] AS TotalKB,
APPROX_QUANTILES(respBodySize/1024, 1000)[SAFE_ORDINAL(251)] AS LCP_ImageKB
FROM
`httparchive.scratchspace.2021_05_01_lighthouse_lcp_stats` AS lcp_stats
INNER JOIN
`httparchive.summary_pages.2021_05_01_mobile` AS summary_pages
ON lcp_stats.pageURL = summary_pages.url
INNER JOIN
`httparchive.summary_requests.2021_05_01_mobile` AS summary_requests
ON summary_pages.pageid = summary_requests.pageid
WHERE lcp_imageURL IS NOT NULL
AND REPLACE(lcp_imageURL,"\"","") = summary_requests.url
UNION ALL
SELECT
"p50" as percentile,
COUNT(*) as sites,
APPROX_QUANTILES(reqImg, 1000)[SAFE_ORDINAL(501)] AS ImageRequests,
APPROX_QUANTILES(bytesImg/1024, 1000)[SAFE_ORDINAL(501)] AS ImageKB,
APPROX_QUANTILES(bytesTotal/1024, 1000)[SAFE_ORDINAL(501)] AS TotalKB,
APPROX_QUANTILES(respBodySize/1024, 1000)[SAFE_ORDINAL(501)] AS LCP_ImageKB
FROM
`httparchive.scratchspace.2021_05_01_lighthouse_lcp_stats` AS lcp_stats
INNER JOIN
`httparchive.summary_pages.2021_05_01_mobile` AS summary_pages
ON lcp_stats.pageURL = summary_pages.url
INNER JOIN
`httparchive.summary_requests.2021_05_01_mobile` AS summary_requests
ON summary_pages.pageid = summary_requests.pageid
WHERE lcp_imageURL IS NOT NULL
AND REPLACE(lcp_imageURL,"\"","") = summary_requests.url
UNION ALL
SELECT
"p75" as percentile,
COUNT(*) as sites,
APPROX_QUANTILES(reqImg, 1000)[SAFE_ORDINAL(751)] AS ImageRequests,
APPROX_QUANTILES(bytesImg/1024, 1000)[SAFE_ORDINAL(751)] AS ImageKB,
APPROX_QUANTILES(bytesTotal/1024, 1000)[SAFE_ORDINAL(751)] AS TotalKB,
APPROX_QUANTILES(respBodySize/1024, 1000)[SAFE_ORDINAL(751)] AS LCP_ImageKB,
FROM
`httparchive.scratchspace.2021_05_01_lighthouse_lcp_stats` AS lcp_stats
INNER JOIN
`httparchive.summary_pages.2021_05_01_mobile` AS summary_pages
ON lcp_stats.pageURL = summary_pages.url
INNER JOIN
`httparchive.summary_requests.2021_05_01_mobile` AS summary_requests
ON summary_pages.pageid = summary_requests.pageid
WHERE lcp_imageURL IS NOT NULL
AND REPLACE(lcp_imageURL,"\"","") = summary_requests.url
UNION ALL
SELECT
"p95" as percentile,
COUNT(*) as sites,
APPROX_QUANTILES(reqImg, 1000)[SAFE_ORDINAL(951)] AS ImageRequests,
APPROX_QUANTILES(bytesImg/1024, 1000)[SAFE_ORDINAL(951)] AS ImageKB,
APPROX_QUANTILES(bytesTotal/1024, 1000)[SAFE_ORDINAL(951)] AS TotalKB,
APPROX_QUANTILES(respBodySize/1024, 1000)[SAFE_ORDINAL(951)] AS LCP_ImageKB
FROM
`httparchive.scratchspace.2021_05_01_lighthouse_lcp_stats` AS lcp_stats
INNER JOIN
`httparchive.summary_pages.2021_05_01_mobile` AS summary_pages
ON lcp_stats.pageURL = summary_pages.url
INNER JOIN
`httparchive.summary_requests.2021_05_01_mobile` AS summary_requests
ON summary_pages.pageid = summary_requests.pageid
WHERE lcp_imageURL IS NOT NULL
AND REPLACE(lcp_imageURL,"\"","") = summary_requests.url
)
ORDER BY percentile ASC
Largest Contentful Paint by Popularity
The HTTP Archive now contains rank groupings, obtained from the Chrome User Experience Report. This can enable us to segment this analysis based on the popularity of sites. The rank grouping indicator buckets sites into the top 1K, 10K, 100K, 1 million and 10 million.
SELECT
rank,
percentile,
sites,
ImageRequests,
ROUND(ImageKB,2) AS ImageKB,
ROUND(TotalKB,2) AS TotalKB,
ROUND(LCP_ImageKB,2) AS LCP_ImageKB,
ROUND(LCP_ImageKB/ImageKB*100,2) as pctLCPofImages,
ROUND(LCP_ImageKB/TotalKB*100,2) as pctLCPofTotal
FROM (
SELECT
rank,
"p25" as percentile,
COUNT(*) as sites,
APPROX_QUANTILES(reqImg, 1000)[SAFE_ORDINAL(251)] AS ImageRequests,
APPROX_QUANTILES(bytesImg/1024, 1000)[SAFE_ORDINAL(251)] AS ImageKB,
APPROX_QUANTILES(bytesTotal/1024, 1000)[SAFE_ORDINAL(251)] AS TotalKB,
APPROX_QUANTILES(respBodySize/1024, 1000)[SAFE_ORDINAL(251)] AS LCP_ImageKB
FROM
`httparchive.scratchspace.2021_05_01_lighthouse_lcp_stats` AS lcp_stats
INNER JOIN
`httparchive.summary_pages.2021_05_01_mobile` AS summary_pages
ON lcp_stats.pageURL = summary_pages.url
INNER JOIN
`httparchive.summary_requests.2021_05_01_mobile` AS summary_requests
ON summary_pages.pageid = summary_requests.pageid
WHERE lcp_imageURL IS NOT NULL
AND REPLACE(lcp_imageURL,"\"","") = summary_requests.url
GROUP BY rank
UNION ALL
SELECT
rank,
"p50" as percentile,
COUNT(*) as sites,
APPROX_QUANTILES(reqImg, 1000)[SAFE_ORDINAL(501)] AS ImageRequests,
APPROX_QUANTILES(bytesImg/1024, 1000)[SAFE_ORDINAL(501)] AS ImageKB,
APPROX_QUANTILES(bytesTotal/1024, 1000)[SAFE_ORDINAL(501)] AS TotalKB,
APPROX_QUANTILES(respBodySize/1024, 1000)[SAFE_ORDINAL(501)] AS LCP_ImageKB
FROM
`httparchive.scratchspace.2021_05_01_lighthouse_lcp_stats` AS lcp_stats
INNER JOIN
`httparchive.summary_pages.2021_05_01_mobile` AS summary_pages
ON lcp_stats.pageURL = summary_pages.url
INNER JOIN
`httparchive.summary_requests.2021_05_01_mobile` AS summary_requests
ON summary_pages.pageid = summary_requests.pageid
WHERE lcp_imageURL IS NOT NULL
AND REPLACE(lcp_imageURL,"\"","") = summary_requests.url
GROUP BY rank
UNION ALL
SELECT
rank,
"p75" as percentile,
COUNT(*) as sites,
APPROX_QUANTILES(reqImg, 1000)[SAFE_ORDINAL(751)] AS ImageRequests,
APPROX_QUANTILES(bytesImg/1024, 1000)[SAFE_ORDINAL(751)] AS ImageKB,
APPROX_QUANTILES(bytesTotal/1024, 1000)[SAFE_ORDINAL(751)] AS TotalKB,
APPROX_QUANTILES(respBodySize/1024, 1000)[SAFE_ORDINAL(751)] AS LCP_ImageKB,
FROM
`httparchive.scratchspace.2021_05_01_lighthouse_lcp_stats` AS lcp_stats
INNER JOIN
`httparchive.summary_pages.2021_05_01_mobile` AS summary_pages
ON lcp_stats.pageURL = summary_pages.url
INNER JOIN
`httparchive.summary_requests.2021_05_01_mobile` AS summary_requests
ON summary_pages.pageid = summary_requests.pageid
WHERE lcp_imageURL IS NOT NULL
AND REPLACE(lcp_imageURL,"\"","") = summary_requests.url
GROUP BY rank
UNION ALL
SELECT
rank,
"p95" as percentile,
COUNT(*) as sites,
APPROX_QUANTILES(reqImg, 1000)[SAFE_ORDINAL(951)] AS ImageRequests,
APPROX_QUANTILES(bytesImg/1024, 1000)[SAFE_ORDINAL(951)] AS ImageKB,
APPROX_QUANTILES(bytesTotal/1024, 1000)[SAFE_ORDINAL(951)] AS TotalKB,
APPROX_QUANTILES(respBodySize/1024, 1000)[SAFE_ORDINAL(951)] AS LCP_ImageKB
FROM
`httparchive.scratchspace.2021_05_01_lighthouse_lcp_stats` AS lcp_stats
INNER JOIN
`httparchive.summary_pages.2021_05_01_mobile` AS summary_pages
ON lcp_stats.pageURL = summary_pages.url
INNER JOIN
`httparchive.summary_requests.2021_05_01_mobile` AS summary_requests
ON summary_pages.pageid = summary_requests.pageid
WHERE lcp_imageURL IS NOT NULL
AND REPLACE(lcp_imageURL,"\"","") = summary_requests.url
GROUP BY rank
)
ORDER BY rank, percentile
This last query was used to summarize the image format served. Like the previous query, this JOINs the summary_pages and summary_requests table so that we can identify the image format.
SELECT
format,
COUNT(*) as sites
FROM
`httparchive.scratchspace.2021_05_01_lighthouse_lcp_stats` AS lcp_stats
INNER JOIN
`httparchive.summary_pages.2021_05_01_mobile` AS summary_pages
ON lcp_stats.pageURL = summary_pages.url
INNER JOIN
`httparchive.summary_requests.2021_05_01_mobile` AS summary_requests
ON summary_pages.pageid = summary_requests.pageid
WHERE lcp_imageURL IS NOT NULL
AND REPLACE(lcp_imageURL,"\"","") = summary_requests.url
GROUP BY format
ORDER BY sites DESC
All of the query results and graphs from this blog post are available here - What can the HTTP Archive tell us about Largest Contentful Paint - Google Sheets