Lighthouse performance analysis (starter experience)

I only just recently found out about the HTTP Archive (through the Almanac plans), and I got excited to start digging into the archive/Lighthouse data to see if I could find something interesting.

Three ideas:

  1. Visualize how front-end performance technologies (preload, preconnect, HTTP/2, compression) are adopted over time.
    https://docs.google.com/spreadsheets/d/1v5ylvMbIQKAWgXiRerA84hR_kyryo7WStGruegv9SUQ/edit?usp=sharing (has a chart… but I still need to understand the Lighthouse scoring system better; see the SQL below)

  2. Visualize the migration away from render-blocking JavaScript in the head over time.
    https://docs.google.com/spreadsheets/d/1LQWG4671sjYO0k0VlyCojAAV-ukLCgu54SoYVAi8IMA/edit?usp=sharing (has a chart… but with only ~2% variation there isn’t much to see; maybe use averages?)

  3. Visualize whether TTFB + FMP take up a smaller percentage of total load time over time.
    https://docs.google.com/spreadsheets/d/1cqMKGAhWU5kqLLvIO1Y5uRce0mdLtTRGZzmORA9XdCI/edit?usp=sharing (has a chart… but it also needs some work)

(all three limited to the top 10k hosts)

I’ll use this topic to log my further progress and challenges; maybe it helps out other beginners:


Challenges:

  • It took some time for me to accept the limitation that all the data relates to the homepage only. E.g. for flagged technologies I expected the whole host to be covered, which makes it hard to state generally that tech x powers some % of the web.
  • It took some time for me to understand the various BigQuery tables, their contents and, more importantly, their differences.
  • I miss the Alexa rank for URLs. While 4M URLs is nice, limiting the data to the top [x] could be more representative. It takes expensive SELECT hacks to get the rank value back.
  • Lighthouse’s SEO tests aren’t very insightful.
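
On the rank point: the workaround I mean is a join back to a summary_pages release that still carries the rank column. A rough sketch, assuming the 2018_06_01 tables are such a release (and scanning a lighthouse table this way is exactly the expensive part):

```sql
-- Sketch: recover rank for Lighthouse URLs by joining on host against a
-- summary_pages release that still has a rank column (assumed: 2018_06_01).
SELECT
  lh.url,
  sp.rank
FROM
  `httparchive.lighthouse.2018_10_01_mobile` AS lh
JOIN
  `httparchive.summary_pages.2018_06_01_desktop` AS sp
ON
  NET.HOST(lh.url) = NET.HOST(sp.url)
WHERE
  sp.rank < 10000
```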

Experience:

  • BigQuery Standard SQL is actually pretty accessible, and the UI tool works great, especially its debugging details. But getting a “This query will process 3.49 TB when run.” notice when you’ve only just started learning a system is a mental (and financial) threshold.
    [WIP]
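
On that financial threshold: before running anything I now translate the bytes estimate into money. A minimal sketch in JavaScript, assuming BigQuery’s on-demand rate of $5 per TB processed (an assumption; check current pricing):

```javascript
// Rough query-cost check: convert BigQuery's "will process X TB" notice
// into dollars. The $5/TB on-demand rate is an assumption; verify against
// current BigQuery pricing.
function estimateCostUSD(terabytesProcessed, usdPerTB = 5) {
  return terabytesProcessed * usdPerTB;
}

// The "3.49 TB" notice mentioned above works out to roughly $17.45.
console.log(estimateCostUSD(3.49).toFixed(2)); // → "17.45"
```

The UI shows the estimate before you run; the bq CLI’s --dry_run flag reports the same number without incurring any cost.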

Idea #1 SQL:

   CREATE TEMP FUNCTION isFound(audit_json STRING)
   RETURNS INT64
   LANGUAGE js AS """
   // Parse one Lighthouse audit object and reduce it to a single number.
   var audit = JSON.parse(audit_json);
   if (audit) {
     if (audit.scoreDisplayMode == 'binary') {
       // Pass/fail audit: count a pass as 1.
       return audit.rawValue === true ? 1 : 0;
     } else if (audit.scoreDisplayMode == 'numeric') {
       // Graded audit: use the score as-is.
       return audit.score;
     }
   }
   // Missing audit or unknown display mode.
   return 0;
   """;

 SELECT
   SUBSTR(_TABLE_SUFFIX, 0, 7) AS date,
   COUNT(url) AS `total`,
   COUNT(DISTINCT url) AS `distinct_total`,
   SUM(isFound(JSON_EXTRACT(report, '$.audits.uses-rel-preload'))) AS `uses_rel_preload`,
   SUM(isFound(JSON_EXTRACT(report, '$.audits.uses-rel-preconnect'))) AS `uses_rel_preconnect`,
   SUM(isFound(JSON_EXTRACT(report, '$.audits.uses-http2'))) AS `uses_http2`,
   SUM(isFound(JSON_EXTRACT(report, '$.audits.uses-text-compression'))) AS `uses_text_compression`,
   SUM(isFound(JSON_EXTRACT(report, '$.audits.uses-long-cache-ttl'))) AS `uses_long_cache_ttl`,
   SUM(isFound(JSON_EXTRACT(report, '$.audits.uses-responsive-images'))) AS `uses_responsive_images`,
   SUM(isFound(JSON_EXTRACT(report, '$.audits.font-display'))) AS `font_display`,
   SUM(isFound(JSON_EXTRACT(report, '$.audits.unminified-javascript'))) AS `unminified_javascript`,
   SUM(isFound(JSON_EXTRACT(report, '$.audits.unminified-css'))) AS `unminified_css`
 FROM
   `httparchive.lighthouse.*`
 WHERE
   _TABLE_SUFFIX >= '2018_10_01'
   AND ENDS_WITH(_TABLE_SUFFIX, '_01_mobile')
   AND NET.HOST(url) IN (
     SELECT
       NET.HOST(url) AS host
     FROM
       `httparchive.summary_pages.2018_06_01_desktop`
     WHERE
       rank < 10000)
 GROUP BY _TABLE_SUFFIX
 ORDER BY _TABLE_SUFFIX ASC
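
To get comfortable with the scoring logic without burning query quota, the UDF above can also be exercised locally: it’s plain JavaScript, so the same logic runs in Node against audits pulled from a saved Lighthouse report. A sketch, assuming the v3-era audit shape (scoreDisplayMode plus rawValue/score) that these reports use:

```javascript
// Local copy of the isFound UDF logic, runnable in Node against audit
// objects from a saved Lighthouse JSON report (e.g. lhr.audits['uses-http2']).
function isFound(auditJson) {
  var audit = JSON.parse(auditJson);
  if (audit) {
    if (audit.scoreDisplayMode === 'binary') {
      // Pass/fail audit: a pass counts as 1.
      return audit.rawValue === true ? 1 : 0;
    } else if (audit.scoreDisplayMode === 'numeric') {
      // Graded audit: the score is used directly.
      return audit.score;
    }
  }
  // Missing audit or unknown display mode.
  return 0;
}

// A passing binary audit, as JSON_EXTRACT would hand it to the UDF:
console.log(isFound(JSON.stringify({ scoreDisplayMode: 'binary', rawValue: true }))); // → 1
```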