Querying Blink features

One of the easiest ways to determine if a page makes use of a particular API or capability is to check to see if its “use counter” has been triggered. For example, if the page includes a <form> element, the feature named FormElement with ID 84 is counted, along with a timestamp of when it was first discovered.

This is really useful because HAR files from WebPageTest can easily be queried to see if the feature name or ID is included. The alternative can be prohibitively expensive, for example querying all response bodies and using a regular expression to try to find something resembling the <form> tag. Parsing HTML using regular expressions can be a nightmare and prone to error, besides the time and cost complexity to query TB of data.

WebPageTest recently made a change to the way feature usage is reported.

// Old schema
blinkFeatureFirstUsed = {
  "Features": {
    // Feature name
    "FormElement": 249.064,
    // Feature ID
    "Feature_2202": 248.489
  }
}

Previously, the blinkFeatureFirstUsed.Features object was a map from feature names to first-used timestamps. If the feature name was unknown, Feature_{ID} was used instead. This may cause inconsistencies when you don’t know whether to query a feature by its name or ID.

So the schema was recently updated to be unambiguous:

// New schema
blinkFeatureFirstUsed = {
  "Features": {
    // Feature ID
    "84": {
      // Timestamp of first use
      "firstUsed": 249.064,
      // Name of the feature
      "name": "FormElement"
    },
    // Feature ID
    // "name" is unknown
    "2202": {
      "firstUsed": 248.489
    }
  }
}

The object is now a map of feature IDs to an object containing the firstUsed timestamp and an optional name field, when applicable.

If you’re trying to query for a particular feature, one bulletproof way would be to look it up by name, and fall back to ID if that fails:

#standardSQL
SELECT
  COUNTIF(IFNULL(
    JSON_EXTRACT(payload, '$._blinkFeatureFirstUsed.Features.FormElement'),
    JSON_EXTRACT(payload, '$._blinkFeatureFirstUsed.Features.84')
  ) IS NOT NULL)
FROM
  `httparchive.pages.2020_02_01_desktop`

This new schema takes effect in the February 2020 (2020_02_01) dataset, so changing the query to January should also work.

For a more complicated example of why this is useful, when we generate the tables in the httparchive.blink_features dataset, we can now more easily fall back to the ID when the name is unknown:

#standardSQL
CREATE TEMP FUNCTION features(payload STRING)
RETURNS ARRAY<STRUCT<name STRING, type STRING>> LANGUAGE js AS
"""
  function getFeatureNames(featureMap, featureType) {
    try {
      return Object.entries(featureMap).map(([key, value]) => {
        if (value.name) {
          return {'name': value.name, 'type': featureType};
        }
        return {'name': key, 'type': featureType};
      });
    } catch (e) {
      return [{'name': e, 'type': 'error'}];
    }
  }
  
  $ = JSON.parse(payload);
  if (!$._blinkFeatureFirstUsed) return [];
  
  return getFeatureNames($._blinkFeatureFirstUsed.Features, 'default')
    .concat(getFeatureNames($._blinkFeatureFirstUsed.CSSFeatures, 'css'))
    .concat(getFeatureNames($._blinkFeatureFirstUsed.AnimatedCSSFeatures, 'animated-css'));
""";


SELECT
  REPLACE(SUBSTR(_TABLE_SUFFIX, 0, 10), '_', '') AS yyyymmdd,
  IF(ENDS_WITH(_TABLE_SUFFIX, 'desktop'), 'desktop', 'mobile') AS client,
  url,
  feature.name AS feature,
  feature.type
FROM
  `httparchive.pages.*`,
  UNNEST(features(payload)) AS feature
WHERE
  payload IS NOT NULL

This query uses a UDF named features to parse the JSON and extract all feature names or IDs according to their feature type (HTML/JS, CSS, AnimatedCSS).

Chrome Status uses this dataset to show a dashboard of feature usage over time.

SELECT
  *
FROM
  `httparchive.blink_features.usage`
WHERE
  feature = 'FormElement'
ORDER BY
  yyyymmdd DESC,
  client

1 Like