Identifying the GOV.UK Design System usage across the web

Nooshu · February 11, 2020, 12:12am

The Government Digital Service (GDS) created and maintains the GOV.UK Design System, a set of styles, components and patterns that make UK Government services look like GOV.UK. The code for these styles, components and patterns is stored in the GOV.UK Frontend repository; it is all open source and free to examine and use.

We as a department encourage all government departments to use it, but we can’t enforce it. As every department is essentially its own separate entity, we have very little visibility of who is actually using it and where (unless they approach us for support or leave us feedback). This creates problems when we are asked to provide adoption numbers for central government services and, more recently, local councils too.

Rick has suggested three approaches: adding detection code to Wappalyzer, creating a one-time custom metric, and examining static HTML response bodies for patterns that match the Design System’s code base.

Opening this thread to continue the discussion and analysis.

paulcalvano · February 11, 2020, 4:22am

Examining HTML response bodies sounds like a good way to start. However, due to the size of that table, I’d suggest creating a smaller subset of it with just the GOV.UK sites that are of interest. This way you can search for adoption across UK Government services first.

A quick query shows 2151 sites with the hostname matching “%.gov.uk”. Are there any other domain names you’d like to see included in an excerpt?

Some example URLs that match this query -

Nooshu · February 11, 2020, 11:09am

Hi @paulcalvano,
Thanks for the reply. Yes, a smaller subset would be a great idea to start with, as that is where we are most likely to see adoption. The *.gov.uk domain would be the only one to include in the excerpt. We do have an issue that services tend to sit on the *.service.gov.uk domain, and our service manual asks these domains to stop search engines crawling them, but I’d guess there isn’t any way around this?

I’m currently looking into the HTML patterns that will be common to a service that uses the Design System to hang further searches off.

paulcalvano · February 11, 2020, 12:05pm

Great. I’ve created a smaller (1.5GB) table that we can use to keep the query costs down. The table httparchive.scratchspace.gov_uk_response_bodies_2020_01_mobile contains the response bodies for 40,183 resources across 2367 .gov.uk sites.

Unfortunately I did not find any service.go.uk domains in there, so these are likely not included in the HTTP Archive.

Nooshu · February 11, 2020, 1:53pm

Thanks Paul! That data set looks much more reasonable than the 2.5TB I saw listed!

So I’ve now raised a PR with Wappalyzer. In it we hang our detection off the existence of a <body> with a govuk-template__body class, and also look for an anchor with govuk-link as a class. These are both fundamental to the styling, and our guidance pushes teams who need to make custom modifications to use BEM, so in theory these classes should exist somewhere in the codebase.

We could get even more granular by looking for the specific font names we use, since they are highly distinctive. But I’d hope the above is enough.

RE: Service domains. I’m not sure if you made a typo in the query? You have service.go.uk listed above rather than service.gov.uk. I’ve just run a quick query using the SQL you listed in your first reply and found 63 sites (there are around 240 in total according to our internal team), so many will likely be excluded.

Nooshu · February 12, 2020, 3:46pm

For reference: sites that exhibit the patterns we are looking for include:

rviscomi · February 12, 2020, 4:02pm

Here’s an example query using @paulcalvano’s scratchspace table that matches sites having govuk-template__body or govuk-link in their HTML.

SELECT
  url
FROM
  `httparchive.scratchspace.gov_uk_response_bodies_2020_01_mobile`
WHERE
  page = url AND
  REGEXP_CONTAINS(body, '(govuk-template__body|govuk-link)')

The regexp is overly simple and doesn’t distinguish between the patterns being found on specific tags, but the strings are unique enough that it shouldn’t matter much.

All 5 of the known websites are in the results:

https://pay-roadside-fine.service.gov.uk/
https://www.get-disability-work-support.service.gov.uk/
https://childcare.tax.service.gov.uk/
https://www.insolvencydirect.bis.gov.uk/
https://findapprenticeshiptraining.apprenticeships.education.gov.uk/
https://eucitizensrights.campaign.gov.uk/
https://mobile.learnerview.ofsted.gov.uk/
https://www.signin.service.gov.uk/
http://www.dft.gov.uk/
https://scotlis.ros.gov.uk/
http://www.hmrc.gov.uk/
https://www.crowncommercial.gov.uk/
https://www.vmd.defra.gov.uk/
http://apps.environment-agency.gov.uk/
https://euexit.campaign.gov.uk/
http://open.justice.gov.uk/
https://www.ipo.gov.uk/
https://auth.apply-for-innovation-funding.service.gov.uk/
https://rhi.ofgem.gov.uk/
https://visas-immigration.service.gov.uk/
https://www.tax.service.gov.uk/
https://nationalcareersservice.direct.gov.uk/
https://signon.publishing.service.gov.uk/
https://vehicleenquiry.service.gov.uk/
https://www.homeofficesurveys.homeoffice.gov.uk/
https://view-and-prove-your-rights.homeoffice.gov.uk/
https://www.transportoffice.gov.uk/
https://www.dft.gov.uk/
https://design-system.service.gov.uk/
http://customs.hmrc.gov.uk/
https://www.update-student-loan-employment-details.service.gov.uk/
https://www.digitalmarketplace.service.gov.uk/
https://www.reminders.mot-testing.service.gov.uk/
https://childmaintenanceservice.direct.gov.uk/
https://helpwithcourtfees.service.gov.uk/
http://www.mhra.gov.uk/
https://www.cica.gov.uk/
https://childcare-support.tax.service.gov.uk/
https://www.registertovote.service.gov.uk/
https://www.gov.uk/
https://register.getintoteaching.education.gov.uk/
https://find-postgraduate-teacher-training.education.gov.uk/
https://get-information-schools.service.gov.uk/
https://graduatetalentpoolsearch.direct.gov.uk/
https://apply-for-innovation-funding.service.gov.uk/
https://learnerview.ofsted.gov.uk/
https://www.ethnicity-facts-figures.service.gov.uk/
https://smokecontrol.defra.gov.uk/
https://www.payments.service.gov.uk/
https://update-your-details.homeoffice.gov.uk/
https://www.judicialappointments.gov.uk/
https://www.wifi.service.gov.uk/
https://euexitbusiness.campaign.gov.uk/
https://www.update-student-loan-employment-details.service.gov.uk/
https://www.dartford-crossing-charge.service.gov.uk/
https://teaching-vacancies.service.gov.uk/
https://nationalcareers.service.gov.uk/
https://www.apply-civil-service-fast-stream.service.gov.uk/
https://www.verify.service.gov.uk/
https://customs.hmrc.gov.uk/
https://www.notifications.service.gov.uk/
https://www.get-information-schools.service.gov.uk/
https://secure.hmce.gov.uk/
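
If the loose match above ever produces too many false positives, one possible refinement is to tie each class to the tag it is expected on. The following is a minimal sketch rather than a tested query: it assumes class attributes are double-quoted, it relies on BigQuery's RE2 regular expression syntax, and the has_template_body and has_govuk_link column names are made up for illustration.

```sql
SELECT
  url,
  has_template_body,
  has_govuk_link
FROM (
  SELECT
    url,
    -- govuk-template__body inside the class attribute of a <body> tag
    REGEXP_CONTAINS(body, r'(?i)<body\b[^>]*\bclass="[^"]*\bgovuk-template__body\b') AS has_template_body,
    -- govuk-link, or a hyphenated variant of it, inside the class attribute of an <a> tag
    REGEXP_CONTAINS(body, r'(?i)<a\b[^>]*\bclass="[^"]*\bgovuk-link\b') AS has_govuk_link
  FROM
    `httparchive.scratchspace.gov_uk_response_bodies_2020_01_mobile`
  WHERE
    page = url
)
WHERE
  has_template_body OR has_govuk_link
```

Keeping the two flags as separate columns makes it possible to see which pattern matched on each site. A regexp is still only an approximation of real HTML parsing, so pages with single-quoted or unquoted attributes would be missed.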

Nooshu · February 12, 2020, 5:35pm

Thank you @paulcalvano / @rviscomi. This is really useful! From this example we can now expand this out to cover other areas we want to track, like accessibility (ARIA attributes, etc.) and our legacy code bases.

Is the ability to create smaller data sets with the response bodies something I can do (e.g. when the next crawl data is released)? Or is it only possible as a maintainer?

rviscomi · February 12, 2020, 6:16pm

Your personal BigQuery account comes with its own storage space, so you can create your own tables and work on subsets of the public dataset as needed. From the BigQuery UI you can configure any query to write its results to a destination table of your choosing, similar to how Paul created the scratchspace table.
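
As a rough sketch of the same idea in SQL rather than through the UI, a CREATE TABLE ... AS SELECT statement can materialise a subset in one step. The destination `my-project.my_dataset` is a placeholder for your own project and dataset, and the source table name assumes the January 2020 mobile crawl follows the usual response_bodies naming. The query is likely to read the full response bodies table once, so it is worth running sparingly.

```sql
-- Sketch only: `my-project.my_dataset` is a hypothetical destination,
-- and the source table name is an assumption about the crawl naming.
CREATE OR REPLACE TABLE `my-project.my_dataset.gov_uk_response_bodies_2020_01_mobile` AS
SELECT
  page,
  url,
  body
FROM
  `httparchive.response_bodies.2020_01_01_mobile`
WHERE
  -- Keep every resource requested by a page whose hostname ends in .gov.uk
  NET.HOST(page) LIKE '%.gov.uk'
```

Later queries can then point at the smaller table instead of the public one, which keeps the bytes scanned per query down.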

Nooshu · February 13, 2020, 5:09pm

Is there a way to detect 301 redirects and filter them out when looking over a dataset?

A number of the URLs are redirecting back to the main www.gov.uk domain and being flagged as false positives.

rviscomi · February 13, 2020, 5:48pm

Yes, the summary_requests tables include a firstHtml field which would be set to true for the canonical URL.
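
A minimal sketch of how that field might be used to list pages that ended up on a different host, assuming the summary_pages and summary_requests tables for the same crawl share a pageid column and each expose a url column (the table names below are assumptions based on the January 2020 mobile crawl):

```sql
SELECT
  p.url AS page,
  r.url AS final_url
FROM
  `httparchive.summary_pages.2020_01_01_mobile` AS p
JOIN
  `httparchive.summary_requests.2020_01_01_mobile` AS r
USING
  (pageid)
WHERE
  -- Only the request that served the main HTML document
  r.firstHtml AND
  NET.HOST(p.url) LIKE '%.gov.uk' AND
  -- A differing hostname suggests the page redirected elsewhere
  NET.HOST(p.url) != NET.HOST(r.url)
```

Flipping the last condition to an equality would instead keep only the pages that were served from the host that was originally requested.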

We got the initial URLs from the Chrome UX Report, which represents the websites real Chrome users visit, so I wouldn’t expect to see too many HTML redirects. @Nooshu do you know if some of these sites are redirecting because our tests are unauthenticated and/or run from the US? Do they always redirect or only under certain conditions?

Nooshu · February 13, 2020, 5:59pm

A lot of these will redirect in all conditions. I only know of a single (and very recent) service with any sort of geo-blocking, so it won’t be that. And in terms of authentication, we don’t have any form of user login for access to most services. HMRC (the tax department) requires some authentication for its services. Some examples of redirects:

I will ask about the nature of redirects. Many of them could be for legacy reasons.
And some that don’t are:

rviscomi · February 14, 2020, 9:41pm

Opened this issue to investigate redirects more generally: https://github.com/HTTPArchive/httparchive.org/issues/197

Nooshu · February 20, 2020, 9:57am

Small update on this. Version 5.9.4 of Wappalyzer now detects GOV.UK Frontend (Design System).

patmeenan · February 20, 2020, 7:34pm

I just updated the Wappalyzer definitions in the agent so it will be picked up in the next crawl (and any WPT tests)
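
Once a crawl with the new definitions has been published, a query along these lines could list the detected sites. This is a sketch under two assumptions: that the crawl in question lands in a technologies table named for 1 March 2020, and that the technology is reported under the app name 'GOV.UK Frontend'. Both should be checked against the actual dataset before relying on the results.

```sql
SELECT
  url
FROM
  -- Assumed table name for the first crawl after the definitions update
  `httparchive.technologies.2020_03_01_mobile`
WHERE
  -- Assumed app name as reported by Wappalyzer
  app = 'GOV.UK Frontend'
```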

Nooshu · February 21, 2020, 3:34pm

Awesome! Thanks @patmeenan!
