How to find how many websites are using Magento?

Hi @nman. There are 2 ways I can think of to identify Magento sites:

  • Many Magento sites have requests for resources containing the string /skin/frontend/, which we can search for in the requests tables
  • Most Magento sites I’ve seen set a Cookie named frontend, so we can look for the presence of those cookies in the HAR files.

Let’s take a look at the resources containing /skin/frontend/ first. This simple query shows us examples of some of the requests matching this pattern:

SELECT url
FROM `httparchive.runs.2018_02_15_requests` r
WHERE r.url LIKE "%/skin/frontend/%"
LIMIT 10

When we look at the results of it, I can see lots of Magento URLs -
image

The requests table has a pageid field, which we can join against the pages table to summarize this. For example, the following query does the same match against the request URL, but joins the pages table and counts the number of requests matching the pattern per page.

SELECT rank, p.url, count(*) frontendreqs
FROM `httparchive.runs.2018_02_15_requests` r
INNER JOIN `httparchive.runs.2018_02_15_pages` p 
ON r.pageid=p.pageid
WHERE r.url LIKE "%/skin/frontend/%"
GROUP BY rank, p.url
ORDER BY rank ASC

The results look like this - and there are 5897 sites -
image

The HAR file approach is a bit more complex because we need to extract the cookie names from the JSON, and there are often more than 1 cookie set per page. I’m using a JavaScript function that @rviscomi shared with me a while back that outputs headers to an array (and then can be UNNESTed to summarize them later). The query counts the number of cookies that are specifically named frontend, and returned 5129 results.

(Note: This query processes 187GB of data, 1/5th of the monthly free tier)

CREATE TEMPORARY FUNCTION getHeaderNames(payload STRING)
RETURNS ARRAY<STRING>
LANGUAGE js AS """
  var $ = JSON.parse(payload);
  var headers = $.response.headers;
  
  var headernames=[];
  j=0;
  for (i in headers) { 
    if (headers[i].name.indexOf('set-cookie') != -1) {
        headernames[j] = headers[i].name + ': ' + headers[i].value;
        j++;
      }
    i++;
   }
  
  try {
    return headernames;
  } catch (e) {
    return [];
  }
""";

SELECT page, COUNT(*) magento_cookies
FROM (
  SELECT page, url, getHeaderNames(lower(payload)) cookienamevalues
  FROM `httparchive.har.2018_02_01_chrome_requests`
)
CROSS JOIN 
   UNNEST(cookienamevalues) AS cookienamevalue
WHERE cookienamevalue LIKE "set-cookie: frontend=%"
GROUP BY page

When I combined the results from both queries and filtered out the duplicate URLs, I found that there are 6165 sites in the HTTP Archive that are likley using Magento.

1 Like