Detect AMP as a technology

I would like to see AMP detection included among the technologies a website uses.

Docs on what to look for: https://amp.dev/documentation/guides-and-tutorials/optimize-and-measure/discovery/?format=websites

Hi @kevinsproles. We include detections by the Wappalyzer tool, which does include an AMP detector. The tool’s output is saved to the httparchive.technologies dataset on BigQuery. Here’s an example of how to query for the URLs detected as having AMP:

#standardSQL
# 1.7 GB
SELECT
  COUNT(DISTINCT url) AS urls
FROM
  `httparchive.technologies.2020_01_01_desktop`
WHERE
  app = 'AMP'

The result is 2,647 URLs.

Is that what you were looking for?

@rviscomi - AMP is only relevant for mobile, so why does your query use the ‘desktop’ table?

Also, I am keen to see AMP adoption by eCommerce sites. As you know, I am very new to this. Can you please advise which tables to join to get AMP adoption for eCommerce sites?

@rockeynebhwani the name can be misleading. AMP websites can definitely be used on desktop.

The same technologies dataset that detects AMP could be used to detect websites that use an ecommerce platform. Here’s an example query of joining the two groups together to get the AMP+ecommerce websites for mobile:

SELECT
  COUNT(DISTINCT url) AS urls
FROM (
  SELECT
    url
  FROM
    `httparchive.technologies.2020_06_01_mobile`
  WHERE
    app = 'AMP')
JOIN (
  SELECT
    url
  FROM
    `httparchive.technologies.2020_06_01_mobile`
  WHERE
    category = 'Ecommerce')
USING (url)

The results show that there aren’t many; only 224 URLs.

Thanks @rviscomi for the query. I extracted the URLs (list at the end of this comment), and none of the big websites that use AMP are on the list. A closer look shows that the current detection for AMP+eCommerce is not going to give us a clear picture.

None of the top sites listed in this blog post are part of the result set - https://www.plumrocket.com/blog/2019/02/google-amp-on-the-most-popular-e-commerce-websites/ (it’s possible that some of the sites mentioned in the post have stopped using AMP)

Here are some examples -

  1. direct.asda.com (AMP version URL is amp.direct.asda.com)
    Wappalyzer identifies direct.asda.com as an eCommerce site but fails to identify amp.direct.asda.com as one. It looks like Wappalyzer’s detection of ‘Salesforce Commerce Cloud’ sites could be improved by looking for certain types of cookies. The current detection is possibly going to miss any proxy-based solutions.

  2. www.snapdeal.com
    Example product page - https://www.snapdeal.com/product/redmi-note-8-pro-128gb/6917529702717010640
    AMP equivalent - https://m.snapdeal.com/product/redmi-note-8-pro-128gb/6917529702717010640/amp

Wappalyzer fails to detect www.snapdeal.com as an eCommerce site because it does not use any standard platform.

This is a bigger problem: Wappalyzer-based detection misses a lot of custom eCommerce websites. Wappalyzer is also going to miss a lot of eCommerce sites built on micro-services / headless architecture, because it becomes very difficult to identify the underlying eCommerce technology in those setups. I will share some thoughts on how we can improve this in a separate thread.

I think that instead of relying on Wappalyzer detection for AMP eCommerce sites, it’s best to look for the presence of

<link rel="amphtml"

in the markup of the regular (non-AMP) page. This should give a better picture and is a more robust way to find sites using AMP (eCommerce or otherwise).
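
As a rough illustration of that check, here is a minimal sketch in Python that scans a page’s HTML for a link element whose rel contains amphtml and returns its href. The names AmpLinkFinder and find_amphtml_link and the sample markup are hypothetical, made up for this example; this is not how Wappalyzer or HTTP Archive implement detection.

```python
from html.parser import HTMLParser
from typing import Optional


class AmpLinkFinder(HTMLParser):
    """Records the href of the first <link rel="amphtml"> found (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.amp_href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names for us.
        if tag != "link" or self.amp_href is not None:
            return
        attr_map = {name: (value or "") for name, value in attrs}
        # rel is a space-separated token list; compare tokens case-insensitively.
        rel_tokens = attr_map.get("rel", "").lower().split()
        if "amphtml" in rel_tokens:
            self.amp_href = attr_map.get("href", "")


def find_amphtml_link(html: str) -> Optional[str]:
    """Return the AMP URL advertised by the page, or None if there is none."""
    finder = AmpLinkFinder()
    finder.feed(html)
    return finder.amp_href


if __name__ == "__main__":
    # Made-up markup for illustration; the href reuses the AMP URL quoted in this thread.
    sample = (
        '<html><head><link rel="amphtml" '
        'href="https://m.snapdeal.com/product/redmi-note-8-pro-128gb/'
        '6917529702717010640/amp"></head></html>'
    )
    print(find_amphtml_link(sample))
```

Because this looks only at the page’s own markup, it does not depend on recognising the eCommerce platform, which is where the Wappalyzer-based approach falls short for custom sites.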

cc: @jrharalson

URLs from @rviscomi’s query: Sheet

Also, just an observation: the current list has 31 sites with a .vn domain. That’s about 13.8% of the 224 URLs. I wonder why AMP seems to be more popular in Vietnam.
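
For anyone who wants to reproduce that kind of breakdown from the exported sheet, here is a small sketch in Python that counts URLs by top-level domain and computes a percentage share. The function names tld_counts and share_percent and the example URLs are hypothetical; only the figures 31 and 224 come from this thread.

```python
from collections import Counter
from urllib.parse import urlsplit


def tld_counts(urls):
    """Count URLs by the last label of their hostname (for example "vn")."""
    counts = Counter()
    for url in urls:
        host = urlsplit(url).hostname
        if host:
            counts[host.rsplit(".", 1)[-1]] += 1
    return counts


def share_percent(part, total):
    """Percentage of total represented by part, rounded to two decimals."""
    return round(100 * part / total, 2)


if __name__ == "__main__":
    # Placeholder URLs, not taken from the actual result set.
    urls = [
        "https://shop.example.vn/",
        "https://www.example.com/",
        "https://example.vn/product/1",
    ]
    print(tld_counts(urls).most_common())
    # 31 .vn sites out of the 224 URLs returned by the query above.
    print(share_percent(31, 224))  # 13.84
```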

Any improvements to AMP / Ecommerce detection should be implemented upstream in Wappalyzer. I’d encourage you to file an issue and document the false negatives and strategies to mitigate them that you’ve identified.

(Also FYI I’ve edited your post to link to all of those URLs in an external spreadsheet so it doesn’t get marked as spam)

@rviscomi - I raised two feature requests for Wappalyzer


Based on your experience, what are the chances of these getting picked up in time for the eCommerce chapter? It could really help with the coverage if we are able to do this.

@drewzboto @jrharalson

I opened a couple of PRs to Wappalyzer that should improve the coverage, but we are still going to miss a lot of examples (mostly where Wappalyzer fails to detect a site as an eCommerce platform). Let’s run this query again after the next Wappalyzer release and the August HTTP Archive crawl.

Sheet with examples we are going to miss - https://docs.google.com/spreadsheets/d/1wHlgMNTzDYe3gSQgHMAhBWTDxWaiXJxKkp6adMfChWA/edit?usp=sharing