Technologies Table


#1

Hi. Please excuse me if my questions have been covered in another post. If so, please direct me to the related post - I have been looking for a while and have not found anything.

I have been looking at the technologies table which contains some very interesting/useful data. However, I have two very specific questions.

  1. Can a site’s framework info be trusted? I ask because I have seen some posts that discuss the difficulties/challenges in determining the framework a given site uses. I would like to see whether there is a correlation between framework and performance over time. Perhaps there is a better way…?

  2. I pulled data from the technologies table over time and something struck me immediately - there are some sites that do not have any entries. For example, I am curious about amazon.com and cannot find any data recorded for Amazon. So I have two related questions. The first is why would a particular site be missing? The second is can a site “opt out” or prevent this data from being collected?

All input is much appreciated. Thank you.


#2

Hey Greg! Good questions.

Here are some links for more context behind the dataset:




To your questions:

The data is only as good as its detection. We use Wappalyzer to do the detection in WebPageTest. While we’re reasonably confident in the detection, there may be some blind spots, for example, frameworks that hide their presence from the global scope. @developit has made some efforts to improve this detection upstream in Wappalyzer.

I think the data is solid enough to do an analysis of framework performance (e.g., see CMS Performance for a similar analysis).
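
As a rough starting point for that kind of analysis, here is a sketch that joins the technologies table to the summary_pages table for the same crawl and reports a median load time per framework. It rests on several assumptions you should verify first: that the technologies table has `category` and `app` columns, that `'JavaScript Frameworks'` is the exact category label (run SELECT DISTINCT category to check), and that summary_pages exposes an `onLoad` column in milliseconds. To look at the trend over time you would repeat it against other crawl dates.

```sql
-- Sketch only: column names and the category label are assumptions to verify.
SELECT
  app,
  COUNT(DISTINCT url) AS pages,
  -- Approximate median of the onLoad time across pages using this app
  APPROX_QUANTILES(onLoad, 1000)[OFFSET(500)] AS median_onload_ms
FROM
  `httparchive.technologies.2019_02_01_desktop`
JOIN
  `httparchive.summary_pages.2019_02_01_desktop`
USING
  (url)
WHERE
  category = 'JavaScript Frameworks'
GROUP BY
  app
ORDER BY
  pages DESC
```

Keep in mind this only shows correlation: sites that pick a given framework may differ in many other ways that affect load time.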

Are you looking up the right URL? For example, this query shows 78 Amazon URLs in the most recent technologies table:

SELECT
  DISTINCT url
FROM
  `httparchive.technologies.2019_02_01_desktop`
WHERE
  NET.REG_DOMAIN(url) = 'amazon.com'

and there are some entries for https://www.amazon.com/ specifically (AWS and Cloudfront).
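
To see what was detected for one specific page, you can filter on the exact URL instead of the registered domain. This is a minimal sketch that assumes the table has `category` and `app` columns alongside `url`. Note that the URL has to match exactly, including the scheme, the `www.` prefix, and the trailing slash.

```sql
-- Sketch only: `category` and `app` are assumed column names.
SELECT
  category,
  app
FROM
  `httparchive.technologies.2019_02_01_desktop`
WHERE
  url = 'https://www.amazon.com/'
ORDER BY
  category,
  app
```

If this returns nothing for a site you expect to see, try the NET.REG_DOMAIN query above first to find out which URL variants are actually in the table.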