Pages and Requests table origin

amirian · July 7, 2017, 6:50pm

Hi there,

I’m trying to better understand how this tool works. Is the pages table generated from the requests table for a specific pageid, or is the pages table created from a totally separate run of webpagetest?

I’m asking to make sure I understand some analysis I’m trying to run correctly. For example, if I wanted to see what server software was used for a webpage, could I simply look at the server software of the requests table and make sure firstHTML was set to TRUE (since there doesn’t seem to be a server software field for the pages table), or is that incorrect?

rviscomi · July 10, 2017, 8:06pm

The pages table contains one row for every web page tested. The latest crawl has about 500,000 pages. For example, the row where url = 'http://www.microsoft.com/' has a unique page ID of 78071569. The row contains summary statistics about the page.

The requests table contains one row for every request in a page’s test. The latest crawl has about 50,000,000 requests, or an average of 100 requests per page. There are 167 rows with Microsoft’s page ID, and each has its own unique request ID.
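
To make the one-to-many relationship concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for BigQuery. The schema is cut down to a few columns, and everything except Microsoft's page ID and the two Microsoft URLs quoted in this thread is hypothetical toy data (the second page, the request IDs, the extra resource URLs).

```python
import sqlite3


def build_toy_archive():
    """Create an in-memory stand-in for the pages and requests tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pages (pageid INTEGER PRIMARY KEY, url TEXT)")
    conn.execute(
        "CREATE TABLE requests ("
        "requestid INTEGER PRIMARY KEY, pageid INTEGER, url TEXT)"
    )
    # One row per tested page. The second page is hypothetical.
    conn.executemany(
        "INSERT INTO pages VALUES (?, ?)",
        [(78071569, "http://www.microsoft.com/"), (1, "http://example.com/")],
    )
    # One row per request. Request IDs and resource URLs are hypothetical.
    conn.executemany(
        "INSERT INTO requests VALUES (?, ?, ?)",
        [
            (101, 78071569, "https://www.microsoft.com/en-us/"),
            (102, 78071569, "https://www.microsoft.com/style.css"),
            (103, 78071569, "https://www.microsoft.com/app.js"),
            (201, 1, "http://example.com/"),
        ],
    )
    return conn


def requests_per_page(conn):
    """Return {pageid: number of request rows that share that pageid}."""
    rows = conn.execute(
        "SELECT pageid, COUNT(*) FROM requests GROUP BY pageid"
    )
    return dict(rows.fetchall())


if __name__ == "__main__":
    print(requests_per_page(build_toy_archive()))
```

Each page ID appears once in the toy pages table but several times in the toy requests table, which mirrors how the 167 request rows point back to a single Microsoft page row.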

So if you wanted to find out the server software used for Microsoft’s home page, you could do something like this:

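-- BigQuery legacy SQL (note the bracketed table reference)
SELECT
  url,
  resp_server
FROM
  [httparchive:runs.2017_06_15_requests]
WHERE
  firstHtml = true AND  -- only the request flagged as the page's first HTML response
  pageid = 78071569     -- Microsoft's page ID from the pages table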

Results:

url                                 resp_server
https://www.microsoft.com/en-us/    Microsoft-IIS/8.5

The resp_server field corresponds to the Server response header. Be aware that it’s not a required header and many websites omit it, in which case the field is empty when queried.
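
If you want the server software for every page rather than one page ID, one possible approach is to join the two tables on pageid and keep only the firstHtml request. The sketch below again uses Python's sqlite3 module as a local stand-in for BigQuery. The function name server_by_page, the second page, and the non-HTML request row are hypothetical toy data. The sketch also assumes a missing Server header shows up as an empty string or NULL, and turns both into None.

```python
import sqlite3


def server_by_page(conn):
    """Map each page URL to the Server header of its first HTML response.

    Empty or NULL headers are reported as None so they are easy to filter.
    """
    rows = conn.execute(
        """
        SELECT p.url, NULLIF(r.resp_server, '')
        FROM pages AS p
        JOIN requests AS r ON r.pageid = p.pageid
        WHERE r.firstHtml = 1
        ORDER BY p.pageid
        """
    )
    return dict(rows.fetchall())


# Toy data: only the Microsoft rows come from this thread, the rest is made up.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE pages (pageid INTEGER, url TEXT);
    CREATE TABLE requests (
        pageid INTEGER, url TEXT, firstHtml INTEGER, resp_server TEXT
    );
    INSERT INTO pages VALUES (78071569, 'http://www.microsoft.com/');
    INSERT INTO pages VALUES (1, 'http://example.com/');
    INSERT INTO requests VALUES
        (78071569, 'https://www.microsoft.com/en-us/', 1, 'Microsoft-IIS/8.5');
    INSERT INTO requests VALUES
        (78071569, 'https://www.microsoft.com/app.js', 0, 'Hypothetical-CDN');
    INSERT INTO requests VALUES (1, 'http://example.com/', 1, '');
    """
)

if __name__ == "__main__":
    for page_url, server in server_by_page(conn).items():
        print(page_url, server)
```

Filtering on firstHtml before reading resp_server matters because other requests on the same page (scripts, images, third-party resources) may be served by entirely different software.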
