Pages and Requests table origin

Hi there,

I’m trying to better understand how this tool works. Is the pages table generated from the requests table for a specific pageid, or is the pages table created from a totally separate run of WebPageTest?

I’m asking to make sure I correctly understand an analysis I’m trying to run. For example, if I wanted to see what server software was used for a web page, could I simply look at the server software in the requests table for the row where firstHtml is true (since there doesn’t seem to be a server software field in the pages table), or is that incorrect?

The pages table contains one row for every web page tested. The latest crawl has about 500,000 pages. For example, the row where url = 'http://www.microsoft.com/' has a unique page ID of 78071569. The row contains summary statistics about the page.

The requests table contains one row for every request in a page’s test. The latest crawl has about 50,000,000 requests, or an average of 100 requests per page. There are 167 rows with Microsoft’s page ID, and each of those has its own unique request ID.
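To make the one-to-many relationship concrete, here is a minimal sketch that uses Python's built-in sqlite3 module with toy data instead of the real BigQuery tables. The pageid and url columns mirror the ones mentioned above; the requestid column name, the second page, and all sub-resource URLs are assumptions made up for illustration.

```python
import sqlite3

# In-memory toy database; this only imitates the shape of the real tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pages (pageid INTEGER PRIMARY KEY, url TEXT);
    CREATE TABLE requests (
        requestid INTEGER PRIMARY KEY,   -- assumed column name
        pageid    INTEGER REFERENCES pages(pageid),
        url       TEXT
    );
""")

# One row per tested page.
conn.executemany(
    "INSERT INTO pages (pageid, url) VALUES (?, ?)",
    [
        (78071569, "http://www.microsoft.com/"),
        (1, "http://example.com/"),  # hypothetical second page
    ],
)

# Many rows per page, all carrying that page's pageid.
conn.executemany(
    "INSERT INTO requests (pageid, url) VALUES (?, ?)",
    [
        (78071569, "https://www.microsoft.com/en-us/"),
        (78071569, "https://www.microsoft.com/style.css"),  # made-up sub-resource
        (78071569, "https://www.microsoft.com/logo.png"),   # made-up sub-resource
        (1, "http://example.com/"),
    ],
)

# Join requests back to their page and count them per page.
requests_per_page = dict(conn.execute("""
    SELECT p.url, COUNT(r.requestid)
    FROM pages AS p
    JOIN requests AS r ON r.pageid = p.pageid
    GROUP BY p.pageid
"""))
print(requests_per_page)
```

Against the real data, the same join on pageid would return 167 for Microsoft's row rather than the toy count here.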

Here’s a graphical explanation:

So if you wanted to find out the server software used for Microsoft’s home page, you could do something like this:

SELECT
  url,
  resp_server
FROM
  [httparchive:runs.2017_06_15_requests]
WHERE
  firstHtml = true AND
  pageid = 78071569

Results:

url                                 resp_server
https://www.microsoft.com/en-us/    Microsoft-IIS/8.5

The resp_server field corresponds to the Server response header. Be aware that it’s not a required header and many websites omit it from the response, in which case the field is empty in the query results.
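If you run this kind of lookup across many pages, you will probably want to handle the missing header explicitly. The sketch below again uses sqlite3 with toy rows, not the real tables. Whether an omitted header shows up as an empty string or as NULL in the real data is an assumption here, so the query covers both; the '(not set)' label and the two example.* pages are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE requests (
        pageid      INTEGER,
        url         TEXT,
        firstHtml   INTEGER,  -- 1 for the page's first HTML response
        resp_server TEXT
    )
""")
conn.executemany(
    "INSERT INTO requests VALUES (?, ?, ?, ?)",
    [
        (78071569, "https://www.microsoft.com/en-us/", 1, "Microsoft-IIS/8.5"),
        (78071569, "https://www.microsoft.com/logo.png", 0, "Microsoft-IIS/8.5"),  # made-up sub-resource
        (1, "http://example.com/", 1, ""),    # header omitted, stored as empty string
        (2, "http://example.org/", 1, None),  # header omitted, stored as NULL
    ],
)

# Keep only each page's first HTML response and label missing headers.
server_by_page = dict(conn.execute("""
    SELECT pageid, COALESCE(NULLIF(resp_server, ''), '(not set)')
    FROM requests
    WHERE firstHtml = 1
"""))
print(server_by_page)
```

Filtering on firstHtml keeps one row per page, so sub-resources served by a CDN or third party don't get mistaken for the page's own server.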