Getting Column Headers from .CSV File


#1

Continuing the discussion from How to download the HTTP Archive data:

I downloaded the .csv version successfully. However, when I open the file, the labels for the columns are missing. For example, I have no clue which column represents Speed Index. I was able to work out each column by looking at the dataset in BigQuery, but there must be an easier way to do this.


#2

First off, the column definitions can be found in the table creation scripts. Each table's column list looks like this:

        INSERT INTO `pages` (`pageid`, `createDate`, `archive`, `label`, `crawlid`, `wptid`, `wptrun`, `url`, `urlShort`, `urlhash`, `cdn`, `startedDateTime`, `TTFB`, `renderStart`, `onContentLoaded`, `onLoad`, `fullyLoaded`, `visualComplete`, `PageSpeed`, `SpeedIndex`, `rank`, `reqTotal`, `reqHtml`, `reqJS`, `reqCSS`, `reqImg`, `reqGif`, `reqJpg`, `reqPng`, `reqFont`, `reqFlash`, `reqJson`, `reqOther`, `bytesTotal`, `bytesHtml`, `bytesJS`, `bytesCSS`, `bytesImg`, `bytesGif`, `bytesJpg`, `bytesPng`, `bytesFont`, `bytesFlash`, `bytesJson`, `bytesOther`, `bytesHtmlDoc`, `numDomains`, `maxDomainReqs`, `numRedirects`, `numErrors`, `numGlibs`, `numHttps`, `numCompressed`, `numDomElements`, `maxageNull`, `maxage0`, `maxage1`, `maxage30`, `maxage365`, `maxageMore`, `gzipTotal`, `gzipSavings`, `_connections`, `_adult_site`, `avg_dom_depth`, `document_height`, `document_width`, `localstorage_size`, `sessionstorage_size`, `num_iframes`, `num_scripts`, `doctype`, `meta_viewport`, `reqAudio`, `reqVideo`, `reqText`, `reqXml`, `reqWebp`, `reqSvg`, `bytesAudio`, `bytesVideo`, `bytesText`, `bytesXml`, `bytesWebp`, `bytesSvg`, `num_scripts_async`, `num_scripts_sync`, `usertiming`)

	INSERT INTO `pagesmobile` (`pageid`, `createDate`, `archive`, `label`, `crawlid`, `wptid`, `wptrun`, `url`, `urlShort`, `urlhash`, `cdn`, `startedDateTime`, `TTFB`, `renderStart`, `onContentLoaded`, `onLoad`, `fullyLoaded`, `visualComplete`, `PageSpeed`, `SpeedIndex`, `rank`, `reqTotal`, `reqHtml`, `reqJS`, `reqCSS`, `reqImg`, `reqGif`, `reqJpg`, `reqPng`, `reqFont`, `reqFlash`, `reqJson`, `reqOther`, `bytesTotal`, `bytesHtml`, `bytesJS`, `bytesCSS`, `bytesImg`, `bytesGif`, `bytesJpg`, `bytesPng`, `bytesFont`, `bytesFlash`, `bytesJson`, `bytesOther`, `bytesHtmlDoc`, `numDomains`, `maxDomainReqs`, `numRedirects`, `numErrors`, `numGlibs`, `numHttps`, `numCompressed`, `numDomElements`, `maxageNull`, `maxage0`, `maxage1`, `maxage30`, `maxage365`, `maxageMore`, `gzipTotal`, `gzipSavings`, `_connections`, `_adult_site`, `avg_dom_depth`, `document_height`, `document_width`, `localstorage_size`, `sessionstorage_size`, `num_iframes`, `num_scripts`, `doctype`, `meta_viewport`, `reqAudio`, `reqVideo`, `reqText`, `reqXml`, `reqWebp`, `reqSvg`, `bytesAudio`, `bytesVideo`, `bytesText`, `bytesXml`, `bytesWebp`, `bytesSvg`, `num_scripts_async`, `num_scripts_sync`, `usertiming`)


	INSERT INTO `requests` (`requestid`, `pageid`, `startedDateTime`, `time`, `method`, `url`, `urlShort`, `redirectUrl`, `firstReq`, `firstHtml`, `reqHttpVersion`, `reqHeadersSize`, `reqBodySize`, `reqCookieLen`, `reqOtherHeaders`, `status`, `respHttpVersion`, `respHeadersSize`, `respBodySize`, `respSize`, `respCookieLen`, `expAge`, `mimeType`, `respOtherHeaders`, `req_accept`, `req_accept_charset`, `req_accept_encoding`, `req_accept_language`, `req_connection`, `req_host`, `req_if_modified_since`, `req_if_none_match`, `req_referer`, `req_user_agent`, `resp_accept_ranges`, `resp_age`, `resp_cache_control`, `resp_connection`, `resp_content_encoding`, `resp_content_language`, `resp_content_length`, `resp_content_location`, `resp_content_type`, `resp_date`, `resp_etag`, `resp_expires`, `resp_keep_alive`, `resp_last_modified`, `resp_location`, `resp_pragma`, `resp_server`, `resp_transfer_encoding`, `resp_vary`, `resp_via`, `resp_x_powered_by`, `_cdn_provider`, `_gzip_save`, `crawlid`, `type`, `ext`, `format`)


	INSERT INTO `requestsmobile` (`requestid`, `pageid`, `startedDateTime`, `time`, `method`, `url`, `urlShort`, `redirectUrl`, `firstReq`, `firstHtml`, `reqHttpVersion`, `reqHeadersSize`, `reqBodySize`, `reqCookieLen`, `reqOtherHeaders`, `status`, `respHttpVersion`, `respHeadersSize`, `respBodySize`, `respSize`, `respCookieLen`, `expAge`, `mimeType`, `respOtherHeaders`, `req_accept`, `req_accept_charset`, `req_accept_encoding`, `req_accept_language`, `req_connection`, `req_host`, `req_if_modified_since`, `req_if_none_match`, `req_referer`, `req_user_agent`, `resp_accept_ranges`, `resp_age`, `resp_cache_control`, `resp_connection`, `resp_content_encoding`, `resp_content_language`, `resp_content_length`, `resp_content_location`, `resp_content_type`, `resp_date`, `resp_etag`, `resp_expires`, `resp_keep_alive`, `resp_last_modified`, `resp_location`, `resp_pragma`, `resp_server`, `resp_transfer_encoding`, `resp_vary`, `resp_via`, `resp_x_powered_by`, `_cdn_provider`, `_gzip_save`, `crawlid`, `type`, `ext`, `format`)
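If you have one of these statements as text, a small script can pull out the column names so you don't have to copy them by hand. This is a minimal sketch, assuming the statement has the exact shape shown above (backtick-quoted table and column names in one parenthesized list); `columns_from_insert` is a hypothetical helper name, not part of any HTTP Archive tooling.

```python
import re

# Matches the start of a statement like: INSERT INTO `table` (`col1`, `col2`, ...)
# Group 1 is the table name, group 2 is everything inside the parentheses.
_INSERT_RE = re.compile(r"INSERT INTO `(\w+)`\s*\(([^)]*)\)")


def columns_from_insert(statement: str) -> tuple[str, list[str]]:
    """Return (table_name, column_names) parsed from one INSERT statement."""
    match = _INSERT_RE.search(statement)
    if match is None:
        raise ValueError("no INSERT INTO `table` (...) column list found")
    table, column_list = match.groups()
    # Column names are backtick-quoted and comma-separated.
    columns = re.findall(r"`([^`]+)`", column_list)
    return table, columns
```

For example, passing the `pages` statement above returns 88 names, and `columns.index("SpeedIndex")` gives 19, the zero-based position of the Speed Index column in each row.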

However, the number of fields in the pages tables can be 73 or 88, depending on when the crawl ran, because a year ago we collected slightly different data than we do now.
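Because the field count varies by crawl, it is worth checking that a column list actually fits a file before attaching it as a header. This is a minimal sketch, assuming the download is plain comma-separated CSV with standard double-quote quoting (the export's exact dialect isn't stated in this thread); `add_header` is a hypothetical helper name. It writes the header first, then copies each row, and stops with an error on the first row whose length doesn't match.

```python
import csv
from typing import TextIO


def add_header(src: TextIO, dst: TextIO, header: list[str]) -> int:
    """Copy CSV rows from src to dst, writing `header` as the first row.

    Raises ValueError on the first row whose field count differs from the
    header (e.g. a 73-field crawl paired with an 88-name column list).
    Rows before the bad one have already been written to dst at that point.
    Returns the number of data rows copied.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst)
    writer.writerow(header)
    count = 0
    for line_no, row in enumerate(reader, start=1):
        if len(row) != len(header):
            raise ValueError(
                f"row {line_no} has {len(row)} fields, header has {len(header)}"
            )
        writer.writerow(row)
        count += 1
    return count
```

When using it on real files, open both with `newline=""`, as the `csv` module documentation recommends, and write to a new file rather than overwriting the original, since a mismatch leaves the output incomplete.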

Hope this helps. Let me know which specific month and table you want to analyse, and I can help you further.


#3

Thanks. I had already found those the way you describe. It would be nice, though, if the headers were included by default.