Working with CSV dumps

The HARs are available in queryable form in BigQuery, but those tables get REALLY expensive to query.

The requests tables also have the raw headers in JSON format (extracted from the HARs):

"response":{
   "status":200,
   "statusText":"",
   "headersSize":586,
   "bodySize":1736,
   "headers":[
   {
      "name":"status",
      "value":"200"
   },
   {
      "name":"expect-ct",
      "value":"max-age=604800, report-uri=\"https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct\""
   },
   {
      "name":"strict-transport-security",
      "value":"max-age=31536000"
   },
   {
      "name":"content-encoding",
      "value":"br"
   },
   {
      "name":"cf-cache-status",
      "value":"HIT"
   },
   {
      "name":"expires",
      "value":"Tue, 12 Nov 2019 09:19:05 GMT"
   },
   {
      "name":"vary",
      "value":"Accept-Encoding"
...

The same headers also appear a second time, as raw header strings:

"_headers":{
   "request":[
      "Referer: https://fanatec.com/us-en",
      "User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.87 Safari/537.36 PTST/191105.201111",
      ":method: GET",
      ":authority: fanatec.com",
      ":scheme: https",
      ":path: /themes/Frontend/Fanatec/frontend/_public/src/img/endor-logo_white.svg",
      "user-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.87 Safari/537.36 PTST/191105.201111",
      "accept: image/webp,image/apng,image/*,*/*;q=0.8",
      "sec-fetch-site: same-origin",
      "sec-fetch-mode: no-cors",
      "referer: https://fanatec.com/us-en",
      "accept-encoding: gzip, deflate, br",
      "accept-language: en-US,en;q=0.9",
      "cookie: [128 bytes were stripped]"
   ],
   "response":[
      "status: 200",
      "expect-ct: max-age=604800, report-uri=\"https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct\"",
      "strict-transport-security: max-age=31536000",
      "content-encoding: br",
      "cf-cache-status: HIT",
      "expires: Tue, 12 Nov 2019 09:19:05 GMT",
      "vary: Accept-Encoding",
      "server: cloudflare",
      "last-modified: Tue, 05 Nov 2019 05:19:35 GMT",
      "etag: W/\"5dc10667-12de\"",
      "cache-control: public, max-age=604800, must-revalidate, proxy-revalidate",
      "date: Wed, 06 Nov 2019 15:53:50 GMT",
      "cf-ray: 53183e17af5c6d82-SJC",
      "alt-svc: h3-23=\":443\"; ma=86400",
      "content-type: image/svg+xml",
      "age: 110085",
      ":status: 200"
   ]
},

The cookie/auth headers were stripped by Chrome’s netlog capture. That was only just fixed, so those won’t be accurate, but everything else should be (though parsing the headers automatically in BigQuery can be a pain).
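
If it helps, here is a rough Python sketch of normalizing both header forms shown above once you have a row's JSON payload in hand. The function names are made up for illustration and are not part of any existing tooling, and the sample is a trimmed copy of the payload above. The one subtlety is that HTTP/2 pseudo-headers such as ":status: 200" start with a colon, so a naive split on the first ":" gives the wrong result.

```python
import json
from collections import defaultdict


def parse_raw_header(line):
    """Split one raw header line into (name, value).

    HTTP/2 pseudo-headers such as ":status: 200" begin with a colon, so the
    search for the separator starts at index 1 rather than 0.
    """
    sep = line.find(":", 1)
    if sep == -1:
        return line.strip().lower(), ""
    return line[:sep].strip().lower(), line[sep + 1:].strip()


def headers_from_raw(lines):
    """Build {name: [values]} from a list like _headers.response."""
    out = defaultdict(list)
    for line in lines:
        name, value = parse_raw_header(line)
        out[name].append(value)
    return dict(out)


def headers_from_pairs(pairs):
    """Build {name: [values]} from a list like response.headers."""
    out = defaultdict(list)
    for pair in pairs:
        out[pair["name"].lower()].append(pair["value"])
    return dict(out)


# Trimmed sample copied from the payload above (not a full row).
sample = json.loads(r'''
{
  "response": {"headers": [
    {"name": "status", "value": "200"},
    {"name": "content-encoding", "value": "br"}
  ]},
  "_headers": {"response": [
    "status: 200",
    "content-encoding: br",
    "alt-svc: h3-23=\":443\"; ma=86400",
    ":status: 200"
  ]}
}
''')

raw = headers_from_raw(sample["_headers"]["response"])
pairs = headers_from_pairs(sample["response"]["headers"])
print(raw[":status"], raw["alt-svc"], pairs["content-encoding"])
```

Values are kept as lists because a header name can legitimately repeat, and names are lowercased so the mixed-case entries in the request list (for example "Referer" and "referer") land under one key.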
