Determining the frequency of apng


#1

While apng images do technically have their own mime-type (image/apng), it does not show up at all in the archive. Every instance I do find through googling seems to ship them as .png images, with image/png headers. You can determine if a png image is apng by checking whether or not the string “acTL” occurs in the file, but I am not sure how, or if, I can write a query to find instances of apng images.

Is it possible?


#2

Hi @patrickkettner. I haven’t seen APNG images shipped with an APNG extension either, and looking at the HTTP Archive requests table it looks like there are only 2 objects with an apng extension.

SELECT url
FROM `httparchive.summary_requests.2018_04_01_desktop`
WHERE ext = "apng"

The HTTP Archive does capture response payloads for some objects (html, css, js), but binary images are not included in that dataset. So there is no way to specifically search for the string acTL in the payload of the .png responses via BigQuery.

An alternative solution for you to find this information is to extract a list of all PNGs…

SELECT distinct url
FROM `httparchive.summary_requests.2018_04_01_desktop`
WHERE url LIKE "%.png%" and respBodySize > 1024

or a random sample of PNGs…

SELECT url
FROM (
   SELECT distinct url, rand() as random
   FROM `httparchive.summary_requests.2018_04_01_desktop`
   WHERE url LIKE "%.png%" and respBodySize > 1024
   ORDER BY random ASC
)
LIMIT 100000

And then download the results to a CSV file so that you can process them outside of BigQuery. For example, if I were to process this list using some bash commands, the following command would:

  • Loop through the data files outputted from the above queries
  • perform a curl request, outputting the response body to stdout
  • set a timeout on the curl request to 5s to avoid timeouts stalling you
  • grep for the payload in the response
  • output a url only if the “acTL” payload is detected
while read -r i; do curl -sk -m 5 "$i" 2>&1 | grep -q "acTL" && echo "$i"; done < png.csv

Hope that helps! Let us know if you find anything interesting :slight_smile:


#3

If it would be useful on an ongoing basis I’d recommend filing an issue to report the “sniffed content type” or something like that and we can add support to WebPageTest to report content types based on the binary files (assuming it is sniffable from the first few bytes of the file).

The agents already sniff the image types as part of the optimization checks so it doesn’t feel like a stretch to add, it’s just not currently possible.


#4

image/apng has not been registered, although image/vnd.mozilla.apng has:
https://www.iana.org/assignments/media-types/media-types.xhtml#image

I agree that knowing the usage of apng would be very useful.


#5

@paulcalvano - cheers. I downloaded the full list and am downloading the 4.8M pngs as I type.

@patmeenan It would be useful. First few examples that come up when you google apng have the acTL magic string ~40 bytes into the file. Not sure if that counts as the first few bytes.
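For reference, here is a rough sketch of where that ~40 byte figure comes from, assuming the acTL chunk sits directly after IHDR. That is an assumption: other chunks can come in between, which pushes acTL further into the file. The `make_chunk` helper and the 1x1 image parameters are made up for illustration.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # fixed 8-byte PNG signature


def make_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: 4-byte length, 4-byte type, data, 4-byte CRC."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


# IHDR always carries 13 bytes of data (hypothetical 1x1 RGBA image here).
ihdr = make_chunk(b"IHDR", struct.pack(">IIBBBBB", 1, 1, 8, 6, 0, 0, 0))
# acTL carries two 32-bit fields: num_frames and num_plays.
actl = make_chunk(b"acTL", struct.pack(">II", 1, 0))

header = PNG_SIGNATURE + ihdr + actl
offset = header.find(b"acTL")
print(offset)  # 37 = 8 (signature) + 25 (IHDR chunk) + 4 (acTL length field)
```

So a file with acTL right after IHDR has the string at byte 37, but a PNG with a large chunk (for example a color profile) before acTL would have it much later.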

@svgeesus ah shoot, my bad. I could have sworn it was registered. Thanks for the correction!


#6

It isn’t registered because it isn’t part of the PNG spec, having failed a vote years back.

More recently I asked for new discussion and a re-vote because of widespread adoption. Little interest.


#7

apng specifically should be easy enough since the optimization check actually walks the entire PNG looking for non-image tags as bloat. Finding PNGs with the acTL chunk (anywhere in them) should be easy enough. May be able to have something for the 5/15 crawl, just need to figure out where to plumb it into the results.
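To illustrate the idea (this is only a sketch, not WebPageTest's actual code), walking the chunk list and checking each chunk type could look roughly like this. The function names are made up for the example, and it assumes the whole file is already in memory.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def iter_chunk_types(data: bytes):
    """Yield the 4-byte type of each chunk in a PNG byte string."""
    if not data.startswith(PNG_SIGNATURE):
        return
    pos = len(PNG_SIGNATURE)
    # Each chunk is: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
    while pos + 8 <= len(data):
        (length,) = struct.unpack(">I", data[pos:pos + 4])
        yield data[pos + 4:pos + 8]
        pos += 8 + length + 4  # skip header, data and CRC


def has_actl_chunk(data: bytes) -> bool:
    """True if any chunk in the file is of type acTL."""
    return any(chunk_type == b"acTL" for chunk_type in iter_chunk_types(data))
```

Compared to a plain string search, this only matches acTL as a chunk type, so the four letters appearing inside a text comment or pixel data would not count as a hit.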


#8

Processing finally finished.

Here is a list of the 4,437,360 files that hit on @paulcalvano’s query (gzipped version).

I split the list into several chunks to parallelize, and then ran the following to filter:

while read -r pngURL; do curl -sk -m 5 -r 0-99 "$pngURL" 2>&1 | grep -q acTL && echo "$pngURL" >> /tmp/results-apng; done < png_list

The final result of that is available here.

The answer to my original question is that apngs are ~0.00275% (122 of the 4.4m) of PNG files.


#9

Given the URL of a PNG file, how would I be able to get the domain of the page that requested the image?


#10

For example:

SELECT
  DISTINCT NET.REG_DOMAIN(page) 
FROM
  `httparchive.requests.2018_04_01_desktop`
WHERE
  url = 'http://maps.gstatic.com/mapfiles/transparent.png'

Multiple pages may request the same image though, so expect multiple results.