Roundup Tracker - Issues

Issue 2551187

classification
Provide a mixin to save image files in avif format and convert to other image types
Type: rfe Severity: minor
Components: Web interface Versions:
process
Status: closed Resolution: accepted
Assigned To: rouilj Nosy List: rouilj
Priority: low

Created on 2022-01-10 03:27 by rouilj, last changed 2023-05-20 19:41 by rouilj.

Messages
msg7439 Author: [hidden] (rouilj) Date: 2022-01-10 03:27
https://wiki.roundup-tracker.org/MixinClassFileClass defined a
gzip mixin that compresses data files transparently.

Could we add a similar mixin that takes an image file and
converts it to avif format for storage at the same quality? This
should reduce file storage needs by 66% in many cases.

Issues:

 1. In some cases avif can increase the file size. In that case,
    the mixin should store the original file. The mime type for
    the file should match the stored format: image/avif if the
    avif version is smaller, otherwise the original image/... mime type.

 2. client:_serve_file doesn't look at the Accept header; it
    probably should. If no acceptable format is found, possibly
    raise an exception and return 406. parse_accept_header() (from
    roundup.rest import parse_accept_header) can be applied to
    self.request.headers['Accept']. Consider moving
    parse_accept_header out of rest.py and into a general
    roundup/http_headers.py. Also maybe
    roundup/cgi/accept_language.py can join it.

 3. Browsers that don't support avif. Using the <picture> tag
    with a source tag for each supported format (avif, png, jpg)
    should work. A srcset="..." value of
    .../file33/afile.jpg?@format=image/jpeg should return a jpg
    version of file33. It looks like the accept header includes
    all supported image types: e.g. chrome will download the avif
    type but includes image/avif, image/webp, image/apng,
    image/svg+xml, image/*, and */*;q=0.8. So maybe we don't
    bother with @format and just search for a valid hit in the
    Accept header. We prefer sending the stored mime type when it
    is supported.

 4. rendering time. The conversion to jpg can be done on the
    fly. But this can take time. Consider caching file33 as a jpg
    to file33.jpg at db/files/file/0/file33.jpg.  These files can
    be cleaned using a scheduled script to remove files either
    unread (atime) or older (mtime) than 30 days for example.  As
    more browsers support avif the back conversion will be less
    of an issue.

 5. Can this be done totally as a mixin? I think the mixin gets the
    client object as self. So the logic about what to convert to
    should be possible without changes to _serve_file. However
    _serve_file may need to be modified to call write_file with the
    modified file##.jpg name.

 6. Are there times when we should not convert the file? E.g. if some
    analysis of the file would be damaged by converting it. Think of a
    png that triggers a remote code execution: converting it may trigger
    the issue, or only the original file format may support forensic analysis.
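
The Accept header check from items 2 and 3 could look roughly like the
sketch below. It is a simplified stand-in written for illustration, not
roundup.rest.parse_accept_header; the names parse_accept, quality_for and
choose_format and the png/jpeg fallback order are all assumptions.

```python
def parse_accept(header):
    """Map each media range in an Accept header to its q value.

    Simplified: parameters other than q are ignored.
    """
    ranges = {}
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranges[media_range] = q
    return ranges


def quality_for(ranges, mime_type):
    """Return the q value of the most specific range matching mime_type."""
    major = mime_type.split("/", 1)[0]
    for candidate in (mime_type, major + "/*", "*/*"):
        if candidate in ranges:
            return ranges[candidate]
    return 0.0


def choose_format(accept_header, stored_type,
                  fallbacks=("image/png", "image/jpeg")):
    """Prefer the stored mime type, then the first acceptable fallback.

    Returns None when nothing is acceptable; the caller would answer 406.
    """
    ranges = parse_accept(accept_header or "*/*")
    if quality_for(ranges, stored_type) > 0:
        return stored_type
    for mime_type in fallbacks:
        if quality_for(ranges, mime_type) > 0:
            return mime_type
    return None
```

With the chrome header quoted in item 3 and a file stored as image/avif,
choose_format() returns image/avif, so no back conversion is needed. The
fallbacks are tried in the listed order rather than by q value, which
keeps the sketch short.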
 
For python library support, pillow with
https://pypi.org/project/pillow-avif-plugin/ should support avif as
source and destination formats.
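
A minimal sketch of the "store whichever is smaller" rule from item 1,
assuming Pillow and pillow-avif-plugin are installed. The function names,
the quality value of 80 and the injectable encoder argument are
assumptions made for illustration; this is not an existing Roundup mixin.

```python
import io


def encode_avif(data, quality=80):
    """Re-encode image bytes as AVIF with Pillow plus pillow-avif-plugin."""
    # Imported lazily so choose_stored() can be used without Pillow.
    from PIL import Image
    import pillow_avif  # noqa: F401  (importing registers the AVIF codec)

    with Image.open(io.BytesIO(data)) as img:
        out = io.BytesIO()
        img.save(out, format="AVIF", quality=quality)
        return out.getvalue()


def choose_stored(data, mime_type, encoder=encode_avif):
    """Return (bytes, mime_type) for whichever encoding is smaller."""
    # Leave non-images and files that are already avif alone.
    if mime_type == "image/avif" or not mime_type.startswith("image/"):
        return data, mime_type
    try:
        converted = encoder(data)
    except Exception:
        # Unreadable image or missing codec: keep the upload untouched.
        return data, mime_type
    if len(converted) < len(data):
        return converted, "image/avif"
    return data, mime_type
```

A mixin would call choose_stored() on the uploaded content and record the
returned mime type with the file. A fixed quality number does not
guarantee "the same quality" as the source image; that would need tuning.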
msg7739 Author: [hidden] (rouilj) Date: 2023-03-06 04:13
Another solution I am testing is to use a docker image of imgproxy 
(https://github.com/imgproxy/imgproxy).

My use case assumes I display an image attached to a message/issue in the issue or
message item view.

In the Roundup generated web page replace the img src="url" of:

   https://example.com/issues/file14/face.jpg 

with

   https://example.com/imgproxy/security_hash/rs:fit:300:200:0:0/
           filename:face:expires:XXXXX/plain/local:///0/file14

This url resizes the image to 300x200. It will serve up WEBP or AVIF if imgproxy
is configured to do so. The download filename produced will be face.jpg or face.webp
or some other extension depending on what imgproxy serves up. imgproxy is configured
with direct access to the db/files/file tree and serves the image from there.

This appears to work well serving up images 1/4 or less of the size of the original.
For image heavy pages it reduces the time to finish loading the page (even with lazy
loading).

The problem is imgproxy has no way to know if it should serve up a file or not.
It can't access the authentication and authorization info that the tracker has
for an image.

To keep this somewhat secure, I use the expires unix epoch timestamp and the
security_hash. The security_hash prevents all modifications to the URL unless you
know the salt and key used to generate the hash. So you can't change the file that
is accessed, add the raw directive to access some other non-image file, modify the
expiration (expires) time etc.

I set the expires value to a few seconds (say 5) from the time the page was generated.
If the URL isn't used within 5 seconds of its generation it's useless. This attempts
to make the access valid only for including the image in the Roundup issue/msg page.
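
Generating such a short-lived URL could be sketched as below. The
signature is an HMAC-SHA256 over salt + path, base64url encoded without
padding, which is my understanding of imgproxy's signing scheme. The
function names, the /imgproxy prefix handling and the sample key and salt
are assumptions for illustration only.

```python
import base64
import hashlib
import hmac
import time


def sign_path(path, key_hex, salt_hex):
    """Return the URL-safe signature for an imgproxy path."""
    key = bytes.fromhex(key_hex)
    salt = bytes.fromhex(salt_hex)
    digest = hmac.new(key, salt + path.encode("utf-8"),
                      hashlib.sha256).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def imgproxy_url(local_path, name, key_hex, salt_hex,
                 ttl=5, now=None, prefix="/imgproxy"):
    """Build a signed URL that stops working ttl seconds after now."""
    issued = int(time.time() if now is None else now)
    path = "/rs:fit:300:200:0:0/filename:%s/expires:%d/plain/local:///%s" % (
        name, issued + ttl, local_path)
    return "%s/%s%s" % (prefix, sign_path(path, key_hex, salt_hex), path)


if __name__ == "__main__":
    # Hex-encoded sample secrets; real ones come from the imgproxy config.
    print(imgproxy_url("0/file14", "face", "736563726574", "73616c74"))
```

Because the expires value is part of the signed path, changing it (or the
file name or the source path) invalidates the signature. The sketch writes
filename and expires as separate path segments, which is an assumption
about imgproxy's option syntax.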

The headers returned from imgproxy allow the file to be cached only until the
expires time is reached. This does mean that upstream caches are of limited use.
If you know the file will be displayed to the anonymous user without any limitations,
you could set the expires header far in the future to make use of an http cache.

Downloading directly from Roundup is preferred if you are viewing the fileNN page.
This should download using the normal URL: /issues/file14/face.jpg for 2 reasons:

  1) access controls are applied
  2) the original uploaded file is downloaded without any modifications

Number 2 can be important since imgproxy strips EXIF and other info. Also depending
on why the image was attached, modifying the format can reduce detail etc.

See: https://github.com/imgproxy/imgproxy/issues/1126 for more details.

If I deploy this, I will write up something on the wiki and probably close out
this ticket as I think this is a better solution:

  1) original file is preserved
  2) no wasted disk space for files that will never be displayed
  3) will run faster (imgproxy is a go binary)

I also took a brief look at https://github.com/thumbor/thumbor (written in python) which
has a similar signature method to prevent DOS. But it is missing the equivalent of the expires
directive, so there is no way to limit the time the image URL is available.
Some of the PRO options in imgproxy (watermark, algorithm tuning, ...) are available
in thumbor for free. It also has an internal cache.
msg7740 Author: [hidden] (rouilj) Date: 2023-03-07 02:51
See: https://wiki.roundup-tracker.org/EfficientImageServing for a writeup on
using imgproxy and linking it with Roundup.
msg7768 Author: [hidden] (rouilj) Date: 2023-05-20 19:41
Closing this. I think the imgproxy solution is better:

 * it preserves the original uploaded file
 * speed isn't a problem with imgproxy
 * no extra disk space needed
History
Date                 User    Action  Args
2023-05-20 19:41:31  rouilj  set     status: new -> closed
                                     resolution: remind -> accepted
                                     messages: + msg7768
2023-03-07 02:51:51  rouilj  set     messages: + msg7740
2023-03-06 04:14:12  rouilj  set     assignee: rouilj
                                     resolution: remind
2023-03-06 04:13:45  rouilj  set     messages: + msg7739
2022-01-10 03:27:39  rouilj  create