Issue 2551187
Created on 2022-01-10 03:27 by rouilj, last changed 2023-05-20 19:41 by rouilj.
Messages | |||
---|---|---|---|
msg7439 | Author: [hidden] (rouilj) | Date: 2022-01-10 03:27 | |
https://wiki.roundup-tracker.org/MixinClassFileClass defines a gzip mixin that compresses data files transparently. Could we add a similar mixin that takes an image file and converts it to avif format for storage at the same quality? This should reduce file storage needs by 66% in many cases.
Issues:
1. In some cases avif can increase file size. In this case, the mixin should store the original file. The mime type for the file should be the stored format: e.g. image/avif if smaller or the original image/... mime type.
2. client:_serve_file doesn't look at the accept header; it probably should. If no acceptable format is found, possibly raise an exception and return 406. parse_accept_header() (from roundup.rest import parse_accept_header) can do this to self.request.headers['Accept']. Consider moving parse_accept_header out of rest.py and into a general roundup/http_headers.py. Also maybe roundup/cgi/accept_language.py can join it.
3. Browsers that don't support avif. Using the <picture> tag with multiple source tags for each supported format: avif, png, jpg should work. Setting srcset="..." to .../file33/afile.jpg?@format=image/jpeg should return a jpg version of file33. It looks like the accept header includes all supported image types: e.g. chrome will download the avif type but includes image/avif, image/webp, image/apng, image/svg+xml, image/*, and */*;q=0.8. So maybe we don't bother with @format and just search for a valid hit in the Accept header. We prefer sending the stored mime type when it is supported.
4. Rendering time. The conversion to jpg can be done on the fly, but this can take time. Consider caching file33 as a jpg to file33.jpg at db/files/file/0/file33.jpg. These files can be cleaned using a scheduled script to remove files either unread (atime) or older (mtime) than 30 days for example. As more browsers support avif, the back conversion will be less of an issue.
5. Can this be done totally as a mixin? I think the mixin gets the client object as self. So the logic about what to convert to should be possible without changes to _serve_file. However _serve_file may need to be modified to call write_file with the modified file##.jpg name.
6. Is there some time we should not convert the file? E.g. if some analysis of the file would be damaged by converting it. Think of a png that triggers a remote code execution. Converting it may trigger the issue, or only the original file format will support forensic analysis.
For python library support, pillow with https://pypi.org/project/pillow-avif-plugin/ should support avif as source and destination formats. |
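The Accept header lookup described in items 2 and 3 could look roughly like the sketch below. It is a standalone illustration, not Roundup code: parse_accept, accepts and pick_served_type are hypothetical helpers written for this example (it does not call roundup.rest.parse_accept_header), and the candidate formats are assumptions. It prefers the stored mime type when the client accepts it and returns None when nothing matches, which is where the caller would answer 406.

```python
def parse_accept(header):
    """Split an Accept header into (media_type, q) pairs."""
    entries = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        if not fields[0]:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        entries.append((fields[0].lower(), q))
    return entries


def accepts(entries, mime_type):
    """Return the q value for mime_type; the most specific match wins."""
    major = mime_type.split("/")[0]
    best_specificity, best_q = -1, 0.0
    for media_type, q in entries:
        if media_type == mime_type:
            specificity = 2
        elif media_type == major + "/*":
            specificity = 1
        elif media_type == "*/*":
            specificity = 0
        else:
            continue
        if specificity > best_specificity:
            best_specificity, best_q = specificity, q
    return best_q


def pick_served_type(accept_header, stored_type, alternatives=()):
    """Pick the mime type to serve, or None if nothing is acceptable."""
    # A missing Accept header is treated as "anything goes".
    entries = parse_accept(accept_header) if accept_header else [("*/*", 1.0)]
    if accepts(entries, stored_type) > 0:
        return stored_type  # prefer the stored format when supported
    scored = [(accepts(entries, alt), alt) for alt in alternatives]
    scored = [item for item in scored if item[0] > 0]
    if not scored:
        return None  # caller would return 406 Not Acceptable
    return max(scored, key=lambda item: item[0])[1]
```

With a header that names only image/png and image/jpeg;q=0.9, a stored image/avif file would be served as image/png if that is among the alternatives the tracker can generate.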
|||
msg7739 | Author: [hidden] (rouilj) | Date: 2023-03-06 04:13 | |
Another solution I am testing is to use a docker image of imgproxy (https://github.com/imgproxy/imgproxy). My use case assumes I display an image attached to a message/issue in the issue or message item view.
In the Roundup generated web page replace the img src="url" of:
https://example.com/issues/file14/face.jpg
with
https://example.com/imgproxy/security_hash/rs:fit:300:200:0:0/filename:face:expires:XXXXX/plain/local:///0/file14
This url resizes the image to 300x200. It will serve up WEBP or AVIF if imgproxy is configured to do so. The download filename produced will be face.jpg or face.webp or some other extension depending on what imgproxy serves up. imgproxy is configured with direct access to the db/files/file tree and serves the image from there.
This appears to work well, serving up images 1/4 or less of the size of the original. For image heavy pages it reduces the time to finish loading the page (even with lazy loading).
The problem is imgproxy has no way to know if it should serve up a file or not. It can't access the authentication and authorization info that the tracker has for an image. To keep this somewhat secure, I use the expires unix epoch timestamp and the security_hash. The security_hash prevents all modifications to the URL unless you know the salt and key used to generate the hash. So you can't change the file that is accessed, add the raw directive to access some other non-image file, modify the expiration (expires) time etc.
I set the expires value to a few seconds (say 5) from the time the page was generated. If the URL isn't used within 5 seconds of its generation it's useless. This attempts to make the access valid only for including the image in the Roundup issue/msg page. The headers returned from imgproxy allow the file to be cached only until the expires time is reached. This does mean that upstream caches are of limited use. If you know the file will be displayed to the anonymous user without any limitations, you could set the expires header far in the future to make use of an http cache.
Downloading directly from Roundup is preferred if you are viewing the fileNN page. This should download using the normal URL: /issues/file14/face.jpg for 2 reasons:
1) access controls are applied
2) the original uploaded file is downloaded without any modifications
Number 2 can be important since imgproxy strips EXIF and other info. Also, depending on why the image was attached, modifying the format can reduce detail etc. See: https://github.com/imgproxy/imgproxy/issues/1126 for more details.
If I deploy this, I will write up something on the wiki and probably close out this ticket as I think this is a better solution:
1) original file is preserved
2) no wasted disk space for files that will never be displayed
3) will run faster (imgproxy is a go binary)
I also took a brief look at https://github.com/thumbor/thumbor (written in python) which has a similar signature method to prevent DOS. But it is missing the equivalent of the expires directive, so there is no way to limit the time the image URL is available. Some of the PRO options in imgproxy are available in thumbor: watermark, algorithm tuning .... for free. Also it has an internal cache. |
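For reference, generating the security_hash and the short-lived expires value could be sketched as below. This follows my understanding of imgproxy's URL signing (HMAC-SHA256 over salt + path with the hex-decoded key, base64url without padding); check it against the imgproxy documentation before relying on it. sign_path and build_image_url are hypothetical names, the key and salt are whatever the tracker and imgproxy share, and this sketch writes filename and expires as separate "/"-delimited options, which is an assumption about the option syntax.

```python
import base64
import hashlib
import hmac
import time


def sign_path(path, key_hex, salt_hex):
    """Return the URL-safe signature for an imgproxy path (starts with '/')."""
    key = bytes.fromhex(key_hex)
    salt = bytes.fromhex(salt_hex)
    digest = hmac.new(key, salt + path.encode("utf-8"), hashlib.sha256).digest()
    # imgproxy expects base64url with the '=' padding stripped (assumption).
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def build_image_url(file_path, filename, key_hex, salt_hex,
                    lifetime=5, now=None, prefix="/imgproxy"):
    """Build a signed URL that stops working `lifetime` seconds from now."""
    issued = time.time() if now is None else now
    expires = int(issued) + lifetime
    # Processing options mirror the example in this message: fit to 300x200.
    path = "/rs:fit:300:200:0:0/filename:%s/expires:%d/plain/local:///%s" % (
        filename, expires, file_path)
    return "%s/%s%s" % (prefix, sign_path(path, key_hex, salt_hex), path)
```

A template helper would call build_image_url("0/file14", "face", key, salt) while the page is generated, so every page view gets a fresh URL that is only valid for a few seconds.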
|||
msg7740 | Author: [hidden] (rouilj) | Date: 2023-03-07 02:51 | |
See: https://wiki.roundup-tracker.org/EfficientImageServing for a writeup on using imgproxy and linking it with Roundup. |
|||
msg7768 | Author: [hidden] (rouilj) | Date: 2023-05-20 19:41 | |
Closing this. I think the imgproxy solution is better:
* it preserves the original uploaded file
* speed isn't a problem with imgproxy
* no extra disk space needed |
History | |||
---|---|---|---|
Date | User | Action | Args |
2023-05-20 19:41:31 | rouilj | set | status: new -> closed; resolution: remind -> accepted; messages: + msg7768 |
2023-03-07 02:51:51 | rouilj | set | messages: + msg7740 |
2023-03-06 04:14:12 | rouilj | set | assignee: rouilj; resolution: remind |
2023-03-06 04:13:45 | rouilj | set | messages: + msg7739 |
2022-01-10 03:27:39 | rouilj | create |