Roundup Tracker - Issues

Issue 2551067

classification
Title: Document file and msg upload for rest interface
Type: behavior Severity: normal
Components: Documentation, API Versions:
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: rouilj Nosy List: rouilj, schlatterbeck
Priority: normal Keywords: Blocker, rest

Created on 2019-10-15 02:22 by rouilj, last changed 2019-10-21 13:53 by schlatterbeck.

Messages
msg6746 Author: [hidden] (rouilj) Date: 2019-10-15 02:22
I think the last thing that needs to be added to the rest doc is how to post 
messages and files.

Metadata (everything but content) can be sent as JSON.

For msgs, if the message text can be represented as UTF-8, everything can be
posted at once. Alternatively, it could be two posts: a metadata post that
returns the URL of the new message, then a post to that URL:
message_url/content.
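A minimal sketch of the single-post variant, assuming the message text is valid UTF-8. It only builds the request with Python's standard library (nothing is sent). The base URL is a hypothetical placeholder, and any extra msg properties passed in are assumptions about the tracker's schema; the `Referer` and `X-Requested-With` headers mirror the curl example later in this thread.

```python
import json
import urllib.request

# Hypothetical tracker base URL; replace with your own.
BASE = "https://tracker.example/demo"

def build_msg_request(base, text, **props):
    """Build (but do not send) a JSON POST that creates a msg in one step.

    text must be representable as UTF-8; props holds any other metadata
    properties the tracker's msg class accepts (an assumption here).
    """
    payload = dict(props, content=text)
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        base + "/rest/data/msg",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Same extra headers as the curl example in this thread.
            "Referer": base + "/",
            "X-Requested-With": "rest",
        },
    )

req = build_msg_request(BASE, "Hello from the REST API")
print(req.get_method(), req.full_url)
```

Actually sending it would additionally need credentials, like the Basic authorization shown in the captures in this thread.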

Files however are not the same. We support creation of the metadata using json, but 
I am not sure how we can post the metadata and then set the content separately.
Would we use a multipart/form-data posted to /rest/data/file to include the metadata 
and content?

Comments?
msg6747 Author: [hidden] (schlatterbeck) Date: 2019-10-15 13:13
On Tue, Oct 15, 2019 at 02:22:21AM +0000, John Rouillard wrote:
> 
> Files however are not the same. We support creation of the metadata
> using json, but I am not sure how we can post the metadata and then
> set the content separately.
> Would we use a multipart/form-data posted to /rest/data/file to
> include the metadata and content?

I've successfully written files (metadata and content property) using
the requests library. The idea is that instead of specifying
json = dictionary we specify data = dictionary.

        d = dict (name = filename, content = content, type = content_type)
        j = self.post ('file', data = d)

The self.post method sets up the necessary headers and prefixes the
given path.

I *think* this by default sends the contents as
application/x-www-form-urlencoded
which is sub-optimal for large files.
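To put a rough number on that overhead, this sketch percent-encodes a binary payload the way an application/x-www-form-urlencoded body carries it, using Python's `urllib.parse.urlencode` (assumed here to be representative of what requests does for `data=`). Every byte outside the small unreserved set grows to three bytes; a space becomes `+`.

```python
import os
from urllib.parse import urlencode

def urlencoded_size(fields):
    """Length in bytes of fields encoded as application/x-www-form-urlencoded."""
    return len(urlencode(fields))

# Every possible byte value once: a stand-in for arbitrary binary content.
sample = bytes(range(256))
encoded = urlencoded_size({"content": sample})
print(len(sample), "raw bytes ->", encoded, "encoded bytes")

# Random data shows the same effect on a larger payload.
blob = os.urandom(64 * 1024)
ratio = urlencoded_size({"content": blob}) / len(blob)
print(f"random data grows about {ratio:.2f}x")
```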

You can force the requests library to use multipart/form-data by
specifying both, files= *and* data= parameters, e.g.,

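# Assumes session is an authenticated requests.Session and url is the
# tracker's .../rest/data/ URL including the trailing '/'.
# A binary string that can't be decoded as unicode
with open('random-junk', 'rb') as f:
    content = f.read()
fname = 'a-bigger-testfile'
d = dict(name=fname, type='application/octet-stream')
# Passing files= as well as data= makes requests use multipart/form-data
c = dict(content=content)
r = session.post(url + 'file', files=c, data=d)
print(r.json())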

This produces something like

POST /path/to/tracker/.../file HTTP/1.1
Host: bee:8080
Connection: keep-alive
Accept-Encoding: gzip, deflate
Accept: */*
User-Agent: python-requests/2.12.4
Content-Length: 2405
Content-Type: multipart/form-data; boundary=788e954792774a6cbe747ba2ca2a276a
Authorization: Basic <censored>

--788e954792774a6cbe747ba2ca2a276a
Content-Disposition: form-data; name="type"

application/octet-stream
--788e954792774a6cbe747ba2ca2a276a
Content-Disposition: form-data; name="name"

a-bigger-testfile
--788e954792774a6cbe747ba2ca2a276a
Content-Disposition: form-data; name="content"; filename="content"

i.S...Em..3/].T...e1ag.G..?N.b.%..P`M..#a...r.S......}>..d.>7.3a...n.."..`
.P.[.aQc..Rg.....q...s1z.9........%..]..|..1.|...M..p.GC....=..BV.L.5..
+.F.!..H...gI..cdg?.........k...t..A..........}`...J.....
....Y.....>....{..E..
%.E...:a.o.F.......o...../..).>o..qmm.U7..BT..
--788e954792774a6cbe747ba2ca2a276a--

And, yes, this works for creating files :-)
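For readers without requests, here is a standard-library sketch that assembles the same kind of multipart/form-data body as the capture above: one part per metadata field, plus a file part named `content` whose filename equals the field name. The `build_multipart` helper is hypothetical, and the fixed boundary is only for the demo; real clients generate a random boundary that must not occur in the data. Lines end in CRLF, as the multipart format requires.

```python
import uuid

def build_multipart(fields, files, boundary=None):
    """Return (body, content_type) for a multipart/form-data request.

    fields: dict of name -> str, e.g. the name and type metadata
    files:  dict of name -> bytes; each is sent with filename equal to
            its field name, as in the requests capture above
    """
    boundary = boundary or uuid.uuid4().hex
    delim = b"--" + boundary.encode("ascii")
    parts = []
    for name, value in fields.items():
        disp = f'Content-Disposition: form-data; name="{name}"'
        parts.append(delim + b"\r\n" + disp.encode() + b"\r\n\r\n"
                     + value.encode("utf-8") + b"\r\n")
    for name, data in files.items():
        disp = f'Content-Disposition: form-data; name="{name}"; filename="{name}"'
        parts.append(delim + b"\r\n" + disp.encode() + b"\r\n\r\n"
                     + data + b"\r\n")
    # The closing delimiter has two extra dashes.
    body = b"".join(parts) + delim + b"--\r\n"
    return body, "multipart/form-data; boundary=" + boundary

body, ctype = build_multipart(
    {"name": "a-bigger-testfile", "type": "application/octet-stream"},
    {"content": b"\x89\x00\xff raw bytes"},
    boundary="788e954792774a6cbe747ba2ca2a276a",  # fixed only for the demo
)
print(ctype)
print(len(body), "bytes")
```

The body and content type could then be handed to `urllib.request.Request(..., data=body, headers={"Content-Type": ctype}, method="POST")`.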

Ralf
-- 
Dr. Ralf Schlatterbeck                  Tel:   +43/2243/26465-16
Open Source Consulting                  www:   http://www.runtux.com
Reichergasse 131, A-3411 Weidling       email: office@runtux.com
msg6749 Author: [hidden] (rouilj) Date: 2019-10-16 23:53
Hi Ralf:

In message <20191015131334.3h4qoopfsqiphydy@runtux.com>,
Ralf Schlatterbeck writes:
>On Tue, Oct 15, 2019 at 02:22:21AM +0000, John Rouillard wrote:
>> Files however are not the same. We support creation of the metadata
>> using json, but I am not sure how we can post the metadata and then
>> set the content separately.

Answer: we can't. Content is a required property 8-).

>> Would we use a multipart/form-data posted to /rest/data/file to
>> include the metadata and content?
>
>I've successfully written files (metadata and content property) using
>the method from the request library. The idea is instead of specifying
>json = dictionary we specify data = dictionary.
>
>        d = dict (name = filename, content = content, type = content_type)
>        j = self.post ('file', data = d)
>
>The self.post method sets up the necessary headers and prefixes the
>given path.
>
>I *think* this by default sends the contents as
>application/x-www-form-urlencoded

I couldn't make curl do this.

>which is sub-optimal for large files.

Agreed. The encoding would make the file larger.

>You can force the requests library to use multipart/form-data by
>specifying both, files= *and* data= parameters, e.g.,
>
># A binary string that can't be decoded as unicode
>content = open ('random-junk', 'rb').read ()
>fname   = 'a-bigger-testfile'
>d = dict \
>    ( name = fname
>    , type='application/octet-stream'
>    )
>c = dict (content = content)
>r = session.post (url + 'file', files = c, data = d)
>print (r.json ())
>
>And, yes, this works for creating files :-)

Cool. I think this curl command does the same using multipart/form-data:

   curl -u demo:demo -s -X POST -H "Referer: https://.../demo/" \
       -H "X-requested-with: rest" \
       -F "name=afile" -F "status=1" -F "type=image/vnd.microsoft.icon" \
       -F "content=@doc/roundup-favicon.ico" \
       https://.../demo/rest/data/file

which returns:

  {
    "data": {
        "id": "11",
        "type": "file",
        "link": "https://.../demo/rest/data/file/11",
        "attributes": {
            "acl": null,
            "content": {
                "link": "https://.../demo/file11/"
            },
            "name": "afile",
            "status": {
                "id": "1",
                "link": "https://.../demo/rest/data/filestatus/1"
            },
            "type": "image/vnd.microsoft.icon"
        },
        "@etag": "\"74276f75ef71a30a0cce62dc6a8aa1bb\""
    }
  }

but I can't actually use https://.../demo/file11/ to get the contents
of the file.

That returns the full file11 page including page.html and the form to
change the file's metadata. To get the file contents, I need to use:

   https://.../demo/file11/afile

Should we change that response? Currently, you need to get
demo/data/file/11, pull out the name and append it to the content link.
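The workaround described here (get the file item, read its name, append it to the content link) can be sketched as below. The input mirrors the JSON shape of the creation response above (`data.attributes.name` and `data.attributes.content.link`); `download_url` is a hypothetical helper, and the name is percent-encoded in case it contains spaces or other reserved characters.

```python
import json
from urllib.parse import quote

def download_url(item_json):
    """Return the content link of a file item with its name appended."""
    attrs = json.loads(item_json)["data"]["attributes"]
    link = attrs["content"]["link"]
    if not link.endswith("/"):
        link += "/"
    return link + quote(attrs["name"])

response = '''{"data": {"id": "11", "type": "file",
  "attributes": {"name": "afile",
                 "content": {"link": "https://.../demo/file11/"}}}}'''
print(download_url(response))
```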

Also if I get demo/data/file/11/content, I see:

      "data": "file11 is not text, retrieve using binary_content property. mdsum: bd990c0f8833dd991daf610b81b62316",

but using demo/data/file/11/binary_content I get:

  "data": "b'\\x00\\x00\\x01\\x00\\x01\\x00\\x10\\x10\\x00\\x00\\x00\\x00\\x00\\x00h\\x05\\x00\\x00\\x16\\x00\\x00\\x00(\\x00\\x1c\\x1c\\x1c\\x00ttt ...

etc.  It is encapsulated in the json data wrapper. I assume that is
some encoded form of actual binary data? Is that encoded form
decodable from javascript?
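The value shown looks like Python's `repr()` of a bytes object (`b'...'` with `\x..` escapes) placed inside a JSON string, rather than a standard binary encoding. Assuming that is what it is, a Python client could recover the bytes with `ast.literal_eval`, as sketched below; JavaScript has no built-in parser for this notation, so a browser client would need its own decoder. `decode_binary_content` is a hypothetical helper.

```python
import ast
import json

def decode_binary_content(response_text):
    """Turn a {"data": "b'...'"} REST reply back into bytes.

    Assumes the string is a Python bytes literal (the repr of bytes).
    ast.literal_eval only evaluates literals, so it will not run code.
    """
    literal = json.loads(response_text)["data"]
    value = ast.literal_eval(literal)
    if not isinstance(value, bytes):
        raise ValueError("data is not a bytes literal")
    return value

# Build a reply the same way: JSON-encode the repr of some bytes.
original = b"\x00\x00\x01\x00h\x05\x16(\x1c"
reply = json.dumps({"data": repr(original)})
print(reply)
print(decode_binary_content(reply) == original)
```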

Given how this bloats the file size, I wonder if we should provide a
way via the rest interface to download just the content data in raw
form.

I think the right way to do this is to make the request to
demo/data/file/11/content but set the header:

 Accept:  image/vnd.microsoft.icon

If the content type matches the file type, respond with a binary data
stream with appropriate Content-Type (either the same as the Accept
type or application/octet-stream) and Content-Length. If it doesn't
match we return 406 - not acceptable.
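A sketch of the proposed server-side rule, written as a pure function so it can be read without any Roundup internals. Everything here is hypothetical: `choose_response` is not Roundup code, the Accept parsing ignores q-values, and treating a missing header or `*/*` as a request for JSON is an assumption consistent with the current behaviour of returning the JSON wrapper.

```python
def parse_accept(header):
    """Return the media types listed in an Accept header, lower-cased.

    Parameters such as ;q=0.9 are dropped; this sketch ignores weights.
    """
    if not header:
        return []
    return [part.split(";")[0].strip().lower()
            for part in header.split(",") if part.strip()]

def choose_response(accept_header, file_type):
    """Decide how to answer GET .../rest/data/file/<id>/content.

    Returns (status, content_type).
    """
    accepted = parse_accept(accept_header)
    if not accepted or "*/*" in accepted or "application/json" in accepted:
        return 200, "application/json"   # current JSON-wrapped reply
    if file_type.lower() in accepted:
        return 200, file_type            # raw bytes of the file
    return 406, None                     # Not Acceptable

print(choose_response("image/vnd.microsoft.icon", "image/vnd.microsoft.icon"))
print(choose_response("image/png", "image/vnd.microsoft.icon"))
print(choose_response(None, "image/vnd.microsoft.icon"))
```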

Thoughts?
msg6751 Author: [hidden] (schlatterbeck) Date: 2019-10-17 09:53
On Wed, Oct 16, 2019 at 11:53:43PM +0000, John Rouillard wrote:
> but I can't actually use https://.../demo/file11/ to get the contents
> of the file.

Really? The trailing '/' makes the difference for me.
So Roundup really doesn't care whether a filename follows. The two URLs
(last time I looked):
https://.../demo/file11/
https://.../demo/file11/afile

Both yield the file contents while the URL without the '/' yields the
file edit page:

https://.../demo/file11

This might have changed in the meantime. BTW, you could append *anything*
after the '/', all would yield the content, e.g.,

https://.../demo/file11/justarandomstring
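To illustrate the rule being described for a file item (this is not Roundup's actual URL-handling code, just a hypothetical classifier): a designator such as `file11` without a slash selects the item page, while a designator followed by `/` and anything at all selects the raw content.

```python
import re

# Designator (class name + numeric id), optionally followed by "/" and
# any trailing text, which is ignored.
_PATH = re.compile(r"^(?P<cls>[A-Za-z_]+)(?P<id>\d+)(?P<rest>/.*)?$")

def classify(path):
    """Return 'content', 'item page', or 'other' for a tracker-relative path."""
    m = _PATH.match(path)
    if m is None:
        return "other"
    return "content" if m.group("rest") is not None else "item page"

for p in ["file11", "file11/", "file11/afile", "file11/justarandomstring"]:
    print(p, "->", classify(p))
```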

> That returns the full file11 page including page.html and the form to
> change the file's metadata. To get the file contents, I need to use:
> 
>    https://.../demo/file11/afile
> 
> should we change that response? Currently, you need to get
> demo/data/file/11, pull the name and append it to the content link.

See above, the trailing '/' makes the difference for me, after the '/'
anything can follow.

> Also if I get demo/data/file/11/content, I see:
> 
>       "data": "file11 is not text, retrieve using binary_content property. mdsum: bd990c0f8833dd991daf610b81b62316",
> 
> but using demo/data/file/11/binary_content I get:
> 
>   "data": "b'\\x00\\x00\\x01\\x00\\x01\\x00\\x10\\x10\\x00\\x00\\x00\\x00\\x00\\x00h\\x05\\x00\\x00\\x16\\x00\\x00\\x00(\\x00\\x1c\\x1c\\x1c\\x00ttt ...
> 
> etc.  It is encapsulated in the json data wrapper. I assume that is
> some encoded form of actual binary data? Is that encoded form
> decodable from javascript?

I've never seen 'binary_content'; did someone implement this? Or is this
something in the web server you used? And I don't think this can be decoded
in JavaScript, because JS expects UTF-8, so arbitrary binary data can't
be encoded in JSON.
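JSON strings hold Unicode text, so Python's `json.dumps` refuses raw bytes outright. One common way around that (not something this thread says Roundup does) is Base64, which JavaScript can also decode, for example with `atob()` in browsers. The sketch compares its size with the Python-repr form quoted above and checks the round trip.

```python
import base64
import json
import os

blob = os.urandom(4096)  # arbitrary binary data

try:
    json.dumps({"data": blob})
    raw_bytes_ok = True
except TypeError as exc:
    raw_bytes_ok = False
    print("raw bytes rejected:", exc)

as_base64 = json.dumps({"data": base64.b64encode(blob).decode("ascii")})
as_repr = json.dumps({"data": repr(blob)})
print("base64:", len(as_base64), "bytes; repr:", len(as_repr), "bytes")

# Round trip through Base64.
decoded = base64.b64decode(json.loads(as_base64)["data"])
print(decoded == blob)
```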

> Given how this bloats the file size, I wonder if we should provide a
> way via the rest interface to download just the content data in raw
> form.
Yes, I don't think it makes much sense. And I don't think javascript
could parse it for arbitrary binary data.

> I think the right way to do this is to make the request to
> demo/data/file/11/content but set the header:
> 
>  Accept:  image/vnd.microsoft.icon
> 
> If the content type matches the file type, respond with a binary data
> stream with appropriate Content-Type (either the same as the Accept
> type or application/octet-stream) and Content-Length. If it doesn't
> match we return 406 - not acceptable.

I don't get the part with the 'image/vnd.microsoft.icon' content-type.
Why do you do this?

Ralf
msg6752 Author: [hidden] (rouilj) Date: 2019-10-17 14:50
Hi Ralf:

In message <20191017095339.nx5hhxgxu4wtzxs7@runtux.com>,
Ralf Schlatterbeck writes:
>On Wed, Oct 16, 2019 at 11:53:43PM +0000, John Rouillard wrote:
>> but I can't actually use https://.../demo/file11/ to get the contents
>> of the file.
>
>Really? The trailing '/' makes the difference for me.
>So roundup really doesn't care if a filename follows, the two URLs (last
>time I looked):
>https://.../demo/file11/
>https://.../demo/file11/afile
>
>Both yield the file contents while the URL without the '/' yields the
>file edit page:
>
>https://.../demo/file11
>
>This might have changed in the meantime. BTW, you could append *anything*
>after the '/', all would yield the content, e.g.,
>
>https://.../demo/file11/justarandomstring

Hmm, I'll check tonight to make sure I included the terminating /.  I
am sure I cut/pasted it, but I may have dropped the /.

>> Also if I get demo/data/file/11/content, I see:
>> 
>>       "data": "file11 is not text, retrieve using binary_content property. mdsum: bd990c0f8833dd991daf610b81b62316",
>> 
>> but using demo/data/file/11/binary_content I get:
>> 
>>   "data": "b'\\x00\\x00\\x01\\x00\\x01\\x00\\x10\\x10\\x00\\x00\\x00\\x00\\x00\\x00h\\x05\\x00\\x00\\x16\\x00\\x00\\x00(\\x00\\x1c\\x1c\\x1c\\x00ttt ...
>> 
>> etc.  It is encapsulated in the json data wrapper. I assume that is
>> some encoded form of actual binary data? Is that encoded form
>> decodable from javascript?
>
>I've never seen 'binary_content', did someone implement this?

IIRC it's implemented in the db.

>Or is this something in the web server used?

I'm using roundup-server.

>And I don't think this can be decoded
>in javascript because js expects UTF-8. So no arbitrary binary data can
>be encoded in JSON.

It can be, if you hex-encode all non-UTF-8 characters. \\x00 is a null
with an escaped initial \, right? I think that is correct for JSON.
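One detail worth spelling out: in the JSON text, `\\x00` only decodes to the four characters `\x00`, because JSON turns `\\` into a single backslash and has no `\x` escape of its own. Turning those four characters into a NUL byte takes a second, Python-specific decoding step; JSON's own escape for NUL is `\u0000`. A small check with Python's `json` module:

```python
import json

# JSON text as sent on the wire (the backslash is doubled inside the string).
wire = r'"\\x00"'
text = json.loads(wire)
print(len(text), list(text))        # four characters: \ x 0 0

# JSON's own escape for a NUL character.
nul = json.loads(r'"\u0000"')
print(len(nul), nul == "\x00")

# A bare \x escape is not valid JSON at all.
rejected = False
try:
    json.loads(r'"\x00"')
except json.JSONDecodeError as exc:
    rejected = True
    print("invalid JSON:", exc.msg)
```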

>> Given how this bloats the file size, I wonder if we should provide a
>> way via the rest interface to download just the content data in raw
>> form.
>Yes, I don't think it makes much sense. And I don't think javascript
>could parse it for arbitrary binary data.

I assume you mean yes there should be a way to download binary data
via /rest path.

>> I think the right way to do this is to make the request to
>> demo/rest/data/file/11/content but set the header:

       ^  corrected: added /rest.

>> 
>>  Accept:  image/vnd.microsoft.icon
>> 
>> If the content type matches the file type, respond with a binary data
>> stream with appropriate Content-Type (either the same as the Accept
>> type or application/octet-stream) and Content-Length. If it doesn't
>> match we return 406 - not acceptable.
>
>I don't get the part with the 'image/vnd.microsoft.icon' content-type.
>Why do you do this?

If I don't set an Accept header, I will get back a JSON-wrapped blob
of data. Hitting /content will return a JSON { data: {link:
..., data: ..., @etag: ...}} object.

If I set Accept to something that does not allow JSON, I can
return something of that type (the raw content data) or say I can't
accommodate your request (406 error).
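On the client side the proposal would only change one header. A sketch with `urllib.request` that builds (but does not send) both variants; the base URL and the `content_request` helper are hypothetical, and the raw variant depends on the not-yet-implemented behaviour discussed here.

```python
import urllib.request

BASE = "https://tracker.example/demo"   # hypothetical tracker URL

def content_request(base, file_id, accept=None):
    """Build a GET for .../rest/data/file/<id>/content.

    accept=None keeps the current JSON-wrapped reply; passing the
    file's own type asks for raw bytes under the proposal (or a 406).
    """
    headers = {}
    if accept:
        headers["Accept"] = accept
    url = f"{base}/rest/data/file/{file_id}/content"
    return urllib.request.Request(url, headers=headers)

json_req = content_request(BASE, "11")
raw_req = content_request(BASE, "11", accept="image/vnd.microsoft.icon")
print(raw_req.get_method(), raw_req.full_url, raw_req.get_header("Accept"))
```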

Does that make it clearer?
msg6754 Author: [hidden] (rouilj) Date: 2019-10-18 23:33
Ralf, you are correct: I was missing a trailing / in the file download URL.

BTW, thanks for your notes on using the requests library. Can you check
the REST doc and see if the snippets are understandable?

If it looks good to you I will close this and open a new issue for making 
file contents available via rest interface.

-- rouilj
msg6758 Author: [hidden] (schlatterbeck) Date: 2019-10-21 13:53
Everything fine, thanks for updating the documentation!

Ralf
msg6759 Author: [hidden] (schlatterbeck) Date: 2019-10-21 13:53
Closing, sorry for the noise
History
Date                 User           Action  Args
2019-10-21 13:53:46  schlatterbeck  set     status: open -> closed
                                            resolution: remind -> fixed
                                            messages: + msg6759
2019-10-21 13:53:27  schlatterbeck  set     messages: + msg6758
2019-10-20 21:34:28  rouilj         set     resolution: remind
2019-10-18 23:33:57  rouilj         set     messages: + msg6754
2019-10-17 21:45:49  rouilj         set     title: Document file amd msg upload for rest interface -> Document file and msg upload for rest interface
2019-10-17 14:50:58  rouilj         set     messages: + msg6752
2019-10-17 09:53:44  schlatterbeck  set     messages: + msg6751
                                            title: Document file and msg upload for rest interface -> Document file amd msg upload for rest interface
2019-10-17 02:13:25  rouilj         set     status: new -> open
2019-10-17 02:13:19  rouilj         set     priority: normal
                                            assignee: rouilj
                                            type: behavior
                                            title: Document file amd msg upload for rest interface -> Document file and msg upload for rest interface
2019-10-16 23:53:43  rouilj         set     messages: + msg6749
2019-10-15 13:13:40  schlatterbeck  set     nosy: + schlatterbeck
                                            messages: + msg6747
2019-10-15 02:22:21  rouilj         create