Roundup Tracker - Issues

Issue 2551328

classification
Title: REST has @links[next] link if the number of items is a multiple of @page_size
Type: behavior        Severity: normal
Components: API       Versions:
process
Status: fixed         Resolution: fixed
Superseder:
Assigned To: rouilj   Nosy List: rouilj, schlatterbeck
Priority:             Keywords: rest

Created on 2024-03-28 13:32 by rouilj, last changed 2024-04-01 19:58 by rouilj.

Messages
msg7968 Author: [hidden] (rouilj) Date: 2024-03-28 13:32
Ralf, can I get your thoughts on this?

If you have 10 matches and set @page_size to 10, the returned json has a
next key in the @links target indicating there is another page of results.

Fetching the next link will return 0 items and the json will not have a next link.

You don't see the bug if the total number of matching items is not a multiple
of @page_size.

This is a fencepost error. Maybe we need to limit the number of items filtered from
the database to @page_size+1 (currently @page_size). Add the next link only if the number
of filtered items is @page_size+1. Then return the first @page_size items.
This will fix the issue but....
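A minimal sketch of that fencepost fix, assuming a hypothetical `fetch(offset, limit)` stand-in for the backend filter call (not a real Roundup API):

```python
def paginate(fetch, page_size, page_index):
    """Fetch one page plus one extra row; the extra row's presence
    tells us whether a next link is warranted.

    fetch(offset, limit) is a hypothetical stand-in for the database
    filter call, not a real Roundup API.
    """
    offset = (page_index - 1) * page_size
    rows = fetch(offset, page_size + 1)    # ask for one extra row
    has_next = len(rows) == page_size + 1  # extra row => another page exists
    return rows[:page_size], has_next

# With 10 matches and @page_size=10, no spurious next link appears:
data = list(range(10))
page, has_next = paginate(lambda off, lim: data[off:off + lim], 10, 1)
print(len(page), has_next)  # 10 False
```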

Maybe we shouldn't be pushing the limit down to the database at all. This results
in a larger filter response and possibly a slower response for non-embedded databases 
(mysql/postgresql). But returning the whole set from the database allows us to fix
issue 2551264 where the total_size value is incorrect.

We can still push offset down to the database to reduce some of the returned
data, since we can calculate the number of items skipped by the offset as
(@page_index-1)*@page_size and just add this to the size of the returned set
to get the total.

I think the HTML web interface gets all the data without any limit or offset so we
wouldn't be any worse than that.
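Pushing only the offset down while keeping the whole tail can be sketched as follows; `fetch_from(offset)` is a hypothetical call returning every matching row from `offset` onward:

```python
def page_with_total(fetch_from, page_size, page_index):
    """Hypothetical sketch: recover an exact @total_size as
    (rows skipped by the offset) + (rows returned past the offset)."""
    offset = (page_index - 1) * page_size
    rows = fetch_from(offset)         # everything past the offset
    total_size = offset + len(rows)   # exact total, fixing issue 2551264
    has_next = len(rows) > page_size
    return rows[:page_size], total_size, has_next

data = list(range(12))
page, total, has_next = page_with_total(lambda off: data[off:], 5, 2)
print(total, has_next)  # 12 True
```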

This bug was found by the team building the classhelper web-component:
https://github.com/UMB-CS-682-Team-03/tracker
msg7969 Author: [hidden] (rouilj) Date: 2024-03-28 16:53
Blech. It looks like you have to have a LIMIT clause if you specify OFFSET in
sqlite or mysql. In postgres it can be NULL. So I propose using some large number,
maybe 10 million, as the limit. If the size of the returned set reaches 10M,
report a UsageError ("your filtering is returning too many rows") or a ValueError.

I mean, what are they going to do when paging through 10M rows? If they really
need large chunks, they should be able to set @page_index=2 @page_size=9999999
and keep going until exhaustion.
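For illustration, SQLite's grammar only accepts OFFSET as part of a LIMIT clause, so a large sentinel value can stand in for "no limit" (the 10-million figure mirrors the proposal above; the table and column names here are made up, not Roundup's schema):

```python
import sqlite3

# SQLite ties OFFSET to LIMIT, so use a large sentinel as "no limit".
MAX_ROWS = 10_000_000

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE _issue (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO _issue VALUES (?)",
                 [(i,) for i in range(1, 8)])

rows = conn.execute(
    "SELECT id FROM _issue ORDER BY id LIMIT ? OFFSET ?",
    (MAX_ROWS, 5)).fetchall()
print([r[0] for r in rows])  # prints [6, 7]
```

If the query ever returns MAX_ROWS rows, the real implementation would report an error rather than silently truncate.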

I'll also make this a property of the RestfulInstance class so it can be changed from
interfaces.py. (This will hopefully allow me to test for the limit exceeded case by
setting the limit to a low value.)
msg7972 Author: [hidden] (rouilj) Date: 2024-04-01 05:49
Took me about 6 hours, but I have a working implementation.
Docs are done. I found a couple of edge cases while writing the tests
so.... I think the tests are exercising all the code paths.
Will find out from code coverage reports when I push to ci.

Ralf, if you have any concerns, let me know. I'll commit/push later this morning my time.
I am pretty sure that this isn't worse than the web's index interface and might
be better since I am pushing the offset down to the database layer. This should result
in less data transfer when not on the first page.

It's not perfect. It can return fewer rows than the requested @page_size even when
there are matching items. This happens when the page size is near the max limit and
the user doesn't have access to some of the returned items. I don't do an additional
database fetch to fill the response with exactly @page_size items. If that gets
implemented, adding a cache for the additional fetch and some way to indicate an
offset into that fetch would be a nice addition. However the rest interface never
promised that it would always return @page_size records if they were available.
The presence of the next link should be used to determine if the client can get more
data, not a count of the number of rows returned.

I assume your front end using the rest interface is implemented using the next link
correctly.
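A client loop that relies on the next link, rather than counting rows, might look like this sketch; `get_page` stands in for an HTTP GET returning the decoded JSON body, and the exact @links layout shown is an assumption for illustration, not the documented Roundup wire format:

```python
def iter_collection(get_page, first_url):
    """Yield every item by following the next link until it disappears.

    get_page(url) is a stand-in for an HTTP GET returning decoded
    JSON; the @links shape below is an illustrative assumption.
    """
    url = first_url
    while url:
        body = get_page(url)
        yield from body["collection"]
        nxt = body.get("@links", {}).get("next")
        url = nxt[0]["uri"] if nxt else None

# Simulated two-page response: a full page with a next link, then an
# empty page without one (the pre-fix fencepost behavior).
pages = {
    "/rest/data/issue?@page_index=1": {
        "collection": [{"id": str(i)} for i in range(1, 11)],
        "@links": {"next": [{"uri": "/rest/data/issue?@page_index=2"}]},
    },
    "/rest/data/issue?@page_index=2": {"collection": []},
}
items = list(iter_collection(pages.get, "/rest/data/issue?@page_index=1"))
print(len(items))  # 10
```

A client written this way keeps working whether or not the server emits the spurious trailing next link.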

Also, the total number of rows leaks a little info. It's the number of matching
rows in the database, not the number of rows the user has the ability to see.
So counts like "Viewing records 30 - 40 of 256" can be off, as the user will
never see 256 rows, and at this point might only be looking at the 25th to 35th
rows they can access if 5 of the rows in the 1-30 range are inaccessible to them.

I'm not sure if that's going to be an issue. The fix, I think, is to do what the
HTML web interface does: have the batch get the full filtered set of items.

But I claim this is better than what we had.

I'll open a new ticket to better filter @total_size at some future date
when I close this.
msg7973 Author: [hidden] (schlatterbeck) Date: 2024-04-01 09:13
Hmm: When there are permissions with check methods on a Class, we are checking permissions *in python* not in the database. The consequence is that we do not know at SQL time which items will be displayed.

I do have several classes with permission methods in my trackers. I've often thought about whether it would make sense to implement permissions that can be checked at SQL time so that we do not need to filter in python. On some of my trackers filtering in python is a real performance killer because the tables are quite big...

So in a first step we may want to use your new optimized implementation only if there are no check methods.

Also the check methods are not sorted: Some users may see everything but we are sometimes checking the low-performance (method-based) checks before the lightweight checks that may permit everything for a certain role. An optimized version of the permission checks that sorts permissions by role and tries permissions without check methods first would be nice here.

Sorry for making this issue into a wishlist :-)

Thanks
Ralf
msg7974 Author: [hidden] (rouilj) Date: 2024-04-01 15:24
Hi Ralf:

I committed changeset: 7853:03c1b7ae3a68. While I was composing
this response, it passed CI and all code paths are checked for
what that's worth.

Your performance concern can be addressed by
reducing the number of rows returned from the SQL (or
anydbm) db using the method incorporated in this changeset.
See admin_guide.txt look for max_response_row_size.

Reduce the returned data set to 100 or 1000 items. The only
downside is that the total size returned will be set to -1,
indicating the total size is unknown. That is still an improvement
over the @total_size hallucination that used to be returned 8-).

Currently the max size is a constant for all classes. You
could experiment with making max size a function that takes
a class/classname and @page_size and returns a different
limit value for different inputs. I didn't do it because of
YAGNI, but go ahead if you do need it.

> When there are permissions with check methods on a Class,
> we are checking permissions *in python* not in the
> database. The consequence is that we do not know at SQL
> time which items will be displayed.

AFAICT all permission checks are in Python, regardless of whether
a check method is present.

This bit of code:

        for item_id in obj_list:
            r = {}
            if self.db.security.hasPermission(
                'View', uid, class_name, itemid=item_id, property='id',
            ):
                r = {'id': item_id, 'link': class_path + item_id}
            if display_props:
                # format_item does the permission checks
                r.update(self.format_item(class_obj.getnode(item_id),
                    item_id, props=display_props, verbose=verbose))
            if r:
                result['collection'].append(r)

in rest.py filters obj_list (returned by the database) by
permission.

The permissions generated from the user's Role are
checked in Python even if a permission has no check method.

> I do have several classes with permission methods in my
> trackers. I've often thought about this, if it would make
> sense to implement permissions that can be checked at SQL
> time so that we do not need to filter in python. On some
> of my trackers filtering in python is a real performance
> killer because the tables are quite big...

Pushing permissions down to the DB backend is an interesting
idea. Sadly I can't think of a way to make it happen. With
the check method, Roundup's access control changes from an
RBAC system to something more powerful. In theory I can use
a check method on a permission to deny access based on time
of day.

This makes it more of an ABAC (Attribute based access
control) system. I think that expressiveness makes it
difficult if not impossible to push permissions to the
database layer and keeps it in the hyperdb.

> So in a first step we may want to use your new optimized
> implementation only if there are no check methods.

As I stated above, I think the Permissions processing
happens in Python in all cases. Changing the max size of a
returned set to a number 1 more than @page_size should fix
the bug that triggered this issue with no performance impact.

> Also the check methods are not sorted: Some users may see
> everything but we are sometimes checking the low-performance
> (method-based) checks before the lightweight checks that may
> permit everything for a certain role. An optimized version
> of the permission checks that sorts permissions by role and
> tries permissions without check methods first would be nice
> here.

AFAIK, permissions aren't sorted by role. The role is used
to generate a list of permissions and then is discarded. Did
you mean permission type/name?

I agree with optimizing the application of the permissions
list. I think I proposed something of the sort many years
ago. IIRC all permissions are in a single list. So you could
have (args are type, class, properties, check, props_only):

  Permission('Edit', user, [name, phonenumber], during_daytime)  (1)
  Permission('View', user, [name, phonenumber], during_daytime)  (2)
  Permission('Edit', issue, [nosy,], None, props_only=True)      (3)
  Permission('View', issue,)                                     (4)
  ...
  Permission('View', issue, [title, nosy, priority,])            (5)

in a single list. If you are checking to see if somebody can
view the issue, you have to iterate over 1-3 for each item
in the database before reaching 4.

Grouping the permission list by type/class:

   P[edit][user] = [
       Permission('edit', user, [name, phonenumber], during_daytime)
   ]

   P[view][issue] = [Permission('view', issue,),
            Permission('view', issue, [title, nosy, priority,])]

   ...

makes selecting all permissions by type and class an O(1)
rather than O(n) cost. It should have no effect on the
expressed permissions. This should reduce the number of
permissions that have to be iterated over. But, I have no
idea if traversal of the list is a major issue.
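The grouping could be sketched with a toy stand-in (this is not Roundup's real Permission class, just an illustration of the dict-keyed lookup and the "cheap checks first" ordering Ralf suggested):

```python
from collections import defaultdict

class Permission:
    """Toy stand-in modelling only the fields needed for grouping."""
    def __init__(self, name, klass, properties=None, check=None):
        self.name, self.klass = name, klass
        self.properties, self.check = properties, check

def group_permissions(perms):
    """Index permissions by (action, class) for O(1) bucket lookup
    instead of scanning one flat list; within each bucket put
    permissions without check methods first. Python's stable sort
    preserves the relative order within each tier."""
    grouped = defaultdict(list)
    for p in perms:
        grouped[(p.name, p.klass)].append(p)
    for bucket in grouped.values():
        bucket.sort(key=lambda p: p.check is not None)
    return grouped

perms = [
    Permission('View', 'user', ['name'], check=lambda db, uid, it: True),
    Permission('View', 'issue', ['title'], check=lambda db, uid, it: True),
    Permission('Edit', 'issue', ['nosy']),
    Permission('View', 'issue'),
]
grouped = group_permissions(perms)
print(len(grouped[('View', 'issue')]))  # 2
```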

Sorting the permissions can help as well, but I am concerned
that sorting could change which permissions are applied
compared to the unsorted state. AFAIK, unlike detectors,
permissions don't have a method to set ordering, and ordering
plays a role in which permission is expressed.

> Sorry for making this issue into a wishlist :-)

I agree permissions and their application can be a
bottleneck. I wish I had better developer documentation on
how they work. IIRC when I added props_only it was because a
class level check was being allowed by a property level
permission. Documenting the various check and permission
types and how they play together would be helpful.

I think further discussion of this should be on a new ticket.
Agreed?
msg7976 Author: [hidden] (rouilj) Date: 2024-04-01 19:58
Linked the prior two messages to issue2551330 and started a new ticket for discussing
performance improvement in permissions.
History
Date                 User           Action  Args
2024-04-01 19:58:08  rouilj         set     status: open -> fixed
                                            resolution: fixed
                                            messages: + msg7976
2024-04-01 15:24:35  rouilj         set     messages: + msg7974
2024-04-01 09:13:41  schlatterbeck  set     messages: + msg7973
2024-04-01 05:49:20  rouilj         set     status: new -> open
                                            messages: + msg7972
2024-03-30 07:26:56  rouilj         link    issue2551264 superseder
2024-03-28 16:53:26  rouilj         set     assignee: rouilj
                                            messages: + msg7969
2024-03-28 13:32:15  rouilj         create