Message 8129 - Roundup tracker

Author	rouilj
Recipients	rouilj, schlatterbeck
Date	2024-10-17.01:38:10
Message-id	<20241017013803.583396A01A3@pe15.cs.umb.edu>
In-reply-to	<1729010267.65.0.710188892934.issue2551330@roundup.psfhosted.org>
Hi Ralf:

In message <1729010267.65.0.710188892934.issue2551330@roundup.psfhosted.org>,
Ralf Schlatterbeck writes:
>Concerning speed-up via SQL:
>The main speed problem is for index templates: After the SQL has
>returned it will check all issues *in python* for visibility. And I
>think for every issue all permissions are checked again in the same
>order.

I agree with this. What the permission order is, I am not quite
sure. I think it's the same order that the permissions are assigned in
schema.py.

So for an issue index, you are scanning and discarding checks for the
user, keyword ... classes even though you only care about issues.

>There are fast permission checks that pass almost instantly like when
>the user has access to the whole class.  There are slow permission
>checks that require execution of a check method in python. A
>permission check method in python might make further queries slowing
>down things even more.  The idea outlined in earlier messages of
>sorting permissions by speed (i.e. the ones with check methods would
>be sorted last) would fix the problem for users having permissions
>without a check method because the first check that passes terminates
>the permission check.

I think this is right. IIRC the evaluation algorithm is:

   if permission test passes, return allow
   if permission test fails, go to next permission
   if no more permissions available return deny

If you can satify yourself that this is correct, then I claim order
doesn't matter.

(Note search operations check against both view and search
permissions, which I think is unique to have one operation check two
different permissions types. This might complicate things.)

>For users *only* having permissions that require execution of a
>check-method we could speed this up if
>- we can formulate a fast permission check (either in SQL or in
>  something similar to the current query language of the REST API),
>  this would be ANDed in SQL with the generated user query. If such a
>  method is present we would *not* filter issues in python but instead
>  rely on SQL to return only visible issues

So the check method would return SQL that can be pushed to the DB level?

Would these examples match what you are proposing?

   if userid == issue.creator allow

   results in SQL query:

      AND issue.creator == `<user id>`

  if creator has manager role, allow:

  results in python query (or subselect)

      manager_set = select id where user.role == `manager`

  with
       AND issue.creator in {manager set}

>  - A permission on the whole class without a check method would have a
>    trivial "True" here

and could be removed from the SQL query.

>  - This would be *in addition* to the normal visibility checks but
>    after retrieval of the list of issues from SQL we would *not* run
>    checks in python to determine which issues to show

Right if we pushed a check down to the SQL db level, don't rerun it in
Python.

>  - If there are several permissions with check methods this would only
>    work if all of them have the fast permission check set. The fast
>    permission checks for several issues would need to be ORed in SQL.

Right so the row is returned if any one of the checks pass.

>The reason why this would speed up things is that we do a first
>visibility check in SQL. Only later will we check each issue for
>visibility but this will occur only for a handful of issues displayed
>in one batch of the index template.

The hope is that pushing permission rules into the SQL will return
fewer rows that need to be filtered in Python.

I like the idea. A few questions about your tracker:

  1) how many permissions are tested in python before a row is
     allowed? (similar stats for denied would also be good)

  2) how many permissions for the wrong class are tested before a row
     is allowed/denied?

  3) how many permissions of the wrong type (e.g. view vs edit) are
     tested before a row is allowed/denied.

If 2 is 50% less than 1, we can restructure the permissions from

   permissions = [(permission tuple)+]

to

   permissions = { 'klass': [(permission tuple)+], 'klass2': ...}

This is a win regardless of the back end.

If 3 is also some large percentage of 1, maybe make permissions:

   permissions = { 'klass': { 'perm_name': [(permission tuple)+], 'pn2': ...},
                   'klass2': ...}

I would suggest finding this data first before we start pushing down
to the SQL level. I was going to add capturing:

  4) how many rows are returned from the db that are discarded by
     python filtering 

however if we have allow and deny stats for 1-3 I think we can
calculate 4.

Also if we agree that sort order doesn't matter, sorting (probably at
insertion time) in the order:

  only klass/name specified (includes klass = None (e.g. 'Web Access'))
  properties defined (with props_only=true or false, unordered by props_only)
  check defined

for each [(permission tuple)+] should help as well.

Also memoizing hasPermission() might help as well. If a call to
hasPermission doesn't include the itemid, we might be able to reduce
dozens of calls to a simple lookup.

Also you may have explained this, but I'm not recognizing it from
earlier in this ticket. The user's permissions only include
permissions defined for that user's roles. So why would we use Roles
as a selector (permissions['Role']['class']['perm_name']

Also another thought about including roles in the selection
structure. If role A and role B overlap such that both of them have
the same permission, does the permissions structure get two copies of
the same permission? If so then adding a filter/iterator over Roles
may or may not make sense. If the same permission was added by Role A
and one by Role B so we care which permission is used to allow the
item?

Personally I would prefer that the duplicate rule not be added at all,
but that seems difficult to determine and is not low lying fruit.

Also if somebody has 30 roles, adding roles to the tuple could mean 29
lookups and possible iterations over the permissions tuple list. Can
the admin define the order that Roles are returned? I seem to recall
the roles are returned in alphabetic sorted order but....

Thanks for prompting me on this.
History
Date	User	Action	Args
2024-10-17 01:38:11	rouilj	set	recipients: + rouilj, schlatterbeck
2024-10-17 01:38:11	rouilj	link	issue2551330 messages
2024-10-17 01:38:10	rouilj	create
Roundup Tracker - Issues

Message8129