Roundup Tracker - Issues

Issue 2551330

classification
Improving permission performance - RBAC, ABAC push to db layer from hyperdb
Type: security Severity: normal
Components: Database, Web interface, API Versions:
process
Status: new
:
: : rouilj, schlatterbeck
Priority: :

Created on 2024-04-01 19:52 by rouilj, last changed 2024-04-22 15:43 by rouilj.

Messages
msg7973 Author: [hidden] (schlatterbeck) Date: 2024-04-01 09:13
Hmm: When there are permissions with check methods on a Class, we are checking permissions *in python* not in the database. The consequence is that we do not know at SQL time which items will be displayed.

I do have several classes with permission methods in my trackers. I've often thought about this, if it would make sense to implement permissions that can be checked at SQL time so that we do not need to filter in python. On some of my trackers filtering in python is a real performance killer because the tables are quite big...

So in a first step we may want to use your new optimized implementation only if there are no check methods.

Also the check methods are not sorted: Some users may see everything but we are sometimes checking the low-performance (method-based) checks before the lightweight checks that may permit everything for a certain role. An optimized version of the permission checks that sorts permissions by role and tries permissions without check methods first would be nice here.

Sorry for making this issue into a wishlist :-)

Thanks
Ralf
msg7974 Author: [hidden] (rouilj) Date: 2024-04-01 15:24
Hi Ralf:

I committed changeset: 7853:03c1b7ae3a68. While I was composing
this response, it passed CI and all code paths are checked for
what that's worth.

Your performance concern can be addressed by
reducing the number of rows returned from the SQL (or
anydbm) db using the method incorporated in this changeset.
See admin_guide.txt look for max_response_row_size.

Reduce the returned data set to 100 or 1000 items. The only
down side is that the total size returned will be set to -1
indicating the total size is unknown. Which is an improvement
from the @total_size hallucination that used to be returned 8-).

Currently the max size is a constant for all classes. You
could experiment with making max size a function that takes
a class/classname and @page_size and returns a different
limit value for different inputs. I didn't do it because
YAGNI but if YNI.

> When there are permissions with check methods on a Class,
> we are checking permissions *in python* not in the
> database. The consequence is that we do not know at SQL
> time which items will be displayed.

AFAICT all permission checks are in Python. Regardless of a
check method.

This bit of code:

        for item_id in obj_list:
            r = {}
            if self.db.security.hasPermission(
                'View', uid, class_name, itemid=item_id, property='id',
            ):
                r = {'id': item_id, 'link': class_path + item_id}
            if display_props:
                # format_item does the permission checks
                r.update(self.format_item(class_obj.getnode(item_id),
                    item_id, props=display_props, verbose=verbose))
            if r:
                result['collection'].append(r)

in rest.py filters obj_list (returned by the database) by
permission.

The permissions generated from the user's Role is
checked in Python even if a permission has no check method.

> I do have several classes with permission methods in my
> trackers. I've often thought about this, if it would make
> sense to implement permissions that can be checked at SQL
> time so that we do not need to filter in python. On some
> of my trackers filtering in python is a real performance
> killer because the tables are quite big...

Pushing permissions down to the DB backend is an interesting
idea. Sadly I can't think of a way to make it happen. With
the check method, Roundup's access control changes from an
RBAC system to something more powerful. In theory I can use
a check method on a permission to deny access based on time
of day.

This makes it more of an ABAC (Attribute based access
control) system. I think that expressiveness makes it
difficult if not impossible to push permissions to the
database layer and keeps it in the hyperdb.

> So in a first step we may want to use your new optimized
> implementation only if there are no check methods.

As I stated above, I think the Permissions processing
happens in Python in all cases. Changing the max size of a
returned set to a number 1 more than @page_size should fix
the bug that triggered this issue with no performance impact.

> Also the check methods are not sorted: Some users may see
> everything but we are sometimes checking the low-performance
> (method-based) checks before the lightweight checks that may
> permit everything for a certain role. An optimized version
> of the permission checks that sorts permissions by role and
> tries permissions without check methods first would be nice
> here.

AFAIK, permissions aren't sorted by role. The role is used
to generate a list of permissions and then is discarded. Did
you mean permission type/name?

I agree with optimizing the application of the permissions
list. I think I proposed something of the sort many years
ago. IIRC all permissions are in a single list. So you could
have (args are type, class, properties, check, props_only):

  Permission('Edit', user, [name, phonenumber], during_daytime)  (1)
  Permission('View', user, [name, phonenumber], during_daytime)  (2)
  Permission('Edit', issue, [nosy,], None, props_only=True)      (3)
  Permission('View', issue,)                                     (4)
  ...
  Permission('View', issue, [title, nosy, priority,])            (5)

in a single list. if you are checking to see if somebody can
view the issue, you have to iterate over 1-3 for each item
in the database before reaching 4.

Grouping the permission list by type/class:

   P[edit][user] = [
       Permission('edit', user, [name, phonenumber], during_daytime)
   ]

   P[view][issue] = [Permission('view', issue,),
            Permission('view', issue, [title, nosy, priority,])]

   ...

makes selecting all permissions by type and class an O(1)
rather than O(n) cost. It should have no effect on the
expressed permissions. This should reduce the number of
permissions that have to be iterated over. But, I have no
idea if traversal of the list is a major issue.

Sorting the permissions can help as well. I am concerned
that sorting could result in changing the applied
permissions from the unsorted state. AFAIK, unlike detectors
permissions don't have a method to set ordering which plays
a role in what permission is expressed.

> Sorry for making this issue into a wishlist :-)

I agree permissions and their application can be a
bottleneck. I wish I had better developer documentation on
how they work. IIRC when I added props_only it was because a
class level check was being allowed by a property level
permission. Documenting the various check and permission
types and how they play together would be helpful.

I think further discussion of this should be on a new ticket.
Agreed?
msg7975 Author: [hidden] (rouilj) Date: 2024-04-01 19:52
Linked the first two messages from a discussion on issue2551328.
Ralf noted that performance when retrieving a lot of data tanks
due to permission checks in python at the hyperdb layer.
msg7977 Author: [hidden] (schlatterbeck) Date: 2024-04-02 08:03
I'm cut&pasting my email from yesterday here, my MTA reports

<issues@roundup-tracker.org>: host bugs.python.org[167.71.181.142] said: 454
    4.7.1 <issues@roundup-tracker.org>: Relay access denied (in reply to RCPT
    TO command)

So it seems we can no longer reply to messages from our bug tracker.

On Mon, Apr 01, 2024 at 03:24:35PM +0000, John Rouillard wrote:
> > When there are permissions with check methods on a Class,
> > we are checking permissions *in python* not in the
> > database. The consequence is that we do not know at SQL
> > time which items will be displayed.
>
> AFAICT all permission checks are in Python. Regardless of a
> check method.

Yes, I know, I was highlighting this because it's slow :-)

> in rest.py filters obj_list (returned by the database) by
> permission.
>
> The permissions generated from the user's Role is
> checked in Python even if a permission has no check method.

Yes, and the permissions are checked in arbitrary order. My proposal
would be to group permissions by role (so that we need to check only the
permissions where we know the user has that role) *and* order the easy
ones (without a check method) first. That way we can
- Check only the permissions for the roles we have
- Check the easy ones first. On first hit we stop. That way in cases
  where a user has permissions that are easy to check we stop early
  without touching the performance-problematic ones

>
> Pushing permissions down to the DB backend is an interesting
> idea. Sadly I can't think of a way to make it happen. With
> the check method, Roundup's access control changes from an
> RBAC system to something more powerful. In theory I can use
> a check method on a permission to deny access based on time
> of day.

Sure. I was thinking of having less expressive permissions than we
currently have with check methods. We may find that for many use-cases
we can formulate permissions that are checkable in SQL.

> AFAIK, permissions aren't sorted by role. The role is used
> to generate a list of permissions and then is discarded. Did
> you mean permission type/name?

I meant that as a proposed change, see above.

> Grouping the permission list by type/class:
>
>    P[edit][user] = [
>        Permission('edit', user, [name, phonenumber], during_daytime)
>    ]
>
>    P[view][issue] = [Permission('view', issue,),
>             Permission('view', issue, [title, nosy, priority,])]
>
>    ...
>
> makes selecting all permissions by type and class an O(1)
> rather than O(n) cost. It should have no effect on the
> expressed permissions. This should reduce the number of
> permissions that have to be iterated over. But, I have no
> idea if traversal of the list is a major issue.

The permissions with check methods are heavy on performance, we may want
to check the ones without method first and only if we found no
permission check the others. Would improve performance a lot for some of
my use-cases.

>
> Sorting the permissions can help as well. I am concerned
> that sorting could result in changing the applied
> permissions from the unsorted state. AFAIK, unlike detectors
> permissions don't have a method to set ordering which plays
> a role in what permission is expressed.

I do not think the semantics change. If a person has multiple
permissions on a Class (and/or properties) this should be equivalent to
just one permission.

> I think further discussion of this should be on a new ticket.
> Agreed?
Umm, that maybe should have been at the beginning :-)

But, yes, lets start a new ticket.

[Well, the last comment is obsolete because the mail never made it]
msg7978 Author: [hidden] (rouilj) Date: 2024-04-05 02:25
> So it seems we can no longer reply to messages from our bug
> tracker.

I have a ticket open with psf. It appears this has been broken
since their OS upgrade at the begining of th year.

> On Mon, Apr 01, 2024 at 03:24:35PM +0000, John Rouillard wrote:
> > AFAICT all permission checks are in Python. Regardless of a
> > check method.
>
> Yes, I know, I was highlighting this because it's slow :-)

Gotcha.

> > in rest.py filters obj_list (returned by the database) by
> > permission.
> >
> > The permissions generated from the user's Role is checked in
> > Python even if a permission has no check method.
> 
> Yes, and the permissions are checked in arbitrary order.

Reordering the permissions can help here.

> My proposal would be to group permissions by role (so that we
> need to check only the permissions where we know the user has
> that role) *and* order the easy ones (without a check method)
> first. That way we can - Check only the permissions for the
> roles we have

How would that work? Would you need to replace a
hasPermission('View', 'issue' ...) in the core code with a
hasRole() check? If so how would the code know what Role to use?
The same permission could be in multiple roles.

Also I wonder (see below) if the permissions are grouped by role
by default.

There is a hasRole() in templating.py, but I don't think that's
used outside of the templating system. Also as I remember it
checks for Anonymous and maybe Admin roles only. These are the
only two roles with predefined meanings.

> - Check the easy ones first. On first hit we stop. That way in
>   cases where a user has permissions that are easy to check we
>   stop early without touching the performance-problematic ones

This is my understanding of how things work, do you agree?

1) user authenticates
2) user's roles are looked up
3) security object is populated with the permissions associated
   with the user's roles
4) hasPermission(X) calls are made to see if the user can do X

Sorting to put "easy ones" first makes sense. See below on my
questions about ordering/reordering.

> > Pushing permissions down to the DB backend is an interesting
> > idea. Sadly I can't think of a way to make it happen. With
> > the check method, Roundup's access control changes from an
> > RBAC system to something more powerful. In theory I can use a
> > check method on a permission to deny access based on time of
> > day.
> 
> Sure. I was thinking of having less expressive permissions than
> we currently have with check methods. We may find that for many
> use-cases we can formulate permissions that are checkable in
> SQL.

AFAIK access is denied only if no permission matches. A
permission can't return a "deny", it only says "I don't
match". Is that your understanding as well?

I can see a way to do some of this if we define relationships
like:

  owner
  actor

which are done via check commands currently. These map to a
database check on 'creator' and 'actor'. But expressing a
permission like:

  If the issue is marked private the user needs to be on one of
     two access lists (nosy and verynosy in my case) to see the
     issue

requires a like check for the id number in the two fields. Not
insurmountable. But you would need to define some sort of method
to implement this using a join (maybe, my SQL isn't great).

> > AFAIK, permissions aren't sorted by role. The role is used to
> > generate a list of permissions and then is discarded. Did you
> > mean permission type/name?
> 
> I meant that as a proposed change, see above.

Ok. It's possible that the role does have an impact on
ordering. See below.

> > Grouping the permission list by type/class:
> >
> >    P[edit][user] = [
> >        Permission('edit', user, [name, phonenumber], during_daytime)
> >    ]
> >
> >    P[view][issue] = [Permission('view', issue,),
> >             Permission('view', issue, [title, nosy, priority,])]
> >
> >    ...
> >
> > makes selecting all permissions by type and class an O(1)
> > rather than O(n) cost. It should have no effect on the
> > expressed permissions. This should reduce the number of
> > permissions that have to be iterated over. But, I have no
> > idea if traversal of the list is a major issue.
> 
> The permissions with check methods are heavy on performance, we
> may want to check the ones without method first and only if we
> found no permission check the others. Would improve performance
> a lot for some of my use-cases.

Since no permission can cause a 'deny' action, I think sorting it
is safe, but I am not willing to put money on that.

Adding grouping by class (or None for tracker level permissions)
to the existing grouping by permission name I think is a
win. Then sorting within those buckets by:

   no properties assigned no check function
   no properties assigned check function
   properties assigned and properties defined
      without prop_only no check
   properties assigned and properties defined
      without prop_only with check
   properties assigned and defined with prop_only no check function
   properties assigned and defined with prop_only and check function

might be faster. I think this ordering can be expressed using a tuple
with one position for each criteria. So a permission list item for the
first bucket is:

   (properties=0, check=0, prop-only=0, permission object)

and the last bucket is

   (properties=1, check=1, prop-only=1, permission object)

So sorting should be doable.

I am not sure what the default order of permisions is. I think the
order of permissions assigned to a role is in the order it's assigned
in schema.py. I would expect the order for a user with roles User,
Role1, Role2 would be:

  User permissions in order they were added in schema.py
  Role1 permissions in order they were added in schema.py
  Role2 permissions in order they were added in schema.py

Reordering into buckets as above could help.

> > Sorting the permissions can help as well. I am concerned that
> > sorting could result in changing the applied permissions from
> > the unsorted state. AFAIK, unlike detectors permissions don't
> > have a method to set ordering which plays a role in what
> > permission is expressed.
> 
> I do not think the semantics change. If a person has multiple
> permissions on a Class (and/or properties) this should be
> equivalent to just one permission.

I think you are corect but it's because a permission can't result
in a deny. It can only result in a "I don't match" so the
permission checker moves on to the next item.

Permissions defined with properties (e.g. View) match a request
for View permissions (without properties) on the class. So:

  havePermission('View', 'issue')

matches:

  addPermission(name='View', klass='issue', property=['title']).

This may or may not make a difference. It's possible that the
order I outlined above might be more efficient if all permisions
with a check were moved after the permissions without
checks. Otherwise the bucket order stays as above.

Thoughts?
msg8004 Author: [hidden] (rouilj) Date: 2024-04-22 15:43
Ralf, do you have more thoughts on this?
History
Date User Action Args
2024-04-22 15:43:22rouiljsetmessages: + msg8004
2024-04-05 02:25:49rouiljsetmessages: + msg7978
2024-04-02 08:03:40schlatterbecksetmessages: + msg7977
2024-04-01 19:55:35adminsetmessages: + msg7973, msg7974
2024-04-01 19:53:00rouiljcreate