Issue 2551212: Speed up wsgi_handler.py

classification

Title:	Speed up wsgi_handler.py
Type:	rfe	Severity:	urgent
Components:	Web interface	Versions:	devel

process

Status:	fixed	Resolution:	fixed
Dependencies		Superseder:
Assigned To:	rouilj	Nosy List:	marcus.priesch, mbehrle, rouilj, schlatterbeck
Priority:	normal	Keywords:	patch

Created on 2022-06-27 08:31 by marcus.priesch, last changed 2022-07-04 08:27 by schlatterbeck.

Files
File name	Uploaded	Description	Edit	Remove
wsgi_handler_speedup.patch	marcus.priesch, 2022-06-27 08:31
wsgi_cache_tracker.patch	rouilj, 2022-06-28 01:54	Added feature flags around Marcus' patch. Added docs, CHANGES.txt, updated test.

Messages
msg7586	Author: [hidden] (marcus.priesch)	Date: 2022-06-27 08:31
Hi There, on our last set of patches, the speed up patch was not applied to wsgi_handler.py upon trying what is currently in the repo i found that my rest-powered frontend again is rather slow. timing the requests gives 230-240ms per request with the current code, whereas with the attached patch applied i come down to 70-80ms per request ! i remember we discussed that the code currently in use reloads the tracker/templates on every request, which was decided to be a good thing. discussing with ralf we came to the conclusion, we could do this optionally via a config option. my personal opinion rather would be to apply the patch and to have the "template reload" feature only when using roundup-server and not in the wsgi backend - as this (in my opinion) mainly should be used in production environments where speed matters. however speaking about automatic code reload, django also does this (and i think web.py also, which has a smaller code base), maybe we should take a look at how they are doing things - with the current behaviour only the trackers auditors and templates get reloaded but i dont think changes in roundup core get reloaded also ... but i could be wrong here ... regards, marcus.
msg7587	Author: [hidden] (rouilj)	Date: 2022-06-27 14:10
Hi Marcus: How well have you tested your changes to make sure that the lack of reinitialization can't cause security issues? It looks like a new client is created on every connect which I think should guarantee that database access by a client is only done with proper access restrictions but the comment in instance.open that says: # load the database schema # we cannot skip this part even if self.optimize is set # because the schema has security settings that must be # applied to each database instance could a lack of a call to instance.open result in an incorrectly applied schema being used? Ralf ideas/concerns about applying this patch without tests less than three weeks before the 2.2.0 release? Ideas on how to test this? We do have test_live_server but I am not sure how do test any side effects from this patch. Also what is the effect of: with self.get_tracker() as tracker: as a context handler in the original code? Does it do any close/cleanup/release of resources? It looks like preload() just calls get_tracker(). Is it's role simply to prime the pump and preload/precompile the tracker before the call to: with self.get_tracker() as tracker: so that the cost isn't paid when the first connection is made? -- rouilj
msg7588	Author: [hidden] (schlatterbeck)	Date: 2022-06-27 15:18
On Mon, Jun 27, 2022 at 02:10:06PM +0000, John Rouillard wrote: > > John Rouillard added the comment: > > Hi Marcus: > > How well have you tested your changes to make sure that the lack of > reinitialization can't cause security issues? I don't think this will cause security implications. Only the instance is cached, not the database. Note how the client calls instance.open with a username in Client.opendb. This means we do not reload all reactors and actions for each request. But we do open the database with a new user in each request. > It looks like a new client is created on every connect which I think > should guarantee that database access by a client is only done with > proper access restrictions but the comment in > instance.open that says: > > # load the database schema > # we cannot skip this part even if self.optimize is set > # because the schema has security settings that must be > # applied to each database instance > > could a lack of a call to instance.open result in an incorrectly > applied schema being used? See above: instance.open seems to be called for each request. Of course we should test this, so technically the patch is lacking a test that makes sure that for each request an instance.open is called. I'm currently not sure how to test the wsgi in a test, though. > Ralf ideas/concerns about applying this patch without tests less than > three weeks before the 2.2.0 release? I didn't apply that particular patch at the time because I didn't fully understand it (and the i18n stuff was unrelated and worked without the patch). But I'm fairly sure it will not cause major breakage. But we can put it in the next release, I'm running with intermediate versions all the time. Then I'll probably have experience with a big deployment for the next release :-) > Ideas on how to test this? We do have test_live_server > but I am not sure how do test any side effects from this patch. > > Also what is the effect of: > > with self.get_tracker() as tracker: > > as a context handler in the original code? Does it do any > close/cleanup/release of resources? I think that was the original intention. > It looks like preload() just calls get_tracker(). Is it's role simply > to prime the pump and preload/precompile the tracker before the call > to: > > with self.get_tracker() as tracker: > > so that the cost isn't paid when the first connection is made? This might have been the intention but I don't think this has ever worked (for wsgi!) because the instance is re-created on every request. Which means the template instantiation is not retained for the next request independent of the setting of 'optimize' in the instance. You might also want to compare Marcus' change to the wsgi with the implementation of roundup_server in scripts/roundup_server.py: This does also cache the instance, see how the get_tracker method of RoundupRequestHandler caches and re-uses the instance in the TRACKERS (line 200). Its a little complicated how TRACKERS is only initialized in a sub-class but looks like the same functionality to me. Ralf -- Dr. Ralf Schlatterbeck Tel: +43/2243/26465-16 Open Source Consulting www: www.runtux.com Reichergasse 131, A-3411 Weidling email: office@runtux.com
msg7589	Author: [hidden] (rouilj)	Date: 2022-06-27 18:05
In message <20220627151828.45dnn3em4uv3sedi@runtux.com>, Ralf Schlatterbeck writes: >On Mon, Jun 27, 2022 at 02:10:06PM +0000, John Rouillard wrote: >> >> John Rouillard added the comment: >> How well have you tested your changes to make sure that the lack of >> reinitialization can't cause security issues? > >I don't think this will cause security implications. Only the instance >is cached, not the database. Note how the client calls instance.open >with a username in Client.opendb. >This means we do not reload all reactors and actions for each request. >But we do open the database with a new user in each request. Ok, that's what I expected from the Client initialization. >> It looks like a new client is created on every connect which I think >> should guarantee that database access by a client is only done with >> proper access restrictions but the comment in >> instance.open that says: >> >> # load the database schema >> # we cannot skip this part even if self.optimize is set >> # because the schema has security settings that must be >> # applied to each database instance >> >> could a lack of a call to instance.open result in an incorrectly >> applied schema being used? > >See above: instance.open seems to be called for each request. Of course >we should test this, so technically the patch is lacking a test that >makes sure that for each request an instance.open is called. I'm >currently not sure how to test the wsgi in a test, though. At the very least I can test wsgi startup using the test_liveserver.py module. Exposing that is a mystery to me too. >> Ralf ideas/concerns about applying this patch without tests less than >> three weeks before the 2.2.0 release? > >I didn't apply that particular patch at the time because I didn't fully >understand it (and the i18n stuff was unrelated and worked without the >patch). But I'm fairly sure it will not cause major breakage. But we can >put it in the next release, I'm running with intermediate versions all >the time. Then I'll probably have experience with a big deployment for >the next release :-) I am considering adding this with a feature switch. Since this feature deals with startup, I think the only place the feature switch can go is as an argument to RequestDispatcher(). AFAIK, the invocation of RequestDispatcher is in the wsgi wapper and is always locally coded so this should be doable. But I am not sure if is is a good idea/allowed. I would change RequestDispatcher::__init__ to read: def __init__(self, home, debug=False, timing=False, lang=None, feature=None) where feature is an object. So calling: app = RequestDispatcher(tracker_home, feature={"wsgi_single_load": true}) would set self.feature_flags and if wsgi_single_load is set would enable Marcus' 2 code path changes. If it was not set the current code paths would be run. Thoughts? This would allow yuo to use it in production but it would be disabled unless opted into. >> Also what is the effect of: >> >> with self.get_tracker() as tracker: >> >> as a context handler in the original code? Does it do any >> close/cleanup/release of resources? > >I think that was the original intention. Does it actually clean anything up? AFAICT it doesn't right? >> It looks like preload() just calls get_tracker(). Is it's role simply >> to prime the pump and preload/precompile the tracker before the call >> [...] >This might have been the intention but I don't think this has ever >worked (for wsgi!) because the instance is re-created on every request. >Which means the template instantiation is not retained for the next >request independent of the setting of 'optimize' in the instance. Yeah I was always a little confused how/if optimize worked. >You might also want to compare Marcus' change to the wsgi with the >implementation of roundup_server in scripts/roundup_server.py: This >does also cache the instance, see how the get_tracker method of >RoundupRequestHandler caches and re-uses the instance in the TRACKERS >(line 200). Its a little complicated how TRACKERS is only initialized in >a sub-class but looks like the same functionality to me. It certainly does look the same. Hence my thought to feature flag it.
msg7590	Author: [hidden] (rouilj)	Date: 2022-06-28 01:54
I am attaching a new patch. Marcus can you apply this and verify that using the feature flag as documented enables the same speedup you reported with your tests. In my testing I used test_liveserver.py and ran it 6 times in a row. I recorded the time pythonX -m pytest for the last 5 runs. I ran the loop 4 times under: python2 with feature_flags={} python3 with feature_flags={} python2 with with feature_flags={"cache_tracker": ""} python3 with with feature_flags={"cache_tracker": ""} The cached tracker does seem to be faster. With python2 test_liveserver.py completes about 4% faster on average. For python3 it completes about 16% faster. I'm not seeing your x3 improvement but that may be because the time to run the test is dominated by pytest and setup overhead. Also I am running anydbm as the backend that may add to the overhead. If this looks ok, I'll get it committed and then work on how to test. I think I might be able to set up an i/o buffer for the wsgi server to write to. Passing the i/o buffer in via a feature flag could work. Then I should be able to get the output just from the wsgi handler from there in the test harness. -- rouilj
msg7591	Author: [hidden] (rouilj)	Date: 2022-06-30 23:53
Marcus, have you tested my latest patch and verified that it works for you if you set the feature flag as documented in doc/upgrading.txt? Release is on 13 July so less than 2 weeks out. -- rouilj
msg7592	Author: [hidden] (schlatterbeck)	Date: 2022-07-01 06:43
On Thu, Jun 30, 2022 at 11:53:23PM +0000, John Rouillard wrote: > > Marcus, have you tested my latest patch and verified that it works for you > if you set the feature flag as documented in doc/upgrading.txt? He's on vacation, I'll look into your patch and test it here. Note that I think that the feature flag default should be on eventually: We may want to set it to off now and warn users in the release note that they should test it with the feature turned on because it will change in the next release? What do you think? Ralf -- Dr. Ralf Schlatterbeck Tel: +43/2243/26465-16 Open Source Consulting www: www.runtux.com Reichergasse 131, A-3411 Weidling email: office@runtux.com
msg7593	Author: [hidden] (schlatterbeck)	Date: 2022-07-01 08:07
On Tue, Jun 28, 2022 at 01:54:03AM +0000, John Rouillard wrote: > > The cached tracker does seem to be faster. With python2 > test_liveserver.py completes about 4% faster on average. For python3 > it completes about 16% faster. I'm not seeing your x3 improvement but > that may be because the time to run the test is dominated by pytest > and setup overhead. Also I am running anydbm as the backend that may > add to the overhead. This is probably also because the tracker in question has a lot more detectors and actions and a lot more templates. Ralf -- Dr. Ralf Schlatterbeck Tel: +43/2243/26465-16 Open Source Consulting www: www.runtux.com Reichergasse 131, A-3411 Weidling email: office@runtux.com
msg7594	Author: [hidden] (schlatterbeck)	Date: 2022-07-01 08:18
On Tue, Jun 28, 2022 at 01:54:03AM +0000, John Rouillard wrote: > > I am attaching a new patch. Marcus can you apply this and verify that > using the feature flag as documented enables the same speedup you > reported with your tests. John, I've just looked at your patch and it seems to require that the user changes some things in the guts of roundup? What do you think about making this a config option that for now is off by default and will later becom on by default (next release)? We can warn in the release notes that people should test the option and that it boosts performance (especially if there are many detectors and/or templates). Ralf -- Dr. Ralf Schlatterbeck Tel: +43/2243/26465-16 Open Source Consulting www: www.runtux.com Reichergasse 131, A-3411 Weidling email: office@runtux.com
msg7595	Author: [hidden] (schlatterbeck)	Date: 2022-07-01 09:26
I should read the documentation that comes with a patch more closely: John, I'm fine with your changes, in fact the way you use the feature flags is nice, the user has to come up with a WSGI script anyway so changing this is fine. No need for a config option and we can make the feature the default later on. With your patch turned on I'm observing a speedup from about 270ms down to about 80-85ms on the same tracker Marcus was using (see https://github.com/time-track-tool/time-track-tool). So from my side: Lets go for it. I'll keep you posted when we have rolled the changes including the feature-flag in production.
msg7596	Author: [hidden] (rouilj)	Date: 2022-07-01 11:18
In message <20220701064342.5gxgebwj2mq5ksh6@runtux.com>, Ralf Schlatterbeck writes: >Ralf Schlatterbeck added the comment: > >On Thu, Jun 30, 2022 at 11:53:23PM +0000, John Rouillard wrote: >> >> Marcus, have you tested my latest patch and verified that it works for you >> if you set the feature flag as documented in doc/upgrading.txt? > >He's on vacation, I'll look into your patch and test it here. >Note that I think that the feature flag default should be on >eventually: If it still oooks fine by the time 2.3.0 is released, the feature flag will become a no-op and only the patched version will be present. I only threw the feature flag in because it came in at the last minute and I wanted to make it available for testing. >We may want to set it to off now and warn users in the >release note that they should test it with the feature turned on because >it will change in the next release? What do you think? That's my intent for the 2.3.0 release. The docs include the following lines (the patch attached tothe issue has a typo 2.0.2 in place of 2.2.0) : Note that this is experimental and was added during the 2.2.0 beta period, so it is hidden behind a feature flag. If you use this and it works better for you please followup with an email on the roundup-users at lists.sourceforge.net. Thanks.
msg7597	Author: [hidden] (rouilj)	Date: 2022-07-01 11:21
In message <20220701081837.wrp2izp3rmo7h2kn@runtux.com>, Ralf Schlatterbeck writes: > > >Ralf Schlatterbeck added the comment: > >On Tue, Jun 28, 2022 at 01:54:03AM +0000, John Rouillard wrote: >> >> I am attaching a new patch. Marcus can you apply this and verify that >> using the feature flag as documented enables the same speedup you >> reported with your tests. > >John, I've just looked at your patch and it seems to require that the >user changes some things in the guts of roundup? >What do you think about making this a config option that for now is off >by default and will later becom on by default (next release)? We can >warn in the release notes that people should test the option and that it >boosts performance (especially if there are many detectors and/or >templates). Npe the only things that should change to enable it is the wsgi.py wrapper written by the user or retreived from the frontends directory. No internal code changes need to be done by the user to enable it. At the point where the patch's action has to happen in roundup/cgi/wsgi_hander.py there no tracker is open, so no way to read the config file.
msg7598	Author: [hidden] (rouilj)	Date: 2022-07-02 18:08
Committed with doc changes, testing with feature enabled/disabled. Also enabled by default in frontends/wsgi.py. Thanks Marcus and Ralf. changeset: 6747:d32d43e4a5ba (Also I noticed that wsgi.py wasn't being installed like roundup.cgi. I fixed this as well on changeset: 6745:3c198cd7c24e)
msg7599	Author: [hidden] (schlatterbeck)	Date: 2022-07-04 08:27
On Sat, Jul 02, 2022 at 06:08:06PM +0000, John Rouillard wrote: > > Committed with doc changes, testing with feature enabled/disabled. > > Also enabled by default in frontends/wsgi.py. Thanks Marcus and Ralf. Thanks John! Ralf -- Dr. Ralf Schlatterbeck Tel: +43/2243/26465-16 Open Source Consulting www: www.runtux.com Reichergasse 131, A-3411 Weidling email: office@runtux.com

History
Date	User	Action	Args
2022-07-04 08:27:50	schlatterbeck	set	messages: + msg7599
2022-07-02 18:08:06	rouilj	set	priority: normal status: open -> fixed resolution: fixed messages: + msg7598
2022-07-01 11:21:41	rouilj	set	messages: + msg7597
2022-07-01 11:18:05	rouilj	set	messages: + msg7596
2022-07-01 09:26:02	schlatterbeck	set	messages: + msg7595
2022-07-01 08:18:39	schlatterbeck	set	messages: + msg7594
2022-07-01 08:07:25	schlatterbeck	set	messages: + msg7593
2022-07-01 06:43:45	schlatterbeck	set	messages: + msg7592
2022-06-30 23:53:23	rouilj	set	messages: + msg7591
2022-06-28 01:54:03	rouilj	set	status: new -> open assignee: rouilj messages: + msg7590 files: + wsgi_cache_tracker.patch
2022-06-28 01:47:01	admin	set	files: + wsgi_handler_speedup.patch
2022-06-28 01:42:21	rouilj	set	files: - wsgi_handler_speedup.patch
2022-06-27 18:05:29	rouilj	set	messages: + msg7589
2022-06-27 17:15:49	mbehrle	set	nosy: + mbehrle
2022-06-27 15:18:31	schlatterbeck	set	messages: + msg7588
2022-06-27 14:10:06	rouilj	set	messages: + msg7587
2022-06-27 08:31:58	marcus.priesch	create

Roundup Tracker - Issues

Issue 2551212