Roundup Tracker - Issues

Issue 2551309

Classification
Title: Built-in mailbox polling scheduler
Type: behavior    Severity: normal
Components: Mail interface    Versions: devel, 2.4.0

Process
Status: new
Nosy List: asavchuk, rouilj

Created on 2023-12-14 23:10 by asavchuk, last changed 2024-05-29 15:03 by rouilj.

Messages
msg7887 Author: [hidden] (asavchuk) Date: 2023-12-14 23:10
Hello! I'm also interested in polling IMAP mailboxes when using containerized Roundup. The documentation says the only way is to use cron jobs.

I think I have an idea. It would be possible to implement a built-in scheduler using Python's native sched module. In that case, all the necessary values could be stored in the Roundup configuration file (or as secrets), and the roundup.mailgw module would be launched directly by the main Roundup process.

This could solve the permissions issues and the like.
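Below is a minimal sketch of what such a scheduler loop could look like with the standard sched module. Several parts are hypothetical: the per-tracker intervals, the `poll_mailbox` placeholder and the `until` cut-off. A simulated clock stands in for `time.monotonic`/`time.sleep` so the example finishes instantly. Each poll re-enters itself with `scheduler.enter`, which is the usual way to get a repeating job out of sched.

```python
import sched


class FakeClock:
    """Simulated clock so the example runs instantly.

    A real poller would pass time.monotonic and time.sleep instead.
    """

    def __init__(self):
        self.now = 0.0

    def time(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


def poll_mailbox(name, clock, log):
    # Hypothetical placeholder: a real poller would run the mail gateway here.
    log.append((clock.time(), name))


def poll_and_reschedule(scheduler, name, interval, clock, log, until):
    """Poll once, then re-enter this event `interval` seconds later."""
    poll_mailbox(name, clock, log)
    if clock.time() + interval <= until:
        scheduler.enter(interval, 1, poll_and_reschedule,
                        argument=(scheduler, name, interval, clock, log, until))


def run(intervals, until):
    """Poll every mailbox in `intervals` (name -> seconds) up to time `until`."""
    clock = FakeClock()
    scheduler = sched.scheduler(clock.time, clock.sleep)
    log = []
    for name, interval in intervals.items():
        scheduler.enter(0, 1, poll_and_reschedule,
                        argument=(scheduler, name, interval, clock, log, until))
    scheduler.run()  # returns once no events are left
    return log


# Hypothetical intervals: "support" every 10 minutes, "devel" every 5.
log = run({"support": 600, "devel": 300}, until=900)
```

In a long-running daemon the `until` cut-off would be dropped, so each event keeps re-entering itself indefinitely.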

What do you think?
msg7888 Author: [hidden] (rouilj) Date: 2023-12-15 00:29
Hi Anton:

In message <1702595412.18.0.970424296848.issue2551309@roundup.psfhosted.org>,
Anton Savchuk writes:
>Hello! I'm also interested in polling IMAP mailboxes when using
>containerized Roundup. Documentation says the only way is using cron tasks. 
>
>I think I have an idea. It would be possible to implement built-in
>scheduler using native Python sched module.

How would sched be invoked? Out of the main loop in the server, a
separate thread, a subprocess started from the main server?

Running roundup-server in the current containerization was a "get
something running" attempt. For real deployments (50+
connections/sec), running Roundup via WSGI under gunicorn (as is done
with bugs.python.org) is the "production" deployment method.

Will using the sched module work if there is no master roundup-server?
IIUC, WSGI instances basically sit idle until triggered from
upstream. I'm not sure how to ensure that only one instance of sched
runs in such an environment.

Also, roundup-server can run in either forking or threaded mode. By
default we chose forking (multiprocess). In the future this may be
extended to include a subinterpreter method rather than a fork. Can
sched work in a threaded environment? (There is a report that threads
respond faster (2x+) under light load, but their response times grow
faster than the forking model's under load. I'm not sure why.)

I'd like to use subinterpreters in roundup-server once they are
implemented in a future Python. They will make roundup-server scale
better (though not to gunicorn levels) and be usable in more scenarios.
But it is too early to understand how sched and subinterpreters will
co-exist.

https://tonybaloney.github.io/posts/sub-interpreter-web-workers.html
https://realpython.com/python312-subinterpreters/
https://peps.python.org/pep-0734/

Hopefully subinterpreters will behave like the multiprocess model.

Thoughts?

Have a great weekend.
msg8077 Author: [hidden] (asavchuk) Date: 2024-05-29 01:46
Hello John, sorry for the long silence.

Thanks for your questions. I've thought about this issue. Anything that requires special WSGI server settings doesn't look very good. I can't imagine anything better than a second container that mounts the same volume as the main Roundup container but runs a permanent process that has access to the trackers and only serves the mail gateway.

This must be something other than cron running roundup-mailgw, since a cron job can only use one mailbox and one tracker. We need to be able to set polling times and mailboxes for each tracker. I think it's best if the process reads them from the tracker settings file. This would allow us to keep mailbox credentials in secret files.
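Below is a sketch of reading per-tracker polling settings. It assumes a hypothetical `[mailgw]` section with `poll_interval` and `mailbox_spec_file` options; neither option exists in Roundup's config.ini today. The credentials live in a separate secret file, such as a docker secret, and only that file's path is stored in the config.

```python
import configparser


def load_mailgw_settings(config_text, default_interval=600):
    """Return (poll_interval, mailbox_spec) from a hypothetical [mailgw] section.

    config_text would normally be the contents of the tracker's config.ini.
    Neither option exists in Roundup today; the names are placeholders.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    section = parser["mailgw"]
    interval = section.getint("poll_interval", fallback=default_interval)
    # Only the path to the secret lives in the config; the spec itself
    # (which contains credentials) is read from that file.
    with open(section["mailbox_spec_file"], encoding="utf-8") as fh:
        spec = fh.read().strip()
    return interval, spec
```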

I found a wrapper around mailgw.py that runs as a daemon and polls IMAP mailboxes. It reads its arguments from its input, but I think it wouldn't be difficult to rewrite it to read settings from configuration files. It is located at scripts/imapServer.py in the current Roundup repository:

https://sourceforge.net/p/roundup/code/ci/default/tree/scripts/imapServer.py

I'm not sure we need some kind of asynchronous process. To begin with, we could take this wrapper as a starting point.

Also, I'm not sure that container environments need anything other than IMAP support. POP3 is outdated, and no one seems to mount mailbox directories into a third-party container.

But wait... maybe we could use the REST API to read the settings and communicate with the mail gateway? That looks better than searching for the mailgw executable on a mounted volume.

I will try to answer in other threads soon. Sorry again.
msg8079 Author: [hidden] (rouilj) Date: 2024-05-29 03:03
[due to an encoding glitch, this was lost from the ticket.]
Hi Anton:

In message <1716947202.87.0.721227994857.issue2551309@roundup.psfhosted.org>,
Anton Savchuk writes:
>Hello John, sorry for the long silence.

No problem. Async communication is the norm 8-).

>[...] I can't seem to imagine anything better than using a second
>container that has the same mounted volume as the main roundup
>container, but runs a permanent process that has access to the
>trackers and only serves the mail gateway.

They are called sidecar containers in Kubernetes, and yes, I think
that's the way to go.

Also, I am assuming you are running one container that serves multiple
trackers, so you also want one container to get email for all of the
trackers.

>This must be something other than a cron running roundup-mailgw,
>since a cron job can only use one mailbox and one tracker.

Umm, not really. You can have multiple cron jobs scheduled under
one cron binary. E.g.:

   0,10,20,30,40,50 * * * * /usr/bin/roundup-mailgw /opt/roundup/trackers/support imap `cat /run/secrets/support.imap`  # using docker secrets
   0,10,20,30,40,50 * * * * /usr/bin/roundup-mailgw /opt/roundup/trackers/devel imap `cat devel.imap`
   0,10,20,30,40,50 * * * * /usr/bin/roundup-mailgw /opt/roundup/trackers/saleseng imap `cat saleseng.imap`

You could hard-code the crontab into the image or use a startup script
that gets that info from somewhere and creates the crontab. If you
use docker secrets, you can store the info there.
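Below is a sketch of such a startup script. It generates crontab entries like the ones above from a mapping of tracker names to secret files. The paths come from the example entries; the function name and the mapping are illustrative. The output could be installed at container start, for example by piping it to `crontab -`.

```python
import shlex

# Paths taken from the example crontab above; adjust for your image.
MAILGW = "/usr/bin/roundup-mailgw"
TRACKER_ROOT = "/opt/roundup/trackers"


def crontab_lines(secrets, schedule="0,10,20,30,40,50 * * * *"):
    """Build one crontab entry per tracker.

    `secrets` maps tracker name -> file holding that tracker's IMAP spec
    (for example a docker secret under /run/secrets). The `cat` runs when
    the job fires, so the credentials never appear in the crontab itself,
    although they are still visible on roundup-mailgw's command line.
    """
    lines = []
    for name, secret_file in sorted(secrets.items()):
        tracker = shlex.quote(f"{TRACKER_ROOT}/{name}")
        secret = shlex.quote(secret_file)
        lines.append(f"{schedule} {MAILGW} {tracker} imap `cat {secret}`")
    return "\n".join(lines) + "\n"
```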

Granted, this exposes the credentials on the command line. I thought
there was an option to read them from a config file, but roundup-mailgw
doesn't support reading options from a file (or from stdin, like
roundup-admin).

I had hoped to find a simple SMTP -> program Docker container, but all
the SMTP containers run a full postfix/sendmail/... plus other
services (IMAP/anti-spam/...). I had hoped to develop something using
Python's smtpd module, but that's deprecated, so...

>We need to be able to set polling times and mailboxes for each
>tracker. I think it's best if the process gets them from the tracker
>settings file. This allows us to save mailbox credentials in secret
>files.

Agreed. That would have to be added to the configuration
code, probably in the [mailgw] section. Something like
mbox_specification as a secret option whose value can be read from a
file.

I am a little worried about pushing all of the possible mailgw
arguments (e.g. schedule times) into the config file. But putting the
credentials into config.ini makes sense. Maybe roundup-mailgw needs a
'-f' argument that takes the name of a file containing arguments for mailgw.
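roundup-mailgw has no such option today. Below is a sketch of how a hypothetical `-f FILE` could work with argparse. The file's contents are split with shell-like quoting, which also allows `#` comments, and are placed before the remaining command-line arguments. The argument layout (a tracker home followed by a source spec) is simplified and assumed.

```python
import argparse
import shlex


def read_text(path):
    with open(path, encoding="utf-8") as fh:
        return fh.read()


def parse_mailgw_args(argv, read_file=read_text):
    """Parse argv, expanding a hypothetical '-f FILE' into the file's arguments."""
    pre = argparse.ArgumentParser(add_help=False)
    pre.add_argument("-f", dest="argfile")
    known, rest = pre.parse_known_args(argv)
    if known.argfile:
        # shlex gives shell-style quoting and skips '#' comments in the file.
        rest = shlex.split(read_file(known.argfile), comments=True) + rest
    parser = argparse.ArgumentParser(prog="roundup-mailgw-sketch")
    parser.add_argument("tracker_home")
    parser.add_argument("source", nargs="*",
                        help="mail source, e.g. 'imap' followed by its spec")
    return parser.parse_args(rest)
```

argparse also has a built-in `fromfile_prefix_chars` option, which reads arguments from `@file` (one per line). An explicit `-f` as above is closer to what was suggested.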

>I found a wrapper around mailgw.py that runs as a daemon and polls
>IMAP mailboxes. It reads the arguments from the input, but I think
>it’s not difficult to rewrite it to read settings from configuration
>files.

or:

  cat config_file | imapServer.py

no rewriting needed.

>It is located in the scripts/imapServer.py file of the current
>Roundup repository:

I had forgotten about that script. I'm not sure what state it's in, but
it looks like Ralf did some updates in 2022.

>I'm not sure we need some kind of asynchronous process. To begin
>with, we can take this wrapper as a starting point.

I'm not sure what you mean by asynchronous process. Do you mean the
ability to schedule each tracker separately?

>Also I'm not sure that for container environments we need anything
>other than IMAP support. POP3 is outdated. Also, no one seems to be
>mounting mailbox directories into a third party container.

Fair enough.

>But, wait... maybe we can using REST API to read the settings and
>communicate with the mail gateway?

Not sure what you mean here. Would you create a new REST endpoint that
triggers mailgw when called?

Or would you have a new REST endpoint return credentials for
roundup-mailgw? The tracker's config.ini has the credentials for
sending email. However, IIRC config.ini doesn't have any
credentials/config for receiving email.

>This looks better than searching for mailgw executable on a mounted
>volume.

The roundup docker image doesn't have roundup installed in a mounted
volume.  The mounted volume is strictly for the tracker homes.

The docker image already supports a number of subcommands (admin, shell,
demo, ...). Adding one more for email would be possible. Then you would
run the single docker image in two modes (and two containers): the default
mode, which runs roundup-server, and an email mode, which runs imapServer.

Have a great day.
msg8082 Author: [hidden] (asavchuk) Date: 2024-05-29 05:09
> Umm, not really. You can have multiple cron jobs scheduled running
> under one cron binary.

Yes, we can have multiple cron jobs, but if we want to get credentials or times from somewhere else, we need to write another wrapper to configure the crontab.

But mounting the crontab as a docker/podman secret looks like a possible solution. For example, it could be templated with Ansible.

> I'm not sure what you mean by asynchronous process. Do you mean the
> ability to schedule each tracker separately?

Something like a queue for each mailbox, independent of the others.
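Below is a sketch of that idea with one `queue.Queue` and one worker thread per mailbox. Polls of a single mailbox run one after another, while different mailboxes run independently. `poll` is a hypothetical callable that would run the gateway for one mailbox.

```python
import logging
import queue
import threading


class MailboxQueues:
    """One FIFO queue and one worker thread per mailbox.

    Polls for the same mailbox run strictly one after another; a slow or
    failing mailbox does not delay the others.
    """

    def __init__(self, poll):
        self._poll = poll              # callable taking a mailbox name
        self._queues = {}
        self._threads = []
        self._lock = threading.Lock()

    def submit(self, mailbox):
        """Request one poll of `mailbox`, starting its worker on first use."""
        with self._lock:
            q = self._queues.get(mailbox)
            if q is None:
                q = self._queues[mailbox] = queue.Queue()
                t = threading.Thread(target=self._worker, args=(mailbox, q),
                                     daemon=True)
                t.start()
                self._threads.append(t)
            q.put(mailbox)

    def _worker(self, mailbox, q):
        while True:
            item = q.get()
            if item is None:           # shutdown sentinel
                return
            try:
                self._poll(item)
            except Exception:
                logging.exception("polling %s failed", mailbox)

    def shutdown(self):
        """Finish all queued polls, then stop every worker."""
        with self._lock:
            for q in self._queues.values():
                q.put(None)
            threads = list(self._threads)
        for t in threads:
            t.join()
```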

> Not sure what you mean here. Would you create a new rest endpoint that
> triggers mailgw when called?

I think there are several implementation options we could consider.

For example, we could just create new issues using authorized POST requests. But in that case we need to process emails on the poller side. It might be better to split the roundup package into roundup-www and roundup-mailgw so that they can be used separately, but I think that would require a lot of work.

Another option is to have an endpoint that accepts serialized messages from the poller and passes them to mailgw for processing. In this case, the poller must preprocess the emails to serialize them.

If we use the REST API, it looks like the poller needs its own configuration. But maybe this is a better way. It would also be nice to be able to allow requests from the poller's URL.

> The roundup docker image doesn't have roundup installed in a mounted
> volume.  The mounted volume is strictly for the tracker homes.

Yes, but I should be able to create a custom image where I can mount a full virtual environment if I wish, and the poller should be able to work with that.

> The docker image already supports number of subcommands (admin, shell,
> demo ..). Adding one more for email would be possible.  Then you would
> run the single docker image in two modes (and two containers). Default
> mode (which runs roundup-server) and email mode which runs imapServer.

Yes, that's an option. At least it requires less work than the other options.

But maybe it would be nice to teach mailgw and the trackers to communicate via the REST API?
msg8085 Author: [hidden] (rouilj) Date: 2024-05-29 15:03
Hi Anton:

In message <trinity-d00a0ece-b2c8-45e0-bc65-77d8b1d41bc6-1716959364829@3c-app-mailcom-bs09>,
Anton Savchuk writes:
>> Umm, not really. You can have multiple cron jobs scheduled running
>> under one cron binary.
>
>Yes, we can have multiple cron jobs, but if we want to get
>credentials or time from somewhere else, we need to write another
>wrapper to configure the crontab.

True, but you are writing something in either case: a wrapper to
configure cron or a daemon that persistently polls for email.

>But mounting crontab using a docker/podman secret looks like a some
>solution. For example, it can be templated with Ansible.

Yup.

>> I'm not sure what you mean by asynchronous process. Do you mean the
>> ability to schedule each tracker separately?
>
>Something like a queue for each mailbox, independent of the others.

If you mean allowing only one process to access a remote mailbox at a
time, yes, I agree. Otherwise a message could be processed multiple
times. Nobody wants 10,000,000 issues because of a spinning poller.

>> Not sure what you mean here. Would you create a new rest endpoint that
>> triggers mailgw when called?
>
>I think we have several implementation options that we can think about.
>
>For example, we can just create new issues using POST requests with
>authorization.

One problem with this is that the mailgw impersonates the email sender
inside Roundup. This is not supported over REST. It's not
impossible, but it is complex and would mean that, at minimum, you have
to:

  map email sender to a roundup user

  potentially create a new user (if allowed by config.ini)

  become the new user (maybe using a JWT) for multiple requests to
     create the issue and new messages, timelogs etc.

  handle all the options mailgw currently does -- dropping signatures,
     handling quoted material, prefix parsing to get the right issue
     to update etc.

and this is just 30 seconds of thought. I am sure there are many more
issues to deal with when creating an email -> REST gateway that
reimplements roundup-mailgw.
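The sketch below illustrates only the first item in that list: mapping the From address of an incoming message to a user with the standard email package. The `users` dict and the `allow_create` flag are hypothetical stand-ins for the tracker's user database and its configuration. roundup-mailgw's real matching rules are more involved.

```python
from email import message_from_string
from email.utils import parseaddr


def sender_to_user(raw_message, users, allow_create=False):
    """Return (username, created) for the message's From address.

    `users` is a hypothetical address -> username mapping. Addresses are
    compared case-insensitively, which is a simplification.
    """
    msg = message_from_string(raw_message)
    _display_name, address = parseaddr(msg.get("From", ""))
    address = address.lower()
    if not address:
        raise ValueError("message has no usable From address")
    if address in users:
        return users[address], False
    if not allow_create:
        raise PermissionError(f"unknown sender {address}")
    username = address.split("@", 1)[0]   # naive choice of a new username
    users[address] = username
    return username, True
```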

>But in this case we need to process emails on the
>poller side.

The poller will always have to process the emails.
The question is how much processing.

>Another option is to have some endpoint that will accept the
>serialized messages from the poller and pass it to mailgw for
>processing.

You mean like SMTP, or (using netcat):

  nc -p 2025 -l | roundup-mailgw /path/to/tracker

where the poller sends messages to port 2025 (or 25) serially or in
parallel.
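On the poller side, sending a message to such a listener is a plain TCP write. The minimal sketch below assumes that the listener, like the nc pipeline above, treats one connection as one message and reads until end of input. The host and port are placeholders. A single `nc -l` exits after its first connection, so a real setup would need a listener that loops or accepts connections concurrently.

```python
import socket


def send_message(raw_message: bytes, host="localhost", port=2025, timeout=10.0):
    """Send one raw email over its own TCP connection, then close it.

    The listener sees end-of-input when the write side is shut down, which
    is what lets `... | roundup-mailgw /path/to/tracker` know the message
    is complete.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(raw_message)
        sock.shutdown(socket.SHUT_WR)   # signal EOF to the reader
```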

>In this case, the poller must perform preprocessing to
>serialize emails.

The poller has to 'lock' the IMAP source and trigger processing for
each email to make sure an email isn't processed twice. But multiple
emails can be processed in parallel; it just depends on how you set up
your endpoint.
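Below is a sketch of the per-message loop using the standard imaplib module. It assumes two things: `conn` is an already logged-in `imaplib.IMAP4_SSL`, and some other mechanism ensures only one poller works on this mailbox at a time. `handle` is a hypothetical callable, for example one that feeds the raw message to roundup-mailgw. A message is flagged `\Deleted` only after `handle` returns, so delivery is at-least-once: if the poller dies between the two steps, that message will be handled again on the next run.

```python
def drain_mailbox(conn, handle, mailbox="INBOX"):
    """Pass every message in `mailbox` to `handle`, deleting each afterwards.

    `conn` is a logged-in imaplib.IMAP4 / IMAP4_SSL (or any object with the
    same methods). Returns the number of messages handled.
    """
    typ, _ = conn.select(mailbox)          # read-write, so we can delete
    if typ != "OK":
        raise RuntimeError(f"cannot select {mailbox}")
    typ, data = conn.search(None, "ALL")
    handled = 0
    for num in data[0].split():
        typ, parts = conn.fetch(num, "(RFC822)")
        if typ != "OK":
            continue                       # leave it for the next run
        raw = parts[0][1]                  # (envelope, message bytes) pair
        handle(raw)
        # Mark as deleted only after successful handling (at-least-once).
        conn.store(num, "+FLAGS", "\\Deleted")
        handled += 1
    conn.expunge()
    return handled
```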

(IIRC, there is an outstanding request to have roundup-mailgw do its
own locking so multiple mailgw pollers with the same options can't run
in parallel. It's not as simple as one mailgw poller per tracker, as
you can have multiple email addresses per tracker, which shouldn't
block each other.)
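Below is a sketch of such locking. It assumes a POSIX system (it uses `fcntl.flock`) and a hypothetical lock directory shared by all pollers. The lock file name is derived from a hash of the tracker home and the mailbox spec. As a result, different addresses for the same tracker get different locks, and credentials in the spec never show up in file names.

```python
import contextlib
import fcntl
import hashlib
import os


@contextlib.contextmanager
def mailbox_lock(lock_dir, tracker_home, mailbox_spec):
    """Try to take an exclusive lock for one (tracker, mailbox) pair.

    Yields True if this process holds the lock, False if another poller
    already has it (the caller should then skip this run).
    """
    key = hashlib.sha256(f"{tracker_home}\0{mailbox_spec}".encode()).hexdigest()
    path = os.path.join(lock_dir, f"mailgw-{key[:16]}.lock")
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            acquired = True
        except BlockingIOError:
            acquired = False
        yield acquired
    finally:
        os.close(fd)   # closing the descriptor also releases the lock
```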

>> The docker image already supports number of subcommands (admin, shell,
>> demo ..). Adding one more for email would be possible.  Then you would
>> run the single docker image in two modes (and two containers). Default
>> mode (which runs roundup-server) and email mode which runs imapServer.
>
>Yes, that's an option. At least it requires a less work than in other
>cases.
>
>But maybe it would be nice to teach mailgw and trackers to
>communicate via REST API?

That would be tricky, as I outlined above. You could just connect to an
endpoint served under /rest and post an email to it. The endpoint
would do all the work of roundup-mailgw. But that's not really
REST; at that point it's really a remote procedure call.

Have a great day.
History
Date                 User      Action  Args
2024-05-29 15:03:31  rouilj    set     messages: + msg8085
2024-05-29 05:09:26  asavchuk  set     messages: + msg8082
2024-05-29 03:03:59  rouilj    set     messages: + msg8079
2024-05-29 01:46:42  asavchuk  set     messages: + msg8077
2023-12-15 00:29:59  rouilj    set     nosy: + rouilj; messages: + msg7888
2023-12-14 23:10:12  asavchuk  create