Roundup Tracker - Issues

Issue 2550799

classification
Title: provide basic support for handling html only emails
Type: rfe Severity: normal
Components: Mail interface Versions: 1.4
process
Status: new Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: ber, marlowa, rouilj
Priority: high Keywords: Effort-Medium

Created on 2013-03-07 04:39 by rouilj, last changed 2014-10-18 02:13 by rouilj.

Files
File name    Uploaded                     Description
dehtml.py    rouilj, 2013-03-07 04:39
unnamed      marlowa, 2014-03-20 15:05
unnamed      marlowa, 2014-03-20 15:07
Messages
msg4816 Author: [hidden] (rouilj) Date: 2013-03-07 04:39
Currently, when an html-only email is sent, roundup rejects the email.

We should make roundup extract the text from the html and post that
(possibly also adding the html as an attachment).

To do this we need to change the mail gateway to find the html
portion of the email and convert to text. There are a few ways
to do the conversion:

  1) use an external program like links -dump
  2) use code like beautiful soup, nltk.clean_html()
  3) use the stupid little class/function attached that you
     can drop in utils as well if you wish.
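
As an illustration of option 3, here is a minimal sketch of a text extractor built only on the Python standard library. It is not the attached dehtml.py; the names HTMLTextExtractor and html_to_text are made up for this example, and the tag lists are an assumption about what counts as invisible or block-level content.

```python
from html.parser import HTMLParser


class HTMLTextExtractor(HTMLParser):
    """Collect the visible text of an HTML document (hypothetical helper)."""

    # Tags whose content is never shown to the reader.
    SKIP_TAGS = {"script", "style", "head"}
    # Tags that should start a new line in the plain-text output.
    BLOCK_TAGS = {"p", "div", "br", "li", "tr",
                  "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag in self.BLOCK_TAGS:
            self._chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1
        elif tag in self.BLOCK_TAGS:
            self._chunks.append("\n")

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if not self._skip_depth:
            self._chunks.append(data)

    def get_text(self):
        # Collapse whitespace inside each line and drop empty lines.
        raw_lines = "".join(self._chunks).splitlines()
        lines = (" ".join(line.split()) for line in raw_lines)
        return "\n".join(line for line in lines if line)


def html_to_text(html):
    """Return a plain-text rendering of an HTML string."""
    parser = HTMLTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.get_text()
```

This drops links, tables and list markers entirely, so it is only a starting point compared with the library-based options.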
msg4817 Author: [hidden] (ber) Date: 2013-03-07 09:40
I agree, this was one of the wishes mentioned twice in our short user
survey.
msg4818 Author: [hidden] (rouilj) Date: 2013-03-07 14:07
In message <1362649218.01.0.546514816848.issue2550799@psf.upfronthosting.co.za>,
Bernhard Reiter writes:
>Bernhard Reiter added the comment:
>I agree, this was one of the wishes mentioned twice in our short user
>survey.
>
>----------
>keywords:  -Effort-Medium

Do you think it's a high effort task?

I gave it a medium rating because I think it's a task isolated to the
mail gateway. It is not trivial, as you will have to learn about the mail
gateway's code. However, understanding the gateway code to find out
where it exits if there is only an html part will tell you where to
hook the new code into the flow. Then converting the html to text and
creating a text/plain part should be person days of work, not person
weeks or longer, right?
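
The flow described above (look for a text/plain part, fall back to the html part and convert it) could be sketched with the standard email package as follows. This is not the mail gateway's actual code; find_body_part, decode_part and body_text are hypothetical names, and the html_to_text argument stands for whichever converter is eventually chosen.

```python
from email import message_from_string


def find_body_part(msg, content_type):
    """Return the first non-attachment part of the given type, or None."""
    for part in msg.walk():
        if part.get_content_type() != content_type:
            continue
        if part.get_filename():  # parts with a filename are attachments
            continue
        return part
    return None


def decode_part(part):
    """Decode a text part to str, tolerating bad or unknown charsets."""
    payload = part.get_payload(decode=True) or b""
    charset = part.get_content_charset() or "utf-8"
    try:
        return payload.decode(charset, errors="replace")
    except LookupError:  # unknown charset name in the header
        return payload.decode("utf-8", errors="replace")


def body_text(msg, html_to_text):
    """Prefer text/plain; otherwise convert the text/html part."""
    plain = find_body_part(msg, "text/plain")
    if plain is not None:
        return decode_part(plain)
    html = find_body_part(msg, "text/html")
    if html is not None:
        return html_to_text(decode_part(html))
    return None  # no usable body: the caller decides whether to reject
```

A caller would parse the raw mail with message_from_string(raw) and pass the result to body_text together with a converter function.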
msg4819 Author: [hidden] (ber) Date: 2013-03-07 15:05
(I removed the keyword by accident.)
msg5041 Author: [hidden] (marlowa) Date: 2014-03-19 09:25
This issue was brought up way back in 2007. See the thread at
http://sourceforge.net/p/roundup/mailman/message/13189177/. The thread
discusses some open source software called the ASCII-nator, which
converts HTML to ASCII for just such a case as this. The discussion sort
of fizzled out but I am pleased to see that other developers still
consider this to be an issue. AFAICS it has a simple solution with the
ASCII-nator.
msg5042 Author: [hidden] (ber) Date: 2014-03-20 10:54
Hi Andre,
as far as I can see, ASCII-nator still has license problems.
As John has pointed out, there are other solutions.
The solution chosen should be reasonably safe of course. :)

So now we just need someone to do the work. >;)
Bernhard
msg5043 Author: [hidden] (rouilj) Date: 2014-03-20 13:40
Hi Bernhard:

In message <1395312895.49.0.517509067968.issue2550799@psf.upfronthosting.co.za>,
Bernhard Reiter writes:
>as far as I can see, ASCII-nator still has license problems.

Is the problem you are referring to GPL V3's more restrictive license
and viral nature? The current roundup/zope page templates license is more
permissive and almost BSD like.

Does the GPL V3 kick in on roundup code if we include ASCII-nator
source and call it as an external program, as I suggested with links
-dump or whatever? Having a native python mechanism, even if accessed
via fork, seems to be a better alternative than a totally third party
program that has to be built. (I am thinking of a windows install
here, but I realise that the fork mechanism may not be possible on
windows.)
msg5044 Author: [hidden] (ber) Date: 2014-03-20 14:43
On Thursday 20 March 2014 at 14:40:50, John Rouillard wrote:
> Is the problem you are referring to GPL V3's more restrictive license
> and viral nature. 

It is more like a vaccination effect, if you ask me. :)
Yes, I believe that there may be a problem and roundup or a solution built on
roundup could be considered a derived work. So when in doubt, we should
consider alternative solutions.

The subprocess would work on windows, but I don't think it is a particularly nice
technical solution. So I recommend looking at the alternatives first.
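
For reference, the external-program route discussed above might look roughly like the sketch below, using the subprocess module, which is available on windows as well. The function name html_to_text_external is hypothetical, and the default command assumes that links is installed and accepts a file path after -dump; any other converter that takes a file name as its last argument could be substituted.

```python
import os
import subprocess
import tempfile


def html_to_text_external(html, command=("links", "-dump"), timeout=30):
    """Convert HTML to text by running an external program on a temp file."""
    # Write the HTML to a temporary file and close it before running the
    # converter, so that the file can be opened by another process.
    with tempfile.NamedTemporaryFile(
        "w", suffix=".html", encoding="utf-8", delete=False
    ) as handle:
        handle.write(html)
        path = handle.name
    try:
        # Pass the arguments as a list so that no shell is involved.
        result = subprocess.run(
            list(command) + [path],
            capture_output=True,
            timeout=timeout,
            check=True,
        )
    finally:
        os.unlink(path)
    return result.stdout.decode("utf-8", errors="replace")
```

The timeout and check=True are there because the input comes from untrusted email: a hung or failing converter raises an exception instead of silently producing an empty message.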
msg5045 Author: [hidden] (marlowa) Date: 2014-03-20 15:05
On 20 March 2014 14:43, Bernhard Reiter <issues@roundup-tracker.org> wrote:

>
> Bernhard Reiter added the comment:
>
> On Thursday 20 March 2014 at 14:40:50, John Rouillard wrote:
> > Is the problem you are referring to GPL V3's more restrictive license
> > and viral nature.
>
> It is more like a vaccination effect, if you ask me. :)
>

It has been compared to taking a cutting. One takes a cutting consciously,
knowing what will happen, whereas a virus is caught by accident.

> Yes, I believe that there may be a problem and roundup or a solution built
> on roundup could be considered a derived work. So when in doubt, we should
> consider alternative solutions.
>

Maybe this was the view back in 2007.

>
> The subprocess would work on windows, but I don't think it is a particularly
> nice technical solution. So I recommend looking at the alternatives first.
>

I think we might be able to go with what was suggested back in 2007, namely
that the code could try to do the import and use it if successful. The
documentation could mention that the ASCIInator will be used if present but
that its absence is not harmful. Thus the ASCIInator could be installed on
the same system as roundup and roundup may use it if present, but it doesn't
matter that the two pieces of software have different licences.
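
The "try the import and use it if successful" idea could be sketched like this. The module and function names a caller passes in are placeholders, since the thread does not say how the ASCIInator would be imported; load_optional_converter and convert_html are hypothetical helpers.

```python
import importlib


def load_optional_converter(module_name, function_name):
    """Return module_name.function_name if it can be imported, else None."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, function_name, None)


def convert_html(html, converters, fallback):
    """Use the first available optional converter, else the fallback.

    converters is a list of (module_name, function_name) pairs to try
    in order; fallback is a callable that is always available.
    """
    for module_name, function_name in converters:
        converter = load_optional_converter(module_name, function_name)
        if converter is not None:
            return converter(html)
    return fallback(html)
```

With this shape, a missing optional package only changes which converter runs; it never prevents the mail from being processed.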

-- 
Regards,

Andrew Marlow
http://www.andrewpetermarlow.co.uk
msg5046 Author: [hidden] (marlowa) Date: 2014-03-20 15:07
-- 
Regards,

Andrew Marlow
http://www.andrewpetermarlow.co.uk
msg5150 Author: [hidden] (rouilj) Date: 2014-10-18 02:13
Another (sadly also GPL V3) choice is:

   https://github.com/aaronsw/html2text

which produces markdown from html (given that markdown is safer
than reStructuredText it may be a better choice for the conversion).

Then convert to reStructuredText (maybe
pandoc --from=markdown --to=rst --output=message.rst message.md
could work.)

In any case, when saved as a file the mime type could be
text/reStructured text and if the libraries are present,
the message could be converted to html.

If anybody decides to do this, make sure to secure
the conversion according to:

 http://docutils.sourceforge.net/docs/howto/security.html
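
A minimal sketch of what securing the conversion could look like, assuming docutils is the library used to turn the reStructuredText into html. The two settings are the ones the docutils security howto describes for untrusted input; the function name rst_to_html and the choice to return None when docutils is missing are assumptions made for this example.

```python
# Settings for processing untrusted input with docutils.
SECURE_RST_SETTINGS = {
    # Block directives that pull in external files (for example "include").
    "file_insertion_enabled": False,
    # Block the "raw" directive, which could inject arbitrary html.
    "raw_enabled": False,
}


def rst_to_html(rst_source):
    """Render reStructuredText to an html fragment.

    Returns None when docutils is not installed, so the caller can keep
    the message as plain text instead.
    """
    try:
        from docutils.core import publish_parts
    except ImportError:
        return None
    parts = publish_parts(
        source=rst_source,
        writer_name="html",
        settings_overrides=SECURE_RST_SETTINGS,
    )
    return parts["html_body"]
```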
History
Date                 User     Action  Args
2014-10-18 02:13:45  rouilj   set     messages: + msg5150
2014-03-20 15:07:02  marlowa  set     files: + unnamed; messages: + msg5046
2014-03-20 15:05:17  marlowa  set     files: + unnamed; messages: + msg5045
2014-03-20 14:43:15  ber      set     messages: + msg5044
2014-03-20 13:40:50  rouilj   set     messages: + msg5043
2014-03-20 10:54:55  ber      set     messages: + msg5042
2014-03-19 09:25:32  marlowa  set     nosy: + marlowa; messages: + msg5041
2013-03-07 15:05:03  ber      set     keywords: + Effort-Medium; messages: + msg4819
2013-03-07 14:07:11  rouilj   set     messages: + msg4818
2013-03-07 09:40:17  ber      set     keywords: - Effort-Medium; priority: normal -> high; messages: + msg4817; nosy: + ber
2013-03-07 04:39:37  rouilj   set     title: rovide basic support for handling html only emails -> provide basic support for handling html only emails
2013-03-07 04:39:26  rouilj   create