Roundup Tracker - Issues

Issue 2551096

classification
Markdown autolink
Type: behavior Severity: normal
Components: Web interface Versions: 2.0.0
process
Status: fixed fixed
:
: rouilj : ced, cmeerw, rouilj
Priority: normal : patch

Created on 2020-10-27 23:44 by ced, last changed 2020-10-30 03:17 by rouilj.

Files
File name Uploaded Description Edit Remove
roundup-markdown-url.patch ced, 2020-10-28 08:27
roundup-markdown-url.patch ced, 2020-10-28 09:36
Messages
msg7003 Author: [hidden] (ced) Date: 2020-10-27 23:44
The SimpleMDE editor has GFM activated by default [1] which means that it does autolinks [2]. The problem is that markdown2 does not. So user put raw links thinking they will be rendered as link (because of the SimpleMDE preview).
I could not find any option in SimpleMDE to disable autolinks. But I found that markdown2 can be configured to do it [3]. markdow does not have such extension but I guess it is doable to write one. For mistune, there is a url plugin [4].
I think we should activate by default this behavior (and maybe make it configurable).


[1] https://marked.js.org/using_advanced#options
[2] https://github.github.com/gfm/#autolinks
[3] https://github.com/trentm/python-markdown2/wiki/link-patterns#user-content-converting-links-into-links-automatically
[4] https://github.com/lepture/mistune/blob/master/mistune/plugins/extra.py#L15
msg7004 Author: [hidden] (rouilj) Date: 2020-10-28 00:25
Hi Cédric:

Including Christof.

In message <1603842250.81.0.747619754176.issue2551096@roundup.psfhosted.org>,
=?utf-8?q?C=C3=A9dric_Krier?= writes:
>
>The SimpleMDE editor has GFM activated by default [1] which means
>that it does autolinks [2]. The problem is that markdown2 does
>not. So user put raw links thinking they will be rendered as link
>(because of the SimpleMDE preview).

Rather than modfying the back end could we identify bare URL's in the
markdown text and surround them with < > making them first class links
on the fly?

I think we do something similar to hyperlink issue2345 and msg234 in
markdown right Christof? At least I thing there is something that
makes markdown play with local_replace.
msg7005 Author: [hidden] (rouilj) Date: 2020-10-28 03:14
Cedric, silly question how are you calling markdown in your template?

According to the code it should be hyperlinking bare url's:

      def markdown(self, hyperlink=1):
        """ Render the value of the property as markdown.

            This requires markdown2 or markdown to be installed
            separately.
        """
 
Are you setting hyperlink to 0?

Also probably need to update the list to include mistune as well but...
msg7006 Author: [hidden] (ced) Date: 2020-10-28 08:27
On 2020-10-28 03:14, John Rouillard wrote:
> Cedric, silly question how are you calling markdown in your template?

Just like the jinja2 template:

msg.content.markdown()

> According to the code it should be hyperlinking bare url's:
> 
>       def markdown(self, hyperlink=1):
>         """ Render the value of the property as markdown.
> 
>             This requires markdown2 or markdown to be installed
>             separately.
>         """
>  
> Are you setting hyperlink to 0?

No. But I checked the code and indeed the markdown method does not do
anything with the url or email matching.
Here is a patch that implement this behavior for markdown.
msg7007 Author: [hidden] (ced) Date: 2020-10-28 08:42
The patch does not work properly because the regexp should not match url that are already surrounded by '< >'.
msg7008 Author: [hidden] (ced) Date: 2020-10-28 09:36
Here is a better patch that does not break standard markdown URL syntax.
msg7009 Author: [hidden] (rouilj) Date: 2020-10-28 13:28
Hmm, do any of the engines allow us to set rel="nofollow" and or 
rel="nofollow noopener" on the links?

This is done in regular mesages to combat link spam.
msg7010 Author: [hidden] (ced) Date: 2020-10-28 13:58
On 2020-10-28 13:28, John Rouillard wrote:
> Hmm, do any of the engines allow us to set rel="nofollow" and or 
> rel="nofollow noopener" on the links?

markdown2 has https://github.com/trentm/python-markdown2/wiki/nofollow
markdown does not seem to have such option
mistune does not seem to have neither

Any way, I think this is out of the scope of this issue as it is already
possible to make roundup render a link in markdown using the proper
syntax.
msg7011 Author: [hidden] (rouilj) Date: 2020-10-29 01:27
Hi Cédric:

Is this patch supposed to create links for emails as well?

+        for group in ['url', 'email']:
+            if match.group(group):

If so the patch is missing the email test cases.

(I am still mystified what this code does and how it works.
Is it just taking some random re/match and looking for a named group?
Then it uses the presence of the group to determine what was matched?)

Also I had to modify the tests a bit. Markdown doesn't include a 
trailing \n on the rendered lines, so I am using m.rstrip('\n') and
removing the trailing \n. Also where you have self.assertIn I added:

 '<p><img alt="" src="http://example.com/" /></p>',

for markdown (apparently it sorts attributes by name).

Agreed on setting rel on the links. That should be a new ticket most
likely. Also looks like I need some doc added for markdown.
Explain settings, url rendering, differences from classic text
rendering (lack of rel=nofollow) etc.
msg7012 Author: [hidden] (ced) Date: 2020-10-29 08:21
On 2020-10-29 01:27, John Rouillard wrote:
> Is this patch supposed to create links for emails as well?

Yes.

> +        for group in ['url', 'email']:
> +            if match.group(group):
> 
> If so the patch is missing the email test cases.

I do not see the point to add a test that will test the exact same code.

> (I am still mystified what this code does and how it works.
> Is it just taking some random re/match and looking for a named group?

It does the same as for plain rendering but it takes care of links that
are already surrounded by <> or ().

> Then it uses the presence of the group to determine what was matched?)

Yes, the regexp defines some group.

> Also I had to modify the tests a bit. Markdown doesn't include a 
> trailing \n on the rendered lines, so I am using m.rstrip('\n') and
> removing the trailing \n. Also where you have self.assertIn I added:
> 
>  '<p><img alt="" src="http://example.com/" /></p>',
> 
> for markdown (apparently it sorts attributes by name).

Indeed this is a pain to have to support three markdown rendering.
I do not see why it is like that.
Also it would be better to test the markdown source instead of the
markdown rendering result because it is the source roundup is working
on.
msg7015 Author: [hidden] (rouilj) Date: 2020-10-29 22:08
Hi Cedric:

Thanks for confirming my thoughts on how the matching code works.

We support multiple dialects of markdown the same way we support
multiple templating languages. Different dialects have different
features. In theory it is possible to override the import function in 
interfaces.py so a site could use a custom configuration of one
of the markdown back ends with the features/plugins they need.

Yup testing the common subset is a pain because the language isn't 
specified in a way that maps input to output. Ideally every markdown 
implementation would produce the same html. But ...

Testing the output from the markdown process allows us to verify
that the processing pipeline is producing expected html for
presentation to the user. But it is a pain.

Regarding email, have you looked at what markdown2 emits? Mistune
puts out:

  &lt;a href="mailto:user@example.com"&gt;user@example.com&lt;/a&gt;

while markdown2 seems to have an ever changing set of entities:

  href="&#x6d;&#97;&#x69; ..."

and sometimes they throw in actual characters.

markdown uses decimal entities and is the same from run to run.

So yeah you have convinced me that validating the email matching
path is more trouble than it's worth.
msg7016 Author: [hidden] (rouilj) Date: 2020-10-30 03:17
Committed in rev 6280:6ed5152a92d0
History
Date User Action Args
2020-10-30 03:17:16rouiljsetstatus: open -> fixed
resolution: fixed
messages: + msg7016
2020-10-29 22:08:19rouiljsetpriority: normal
assignee: rouilj
status: new -> open
messages: + msg7015
2020-10-29 08:21:03cedsetmessages: + msg7012
2020-10-29 01:27:25rouiljsetmessages: + msg7011
2020-10-28 13:58:02cedsetmessages: + msg7010
2020-10-28 13:28:31rouiljsetmessages: + msg7009
2020-10-28 09:36:53cedsetfiles: + roundup-markdown-url.patch
messages: + msg7008
2020-10-28 08:42:11cedsetmessages: + msg7007
2020-10-28 08:27:03cedsetfiles: + roundup-markdown-url.patch
keywords: + patch
messages: + msg7006
2020-10-28 03:14:11rouiljsetmessages: + msg7005
2020-10-28 00:25:59rouiljsetnosy: + rouilj, cmeerw
messages: + msg7004
2020-10-27 23:44:10cedcreate