fix: org/re.po#123 links do not redirect to our togithub domain #17819

MaronHatoum
Contributor

@MaronHatoum MaronHatoum commented Sep 15, 2022

Changes

  • Fix the regex so that it matches repository names containing . in GitHub pull request, discussion, and issue URLs.
  • Reduce the URLs to <org>/<repo>#<number>.
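
To illustrate the first bullet, here is a minimal sketch comparing a repository character class without and with the dot. Both patterns are simplified from the PR's urlRegex (only issue URLs are handled), and the names `oldRepoPattern` and `newRepoPattern` are hypothetical, so treat this as an illustration rather than the code under review.

```typescript
// Simplified from the PR's urlRegex; both names are hypothetical.
// The only difference is the "." inside the repository character class.
const oldRepoPattern = /github\.com\/[-_a-z0-9]+\/[-_a-z0-9]+\/issues\/[0-9]+/i;
const newRepoPattern = /github\.com\/[-_a-z0-9]+\/[-_.a-z0-9]+\/issues\/[0-9]+/i;

const dottedUrl = 'https://github.com/org/re.po/issues/123';
const plainUrl = 'https://github.com/org/repo/issues/123';

const repoMatchResults = {
  oldMatchesDotted: oldRepoPattern.test(dottedUrl), // false: "." is not in the class
  newMatchesDotted: newRepoPattern.test(dottedUrl), // true
  oldMatchesPlain: oldRepoPattern.test(plainUrl), // true
  newMatchesPlain: newRepoPattern.test(plainUrl), // true
};

console.log(repoMatchResults);
```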

Check the release notes in this real repository example:
MaronHatoum/17284#11

Context

Documentation (please check one with an [x])

  • I have updated the documentation, or
  • No documentation update is required

How I've tested my work (please tick one)

I have verified these changes via:

  • Code inspection only, or
  • Newly added/modified unit tests, or
  • No unit tests but ran on a real repository, or
  • Both unit tests + ran on a real repository

@MaronHatoum MaronHatoum marked this pull request as ready for review October 3, 2022 06:25
@MaronHatoum MaronHatoum requested review from rarkins and viceice October 3, 2022 15:13
@rarkins
Collaborator

rarkins commented Oct 6, 2022

I have a concern that this is making a large change to our link approach to solve only a small edge case. The impact of a mistake would be very high in terms of bad reputation, so we try to be very careful about changes to backlinks.

Before merging, I would like to ask this: if it causes any mass backlink spam by accident (e.g. a regression error in our backlink prevention) then do we have any way to detect this quickly without waiting for some outraged OSS maintainer to tell us?

@MaronHatoum
Contributor Author

> I have a concern that this is making a large change to our link approach to solve only a small edge case. The impact of a mistake would be very high in terms of bad reputation, so we try to be very careful about changes to backlinks.
>
> Before merging, I would like to ask this: if it causes any mass backlink spam by accident (e.g. a regression error in our backlink prevention) then do we have any way to detect this quickly without waiting for some outraged OSS maintainer to tell us?

In this PR I tried to cover all the links we use and convert them from github to togithub. I don't know of any way to detect whether it causes spam, other than checking that the links start with togithub.
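
The check mentioned here (flag any GitHub link that does not start with togithub) could be sketched roughly as follows. The function name `findDirectGithubLinks` and the sample text are hypothetical, and this is not an existing safeguard in the project, only an illustration of the idea using the standard JavaScript RegExp engine.

```typescript
// Hypothetical helper: returns github.com discussion/issue/pull links that were
// NOT rewritten to togithub.com and would therefore create backlinks.
function findDirectGithubLinks(body: string): string[] {
  // The lookbehind rejects hosts such as "togithub.com" or "api.github.com",
  // because "github.com" (or "www.github.com") must not be preceded by a
  // letter, digit, dot, or hyphen.
  const direct =
    /(?<![a-z0-9.-])(?:www\.)?github\.com\/[-\w.]+\/[-\w.]+\/(?:discussions|issues|pull)\/\d+/gi;
  return body.match(direct) ?? [];
}

const sampleBody = [
  'See https://togithub.com/org/re.po/issues/1',
  'and https://github.com/org/repo/pull/2',
  'and https://www.github.com/a/b/issues/3',
].join(' ');

const directLinks = findDirectGithubLinks(sampleBody);
console.log(directLinks);
```

A check like this could run over generated release notes in a test or a dry run and fail when the returned list is not empty.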

@viceice
Member

viceice commented Nov 15, 2022

@rarkins I don't see any big risk of false backlinks being generated by this PR.

@@ -11,12 +11,19 @@ interface UrlMatch {
}

const urlRegex =
-  /(?:https?:)?(?:\/\/)?(?:www\.)?(?<!api\.)(?:to)?github\.com\/[-_a-z0-9]+\/[-_a-z0-9]+\/(?:discussions|issues|pull)\/[0-9]+(?:#[-_a-z0-9]+)?/i; // TODO #12872 (?<!re) after text not matching
+  /(?:https?:)?(?:\/\/)?(?:www\.)?(?<!api\.)(?:to)?github\.com\/[-_a-z0-9]+\/[a-zA-Z1-9\-_.]+\/(?:discussions|issues|pull)\/[0-9]+(?:#[-_a-z0-9]+)?/i; // TODO #12872 (?<!re) after text not matching
Collaborator

@viceice If we look at the changes here, this is the only difference, in the middle of the pattern:
/[a-zA-Z1-9\-_.]+/
/[-_a-z0-9]+/
You approved this, so hopefully you can explain it to me.
Is it correct to remove capital letters, backslashes, and dots (which are not escaped, for some reason) here? Is the URL always lowercase? Or is it just so that we won't catch user names that contain capital letters?
The intention isn't clear to me here.

Collaborator

Never mind, there's a case-insensitive flag (i) at the end, which means there is no need for capital letters...
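
A quick way to confirm this is to test the two character classes from the diff in isolation. The sample names and variable names below are made up for illustration. The sketch also surfaces one difference the thread does not discuss: 1-9 leaves out the digit 0, which 0-9 accepts.

```typescript
// Character classes copied from the diff, anchored so they must match a whole name.
const oldClass = /^[-_a-z0-9]+$/i;
const newClass = /^[a-zA-Z1-9\-_.]+$/i;
// Same as oldClass but without the i flag, for comparison.
const oldClassNoFlag = /^[-_a-z0-9]+$/;

const classChecks = {
  upperWithFlag: oldClass.test('MyRepo'), // true: the i flag covers capitals
  upperWithoutFlag: oldClassNoFlag.test('MyRepo'), // false
  dotOld: oldClass.test('re.po'), // false
  dotNew: newClass.test('re.po'), // true
  zeroOld: oldClass.test('repo0'), // true
  zeroNew: newClass.test('repo0'), // false: 1-9 excludes the digit 0
};

console.log(classChecks);
```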

Member

/(?:https?:)?(?:\/\/)?(?:www\.)?(?<!api\.)(?:to)?github\.com\/[-_a-z0-9]+\/[a-zA-Z1-9\-_.]+\/(?:discussions|issues|pull)\/[0-9]+(?:#[-_a-z0-9]+)?/i; // TODO #12872 (?<!re) after text not matching

const reduceUrlRegex =
/(?:https?:)?(?:\/\/)?(?:www\.)?github\.com\/(?<org>[a-zA-Z1-9\-_.]*)\/(?<repo>[a-zA-Z1-9\-_.]*)\/([a-zA-Z1-9\-_.]*)\/(?<number>[\d]+)/g;
Collaborator

@viceice
Can't we just reuse the URL regex above and add named groups to it for the reduction? Why do we need another regex here?

WDYT?
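
One possible shape for this suggestion is a single pattern whose named groups serve both detection and reduction. The names `githubRefRegex` and `reduceUrls` are hypothetical, and the sketch relies on the plain JavaScript RegExp engine rather than the project's regEx helper, so it may not pass the same validation.

```typescript
// Hypothetical combined pattern: the structure of urlRegex, plus the named
// groups <org>, <repo>, and <number> used for the reduction.
const githubRefRegex =
  /(?:https?:)?(?:\/\/)?(?:www\.)?(?<!api\.)(?:to)?github\.com\/(?<org>[-_a-z0-9]+)\/(?<repo>[-_.a-z0-9]+)\/(?:discussions|issues|pull)\/(?<number>[0-9]+)(?:#[-_a-z0-9]+)?/gi;

// Rewrites every matching URL in the text to the short <org>/<repo>#<number> form.
function reduceUrls(text: string): string {
  return text.replace(githubRefRegex, '$<org>/$<repo>#$<number>');
}

const reduced = reduceUrls('Fixes https://github.com/org/re.po/issues/123 today');
console.log(reduced);
```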

Collaborator

I don't see any reason to keep this regex.

Member

you can try to integrate it. 🤷‍♂️

Collaborator

@viceice Maron made this regex because of
(?<!api\.)
It fails validation in regEx, so he made a new, almost identical regex without it.

What is this (?<!api\.)?

Collaborator

Doesn't include api?

Collaborator

removing it

Member

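It's a lookaround assertion, specifically a negative lookbehind: it prevents a match when github.com is immediately preceded by api.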
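
Strictly speaking, (?<!api\.) is a negative lookbehind, the mirror image of a lookahead. Here is a small sketch of what it does, with a negative lookahead next to it for contrast. The patterns are reduced to the bare host, the variable names are hypothetical, and the code runs on the standard JavaScript RegExp engine, which may behave differently from the engine behind the project's regEx validation.

```typescript
// Negative lookbehind: "github.com" must NOT be immediately preceded by "api."
const notApiHost = /(?<!api\.)github\.com/;
// Negative lookahead, for contrast: "github.com" must NOT be followed by "/api"
const notApiPath = /github\.com(?!\/api)/;

const lookaroundResults = {
  plainHost: notApiHost.test('https://github.com/org/repo'), // true
  apiHost: notApiHost.test('https://api.github.com/repos/org/repo'), // false
  apiPath: notApiPath.test('https://github.com/api'), // false
};

console.log(lookaroundResults);
```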

@PhilipAbed
Collaborator

@rarkins Do not merge this, I'm changing it.
Adding an extra regex doesn't make sense at all.

@viceice viceice marked this pull request as draft November 16, 2022 13:15
@PhilipAbed
Collaborator

Closing this as Maron is no longer working on it; opened #18944 instead.

@PhilipAbed PhilipAbed closed this Nov 16, 2022
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 17, 2022
Development

Successfully merging this pull request may close these issues.

org/re.po#123 links do not redirect to our togithub domain
4 participants