This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

No preview for some URLs #15142

Closed
JeanPaulLucien opened this issue Feb 23, 2023 · 6 comments
Labels
A-URL-Preview: Issues related to generating server-side previews of remote URLs
O-Occasional: Affects or can be seen by some users regularly or most users rarely
S-Tolerable: Minor significance, cosmetic issues, low or no impact to users.
T-Task: Refactoring, removal, replacement, enabling or disabling functionality, other engineering tasks.

Comments

@JeanPaulLucien

JeanPaulLucien commented Feb 23, 2023

Description

Could you move this issue?
element-hq/element-web#24626

Steps to reproduce

Another URL without a preview:
https://twitter.com/dshevchenko_biz/status/943829211957092353

Homeserver

matrix.org

Synapse Version

matrix.org

Installation Method

I don't know

Database

matrix.org

Workers

I don't know

Platform

matrix.org

Configuration

No response

Relevant log output

...

Anything else that would be useful to know?

matrix.org is the homeserver for the room, the user, and me.

@clokep
Member

clokep commented Feb 23, 2023

I don't understand what the issue is here? Is it a particular URL that isn't previewing? (If so, please provide the URL.) The linked issue mentions something about the Access-Control-Allow-Origin header, but I don't see how that relates here? Is it not being returned in one situation?

@clokep
Member

clokep commented Feb 23, 2023

> The linked issue mentions something about the Access-Control-Allow-Origin header, but I don't see how that relates here?

The log line was very long, it seems the error is:

Access to fetch at 'https://matrix-client.matrix.org/_matrix/media/r0/preview_url?url=https%3A%2F%2Fodysee.com%2F%40Neroke5%3A8&ts=1674980040000'
from origin 'vector://vector' has been blocked by CORS policy: No 'Access-Control-Allow-Origin' header is present
on the requested resource. If an opaque response serves your needs, set the request's mode to 'no-cors'
to fetch the resource with CORS disabled.

It still isn't clear -- does this always happen with this specific URL? Are exactly 3 URLs affected?

@JeanPaulLucien
Author

JeanPaulLucien commented Feb 23, 2023

In fact, I can't check every link for a preview; this is just the first URL in my collection. This URL (https://odysee.com/@Neroke5:8) never gets a preview in a public room without encryption. When I visit the room with this URL for the first time I get the CORS error, and afterwards only: rageshake.ts:73 Failed to get URL preview: ConnectionError: fetch failed: Failed to fetch

@clokep
Member

clokep commented Feb 24, 2023

Pulling the logs from that it seems to be:

2023-02-22 10:11:33,605 - synapse.http.server - 107 - INFO - GET-79d6f0304927d94f-HEL - <XForwardedForRequest at 0x7fef97109be0 method='GET' uri='/_matrix/media/r0/preview_url?url=https%3A%2F%2Fodysee.com%2F%40Neroke5%3A8&ts=1674980040000' clientproto='HTTP/1.1' site='19103'> SynapseError: 502 - Requested file's content type not allowed for this operation: text/plain; charset=utf-8
2023-02-22 10:11:33,607 - synapse.access.http.19103 - 460 - INFO - GET-79d6f0304927d94f-HEL - XXX.XXX.XXX.XXX - 19103 - {XXX} Processed request: 1.032sec/0.002sec (0.001sec, 0.000sec) (0.000sec/0.000sec/0) 121B 502 "GET /_matrix/media/r0/preview_url?url=https%3A%2F%2Fodysee.com%2F%40Neroke5%3A8&ts=1674980040000 HTTP/1.1" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Element/1.11.23 Chrome/108.0.5359.215 Electron/22.2.0 Safari/537.36" [0 dbevts]
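
The url parameter in the request line above is percent-encoded. A minimal sketch of recovering the page being previewed, using only Python's standard library; the variable names are illustrative and not taken from Synapse:

```python
from urllib.parse import parse_qs, urlsplit

# Request URI copied from the log line above.
request_uri = (
    "/_matrix/media/r0/preview_url"
    "?url=https%3A%2F%2Fodysee.com%2F%40Neroke5%3A8&ts=1674980040000"
)

# parse_qs percent-decodes each value and returns a list of values per key.
query = parse_qs(urlsplit(request_uri).query)
target_url = query["url"][0]
timestamp_ms = int(query["ts"][0])

print(target_url)  # https://odysee.com/@Neroke5:8
```

This is the same URL reported earlier in the thread, so the log entry does correspond to the failing preview.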

The first log line is interesting though -- it seems we're rejecting it due to the content type. This comes from:

if is_allowed_content_type and b"Content-Type" in resp_headers:
    content_type = resp_headers[b"Content-Type"][0].decode("ascii")
    if not is_allowed_content_type(content_type):
        raise SynapseError(
            HTTPStatus.BAD_GATEWAY,
            (
                "Requested file's content type not allowed for this operation: %s"
                % content_type
            ),
        )

We call that via:

length, headers, uri, code = await self.client.get_file(
    url,
    output_stream=output_stream,
    max_size=self.max_spider_size,
    headers={
        b"Accept-Language": self.url_preview_accept_language,
        # Use a custom user agent for the preview because some sites will only return
        # Open Graph metadata to crawler user agents. Omit the Synapse version
        # string to avoid leaking information.
        b"User-Agent": [
            "Synapse (bot; +https://github.com/matrix-org/synapse)"
        ],
    },
    is_allowed_content_type=_is_previewable,
)

This all hinges on the _is_previewable function:

def _is_media(content_type: str) -> bool:
    return content_type.lower().startswith("image/")


def _is_html(content_type: str) -> bool:
    content_type = content_type.lower()
    return content_type.startswith("text/html") or content_type.startswith(
        "application/xhtml"
    )


def _is_json(content_type: str) -> bool:
    return content_type.lower().startswith("application/json")


def _is_previewable(content_type: str) -> bool:
    """Returns True for content types for which we will perform URL preview and False
    otherwise."""
    return _is_html(content_type) or _is_media(content_type) or _is_json(content_type)

The tl;dr is that for some (?) requests to this URL we're getting back a content type of text/plain. I tried this locally though and got HTML back, so not sure what's going on there.
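
One way to check what the remote site returns is to send a request with the same User-Agent that Synapse uses for previews (quoted above). A rough sketch with the standard library only; the helper names build_preview_request and fetch_content_type and the default Accept-Language value are assumptions, not Synapse code, and the result may differ by time, source IP address, or bot detection on the remote side:

```python
from urllib.request import Request, urlopen

# User-Agent string as quoted from the Synapse call above.
SYNAPSE_PREVIEW_USER_AGENT = "Synapse (bot; +https://github.com/matrix-org/synapse)"


def build_preview_request(url: str, accept_language: str = "en") -> Request:
    """Build a GET request that mimics the preview fetch headers.

    The accept_language default is an assumption for illustration.
    """
    return Request(
        url,
        headers={
            "User-Agent": SYNAPSE_PREVIEW_USER_AGENT,
            "Accept-Language": accept_language,
        },
    )


def fetch_content_type(url: str, timeout: float = 10.0) -> str:
    """Return the Content-Type header the remote server sends for url.

    Needs network access; urlopen raises HTTPError for 4xx/5xx responses.
    """
    with urlopen(build_preview_request(url), timeout=timeout) as response:
        return response.headers.get("Content-Type", "")
```

Calling fetch_content_type("https://odysee.com/@Neroke5:8") a few times, ideally from different networks, would show whether the site sometimes answers with text/plain instead of HTML.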


We should be passing back the proper headers with this error though, I think? But realistically it is still an error response.
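
The CORS failure in the browser follows from the error response lacking an Access-Control-Allow-Origin header: the browser then hides the 502 from the client, which only sees a generic fetch failure. A hedged sketch of the idea, not Synapse's actual code: the helpers with_cors_headers and build_error_response, the header values, and the error body shape are all assumptions, shown only to illustrate that error responses should carry the same CORS headers as successful ones.

```python
import json
from http import HTTPStatus

# Hypothetical header set; the values are illustrative assumptions.
CORS_HEADERS = {
    "Access-Control-Allow-Origin": "*",
    "Access-Control-Allow-Methods": "GET, OPTIONS",
}


def with_cors_headers(headers: dict) -> dict:
    """Return a copy of headers with the CORS headers merged in."""
    merged = dict(headers)
    merged.update(CORS_HEADERS)
    return merged


def build_error_response(status: HTTPStatus, message: str) -> tuple:
    """Build (status code, headers, body) for an error, including CORS headers."""
    headers = with_cors_headers({"Content-Type": "application/json"})
    # The errcode value is an assumption made for this sketch.
    body = json.dumps({"errcode": "M_UNKNOWN", "error": message})
    return status.value, headers, body


status, headers, body = build_error_response(
    HTTPStatus.BAD_GATEWAY,
    "Requested file's content type not allowed for this operation: text/plain",
)
print(status, headers["Access-Control-Allow-Origin"])
```

With the header present, the client could read the 502 and its message instead of a blocked request, although it would still have no preview to show.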

@H-Shay H-Shay added T-Task Refactoring, removal, replacement, enabling or disabling functionality, other engineering tasks. O-Occasional Affects or can be seen by some users regularly or most users rarely S-Tolerable Minor significance, cosmetic issues, low or no impact to users. labels Feb 27, 2023
@JeanPaulLucien
Author

URLs that do not exist (404 error) get no preview. So once this issue is resolved, we will be able to tell good links from the others. That is the practical UX effect.

@clokep
Member

clokep commented Mar 1, 2023

> URLs that do not exist (404 error) get no preview. So once this issue is resolved, we will be able to tell good links from the others. That is the practical UX effect.

The spec for previews provides no way to give error information back like that, and I think clients would still just not show a preview.

I'm not sure there's any action for the Synapse team to take here. (We should fix the Access-Control-Allow-Origin header, but that's not the root problem.)

@clokep closed this as not planned Mar 1, 2023
@MadLittleMods MadLittleMods added the A-URL-Preview Issues related to generating server-side previews of remote URLs label Apr 25, 2023