[RFC] don't sniff the filename to determine the content type #4545
hrm... is this the best solution? Or should we strip the query parameters from the URL before passing it to serveContent? |
also tests pls |
Then we may end up serving, e.g., HTML files as |
hrm... Right. Because the file in the original issue has a .php extension, but it's really an HTML file that is a snapshot of a server response... That's annoying |
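To make the trade-off in this exchange concrete, here is a minimal sketch of the "strip the query parameters" idea. It assumes the name handed to the serving code can still carry a query string; `extWithoutQuery` and the example path are hypothetical and not part of go-ipfs.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
)

// extWithoutQuery is a hypothetical helper (not part of go-ipfs): it parses
// raw as a URL and returns the extension of the path component only.
func extWithoutQuery(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	return path.Ext(u.Path), nil
}

func main() {
	raw := "/ipfs/QmExample/page.php?id=3" // made-up path for illustration

	// Without stripping, the query string ends up inside the "extension".
	fmt.Printf("naive:    %q\n", path.Ext(raw))

	ext, err := extWithoutQuery(raw)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("stripped: %q\n", ext)
}
```

Even with the query removed, the extension is still `.php`, so an extension-based lookup says nothing useful about a file whose body is really an HTML snapshot. That is the problem raised in the reply above.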
Maybe it would be best to detect and set the Content-Type header for the response before calling ServeContent, since the http package will re-use it if it's set (reference:1 & 2). However I'm not sure how important it is to cover more than what is already covered by the http package. Is it important for the response type to be accurate for things other than say HTML documents which are already covered by pkg http? And later, if a type is manually stored, should it be trusted or only used as a fallback if detection cannot determine what type the content is? |
The ideal solution would be to:
|
I agree with @djdv that we should detect the content type beforehand so we have control over the process. For some types, using the extension may be better than content sniffing (for example, to distinguish between plain text and HTML documents: a plain text file could start with something that looks like HTML). Unless I am missing something, MIME types are not really used in unixfs right now. And even if they were, how would they be set? There has to be some auto-detecting going on somewhere. |
@kevina Go's HTTP server is doing this. Unfortunately, we need to do it somewhere because web browsers expect it. |
@Stebalien yes I know that. My point was I agree with @djdv that we should be detecting the content type and set the Content-Type header before calling |
Ah. Sorry, I thought you were asking where we currently do the detection. Yes, I agree. We should be detecting on add (and should allow adding files directly from an http server so we can use the reported content types). |
I didn't say we should be detecting on
we do
|
Got it. I don't really see any reason to do that now if we're not going to actually do anything with it. Manually setting the content type before serving the file by using |
@Stebalien we were just looking at #5369, and that made us wonder what the status of this PR is. Was the decision made to drop this fix? Should we close this PR? |
No, I still believe this is the correct solution (for now). I just got distracted. @djdv, @kevina when we actually add proper MIME-Type support, we can pre-set the content type in
	// read a chunk to decide between utf-8 text and binary
	var buf [sniffLen]byte
	n, _ := io.ReadFull(content, buf[:])
	ctype = DetectContentType(buf[:n])
	_, err := content.Seek(0, io.SeekStart) // rewind to output whole file
	if err != nil {
		Error(w, "seeker can't seek", StatusInternalServerError)
		return
	}
	w.Header().Set("Content-Type", ctype)
|
fixes #4543, may break other things
IMO, this is the best way to do this (for now, until we start manually storing the content type along with the files). I'd rather not guess at all but we can at least avoid guessing by filename.
License: MIT
Signed-off-by: Steven Allen <[email protected]>
My prior implication is that Since the previous post, You're right that it's the same process, the only difference here would be in the breadth of supported types, seemingly nothing else. And with that, we likely have the ability to decide on proper icons/thumbnails for anything we'd need, but if we want to support anything outside of that range, we would have to do it ourselves. For context, mp4's were not detected when I made the previous post, among other common formats. |
@Stebalien my main concern is that by using only content-type sniffing it is easy to get "text/plain" and "text/html" wrong. It is very easy to construct a text file that looks like HTML but is not really valid HTML. |
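A small illustration of this concern, using Go's own sniffer (`http.DetectContentType`). The sample strings are made up; `sniff` is just a local convenience wrapper.

```go
package main

import (
	"fmt"
	"net/http"
)

// sniff is a local convenience wrapper around http.DetectContentType.
func sniff(s string) string {
	return http.DetectContentType([]byte(s))
}

func main() {
	// A plain-text note that happens to start with an HTML tag.
	note := "<p> is the paragraph tag; this file is just my notes."
	// Similar text that does not start with a tag.
	plain := "Notes about the paragraph tag."

	fmt.Println(sniff(note))  // text/html; charset=utf-8
	fmt.Println(sniff(plain)) // text/plain; charset=utf-8
}
```

The first string is not a valid HTML document, yet a leading `<p>` is enough for the sniffer to report `text/html`.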
@@ -387,14 +383,14 @@ func (s *sizeSeeker) Seek(offset int64, whence int) (int64, error) {
 	return s.sizeReadSeeker.Seek(offset, whence)
 }
 
-func (i *gatewayHandler) serveFile(w http.ResponseWriter, req *http.Request, name string, modtime time.Time, content io.ReadSeeker) {
+func (i *gatewayHandler) serveFile(w http.ResponseWriter, req *http.Request, modtime time.Time, content io.ReadSeeker) {
I would keep this parameter but ignore it, to make it easier to change the logic later down the road. Just add a comment that the name is ignored for now.
SGTM.
Yeah... I agree. Unfortunately, I can't think of any other way to fix this. However, I've filed a new PR (#5564) to fix the new issue. |
clearing this from the review queue, remark was made here:
#4545 (comment)
Closing this as "wait for unixfs-2.0". |