Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SourceId: Test and fix ambiguous serialization of Git references #11086

Closed
wants to merge 2 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
56 changes: 53 additions & 3 deletions src/cargo/core/source/source_id.rs
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,26 @@ use std::ptr;
use std::sync::Mutex;
use url::Url;

// Encode a name or value for a query string
//
// https://url.spec.whatwg.org/#urlencoded-serializing
//
// The result of running percent-encode after encoding with encoding,
// tuple’s name, the application/x-www-form-urlencoded percent-encode
// set, and true
//
// We can't use percent_encoding directly because spaces are handled in
// a peculiar way.
fn write_form_urlencoded_component(
fmt: &mut Formatter<'_>,
component: &str,
) -> Result<(), fmt::Error> {
for s in url::form_urlencoded::byte_serialize(component.as_bytes()) {
fmt.write_str(s)?;
}
Ok(())
}

lazy_static::lazy_static! {
static ref SOURCE_ID_CACHE: Mutex<HashSet<&'static SourceIdInner>> = Default::default();
}
Expand Down Expand Up @@ -714,9 +734,18 @@ pub struct PrettyRef<'a> {
impl<'a> fmt::Display for PrettyRef<'a> {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
match *self.inner {
GitReference::Branch(ref b) => write!(f, "branch={}", b),
GitReference::Tag(ref s) => write!(f, "tag={}", s),
GitReference::Rev(ref s) => write!(f, "rev={}", s),
GitReference::Branch(ref b) => {
f.write_str("branch=")?;
write_form_urlencoded_component(f, b)
}
GitReference::Tag(ref s) => {
f.write_str("tag=")?;
write_form_urlencoded_component(f, s)
}
GitReference::Rev(ref s) => {
f.write_str("rev=")?;
write_form_urlencoded_component(f, s)
}
GitReference::DefaultBranch => unreachable!(),
}
}
Expand All @@ -742,4 +771,25 @@ mod tests {
let s3 = SourceId::new(foo, loc, None).unwrap();
assert_ne!(s1, s3);
}

#[test]
fn gitrefs_roundtrip() {
let base = "https://host/path".into_url().unwrap();
let branch = GitReference::Branch("*-._+20%30 Z/z#foo=bar&zap[]?to\\()'\"".to_string());
let s1 = SourceId::for_git(&base, branch).unwrap();
let ser1 = format!("{}", s1.as_url());
let s2 = SourceId::from_url(&ser1).expect("Failed to deserialize");
let ser2 = format!("{}", s2.as_url());
// Serializing twice should yield the same result
assert_eq!(ser1, ser2, "Serialized forms don't match");
// SourceId serializing the same should have the same semantics
// This used to not be the case (# was ambiguous)
assert_eq!(s1, s2, "SourceId doesn't round-trip");
// Freeze the format to match an x-www-form-urlencoded query string
// https://url.spec.whatwg.org/#application/x-www-form-urlencoded
assert_eq!(
ser1,
"git+https://host/path?branch=*-._%2B20%2530+Z%2Fz%23foo%3Dbar%26zap%5B%5D%3Fto%5C%28%29%27%22"
);
}
}
2 changes: 1 addition & 1 deletion tests/testsuite/git.rs
Original file line number Diff line number Diff line change
Expand Up @@ -273,7 +273,7 @@ fn cargo_compile_git_dep_pull_request() {
.cargo("build")
.with_stderr(&format!(
"[UPDATING] git repository `{}`\n\
[COMPILING] dep1 v0.5.0 ({}?rev=refs/pull/330/head#[..])\n\
[COMPILING] dep1 v0.5.0 ({}?rev=refs%2Fpull%2F330%2Fhead#[..])\n\
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is unlucky that we display an encoded format 😢
If a branch name contains non-ASCII character it will be hard to read.

[COMPILING] foo v0.0.0 ([CWD])\n\
[FINISHED] dev [unoptimized + debuginfo] target(s) in [..]\n",
path2url(&git_root),
Expand Down