wrong sitemap for keep_path #1008

Closed
fkastner opened this issue Apr 17, 2023 · 3 comments
Labels
bug Something isn't working

Comments

@fkastner
Contributor

Hi,
I just noticed that the sitemap is wrong for paths which are listed in keep_path.
To reproduce:

julia> using Franklin

julia> newsite(".");

shell> vim config.md # add keep_path = ["test.html"]

shell> touch test.html

julia> serve(single=true);
                      
shell> tail __site/sitemap.xml
    <changefreq>monthly</changefreq>
    <priority>0.5</priority>
</url>
<url>
    <loc>https://tlienart.github.io/FranklinTemplates.jl/test/</loc>
    <lastmod>2023-04-17</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.5</priority>
</url>
</urlset>

I expected https://tlienart.github.io/FranklinTemplates.jl/test.html as location in the sitemap.
I spent a few minutes trying to find where this goes wrong but didn't get very far.

tlienart added the bug label Apr 18, 2023
@tlienart
Owner

tlienart commented Apr 18, 2023

Thanks for the report,

Not yet a fix but some notes on this from a quick look:

add_sitemap_item here:

function add_sitemap_item(; html=false)
    loc = url_curpage()
    locvar(:sitemap_exclude)::Bool && return nothing
    if !html
        lastmod = locvar(:fd_mtime_raw)
        changefreq = locvar(:sitemap_changefreq)
        priority = locvar(:sitemap_priority)
    else
        # use default which can be overwritten in a {{sitemap_opts ...}}
        fp = joinpath(path(:folder), locvar(:fd_rpath)::String)
        lastmod = Date(unix2datetime(stat(fp).mtime))
        changefreq = "monthly"
        priority = 0.5
    end
    res = SITEMAP_DICT[loc] = SMOpts(lastmod, changefreq, priority)
    return res
end

is the function that adds sitemap entries. They are then actually written to the XML file in this loop:

for (k, v) in SITEMAP_DICT
    key = joinpath(escapeuri.(split(k, '/'))...)
    loc = "<loc>$(joinpath(base_url, key))</loc>"
    lastmod = "<lastmod>$(v.lastmod)</lastmod>"
    changefreq = "<changefreq>$(v.changefreq)</changefreq>"
    priority = "<priority>$(v.priority)</priority>"
    write(io, """
        <url>
            $loc
            $lastmod
            $changefreq
            $priority
        </url>
        """)
end

where the loc part is

joinpath(base_url, key)

where the base_url is the website landing page (https://tlienart.github.io/FranklinTemplates.jl/ in the example you show, it takes prepath into account) and key is

key = joinpath(escapeuri.(split(k, '/'))...)

So we need to investigate url_curpage() which is defined here:

function url_curpage()
    # get the relative path to current page and split extension (.md)
    fn, ext = splitext(locvar(:fd_rpath))
    if ext != ".html"
        # if it's not `index` then add `index`:
        if splitdir(fn)[2] != "index"
            fn = joinpath(fn, "index")
        end
        fn *= ".html"
    end
    # unixify
    fn = unixify(fn)
    # if it does not start with "/", add a "/" in front
    startswith(fn, "/") || (fn = "/" * fn)
    return fn
end

adding the following at the end of that function

if contains(fn, "test")
    @show locvar(:fd_rpath)
    @show fn
end

and going through OP's example:

→ Initial full pass...
locvar(:fd_rpath) = "test.html"
fn = "/test/"

So that's where the culprit is.

Fix

The fd_rpath is correctly recovered in url_curpage; what's missing is a check against keep_path. What should be done is:

  • keep = globvar(:keep_path)::Vector{String}
  • check if locvar(:fd_rpath) is in keep; if so, return it as is
  • if not, leave the rest unchanged.

rpath   = locvar(:fd_rpath)
keep    = globvar(:keep_path)::Vector{String}
rpath in keep && return rpath

fn, ext = splitext(rpath)

and, sure enough,

shell> tail __site/sitemap.xml
    <changefreq>monthly</changefreq>
    <priority>0.5</priority>
</url>
<url>
    <loc>https://tlienart.github.io/FranklinTemplates.jl/test.html</loc>
    <lastmod>2023-04-18</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.5</priority>
</url>
</urlset>
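
For reference, the early-return idea can be sketched as a standalone function. This is only an illustration: sitemap_loc is a hypothetical name, rpath and keep are passed in explicitly instead of being read via locvar and globvar, and the fallback branch is a simplified stand-in that just reproduces the "/test/" output observed above, not Franklin's actual URL logic.

```julia
# Hypothetical standalone sketch of the keep_path check (not Franklin's code).
# `rpath` stands in for locvar(:fd_rpath), `keep` for globvar(:keep_path).
function sitemap_loc(rpath::AbstractString, keep::Vector{String})
    # Paths listed in keep_path are kept as is, so return them untouched.
    rpath in keep && return rpath

    # Simplified stand-in for the usual "pretty URL" handling:
    # drop the extension and wrap the remainder in slashes.
    fn, _ = splitext(rpath)
    fn = replace(fn, '\\' => '/')   # normalise Windows separators
    return "/" * strip(fn, '/') * "/"
end

sitemap_loc("test.html", ["test.html"])  # kept path: returned unchanged
sitemap_loc("test.html", String[])       # not kept: falls through to the pretty URL
```

The point of checking first is that nothing after the early return needs to know about keep_path at all.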

PR incoming.

Other notes: I'm a bit nervous about the use of joinpath here. It might generally be OK, but I don't think it's guaranteed to do the right thing on Windows, where joinpath uses backslashes as separators. It might be better to use URIs.
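
As a rough illustration of the joinpath concern, URL segments can be joined with an explicit "/" so the result does not depend on the operating system. The helper below is a hypothetical sketch in plain Julia: the name urljoin is made up for this example, it is not what Franklin or the URIs package provides, and it does no percent-escaping.

```julia
# Hypothetical helper: join URL segments with "/" regardless of the OS.
# Note that it also drops any trailing slash on the last segment.
function urljoin(base::AbstractString, parts::AbstractString...)
    segs = String[String(rstrip(base, '/'))]
    for p in parts
        s = strip(p, '/')
        # skip empty segments so "a//b" style doubles never appear
        isempty(s) || push!(segs, String(s))
    end
    return join(segs, "/")
end

urljoin("https://tlienart.github.io/FranklinTemplates.jl/", "test.html")
```

Unlike joinpath, this never consults the platform's path separator, which is the property that matters for a sitemap.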

@tlienart
Owner

closed by #1009

@fkastner
Contributor Author

Thanks for the quick fix!
