This issue was moved to a discussion.

RFC: server-side routing, and SEO, for new classifier URLs #1997

Closed
eatyourgreens opened this issue Jan 22, 2021 · 43 comments
Labels
enhancement New feature or request

Comments

@eatyourgreens
Contributor

eatyourgreens commented Jan 22, 2021

Package

app-project

Description

As part of Engaging Crowds, we're retiring the /classify catch-all URL for a project, which loads a classifier page for your currently chosen workflow. Every workflow has the same page URL.

We're replacing that with individual URLs for workflows, subject sets and subjects (the latter two only for projects that allow volunteers to select a subject set and subject to classify.) Each of the following will have its own Classify page, built and served by NextJS:

  • workflows: /classify/workflow/:workflowID
  • subject sets: /classify/workflow/:workflowID/subject-set/:subjectSetID
  • subjects: /classify/workflow/:workflowID/subject-set/:subjectSetID/subject/:subjectID
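
As a rough illustration of the three route shapes above, here is a minimal sketch that parses the part of the path below a project's base URL. The helper name `parseClassifyPath` and the `ClassifyRoute` type are hypothetical, and it assumes IDs are always numeric, which matches the example URLs in this thread but isn't guaranteed by anything here.

```typescript
// Hypothetical helper, not part of app-project.
// Assumption: workflow, subject set and subject IDs are numeric strings.
interface ClassifyRoute {
  workflowID?: string
  subjectSetID?: string
  subjectID?: string
}

// Each deeper level is only allowed when its parent level is present.
const CLASSIFY_PATTERN =
  /^\/classify(?:\/workflow\/(\d+)(?:\/subject-set\/(\d+)(?:\/subject\/(\d+))?)?)?\/?$/

function parseClassifyPath(path: string): ClassifyRoute | null {
  const match = CLASSIFY_PATTERN.exec(path)
  if (!match) return null
  const [, workflowID, subjectSetID, subjectID] = match
  const route: ClassifyRoute = {}
  if (workflowID) route.workflowID = workflowID
  if (subjectSetID) route.subjectSetID = subjectSetID
  if (subjectID) route.subjectID = subjectID
  return route
}
```

A path that doesn't fit any of the three shapes returns `null`, which a caller could treat as a candidate for a 404.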

At the moment, pages are built at request time, with each page being built afresh on every incoming request for that page.

Consequences

SEO

Each of these URLs can be crawled, and indexed, by search engines. When a project home page is indexed, the Indexing Tool (if present) will be crawled and requests made for every subject set and subject listed. NextJS will then try to build pages for every subject set and subject. Do we want this to happen? What are the consequences of allowing/blocking search-engine indexing of the Classify pages?

Over time, pages will become stale: workflows will be deactivated, subject sets and subjects completed. Do we want page URLs to persist in search engine indexes when workflows, subject sets or subjects have been fully classified? Here, there might be an argument for persisting URLs, for workflows at least, so that they can be cited in papers.

Server responses

Related to the above, I've started setting up responses for workflows (but not for subject sets or subjects yet.) When a workflow's finished, and maybe replaced with a new workflow (new URL), I'm not completely sure what the appropriate response should be:

  • 200: the workflow page essentially lives for ever. Links from across the web continue to work. The Classify page for that workflow continues to be indexed by search engines and show up in search results. Any PageRank it has from incoming links is preserved.
  • 301 Moved Permanently (or 302 Found if it might come back one day): the workflow redirects to a new workflow, which has replaced it as the active workflow for the project. Links continue to work. Search engines crawl the active workflow URL, and replace the old workflow page with the new active workflow page in search results. PageRank is transferred from the old workflow to the active workflow.
  • 404 Not Found (or 410 Gone if it isn't coming back): incoming links break. Search engines stop crawling the page and remove it from their indexes. Any PageRank the page had accumulated from incoming links etc. is gone. Our servers show a 404 error page, which should offer help to the visitor and direct them to where they should be going. We can get some hints, from the request URL, as to what they were looking for and direct them accordingly. Pinging @beckyrother here because we don't have any 404 pages designed yet. The default 404 page in NextJS is a blank white page that says '404: Not Found'.

301 redirects are best practice SEO when you move a page, so I've set up a redirect at the workflow level in #1965. For example, Galaxy Zoo retires an old workflow and starts a new one: links to the old workflow Classify page would now automatically point to the new one. I'm not sure if that's the correct approach. Maybe the old page should live on, at the old address? Note that the new page might start off with a lower search ranking, since all the incoming links on the web will point to the old workflow URL.

I think we are all agreed that /classify should 301 redirect to /classify/workflow/:workflowID for projects like PH-TESS and Galaxy Zoo, where there's only ever one active workflow.
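
For illustration only, the /classify redirect could be computed by something like the sketch below. `classifyRedirect` and the `Redirect` type are made-up names, and the project path in the usage note is a placeholder. The returned object follows the shape that `getServerSideProps` accepts under a `redirect` key. As I understand it, `permanent: true` in Next produces a 308 rather than a 301, so the sketch sets `statusCode: 301` explicitly instead.

```typescript
// Hypothetical sketch: decide whether /classify should redirect.
// The return value is meant to be used as `{ redirect }` from getServerSideProps.
interface Redirect {
  destination: string
  statusCode: 301
}

function classifyRedirect(
  projectPath: string,
  activeWorkflowIDs: string[]
): Redirect | null {
  // Only redirect when there is exactly one active workflow.
  // With several workflows the volunteer is asked to choose instead.
  if (activeWorkflowIDs.length !== 1) return null
  return {
    destination: `${projectPath}/classify/workflow/${activeWorkflowIDs[0]}`,
    statusCode: 301
  }
}
```

Calling `classifyRedirect('/projects/owner/project', ['11235'])` gives a destination of `/projects/owner/project/classify/workflow/11235`. With zero or several active workflows it returns `null` and the page renders normally.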

Canonical URLs and page titles

Since we're minting individual page URLs for workflows, subject sets and subjects, each of those URLs should have a unique page title for SEO and bookmarking. We should probably publish canonical URLs, at the very least for projects with a single workflow, where /classify points to /classify/workflow/:activeWorkflowID.
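
One possible way to build unique titles, sketched under the assumption that the page already knows the display names it needs. The function name, the `TitleParts` type and the `Classify | … | project` format are all invented for this example; the real format would be a design decision.

```typescript
// Hypothetical sketch: build a distinct <title> for each Classify page.
// The separator and ordering are assumptions, not an agreed format.
interface TitleParts {
  projectName: string
  workflowName?: string
  subjectSetName?: string
  subjectID?: string
}

function classifyPageTitle(parts: TitleParts): string {
  const segments: string[] = ['Classify']
  // Most specific resource first, so titles differ at the start of the string.
  if (parts.subjectID) segments.unshift(`Subject ${parts.subjectID}`)
  if (parts.subjectSetName) segments.push(parts.subjectSetName)
  if (parts.workflowName) segments.push(parts.workflowName)
  segments.push(parts.projectName)
  return segments.join(' | ')
}
```

Putting the most specific part first keeps titles distinguishable in browser tabs and search results, where long titles get truncated.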

At the moment, the project app is hardcoded to use a single, constant title for all project pages.

Performance

At the moment, we run a page build on every incoming request for a URL. We could improve performance, and lower costs by leveraging static optimisation and serving our HTML via a CDN cache.
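
As a sketch of the CDN idea, a server-rendered page could set a `Cache-Control` header so that a shared cache serves the HTML between builds. `cacheControlFor` is a hypothetical helper and the durations are arbitrary. It also assumes the server-rendered HTML is the same for every visitor, with anything user-specific loaded in the browser.

```typescript
// Hypothetical sketch: choose a Cache-Control value for a rendered page.
// Assumption: the HTML contains nothing user-specific.
function cacheControlFor(statusCode: number, maxAgeSeconds = 60): string {
  // Don't let a shared cache hold on to error responses.
  if (statusCode >= 400) return 'no-store'
  // s-maxage applies to shared caches (CDNs); stale-while-revalidate lets the
  // cache serve a stale copy while it fetches a fresh one in the background.
  return `public, s-maxage=${maxAgeSeconds}, stale-while-revalidate=${maxAgeSeconds * 10}`
}
```

In `getServerSideProps` this could be applied with `res.setHeader('Cache-Control', cacheControlFor(res.statusCode))`.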

Security

We should avoid building a page for any incoming request URL. If a malicious actor writes a script that generates a large number of fictitious workflow IDs, requesting /classify/workflow/:workflowID for each in turn, we don't want NextJS to start building those pages until it falls over. Is there a way to quickly validate the incoming request, via the API, and respond with 404 for obviously wrong or mistyped IDs?
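
A cheap first line of defence might look like the sketch below: reject IDs that can't possibly be valid before making any API request, then check membership in a list the project already provides. Both function names are hypothetical, and the assumption that IDs are positive integers of at most ten digits is mine.

```typescript
// Hypothetical sketch: cheap checks before any page build or API call.
// Assumption: IDs are positive integers with no leading zero, at most 10 digits.
function isPlausibleID(id: string): boolean {
  return /^[1-9]\d{0,9}$/.test(id)
}

// `projectWorkflowIDs` stands in for whatever list of workflow IDs
// the project resource links to.
function isKnownWorkflow(
  id: string,
  projectWorkflowIDs: ReadonlySet<string>
): boolean {
  return isPlausibleID(id) && projectWorkflowIDs.has(id)
}
```

Anything that fails `isKnownWorkflow` can get a 404 straight away, so a script cycling through made-up IDs never triggers a full page build.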

@eatyourgreens eatyourgreens added the enhancement New feature or request label Jan 22, 2021
@srallen
Contributor

srallen commented Jan 22, 2021

Ok, these are all useful things to consider, but let's step back and document why we're making some of these changes to begin with.

We've already proposed the new routes in ADR 18: https://github.com/zooniverse/front-end-monorepo/blob/master/docs/arch/adr-18.md

The primary reason is that the current PFE behavior has been a constant source of confusion for volunteers, project owners, and internally in our team (I've had to explain the priority order of selection many, many times). Not only that, but the underlying code is difficult to maintain. What ADR 18 captures is that we want to use these routes for workflows, subject sets, and subjects. What it doesn't capture, but assumes, is that the routes would still be 'automatically' resolved based on the same selection strategies.

However, in our discussion in #1132, I think we're generally moving toward not automatically selecting any workflow for a volunteer for most cases because that is still the source of the confusion. This means no more random selection, no auto redirects from one workflow to another. Combined with the project goals of Engaging Crowds, we've built a UI to present the volunteer with a choice.

Considering these two goals, to minimize confusion as well as to have better maintainable code, I think we should use a 404 for a resource that is not available and present the volunteer with the option to select a different workflow (or later on another project if no other workflows are available).

For the other HTTP response options, the 301 or 302 don't make sense to me because the page hasn't moved for that workflow resource when it becomes inactive and unavailable.

I think we are all agreed that /classify should 301 redirect to /classify/workflow/:workflowID for projects like PH-TESS and Galaxy Zoo, where there's only ever one active workflow.

I agree, though @beckyrother may have some other thoughts about this having to do with consistency of UX when you arrive at the /classify page. If we do always need to prompt the user, even in scenarios where there is only one active workflow, then we'll need to check in with the PH-TESS team about this before making the change.

For security, I'd recommend that all ids be validated against the serialized links array of active workflow ids on the project. I made a comment about this previously with regard to the default workflow: #1961 (comment). Second, we have to consider that certain authenticated user roles have permission to load workflows regardless of the workflow's active state: admins, project owners, project collaborators, testers, and maybe also experts(?). For these users, the route to a workflow regardless of its active state should load and not 404 as long as it actually exists.

I think what this means is that we have two stages of validation:

  • validate against the active workflow links array in the project resource's links
  • user authenticates, we get their roles
  • if the workflow is active and the user is not authenticated, then load it
  • if the workflow is not active and the user is not authenticated, then 404 and prompt with other options
  • if the workflow is not active and the user is authenticated, check user roles
  • if the user has a role that allows them to proceed and the workflow exists on Panoptes, load the workflow
  • if the user has a role that allows them to proceed and the workflow does not exist, then 404 and prompt with other options
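
The list above can be read as a small decision function. This is only a sketch of that reading: the type and function names are hypothetical, the role strings are illustrative rather than the exact values Panoptes returns, and two cases the list doesn't spell out are filled in as assumptions (an active workflow loads for everyone; a signed-in user without a privileged role gets a 404 for an inactive workflow).

```typescript
// Hypothetical sketch of the validation steps listed above.
type Decision = 'load' | 'not-found'

interface WorkflowRequest {
  workflowID: string
  activeWorkflowIDs: string[] // from the project resource's links
  userRoles: string[] | null // null when the visitor is not signed in
  existsOnPanoptes: boolean // in practice this would be an API lookup
}

// Illustrative role names only.
const PRIVILEGED_ROLES = new Set([
  'admin', 'owner', 'collaborator', 'tester', 'expert'
])

function resolveWorkflow(request: WorkflowRequest): Decision {
  // Stage 1: active workflows load for everyone (assumption for signed-in users).
  if (request.activeWorkflowIDs.includes(request.workflowID)) return 'load'
  // Inactive workflow, anonymous visitor: 404 and prompt with other options.
  if (request.userRoles === null) return 'not-found'
  // Stage 2: inactive workflow, signed-in user: check roles, then existence.
  const privileged = request.userRoles.some(role => PRIVILEGED_ROLES.has(role))
  return privileged && request.existsOnPanoptes ? 'load' : 'not-found'
}
```

Only the first branch can be decided on the server without knowing who the user is, which is why the role check is described as a second stage.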

@eatyourgreens
Contributor Author

That's a great point. The PRs merged this week removed default workflows from the project models in the classifier and the NextJS app. For projects with multiple workflows, if no workflow is specified, then the classifier will not select one for you. I should add that to the description. It's a big change, but not one we'd notice on staging because it doesn't affect PH-TESS.

Validating workflow IDs against a project's workflow links seems sensible. Once #1964 is merged, I can add 404 rules for workflows too. If I add them at the moment, I think the changes to the Classify page will clash with that PR. Or I could expand that PR to include workflows.

I'd be interested in hearing from project owners as to whether we should build Classify pages for inactive workflows. I can see big positives and big negatives to doing that. I think I'm kind of leaning towards positives, but mostly on the fence myself.

I forgot to say in the original description: the technical problem I've got at the moment is writing routing rules for the NextJS app. I'd like the outcome of this RFC to be a decision (in an ADR) as to what those routing rules should be.

NextJS has the concept of preview routes for pages that are in development/not yet public. I've been thinking those might be useful for routes that require authentication and workflows that are in development or testing mode.

@eatyourgreens
Contributor Author

I'd recommend that all ids be validated against the serialized links array of active workflow ids

If we validate only against active workflows then we won't be able to publish pages for inactive workflows. Old workflows would vanish from the site, and any existing external links to the Classify page for those projects would break (if they had linked directly to the workflow by ID.) That's the biggest negative consequence of these changes, to my mind.

On projects like Galaxy Zoo, where there's a single active workflow that changes over time, we might be able to avoid link rot by making /classify the canonical URL for links to the Classify page.

@eatyourgreens
Contributor Author

@shaunanoordin mentioned in passing that there might be implications for the URLs used by Zooniverse classrooms. I haven't mentioned those here because I'm not 100% sure how classrooms work.

@srallen
Contributor

srallen commented Jan 22, 2021

I'd be interested in hearing from project owners as to whether we should build Classify pages for inactive workflows. I can see big positives and big negatives to doing that. I think I'm kind of leaning towards positives, but mostly on the fence myself.

I think it would be helpful to enumerate what the positives and negatives are. I'm leaning toward not building these pages. I can conceive of several reasons why this would be a negative and seems like a possible security or privacy issue to me. They could be workflows that were from the development phase of the project that really should not ever be activated or workflows that are intended to be private only for project experts.

NextJS has the concept of preview routes for pages that are in development/not yet public. I've been thinking those might be useful for routes that require authentication and workflows that are in development or testing mode.

This looks like a promising feature, we could note this as being a possible solution for these specific scenarios?

@shaunanoordin mentioned in passing that there might be implications for the URLs used by Zooniverse classrooms. I haven't mentioned those here because I'm not 100% sure how classrooms work.

The specific functionality depends on whether the educational program is Intro to Astro-like or Wildcam-like, but the heart of it is that these use cases really needed /classify/workflows/:workflow-id routes, so I think this is an improvement for classroom use. For the Wildcam type, a new cloned workflow is created for each assignment and the assignment is linked to the workflow. We want to make an enhancement to have the EduAPI send a request to delete the workflow after the assignment due date, because Wildcam has hundreds of leftover workflows for past classroom sessions that will never get used again. So, a 404 is a good response for this case. Intro to Astro has a single assignment for multiple projects that don't change, so the workflow remains active and gets directly linked to.

@srallen
Contributor

srallen commented Jan 22, 2021

There's an additional consequence: what do the default workflow and the UPP-stored workflow now mean if we're not going to automatically load these in a preferred order any more? Perhaps we can indicate which workflow is default and which you last worked on in the selector modal UI? Default could render somehow as "Project suggested" and your UPP stored one as "Last worked on"?

@eatyourgreens
Contributor Author

I think it would be helpful to enumerate what the positives and negatives are. I'm leaning toward not building these pages. I can conceive of several reasons why this would be a negative and seems like a possible security or privacy issue to me. They could be workflows that were from the development phase of the project that really should not ever be activated or workflows that are intended to be private only for project experts.

You're absolutely right. I've extended #1964 to add 404 pages for inactive workflows.

I was thinking of the case where a long-running project uses different workflows over the lifetime of a project. Workflow URLs allow us to preserve the workflow history, similar to how Galaxy Zoo 1, Galaxy Zoo 2 etc. are preserved, for research purposes. This would then allow papers to cite the exact workflow that was used to obtain results.

When a project like Galaxy Zoo takes down a workflow and replaces it with another, we do want to be careful about harming our search ranking. 404 will break all incoming links to the old workflow, which would lower the ranking of the Classify page. SEO best practice would be to redirect the old workflow ID to the new workflow ID, if possible. We can mitigate this by redirecting /classify so that it always points to the latest workflow for a project, but we have no control over external links that point to a specific workflow ID. I don't really have a good solution for that, hence asking for advice from project builders here.

@eatyourgreens
Contributor Author

@chrislintott asked recently about projects that show a random workflow to each volunteer. It's possible to route /classify to a random workflow ID. The Classify link, in project navigation, can also be set to use a random URL for each session. I think we'd have to think very carefully about how to implement this, if it is needed.

@eatyourgreens
Contributor Author

There's an additional consequence: what do the default workflow and the UPP-stored workflow now mean if we're not going to automatically load these in a preferred order any more? Perhaps we can indicate which workflow is default and which you last worked on in the selector modal UI? Default could render somehow as "Project suggested" and your UPP stored one as "Last worked on"?

Wouldn't marking a workflow as 'Project suggested' bias classifiers towards choosing that workflow? I'm not sure about the consequences of that.

UPP definitely needs to be considered, but that's handled client-side and I'm trying to focus the discussion here towards how I should set up the server-side config for the NextJS app. I think the client-side auth, in the Next app, will have to be updated to load in UPP alongside the user. Then the UPP can be used to update the workflow menu in the browser.

At the moment, I have the workflow menu set to wait until the user has loaded, in order to prepare for this. That code could probably be updated to use Suspense in the browser.

@srallen
Contributor

srallen commented Jan 26, 2021

@chrislintott asked recently about projects that show a random workflow to each volunteer. It's possible to route /classify to a random workflow ID. The Classify link, in project navigation, can also be set to use a random URL for each session. I think we'd have to think very carefully about how to implement this, if it is needed.

We discussed this in #1132 and hardly any projects use random workflow selection. Just as a reminder: #1132 (comment)

We only found three projects, and two were likely workarounds for special circumstances. For the one possibly legitimate use case that exists, we have no idea why it may have been wanted, as it was never documented. Because there is a single project out of the 100s, I think we should move forward with deprecation.

Wouldn't marking a workflow as 'Project suggested' bias classifiers towards choosing that workflow? I'm not sure about the consequences of that.

Default workflow selection already biases volunteers to work on it, just without their knowledge.

UPP definitely needs to be considered, but that's handled client-side and I'm trying to focus the discussion here towards how I should set up the server-side config for the NextJS app. I think the client-side auth, in the Next app, will have to be updated to load in UPP alongside the user. Then the UPP can be used to update the workflow menu in the browser.

This should be noted in whatever ADR comes out of this as consequences at the very least. I would recommend #1132 and this discussion merge into the same ADR. I don't think these can be separated discussions and decisions because they impact each other.

@srallen
Contributor

srallen commented Jan 26, 2021

404 will break all incoming links to the old workflow, which would lower the ranking of the Classify page. SEO best practice would be to redirect the old workflow ID to the new workflow ID, if possible.

I think security and privacy concerns trump SEO concerns.

@camallen
Contributor

camallen commented Jan 29, 2021

301 redirects are best practice SEO when you move a page, so I've set up a redirect at the workflow level in #1965. For example, Galaxy Zoo retires an old workflow and starts a new one: links to the old workflow Classify page would now automatically point to the new one. I'm not sure if that's the correct approach. Maybe the old page should live on, at the old address?

I'd be in favour of removing the complexity here and preserving all the existing workflow URLs, so a 200 response vs a redirection response.

I was thinking that any workflows that are finished and/or deactivated could have a different UI component that overlays the underlying classifier.

Perhaps a dismissible UI component that obscures or disables the interface until it's dismissed or interacted with. This finished / deactivated UI could also direct the user back to the workflow selection area to access the workflows that need more contributions.

I'll defer on the UX to those better able than myself.

The above would preserve the ability to still use old workflows for posterity but ensure we signal that this workflow is finished and doesn't need any more contributions in a clear way. This is something the PFE system has always had trouble with and confuses volunteers.

The primary reason is that the current PFE behavior has been a constant source of confusion for volunteers, project owners, and internally in our team (I've had to explain the priority order of selection many, many times). Not only that, but the underlying code is difficult to maintain. What ADR 18 captures is that we want to use these routes for workflows, subject sets, and subjects. What it doesn't capture, but assumes, is that the routes would still be 'automatically' resolved based on the same selection strategies.

Agree that automatic selection was often confusing and led to classifications being submitted where they weren't useful.

I agree that we should not do any automatic workflow selection. In my opinion, we should allow the user to make good judgements via UI signaling, providing the information to the user if a workflow is finished / deactivated.

I think we should use a 404 for a resource that is not available and present the volunteer with the option to select a different workflow (or later on another project if no other workflows are available).

What about when a user wants to access a historical workflow for a paper / demonstration / outreach session?

A hard 404 here would break this use case. I think a 404 would be reasonable if the old workflow was actually deleted from the API.

I vote strongly in favour of keeping the old workflows around and accessible on their existing URLs.

@srallen
Contributor

srallen commented Jan 29, 2021

What about when a user wants to access a historical workflow for a paper / demonstration / outreach session?

I specifically mean using a 404 for a workflow that no longer exists as a resource on Panoptes or for workflows that the user does not have permission to load.

Finished workflows would 200 and would use a UI prompt to encourage them to work on something else. See #642

@eatyourgreens
Contributor Author

Finished workflows on Galaxy Zoo (off the top of my head) are inactive, though. Right now, projects don't seem to distinguish between stuff that's experimental, in development etc. and workflows that used to be live but have been turned off because they've finished gathering classifications. I do like the idea of workflows having permanent URLs for posterity, but I'd defer to the wishes of project owners and builders here.

@srallen
Contributor

srallen commented Jan 29, 2021

We don't allow most users to load inactive workflows with the exception of admins, owners, collaborators, testers, and experts I believe, therefore this is functionally a permissions feature and we are currently redirecting users if they do not have permission. In the new classifier, I'm proposing we inform users who do not have permission it's not available, which is what a 404 is and asking them what they want to do rather than redirect.

There are many projects that inactivate workflows not because they want to control who is able to load the workflow, but because we don't have a good UX solution for redirecting effort, and people consistently do not read or understand the tiny 'finished' banners. The new prompt to ask users what they want to do will cover this case, so I predict we'll see fewer workflows set to be inactive and more that load and prompt asking what the user wants to do instead. Then inactive workflows will functionally behave like what I think they're really intended to be: a way to control who has permission to load and view the workflow.

@eatyourgreens
Contributor Author

I'm a big fan of 410 for stuff that's permanently deleted.

@eatyourgreens
Contributor Author

418 is a good response too. https://http.cat/418

@beckyrother

@chrislintott asked recently about projects that show a random workflow to each volunteer. It's possible to route /classify to a random workflow ID. The Classify link, in project navigation, can also be set to use a random URL for each session. I think we'd have to think very carefully about how to implement this, if it is needed.

Sorry to go back a bit but I'm extremely against showing people random workflows. Users need to know where they are on a site and how they got there – what if you go to a project, do some classifications on a project you really enjoy, come back the next day to do more, and it's a totally different task? That's frustrating and will result in negative feelings towards the site and likely a lost volunteer.
Since this is currently such a very rare occurrence, I don't think we should spend any time working on it.


In the new classifier, I'm proposing we inform users who do not have permission it's not available, which is what a 404 is and asking them what they want to do rather than redirect.

Agreed, we need to be sure we're giving volunteers a way to get back to the project's home page rather than finding an error and not having a way to recover. Here's a sort of ramble-y article about good 404 page UX that talks about the user flow after someone gets to a 404 page.

@eatyourgreens
Contributor Author

#1964 shows the 404 page for inactive workflows and projects.

@eatyourgreens
Contributor Author

Our servers show a 404 error page, which should offer help to the visitor and direct them to where they should be going. We can get some hints, from the request URL, as to what they were looking for and direct them accordingly. Pinging @beckyrother here because we don't have any 404 pages designed yet. The default 404 page in NextJS is a blank white page that says '404: Not Found'.

Working on a 404 page might be a good first issue too.

@beckyrother

Oo good point, I'll work on one.

@camallen
Contributor

camallen commented Jan 30, 2021

I specifically mean using a 404 for a workflow that no longer exists as a resource on Panoptes or for workflows that the user does not have permission to load.

Finished workflows would 200 and would use a UI prompt to encourage them to work on something else.

Excellent - this sounds great to me 👍. Agree on the 404 pages for resources the user doesn't have permission to access, this matches the API response as well.

@beckyrother

404 page: https://projects.invisionapp.com/d/main#/console/12924056/443920170/preview

I'm working on gathering more blank images so there can be a random one each time

@eatyourgreens
Contributor Author

Excellent. Is that also the 404 page for workflows and subject sets?

Speaking of which, subject sets and subjects don't have 404 responses at the moment. #1875 adds subject pages to a project, which is the next big piece of work for Engaging Crowds.

@beckyrother

Is it possible to have different 404 pages for different circumstances?

@eatyourgreens
Contributor Author

eatyourgreens commented Feb 3, 2021

Sure. I can double check how 404 responses work in Next 10, but Next 9 should be totally fine with using different Error components in different contexts.

I think the limiting factor, for us, is time and people available to build the pages.

@eatyourgreens
Contributor Author

PH-TESS pages are listed in Google as nora-dot-eisner and 11235.
Google search listings for the Planet Hunters TESS Home and Classify pages. The page titles are identical so the URL is used to disambiguate them.

@eatyourgreens
Contributor Author

I've opened #2173 to address page titles and main headings.

@eatyourgreens
Contributor Author

eatyourgreens commented May 14, 2021

HMS NHS has been around long enough now that we're starting to see our first 404s for old workflows and subject sets: https://www.zooniverse.org/projects/msalmon/hms-nhs-the-nautical-health-service/classify/workflow/16852/subject-set/82738

No big deal while the project is still in development, but it gives us our first practical example of what link rot might look like.

Maybe we do need to build a 404 page for each project, at something like https://www.zooniverse.org/projects/msalmon/hms-nhs-the-nautical-health-service/404. The Next 404 page is a static page that's built for the entire app.

@eatyourgreens
Contributor Author

Bumping this because Davy Notebooks is now at a stage where they are shutting down completed workflows, causing those pages to 404 (or worse, error when GoogleBot fetches them. See #2412.)

Here's an example of a broken URL for a completed notebook: https://www.zooniverse.org/projects/humphrydavy/davy-notebooks-project/classify/workflow/18244

A 404 page for workflows would work here, where we explain that the workflow has been finished / turned off for secret reasons / eaten by a grue, and then point the volunteer to active workflows that need work.

Pinging @snblickhan because we've been talking about this recently for Engaging Crowds projects.

@srallen
Contributor

srallen commented Sep 15, 2021

There's been a 404 page design for some time. I would specifically recommend a dedicated 404 page for workflows, where the workflow menu is shown so users can select from what is available (if none, then we would show the similar project recommendations, yet to be implemented).

I've recommended a 404 plus a prompt asking users what they want to do in several past comments, and it continues to be my recommendation.

@eatyourgreens
Contributor Author

Has anyone got thoughts about which project URLs should be indexed by Google and which should be marked as noindex?

At the moment, we’re getting emails from Google because there are high rates of 500 errors on URLs for workflows which had been indexed but are now complete.

@eatyourgreens
Contributor Author

Forgot to add: I'm also wondering if subjects should be indexed by search engines. GoogleBot will go through every subject link in the subject picker, and index it, unless we tell it not to. That's a lot of URLs for a reasonably sized project.

@eatyourgreens
Contributor Author

@srallen project-level and workflow-level 404 pages make sense to me too. NextJS only supports one, static 404 page per app, at pages/404 if I remember correctly, so we'd have to put some thought into how to publish those pages. Maybe dynamic pages where we override the response and set the status code to 404?
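
To make the 'dynamic page that overrides the status code' idea concrete, here is a minimal sketch. The `res` object passed to `getServerSideProps` is a Node response, so assigning `res.statusCode` there should change the status that gets sent while the page still renders with whatever props are returned. `workflowNotFound`, `MinimalResponse` and the prop names are hypothetical.

```typescript
// Hypothetical sketch: render a normal dynamic page, but send a 404 status.
// MinimalResponse stands in for the Node ServerResponse that Next passes
// to getServerSideProps as `context.res`.
interface MinimalResponse {
  statusCode: number
}

interface NotFoundProps {
  statusCode: 404
  projectSlug: string
  availableWorkflowIDs: string[]
}

function workflowNotFound(
  res: MinimalResponse,
  projectSlug: string,
  availableWorkflowIDs: string[]
): { props: NotFoundProps } {
  // Override the HTTP status; the page component decides what to show.
  res.statusCode = 404
  return { props: { statusCode: 404, projectSlug, availableWorkflowIDs } }
}
```

Inside `getServerSideProps`, the page would `return workflowNotFound(res, slug, ids)` when the requested workflow isn't available. The component can then show the project's workflow menu rather than a generic error.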

I haven't really thought about how we handle 'not found' errors at the subject set or subject level. I'm open to suggestions or ideas for that.

@srallen
Contributor

srallen commented Sep 16, 2021

@eatyourgreens subject set or subjects not found perhaps could function similar to what I've done in #2418. The classifier is paused from loading and the selector modal is opened to prompt to select from what is available.

@eatyourgreens
Contributor Author

@srallen Thanks! I'll take a look. We now have a finished project, from the Scarlets & Blues alpha, and URLs are kind of broken because the workflow is undefined, so I'm open to ideas as to how those might work.
https://www.zooniverse.org/projects/bogden/scarlets-and-blues/classify/
https://www.zooniverse.org/projects/bogden/scarlets-and-blues/classify/workflow/18504
https://www.zooniverse.org/projects/bogden/scarlets-and-blues/classify/workflow/18504/subject-set/96699

Re-using existing behaviour from Gravity Spy, rather than defining unique behaviour just for Engaging Crowds projects would definitely get my vote too.

@eatyourgreens
Contributor Author

NextJS only supports one, static 404 page per app, at pages/404 if I remember correctly, so we'd have to put some thought into how to publish those pages. Maybe dynamic pages where we override the response and set the status code to 404?

I wonder if we could use getServerSideProps() instead of getStaticProps() at pages/404. In that case, we could pass the project and workflows props to the 404 page, then render page content accordingly.

@snblickhan

The only point I really feel qualified to give here is that I think we shouldn't assume that project builders will leave workflows active (DNP is a great example). If this practice causes errors as in #2412 maybe that's a reason not to do it any longer. Otherwise, we need to communicate to project builders that leaving workflows set to 'Active' is best practice.

Would the benefit to indexing specific workflow URLs be for web archiving, for example? That's the only reason I could think of (but maybe these two things are unrelated -- I'm not an expert here). As a user I know I find it super annoying when I'm using Google to search for a project and I get a link that isn't just the project homepage.

@eatyourgreens
Contributor Author

Deactivated workflows give you a Page Not Found error. Try this link, which fixes the bug.
https://frontend.preview.zooniverse.org/projects/humphrydavy/davy-notebooks-project/classify/workflow/18244

@eatyourgreens
Contributor Author

That’s useful to know about Google search. We could block search engines from indexing all the Classify pages for a project. It seems, to me, like the Research pages should be searchable though.
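
A minimal sketch of that split, assuming the decision can be made from the URL path alone. `robotsDirective` is a made-up helper. Its result could be emitted as `<meta name="robots" content="…">` in the page head or as an `X-Robots-Tag` response header.

```typescript
// Hypothetical sketch: block indexing of Classify pages, allow everything else.
// Assumption: any path with a /classify segment is a Classify page.
type RobotsDirective = 'index, follow' | 'noindex'

function robotsDirective(path: string): RobotsDirective {
  // Match "/classify" as a whole path segment, at the end or followed by "/".
  return /\/classify(\/|$)/.test(path) ? 'noindex' : 'index, follow'
}
```

A crawler still has to fetch the page to see a `noindex` directive, so this on its own wouldn't stop GoogleBot requesting every subject URL. A `Disallow` rule in robots.txt would be the way to stop the requests themselves.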

@eatyourgreens
Contributor Author

@snblickhan you’re right about indexing and archiving being related. Internet Archive honours the same protocols as GoogleBot. So it won’t harvest a URL that’s been blocked.

@srallen
Contributor

srallen commented Sep 20, 2021

Becky provided a design for a prompt for when workflows are unavailable: #2445

@srallen
Contributor

srallen commented Sep 20, 2021

I'll be converting this to a discussion so there's one place to look for RFC discussions. Discrete tasks can be made into issues when they're identified and the decisions documented into the ADRs.

@zooniverse zooniverse locked and limited conversation to collaborators Sep 20, 2021
@srallen srallen closed this as completed Sep 20, 2021


5 participants