Multi-cloud documentation #18872

Merged
merged 4 commits into from
Nov 4, 2022

Conversation

sophia-wiley
Contributor

Users are able to:

  • Choose a default data residency (settings).
  • Choose data residency when creating a new connection.
  • Choose data residency in connection settings.
  • See IP addresses for EU, US, and Airbyte Default.

The documentation in this PR includes:

  • Set up a connection (Getting Started with Airbyte Cloud)
  • Allowlist IP addresses (Getting Started with Airbyte Cloud)
  • Choose your default data residency (Managing Airbyte Cloud)
  • Choose data residency for a connection (Managing Airbyte Cloud)

@github-actions bot added the `area/documentation` label (Improvements or additions to documentation) on Nov 2, 2022
@andyjih
Contributor

andyjih commented Nov 2, 2022

@sophia-wiley One thought here: I think it might be useful to discuss the split between the Control Plane and the Data Plane. Currently the Data Residency option only changes where the data is processed, i.e., the Data Plane, so all other data will still live in the Control Plane. Here's a Whimsical that lays out roughly what the split looks like. We can ask @Upmitt to make a prettier version of the visuals as well.

@andyjih
Contributor

andyjih commented Nov 2, 2022

I also wonder if perhaps the Architecture Overview page could use an update.

@sophia-wiley
Contributor Author

@andyjih Sounds good! I will add the info about the split between the control plane and the data plane to the Architecture Overview Page.

Contributor

@malikdiarra malikdiarra left a comment


I agree with Andy that the architecture diagram could use an update but I think we should address that separately.

| Parameter | Description |
|----------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Replication frequency | How often should the data sync? |
| Data residency | Where is the data processing location for this connection? To choose the preferred data processing location for all of your connections, set your default [data residency](https://docs.airbyte.com/cloud/managing-airbyte-cloud#choose-your-default-data-residency). |

Contributor

That might be a bit of a nit, but I find the wording here a bit odd: "Where is the data processing location for this connection". I'm not sure I would understand what this means if I were a customer (I haven't checked our documentation for a while, so my opinion might be off). Every other entry uses the word "should", which really conveys that those options are directives for Airbyte Cloud.

I would go with something like this instead: "Where should the Airbyte Cloud data synchronization workflow run?"

Contributor

I agree this sounds weird.

Maybe something even simpler - "Where do I want Airbyte to process my data?"

Contributor Author

Great points! I will change the wording here

@malikdiarra
Contributor

> @andyjih Sounds good! I will add the info about the split between the control plane and the data plane to the Architecture Overview Page.

I do not think this is the best page to update; the concept of data residency is important mainly for Airbyte Cloud. While nothing is blocking someone from using it in OSS, there is additional documentation we probably need to provide to fully support that. I would avoid making that diagram more complicated just for that.

Contributor

@davinchia davinchia left a comment


:shipit:

One minor comment!

- 34.106.60.246
Depending on your data residency location, you may need to allowlist the following IP addresses to enable access to Airbyte:

### United States and Airbyte Default

Contributor

I wonder if we should be explicit about saying that the US/Default IP addresses are GCP us-west-2 and the EU IP addresses are both GCP us-west-2 and AWS eu-west-3 (Paris).

wdyt @davinchia and @malikdiarra?

Contributor

agree. I think it only helps us to be more specific with the info.
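
A minimal sketch of the grouping proposed in this thread, assuming that an EU residency needs both the US/Default group and the EU group allowlisted. The group names, the `allowlist_for` helper, and every address below are hypothetical; the addresses are placeholders from the RFC 5737 documentation ranges, not Airbyte's published IPs.

```python
import ipaddress
from typing import Dict, List

# Placeholder addresses (RFC 5737 documentation ranges), NOT real Airbyte IPs.
REGION_IPS: Dict[str, List[str]] = {
    "us": ["192.0.2.10", "192.0.2.11"],
    "eu": ["198.51.100.20"],
}

# Assumption taken from the comment above: EU residency involves
# both the US/Default addresses and the EU addresses.
RESIDENCY_GROUPS: Dict[str, List[str]] = {
    "default": ["us"],
    "us": ["us"],
    "eu": ["us", "eu"],
}


def allowlist_for(residency: str) -> List[str]:
    """Return every IP a user would allowlist for the given residency."""
    ips: List[str] = []
    for group in RESIDENCY_GROUPS[residency]:
        for ip in REGION_IPS[group]:
            ipaddress.ip_address(ip)  # raises ValueError on a malformed address
            ips.append(ip)
    return ips


if __name__ == "__main__":
    for residency in RESIDENCY_GROUPS:
        print(residency, allowlist_for(residency))
```

Naming the cloud region next to each group, as suggested above, would let users see why the EU list is a superset of the US/Default list.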

### Choose your default data residency
Default data residency allows you to choose where your data is processed. When you set the default data residency, it applies that data residency to all new connections, but it will not affect existing connections.

For individual connections, you can choose a data residency that is different from the default. You can do this in the [connection settings](#choose-the-data-residency-for-a-connection) or when you create a [new connection](https://docs.airbyte.com/cloud/getting-started-with-airbyte-cloud#set-up-a-connection). Your data will not leave the chosen data residency for your connection.

Contributor

I think this is an area where we might want to talk about the control plane vs. the data plane. Because this setting doesn't necessarily mean where your data is stored, but it's about where your data is processed.

Contributor

That's a good point. We don't highlight the difference between the control plane and the data plane today, i.e., while data is processed in the specific region, configuration information is still stored in the US.

This means users need to be careful not to choose a cursor column that contains sensitive data.

Andy, do you think this is worth calling out? Or is it too 'scary' to do so?

Contributor

I think it's worth mentioning!

Contributor Author

I agree that the fact that data is processed in one region and stored in another needs to be mentioned. I'm wondering if I should delete the last line "Your data will not leave the chosen data residency for your connection" and add a note like "Your data is processed in the chosen residency, but some data is still stored in the US." We could also list out types of data (like configuration information) we store in the US if that would help the users.

Any thoughts on that?

Contributor

> should delete the last line "Your data will not leave the chosen data residency for your connection" and add a note like "Your data is processed in the chosen residency, but some data is still stored in the US."

I like that. I would like the note to be a markdown note so it jumps out.

One tactical point, we don't have to wordsmith too much in this PR. We can merge this in for now and iterate on subsequent PRs.

Contributor Author

@andyjih @davinchia Do you think that a note like this would have enough info about data plane and control plane? Is there any other info you think the users will need to know? (I'll edit wording after we know what info we want to include):

Your data is processed on a data plane in the chosen residency, but some data, like data associated with the sync mode, cursor, and primary key, is still stored in the US on our control plane.

Contributor

@davinchia davinchia Nov 4, 2022

That's pretty good. I would also add: "Because of this, please do not use columns whose data must stay within the chosen residency as cursors or primary keys."
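
The warning above can be sketched as a pre-flight check. This is only an illustration under stated assumptions: `find_residency_conflicts` and the column names are hypothetical, and nothing here is part of Airbyte's API. It flags any column used as a cursor or primary key that the user has marked as needing to stay within the chosen residency.

```python
from typing import Iterable, List, Set


def find_residency_conflicts(
    cursor_fields: Iterable[str],
    primary_key_fields: Iterable[str],
    restricted_columns: Set[str],
) -> List[str]:
    """Return restricted columns that are used as a cursor or primary key.

    Cursor and primary key configuration is stored on the control plane,
    so these columns are the ones the note asks users to avoid.
    """
    conflicts: List[str] = []
    seen: Set[str] = set()
    for column in list(cursor_fields) + list(primary_key_fields):
        # Report each offending column once, in the order it was encountered.
        if column in restricted_columns and column not in seen:
            seen.add(column)
            conflicts.append(column)
    return conflicts


if __name__ == "__main__":
    conflicts = find_residency_conflicts(
        cursor_fields=["updated_at"],
        primary_key_fields=["national_id"],
        restricted_columns={"national_id", "email"},
    )
    print(conflicts)  # ['national_id']
```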

@andyjih
Contributor

andyjih commented Nov 4, 2022

> > @andyjih Sounds good! I will add the info about the split between the control plane and the data plane to the Architecture Overview Page.
>
> I do not think this is the best page to update; the concept of data residency is important mainly for Airbyte Cloud. While nothing is blocking someone from using it in OSS, there is additional documentation we probably need to provide to fully support that. I would avoid making that diagram more complicated just for that.

Do you think we should talk about the control plane/data plane split on this multi-cloud doc? I agree that it's Cloud-specific, but I do think it's important for people to know that some data is still stored in the control plane and that the data is actually processed in the data plane.

Contributor

@josephkmh josephkmh left a comment


Looks good to me! Made a couple minor comments.


2. Click the **Settings** tab.

3. Click the **Data residency** dropdown and choose the location for your default data residency.

Contributor

@josephkmh josephkmh Nov 4, 2022

Should we add a note here about whether changing a connection's data residency will affect currently running syncs? @timroes mentioned this when reviewing the UI PR. I don't think currently running syncs will be affected, but I'm not 100% sure. Maybe @davinchia has an idea?

Contributor

Thanks for reminding me Joey - yeah updating the workspace doesn't update existing connections so we should call that out too.

@sophia-wiley one more follow up change!

Contributor Author

Okay so just to make sure I understand this correctly -- If you choose a different data residency in the connection settings while a sync is running, it will not affect the currently running sync, but it will apply the new data residency to future syncs for that connection?

Contributor

@davinchia davinchia Nov 4, 2022

  • Data residency changes on a connection will only affect the next sync, and not any currently running sync.
  • Changing the workspace default residency only affects newly created connections. Already created connections have to be manually edited.

Does ^ make the behaviour clearer?
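
The two rules above can be sketched as a small model. This is a hedged illustration, not Airbyte's implementation: the `Workspace` and `Connection` classes, their methods, and the `"us"`/`"eu"` labels are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Connection:
    name: str
    residency: str
    # Residency pinned by a sync that is currently in progress, if any.
    running_sync_residency: Optional[str] = None

    def start_sync(self) -> str:
        # A sync uses the residency configured at the moment it starts.
        self.running_sync_residency = self.residency
        return self.running_sync_residency

    def finish_sync(self) -> None:
        self.running_sync_residency = None


@dataclass
class Workspace:
    default_residency: str
    connections: Dict[str, Connection] = field(default_factory=dict)

    def create_connection(self, name: str, residency: Optional[str] = None) -> Connection:
        # New connections take the workspace default unless one is chosen explicitly.
        connection = Connection(name, residency or self.default_residency)
        self.connections[name] = connection
        return connection

    def set_default_residency(self, residency: str) -> None:
        # Existing connections are deliberately left untouched.
        self.default_residency = residency


if __name__ == "__main__":
    workspace = Workspace(default_residency="us")
    orders = workspace.create_connection("orders")
    workspace.set_default_residency("eu")
    invoices = workspace.create_connection("invoices")
    print(orders.residency, invoices.residency)

    orders.start_sync()
    orders.residency = "eu"  # edited while the sync is running
    print(orders.running_sync_residency)  # the running sync keeps its residency
    orders.finish_sync()
    print(orders.start_sync())  # the next sync picks up the new residency
```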


:::note

You can also choose data residency when creating a [new connection](https://docs.airbyte.com/cloud/getting-started-with-airbyte-cloud#set-up-a-connection), or you can set the [default data residency](#choose-your-default-data-residency) for all of your connections.

Contributor

Suggested change (current line, then proposed line):
You can also choose data residency when creating a [new connection](https://docs.airbyte.com/cloud/getting-started-with-airbyte-cloud#set-up-a-connection), or you can set the [default data residency](#choose-your-default-data-residency) for all of your connections.
You can also choose data residency when creating a [new connection](https://docs.airbyte.com/cloud/getting-started-with-airbyte-cloud#set-up-a-connection), or you can set the [default data residency](#choose-your-default-data-residency) for your workspace.

Just a suggestion: this could be interpreted such that changing your default data residency affects all of your connections, which it does not.

Contributor Author

Good catch! I'll edit this in another PR

5 participants