Multi-cloud documentation #18872
Co-authored-by: Joey Marshment-Howell <[email protected]>
@sophia-wiley One thought here: I think it might be useful to discuss the split between the Control Plane and the Data Plane. Currently the Data Residency option only changes where the data is processed, i.e. the Data Plane, so all other data will still live in the Control Plane. Here's a Whimsical that lays out roughly what the split looks like. We can ask @Upmitt to make a prettier version of the visuals as well.
I also wonder if perhaps the Architecture Overview page could use an update.
@andyjih Sounds good! I will add the info about the split between the control plane and the data plane to the Architecture Overview page.
I agree with Andy that the architecture diagram could use an update but I think we should address that separately.
| Parameter | Description |
|-----------|-------------|
| Replication frequency | How often should the data sync? |
| Data residency | Where is the data processing location for this connection? To choose the preferred data processing location for all of your connections, set your default [data residency](https://docs.airbyte.com/cloud/managing-airbyte-cloud#choose-your-default-data-residency). |
This might be a bit of a nit, but I find the wording here a bit odd: "Where is the data processing location for this connection". I'm not sure I would understand what this means if I were a customer (I haven't checked our documentation for a while, so my opinion might be off). Every other entry uses the word `should`, which really conveys that those options are directive for Airbyte Cloud.
I would go with something like this instead: "Where should the Airbyte Cloud data synchronization workflow run?"
I agree this sounds weird.
Maybe something even simpler - "Where do I want Airbyte to process my data?"
Great points! I will change the wording here
I do not think this is the best page to update; the concept of data residency is important mainly for Airbyte Cloud. While nothing is blocking someone from using it in OSS, there is additional documentation we probably need to provide to fully support that. I would avoid making that diagram more complicated just for that.
One minor comment!
- 34.106.60.246

Depending on your data residency location, you may need to allowlist the following IP addresses to enable access to Airbyte:

### United States and Airbyte Default
I wonder if we should be explicit about saying that the US/Default IP addresses are GCP us-west-2 and the EU IP addresses are both GCP us-west-2 and AWS eu-west-3 (Paris).
wdyt @davinchia and @malikdiarra?
agree. I think it only helps us to be more specific with the info.
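To make the allowlisting idea in this hunk concrete, here is a minimal sketch of checking an incoming address against a per-residency list. Only `34.106.60.246` comes from the diff above; the grouping key `example-residency`, the name `ALLOWLIST_BY_RESIDENCY`, and the function `is_allowlisted` are hypothetical and are not part of Airbyte. The real lists per residency live in the docs page under review.

```python
import ipaddress
from typing import Dict, List

# Hypothetical grouping: only 34.106.60.246 appears in this diff, and which
# residency it belongs to is an assumption made for illustration.
ALLOWLIST_BY_RESIDENCY: Dict[str, List[str]] = {
    "example-residency": ["34.106.60.246"],
}


def is_allowlisted(source_ip: str, residency: str) -> bool:
    """Return True if source_ip is in the list configured for residency."""
    allowed = {
        ipaddress.ip_address(ip)
        for ip in ALLOWLIST_BY_RESIDENCY.get(residency, [])
    }
    return ipaddress.ip_address(source_ip) in allowed


listed = is_allowlisted("34.106.60.246", "example-residency")
# 203.0.113.10 is from a range reserved for documentation examples.
unlisted = is_allowlisted("203.0.113.10", "example-residency")
unknown_residency = is_allowlisted("34.106.60.246", "no-such-residency")
```

Being explicit about which cloud and region each list belongs to, as suggested above, would let readers fill in a table like this without guessing.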
docs/cloud/managing-airbyte-cloud.md
Outdated
### Choose your default data residency

Default data residency allows you to choose where your data is processed. When you set the default data residency, it applies that data residency to all new connections, but it will not affect existing connections.

For individual connections, you can choose a data residency that is different from the default. You can do this in the [connection settings](#choose-the-data-residency-for-a-connection) or when you create a [new connection](https://docs.airbyte.com/cloud/getting-started-with-airbyte-cloud#set-up-a-connection). Your data will not leave the chosen data residency for your connection.
I think this is an area where we might want to talk about the control plane vs. the data plane. Because this setting doesn't necessarily mean where your data is stored, but it's about where your data is processed.
that's a good point - we don't highlight the difference between the control/data plane today i.e. while data is processed in the specific region, configuration information is still stored in the US.
this means that users need to be careful to make sure they don't choose a sensitive cursor.
Andy, do you think this is worth calling out? Or is it too 'scary' to do so?
I think it's worth mentioning!
I agree that the fact that data is processed in one region and stored in another needs to be mentioned. I'm wondering if I should delete the last line "Your data will not leave the chosen data residency for your connection" and add a note like "Your data is processed in the chosen residency, but some data is still stored in the US." We could also list out types of data (like configuration information) we store in the US if that would help the users.
Any thoughts on that?
> should delete the last line "Your data will not leave the chosen data residency for your connection" and add a note like "Your data is processed in the chosen residency, but some data is still stored in the US."
I like that. I would like the note to be a markdown note so it jumps out.
One tactical point, we don't have to wordsmith too much in this PR. We can merge this in for now and iterate on subsequent PRs.
@andyjih @davinchia Do you think that a note like this would have enough info about data plane and control plane? Is there any other info you think the users will need to know? (I'll edit wording after we know what info we want to include):
Your data is processed on a data plane in the chosen residency, but some data, like data associated with the sync mode, cursor, and primary key, is still stored in the US on our control plane.
That's pretty good. I would also add: "Because of this, please do not use columns whose data must stay within the chosen residency as cursors or primary keys."
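As an illustration of the warning proposed above, here is a minimal sketch of a pre-flight check that flags cursor or primary-key columns a user has marked as needing to stay in-region. The function name `find_residency_conflicts` and the column names are hypothetical; nothing like this is confirmed to exist in Airbyte.

```python
from typing import Iterable, List, Set


def find_residency_conflicts(
    cursor_field: Iterable[str],
    primary_key: Iterable[str],
    restricted_columns: Set[str],
) -> List[str]:
    """Return cursor/primary-key columns that are marked as restricted.

    Cursor and primary-key configuration is stored on the control plane,
    so a restricted column used there would leave the chosen residency.
    """
    conflicts: List[str] = []
    for column in list(cursor_field) + list(primary_key):
        if column in restricted_columns and column not in conflicts:
            conflicts.append(column)
    return conflicts


# Hypothetical stream configuration.
restricted = {"national_id", "email"}
conflicts = find_residency_conflicts(["updated_at"], ["national_id"], restricted)
no_conflicts = find_residency_conflicts(["updated_at"], ["order_id"], restricted)
```

In this example only `national_id` is flagged, because it is both restricted and used as a primary key.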
Do you think we should talk about the control plane/data plane split on this multi-cloud doc? I agree that it's Cloud-specific, but I do think it's important for people to know that some data is still stored in the control plane and where the data is actually processed is in the data plane.
Looks good to me! Made a couple minor comments.
2. Click the **Settings** tab.

3. Click the **Data residency** dropdown and choose the location for your default data residency.
Should we add a note here about whether changing a connection's data residency will affect currently running syncs? @timroes mentioned this when reviewing the UI PR. I don't think currently running syncs will be affected, but I'm not 100% sure. Maybe @davinchia has an idea?
Thanks for reminding me Joey - yeah updating the workspace doesn't update existing connections so we should call that out too.
@sophia-wiley one more follow up change!
Okay so just to make sure I understand this correctly -- If you choose a different data residency in the connection settings while a sync is running, it will not affect the currently running sync, but it will apply the new data residency to future syncs for that connection?
- Data residency changes on a connection will only affect the next sync, and not any currently running sync.
- Changing the workspace default residency only affects newly created connections. Already created connections have to be manually edited.
Does ^ make the behaviour clearer?
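The two rules above can be sketched as a toy model. This only restates the behaviour described in this thread; the classes `Workspace` and `Connection`, their fields, and the residency values `"us"` and `"eu"` are hypothetical and do not reflect Airbyte's actual implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Connection:
    name: str
    residency: str
    running_sync_residency: Optional[str] = None

    def start_sync(self) -> None:
        # A sync captures the residency in effect at the moment it starts.
        self.running_sync_residency = self.residency

    def finish_sync(self) -> None:
        self.running_sync_residency = None


@dataclass
class Workspace:
    default_residency: str
    connections: List[Connection] = field(default_factory=list)

    def create_connection(self, name: str, residency: Optional[str] = None) -> Connection:
        # New connections take the workspace default unless one is given.
        conn = Connection(name, residency or self.default_residency)
        self.connections.append(conn)
        return conn


ws = Workspace(default_residency="us")
existing = ws.create_connection("existing")

# Rule 2: changing the default only affects connections created afterwards.
ws.default_residency = "eu"
created_later = ws.create_connection("created-later")
existing_after_default_change = existing.residency

# Rule 1: editing a connection mid-sync only affects the next sync.
existing.start_sync()
existing.residency = "eu"
residency_of_running_sync = existing.running_sync_residency
existing.finish_sync()
existing.start_sync()
residency_of_next_sync = existing.running_sync_residency
```

Here the already created connection stays on `"us"` after the default changes, and its in-flight sync keeps `"us"` until the next sync picks up `"eu"`.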
:::note

You can also choose data residency when creating a [new connection](https://docs.airbyte.com/cloud/getting-started-with-airbyte-cloud#set-up-a-connection), or you can set the [default data residency](#choose-your-default-data-residency) for all of your connections.
You can also choose data residency when creating a [new connection](https://docs.airbyte.com/cloud/getting-started-with-airbyte-cloud#set-up-a-connection), or you can set the [default data residency](#choose-your-default-data-residency) for all of your connections.
You can also choose data residency when creating a [new connection](https://docs.airbyte.com/cloud/getting-started-with-airbyte-cloud#set-up-a-connection), or you can set the [default data residency](#choose-your-default-data-residency) for your workspace.
Just a suggestion: this could be interpreted such that changing your default data residency affects all of your connections, which it does not.
Good catch! I'll edit this in another PR
Users are able to:
The documentation in this PR includes: