feat(new source): add initial websocket source #17856

Closed
wants to merge 7 commits into from


torrefatto

@torrefatto torrefatto commented Jul 4, 2023

Hi, and thanks for this nice software!

At Koyeb we are interested in using Vector to allow our users to forward the logs of their applications to external destinations. Our API for receiving logs is exposed over a websocket. Our plan is to add support for a generic websocket source and then, in a follow-up PR, add a custom koyeb source to make it easier for our users to configure Vector to work with our API, in the same spirit as the heroku_logs source. Do you think this is an acceptable plan?

I noticed that vector has an open issue (#6491) to track the addition of a websocket source, and some preliminary (although rough) work was done in this closed PR.
I did not start from there, as the fork it was based on was quite old. Instead, I looked at other existing sources and drew from them.
That said, Rust is not my primary language and I'd really like some guidance to improve the code currently submitted. I know that one thing missing is tests (unit and integration). Can you point me to some code that you consider a good example for these? What else is missing?

Let me know your thoughts. And thanks again!

@torrefatto torrefatto requested a review from a team July 4, 2023 17:44
@torrefatto torrefatto requested a review from neuronull as a code owner July 4, 2023 17:44
@torrefatto torrefatto requested a review from a team July 4, 2023 17:44
@bits-bot

bits-bot commented Jul 4, 2023

CLA assistant check
All committers have signed the CLA.

@netlify

netlify bot commented Jul 4, 2023

Deploy Preview for vrl-playground canceled.

Name Link
🔨 Latest commit cfbd1cb
🔍 Latest deploy log https://app.netlify.com/sites/vrl-playground/deploys/64a45a82b037cd0008f266f1

@netlify

netlify bot commented Jul 4, 2023

Deploy Preview for vector-project canceled.

Name Link
🔨 Latest commit cfbd1cb
🔍 Latest deploy log https://app.netlify.com/sites/vector-project/deploys/64a45a82bb34e80008db22be

@github-actions github-actions bot added domain: sources Anything related to the Vector's sources domain: sinks Anything related to the Vector's sinks labels Jul 4, 2023
@dsmith3197 dsmith3197 added the source: new A request for a new source label Jul 5, 2023
@neuronull
Contributor

Hi @torrefatto, thanks for your proposed new integration!

Just an FYI: per https://github.com/vectordotdev/vector/blob/master/CONTRIBUTING.md#new-sources-sinks-and-transforms, we will begin by going through that checklist before reviewing the code. No need to write up answers to the checklist questions at this stage; I will ask about anything relevant in this PR's comment thread.

Our plan is to add support for a generic websocket source and then, in a follow-up PR, add a custom koyeb source to make it easier for our users to configure Vector to work with our API, in the same spirit as the heroku_logs source. Do you think this is an acceptable plan?

Where possible, we like to avoid adding integrations that are specific to services (though there have obviously been cases of that historically). I would ask: which elements of the koyeb source's configuration, in particular, would be unique? What would it provide beyond just using the websocket source?

Relatedly, we will hopefully someday soon have a plugin system whereby some integrations could be maintained outside of the main Vector repo. That could be one option for a future koyeb source.

That said, Rust is not my primary language

Nice work!

I know that one thing missing is tests (unit and integration). Can you point me to some code that you consider a good example for these? What else is missing?

I will help out with this once the aforementioned checklist is complete.

Regarding our checklist:

  • (Just considering the current PR for websocket source) Would you/your company be willing to commit to supporting this integration after it has been included?

We may reach out with follow-up queries in the coming days.

@neuronull neuronull self-assigned this Jul 5, 2023
@torrefatto
Author

Hi @neuronull!

Just an FYI: per https://github.com/vectordotdev/vector/blob/master/CONTRIBUTING.md#new-sources-sinks-and-transforms, we will begin by going through that checklist before reviewing the code. No need to write up answers to the checklist questions at this stage; I will ask about anything relevant in this PR's comment thread.

Sorry! I totally overlooked that one.

Our plan is to add support for a generic websocket source and then, in a follow-up PR, add a custom koyeb source to make it easier for our users to configure Vector to work with our API, in the same spirit as the heroku_logs source. Do you think this is an acceptable plan?

Where possible, we like to avoid adding integrations that are specific to services (though there have obviously been cases of that historically). I would ask: which elements of the koyeb source's configuration, in particular, would be unique? What would it provide beyond just using the websocket source?

The protocol is the same (wss:). The point is that we would like to combine this component with functionality that retrieves the parameters needed to properly tail the logs. We index our log sources with an opaque ID, and the API requires specifying that ID, but we have another API to retrieve the ID from a human-readable name. This would address the concern in the checklist you linked above:

If the integration can be served with a workaround or more generic component, how painful is this for users?

We think it could be pretty painful. Of course, we would provide a suitable free-of-charge robot account to run integration tests, if the second koyeb source gets accepted.

Relatedly, we will hopefully someday soon have a plugin system whereby some integrations could be maintained outside of the main Vector repo. That could be one option for a future koyeb source.

That would be awesome! Is there a roadmap or plan in which the team commits to implementing such a plugin system?

I know that one thing missing is tests (unit and integration). Can you point me to some code that you consider a good example for these? What else is missing?

I will help out with this once the aforementioned checklist is complete.

Thanks! And again, sorry for missing it.

Regarding our checklist:

  • (Just considering the current PR for websocket source) Would you/your company be willing to commit to supporting this integration after it has been included?

I can definitely commit to maintaining the websocket source, independently of my work engagement with Koyeb. I think (though I have to consult my employer) that Koyeb as a company could also be willing to commit to maintaining this source.

We may reach out with follow-up queries in the coming days.

I am looking forward to them!

Thanks again!

@neuronull neuronull changed the title enhancement(websocket source): Websocket source feat(new source): add initial websocket source Jul 6, 2023
@neuronull
Contributor

Sorry! I totally overlooked that one.

No worries! There is no realistic way to make sure everyone is aware of it.

The protocol is the same (wss:). The point is that we would like to combine this component with functionality that retrieves the parameters needed to properly tail the logs. We index our log sources with an opaque ID, and the API requires specifying that ID, but we have another API to retrieve the ID from a human-readable name. This would address the concern in the checklist you linked above:

If the integration can be served with a workaround or more generic component, how painful is this for users?

We think it could be pretty painful.

Just hypothesizing here to make sure I understand the pain: could the same thing be accomplished with some kind of shell script that queries that other API to retrieve the opaque ID and generates the Vector configuration for the websocket source with it (it could use vector generate)? Just brainstorming on the alternatives. I will also raise this internally with the rest of the Vector team.
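To make that alternative concrete, here is a minimal Rust sketch of what such a generator could do once the name-to-ID lookup has happened. The in-memory lookup table, the `deployment_id` query parameter, the `wss://logs.example.com/tail` URL and the helper names are all hypothetical; only the `uri` option mirrors the `config.uri` field used in this PR's source code.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the "other API" that maps a human-readable
/// name to the opaque ID the tail endpoint requires.
pub fn resolve_id<'a>(directory: &HashMap<&str, &'a str>, name: &str) -> Option<&'a str> {
    directory.get(name).copied()
}

/// Render a config snippet for the proposed `websocket` source.
/// The `deployment_id` query parameter is an assumption for illustration.
pub fn render_websocket_source(source_name: &str, base_url: &str, opaque_id: &str) -> String {
    format!(
        "[sources.{source_name}]\ntype = \"websocket\"\nuri = \"{base_url}?deployment_id={opaque_id}\"\n"
    )
}
```

Note that the ID is resolved once, when the config is generated.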

Of course, we would provide an adequate free-of-charge robot account to perform integration tests, if the second koyeb source gets accepted.

Having a means to integration-test the prospective koyeb source would definitely be a requirement if we proceeded with it, so it's great that you're thinking of it already.

Relatedly, we will hopefully someday soon have a plugin system whereby some integrations could be maintained outside of the main Vector repo. That could be one option for a future koyeb source.

That would be awesome! Is there a roadmap or plan in which the team commits to implementing such a plugin system?

This has been an aspiration for a while now. It's not tracked on a public roadmap, but it is definitely something we want to do; it's mostly a question of how soon. I can say we are not planning on it in the next 3 months.

I can definitely commit to maintaining the websocket source, independently of my work engagement with Koyeb. I think (though I have to consult my employer) that Koyeb as a company could also be willing to commit to maintaining this source.

Super! This helps.

@neuronull
Contributor

Hi @torrefatto!

We had an internal discussion on this and wanted to lay out some more topics to consider:

  1. We are curious, at a high level, about the general use case for Vector in a customer's pipeline with the Koyeb product; if you could describe that in further detail, it would be helpful. Do you have users (or is it just in anticipation) who want to forward data to other systems? It seems like there are a couple of limitations with the TailLogs API and Vector. Firstly, correct me if I'm wrong, but it looks like this solution would only be able to pull live data from when it starts, and not any historic data. Would that be sufficient for your users? Secondly, it seems that in a single user's pipeline, they would only be able to use a single Vector instance to handle the entire volume of log events. Is that a concern?

  2. Regarding a koyeb source: our stance is that if we did not accept a koyeb source, the websocket source's viability would be somewhat diminished (as there has not been much demand otherwise for a websocket source). This means the two likely scenarios would be either just the websocket source (if there was confidence your users would be able to use it), or both sources. For Vector it would be ideal to have only the websocket source, so we would want to make sure there are definite reasons why Koyeb users could not properly use it alone. Essentially, we'd need to come to a conclusion on this point before proceeding.

Thanks!

@neuronull neuronull added the meta: awaiting author Pull requests that are awaiting their author. label Jul 18, 2023
@torrefatto
Author

Hi @torrefatto !

Hi! Sorry for taking so long to reply. The heat wave hit hard 🥵

We had an internal discussion on this and wanted to lay out some more topics to consider:

We are curious, at a high level, about the general use case for Vector in a customer's pipeline with the Koyeb product; if you could describe that in further detail, it would be helpful. Do you have users (or is it just in anticipation) who want to forward data to other systems?

We run a serverless platform: we basically let people run containers on our servers. We provide a lightweight way for them to retrieve their logs via our control panel, but we do not have the capacity to index them on Koyeb or to provide a great user interface for querying them. That is why we want to allow users to forward their logs to more specialized third parties (e.g. Datadog, Splunk, Elasticsearch).
We have had users ask us for the ability to forward their logs to external systems! This is a real business case for us.

It seems like there are a couple of limitations with the TailLogs API and Vector: Firstly, correct me if I'm wrong, but it looks like this solution would only be able to pull live data from when it starts, and not any historic data. Would that be sufficient for your users?

Our systems keep a backlog of all the data, and our API exposes a start parameter that lets the caller specify the point in time from which to begin tailing. If no start value is specified, our systems reply with the last 1000 entries and then begin streaming. This is one of the downsides of a pure websocket source for us: every time Vector restarts, it pulls a possibly overlapping set of entries.

Secondly, it seems that in a single user's pipeline, they would only be able to use a single Vector instance to handle the entire volume of log events. Is that a concern?

This touches on another reason why we would like to include a koyeb source alongside a pure websocket one. The workloads our users deploy run in single instances (Firecracker microVMs on our workers); we call each of these a deployment. Different replicas of the same deployment form a service. Different services are grouped into an app. Finally, a user might be part of different accounts. We would like to allow the user to either:

  • Deploy one Vector instance at any level of the hierarchy they want
  • Choose to forward an entire account's logs with Vector (either distributing the load across more than one Vector instance or using a single instance; this is still up for discussion)

This would really require something more elaborate than the websocket source, because with the websocket source alone the burden would be on the user to retrieve the right identifier for each deployment/service/app, and those identifiers would not be updated dynamically.

Regarding a koyeb source: our stance is that if we did not accept a koyeb source, the websocket source's viability would be somewhat diminished (as there has not been much demand otherwise for a websocket source). This means the two likely scenarios would be either just the websocket source (if there was confidence your users would be able to use it), or both sources. For Vector it would be ideal to have only the websocket source, so we would want to make sure there are definite reasons why Koyeb users could not properly use it alone. Essentially, we'd need to come to a conclusion on this point before proceeding.

I get your point, but I might also add that our API is really not much more than a proxy for Loki, which we use internally. You might consider that this websocket source, together with a koyeb source, would enable a Loki source. We might be willing to contribute to that as well.

Again, sorry for the late reply. Let me know what you think of the picture I outlined.

Thanks again!

@neuronull
Contributor

Hey @torrefatto ! Thanks for providing all those details, that really helps us frame it, and also helps us understand better the value of a koyeb source.

I might also add that our API is really not much more than a proxy for Loki, which we use internally. You might consider that this websocket source, together with a koyeb source, would enable a Loki source. We might be willing to contribute to that as well.

This is an interesting development. We have had solid demand for a Loki source from the community (#6873). So is your API essentially using Loki behind the scenes?

Would that mean the koyeb source would essentially be a wrapper on top of a loki and websocket source?

In the meantime, I will share these new details with the team. Thanks!

@neuronull
Contributor

One other thing to follow up on:

I can definitely commit to maintaining the websocket source, independently of my work engagement with Koyeb. I think (though I have to consult my employer) that Koyeb as a company could also be willing to commit to maintaining this source.

Curious whether there has been any traction on a commitment at the company level to maintain this/these sources?

@torrefatto
Author

torrefatto commented Jul 27, 2023

Hi @neuronull!

Would that mean the koyeb source would essentially be a wrapper on top of a loki and websocket source?

Exactly!

One other thing to follow up on:

I can definitely commit to maintaining the websocket source, independently of my work engagement with Koyeb. I think (though I have to consult my employer) that Koyeb as a company could also be willing to commit to maintaining this source.

Curious whether there has been any traction on a commitment at the company level to maintain this/these sources?

I talked with @bchatelard and he confirmed that @koyeb is willing to commit to maintaining these sources, should they be accepted upstream 💪

@jszwedko jszwedko mentioned this pull request Aug 1, 2023
@neuronull
Contributor

Hi @torrefatto, I wanted to give an update: we are still finalizing input from stakeholders, but we're pretty confident that we would accept this new source and the following ones. 🎉

I'll be taking a look at your code in this PR for some initial feedback.

@torrefatto
Author

That's awesome @neuronull!

I see that I have a conflict. Would you like me to rebase or merge from master?

@jszwedko
Member

jszwedko commented Aug 2, 2023

That's awesome @neuronull!

I see that I have a conflict. Would you like me to rebase or merge from master?

Merging is preferred to keep the commit history and make reviews easier (reviewers can just review new changes). When the PR merges it'll be squashed down to one commit.

@neuronull
Contributor

In the same vein, avoiding force-pushing is greatly appreciated 🙏

Contributor

@neuronull neuronull left a comment


Thanks for extracting shared code with the sink! Overall this looks pretty good. I did a first pass, and while it may look like I made a lot of suggestions, they aren't massive changes.

You mentioned unit tests in your comment, and yes, those will be needed. You can probably do something reciprocal to the websocket sink's tests. We have some assert_source_compliance test helpers you can grep for: if you run the source with this wrapper, it validates that the correct internal telemetry is emitted, as required by the component specification.
Beyond that, it's ideal to cover the happy path and as many error paths as is reasonable.

I think this component is OK without an integration test.

@@ -30,6 +30,7 @@ pub mod unix;
mod unix_datagram;
#[cfg(all(unix, feature = "sources-utils-net-unix"))]
mod unix_stream;
pub mod websocket;
Contributor


💬 suggestion: We'll want to gate this on a new feature defined in Cargo.toml, sources-websocket.
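For readers less familiar with Cargo features, a minimal sketch of the mechanism this suggestion relies on: a module behind `#[cfg(feature = ...)]` is only compiled when the feature is enabled, and `cfg!` exposes the same condition as a compile-time bool. The `[features]` entry shown in the comment and the `websocket_source_enabled` helper are illustrative, not Vector's actual Cargo.toml.

```rust
// Cargo.toml would declare the feature, for example:
//   [features]
//   sources-websocket = []
//
// Without that feature enabled, this module is not compiled at all.
#[cfg(feature = "sources-websocket")]
pub mod websocket {
    pub fn component_name() -> &'static str {
        "websocket"
    }
}

/// `cfg!` evaluates the same condition to a bool at compile time.
pub fn websocket_source_enabled() -> bool {
    cfg!(feature = "sources-websocket")
}
```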

@@ -17,3 +17,8 @@ pub(crate) mod sqs;

#[cfg(any(feature = "sources-aws_s3", feature = "sinks-aws_s3"))]
pub(crate) mod s3;

pub mod websocket;
Contributor


Suggested change
pub mod websocket;
#[cfg(any(feature = "sources-websocket", feature = "sinks-websocket"))]
pub mod websocket;

pub mod websocket;

pub(crate) mod backoff;
pub(crate) mod ping;
Contributor


💬 suggestion: This one should arguably be collapsed into the websocket module here. I understand the reasoning for making it available, since it's generic enough, but that's easy enough to do later if we need it; otherwise it's just lost compilation time for any config not using a websocket component.

let maybe_tls = self.tls_connect().await?;

let ws_config = WebSocketConfig {
max_send_queue: None, // don't buffer messages
Contributor


💭 thought: If I recall correctly, this will be the merge conflict to resolve.

@@ -89,3 +89,24 @@ impl InternalEvent for WsConnectionError {
Some("WsConnectionError")
}
}

pub struct WsMessageReceived {
Contributor


💬 suggestion: This should adhere more closely to the component spec (https://github.com/vectordotdev/vector/blob/master/docs/specs/component.md): the count and byte_size properties are missing.

Additionally, the source should emit https://github.com/vectordotdev/vector/blob/master/docs/specs/component.md#componentbytesreceived, with the protocol set to websocket.

That might make sense as a separate event because (this is a separate topic, not directly related to your changes) I think the websocket sink might not be emitting EventsReceived, in which case it could use this WsEventReceived. But the sink uses the run_and_assert_compliance_ unit test helper, which should be validating that 🤔 hmm, so that might not be valid.

See HttpEventsReceived and HttpBytesReceived for reference.
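A minimal Rust sketch of the two events this suggestion asks for, with an in-memory counter map standing in for Vector's metrics layer (the real `InternalEvent` trait and metrics macros are not reproduced). The metric names follow the component spec; the `WsEventsReceived`/`WsBytesReceived`/`Telemetry` names and folding the `protocol` tag into the counter key are assumptions of the sketch.

```rust
use std::collections::HashMap;

/// In-memory counter registry standing in for Vector's metrics layer.
#[derive(Debug, Default)]
pub struct Telemetry {
    counters: HashMap<String, u64>,
}

impl Telemetry {
    pub fn increment(&mut self, key: &str, by: u64) {
        *self.counters.entry(key.to_string()).or_insert(0) += by;
    }

    pub fn get(&self, key: &str) -> u64 {
        self.counters.get(key).copied().unwrap_or(0)
    }
}

/// Hypothetical event carrying the `count` and `byte_size` the spec requires.
pub struct WsEventsReceived {
    pub count: usize,
    pub byte_size: usize,
}

impl WsEventsReceived {
    pub fn emit(self, telemetry: &mut Telemetry) {
        telemetry.increment("component_received_events_total", self.count as u64);
        telemetry.increment("component_received_event_bytes_total", self.byte_size as u64);
    }
}

/// Hypothetical raw-bytes event; the spec tags it with the transport protocol.
pub struct WsBytesReceived<'a> {
    pub byte_size: usize,
    pub protocol: &'a str,
}

impl WsBytesReceived<'_> {
    pub fn emit(self, telemetry: &mut Telemetry) {
        // The `protocol` tag is folded into the key to keep the sketch simple.
        let key = format!("component_received_bytes_total{{protocol={}}}", self.protocol);
        telemetry.increment(&key, self.byte_size as u64);
    }
}
```

Keeping the two events separate matches the split between HttpEventsReceived and HttpBytesReceived.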

Comment on lines +164 to +167
}).map_err(|err| {error!("Failed to process binary message: {}", err);}) {
Ok(_) => Ok(()),
Err(e) => {
error!("Failed to send binary message: {:?}", e);
Contributor


💬 suggestion: I think these error cases should have an internal event structure that is emitted and increments component_errors_total (https://github.com/vectordotdev/vector/blob/master/docs/specs/instrumentation.md#Error).
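A minimal Rust sketch of such an error event, using a plain in-memory counter instead of Vector's metrics macros. It carries `error_type` and `stage` fields as tags on the error; the concrete values used in the usage example (`parser_failed`, `receiving`) and the `WsMessageError`/`ErrorMetrics` names are illustrative assumptions.

```rust
/// In-memory stand-in for the `component_errors_total` counter.
#[derive(Debug, Default)]
pub struct ErrorMetrics {
    pub component_errors_total: u64,
}

/// Hypothetical error event for a websocket message that could not be processed.
pub struct WsMessageError<'a> {
    pub error: &'a str,
    pub error_type: &'static str,
    pub stage: &'static str,
}

impl WsMessageError<'_> {
    /// Increment the error counter and return the log line that would be emitted.
    pub fn emit(self, metrics: &mut ErrorMetrics) -> String {
        metrics.component_errors_total += 1;
        format!(
            "Failed to process websocket message. error={} error_type={} stage={}",
            self.error, self.error_type, self.stage
        )
    }
}
```

For example, `WsMessageError { error: "invalid utf-8", error_type: "parser_failed", stage: "receiving" }.emit(&mut metrics)` would bump the counter once and produce a single structured log line, replacing the ad hoc `error!` calls.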

Comment on lines +160 to +169
}).map(|evt| async {
handle_text_message(&mut out.clone(), evt, config.uri.clone()).await
}).ok_or(())?.await;
Ok::<(), ()>(())
}).map_err(|err| {error!("Failed to process binary message: {}", err);}) {
Ok(_) => Ok(()),
Err(e) => {
error!("Failed to send binary message: {:?}", e);
Ok(())
}
Contributor


🥜 nitpick: This is a bit ugly from a readability perspective. Extracting the Message::Binary case into a helper function might help a little, but I can't help wondering whether there is a cleaner way to write this.
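One possible shape for that helper, sketched under the assumption that binary payloads are expected to hold UTF-8 text (the snippet above hands the result to handle_text_message): decoding returns a typed `Result`, so the match arm shrinks to one call plus one error branch. `decode_binary_message` and `BinaryMessageError` are hypothetical names.

```rust
use std::fmt;

/// Typed error for a binary frame whose payload is not valid UTF-8.
#[derive(Debug, PartialEq)]
pub struct BinaryMessageError {
    pub valid_up_to: usize,
}

impl fmt::Display for BinaryMessageError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "binary message is not valid UTF-8 (valid up to byte {})", self.valid_up_to)
    }
}

impl std::error::Error for BinaryMessageError {}

/// Convert a binary payload into text so it can follow the text-message path.
pub fn decode_binary_message(data: Vec<u8>) -> Result<String, BinaryMessageError> {
    String::from_utf8(data).map_err(|e| BinaryMessageError {
        valid_up_to: e.utf8_error().valid_up_to(),
    })
}
```

In the read loop, the arm could then become a single `match decode_binary_message(data)` with `Ok(text)` forwarded like a text message and `Err(e)` emitted as an error event.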

Comment on lines +178 to +181
Ok(Message::Frame(_)) => {
warn!("Unsupported message type received: frame");
Ok(())
},
Contributor


Do you know what scenario(s) this might occur in?

},

Ok(Message::Close(_)) => {
info!("Received message: connection closed from server");
Contributor


💭 thought: Since we are emitting WsConnectionShutdown below, which logs a warn, I think we should remove this one. WDYT?
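A minimal Rust sketch of the dispatch with the `Close` arm reduced to a single shutdown signal, so that only the `WsConnectionShutdown` warning would be logged. The enum is a simplified local mirror of the message variants matched in this PR, not the real tungstenite type, and `classify`/`Action` are hypothetical names. For what it's worth, tungstenite documents its `Frame` variant as one you are not going to get while reading messages, which may bear on the earlier question about when it occurs.

```rust
/// Simplified local mirror of the message variants matched in the PR.
#[derive(Debug)]
pub enum WsMessage {
    Text(String),
    Binary(Vec<u8>),
    Ping(Vec<u8>),
    Pong(Vec<u8>),
    Close,
    Frame,
}

/// What the read loop should do with a message.
#[derive(Debug, PartialEq)]
pub enum Action {
    Forward(String),
    Decode(Vec<u8>),
    Ignore,
    Shutdown,
    Unexpected(&'static str),
}

pub fn classify(message: WsMessage) -> Action {
    match message {
        WsMessage::Text(text) => Action::Forward(text),
        WsMessage::Binary(data) => Action::Decode(data),
        // Control frames carry no log data; replies are left to the WebSocket library.
        WsMessage::Ping(_) | WsMessage::Pong(_) => Action::Ignore,
        // The caller emits WsConnectionShutdown once; no separate info! log here.
        WsMessage::Close => Action::Shutdown,
        // Not expected while reading; surfaced so the caller can warn.
        WsMessage::Frame => Action::Unexpected("frame"),
    }
}
```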

async fn handle_text_message<'a>(
out: &mut SourceSender,
msg: WebSocketEvent<'a>,
endpoint: String,
Contributor

@neuronull neuronull Aug 2, 2023


💬 suggestion: I think the endpoint can be passed in as a reference to avoid the clone, and WsMessageReceived can specify the lifetime like:

pub uri: &'a str,
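A minimal Rust sketch of that suggestion: the event borrows the endpoint for its own (shorter) lifetime instead of owning a cloned `String`. The `count` and `byte_size` fields follow the earlier component-spec comment; `record_text_message` is a hypothetical, synchronous simplification of the PR's async `handle_text_message`.

```rust
/// Event that borrows the endpoint rather than owning a clone.
pub struct WsMessageReceived<'a> {
    pub uri: &'a str,
    pub count: usize,
    pub byte_size: usize,
}

impl WsMessageReceived<'_> {
    pub fn describe(&self) -> String {
        format!("Received {} event(s), {} bytes, from {}", self.count, self.byte_size, self.uri)
    }
}

/// Takes `&str`; the event only lives inside this call, so no clone is needed.
pub fn record_text_message(endpoint: &str, text: &str) -> String {
    let event = WsMessageReceived {
        uri: endpoint,
        count: 1,
        byte_size: text.len(),
    };
    event.describe()
}
```

At the call site this would mean passing `&config.uri` rather than `config.uri.clone()`.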

@torrefatto
Author

Hi! I just wanted to say that I am working on the PR. Thanks for the useful comments, @neuronull!

@jszwedko don't worry, I won't force-push :pinky-promise:

@torrefatto
Author

Hi team! Sorry for being silent for so long. The summer took quite some time away 🌴 🍹
I resumed working on this, I expect to push new commits soon!

@OmarTraderXBT

Very excited about the generic websocket source, which would be amazing to have (especially if it handles reconnecting and such properly) <3
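Reconnect handling usually comes down to retrying with a growing delay. A generic, minimal Rust sketch of capped exponential backoff follows; it is not the `backoff` module referenced in the review above, the `Backoff` type is hypothetical, and a production version would likely add jitter.

```rust
use std::time::Duration;

/// Capped exponential backoff for reconnect attempts.
pub struct Backoff {
    initial: Duration,
    current: Duration,
    max: Duration,
}

impl Backoff {
    pub fn new(initial: Duration, max: Duration) -> Self {
        Self { initial, current: initial, max }
    }

    /// Delay to wait before the next reconnect attempt; doubles up to `max`.
    pub fn next_delay(&mut self) -> Duration {
        let delay = self.current;
        // checked_mul avoids a panic on overflow; fall back to the cap.
        self.current = self.current.checked_mul(2).unwrap_or(self.max).min(self.max);
        delay
    }

    /// Call after a successful (re)connection.
    pub fn reset(&mut self) {
        self.current = self.initial;
    }
}
```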

@yalinglee
Copy link
Contributor

Hi team! Sorry for being silent for so long. The summer took quite some time away 🌴 🍹 I resumed working on this, I expect to push new commits soon!

Hi @torrefatto! Wondering if you are still planning to work on this? We would also love to see this source added!

@torrefatto
Copy link
Author

Hi @yalinglee

Apologies for the long silence and thanks for reanimating this conversation.

I am still willing to work on this, but I am unfortunately not able to do so during working hours anymore (priorities changed at $DAYJOB).

I have to fit this into my scarce free time. The first thing I need to do is update this PR with the upstream changes that have happened so far.
Then I can return to applying the recommendations here :)

I will try to update you by the end of next week.

@yalinglee
Contributor

@torrefatto That's totally understandable! I was just curious about the status of this PR so no pressure! And really appreciate you using your precious free time to work on this!

@jszwedko jszwedko requested a review from a team as a code owner October 3, 2024 18:54
@pront
Member

pront commented Jan 27, 2025

Thank you for your contribution to Vector! To keep the repository tidy and focused, we are closing this PR due to inactivity. We greatly appreciate the time and effort you've put into this PR. If you'd like to continue working on it, we encourage you to re-open the PR and we would be delighted to review it again. Before re-opening, please use git merge origin master to resolve any conflicts with origin/master.

@pront pront closed this Jan 27, 2025
Labels
domain: sinks Anything related to the Vector's sinks domain: sources Anything related to the Vector's sources meta: awaiting author Pull requests that are awaiting their author. source: new A request for a new source
8 participants