init elasticsearch-logger.lua #7099

Closed
wants to merge 0 commits into from

Conversation

fanchangjifen

Description

I followed the clickhouse-logger implementation logic and partially adapted the data store for Elasticsearch. By default, index names are split by date (yyyy.MM.dd); DBAs can further optimize log storage by combining index templates.

@fanchangjifen
Author

Finally, I use the Elasticsearch RESTful API with Basic authorization, so it is suitable for the 5.x, 6.x, and 7.x versions.

@tzssangglass
Member

tzssangglass commented May 23, 2022

Hi @fanchangjifen , thanks for your contribution, but please let me know if we ever discussed this on the APISIX mailing list?

@fanchangjifen
Author

We haven't discussed it. Our main log storage and retrieval follows the ELK stack, so after seeing the ClickHouse implementation we introduced an Elasticsearch integration.
Of course, our current solution is not limited to interacting with ES directly.

Member

@spacewander left a comment

Could you add tests like the clickhouse-logger?
You can refer to https://github.com/apache/apisix/blob/master/docs/en/latest/internal/testing-framework.md about the test framework

@fanchangjifen
Author

for details:

[Kingsoft Docs] Plugin elasticsearch-logger test
https://kdocs.cn/l/cgTYj4imEMld

@@ -0,0 +1,182 @@
#
Member

Let's add elasticsearch-logger to

plugins: # plugin list (sorted by priority)
so that CI can run.

Author

It's added

--- response_body
opentracing
--- error_log
elasticsearch body:
Member

We need to check the uploaded data in the body.

Author

Actually, the data in the body is unused for this plugin. For clickhouse-logger it prints the SQL statement; for this plugin, maybe the path is valuable to check, and I do not know how to print it.

Member

Let's check the log of the request dumped by the dummy HTTP server.
Reference:

apisix/t/plugin/loggly.t

Lines 652 to 657 in e6153b6

--- response_body
expired link
--- grep_error_log eval
qr/message received: [ -~]+/
--- grep_error_log_out eval
qr/message received: <9>1 [\d\-T:.]+Z [\d.]+ apisix [\d]+ - \[tok\@41058 tag="apisix"] \{"route_id":"1"\}/

my $http_config = $block->http_config // <<_EOC_;
server {
listen 10420;
location /elasticsearch-logger/test {
Member

The URL should contain the index name?

Author

No, the index name is apisix-elasticsearch-logger, with yyyy.MM.dd appended automatically.

local url_decoded = url.parse(conf.endpoint_addr)
local host = url_decoded.host
local port = url_decoded.port
local index = conf.index .. "-" .. os.date('%Y.%m.%d')
Member

The conf.index can't be empty?

Author

It cannot be empty, so I can set a default value. In Elasticsearch, indexes are similar to tables in a relational DBMS.
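A hedged sketch of how such a default could be declared in the plugin's JSON-schema table (the default value apisix-elasticsearch-logger echoes the index name mentioned earlier; the other fields are illustrative assumptions, not this PR's schema):

local schema = {
    type = "object",
    properties = {
        endpoint_addr = { type = "string" },
        -- indexes play a role similar to tables in an RDBMS
        index = { type = "string", default = "apisix-elasticsearch-logger" },
        user = { type = "string" },
        password = { type = "string" },
    },
    required = { "endpoint_addr" },
}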

Member

Hey, I think it's an antipattern - i.e. creating indices [each day] for storing logs.
Cons:

  1. Creating a new index might require elevated privileges [we can't force users to give us some hefty permission on their ES cluster].
  2. Issues with data query and aggregation.

It's better to add a timestamp at the log message level: easy to query and aggregate logs, and it requires no extra privileges.
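A minimal sketch of that alternative, assuming entry is the log table the plugin already builds (the @timestamp field name follows the common Elastic convention and is an assumption here, not taken from this PR):

-- keep one fixed index and carry the time inside each document
-- instead of encoding it in the index name
local function add_timestamp(entry)
    -- ISO 8601 in UTC, e.g. "2022-05-23T08:00:00Z"
    entry["@timestamp"] = os.date("!%Y-%m-%dT%H:%M:%SZ")
    return entry
end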

end

local url_path = index .. "/_doc"
local user_pass = conf.user .. ":" .. conf.password
Member

Should bypass the authentication if conf.user is empty?

Author

It depends on whether Elasticsearch X-Pack is enabled. By default, X-Pack is disabled; for security it should be enabled. If conf.user is empty, the request cannot pass the Elasticsearch X-Pack authentication policy.
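A hedged sketch of how the header assembly could skip authentication when no user is configured (ngx.encode_base64 is the OpenResty helper; build_headers is a hypothetical name, not this PR's code):

local function build_headers(conf)
    local headers = { ["Content-Type"] = "application/json" }
    if conf.user and conf.user ~= "" then
        -- attach Basic auth only when credentials are configured,
        -- i.e. when the cluster enforces X-Pack security
        headers["Authorization"] = "Basic "
            .. ngx.encode_base64(conf.user .. ":" .. conf.password)
    end
    return headers
end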

@fanchangjifen
Author

In the production environment, we may not interact with the storage engine directly through its interface. In our experience, there may be some problems with connection pooling and performance. Therefore, we now use the date slicing of http-logger and use Filebeat to collect the data, with an ingest pipeline preprocessing and normalizing the data for persistence; that is the complete solution. However, this plugin might be an easy approach, like clickhouse-logger.

@spacewander
Member

I see. Let's wait for someone who is willing to try this plugin in a production environment.

@fanchangjifen
Author

I see. Let's wait for someone who is willing to try this plugin in a production environment.

I still want to know what I need to do next for this plugin to be accepted.

@membphis
Member

@fanchangjifen
Can you submit a doc about how to use this plugin?
I think this plugin is meaningful now.

It can help more users follow this PR.

@fanchangjifen
Author

for details:

[Kingsoft Docs] Plugin elasticsearch-logger test https://kdocs.cn/l/cgTYj4imEMld

@membphis Hey, there it is.

@spacewander
Member

@fanchangjifen
After discussing with @leslie-tsang and @tzssangglass, I am sorry to have to inform you that, given the cost of maintenance, we do not intend to move forward with this feature unless more users are willing to use it.

@fanchangjifen
Author

@fanchangjifen After discussing with @leslie-tsang and @tzssangglass, I am sorry to have to inform you that, given the cost of maintenance, we do not intend to move forward with this feature unless more users are willing to use it.

This is just an idea based on the clickhouse-logger plugin, and while we regret the final result, we will continue to follow the APISIX project and contribute our thoughts on SM cryptography and logging.

@spacewander
Member

@fanchangjifen
I discussed it again with @tzssangglass.
We can accept this plugin but with a requirement: we need to test against real ES.

We can run ES like https://github.com/elastic/elastic-github-actions/blob/master/elasticsearch/run-elasticsearch.sh in https://github.com/apache/apisix/blob/master/ci/pod/docker-compose.yml.

for i = 1, #entries do
core.table.insert(log_table, core.json.encode(entries[i]))
end
data = core.table.concat(log_table, " ") -- assemble multi items as string "{} {}"
Member

Just curious, are the space-separated JSONs going to be treated as separate events?

Else, just a suggestion - we can always use the elasticsearch bulk API (endpoint/index/_bulk). The indexing speed is much faster there.
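For reference, a hedged sketch of a _bulk payload builder, assuming core is apisix.core and entries is the batch shown above (the bulk format pairs an action line with a source line, newline-separated, and requires a trailing newline; build_bulk_body is a hypothetical helper name):

local function build_bulk_body(index, entries)
    local lines = {}
    for i = 1, #entries do
        -- action metadata line, then the document itself
        core.table.insert(lines, core.json.encode({ index = { _index = index } }))
        core.table.insert(lines, core.json.encode(entries[i]))
    end
    -- the bulk API requires a trailing newline
    return core.table.concat(lines, "\n") .. "\n"
end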

Member

In fact, I would recommend just using the API for batch processing with ES. ES should be more read-heavy and less write-heavy.

Author

Just curious, are the space-separated JSONs going to be treated as separate events?

Else, just a suggestion - we can always use the elasticsearch bulk API (endpoint/index/_bulk). The indexing speed is much faster there.

Yes, using the bulk API is a good choice. I will try to improve it.

Author

In fact, I would recommend just using the API for batch processing with ES. ES should be more read-heavy and less write-heavy.

I don't agree that ES should be more read-heavy and less write-heavy. In simple scenarios, we don't need to worry about reads and writes. In a complex scenario, we can use index lifecycle management to optimize. In fact, I think we should pay attention to its ecosystem, i.e. ELKB.

Member

tzssangglass commented Jun 12, 2022

  1. Experience shows that ES write performance is not high and that high-frequency writes create performance problems.
  2. For APISIX, writing data at high frequency will also have performance problems. APISIX provides a batch process feature, and almost all logging plugins follow this idea. I think combining the batch process of APISIX with the bulk API of ES is the more suitable approach, as sketched below.
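A hedged sketch of that combination, following the batch-processor pattern used by other APISIX logging plugins (apisix.utils.batch-processor-manager); build_bulk_body is the hypothetical helper sketched earlier and send_to_elasticsearch stands in for the plugin's HTTP call:

local bp_manager_mod = require("apisix.utils.batch-processor-manager")
local log_util = require("apisix.utils.log-util")

local batch_processor_manager = bp_manager_mod.new("elasticsearch-logger")
local _M = {}  -- the plugin module table

function _M.log(conf, ctx)
    local entry = log_util.get_full_log(ngx, conf)
    -- reuse the route's existing batch processor if one is running
    if batch_processor_manager:add_entry(conf, entry) then
        return
    end
    -- otherwise create one; each accumulated batch is flushed
    -- as a single _bulk request
    local process = function(entries)
        return send_to_elasticsearch(conf, build_bulk_body(conf.index, entries))
    end
    batch_processor_manager:add_entry_to_new_processor(conf, entry, ctx, process)
end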

@fanchangjifen
Author

The bulk API version has been submitted. In addition to separating data by newline characters, the ID of the logs is automatically generated by ES, or can be changed to the SkyWalking ID later.

Member

@bisakhmondal left a comment

Nice progress so far. Please run the linter once again. Thank you!

local url_decoded = url.parse(conf.endpoint_addr)
local host = url_decoded.host
local port = url_decoded.port
local index = conf.index .. "-" .. os.date('%Y.%m.%d')
Member

Hey, I think it's an antipattern - i.e. creating indices [each day] for storing logs.
Cons:

  1. Creating a new index might require elevated privileges [we can't force users to give us some hefty permission on their ES cluster].
  2. Issues with data query and aggregation.

It's better to add a timestamp at the log message level: easy to query and aggregate logs, and it requires no extra privileges.

--- error_log
elasticsearch body:
elasticsearch headers: authorization:Basic ZWxhc3RpYzplbGFzdGlj
--- wait: 5
Member

Not required I presume!

Author

For creating indices [each day] for storing logs, refer to the Beats component's log storage; I think it works well with the data time range.

Author

For creating indices [each day] for storing logs, refer to the Beats component's log storage; I think it works well with the data time range.

@bisakhmondal

Member

The index in Elasticsearch is unlike the one in an RDBMS. It is more like a partition or a log file.

AFAIK, the admin of Elasticsearch creates a template to define the pattern of indexes, and the client can create any index with a different suffix (usually a timestamp).

Elasticsearch will query each index in parallel. And logs in old indexes can be purged to save disk.

Author

The index in Elasticsearch is unlike the one in an RDBMS. It is more like a partition or a log file.

AFAIK, the admin of Elasticsearch creates a template to define the pattern of indexes, and the client can create any index with a different suffix (usually a timestamp).

Elasticsearch will query each index in parallel. And logs in old indexes can be purged to save disk.

You may be referring to index lifecycle policies.

Member

Yes

Member

Yeah, Ack!

--- response_body
opentracing
--- error_log
elasticsearch body:
Member

Let's check the log of the request dumped by the dummy HTTP server.
Reference:

apisix/t/plugin/loggly.t

Lines 652 to 657 in e6153b6

--- response_body
expired link
--- grep_error_log eval
qr/message received: [ -~]+/
--- grep_error_log_out eval
qr/message received: <9>1 [\d\-T:.]+Z [\d.]+ apisix [\d]+ - \[tok\@41058 tag="apisix"] \{"route_id":"1"\}/

@fanchangjifen
Author

It might be difficult for me to write test samples, and I don't know how to write them.

@spacewander
Member

It might be difficult for me to write test samples, and I don't know how to write them.

You can refer to https://github.com/apache/apisix/blob/master/docs/en/latest/internal/testing-framework.md and run the test locally.

Also, some maintainers think we need to run the test with real elasticsearch. You can run ES like https://github.com/elastic/elastic-github-actions/blob/master/elasticsearch/run-elasticsearch.sh in https://github.com/apache/apisix/blob/master/ci/pod/docker-compose.yml. Then change the mock server to proxy the test requests to the real elasticsearch.

@fanchangjifen
Author

fanchangjifen commented Jun 26, 2022

With reference to the official document (https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html#docker-file), I have submitted the related CI auxiliary files.
You may now be able to run the test with real Elasticsearch.
@spacewander What's next?

server {
listen 10420;
location /elasticsearch-logger/test {
content_by_lua_block {
Member

We need to proxy the request to the real elasticsearch.

Author

Oh, I'll try.

timeout: 10s
retries: 120

kibana:
Member

We don't need the kibana?

Author

We don't need the kibana?

I think you might need a visual tool.

Member

We don't need kibana in the CI.

@spacewander
Member

Let's merge master to make CI run.

@fanchangjifen
Author

Let's merge master to make CI run.
@spacewander Merge conflicts?

@spacewander
Member

Let's merge master to make CI run.
@spacewander Merge conflicts?

Let's merge master and solve the conflicts.

@ccxhwmy
Contributor

ccxhwmy commented Aug 10, 2022

Hi~
I noticed that you seem to have no time to deal with this PR, and I'm interested in it. Would you please let me continue to work on this issue? :)

@fanchangjifen
Author

Hi~ I noticed that you seem to have no time to deal with this PR, and I'm interested in it. Would you please let me continue to work on this issue? :)

@ccxhwmy I have no way to run the automated testing; I think you can build on what I've done.
I have read the PR you submitted, and I noticed that you didn't complete the X-Pack authentication logic, but I have finished it.

@spacewander @tzssangglass #7643 is for this issue. It's up to you.

@ccxhwmy
Contributor

ccxhwmy commented Aug 11, 2022

Hi~ I noticed that you seem to have no time to deal with this PR, and I'm interested in it. Would you please let me continue to work on this issue? :)

@ccxhwmy I have no way to run the automated testing; I think you can build on what I've done. I have read the PR you submitted, and I noticed that you didn't complete the X-Pack authentication logic, but I have finished it.

I will refer to what you have finished. Thank you very much!

@tzssangglass
Member

@spacewander @tzssangglass #7643 is for this issue. It's up to you.

Thanks for your contribution, let's do it.

@fanchangjifen
Author

@spacewander @tzssangglass #7643 is for this issue. It's up to you.

Thanks for your contribution, let's do it.

@ccxhwmy As mentioned above, you should use the elasticsearch bulk API (endpoint/index/_bulk), but I noticed you didn't.
@tzssangglass Why not recommend testing on my basis, or using my plugin directly?

@tzssangglass
Member

tzssangglass commented Aug 12, 2022

As mentioned above, you should use the elasticsearch bulk API (endpoint/index/_bulk), but I noticed you didn't.

Yes, we need to implement this.

Why not recommend testing on my basis, or using my plugin directly?

It's not a decision I made. I guess the reason is the read/write access of the branch: he may not have write access to your PR branch.

@ccxhwmy
Contributor

ccxhwmy commented Aug 12, 2022

@spacewander @tzssangglass #7643 is for this issue. It's up to you.

Thanks for your contribution, let's do it.

@ccxhwmy As mentioned above, you should use the elasticsearch bulk API (endpoint/index/_bulk), but I noticed you didn't.

Yes, I will implement this suggestion. Thank you.
