init elasticsearch-logger.lua #7099
Conversation
Finally, I use the Elasticsearch RESTful API with Basic Authorization; it is suitable for the 5.x, 6.x, and 7.x versions.
Hi @fanchangjifen, thanks for your contribution, but please let me know whether we ever discussed this on the APISIX mailing list.
We haven't discussed it there. Our main log storage and retrieval uses the ELK stack, so after seeing the ClickHouse implementation I introduced an Elasticsearch integration.
Could you add tests like the clickhouse-logger's?
You can refer to https://github.com/apache/apisix/blob/master/docs/en/latest/internal/testing-framework.md for details about the test framework.
For details: [Kingsoft Docs] Plugin elasticsearch-logger test
t/plugin/elasticsearch-logger.t
Outdated
@@ -0,0 +1,182 @@
#
Let's add elasticsearch-logger to apisix/conf/config-default.yaml:
Line 324 in 9b0cc7a:
plugins:          # plugin list (sorted by priority)
It's added
t/plugin/elasticsearch-logger.t
Outdated
--- response_body
opentracing
--- error_log
elasticsearch body:
We need to check the upload data in the body
Actually, the data in the body is unused for this plugin. For clickhouse-logger it prints the SQL statement; for this plugin, maybe the path is valuable to check, and I do not know how to print it.
Let's check the log of the request dumped by the dummy HTTP server.
Reference:
Lines 652 to 657 in e6153b6
--- response_body
expired link
--- grep_error_log eval
qr/message received: [ -~]+/
--- grep_error_log_out eval
qr/message received: <9>1 [\d\-T:.]+Z [\d.]+ apisix [\d]+ - \[tok\@41058 tag="apisix"] \{"route_id":"1"\}/
t/plugin/elasticsearch-logger.t
Outdated
my $http_config = $block->http_config // <<_EOC_;
    server {
        listen 10420;
        location /elasticsearch-logger/test {
The URL should contain the index name?
No, the index name is apisix-elasticsearch-logger, with a yyyy.MM.dd suffix appended automatically.
local url_decoded = url.parse(conf.endpoint_addr)
local host = url_decoded.host
local port = url_decoded.port
local index = conf.index .. "-" .. os.date('%Y.%m.%d')
The conf.index can't be empty?
It cannot be empty, so I can set a default value. In Elasticsearch, indices are similar to tables in a relational DBMS.
Hey, I think it's an antipattern - i.e. creating indices [each day] for storing logs.
Cons:
- Creating a new index might require elevated privileges [We can't force users to give us some hefty permission of their ES cluster].
- Issues with data query and aggregation.
It's better to add a timestamp at the log message level: it is easy to query and aggregate logs, and it requires no extra privileges.
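For illustration, a minimal sketch of that alternative in an OpenResty context (the @timestamp field name and the fixed-index choice are assumptions here, not this PR's code):

```lua
-- Sketch: use one fixed index and stamp each document at the message
-- level, so writing logs needs no index-creation privilege.
local function with_timestamp(entry)
    -- UTC ISO-8601 timestamp; an index template on the ES side can map
    -- "@timestamp" as a date field for querying and aggregation
    entry["@timestamp"] = os.date("!%Y-%m-%dT%H:%M:%SZ", ngx.time())
    return entry
end
```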
end

local url_path = index .. "/_doc"
local user_pass = conf.user .. ":" .. conf.password
Should we bypass the authentication if conf.user is empty?
It depends on whether Elasticsearch X-Pack is enabled. By default X-Pack is disabled, but for security it should be enabled. If conf.user is empty, the request cannot pass Elasticsearch's X-Pack authentication policy.
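A minimal sketch of that bypass, assuming the headers table is what gets passed to the HTTP client (field names follow the excerpt above; this is not the PR's actual code):

```lua
-- Sketch: attach Basic auth only when credentials are configured, so a
-- cluster without X-Pack security never receives an Authorization header
local headers = {["Content-Type"] = "application/json"}
if conf.user and conf.user ~= "" then
    local user_pass = conf.user .. ":" .. (conf.password or "")
    headers["Authorization"] = "Basic " .. ngx.encode_base64(user_pass)
end
```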
In a production environment, we may not interact with the storage engine through its API directly; experience shows there can be problems with connection pooling and performance. So for now we use the date slicing of http-logger and collect the data with Filebeat, using an ingest pipeline to preprocess and normalize the data before persistence; that is the full solution we use to prevent those problems. However, this plugin might be an easy option, like clickhouse-logger.
I see. Let's wait for someone who is willing to try this plugin in the production environment.
I still want to know what I need to do next for this plugin to be accepted.
@fanchangjifen It can help more users follow this PR.
@membphis Hey, there it is.
@fanchangjifen
This is just an idea based on the clickhouse-logger plugin, and while we regret the final result, we will continue to follow the APISIX project and contribute our thoughts on SM cryptography and logging.
@fanchangjifen We can run ES like https://github.com/elastic/elastic-github-actions/blob/master/elasticsearch/run-elasticsearch.sh in https://github.com/apache/apisix/blob/master/ci/pod/docker-compose.yml.
for i = 1, #entries do
    core.table.insert(log_table, core.json.encode(entries[i]))
end
data = core.table.concat(log_table, " ") -- assemble multi items as string "{} {}"
Just curious, are the space-separated JSONs going to be treated as separate events?
Else, just a suggestion - we can always use the elasticsearch bulk API (endpoint/index/_bulk). The indexing speed is much faster there.
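For reference, a sketch of the bulk format (an illustration only; build_bulk_body is a hypothetical helper, not this PR's code):

```lua
local core = require("apisix.core")

-- Sketch: the bulk API takes NDJSON - an action line per document
-- followed by the document itself; the body must end with a newline
local function build_bulk_body(index, entries)
    local lines = {}
    for i = 1, #entries do
        core.table.insert(lines, core.json.encode({index = {_index = index}}))
        core.table.insert(lines, core.json.encode(entries[i]))
    end
    return core.table.concat(lines, "\n") .. "\n"
end
-- POST the result to <endpoint>/_bulk with
-- Content-Type: application/x-ndjson
```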
In fact, I would recommend just using the bulk API for batch processing with ES. ES should be read more and written less.
> Just curious, are the space-separated JSONs going to be treated as separate events?
> Else, just a suggestion - we can always use the elasticsearch bulk API (endpoint/index/_bulk). The indexing speed is much faster there.

Yes, using the bulk API is a good choice. I will try to improve it.
> In fact, I would recommend just using the bulk API for batch processing with ES. ES should be read more and written less.

I don't agree that ES should be read more and written less. In simple scenarios, we don't need to worry about reads and writes; in complex scenarios, we can use index lifecycle management to optimize. In fact, I think we should pay attention to its ecosystem, i.e. ELKB.
1. Experience shows that ES write performance is not high, and high-frequency writes create performance problems.
2. For APISIX, writing data at high frequency will also cause performance problems. APISIX provides a batch-process feature, and almost all logging plugins follow this idea. I think combining the batch process of APISIX with the bulk API of ES is a more suitable approach.
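A sketch of that combination, assuming the batch-processor-manager API that other APISIX loggers use (send_to_elasticsearch is a hypothetical flush helper, not this PR's function):

```lua
local bp_manager_mod = require("apisix.utils.batch-processor-manager")
local log_util       = require("apisix.utils.log-util")

local batch_processor_manager = bp_manager_mod.new("elasticsearch logger")
local _M = {}

-- Sketch of a log phase: buffer entries and let the batch processor
-- flush them, one _bulk request per batch
function _M.log(conf, ctx)
    local entry = log_util.get_full_log(ngx, conf)

    -- reuse an existing processor for this conf when there is one
    if batch_processor_manager:add_entry(conf, entry) then
        return
    end

    local flush = function(entries)
        -- hypothetical helper: encode entries as NDJSON, POST to _bulk
        return send_to_elasticsearch(conf, entries)
    end
    batch_processor_manager:add_entry_to_new_processor(conf, entry, ctx, flush)
end
```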
The bulk API version has been submitted. The data is separated by newline characters, and the id of each log is automatically generated by ES; it could be changed to the SkyWalking ID later.
nice progress so far. Please run the linter once again. Thank you!
t/plugin/elasticsearch-logger.t
Outdated
--- error_log
elasticsearch body:
elasticsearch headers: authorization:Basic ZWxhc3RpYzplbGFzdGlj
--- wait: 5
Not required I presume!
For creating indices [each day] for storing logs, refer to the beats component's log storage; I think it works well with the data time range.
The index in elasticsearch is unlike the one in an RDBMS. It is more like a partition or a log file.
AFAIK, the admin of elasticsearch creates a template to define the pattern of indexes, and the client can create any index with a different suffix (usually a timestamp).
Elasticsearch will query each index in parallel, and logs in old indexes can be purged to save disk space.
> The index in elasticsearch is unlike the one in an RDBMS. It is more like a partition or a log file.
> AFAIK, the admin of elasticsearch creates a template to define the pattern of indexes, and the client can create any index with a different suffix (usually a timestamp).
> Elasticsearch will query each index in parallel, and logs in old indexes can be purged to save disk space.

You may be referring to the index lifecycle policies.
Yes
Yeah, Ack!
It might be difficult for me to write test samples, and I don't know how to write them.
You can refer to https://github.com/apache/apisix/blob/master/docs/en/latest/internal/testing-framework.md and run the test locally. Also, some maintainers think we need to run the test with a real elasticsearch. You can run ES like https://github.com/elastic/elastic-github-actions/blob/master/elasticsearch/run-elasticsearch.sh in https://github.com/apache/apisix/blob/master/ci/pod/docker-compose.yml. Then change the mock server to proxy the test requests to the real elasticsearch.
I submitted the related CI auxiliary with reference to the official document: https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html#docker-file.
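A sketch of what such a service entry in ci/pod/docker-compose.yml might look like (image tag, port, and credentials here are assumptions adapted from the linked run-elasticsearch.sh, not the PR's actual values):

```yaml
# hypothetical single-node Elasticsearch service for CI
elasticsearch:
  image: docker.elastic.co/elasticsearch/elasticsearch:7.17.1
  ports:
    - "9200:9200"
  environment:
    discovery.type: single-node        # no cluster bootstrap in CI
    ES_JAVA_OPTS: -Xms512m -Xmx512m    # keep the JVM heap small
    xpack.security.enabled: "true"     # exercise Basic auth in the tests
    ELASTIC_PASSWORD: elastic
```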
t/plugin/elasticsearch-logger.t
Outdated
server {
    listen 10420;
    location /elasticsearch-logger/test {
        content_by_lua_block {
We need to proxy the request to the real elasticsearch.
Oh, I'll try.
ci/pod/docker-compose.yml
Outdated
timeout: 10s
retries: 120

kibana:
We don't need the kibana?
> We don't need the kibana?

I think you might need a visual tool.
We don't need kibana in the CI.
Let's merge master to make CI run.
Let's merge master and solve the conflicts.
Hi~
@ccxhwmy I have no way to do the automated testing; I think you can build on what I've done. @spacewander @tzssangglass #7643: for this issue, it's up to you.
I will refer to what you have finished, thank you very much!
Thanks for your contribution, let's do it.
@ccxhwmy As mentioned above, you should use the elasticsearch bulk API (endpoint/index/_bulk), but I noticed you didn't.
Yes, we need to implement this.
It's not a decision I made. I guess the reason is the read/write access of the branch; he may not have write access to your PR branch.
Yes, I will implement this suggestion. Thank you.
Description
I followed the clickhouse-logger implementation logic, and the data store was partially adapted for Elasticsearch. By default, index names are divided by date (yyyy.MM.dd); DBAs can further optimize log storage by combining index templates.
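For context, a hypothetical sketch of the configuration this description implies (defaults and constraints here are assumptions drawn from the thread, not the merged schema):

```lua
-- Sketch of the plugin's conf fields as discussed above
local schema = {
    type = "object",
    properties = {
        endpoint_addr = {type = "string"},  -- e.g. "http://127.0.0.1:9200"
        index = {type = "string", default = "apisix-elasticsearch-logger"},
        user = {type = "string"},           -- Basic auth user (optional)
        password = {type = "string"},       -- Basic auth password
    },
    required = {"endpoint_addr"},
}
```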