Send a X-ClickHouse summary on the header for HTTP client with number of rows inserted #5116

Merged
merged 19 commits into from
May 25, 2019

Conversation

YiuRULE
Contributor

@YiuRULE YiuRULE commented Apr 26, 2019

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

For changelog. Remove if this is non-significant change.

Category (leave one):

  • New Feature
  • Backward Incompatible Change

Short description (up to few sentences):

Add an X-ClickHouse-Summary header to the HTTP response when a query is sent over HTTP with the send_progress_in_http_headers setting enabled. It returns the usual X-ClickHouse-Progress information, plus additional information such as how many rows and bytes were inserted by the query.

Detailed description (optional):

clickhouse client
    DROP TABLE IF EXISTS test.insert_number_query;
    CREATE TABLE test.insert_number_query (record UInt32) Engine = Memory;

seq 1 848484 | curl 'http://localhost:8123/?query=INSERT%20INTO%20test.insert_number_query%20FORMAT%20CSV&send_progress_in_http_headers=1&wait_end_of_query=1' --data-binary @- -v

*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8123 (#0)
> POST /?query=INSERT%20INTO%20test.test%20FORMAT%20CSV&send_progress_in_http_headers=1&wait_end_of_query=1 HTTP/1.1
> Host: localhost:8123
> User-Agent: curl/7.61.0
> Accept: */*
> Content-Length: 5828283
> Content-Type: application/x-www-form-urlencoded
> Expect: 100-continue
> 
< HTTP/1.1 100 Continue
* We are completely uploaded and fine
< HTTP/1.1 200 OK
< Date: Fri, 26 Apr 2019 03:55:50 GMT
< Connection: Keep-Alive
< Content-Type: text/plain; charset=UTF-8
< X-ClickHouse-Server-Display-Name: guillaume-OMEN-by-HP-Laptop-15-ce0xx
< Transfer-Encoding: chunked
< Keep-Alive: timeout=3
< X-ClickHouse-Summary: {"read_rows":"0","read_bytes":"0","write_rows":"848484","write_bytes":"3393936","total_rows":"0"}
< 
* Connection #0 to host localhost left intact
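
For illustration, a client could read the counters out of the summary header shown above. The following is a minimal sketch, assuming the value is always a flat JSON object whose values are quoted decimal strings, as in the example; `summaryField` is a hypothetical helper, not part of ClickHouse or of this PR.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical helper (not part of ClickHouse): extracts one numeric field from
// a flat summary object such as {"read_rows":"0","write_rows":"848484"}.
// It relies on values being quoted decimal strings, as in the example above.
// It is not a general JSON parser and does not check for integer overflow.
std::optional<std::uint64_t> summaryField(const std::string & json, const std::string & key)
{
    assert(!key.empty());

    // Match the key together with its opening quote so that "rows" does not
    // accidentally match inside "read_rows".
    const std::string needle = "\"" + key + "\":\"";
    const auto start = json.find(needle);
    if (start == std::string::npos)
        return std::nullopt; // key absent

    const auto value_begin = start + needle.size();
    const auto value_end = json.find('"', value_begin);
    if (value_end == std::string::npos || value_end == value_begin)
        return std::nullopt; // unterminated or empty value

    std::uint64_t result = 0;
    for (auto i = value_begin; i < value_end; ++i)
    {
        if (json[i] < '0' || json[i] > '9')
            return std::nullopt; // not a decimal number
        result = result * 10 + static_cast<std::uint64_t>(json[i] - '0');
    }
    return result;
}
```

Calling `summaryField(header_value, "write_rows")` on the header value from the example would yield 848484; a missing key yields an empty optional, which lets a caller tolerate field renames.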

link to #2825

@YiuRULE YiuRULE changed the title from "Send header" to "Send a X-ClickHouse summary on the header for HTTP client with number of rows inserted" Apr 26, 2019
@alexey-milovidov
Member

It does not work:

2019.04.29 05:28:59.704779 [ 872 ] {} <Error> BaseDaemon: ########################################
2019.04.29 05:28:59.704981 [ 872 ] {} <Error> BaseDaemon: (version 19.7.1.404) (from thread 34) Received signal Segmentation fault (11).
2019.04.29 05:28:59.705097 [ 872 ] {} <Error> BaseDaemon: Address: NULL pointer.
2019.04.29 05:28:59.705122 [ 872 ] {} <Error> BaseDaemon: Access: read.
2019.04.29 05:28:59.705199 [ 872 ] {} <Error> BaseDaemon: Address not mapped to object.
2019.04.29 05:28:59.750496 [ 872 ] {} <Error> BaseDaemon: 0. clickhouse-server(std::__1::basic_ostream<char, std::__1::char_traits<char> >::sentry::sentry(std::__1::basic_ostream<char, std::__1::char_traits<char> >&)+0x23) [0xb5c3123]
2019.04.29 05:28:59.750644 [ 872 ] {} <Error> BaseDaemon: 1. clickhouse-server(std::__1::basic_ostream<char, std::__1::char_traits<char> >& std::__1::__put_character_sequence<char, std::__1::char_traits<char> >(std::__1::basic_ostream<char, std::__1::char_traits<char> >&, char const*, unsigned long)+0x33) [0xb5c2cf3]
2019.04.29 05:28:59.750759 [ 872 ] {} <Error> BaseDaemon: 2. clickhouse-server(std::__1::basic_ostream<char, std::__1::char_traits<char> >& std::__1::operator<<<std::__1::char_traits<char> >(std::__1::basic_ostream<char, std::__1::char_traits<char> >&, char const*)+0x3c) [0xb5c2bbc]
2019.04.29 05:28:59.750815 [ 872 ] {} <Error> BaseDaemon: 3. clickhouse-server(DB::WriteBufferFromHTTPServerResponse::writeHeaderSummary()+0x7a) [0xb6b979a]
2019.04.29 05:28:59.750861 [ 872 ] {} <Error> BaseDaemon: 4. clickhouse-server(DB::WriteBufferFromHTTPServerResponse::finishSendHeaders()+0x39) [0xb6b99d9]
2019.04.29 05:28:59.750905 [ 872 ] {} <Error> BaseDaemon: 5. clickhouse-server(DB::WriteBufferFromHTTPServerResponse::nextImpl()+0x205f) [0xb6bc00f]
2019.04.29 05:28:59.750982 [ 872 ] {} <Error> BaseDaemon: 6. clickhouse-server(DB::WriteBuffer::next()+0x5a) [0xb5fd59a]
2019.04.29 05:28:59.751065 [ 872 ] {} <Error> BaseDaemon: 7. clickhouse-server(DB::WriteBufferFromHTTPServerResponse::finalize()+0x39) [0xb6bc519]
2019.04.29 05:28:59.751113 [ 872 ] {} <Error> BaseDaemon: 8. clickhouse-server(DB::HTTPHandler::processQuery(Poco::Net::HTTPServerRequest&, HTMLForm&, Poco::Net::HTTPServerResponse&, DB::HTTPHandler::Output&)+0xf7ba) [0xb686f8a]
2019.04.29 05:28:59.751159 [ 872 ] {} <Error> BaseDaemon: 9. clickhouse-server(DB::HTTPHandler::handleRequest(Poco::Net::HTTPServerRequest&, Poco::Net::HTTPServerResponse&)+0x1302) [0xb68bec2]
2019.04.29 05:28:59.751315 [ 872 ] {} <Error> BaseDaemon: 10. clickhouse-server(Poco::Net::HTTPServerConnection::run()+0x948) [0x15a859e8]
2019.04.29 05:28:59.751357 [ 872 ] {} <Error> BaseDaemon: 11. clickhouse-server(Poco::Net::TCPServerConnection::start()+0x19) [0x15af3ba9]
2019.04.29 05:28:59.751433 [ 872 ] {} <Error> BaseDaemon: 12. clickhouse-server(Poco::Net::TCPServerDispatcher::run()+0x47d) [0x15af494d]
2019.04.29 05:28:59.751491 [ 872 ] {} <Error> BaseDaemon: 13. clickhouse-server(Poco::PooledThread::run()+0x82) [0x15caec52]
2019.04.29 05:28:59.751524 [ 872 ] {} <Error> BaseDaemon: 14. clickhouse-server() [0x15ca9bca]
2019.04.29 05:28:59.751566 [ 872 ] {} <Error> BaseDaemon: 15. clickhouse-server(Poco::ThreadImpl::runnableEntry(void*)+0x73) [0x15ca8683]
2019.04.29 05:28:59.751628 [ 872 ] {} <Error> BaseDaemon: 16. clickhouse-server(void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void* (*)(void*), Poco::ThreadImpl*> >(void*)+0x204) [0x15cabad4]
2019.04.29 05:28:59.751673 [ 872 ] {} <Error> BaseDaemon: 17. /lib/x86_64-linux-gnu/libpthread.so.0(+0x76db) [0x7facba9266db]

}


void AllProgressValueImpl::write(const ProgressValues & value, WriteBuffer & out, UInt64 /*client_revision*/)
Member

Backward compatibility?

Contributor Author

Backward compatibility, yes. Separate implementations of these helpers were made because, unlike the read rows, the written rows are not counted in real time but only at the end of the query. So it wouldn't make much sense at the moment to show this value in the progress.

So AllProgressValueImpl is implemented as the default case, as it is most likely the one you would need. The other implementations (for the read and write progress) cover a fairly specific case: they are used when a client sends an HTTP request.

Contributor Author

Wait, I just thought of it: are you talking about backward compatibility between a new version of the database and an old client? This PR doesn't handle that case right now.

Member

Yes, we maintain compatibility between server and client (including drivers with native protocol) in both ways: old client with new server and new client with old server.

Contributor Author

OK, thanks, it's fixed now. I will probably just need to modify the file Define.h before the code gets merged if another release comes out first.

@YiuRULE
Contributor Author

YiuRULE commented May 2, 2019

Thanks for the feedback. Is it possible to get more information about how you were able to produce the bug? I tried a simple test and wasn't able to reproduce it :/

@alexey-milovidov
Member

alexey-milovidov commented May 5, 2019

Thanks for the feedback. Is it possible to get more information about how you were able to produce the bug? I tried a simple test and wasn't able to reproduce it :/

I did not do it manually; I just clicked the "Details" link next to the checks, then opened the relevant logs.

@YiuRULE
Contributor Author

YiuRULE commented May 6, 2019

Thanks! Both problems are now resolved. A client or a server with an outdated version can now be used without any problem, and the segfault should be resolved as well.

@alexey-milovidov
Member

write_* -> written_* ?

@@ -80,6 +80,11 @@ class WriteBufferFromHTTPServerResponse : public BufferWithOwnMemory<WriteBuffer
/// but not finish them with \r\n, allowing to send more headers subsequently.
void startSendHeaders();

// Used for write the header X-ClickHouse-progress
Member

Misleading comment. We have X-ClickHouse-Progress (capital P)

@@ -80,6 +80,11 @@ class WriteBufferFromHTTPServerResponse : public BufferWithOwnMemory<WriteBuffer
/// but not finish them with \r\n, allowing to send more headers subsequently.
void startSendHeaders();

// Used for write the header X-ClickHouse-progress
void writeHeaderProgress();
// Used for write the header X-ClickHouse-summary
Member

X-ClickHouse-Summary

void ProgressValues::writeJSON(WriteBuffer & out) const
{
WriteJSONImpl::writeJSON(*this, out);
}
Member

The code looks awkwardly structured to me.

};

struct AllProgressValueImpl
Member

Lack of comments. (but first need to think how to rewrite this code...)

Member

Let both ReadProgress and WriteProgress have read, write, writeJSON methods.
But it is better to rename read -> deserializeBinary, write -> serializeBinary, writeJSON -> serializeJSON for clarity.
And let's rename AllProgress to ReadWriteProgress.
And let ReadWriteProgress have ReadProgress and WriteProgress as its members. Its serde methods will take the revision as an argument and call the child methods appropriately.
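
The structure suggested above could be sketched roughly as follows. This is only an illustration under stated assumptions: `std::ostream`/`std::istream` stand in for ClickHouse's WriteBuffer/ReadBuffer, whitespace-separated text stands in for the real binary encoding, and `REVISION_WITH_WRITE_INFO` together with its value, as well as the `roundTrip` helper, are hypothetical. The JSON method is left out for brevity.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>

// Hypothetical revision from which peers exchange write counters (value made up).
constexpr std::uint64_t REVISION_WITH_WRITE_INFO = 100;

struct ReadProgress
{
    std::uint64_t read_rows = 0;
    std::uint64_t read_bytes = 0;
    std::uint64_t total_rows_to_read = 0;

    // Text with spaces is used here instead of a real binary format.
    void serializeBinary(std::ostream & out) const
    {
        out << read_rows << ' ' << read_bytes << ' ' << total_rows_to_read << ' ';
    }
    void deserializeBinary(std::istream & in) { in >> read_rows >> read_bytes >> total_rows_to_read; }
};

struct WriteProgress
{
    std::uint64_t written_rows = 0;
    std::uint64_t written_bytes = 0;

    void serializeBinary(std::ostream & out) const { out << written_rows << ' ' << written_bytes << ' '; }
    void deserializeBinary(std::istream & in) { in >> written_rows >> written_bytes; }
};

struct ReadWriteProgress
{
    ReadProgress read;
    WriteProgress write;

    // Old clients never receive the write part, so they keep parsing correctly.
    void serializeBinary(std::ostream & out, std::uint64_t client_revision) const
    {
        read.serializeBinary(out);
        if (client_revision >= REVISION_WITH_WRITE_INFO)
            write.serializeBinary(out);
    }

    // Old servers never send the write part, so it is not read from them.
    void deserializeBinary(std::istream & in, std::uint64_t server_revision)
    {
        read.deserializeBinary(in);
        if (server_revision >= REVISION_WITH_WRITE_INFO)
            write.deserializeBinary(in);
    }
};

// Sends a packet between two peers that agreed on `revision` and returns what arrived.
ReadWriteProgress roundTrip(const ReadWriteProgress & sent, std::uint64_t revision)
{
    std::stringstream wire;
    sent.serializeBinary(wire, revision);
    ReadWriteProgress received;
    received.deserializeBinary(wire, revision);
    assert(!wire.fail());
    return received;
}
```

With a revision below the threshold, the write counters simply stay at zero on the receiving side, which is how both compatibility directions (old client with new server, new client with old server) are meant to be covered in this sketch.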

Member

@alexey-milovidov alexey-milovidov left a comment

.

@alexey-milovidov alexey-milovidov added the pr-feature Pull request with new product feature label May 9, 2019
@alexey-milovidov
Member

X-ClickHouse-Summary: {"read_rows":"0","read_bytes":"0","write_rows":"848484","write_bytes":"3393936","total_rows":"0"}

In this example, the total_rows is absolutely misleading! We cannot proceed with that. What alternative do you suggest?

WriteBufferFromOwnString progress_string_writer;
accumulated_progress.writeJSON<ReadProgressValueImpl>(progress_string_writer);

#if defined(POCO_CLICKHOUSE_PATCH)
Member

Move it a few lines above?

@YiuRULE
Contributor Author

YiuRULE commented May 13, 2019

OK, thanks! The changes have been applied.

Just two notes:

  • Regarding how we write the progress, I reverted it to the old way, and the same applies to how we write the JSON. The problem is that the write progress is currently not reported in real time (we only get it at the end), which means we send values that never change in the progress headers, only in the summary. Is that OK?
  • I also renamed total_rows to rows_in_set. Is it OK to have this kind of breaking change? I'm thinking particularly of ClickHouse client library implementations, or companies that use the HTTP interface directly. :/


void read(ReadBuffer & in, UInt64 server_revision);
void write(WriteBuffer & out, UInt64 client_revision) const;
void writeJSON(WriteBuffer & out) const;
};

struct ReadProgress
{
size_t rows;
Member

Let's also rename here and above for consistency: read_rows, read_bytes, total_rows_to_read.

/// See Progress.
struct ProgressValues
{
size_t rows;
size_t bytes;
size_t total_rows;
size_t write_rows;
Member

Probably written will be better than write?

@alexey-milovidov
Member

rows_in_set is slightly misleading, because there is a notion of "set" in ClickHouse and it is a completely different thing. Let it be total_rows_to_read. Yes, it's OK to have this backward incompatible change (it needs to be written in the changelog), because progress in HTTP headers is not widely used.
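
To make the proposed naming concrete, here is a minimal sketch of how the summary from the first example might be rendered if the renames suggested in this review (written_* instead of write_*, total_rows_to_read instead of total_rows) are applied. `ProgressSummary`, `appendField` and `toSummaryJSON` are hypothetical names used only for this illustration, and `std::string` stands in for ClickHouse's WriteBuffer.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical plain struct; the field names follow the renames proposed in
// this review, not necessarily the merged code.
struct ProgressSummary
{
    std::uint64_t read_rows = 0;
    std::uint64_t read_bytes = 0;
    std::uint64_t written_rows = 0;
    std::uint64_t written_bytes = 0;
    std::uint64_t total_rows_to_read = 0;
};

// Appends "name":"value" to out. Values are emitted as quoted strings,
// matching the header shown earlier in this thread.
void appendField(std::string & out, const char * name, std::uint64_t value)
{
    assert(name != nullptr);
    out += '"';
    out += name;
    out += "\":\"";
    out += std::to_string(value);
    out += '"';
}

// Builds the JSON object that would go into the X-ClickHouse-Summary header.
std::string toSummaryJSON(const ProgressSummary & p)
{
    std::string out = "{";
    appendField(out, "read_rows", p.read_rows);
    out += ',';
    appendField(out, "read_bytes", p.read_bytes);
    out += ',';
    appendField(out, "written_rows", p.written_rows);
    out += ',';
    appendField(out, "written_bytes", p.written_bytes);
    out += ',';
    appendField(out, "total_rows_to_read", p.total_rows_to_read);
    out += '}';
    return out;
}
```

Clients that parsed the old total_rows key would need to look for total_rows_to_read instead, which is why the change is flagged as backward incompatible.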

@alexey-milovidov
Member

The problem is that the write progress is currently not reported in real time (we only get it at the end), which means we send values that never change in the progress headers, only in the summary. Is that OK?

Why?

@YiuRULE
Contributor Author

YiuRULE commented May 20, 2019

One of the reasons is that sometimes, during an INSERT query, ClickHouse seems to ignore the progress entirely (using this branch, with the initial configuration).

For instance, in the example at the beginning of the PR, with the argument http_headers_progress_interval_ms=0, the only time ClickHouse sends the header to the HTTP client is when I write the progress here:

https://github.com/PerformanceVision/ClickHouse/blob/send_header/dbms/src/Interpreters/executeQuery.cpp#L322

Whereas when I use a smaller data set, as with this command line:

seq 1 2000 | curl 'http://localhost:8123/?query=INSERT%20INTO%20test.insert_number_query%20FORMAT%20CSV&send_progress_in_http_headers=1&wait_end_of_query=1&http_headers_progress_interval_ms=0' --data-binary @- -v

more headers are sent to the client, with read_rows and read_bytes values > 0.

So I don't know whether it is a bug or not, but the main goal was to first have a version whose results are consistent, and to leave real-time progress, as well as integration with the TCP client, to later fixes.

@alexey-milovidov alexey-milovidov merged commit 461c491 into ClickHouse:master May 25, 2019
@alexey-milovidov
Member

Now it is Ok.

PS. Let's not forget to mention it as backward incompatible change in changelog.

hatarist added a commit to hatarist/clickhouse-cli that referenced this pull request Jun 25, 2019
@filimonov filimonov added the comp-http http protocol related label Sep 3, 2019
@gothug

gothug commented Mar 24, 2020

Excuse me, is it currently possible by any means (via http protocol) to get the information on how many rows were returned by the query?
