Benchmark Java Binding. #8942

Closed
adevday opened this issue Apr 3, 2023 · 6 comments · Fixed by #11214

@adevday

adevday commented Apr 3, 2023

> By the way, I would strongly recommend replacing the JNI implementation (com.risingwave.java.binding.Binding) of serialization/deserialization with a pure Java implementation.

Before we decide to replace the implementation, I think we need a benchmark to see whether the cost of the JNI calls is significant. @idx0-dev Could you test this?

Originally posted by @wenym1 in #8914 (comment)

@yufansong

yufansong commented May 25, 2023

I wrote some tests in the branch feat/stream_chunk_benchmark:

  1. Test data: 500k rows, each with 10 fields: short, int, long, float, double, bool, string, timestamp, decimal, null.
  2. ArrayList test: build the rows in an ArrayList and read them back. The read operations alone take about 10 ms on average.
  3. StreamChunk test: build the payload and read the data back. The read operations alone take about 2300 ms on average. Constructing the StreamChunk iterator takes an extra 1000-2000 ms.

@yufansong

I ran more tests with the object data types (String, Timestamp, and Decimal) removed, leaving only short, int, long, float, double, and bool.

  1. For ArrayList, it still takes about 10 ms. Log output from running the test 10 times:
Time elapsed: 9 milliseconds
Time elapsed: 10 milliseconds
Time elapsed: 8 milliseconds
Time elapsed: 10 milliseconds
Time elapsed: 8 milliseconds
Time elapsed: 10 milliseconds
Time elapsed: 8 milliseconds
Time elapsed: 10 milliseconds
Time elapsed: 9 milliseconds
Time elapsed: 9 milliseconds
  2. For StreamChunk, it takes roughly 350 ms (one run took 508 ms). Log output from running the test 10 times:
Time elapsed: 358 milliseconds
Time elapsed: 336 milliseconds
Time elapsed: 334 milliseconds
Time elapsed: 361 milliseconds
Time elapsed: 358 milliseconds
Time elapsed: 344 milliseconds
Time elapsed: 508 milliseconds
Time elapsed: 347 milliseconds
Time elapsed: 348 milliseconds
Time elapsed: 361 milliseconds
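
To make the two logs easier to compare, here is a small sketch that summarizes them. It only does arithmetic on the numbers printed above; the class and method names (BenchmarkSummary, mean, median, nanosPerRow) are made up for illustration and are not part of the benchmark branch.

```java
import java.util.Arrays;

class BenchmarkSummary {
    // Elapsed times in milliseconds, copied from the two logs above.
    static final long[] ARRAY_LIST_MS = {9, 10, 8, 10, 8, 10, 8, 10, 9, 9};
    static final long[] STREAM_CHUNK_MS = {358, 336, 334, 361, 358, 344, 508, 347, 348, 361};
    // Row count used by the test.
    static final int ROWS = 500_000;

    // Arithmetic mean of the samples.
    static double mean(long[] samples) {
        long sum = 0;
        for (long s : samples) {
            sum += s;
        }
        return (double) sum / samples.length;
    }

    // Median of the samples; less sensitive to a single slow run than the mean.
    static double median(long[] samples) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1
                ? sorted[mid]
                : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    // Converts a per-run time in milliseconds into nanoseconds per row.
    static double nanosPerRow(double millis, int rows) {
        return millis * 1_000_000.0 / rows;
    }

    static void report(String name, long[] samples) {
        double avg = mean(samples);
        System.out.printf("%s: mean %.1f ms, median %.1f ms, %.1f ns/row%n",
                name, avg, median(samples), nanosPerRow(avg, ROWS));
    }

    public static void main(String[] args) {
        report("ArrayList", ARRAY_LIST_MS);
        report("StreamChunk", STREAM_CHUNK_MS);
        System.out.printf("StreamChunk / ArrayList (mean): %.1fx%n",
                mean(STREAM_CHUNK_MS) / mean(ARRAY_LIST_MS));
    }
}
```

On these logs the mean is 9.1 ms for ArrayList against 365.5 ms for StreamChunk (medians 9 ms and 353 ms), so the gap is roughly 40x, or about 731 ns per row on the StreamChunk side.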

@yufansong

Code for ArrayList:

import java.util.ArrayList;
public class Test {
    static int loopTime = 500000;
    public static ArrayList<Object> myFunction(int index) {
        short v1 = (short) index;
        int v2 = (int) index;
        long v3 = (long) index;
        float v4 = (float) index;
        double v5 = (double) index;
        boolean v6 = index % 3 == 0;
        String v7 = "'"
                + new String(new char[(index % 10) + 1])
                        .replace("\0", String.valueOf(index))
                + "'";
        String v8 = "to_timestamp(" + index + ")";
        int v9 = index;
        Integer mayNull = null;
        ArrayList<Object> rowData = new ArrayList<>();
        rowData.add(v1);
        rowData.add(v2);
        rowData.add(v3);
        rowData.add(v4);
        rowData.add(v5);
        rowData.add(v6);
        rowData.add(v7);
        rowData.add(v8);
        rowData.add(v9);
        rowData.add(mayNull);
        return rowData;
    }

    public static double processRowData(ArrayList<Object> rowData) {
        short value1 = (short) rowData.get(0);
        int value2 = (int) rowData.get(1);
        long value3 = (long) rowData.get(2);
        float value4 = (float) rowData.get(3);
        double value5 = (double) rowData.get(4);
        boolean value6 = (boolean) rowData.get(5);
        // String value7 = (String) rowData.get(6);
        // String value8 = (String) rowData.get(7);
        // int value9 = (int) rowData.get(8);
        Integer mayNull = (Integer) rowData.get(9);
        return value1 + value2 + value3 + value4 + value5;
    }

    public static void main(String[] args) {
        // Build the test data first so that construction is excluded from the timing
        ArrayList<ArrayList<Object>> data = new ArrayList<>();
        for (int i = 0; i < loopTime; i++) {
            data.add(myFunction(i));
        }
        for (int t = 0; t < 10; t++) {
            long startTime = System.currentTimeMillis();
            // Timed section: read every row once
            for (int i = 0; i < loopTime; i++) {
                processRowData(data.get(i));
            }
            // Stop measuring the time
            long endTime = System.currentTimeMillis();

            // Calculate the time elapsed
            long elapsedTime = endTime - startTime;

            // Print the elapsed time
            System.out.println("Time elapsed: " + elapsedTime + " milliseconds");
        }
    }
}
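
One caveat with the loop above: the return value of processRowData is thrown away and there are no warm-up rounds, so the JIT compiler may optimize part of the work away and the first rounds may include compilation time. Below is a minimal sketch of a slightly more defensive hand-rolled harness for the six primitive columns. It is not JMH and not the code from the branch; the class name, the warm-up count, and the sink field are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class ArrayListReadBench {
    static final int ROWS = 500_000;        // same row count as the test above
    static final int WARMUP_ROUNDS = 5;     // assumption: arbitrary warm-up count
    static final int MEASURED_ROUNDS = 10;  // same number of rounds as the test above

    // Results are written here so the reads cannot be treated as dead code.
    static volatile double sink;

    // Builds one row holding the six primitive columns (boxed by the list).
    static List<Object> buildRow(int index) {
        List<Object> row = new ArrayList<>(6);
        row.add((short) index);
        row.add(index);
        row.add((long) index);
        row.add((float) index);
        row.add((double) index);
        row.add(index % 3 == 0);
        return row;
    }

    // Reads every column of a row and folds the values into one number.
    static double readRow(List<Object> row) {
        short v1 = (Short) row.get(0);
        int v2 = (Integer) row.get(1);
        long v3 = (Long) row.get(2);
        float v4 = (Float) row.get(3);
        double v5 = (Double) row.get(4);
        boolean v6 = (Boolean) row.get(5);
        return v1 + v2 + v3 + v4 + v5 + (v6 ? 1 : 0);
    }

    static double readAll(List<List<Object>> data) {
        double total = 0;
        for (List<Object> row : data) {
            total += readRow(row);
        }
        return total;
    }

    // Runs untimed warm-up rounds, then returns the elapsed nanoseconds of each timed round.
    static long[] measure(List<List<Object>> data, int warmupRounds, int measuredRounds) {
        for (int i = 0; i < warmupRounds; i++) {
            sink = readAll(data);
        }
        long[] elapsedNanos = new long[measuredRounds];
        for (int i = 0; i < measuredRounds; i++) {
            long start = System.nanoTime();
            sink = readAll(data);
            elapsedNanos[i] = System.nanoTime() - start;
        }
        return elapsedNanos;
    }

    public static void main(String[] args) {
        List<List<Object>> data = new ArrayList<>(ROWS);
        for (int i = 0; i < ROWS; i++) {
            data.add(buildRow(i));
        }
        for (long nanos : measure(data, WARMUP_ROUNDS, MEASURED_ROUNDS)) {
            System.out.println("Time elapsed: " + nanos / 1_000_000.0 + " milliseconds");
        }
    }
}
```

System.nanoTime is used instead of System.currentTimeMillis because it is a monotonic timer meant for measuring elapsed time; at around 10 ms per round, millisecond resolution hides a noticeable share of the variation.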

@yufansong

If you want to test StreamChunk, remember to set the row count to 500000 (let row_count = 500000;) in the data-chunk-payload-generator.rs file.
Code for StreamChunk:

package com.risingwave.java.binding;

import java.io.IOException;

public class StreamChunkDemo {
    public static double getValue(StreamChunkRow rowData) {
        short value1 = (short) rowData.getShort(0);
        int value2 = (int) rowData.getInt(1);
        Long value3 = (Long) rowData.getLong(2);
        float value4 = (float) rowData.getFloat(3);
        double value5 = (double) rowData.getDouble(4);
        boolean value6 = (boolean) rowData.getBoolean(5);
        // String value7 = (String) rowData.getString(6);
        // java.sql.Timestamp value8 = (java.sql.Timestamp) rowData.getTimestamp(7);
        // int value9 = rowData.getDecimal(8).intValue();
        boolean mayNull = rowData.isNull(9);
        return value1 + value2 + value3 + value4 + value5;
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = System.in.readAllBytes();
        for (int t = 0; t < 10; t++) {
            StreamChunkIterator iter = new StreamChunkIterator(payload);
            long startTime = System.currentTimeMillis();
            while (true) {
                try (StreamChunkRow row = iter.next()) {
                    if (row == null) {
                        break;
                    }
                    getValue(row);
                }
            }
            long endTime = System.currentTimeMillis();
            long elapsedTime = endTime - startTime;
            System.out.println("Time elapsed: " + elapsedTime + " milliseconds");
        }
    }
}

@wenym1

wenym1 commented Jun 15, 2023

@yufansong With #10229, the Java binding can be run without any extra settings. Would you mind modifying the benchmark code to integrate with JMH so that we can easily rerun it when working on performance improvements? You may refer to https://nickolasfisher.com/blog/How-to-Benchmark-Java-Code-Using-JUnit-and-JMH

@yufansong

> @yufansong With #10229, the Java binding can be run without any extra settings. Would you mind modifying the benchmark code to integrate with JMH so that we can easily rerun it when working on performance improvements? You may refer to https://nickolasfisher.com/blog/How-to-Benchmark-Java-Code-Using-JUnit-and-JMH

Sure, I will check it.
