docs: updates on integration and develop folder #3739

Merged
499 changes: 370 additions & 129 deletions docs/en/developer/built_in_function_develop_guide.md

Large diffs are not rendered by default.

23 changes: 23 additions & 0 deletions docs/en/developer/contributing.md
@@ -1,3 +1,26 @@
# Contributing
Please refer to [Contribution Guideline](https://github.com/4paradigm/OpenMLDB/blob/main/CONTRIBUTING.md)

## Pull Request (PR) Guidelines

When submitting a PR, please pay attention to the following points:
- PR Title: Please follow the [commit format](https://github.com/4paradigm/rfcs/blob/main/style-guide/commit-convention.md#conventional-commits-reference) for the PR title, e.g. `docs: update the contributing guide`. **Note that this refers to the PR title, not the commits within the PR**.
```{note}
If the title does not meet the standard, `pr-linter / pr-name-lint (pull_request)` will fail with a status of `x`.
```
- PR Checks: A PR runs a number of checks. Only `codecov/patch` and `codecov/project` are allowed to fail; all other checks must pass. If another check fails and you cannot fix it, or you believe it should not be fixed, leave a comment in the PR.

- PR Description: Please explain the intent of the PR in its first comment. We provide a PR comment template; you are not required to follow it, but make sure the description is sufficiently detailed.

- PR Files Changed: Pay attention to the `files changed` tab of the PR. Do not include code changes that are outside the scope of the PR's intent. You can usually eliminate unnecessary diffs by running `git merge origin/main` followed by `git push` to the PR branch (see the sketch after this list). If you need assistance, leave a comment in the PR.
```{note}
If your PR branch is not based on the current main branch, the `files changed` of a PR targeting main will include unnecessary code. For example, suppose main is at commit 10, you start from commit 9 of the old main, add new_commit1, and then add new_commit2 on top of new_commit1. You actually only want to submit new_commit2, but the PR will include both new_commit1 and new_commit2.
In this case, just run `git merge origin/main` and `git push` to the PR branch so that only the intended changes are included.
```
```{seealso}
If you want the branch history to be cleaner, you can use `git rebase -i origin/main` instead of `git merge`. It replays your changes one by one on top of the main branch. However, it rewrites the commit history, so you need `git push -f` to overwrite the branch.
```
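
A minimal sketch of the workflow above (remote and branch names are the common defaults and may differ in your setup):
```
# update the PR branch so that `files changed` only contains your own changes
git fetch origin
git merge origin/main
git push

# alternative: keep a linear history (rewrites commits, so a force push is required)
git rebase -i origin/main
git push -f
```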

## Compilation Guidelines

For compilation details, refer to the [Compilation Documentation](../deploy/compile.md). To avoid problems caused by operating system and tool-version differences, we recommend compiling OpenMLDB inside the compilation image. Since compiling all of OpenMLDB requires significant disk space, we recommend using `OPENMLDB_BUILD_TARGET` to build only the parts you need, as sketched below.
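
A minimal sketch, assuming you compile inside the image described in the compilation documentation (the image name and tag below are illustrative, not authoritative):
```
# on the host: start a shell in the compilation image with the source tree mounted
docker run -it -v "$(pwd)":/OpenMLDB 4pdosc/hybridsql:latest bash

# inside the container: build only the target you need instead of the whole project
cd /OpenMLDB
make OPENMLDB_BUILD_TARGET=openmldb
```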
1 change: 0 additions & 1 deletion docs/en/developer/index.rst
@@ -10,4 +10,3 @@ Developers
built_in_function_develop_guide
sdk_develop
python_dev
udf_develop_guide
19 changes: 17 additions & 2 deletions docs/en/developer/python_dev.md
@@ -2,9 +2,19 @@

There are two modules in `python/`: Python SDK and an OpenMLDB diagnostic tool.

## SDK

The Python SDK itself does not depend on the pytest and tox libraries that are used for testing. If you want to run the tests in the `tests` directory, you can install the testing dependencies in either of the following ways:

```
pip install 'openmldb[test]'
pip install 'dist/....whl[test]'
```

### Testing Method

Run `make SQL_PYSDK_ENABLE=ON OPENMLDB_BUILD_TARGET=cp_python_sdk_so` in the root directory and make sure the library in `python/openmldb_sdk/openmldb/native/` is the latest native library. Testing typically requires connecting to an OpenMLDB cluster. If you haven't started a cluster yet, or if you've made code changes to the service components, you also need to compile the `openmldb` target and start a onebox cluster. You can refer to the launch section of `steps/test_python.sh` for guidance.

1. Package installation test: Install the compiled `whl`, then run `pytest tests/`. You can use the script `steps/test_python.sh` directly.
2. Dynamic test: Make sure OpenMLDB is not installed via `pip` (neither the released package nor the compiled `whl`). Run `pytest tests/` in `python/openmldb_sdk`; this way you can debug easily.
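
A minimal sketch of the package installation test, assuming the wheel has already been built into `python/openmldb_sdk/dist/`:
```
# run from the repository root: refresh the native library shipped with the SDK
make SQL_PYSDK_ENABLE=ON OPENMLDB_BUILD_TARGET=cp_python_sdk_so

# install the compiled wheel together with its testing dependencies, then run the tests
cd python/openmldb_sdk
pip install "$(echo dist/*.whl)[test]"
pytest tests/
```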

@@ -32,6 +42,11 @@ If the python log messages are required in all tests(even successful tests), ple
pytest -so log_cli=true --log-cli-level=DEBUG tests/
```

You can also run the tool in module mode, which is suitable for testing the actual runtime behavior.
```
python -m diagnostic_tool.diagnose ...
```

## Conda

If you use conda, `pytest` may find the wrong Python and report errors like `ModuleNotFoundError: No module named 'IPython'`. Please use `python -m pytest` instead.
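
For example, inside the activated conda environment, the SDK tests can be run with:
```
python -m pytest tests/
```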
27 changes: 19 additions & 8 deletions docs/en/developer/sdk_develop.md
@@ -9,22 +9,19 @@ The OpenMLDB SDK can be divided into several layers, as shown in the figure. The
The bottom layer is the SDK core layer, which is implemented as [SQLClusterRouter](https://github.com/4paradigm/OpenMLDB/blob/b6f122798f567adf2bb7766e2c3b81b633ebd231/src/sdk/sql_cluster_router.h#L110). It is the core layer of **client**. All operations on OpenMLDB clusters can be done by using the methods of `SQLClusterRouter` after proper configuration.

Three core methods of this layer that developers may need to use are:

1. [ExecuteSQL](https://github.com/4paradigm/OpenMLDB/blob/b6f122798f567adf2bb7766e2c3b81b633ebd231/src/sdk/sql_cluster_router.h#L160) supports the execution of all SQL commands, including DDL, DML and DQL.
2. [ExecuteSQLParameterized](https://github.com/4paradigm/OpenMLDB/blob/b6f122798f567adf2bb7766e2c3b81b633ebd231/src/sdk/sql_cluster_router.h#L166) supports parameterized SQL.
3. [ExecuteSQLRequest](https://github.com/4paradigm/OpenMLDB/blob/b6f122798f567adf2bb7766e2c3b81b633ebd231/src/sdk/sql_cluster_router.h#L156) is the special method for the OpenMLDB-specific execution mode: [Online Request mode](../tutorial/modes.md#4-the-online-request-mode).


Other methods, such as CreateDB, DropDB, and DropTable, have not been removed yet for historical reasons. Developers do not need to be concerned with them.

### Wrapper Layer
Due to the complexity of the SDK Layer implementation, we did not develop the Java and Python SDKs from scratch; instead, they use Java and Python to call the **SDK Layer**. Specifically, we built a wrapper layer using SWIG.

Java Wrapper is implemented as [SqlClusterExecutor](https://github.com/4paradigm/OpenMLDB/blob/main/java/openmldb-jdbc/src/main/java/com/_4paradigm/openmldb/sdk/impl/SqlClusterExecutor.java). It is a simple wrapper of `sql_router_sdk`, covering the conversion of input types, the encapsulation of returned results, and the encapsulation of returned errors.

Python Wrapper is implemented as [OpenMLDBSdk](https://github.com/4paradigm/OpenMLDB/blob/main/python/openmldb/sdk/sdk.py). Like the Java Wrapper, it is a simple wrapper as well.



### User Layer
Although the Wrapper Layer can be used directly, it is not convenient enough. Therefore, we developed another layer: the User Layer of the Java/Python SDK.

@@ -36,7 +33,8 @@ The Python User Layer supports the `sqlalchemy`. See [sqlalchemy_openmldb](https

We want an easier-to-use C++ SDK that does not need a Wrapper Layer.
Therefore, in theory, developers only need to design and implement the User Layer, which calls the SDK Layer.

However, for code reuse, the SDK Layer code may need to change to some extent, or the core SDK code structure may need to be adjusted (for example, exposing part of the SDK Layer header files).

## Details of SDK Layer

Expand All @@ -48,7 +46,6 @@ The first two methods are using two options, which create a server connecting Cl
```
These two methods, which do not expose the metadata-related DBSDK, are suitable for ordinary users. The underlying layers of the Java and Python SDKs also use these two approaches.


Another way is to create it based on DBSDK:
```
explicit SQLClusterRouter(DBSDK* sdk);
@@ -85,4 +82,18 @@ If you only want to run Java tests, try the commands below:
```
mvn test -pl openmldb-jdbc -Dtest="SQLRouterSmokeTest"
mvn test -pl openmldb-jdbc -Dtest="SQLRouterSmokeTest#AnyMethod"
```

### Batchjob Test

Batchjob tests can be run as follows:
```
$SPARK_HOME/bin/spark-submit --master local --class com._4paradigm.openmldb.batchjob.ImportOfflineData --conf spark.hadoop.hive.metastore.uris=thrift://localhost:9083 --conf spark.openmldb.zk.root.path=/openmldb --conf spark.openmldb.zk.cluster=127.0.0.1:2181 openmldb-batchjob/target/openmldb-batchjob-0.6.5-SNAPSHOT.jar load_data.txt true
```

Alternatively, you can copy the compiled openmldb-batchjob JAR file into the `lib` directory of the TaskManager in the OpenMLDB cluster, and then use the client or the TaskManager client to send commands for testing, as sketched below.
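
A minimal sketch of connecting the SQL client to the cluster used above (the binary path is illustrative; the ZooKeeper address and root path match the `spark-submit` example):
```
# adjust the binary path to your deployment
./bin/openmldb --zk_cluster=127.0.0.1:2181 --zk_root_path=/openmldb --role=sql_client
```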

When using Hive as a data source, make sure the metastore service is available. For local testing, you can start the metastore service from the Hive directory; the default address is `thrift://localhost:9083`.
```
bin/hive --service metastore
```
216 changes: 0 additions & 216 deletions docs/en/developer/udf_develop_guide.md

This file was deleted.

2 changes: 1 addition & 1 deletion docs/en/integration/deploy_integration/index.rst
@@ -1,5 +1,5 @@
=============================
Dispatch
=============================

.. toctree::
2 changes: 1 addition & 1 deletion docs/en/integration/index.rst
@@ -1,5 +1,5 @@
=============================
Upstream and Downstream Ecology
=============================

.. toctree::
2 changes: 1 addition & 1 deletion docs/en/integration/online_datasources/index.rst
@@ -1,5 +1,5 @@
=============================
Online Data Source
=============================

.. toctree::