Docs: Reindexing Doc Updation #18840

Merged (2 commits) on Dec 5, 2024
@@ -21,3 +21,90 @@ Perform the reindexing process described below.
{% partial file="/v1.5/deployment/reindex.md" /%}

**Note:** If you continue to experience issues, consider reinstalling the search application.

## Configuration Parameters for Reindexing

This section describes each reindexing parameter and gives best practices for tuning it. Well-chosen values keep reindexing efficient and reliable and reduce the risk of system bottlenecks.

{% image
src="/images/v1.5/deployment/upgrade/reindex.png"
alt="Reindex Configuration"
/%}

### 1. `recreateIndex`
**Description**:
Determines whether to recreate the index from scratch during the reindexing process. Setting this to `true` will drop the existing index and create a new one.

**Best Practice**:
Use this option with caution. Set it to `true` only when you need a clean slate, such as after significant changes to your data model or during data migration. For routine updates, keep it `false` to preserve the existing index.

### 2. `batchSize`
**Description**:
Defines the maximum number of events sent in a single batch during reindexing. The default value is `100`.

**Best Practice**:
Adjust the batch size based on system capabilities and event size. A larger batch size generally improves throughput but increases memory usage and the time needed to process each batch. Monitor performance and fine-tune accordingly.
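
To make the effect of `batchSize` concrete, here is a minimal Python sketch of count-based batching. It is an illustration only, not the OpenMetadata implementation; the function name `chunk_by_count` is hypothetical.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunk_by_count(items: Sequence[T], batch_size: int = 100) -> Iterator[List[T]]:
    """Yield consecutive batches holding at most batch_size items each.

    Hypothetical helper: it only illustrates what a batch-size cap means.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# 250 events with the documented default of 100 per batch.
batches = list(chunk_by_count(list(range(250)), batch_size=100))
print([len(batch) for batch in batches])  # [100, 100, 50]
```

Raising `batch_size` reduces the number of round trips, at the cost of holding more events in memory per batch.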

### 3. `payLoadSize`
**Description**:
Specifies the maximum payload size (in bytes) for events sent in a batch. Default: `104857600` bytes (100 MiB).

**Best Practice**:
Ensure the payload size is within your system’s handling capacity. If memory issues or timeouts occur, reduce this value to improve stability.
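
The sketch below shows one way a byte cap and a count cap could interact when batches are built. It is an assumption about the general technique, not the actual reindexing code: `chunk_by_payload` is a hypothetical name, and measuring size as the UTF-8 length of the JSON encoding is a simplification.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List


def chunk_by_payload(
    docs: Iterable[Dict[str, Any]],
    batch_size: int = 100,
    max_payload_bytes: int = 104_857_600,
) -> Iterator[List[Dict[str, Any]]]:
    """Yield batches that respect both a document-count cap and a byte cap.

    Hypothetical helper. A single document larger than max_payload_bytes is
    still emitted, alone in its own batch.
    """
    batch: List[Dict[str, Any]] = []
    batch_bytes = 0
    for doc in docs:
        doc_bytes = len(json.dumps(doc).encode("utf-8"))
        is_full = len(batch) >= batch_size
        too_big = batch_bytes + doc_bytes > max_payload_bytes
        if batch and (is_full or too_big):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(doc)
        batch_bytes += doc_bytes
    if batch:
        yield batch


# Each sample document serializes to 61 bytes, so a 150-byte cap fits two per batch.
docs = [{"id": i, "body": "x" * 40} for i in range(5)]
sizes = [len(b) for b in chunk_by_payload(docs, batch_size=100, max_payload_bytes=150)]
print(sizes)  # [2, 2, 1]
```

Whichever limit is reached first closes the batch, which is why lowering the payload size can help even when the batch size is left unchanged.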

### 4. `producerThreads`
**Description**:
Indicates the number of threads used for producing events during reindexing. Default: `10`.

**Best Practice**:
Balance the number of threads with system CPU and I/O capacity. Increasing this number can improve throughput but may lead to contention if set too high.
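
As an illustration of a fixed pool of producer threads, the following Python sketch uses the standard library's `ThreadPoolExecutor`. The function `build_document` is a hypothetical stand-in for whatever work a real producer does; none of this is taken from the OpenMetadata code base.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def build_document(entity_id: int) -> Dict[str, int]:
    """Hypothetical stand-in for reading an entity and building its search document."""
    return {"id": entity_id, "checksum": entity_id * 2}


def produce_documents(entity_ids: List[int], producer_threads: int = 10) -> List[Dict[str, int]]:
    """Build documents using at most producer_threads worker threads."""
    with ThreadPoolExecutor(max_workers=producer_threads) as pool:
        # map() returns results in input order, regardless of completion order.
        return list(pool.map(build_document, entity_ids))


print(produce_documents([1, 2, 3], producer_threads=2))
```

More workers help only while CPU and I/O have spare capacity; beyond that point the extra threads mostly wait on each other.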

### 5. `maxConcurrentRequests`
**Description**:
Specifies the maximum number of concurrent requests sent to the search index at any given time. Default: `100`.

**Best Practice**:
Tune this value based on the indexing server’s capacity. Too many concurrent requests can overwhelm the server, leading to failures or slowdowns.
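
A cap on in-flight requests is commonly implemented with a semaphore. The Python sketch below shows that general technique under stated assumptions: `RequestLimiter` and `fake_bulk_request` are hypothetical names, and the sleep stands in for a network call to the search server.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, TypeVar

T = TypeVar("T")


class RequestLimiter:
    """Allow at most max_concurrent_requests calls to run at the same time."""

    def __init__(self, max_concurrent_requests: int = 100) -> None:
        self._slots = threading.BoundedSemaphore(max_concurrent_requests)
        self._lock = threading.Lock()
        self._in_flight = 0
        self.peak_in_flight = 0  # highest concurrency observed so far

    def send(self, request_fn: Callable[[], T]) -> T:
        with self._slots:  # blocks while every slot is taken
            with self._lock:
                self._in_flight += 1
                self.peak_in_flight = max(self.peak_in_flight, self._in_flight)
            try:
                return request_fn()
            finally:
                with self._lock:
                    self._in_flight -= 1


def fake_bulk_request() -> str:
    """Hypothetical request: sleeps briefly instead of calling a search server."""
    time.sleep(0.01)
    return "ok"


limiter = RequestLimiter(max_concurrent_requests=4)
with ThreadPoolExecutor(max_workers=16) as pool:
    results = list(pool.map(lambda _: limiter.send(fake_bulk_request), range(32)))
print(limiter.peak_in_flight <= 4)  # True
```

Sixteen threads compete for four slots here, so the limit, not the thread count, decides how much load reaches the server.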

### 6. `maxRetries`
**Description**:
Specifies the maximum number of retry attempts for failed requests. Default: `3`.

**Best Practice**:
Keep this value low, since every retry adds load to a server that is already failing. Analyze failure patterns before raising it.
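
The following Python sketch shows a bounded retry loop. It assumes that `maxRetries` counts retries after the first attempt, so the default of 3 allows up to four calls in total; the source does not state this, and `call_with_retries` is a hypothetical helper.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_retries(request_fn: Callable[[], T], max_retries: int = 3) -> T:
    """Call request_fn once, then retry up to max_retries more times on failure.

    Assumption: max_retries excludes the initial attempt.
    """
    last_error: Optional[Exception] = None
    for _attempt in range(max_retries + 1):
        try:
            return request_fn()
        except Exception as error:  # a real client would catch narrower errors
            last_error = error
    raise RuntimeError(f"gave up after {max_retries} retries") from last_error


# Hypothetical request that fails twice before succeeding.
attempts = {"count": 0}


def flaky_request() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("search server unavailable")
    return "indexed"


print(call_with_retries(flaky_request, max_retries=3), attempts["count"])  # indexed 3
```

The loop has no delay between attempts; in practice retries are spaced out using the backoff parameters described in the following sections.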

### 7. `initialBackoff`
**Description**:
Defines the initial backoff time (in milliseconds) before retrying a failed request. Default: `1000` ms (1 second).

**Best Practice**:
Start with the default value. Increase it if failures occur frequently due to server overload or network issues.

### 8. `maxBackoff`
**Description**:
Specifies the maximum backoff time (in milliseconds) for retries. Default: `10000` ms (10 seconds).

**Best Practice**:
Set this value to align with your application’s latency tolerance. A longer backoff can reduce system load during peak times but may slow recovery from errors.
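
To show how `maxRetries`, `initialBackoff`, and `maxBackoff` could combine, the Python sketch below computes a retry delay schedule. It assumes a doubling (exponential) backoff capped at the maximum; the source names only the initial and maximum values, so the growth factor is an assumption, and `backoff_delays_ms` is a hypothetical helper.

```python
from typing import List


def backoff_delays_ms(
    max_retries: int = 3,
    initial_backoff_ms: int = 1000,
    max_backoff_ms: int = 10_000,
) -> List[int]:
    """Return the wait before each retry, doubling each time up to the cap.

    Assumption: the delay doubles per retry; the real policy may differ.
    """
    return [
        min(initial_backoff_ms * 2 ** attempt, max_backoff_ms)
        for attempt in range(max_retries)
    ]


print(backoff_delays_ms())               # [1000, 2000, 4000]
print(backoff_delays_ms(max_retries=6))  # [1000, 2000, 4000, 8000, 10000, 10000]
```

Under this assumption the documented defaults never reach the cap: three retries wait 7 seconds in total, and `maxBackoff` only matters once `maxRetries` or `initialBackoff` is raised.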

### 9. `queueSize`
**Description**:
Defines the internal queue size used for reindexing operations. Default: `100`.

**Best Practice**:
Adjust the queue size based on expected load and available memory resources. A larger queue can handle spikes in processing but requires more memory.
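
A bounded queue between producers and the indexing step is a common way to apply backpressure. The Python sketch below illustrates the idea with the standard library's `queue.Queue`; `run_pipeline` is a hypothetical name and the consumer simply collects items instead of indexing them.

```python
import queue
import threading
from typing import Any, List


def run_pipeline(items: List[Any], queue_size: int = 100) -> List[Any]:
    """Pass items through a bounded queue from one producer to one consumer."""
    work: "queue.Queue[Any]" = queue.Queue(maxsize=queue_size)
    indexed: List[Any] = []
    stop = object()  # sentinel telling the consumer to finish

    def consumer() -> None:
        while True:
            item = work.get()
            if item is stop:
                break
            indexed.append(item)  # stand-in for sending the item to the index

    worker = threading.Thread(target=consumer)
    worker.start()
    for item in items:
        work.put(item)  # blocks while the queue already holds queue_size items
    work.put(stop)
    worker.join()
    return indexed


print(len(run_pipeline(list(range(1000)), queue_size=5)))  # 1000
```

A small queue makes the producer wait sooner, which limits memory use; a larger queue absorbs bursts but holds more items in memory at once.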

## Example Configuration for Best Practices

For high-performance systems, consider the following values as a starting point:

```json
{
  "batchSize": 300,
  "queueSize": 500,
  "producerThreads": 20,
  "maxConcurrentRequests": 500
}
```

Monitor system performance and adjust these parameters to optimize throughput and resource usage.
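
Because the example sets only four parameters, the rest keep their defaults. The Python sketch below merges such a partial configuration onto the defaults documented above and applies a few sanity checks. It is an illustration, not part of OpenMetadata: `resolve_config` and the validation rules are assumptions, and `recreateIndex` is passed through unvalidated because no default is documented for it.

```python
import json
from typing import Any, Dict

# Defaults as listed in the parameter descriptions above.
DOCUMENTED_DEFAULTS: Dict[str, int] = {
    "batchSize": 100,
    "payLoadSize": 104_857_600,
    "producerThreads": 10,
    "maxConcurrentRequests": 100,
    "maxRetries": 3,
    "initialBackoff": 1000,
    "maxBackoff": 10_000,
    "queueSize": 100,
}


def resolve_config(overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Merge overrides onto the documented defaults and sanity-check the result."""
    known = set(DOCUMENTED_DEFAULTS) | {"recreateIndex"}
    unknown = sorted(set(overrides) - known)
    if unknown:
        raise ValueError(f"unknown reindexing parameters: {unknown}")
    config = {**DOCUMENTED_DEFAULTS, **overrides}
    for name in DOCUMENTED_DEFAULTS:
        value = config[name]
        minimum = 0 if name == "maxRetries" else 1  # assumption: zero retries is valid
        if isinstance(value, bool) or not isinstance(value, int) or value < minimum:
            raise ValueError(f"{name} must be an integer >= {minimum}, got {value!r}")
    if config["initialBackoff"] > config["maxBackoff"]:
        raise ValueError("initialBackoff must not exceed maxBackoff")
    return config


tuned = resolve_config(json.loads(
    '{"batchSize": 300, "queueSize": 500, "producerThreads": 20, "maxConcurrentRequests": 500}'
))
print(tuned["batchSize"], tuned["maxRetries"])  # 300 3
```

Rejecting unknown keys catches misspellings such as `payloadSize` for `payLoadSize` before a long reindexing run starts.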
Binary file modified openmetadata-docs/images/v1.5/deployment/upgrade/reindex-ES.png
Binary file modified openmetadata-docs/images/v1.6/deployment/upgrade/reindex-ES.png