Query Performance Improvement: Tag Retrieval

Data Specification

Members: 10 records
Categories: 100 records (10 per member)
Tags: 2000 records (200 per member)
Templates: 100,000 records (10,000 per member)
Source Codes: 100,000 to 500,000 records (1 to 5 randomly generated per template)

Test Conditions

Each of 10 threads executed the request 100 times
Total of 1,000 requests executed
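The test conditions above can be reproduced with a small harness along the following lines. This is a minimal sketch, not the project's actual test code: `LoadTestSketch`, `run`, and the `Runnable` standing in for the real request are hypothetical names. It assumes "total elapsed time" means the sum of the per-request times, which is consistent with total = average × request count in the measurements reported in this document.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

final class LoadTestSketch {

    /** Aggregated result of one load-test run. */
    record Result(long requestCount, long totalElapsedMs) {
        long averageElapsedMs() {
            return requestCount == 0 ? 0 : totalElapsedMs / requestCount;
        }
    }

    /**
     * Runs the request callsPerThread times on each of the given number of threads
     * and sums the elapsed time of every individual call.
     */
    static Result run(int threads, int callsPerThread, Runnable request) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicLong count = new AtomicLong();
        AtomicLong totalNanos = new AtomicLong();

        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < callsPerThread; i++) {
                    long start = System.nanoTime();
                    request.run(); // placeholder for the real API call
                    totalNanos.addAndGet(System.nanoTime() - start);
                    count.incrementAndGet();
                }
            });
        }

        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.HOURS)) {
                throw new IllegalStateException("load test did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("load test was interrupted", e);
        }
        return new Result(count.get(), TimeUnit.NANOSECONDS.toMillis(totalNanos.get()));
    }

    public static void main(String[] args) {
        // 10 threads x 100 calls = 1,000 requests; the no-op lambda is a stand-in request.
        Result result = run(10, 100, () -> { });
        System.out.println("Total request count: " + result.requestCount());
        System.out.println("Total elapsed time: " + result.totalElapsedMs() + " ms");
        System.out.println("Average elapsed time: " + result.averageElapsedMs() + " ms");
    }
}
```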

Template Generation Conditions

Tags used: 20 tags (all existing tags)
Source Codes: 2 codes

Before Optimization

Speed Measurement

Total request count: 1000
Total elapsed time: 5,268,308 ms
Average elapsed time: 5,268 ms

Query Analysis

Total queries executed: 2 + (number of tags)

1. Retrieve Templates (based on MemberId)

Repository: TemplateJpaRepository
Method: findByMemberId

    SELECT
        t1_0.id,
        t1_0.category_id,
        t1_0.created_at,
        t1_0.description,
        (SELECT COUNT(*) 
         FROM likes 
         WHERE likes.template_id = t1_0.id),
        t1_0.member_id,
        t1_0.modified_at,
        t1_0.title 
    FROM
        template t1_0 
    WHERE
        t1_0.member_id = ?

Number of Calls: 1 time

2. Retrieve Tag List for Template (based on TemplateId)

Repository: TemplateTagJpaRepository
Method: findDistinctByTemplateIn

    SELECT
        DISTINCT tt1_0.tag_id 
    FROM
        template_tag tt1_0 
    WHERE
        tt1_0.template_id IN (?, ?, ?, ?) # as many as the number of templates

Number of Calls: 1 time

3. Retrieve Tag Information (based on TagId)

Repository: TagJpaRepository
Method: fetchById

    SELECT
        t1_0.id,
        t1_0.created_at,
        t1_0.modified_at,
        t1_0.name 
    FROM
        tag t1_0 
    WHERE
        t1_0.id = ?

Number of Calls: 200 times (as many as the number of tags)
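To make the query count concrete, the three steps above can be modeled in memory with a counter that is incremented on every simulated repository call. This is only an illustrative sketch: `QueryCountSketch` and its methods are hypothetical stand-ins for the repositories, and the maps replace the real tables.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class QueryCountSketch {

    // In-memory stand-ins for the template, template_tag, and tag tables.
    private final Map<Long, List<Long>> templateIdsByMember = new HashMap<>();
    private final Map<Long, List<Long>> tagIdsByTemplate = new HashMap<>();
    private final Map<Long, String> tagNameById = new HashMap<>();
    private int queryCount = 0; // incremented once per simulated query

    /** Builds member 1; template i is tagged with tag ((i - 1) % tagCount) + 1. */
    static QueryCountSketch sample(int templateCount, int tagCount) {
        QueryCountSketch db = new QueryCountSketch();
        for (long tagId = 1; tagId <= tagCount; tagId++) {
            db.tagNameById.put(tagId, "tag-" + tagId);
        }
        List<Long> templateIds = new ArrayList<>();
        for (long templateId = 1; templateId <= templateCount; templateId++) {
            templateIds.add(templateId);
            db.tagIdsByTemplate.put(templateId, List.of((templateId - 1) % tagCount + 1));
        }
        db.templateIdsByMember.put(1L, templateIds);
        return db;
    }

    List<Long> findTemplateIdsByMemberId(long memberId) { // step 1: one query
        queryCount++;
        return templateIdsByMember.getOrDefault(memberId, List.of());
    }

    Set<Long> findDistinctTagIdsByTemplateIn(List<Long> templateIds) { // step 2: one query
        queryCount++;
        Set<Long> tagIds = new LinkedHashSet<>();
        for (long templateId : templateIds) {
            tagIds.addAll(tagIdsByTemplate.getOrDefault(templateId, List.of()));
        }
        return tagIds;
    }

    String fetchTagNameById(long tagId) { // step 3: one query per tag
        queryCount++;
        return tagNameById.get(tagId);
    }

    /** The flow before optimization: 2 queries plus one query per distinct tag. */
    List<String> findTagNamesBefore(long memberId) {
        List<Long> templateIds = findTemplateIdsByMemberId(memberId);
        Set<Long> tagIds = findDistinctTagIdsByTemplateIn(templateIds);
        List<String> names = new ArrayList<>();
        for (long tagId : tagIds) {
            names.add(fetchTagNameById(tagId));
        }
        return names;
    }

    int queryCount() {
        return queryCount;
    }

    public static void main(String[] args) {
        QueryCountSketch db = sample(1_000, 200);
        List<String> names = db.findTagNamesBefore(1L);
        System.out.println(names.size() + " tags, " + db.queryCount() + " queries");
    }
}
```

With 200 distinct tags, the model issues 2 + 200 = 202 queries for a single request.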

Necessary Tasks for Improvement

Query Optimization

1st Improvement: Covering Index

Covering Index

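A covering index is an index that contains all the data required to satisfy a query.

A query is covered when every column it uses in SELECT, WHERE, ORDER BY, GROUP BY, and so on is part of the index, so the database can answer it from the index alone without reading the table rows.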

The current logic retrieves all templates for a given member ID.
However, the code that follows uses only the template IDs.

We will therefore change the query to retrieve only the template IDs.
Because the query then touches only indexed columns, it can be served by a covering index, which improves its performance.
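The effect can be illustrated with an in-memory analogy. The sketch below is not how a database is implemented; it assumes a secondary index on `member_id` whose entries also carry the primary key `id` (as InnoDB secondary indexes do), and all class and method names are hypothetical. Selecting full rows needs one row lookup per match, while selecting only the IDs is answered from the index alone.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class CoveringIndexSketch {

    record TemplateRow(long id, long memberId, String title, String description) {}

    // The "table": full rows, reachable by primary key.
    private final Map<Long, TemplateRow> rowsById = new HashMap<>();
    // The "secondary index": member_id -> primary keys of matching rows.
    private final Map<Long, List<Long>> idsByMemberId = new HashMap<>();
    private int rowLookups = 0; // how often a full row had to be read

    void insert(TemplateRow row) {
        rowsById.put(row.id(), row);
        idsByMemberId.computeIfAbsent(row.memberId(), k -> new ArrayList<>()).add(row.id());
    }

    /** All columns requested: the index finds the ids, then every row must be read. */
    List<TemplateRow> findByMemberId(long memberId) {
        List<TemplateRow> result = new ArrayList<>();
        for (long id : idsByMemberId.getOrDefault(memberId, List.of())) {
            rowLookups++;
            result.add(rowsById.get(id));
        }
        return result;
    }

    /** Only the id requested: the index already holds everything the query needs. */
    List<Long> findAllIdsByMemberId(long memberId) {
        return List.copyOf(idsByMemberId.getOrDefault(memberId, List.of()));
    }

    int rowLookups() {
        return rowLookups;
    }

    public static void main(String[] args) {
        CoveringIndexSketch table = new CoveringIndexSketch();
        for (long id = 1; id <= 5; id++) {
            table.insert(new TemplateRow(id, 1L, "title-" + id, "description"));
        }
        table.findAllIdsByMemberId(1L);
        System.out.println("row lookups after id-only query: " + table.rowLookups());
        table.findByMemberId(1L);
        System.out.println("row lookups after full-row query: " + table.rowLookups());
    }
}
```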

Query

    @Query("""
        SELECT t.id
        FROM Template t
        WHERE t.member.id = :memberId
    """)
    List<Long> findAllIdsByMemberId(Long memberId);

Proof of Covering Index Usage

Before

After

2nd Improvement: Covering Index + Subquery Instead of an IN List

A very large IN list can degrade query performance. In our code, the query that retrieves template tags receives the full list of template IDs in its IN clause (currently 100,000).

We will improve this by using a subquery.
A subquery is a query nested inside the main query. Replacing the literal ID list with a subquery means the application no longer has to send every ID to the database; the database resolves the IDs itself and can read them from indexed columns.

With the subquery, the two steps (retrieving template IDs by member ID and retrieving template tags) are combined into a single query.
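The difference in statement shape can be sketched as follows. This is an illustration only: `InClauseSketch` and its methods are hypothetical, the SQL strings are hand-written approximations of what the repositories generate, and nothing is sent to a database. The IN-list form needs one bind parameter per template ID, while the subquery form needs a single parameter regardless of how many templates the member has.

```java
import java.util.Collections;

final class InClauseSketch {

    /** IN-list form: one "?" placeholder per template ID supplied by the application. */
    static String inListQuery(int templateCount) {
        if (templateCount < 1) {
            throw new IllegalArgumentException("templateCount must be at least 1");
        }
        String placeholders = String.join(", ", Collections.nCopies(templateCount, "?"));
        return "SELECT DISTINCT tt.tag_id FROM template_tag tt "
                + "WHERE tt.template_id IN (" + placeholders + ")";
    }

    /** Subquery form: the database resolves the template IDs from the member ID. */
    static String subqueryQuery() {
        return "SELECT DISTINCT tt.tag_id FROM template_tag tt "
                + "WHERE tt.template_id IN "
                + "(SELECT t.id FROM template t WHERE t.member_id = ?)";
    }

    /** Counts the bind parameters in a statement. */
    static long countParameters(String sql) {
        return sql.chars().filter(c -> c == '?').count();
    }

    public static void main(String[] args) {
        // 10,000 is the number of templates per member in the data specification.
        String inList = inListQuery(10_000);
        String subquery = subqueryQuery();
        System.out.println("IN list:  " + countParameters(inList) + " parameters, "
                + inList.length() + " characters");
        System.out.println("Subquery: " + countParameters(subquery) + " parameter, "
                + subquery.length() + " characters");
    }
}
```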

Reference: SQL IN Clause Tuning

3rd Improvement: Tag Information Retrieval

Previously, after retrieving the tag IDs for the member's templates from the template tags, we queried the tag table once per tag ID. As a result, the tag retrieval query ran as many times as there were tags.

To solve this problem, we merge the template tag retrieval and the tag retrieval into a single query.

    @Query("""
        SELECT DISTINCT t
        FROM Tag t
        WHERE t.id IN (
            SELECT DISTINCT tt.id.tagId
            FROM TemplateTag tt
            WHERE tt.id.templateId IN
                (SELECT te.id FROM Template te WHERE te.member.id = :memberId)
        )
    """)
    List<Tag> findDistinctTagNameByMemberIdIn(Long memberId);
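For readers who want to check what the nested query returns, its semantics can be mirrored in plain Java over in-memory lists. This is a sketch of the query's meaning under simplified, hypothetical record types (`Template`, `TemplateTag`, `Tag`), not the project's entities and not how the database executes it.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

final class MergedTagQuerySketch {

    record Template(long id, long memberId) {}
    record TemplateTag(long templateId, long tagId) {}
    record Tag(long id, String name) {}

    static List<Tag> findDistinctTagsByMemberId(
            long memberId, List<Template> templates, List<TemplateTag> templateTags, List<Tag> tags) {
        // Innermost subquery: IDs of the member's templates.
        Set<Long> templateIds = templates.stream()
                .filter(t -> t.memberId() == memberId)
                .map(Template::id)
                .collect(Collectors.toSet());
        // Middle subquery: distinct tag IDs attached to those templates.
        Set<Long> tagIds = templateTags.stream()
                .filter(tt -> templateIds.contains(tt.templateId()))
                .map(TemplateTag::tagId)
                .collect(Collectors.toSet());
        // Outer query: the tag rows themselves, each at most once.
        return tags.stream()
                .filter(t -> tagIds.contains(t.id()))
                .toList();
    }

    public static void main(String[] args) {
        List<Template> templates = List.of(new Template(1, 10), new Template(2, 10), new Template(3, 20));
        List<TemplateTag> templateTags = List.of(
                new TemplateTag(1, 100), new TemplateTag(2, 100),
                new TemplateTag(2, 101), new TemplateTag(3, 102));
        List<Tag> tags = List.of(new Tag(100, "java"), new Tag(101, "spring"), new Tag(102, "sql"));

        // Member 10 owns templates 1 and 2, which reference tags 100 and 101.
        System.out.println(findDistinctTagsByMemberId(10, templates, templateTags, tags));
    }
}
```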

After Optimization

Speed Measurement

1st Improvement

Total request count: 1000
Total elapsed time: 3,632,279 ms
Average elapsed time: 3,632 ms

2nd Improvement

Total request count: 1000
Total elapsed time: 2,704,116 ms
Average elapsed time: 2,704 ms

3rd Improvement

Total request count: 1000
Total elapsed time: 92,743 ms
Average elapsed time: 92 ms
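The relative gain of each step can be derived from the totals above. The sketch below only performs that arithmetic on the measured figures (the baseline is the 5,268,308 ms total from before optimization); `SpeedupSketch` and its methods are hypothetical helper names.

```java
final class SpeedupSketch {

    /** How many times faster the optimized run is compared with the baseline. */
    static double speedup(long beforeMs, long afterMs) {
        return (double) beforeMs / afterMs;
    }

    /** Percentage of the baseline elapsed time that was eliminated. */
    static double reductionPercent(long beforeMs, long afterMs) {
        return 100.0 * (beforeMs - afterMs) / beforeMs;
    }

    public static void main(String[] args) {
        long baseline = 5_268_308L; // total elapsed time before optimization (ms)
        long[] totals = {3_632_279L, 2_704_116L, 92_743L}; // 1st, 2nd, 3rd improvement (ms)

        for (int i = 0; i < totals.length; i++) {
            System.out.printf("Improvement %d: %.1fx faster, %.1f%% less elapsed time%n",
                    i + 1, speedup(baseline, totals[i]), reductionPercent(baseline, totals[i]));
        }
    }
}
```

By this calculation the final version is roughly 57 times faster than the baseline, and most of that gain comes from the 3rd improvement.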

Query Analysis

Total queries executed: 1

4. Retrieve Tag Information (based on MemberId)

Repository: TemplateTagJpaRepository
Method: findDistinctTagNameByMemberIdIn

    SELECT
        DISTINCT t1_0.id,
        t1_0.created_at,
        t1_0.modified_at,
        t1_0.name 
    FROM
        tag t1_0 
    WHERE
        t1_0.id IN (SELECT
            DISTINCT tt1_0.tag_id 
        FROM
            template_tag tt1_0 
        WHERE
            tt1_0.template_id IN (SELECT
                t2_0.id 
            FROM
                template t2_0 
            WHERE
                t2_0.member_id = ?))
