[Issue 918] [Refactor] Remove the clearMessageQueuesCh in partitionConsumer.dispatcher() #921
Conversation
}
close(doneCh)
clearQueueCb(nextMessageInQueue)
Did you forget to reset the available permits?
Hi @nodece, I don't think it's necessary to reset the available permits. On the user side, if the permits exceed the threshold, internalFlow will be invoked.
pulsar-client-go/pulsar/consumer_partition.go
Lines 196 to 208 in 1d3499a
if ap >= flowThreshold {
	availablePermits := ap
	requestedPermits := ap
	// check if permits changed
	if !atomic.CompareAndSwapInt32(&p.permits, ap, 0) {
		return
	}
	p.pc.log.Debugf("requesting more permits=%d available=%d", requestedPermits, availablePermits)
	if err := p.pc.internalFlow(uint32(requestedPermits)); err != nil {
		p.pc.log.WithError(err).Error("unable to send permits")
	}
}
On the broker side, I checked the relevant code and didn't find any need to reset it. The Java client does not reset it either. So I think there is no need to reset the available permits. I don't understand why the legacy code reset it; I guess it was just a measure taken for safety.
@@ -138,15 +138,14 @@ type partitionConsumer struct {
 	// the size of the queue channel for buffering messages
 	queueSize int32
 	queueCh   chan []*message
-	startMessageID trackingMessageID
+	startMessageID atomicMessageID
Could we use atomic.Value{} instead of this?
I think atomicMessageID is better than atomic.Value{} because it's simpler and clearer. For example, if we used atomic.Value as the startMessageID type, the original usage of startMessageID would look like this:
// original L985
return pc.startMessageID.greater(msgID.messageID)
// atomicMessageID
return pc.startMessageID.get().greater(msgID.messageID)
// atomic.Value{}
return pc.startMessageID.Load().(trackingMessageID).greater(msgID.messageID)
atomic.Value{} needs one more type assertion at every call site. But atomicMessageID may need a better name and declaration position. Do you have any ideas? Thanks.
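For illustration, a thin typed wrapper over atomic.Value confines the assertion to a single method, which is the trade-off being discussed. This is only a sketch under assumed names (messageID stands in for the client's trackingMessageID; the wrapper is hypothetical, not the PR's actual code):

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// messageID is a stand-in for the client's trackingMessageID.
type messageID struct{ ledgerID, entryID int64 }

// greater compares two IDs lexicographically by (ledgerID, entryID).
func (m messageID) greater(other messageID) bool {
	if m.ledgerID != other.ledgerID {
		return m.ledgerID > other.ledgerID
	}
	return m.entryID > other.entryID
}

// atomicMessageID wraps atomic.Value so the type assertion lives in
// exactly one place; callers always get a typed messageID back.
type atomicMessageID struct{ v atomic.Value }

func (a *atomicMessageID) set(id messageID) { a.v.Store(id) }
func (a *atomicMessageID) get() messageID   { return a.v.Load().(messageID) }

func main() {
	var start atomicMessageID
	start.set(messageID{ledgerID: 1, entryID: 5})
	// Call sites stay as readable as with a plain field.
	fmt.Println(start.get().greater(messageID{ledgerID: 1, entryID: 3})) // true
}
```

Either way the assertion exists; the wrapper just keeps it out of the dozens of call sites that read startMessageID.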
LGTM
pulsar/consumer_test.go
Outdated
@@ -3177,6 +3177,7 @@ func TestConsumerSeekByTimeOnPartitionedTopic(t *testing.T) {
 	// should be able to consume all messages once again
 	for i := 0; i < N; i++ {
 		msg, err := consumer.Receive(ctx)
+		fmt.Println(string(msg.Payload()) + "-" + strconv.Itoa(i))
fmt.Println(string(msg.Payload()) + "-" + strconv.Itoa(i))
Got it. Thanks!
LGTM
Force-pushed …nsumer.dispatcher() from 91996c4 to 3afd2d4
Master Issue: #918
Motivation
Of the two mechanisms, clearMessageQueuesCh and clearQueueCb, we only need to keep one. For more details please check #918.
This PR does not only aim to clean up; it also fixes a potential bug in clearMessageQueuesCh. For example, clearMessageQueuesCh does the job of clearing messageCh, but that can cause a problem when SeekByTime is invoked on a partitioned topic.
pulsar-client-go/pulsar/consumer_impl.go
Lines 614 to 626 in 1d3499a
pulsar-client-go/pulsar/consumer_partition.go
Lines 1168 to 1175 in 1d3499a
When consuming a partitioned topic, all the partitionConsumers share the same messageCh. After SeekByTime on a partitioned topic, messageCh may be cleared more than once, which can cause message loss. Suppose partitionConsumer-1 has already cleared its messageCh and queueCh; when partitionConsumer-2 does its clear job, it can still execute this logic:
pulsar-client-go/pulsar/consumer_partition.go
Lines 1172 to 1174 in 1d3499a
But messageCh is a shared channel, so partitionConsumer-1 may have received new messages and put them into messageCh at that very moment. There is a possibility that partitionConsumer-2 clears those new messages from messageCh.
Modifications
- Remove clearMessageQueuesCh in partitionConsumer.dispatcher()
- Keep clearQueueCb in partitionConsumer.dispatcher()
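The message-loss scenario described in the motivation can be reproduced with a plain shared channel: after one consumer drains it, a second redundant drain swallows any message another producer has pushed in between. A minimal, deterministic sketch (hypothetical names; no real client types involved):

```go
package main

import "fmt"

// drain empties a shared channel without blocking and returns how many
// messages it discarded; this mirrors the clearing loop run per
// partition consumer in the legacy clearMessageQueuesCh handling.
func drain(ch chan string) int {
	n := 0
	for {
		select {
		case <-ch:
			n++
		default:
			return n
		}
	}
}

func main() {
	messageCh := make(chan string, 10) // shared by all partition consumers

	messageCh <- "old-1"
	messageCh <- "old-2"
	fmt.Println("first drain discards:", drain(messageCh)) // 2

	// partitionConsumer-1 has already finished seeking and delivers a
	// fresh message into the shared channel...
	messageCh <- "fresh-after-seek"

	// ...which partitionConsumer-2's redundant drain now destroys.
	fmt.Println("second drain discards:", drain(messageCh)) // 1
	fmt.Println("messages left for the user:", len(messageCh)) // 0
}
```

Clearing only the per-partition queueCh (via clearQueueCb) and draining the shared messageCh exactly once avoids this window.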
Verifying this change