
csv, grok, json, nginx parsers: add multithreading #709

Merged (5 commits, merged Aug 28, 2018)
Conversation

redHJ (Collaborator) commented Aug 15, 2018

@redHJ redHJ force-pushed the pdr-6966 branch 3 times, most recently from 7fe4d33 to cec6521 Compare August 15, 2018 08:45
@redHJ changed the title from "[WIP] csv, grok, json, nginx, syslog parsers: add multithreading" to "[WIP] csv, grok, json, nginx parsers: add multithreading" Aug 15, 2018
@redHJ changed the title from "[WIP] csv, grok, json, nginx parsers: add multithreading" to "csv, grok, json, nginx, syslog parsers: add multithreading" Aug 15, 2018
@redHJ changed the title from "csv, grok, json, nginx, syslog parsers: add multithreading" to "csv, grok, json, nginx parsers: add multithreading" Aug 15, 2018
@redHJ redHJ force-pushed the pdr-6966 branch 2 times, most recently from f1ff457 to 706b030 Compare August 15, 2018 09:09
logkit.go (outdated)

```diff
@@ -239,8 +239,9 @@ func main() {
 		times.AddLayout(conf.TimeLayouts)
 	}
 	if conf.MaxProcs == 0 {
-		conf.MaxProcs = runtime.NumCPU()
+		conf.MaxProcs = NumCpu
```
Contributor:

s/NumCpu/NumCPU/

```diff
@@ -39,6 +41,7 @@ type Parser struct {
 	allmoreStartNUmber int
 	allowNotMatch      bool
 	ignoreInvalid      bool
+	routineNumber      int
```
Contributor:

routineNumber sounds like some kind of task ID; suggest renaming it to numRoutines, here and everywhere else.

```diff
@@ -9,6 +9,10 @@ import (

 	"github.com/qiniu/log"

+	"sync"
+
+	"sort"
```
Contributor:

Group these imports properly.
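What the reviewer is asking for is the standard goimports grouping: standard-library imports in one block, third-party modules in another, separated by a blank line. A sketch of the regrouped block (package clause omitted):

```go
import (
	// standard library
	"sort"
	"sync"

	// third-party
	"github.com/qiniu/log"
)
```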

@redHJ redHJ force-pushed the pdr-6966 branch 3 times, most recently from 2f2d5a6 to 8911b8b Compare August 17, 2018 08:53
```diff
@@ -60,6 +61,11 @@ const (
 	InputNumber = "inputNumber"
 )

+var (
+	MaxProcs = 0
```
Contributor:

Initialize this to 1.

utils/utils.go (outdated)

```diff
@@ -10,3 +12,15 @@ func IsExist(path string) bool {
 	_, err := os.Stat(path)
 	return err == nil || os.IsExist(err)
 }

+func GetTestData(line string) []string {
```
Contributor:

What this function actually does is copy the data up to the default maxBatchSize, so it should be named after that behavior. It is also mainly used for testing, which should be noted in its doc comment.

```diff
@@ -95,6 +98,10 @@ func NewParser(c conf.MapConf) (parser.Parser, error) {
 	}
 	allmoreStartNumber, _ := c.GetIntOr(parser.KeyCSVAllowMoreStartNum, 0)
 	ignoreInvalid, _ := c.GetBoolOr(parser.KeyCSVIgnoreInvalidField, false)
+	numRoutine := MaxProcs
+	if numRoutine == 0 {
```
Contributor:

In theory MaxProcs can never be 0; if it is, something has gone wrong somewhere. So rather than letting numRoutine run wild here, just set it to 1 to stay safe.
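The suggested fallback is small; a runnable sketch (MaxProcs and the zero check come from the diff, the helper name is mine):

```go
package main

import "fmt"

// MaxProcs mirrors the package-level variable in the diff; in the real
// code it is populated from conf.MaxProcs at startup.
var MaxProcs = 0

// numRoutines returns MaxProcs, but falls back to 1 when it is unset:
// a zero value means something went wrong upstream, and one worker
// goroutine is the safe, predictable choice.
func numRoutines() int {
	if MaxProcs <= 0 {
		return 1
	}
	return MaxProcs
}

func main() {
	fmt.Println(numRoutines()) // MaxProcs is still 0, so this prints 1
	MaxProcs = 4
	fmt.Println(numRoutines())
}
```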

```diff
@@ -451,6 +463,15 @@ func (p *Parser) Rename(datas []Data) []Data {
 	return newData
 }

+func (p *Parser) RenameData(data Data) Data {
```
Contributor:

The Rename above should call this RenameData method. Also, this method doesn't need to hang off (p *Parser); as a free function it is easier to test too.
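A sketch of what the reviewer suggests: a standalone renameData that Rename simply maps over the batch. Data matches logkit's map type, but the renames parameter is a hypothetical stand-in for the parser's configured rename rules, not the actual field:

```go
package main

import "fmt"

// Data matches the map type used throughout logkit parsers.
type Data map[string]interface{}

// renameData rewrites keys of a single record according to renames.
// It needs no parser state, so keeping it off (p *Parser) makes it
// trivial to unit-test.
func renameData(data Data, renames map[string]string) Data {
	out := make(Data, len(data))
	for k, v := range data {
		if nk, ok := renames[k]; ok {
			k = nk
		}
		out[k] = v
	}
	return out
}

// rename maps renameData over the batch, as the reviewer suggests
// Rename should do.
func rename(datas []Data, renames map[string]string) []Data {
	newData := make([]Data, 0, len(datas))
	for _, d := range datas {
		newData = append(newData, renameData(d, renames))
	}
	return newData
}

func main() {
	fmt.Println(rename([]Data{{"old": 1}}, map[string]string{"old": "new"}))
}
```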

```go
	return dataSlice, nil
}

func (p *Parser) parseLine(sendChan chan parser.ParseInfo, resultChan chan parser.ParseResult, wg *sync.WaitGroup) {
```
Contributor:

This sendChan should be called dataPipeline: the far end sends, but this end actually receives. The channel type can also be declared receive-only: <-chan.

Contributor:

Also, why write a separate one here? I see there is already one in the parser package.

redHJ (Collaborator, author):

This parser is different from the others: it returns a slice while the others don't, because JSON can yield an array.

unknwon (Contributor) commented Aug 21, 2018:

LGTM

```go
// labels must not overwrite existing data; other parsers skip this check
// because their Schema is fixed, while JSON's Schema is not
if _, ok := data[l.Name]; ok {
	continue
```

```go
func (p *Parser) parse(line string) (dataSlice []Data, err error) {
```
Contributor:

Why return a slice?

redHJ (Collaborator, author):

Because the JSON itself may be an array.
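The point is easy to see with encoding/json: a line can hold either one object or an array of objects, so the parser's natural return type is a slice. A sketch (Data and parseJSON are simplified stand-ins, not logkit's actual code):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Data matches the map type used by logkit parsers.
type Data map[string]interface{}

// parseJSON returns a slice because the input line may be either a
// single JSON object or a JSON array of objects: try the array form
// first, then fall back to wrapping a lone object.
func parseJSON(line string) ([]Data, error) {
	raw := []byte(line)
	var datas []Data
	if err := json.Unmarshal(raw, &datas); err == nil {
		return datas, nil
	}
	var single Data
	if err := json.Unmarshal(raw, &single); err != nil {
		return nil, err
	}
	return []Data{single}, nil
}

func main() {
	one, _ := parseJSON(`{"a":1}`)
	many, _ := parseJSON(`[{"a":1},{"a":2}]`)
	fmt.Println(len(one), len(many)) // 1 2
}
```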

@redHJ changed the title from "csv, grok, json, nginx parsers: add multithreading" to "[WIP] csv, grok, json, nginx parsers: add multithreading" Aug 23, 2018
@redHJ redHJ force-pushed the pdr-6966 branch 2 times, most recently from 5cde0ce to 1f0b5d4 Compare August 27, 2018 08:15
@redHJ changed the title from "[WIP] csv, grok, json, nginx parsers: add multithreading" to "csv, grok, json, nginx parsers: add multithreading" Aug 27, 2018
redHJ (Collaborator, author) commented Aug 27, 2018:

Ubuntu parse test: 1 GB of data per parser, parse time in nanoseconds.

| Parser | 1 routine | 2 routines | 3 routines | 4 routines |
| --- | --- | --- | --- | --- |
| GROK | 1226077872156 | 713335285905 | 503750868591 | 395376283339 |
| CSV | 262033208580 | 183709294532 | 141773371734 | 121622516601 |
| NGINX | 258334012559 | 188298916476 | 147130715494 | 126473431218 |
| JSON | 112820240001 | 69955550437 | 51142855868 | 43761123498 |

Throughput scales well with the routine count: at 4 routines GROK is about 3.1x faster and JSON about 2.6x faster than single-routine parsing.

@redHJ redHJ force-pushed the pdr-6966 branch 4 times, most recently from 93eadfd to 8939a44 Compare August 27, 2018 09:56
```go
for resultInfo := range resultChan {
	parseResultSlice = append(parseResultSlice, resultInfo)
}
sort.Stable(parseResultSlice)
```
Contributor:

Add a check at every sort site: if routine == 1, skip the sort.

@redHJ redHJ force-pushed the pdr-6966 branch 2 times, most recently from 8752cdb to 5cb09b8 Compare August 28, 2018 03:20
```go
for resultInfo := range resultChan {
	parseResultSlice = append(parseResultSlice, resultInfo)
}
if len(parseResultSlice) > 1 {
```
Contributor:

This condition is wrong.

Contributor:

It should check the number of routines, not the slice: the slice length reflects the size of the parsed batch, and a single routine can produce a very long slice that is nonetheless already in order.
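Combining both review points (skip the sort, and key the decision on the routine count rather than the slice length), a sketch with simplified types:

```go
package main

import (
	"fmt"
	"sort"
)

// ParseResult carries the original line index so order can be restored
// after parallel parsing (simplified from the parser package).
type ParseResult struct {
	Index int
	Data  string
}

// ParseResultSlice implements sort.Interface, ordering by Index.
type ParseResultSlice []ParseResult

func (s ParseResultSlice) Len() int           { return len(s) }
func (s ParseResultSlice) Less(i, j int) bool { return s[i].Index < s[j].Index }
func (s ParseResultSlice) Swap(i, j int)      { s[i], s[j] = s[j], s[i] }

// restoreOrder sorts only when more than one routine produced results:
// a single goroutine drains the pipeline in order, so its output is
// already sorted no matter how long the slice is.
func restoreOrder(results ParseResultSlice, numRoutines int) ParseResultSlice {
	if numRoutines > 1 {
		sort.Stable(results)
	}
	return results
}

func main() {
	fmt.Println(restoreOrder(ParseResultSlice{{2, "c"}, {0, "a"}, {1, "b"}}, 2))
}
```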

```diff
-	if err != nil {
+	if parseResult.Err != nil {
+		log.Debug(parseResult.Err)
```
Contributor:

Delete this.

```go
se.AddSuccess()

if parseResult.Err != nil {
	log.Debug(parseResult.Err)
```
Contributor:

Delete this debug line.

```go
	Line:  parseInfo.Line,
	Index: parseInfo.Index,
}
continue
```
Contributor:

Do empty lines still need to be passed along? Is that for the index?

redHJ (Collaborator, author):

Yes, DatasourceSkipIndex needs to be recorded.
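A sketch of why empty lines stay in the stream: they carry their Index through so skipped positions can be recorded (DatasourceSkipIndex in the thread); the types and function here are simplified stand-ins, not logkit's actual code:

```go
package main

import (
	"fmt"
	"strings"
)

// ParseResult is a simplified stand-in; Empty marks lines that carried
// no payload but whose position must still be accounted for.
type ParseResult struct {
	Line  string
	Index int
	Empty bool
}

// classify keeps empty lines in the result stream, flagged rather than
// dropped, so their indices survive for bookkeeping.
func classify(lines []string) (results []ParseResult, skipped []int) {
	for i, l := range lines {
		if strings.TrimSpace(l) == "" {
			results = append(results, ParseResult{Index: i, Empty: true})
			skipped = append(skipped, i)
			continue
		}
		results = append(results, ParseResult{Line: l, Index: i})
	}
	return results, skipped
}

func main() {
	_, skipped := classify([]string{"a", "", "b"})
	fmt.Println(skipped) // the empty line sits at index 1
}
```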

wonderflow (Contributor):

LGTM

@wonderflow wonderflow merged commit b110855 into qiniu:master Aug 28, 2018
@redHJ redHJ deleted the pdr-6966 branch January 24, 2019 09:01