How to create an index for a field of a sub-struct in Thrift #3

packageyao · 2012-10-31T10:50:17Z

I have these Thrift structs:

struct A
{
1: string a1,
......
}

struct B
{
1: i32 b1,
2: A a,
......
}

If I use Pig to load the data file, how can I create an index on a.a1, and how can I filter blocks with the condition "a.a1 == '1234'"?

The data file uses the base64-encoded line LZO format.

Thanks.

dvryaboy · 2012-10-31T18:40:18Z

That's exactly how you refer to it:

stuff = load ...;
filtered_stuff = filter stuff by a.a1 == '1234';

Does this not work? Could you post the script you are using and the error you get?
Could you also post the result of running "describe" on the relation you are trying to filter?

packageyao · 2012-11-01T04:28:17Z

Thrift file:

struct Company
{
1:required string Id,
2:required string name,
3:required string address,
4:required string tele,
}

struct Person
{
1:required string ID,
2:required string name,
3:required byte age,
4:required Company company,
5:required string phone,
}

Pig script:

T1 = LOAD 'data_dir' USING com.twitter.elephanttwin.retrieval.IndexedPigLoader('com.twitter.elephantbird.pig.load.ThriftPigLoader', 'Person', 'index_dir');
T2 = FILTER T1 BY company.address=='address_12';
DUMP T2;

If I create an index for Person::name, I can use the index in Pig and get the correct result.

I also want to create an index for Person::Company::address, so I modified the source code. The index-creation job now picks up the value of Person::Company::address, and the partition key is "company.address". But when I run the script above, Pig scans all the blocks instead of only the indexed blocks that contain the record I want.

I read the Pig source code and found that the setPartitionFilter method is not invoked, so the index is not used.

I am using Pig 0.8.1.

Can you give me some advice? Thanks.

dvryaboy · 2012-11-02T00:09:41Z

Ah I see what's happening. I think this is a Pig bug -- it needs to push down the filter, but nested relations confuse it. I don't see any reason Elephant-Twin wouldn't be able to support it if Pig can push it. Could you open a Jira with Apache Pig?

packageyao · 2012-11-02T07:56:33Z

I have now written my own Pig loader. It takes the filter expression as an extra field and passes the expression directly to the InputFormat.
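
A minimal sketch of the idea behind this workaround, written in Python for brevity rather than as real Pig loader code. Every class, method, and field name below is hypothetical; none of it is Pig or Elephant-Twin API. It only illustrates how handing the filter straight to the input format lets the index prune blocks even when Pig never pushes the filter down.

```python
from dataclasses import dataclass, field


@dataclass
class BlockIndex:
    """Hypothetical index: maps (column, value) to ids of blocks holding that value."""
    entries: dict = field(default_factory=dict)

    def add(self, column, value, block_id):
        self.entries.setdefault((column, value), set()).add(block_id)

    def lookup(self, column, value):
        return sorted(self.entries.get((column, value), set()))


@dataclass
class IndexedInputFormat:
    """Hypothetical stand-in for the input format, which decides which blocks are read."""
    index: BlockIndex
    all_blocks: list
    filter_expr: tuple = None  # (column, value); stays None if no filter arrives

    def blocks_to_read(self):
        if self.filter_expr is None:
            # No filter reached the input format, so every block is scanned.
            return list(self.all_blocks)
        column, value = self.filter_expr
        # A filter is present, so only the indexed blocks are read.
        return self.index.lookup(column, value)


class FilteringLoader:
    """Hypothetical stand-in for the custom loader.

    The filter is given to the loader explicitly and forwarded to the
    input format, instead of waiting for the planner to push it down.
    """

    def __init__(self, input_format, filter_expr=None):
        self.input_format = input_format
        self.input_format.filter_expr = filter_expr

    def planned_blocks(self):
        return self.input_format.blocks_to_read()
```

With an index built on the nested key "company.address", a loader constructed without a filter plans a scan of every block, while one constructed with ("company.address", "address_12") plans only the blocks recorded for that value.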
