How to create an index for a field of a sub-struct in Thrift #3
Comments
That's exactly how you refer to it: stuff = load ...; Does this not work? Could you post the script you are using and the error you get?
The Thrift file defines struct Company and struct Person.
Pig script:
T1 = LOAD 'data_dir' USING com.twitter.elephanttwin.retrieval.IndexedPigLoader('com.twitter.elephantbird.pig.load.ThriftPigLoader', 'Person', 'index_dir');
If I create an index for Person::name, I can use the index in Pig and get the correct result. I also want to create an index for Person::Company::address, so I modified the source code. In the index-creation job I can get the value of Person::Company::address, and the partition key is "company.address". But when I run the script above, Pig scans all the blocks instead of only the indexed blocks that contain the record I want. I read the Pig source code and found that the setPartitionFilter method is not invoked, so the index is not used. I am using Pig 0.8.1. Can you give me some advice? Thanks.
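To make the idea in this comment concrete, here is a minimal, self-contained sketch of a block-level index keyed by a dotted path such as "company.address". It is not Elephant-Twin's code: the class `NestedFieldIndex`, its methods, and the use of nested maps as a stand-in for Thrift structs are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch; none of these names come from Elephant-Twin or Pig.
final class NestedFieldIndex {

    // Walk a dotted path such as "company.address" through a record that is
    // modelled as nested maps (a stand-in for a Thrift struct).
    static Object resolve(Map<String, Object> record, String path) {
        Object current = record;
        for (String part : path.split("\\.")) {
            if (!(current instanceof Map)) {
                return null; // the path goes deeper than the record does
            }
            current = ((Map<?, ?>) current).get(part);
        }
        return current;
    }

    // Build a mapping from field value to the ids of the blocks containing it.
    // blocks.get(i) holds the records stored in block i.
    static Map<Object, Set<Integer>> build(
            List<List<Map<String, Object>>> blocks, String path) {
        Map<Object, Set<Integer>> index = new HashMap<>();
        for (int blockId = 0; blockId < blocks.size(); blockId++) {
            for (Map<String, Object> record : blocks.get(blockId)) {
                Object value = resolve(record, path);
                if (value != null) {
                    index.computeIfAbsent(value, v -> new TreeSet<>()).add(blockId);
                }
            }
        }
        return index;
    }
}
```

Looking up a value in the resulting map gives the set of blocks worth reading. Building this map is only half of the problem described here: the reader still has to receive the filter in order to consult it.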
Ah, I see what's happening. I think this is a Pig bug: it needs to push down the filter, but nested relations confuse it. I don't see any reason Elephant-Twin wouldn't be able to support it if Pig can push it. Could you open a Jira with Apache Pig?
I have now written my own Pig loader. The loader takes an extra field for the filter expression and passes that expression directly to the input format.
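The workaround in this comment could look roughly like the sketch below. The poster's actual loader is not shown in the thread, so everything here is an assumption: `FilterPassThrough`, the config key `example.index.filter`, and `java.util.Properties` standing in for the job configuration are hypothetical, and only simple equality filters of the form `path == 'value'` are handled.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not the poster's loader and not a Pig or Elephant-Twin API.
final class FilterPassThrough {

    // Assumed config key; Properties stands in for the job configuration.
    static final String FILTER_KEY = "example.index.filter";

    // Matches expressions such as a.a1=='1234' or company.address == 'x'.
    private static final Pattern EQUALITY =
            Pattern.compile("\\s*([A-Za-z_][\\w.]*)\\s*==\\s*'([^']*)'\\s*");

    // Loader side: validate the expression and store it in the configuration.
    static void setFilter(Properties conf, String expression) {
        if (!EQUALITY.matcher(expression).matches()) {
            throw new IllegalArgumentException("unsupported filter: " + expression);
        }
        conf.setProperty(FILTER_KEY, expression);
    }

    // Input-format side: return the block ids to read, or null when no usable
    // filter or index exists and a full scan is required.
    // indexes maps a key path to its (value -> block ids) index.
    static Set<Integer> blocksToRead(
            Properties conf, Map<String, Map<String, Set<Integer>>> indexes) {
        String expression = conf.getProperty(FILTER_KEY);
        if (expression == null) {
            return null;
        }
        Matcher m = EQUALITY.matcher(expression);
        if (!m.matches()) {
            return null;
        }
        Map<String, Set<Integer>> index = indexes.get(m.group(1));
        if (index == null) {
            return null; // no index was built for this key path
        }
        return index.getOrDefault(m.group(2), Collections.emptySet());
    }
}
```

Passing the expression through the configuration sidesteps the planner entirely, so it works even when the filter is never pushed down. The trade-off is that the filter must be given to the loader as well as written in the script.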
I have the following Thrift structs:
struct A
{
1: string a1,
......
}
struct B
{
1: i32 b1,
2: A a,
......
}
If I use Pig to load the data file, how can I create an index for a.a1, and how can I filter blocks using the statement "a.a1 == '1234'"?
The data file uses the base64 line LZO format.
Thanks.