exportFlattenedVector does not support nested encoding and non-scalar types #9821

rui-mo · 2024-05-15T06:41:52Z

Description

In 9830814, flattenDictionary and flattenConstant are set to true for Parquet writes, which rely on the Arrow Bridge to convert Velox vectors into Arrow arrays. When VectorFuzzer generates nested dictionary-encoded vectors or vectors of non-scalar types, the export to Arrow fails at the checks below.

velox/velox/vector/arrow/Bridge.cpp, lines 884 to 889 in dc561a3:

    VELOX_CHECK(
        vec.valueVector() == nullptr || vec.wrappedVector()->isFlatEncoding(),
        "An unsupported nested encoding was found.");
    VELOX_CHECK(vec.isScalar(), "Flattening is only supported for scalar types.");
    VELOX_DYNAMIC_SCALAR_TYPE_DISPATCH(
        flattenAndExport, vec.typeKind(), vec, rows, options, out, pool, holder);
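To make the two preconditions concrete, here is a toy model. It assumes simplified, hypothetical types (ToyVector, innermost and checkFlattenable are illustrative names, not Velox APIs) and approximates wrappedVector() as peeling every wrapping layer down to the innermost vector. Under that assumption, a dictionary over a flat scalar vector passes, while a dictionary over a constant or over an array vector trips the nested-encoding check.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical toy model of vector encodings; NOT the Velox API.
enum class Encoding { kFlat, kConstant, kDictionary, kArray };

struct ToyVector {
  Encoding encoding;
  bool scalar;                      // false for ARRAY/MAP/ROW types
  std::shared_ptr<ToyVector> base;  // wrapped vector, or nullptr if none
};

// Follows wrapped vectors down to the innermost one. This is only an
// approximation of what wrappedVector() does in the quoted check.
const ToyVector* innermost(const ToyVector& vec) {
  const ToyVector* cur = &vec;
  while (cur->base != nullptr) {
    cur = cur->base.get();
  }
  return cur;
}

// Mirrors the two VELOX_CHECKs guarding exportFlattenedVector.
// Returns an empty string on success, otherwise the failure message.
std::string checkFlattenable(const ToyVector& vec) {
  if (vec.base != nullptr && innermost(vec)->encoding != Encoding::kFlat) {
    return "An unsupported nested encoding was found.";
  }
  if (!vec.scalar) {
    return "Flattening is only supported for scalar types.";
  }
  return "";
}
```

In this model, a dictionary wrapping an array vector fails on the first check before the scalar check is reached, which matches the observation that dictionary-encoded complex types cannot be exported with flattenDictionary enabled.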


mbasmanova · 2024-05-15T14:04:04Z

CC: @Yuhta

@rui-mo Does this imply that ParquetWriter cannot create files for tables with columns of type array/map/struct?

rui-mo · 2024-05-16T06:24:30Z

@mbasmanova I think we cannot create Parquet files for tables with complex types only when the vector is dictionary-encoded. Otherwise, complex types are supported by the Bridge, as shown below.

velox/velox/vector/arrow/Bridge.cpp, lines 985 to 1001 in e2c0014:

    case VectorEncoding::Simple::ROW:
      exportRows(
          *vec.asUnchecked<RowVector>(), rows, options, out, pool, *holder);
      break;
    case VectorEncoding::Simple::ARRAY:
      exportArrays(
          *vec.asUnchecked<ArrayVector>(), rows, options, out, pool, *holder);
      break;
    case VectorEncoding::Simple::MAP:
      exportMaps(
          *vec.asUnchecked<MapVector>(), rows, options, out, pool, *holder);
      break;
    case VectorEncoding::Simple::DICTIONARY:
      options.flattenDictionary
          ? exportFlattenedVector(vec, rows, options, out, pool, *holder)
          : exportDictionary(vec, rows, options, out, pool, *holder);
      break;

mbasmanova · 2024-05-16T09:49:52Z

@rui-mo If the Parquet writer can handle all types but only flat encodings, then we can simply flatten the data in the Fuzzer before writing it to Parquet.
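A minimal sketch of the suggested workaround, assuming a hypothetical toy model of scalar vectors (ToyVector, valueAt and flatten are illustrative names, not Velox APIs): every dictionary or constant layer is resolved row by row into a single flat vector before the data reaches the writer, so exportFlattenedVector's preconditions are never exercised.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <vector>

// Hypothetical toy model of encoded scalar vectors; NOT the Velox API.
struct ToyVector {
  enum class Encoding { kFlat, kConstant, kDictionary };
  Encoding encoding;
  std::vector<int64_t> values;      // kFlat: one per row; kConstant: exactly one
  std::vector<int32_t> indices;     // kDictionary: row -> index into base
  std::shared_ptr<ToyVector> base;  // kDictionary: wrapped (possibly encoded) vector
  std::size_t size;                 // logical number of rows
};

// Returns the value of logical row `row`, resolving any number of
// dictionary layers and a constant at any level.
int64_t valueAt(const ToyVector& vec, std::size_t row) {
  switch (vec.encoding) {
    case ToyVector::Encoding::kFlat:
      return vec.values.at(row);
    case ToyVector::Encoding::kConstant:
      return vec.values.at(0);
    case ToyVector::Encoding::kDictionary:
      return valueAt(*vec.base, static_cast<std::size_t>(vec.indices.at(row)));
  }
  throw std::logic_error("unknown encoding");
}

// Copies every logical row into a new flat vector, so a writer that only
// accepts flat input never sees nested encodings.
ToyVector flatten(const ToyVector& vec) {
  ToyVector out{ToyVector::Encoding::kFlat, {}, {}, nullptr, vec.size};
  out.values.reserve(vec.size);
  for (std::size_t row = 0; row < vec.size; ++row) {
    out.values.push_back(valueAt(vec, row));
  }
  return out;
}
```

Resolving rows recursively handles arbitrarily deep nesting at the cost of one full copy of the data, which is usually acceptable in a fuzzer. Complex types would need the same treatment applied to their child vectors, which this scalar-only sketch omits.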

rui-mo · 2024-05-16T09:54:55Z

@mbasmanova Got it. I will try as you suggested. Thanks.

rui-mo · 2024-07-02T12:51:41Z

This issue can be resolved by flattening the data before writing it to Parquet.

rui-mo added the "enhancement" (New feature or request) label May 15, 2024

rui-mo mentioned this issue May 15, 2024 in #9559, "build: Enable Spark query runner as reference in aggregation fuzzer test" (open)

rui-mo closed this as completed Jul 2, 2024

rui-mo mentioned this issue Jul 4, 2024 in #10397, "An unsupported nested encoding was found." (open)
