Parquet to CSV convert problem #291

ahmadhega · 2020-06-11T08:28:30Z

Hi,
Creating a Parquet file from a CSV file with the parquet-go library and converting it back with the same library works for me. However, when the Parquet file is created with a Python tool:

```python
import pandas as pd

df = pd.read_csv("file.csv")
df.to_parquet("file.parquet")
```

converting it to a CSV file fails.
More specifically, the function `(self *ParquetReader) Read(dstInterface interface{}) error` fails with the error: `runtime error: index out of range [74] with length 4`

Any idea?


xitongsys · 2020-06-15T23:49:31Z

Hi @ahmadhega,
Could you provide a sample Parquet file? It's hard to tell what happened from this error alone.

ahmadhega · 2020-06-24T06:52:26Z

@xitongsys I sent you more details by email ([email protected]).

xitongsys · 2020-06-26T12:19:44Z

Hi @ahmadhega,
I used the following code to read your Parquet file and got no error, so the problem may be in your code.

```go
package main

import (
	"log"

	"github.com/xitongsys/parquet-go-source/local"
	"github.com/xitongsys/parquet-go/reader"
)

type Shoes struct {
	Name *string `parquet:"name=Name, type=UTF8, encoding=PLAIN_DICTIONARY"`
	Size *int64  `parquet:"name=Size, type=INT32, encoding=PLAIN"`
}

func main() {
	// Open the Parquet file from local disk.
	fr, err := local.NewLocalFileReader("shoes_orders.parquet")
	if err != nil {
		log.Println("Can't open file", err)
		return
	}

	// Map rows onto Shoes; the last argument is the number of parallel readers.
	pr, err := reader.NewParquetReader(fr, new(Shoes), 1)
	if err != nil {
		log.Println("Can't create parquet reader", err)
		return
	}

	// Read every row in the file.
	num := int(pr.GetNumRows())
	shoes := make([]Shoes, num)
	if err = pr.Read(&shoes); err != nil {
		log.Println("Read error", err)
	}
	log.Println(shoes)

	// Release reader resources and close the file.
	pr.ReadStop()
	fr.Close()
}
```


ahmadhega changed the title ~~Python Parquet convert problem~~ Parquet convert problem Jun 11, 2020

ahmadhega changed the title ~~Parquet convert problem~~ Parquet to CSV convert problem Jun 11, 2020

xitongsys closed this as completed Aug 1, 2020
