Parquet to CSV convert problem #291

Closed
ahmadhega opened this issue Jun 11, 2020 · 3 comments

Comments

@ahmadhega

Hi,
Creating a Parquet file from a CSV file with the parquet-go library and converting it back with the same library works for me. However, when I create the Parquet file with a Python tool:

import pandas as pd
df = pd.read_csv("file.csv")
df.to_parquet("file.parquet")

converting it back to a CSV file fails.
To be more specific, func (self *ParquetReader) Read(dstInterface interface{}) error fails with the error: "runtime error: index out of range [74] with length 4"

Any idea?

@ahmadhega ahmadhega changed the title Python Parquet convert problem Parquet convert problem Jun 11, 2020
@ahmadhega ahmadhega changed the title Parquet convert problem Parquet to CSV convert problem Jun 11, 2020
@xitongsys
Owner

hi, @ahmadhega
Could you provide a sample Parquet file? It's hard to tell what happened from this error alone.

@ahmadhega
Author

@xitongsys I sent you more details by email ([email protected]).

@xitongsys
Owner

hi, @ahmadhega
I used the following code to read your Parquet file and got no error. Maybe there is an error in your code.

package main

import (
	"log"

	"github.com/xitongsys/parquet-go-source/local"
	"github.com/xitongsys/parquet-go/reader"
)

type Shoes struct {
	Name    *string  `parquet:"name=Name, type=UTF8, encoding=PLAIN_DICTIONARY"`
	Size    *int64   `parquet:"name=Size, type=INT32, encoding=PLAIN"`
}

func main() {
	// Open the Parquet file.
	fr, err := local.NewLocalFileReader("shoes_orders.parquet")
	if err != nil {
		log.Println("Can't open file", err)
		return
	}
	defer fr.Close()

	// Create a reader that maps columns onto the Shoes struct (parallelism 1).
	pr, err := reader.NewParquetReader(fr, new(Shoes), 1)
	if err != nil {
		log.Println("Can't create parquet reader", err)
		return
	}
	defer pr.ReadStop()

	// Read all rows in the file.
	num := int(pr.GetNumRows())
	stus := make([]Shoes, num)
	if err = pr.Read(&stus); err != nil {
		log.Println("Read error", err)
		return
	}
	log.Println(stus)
}
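
For the CSV half of the conversion, here is a minimal sketch using only the standard library. It assumes the rows have already been read into a []Shoes as in the code above. shoesToCSV is a hypothetical helper, not part of parquet-go, and the header names are simply taken from the struct fields.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strconv"
	"strings"
)

// Shoes mirrors the struct used above; the parquet tags are omitted
// because this sketch only covers the CSV step.
type Shoes struct {
	Name *string
	Size *int64
}

// shoesToCSV is a hypothetical helper (not part of parquet-go) that renders
// rows as CSV text. Nil pointers (null values) become empty fields.
func shoesToCSV(rows []Shoes) (string, error) {
	var sb strings.Builder
	w := csv.NewWriter(&sb)

	// Header row, assumed to match the struct field names.
	if err := w.Write([]string{"Name", "Size"}); err != nil {
		return "", err
	}
	for _, r := range rows {
		rec := make([]string, 2)
		if r.Name != nil {
			rec[0] = *r.Name
		}
		if r.Size != nil {
			rec[1] = strconv.FormatInt(*r.Size, 10)
		}
		if err := w.Write(rec); err != nil {
			return "", err
		}
	}

	// Flush buffered output and surface any deferred write error.
	w.Flush()
	if err := w.Error(); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	name, size := "boot", int64(42)
	out, err := shoesToCSV([]Shoes{{Name: &name, Size: &size}, {}})
	if err != nil {
		fmt.Println("CSV error:", err)
		return
	}
	fmt.Print(out)
}
```

The pointer fields are checked for nil before dereferencing, since optional Parquet columns can hold nulls and a blind dereference would panic.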

zolstein pushed a commit to zolstein/parquet-go that referenced this issue Jun 23, 2023
…itongsys#289)

* refactor packages to use encoding.Values container

* refactor page and dictionary creation to use encoding.Values

* go vet fix

* reduce memory footprint of encoding.Values

* refactor encoding.Encoding to use simple Go types

* port parquet-go package to use pair of values+offsets to represent byte arrays

* add fuzz tests back

* optimize DELTA_LENGTH_BYTE_ARRAY decoding (xitongsys#291)

* optimize DELTA_LENGTH_BYTE_ARRAY decoding

* add link to online documentation

* fix

* add a unit test for decodeByteArrayLengths

* Update encoding/delta/length_byte_array_amd64.s

Co-authored-by: Kevin Burke <[email protected]>

* optimize DELTA_LENGTH_BYTE_ARRAY encoding (xitongsys#292)

Co-authored-by: Kevin Burke <[email protected]>

* account for size of offsets buffer when benchmarking throughput

* optimize DELTA_BYTE_ARRAY decoding (xitongsys#294)

* PR feedback

Co-authored-by: Kevin Burke <[email protected]>