- Issue #6, wrong bitshift in RLE Reader's ReadIntOnBytes due to syntax misinterpretation
- Missing method
Schema::GetNotEqualsMessage()
and exception/error message on appending to a parquet file with schema mismatch (#3)
- Improved return type annotation for
Schema::GetDataFields()
(#1) - Added PHP 8.1 to Github Actions workflow (Unit Tests)
- Removed duplicate Github Actions workflow
- Changed namespace to
codename\parquet
- Supports for more data types: Int16 (Short), UInt16 (UShort), UInt32, UInt64 (as far as reasonable)
- Some micro-optimizations (e.g. in StringDataTypeHandler)
- Port of parquet-dotnet issues/PRs #22 #23 (schema decoding bug, backward compat issue)
- Skip some tests if ext-snappy unavailable
- Benchmark: enable snappy compression benchmark if respective extension available/loaded
- parquet-dotnet PR #96 port - Fix reading of plain dictionary with zero length
- parquet-dotnet PR #95 port - Empty page handling
- Replaced dependency "apache/thrift" by "packaged/thrift" to support composer installations under PHP8 without using overrides and implicitly fix some incompatibilities.
- Changed root schema name to
php_parquet_schema
to comply with Avro specs (https://avro.apache.org/docs/current/spec.html#names) and avoid Hadoop failing to read files (see parquet-dotnet issue #93) - Allow dots in field names
- Provide a greater memory_limit (512M) in phpunit.xml defaults for unit testing
- Fixed return type error during Schema Tree traversal when using StructField
- EndToEndTypeTests array evaluation/comparison
- Tests for 'bad data'
- Invalid null count determination (parquet-dotnet issue #87) in DataColumnReader
- ByteArrayDataTypeHandler now really handles byte arrays as arrays of 'bytes' (int arrays in PHP)
- ByteDataTypeHandler improved/fixed according to protocol/specs (single unsigned byte)
- ByteArray datatype-dependent matching
- PHP 8.0 test pipeline / support
- Test for a pending issue from parquet-dotnet (aloneguid/parquet-dotnet#87) (marked as incomplete as it fails, pending verification)
- dropped requirement of 'ext-snappy' and made it a suggestion, as it is only required for using snappy compression.
- dropped requirement of 'mdurrant/php-binary-reader', as it has been replaced with a custom implementation
- (Optional) support for calculating statistics (distinct/unique, null, min and max values per column chunk). NOTE: this has a negative performance impact, but improves cross-implementation compatibility. This has to be enabled via ParquetOptions object: ->CalculateStatistics = true; As these fields are mostly optional, they're skipped in favor of performance, by default. Nulls are calculated anyways.
- Quirks Tests for tracking PHP-specific specialties
- Test for parquet-dotnet issue #81 (passing in our package!)
- Test for multi_page_dictionary_with_nulls
- Ported DateTypeHandlerInterface's plainEncode() & plainDecode() for usage with statistics calculations
- Automatically return DateTimeDataField when specifying DataField::createFromFormat('xyz', 'DateTimeImmutable') (or DateTimeImmutable::class)
- Minor performance optimizations in ThriftFooter
- enabled RLE-encoding handling in DataColumnReader::readColumn (Tests needed!)
- method signature added/changed: DataTypeHandlerInterface::write(SchemaElement $tse, BinaryWriter $writer, array $values, DataColumnStatistics $statistics)
- Changed some EndToEndTypeTests for accomodate to floating point fluctuations
- Issue with some DateTime conversions having discrepancies of one or more days due to non-float-to-int casting in NanoTime