Problem description

Currently the `gcs.open` function returns a `Reader` or `Writer` object, but does not add a `name` field to the object. This creates a problem when opening compressed files in GCS, since smart_open's compression wrapper relies on the `name` field to choose the decompression algorithm. Thankfully, the problem is relatively straightforward to fix by slightly updating the `open` function in `gcs.py`.
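To see why the missing field matters, here is a hedged sketch of how extension-based codec selection typically works. This is illustrative only, not smart_open's actual implementation; `choose_codec` and `_CODECS` are made-up names:

```python
import bz2
import gzip

# Hypothetical extension-to-codec table (illustrative; smart_open's real
# table and dispatch logic live in its compression module).
_CODECS = {'.gz': gzip.open, '.bz2': bz2.open}

def choose_codec(fileobj):
    # Relies entirely on fileobj.name. A Reader/Writer with no ``name``
    # attribute falls through to "no compression", so a .gz blob would be
    # handed back as raw compressed bytes.
    name = getattr(fileobj, 'name', '')
    for ext, codec in _CODECS.items():
        if name.endswith(ext):
            return codec
    return None
```

A file object without a `name` silently bypasses decompression, which is exactly the failure mode described above.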
Here is the current version of gcs.open:
```python
if mode == constants.READ_BINARY:
    return Reader(
        bucket_id,
        blob_id,
        buffer_size=buffer_size,
        line_terminator=constants.BINARY_NEWLINE,
        client=client,
    )
elif mode == constants.WRITE_BINARY:
    return Writer(
        bucket_id,
        blob_id,
        min_part_size=min_part_size,
        client=client,
    )
else:
    raise NotImplementedError('GCS support for mode %r not implemented' % mode)
```
Here is the updated version that would fix the issue:
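The essential change is to construct the file object first, attach a `name` attribute derived from the blob key, and only then return it. A minimal, self-contained sketch of that idea (the `Reader`, `Writer`, and mode constants below are stand-in stubs, not smart_open's real classes, and this is not necessarily the exact patch proposed):

```python
READ_BINARY, WRITE_BINARY = 'rb', 'wb'  # stand-ins for smart_open.constants

class Reader:
    # Stub standing in for smart_open's GCS Reader.
    def __init__(self, bucket_id, blob_id, **kwargs):
        self.bucket_id, self.blob_id = bucket_id, blob_id

class Writer:
    # Stub standing in for smart_open's GCS Writer.
    def __init__(self, bucket_id, blob_id, **kwargs):
        self.bucket_id, self.blob_id = bucket_id, blob_id

def open(bucket_id, blob_id, mode):
    if mode == READ_BINARY:
        fileobj = Reader(bucket_id, blob_id)
    elif mode == WRITE_BINARY:
        fileobj = Writer(bucket_id, blob_id)
    else:
        raise NotImplementedError('GCS support for mode %r not implemented' % mode)
    # The one-line fix: expose the blob key so the compression wrapper can
    # inspect its extension (e.g. '.gz') and pick a decompression codec.
    fileobj.name = blob_id
    return fileobj
```

The only behavioral difference from the current code is the `fileobj.name = blob_id` assignment before returning.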
Looks like Azure suffers from this as well. I didn't look at the other transport mechanisms, but hopefully they all set a `name` field. What is odd is that our compress/decompress tests don't seem to catch this.
I tried to push a branch and open a pull request with the fix, but it looks like I don't have access rights, so I filed this issue instead.