Opening GCS files via `smart_open.open` fails to add the `name` field to the file object #505

todor-markov · 2020-06-04T23:36:28Z

Problem description

Currently the gcs.open function returns a Reader or Writer object, but does not add a name field to the object. This creates a problem when trying to open compressed files in GCS, since smart open's compression wrapper relies on the name field to choose the decompression algorithm. Thankfully, the problem is relatively straightforward to fix by slightly updating the open function in gcs.py:

Here is the current version of gcs.open:

if mode == constants.READ_BINARY:
        return Reader(
            bucket_id,
            blob_id,
            buffer_size=buffer_size,
            line_terminator=constants.BINARY_NEWLINE,
            client=client,
        )
    elif mode == constants.WRITE_BINARY:
        return Writer(
            bucket_id,
            blob_id,
            min_part_size=min_part_size,
            client=client,
        )
    else:
        raise NotImplementedError('GCS support for mode %r not implemented' % mode)

Here is the updated version that would fix the issue:

if mode == constants.READ_BINARY:
        fileobj = Reader(
            bucket_id,
            blob_id,
            buffer_size=buffer_size,
            line_terminator=constants.BINARY_NEWLINE,
            client=client,
        )
    elif mode == constants.WRITE_BINARY:
        fileobj = Writer(
            bucket_id,
            blob_id,
            min_part_size=min_part_size,
            client=client,
        )
    else:
        raise NotImplementedError('GCS support for mode %r not implemented' % mode)

    fileobj.name = blob_id
    return fileobj

I tried to push a branch & open a pull request that makes the fix, but it looks like I don't have access rights, so I made this issue instead.

The text was updated successfully, but these errors were encountered:

mpenkov · 2020-06-05T01:40:58Z

Thank you for reporting this.

I tried to push a branch & open a pull request that makes the fix, but it looks like I don't have access rights, so I made this issue instead.

I think you may be pushing to the wrong location. In github, the way to do what you want to do is:

Fork the repo, so you have a personal copy of the repository under your account.
Push the branch to your fork. Because the fork belongs to you, there are no permission issues.
Make a PR
PROFIT

For more, see this link or Google around.

Do you want to go ahead and try making a PR by yourself? If you haven't done this kind of thing before, it's a great opportunity to start.

petedannemann · 2020-06-05T17:14:40Z

Looks like azure suffers from this as well. I didn't look at other transport mechanisms but hopefully they all have them. What is odd is that our compress / decompress tests don't seem to catch this

mpenkov · 2020-06-06T01:56:00Z

Yeah, sounds like we have a hole in our test coverage.

@petedannemann Are you able to look into it? If not, I will make a separate ticket and deal with it in time.

todor-markov mentioned this issue Jun 5, 2020

Fix bug where opening files in GCS forgets to define a name field #506

Merged

5 tasks

mpenkov closed this as completed in #506 Jun 17, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Opening GCS files via `smart_open.open` fails to add the `name` field to the file object #505

Opening GCS files via `smart_open.open` fails to add the `name` field to the file object #505

todor-markov commented Jun 4, 2020 •

edited

Loading

mpenkov commented Jun 5, 2020

petedannemann commented Jun 5, 2020

mpenkov commented Jun 6, 2020

Opening GCS files via smart_open.open fails to add the name field to the file object #505

Opening GCS files via smart_open.open fails to add the name field to the file object #505

Comments

todor-markov commented Jun 4, 2020 • edited Loading

Problem description

mpenkov commented Jun 5, 2020

petedannemann commented Jun 5, 2020

mpenkov commented Jun 6, 2020

Opening GCS files via `smart_open.open` fails to add the `name` field to the file object #505

Opening GCS files via `smart_open.open` fails to add the `name` field to the file object #505

todor-markov commented Jun 4, 2020 •

edited

Loading