This repository has been archived by the owner on Nov 30, 2023. It is now read-only.

Unzipping cannot handle language encoding bit properly #379

Closed
bdferris opened this issue Sep 26, 2014 · 2 comments

Comments

@bdferris
Contributor

From [email protected] on September 08, 2014 00:54:56

What steps will reproduce the problem?
1. Run the validator against the attached gtfs-nonworking.zip: feedvalidator.exe gtfs-nonworking.zip

What is the expected output?
I expect to see a properly generated summary of the GTFS files.

What do you see instead?
The feed validator crashes.

What version of the product are you using? On what operating system?
transitfeed-windows-binary-1.2.12.zip
Windows 7 Professional 64-bit

Please provide any additional information below.
We have been using the GTFS Feed Validator to check the feeds produced by our system. A couple of weeks ago we noticed that the validator had started to crash on our data, and last week I investigated it for a while.

To me it looks like your tool (or Python?) might have trouble unzipping the zip files.

First, see the output generated by your tool in the attached text file. You can reproduce it by running the tool against gtfs-nonworking.zip.

By contrast, the validator runs successfully against gtfs-working.zip. The data in both archives is exactly the same! If you unzip the nonworking zip file and run the validator against the extracted folder, it works without problems. If you then zip that folder again with the Windows built-in zip tool and run the validator against the new zip, it also works without issues.

So what is the problem with our original zip file? I found that if the language encoding flag (bit 11, UTF-8 names) is set in the zip file's general purpose bit flag, the feed validator cannot process the file.

nonworking:
    general purpose bit flag (0x0808) (bit 15..0):  0000.1000 0000.1000
      file security status  (bit 0):                not encrypted
      extended local header (bit 3):                yes
      UTF-8 names          (bit 11):                yes

working:
    general purpose bit flag (0x0008) (bit 15..0):  0000.0000 0000.1000
      file security status  (bit 0):                not encrypted
      extended local header (bit 3):                yes

Here's a snippet of our Java code that generates the GTFS zip files. The only change between the two attached zip files was commenting or uncommenting the line that sets the language encoding flag to false.

    public void marshal(OutputStream output, Feed feed) throws IOException {
        ZipArchiveOutputStream zos = new ZipArchiveOutputStream(output);
        // zos.setUseLanguageEncodingFlag(false);

        try {
            zos.putArchiveEntry(new ZipArchiveEntry("stops.txt"));
            writers.getWriter("stops.txt").write(feed.getStops(), zos);
            zos.closeArchiveEntry();
...rest of files...
        } finally {
            zos.flush();
            zos.close();
        }
    }

Would it be possible to fix the handling of the language encoding bit on the GTFS Feed Validator side?
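
In the meantime, the re-zipping workaround described above could be scripted rather than done by hand. The following is only a sketch: `repack` is a hypothetical helper, and it relies on the assumption that Python's `zipfile` writer leaves bit 11 clear for entries whose names are plain ASCII, as GTFS file names such as stops.txt are.

```python
import zipfile


def repack(src, dst):
    """Copy every entry of src into a freshly written archive dst.

    src and dst may be paths or seekable file-like objects.
    Hypothetical helper, not part of transitfeed.
    """
    with zipfile.ZipFile(src) as source, \
            zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as target:
        for info in source.infolist():
            # Write by name so a fresh header is generated for each entry.
            target.writestr(info.filename, source.read(info.filename))
```

For example, `repack("gtfs-nonworking.zip", "gtfs-repacked.zip")` would produce a new archive with the same file contents, which can then be passed to the validator.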

Original issue: http://code.google.com/p/googletransitdatafeed/issues/detail?id=379

@bdferris
Contributor Author

From [email protected] on September 08, 2014 18:03:20

(No comment was entered for this change.)

Blocking: googletransitdatafeed:344

@bdferris
Contributor Author

From [email protected] on September 08, 2014 18:08:03

Fixed in r1876.

Status: Fixed
Labels: Language-Python Type-Defect App-FeedValidator

@bdferris bdferris added this to the 1.2.13 milestone Oct 5, 2014