Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Patch: Allow uploading files > 2G https://bugs.php.net/bug.php?id=44522 #372

Merged
merged 5 commits into from
Sep 17, 2013
Merged

Patch: Allow uploading files > 2G https://bugs.php.net/bug.php?id=44522 #372

merged 5 commits into from
Sep 17, 2013

Conversation

ralflang
Copy link
Contributor

Patch: Allow uploading files > 2G https://bugs.php.net/bug.php?id=44522

This is essentially the same as the patch
"uploads_larger_than_2g_HEAD_v2 (last revision 2012-03-26 03:59 UTC) by
jason at infininull dot com)" but using off_t instead of signed long
(originally: uint)

I tested this on 64bit linux and succeeded uploading a file of 4.8 G.
The File did not get corrupted or truncated in any way.

I did not yet test this under windows or 32 bit linux

Note that there are still limitations:

  • Did not test for files > 8 G
  • php does not yet reject absurdly high values
  • Still limited by underlying file system specific limits and free space
  • in upload
  • tmp dir and destination dir

ralflang added 2 commits June 28, 2013 08:32
above 2G.

This is essentially the same as the patch
"uploads_larger_than_2g_HEAD_v2 (last revision 2012-03-26 03:59 UTC) by
jason at infininull dot com)" but using off_t instead of signed long
(originally: uint)

I tested this on 64bit linux and succeeded uploading a file of 4.8 G.
The File did not get corrupted or truncated in any way.

I did not yet test this under windows or 32 bit linux

Note that there are still limitations:

* Did not test for files > 8 G
* php does not yet reject absurdly high values
* Still limited by underlying file system specific limits and free space
* in upload
* tmp dir and destination dir
int boundary_len = 0, total_bytes = 0, cancel_upload = 0, is_arr_upload = 0, array_len = 0;
int max_file_size = 0, skip_upload = 0, anonindex = 0, is_anonymous;
int boundary_len = 0, cancel_upload = 0, is_arr_upload = 0, array_len = 0;
off_t total_bytes = 0, max_file_size = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

max_file_size is later (around line 900) set using

                if (!strcasecmp(param, "MAX_FILE_SIZE")) {
                    max_file_size = atol(value);
                }

this might cause issues on windows where sizeof(long) always equals 4, even on 64 bit systems. In case off_t is a 64 bit integer (or unsized 32 bit) this might lead to unexpected behavior.

Also later, around line 1026 I see

if (PG(upload_max_filesize) > 0 && (long)(total_bytes+blen) > PG(upload_max_filesize)) {
#if DEBUG_FILE_UPLOAD
                    sapi_module.sapi_error(E_NOTICE, "upload_max_filesize of %ld bytes exceeded - file [%s=%s] not saved", PG(upload_max_filesize), param, filename);
#endif
                    cancel_upload = UPLOAD_ERROR_A;
                } else if (max_file_size && ((long)(total_bytes+blen) > max_file_size)) {
#if DEBUG_FILE_UPLOAD
                    sapi_module.sapi_error(E_NOTICE, "MAX_FILE_SIZE of %ld bytes exceeded - file [%s=%s] not saved", max_file_size, param, filename);
#endif
                    cancel_upload = UPLOAD_ERROR_B;
                } else if (blen > 0) {

those casts to long seem to be wrong after this patch, probably casting to off_t is better %ld should be double checked (while that's debug code only)

Again for Windows: Line 1222 or so reads

                  file_size.value.lval = total_bytes;

lval is a long. long on Windows always is 32bit. Probably we need (unprecise) double or string representation for files with total_bytes > maxint

This is simpler if off_t on Win64 is 32bits, too, which I don't know :-)

@weltling
Copy link
Contributor

weltling commented Jul 8, 2013

off_t is long, so 4 bytes in both x64 and x86 windows. On 32 bit linux as they say it's 'signed integer'. That all means on windows and 32 bit linux there's no benefit from this patch, true 64 bit only systems would benefit. I think double instead of off_t could be used on non 64 bit systems to make it profitable even there.

@ralflang
Copy link
Contributor Author

ralflang commented Jul 8, 2013

The off_t was advised by [email protected] on the php-dev internals list - to be used together with #define _FILE_OFFSET_BITS 64 which I haven't done yet. And I need to address Johannes' points..

Double has a wide range on all platforms but it's not a precise, discrete integer type. I'm not sure a floating point type is appropriate here and like to hear other opinions. I think I'm best off with an integer type that is 64 bit on all platforms.

@johannes
Copy link
Member

johannes commented Jul 8, 2013

Be careful with _FILE_OFFSET_BITS 64 - as there is an off_t in a global header this would have to affect all of PHP, which might cause trouble with external libs we're linking too. That can of worms made us not add the large file support.

@ralflang
Copy link
Contributor Author

ralflang commented Jul 8, 2013

I've read a bit about windows type lengths and it's probably best to define an own type filesize_t which equals unsigned long long in linux and whatever equivalent we can get on both 32 bit and 64 bit windows (i'm not a windows dev but http://msdn.microsoft.com/en-us/library/s3f49ktz%28v=vs.80%29.aspx says their long long is also 64 bit). This could avoid _File_OFFSET_BITS.

@weltling
Copy link
Contributor

weltling commented Jul 8, 2013

With 64 bit integer on 32 bit platform there were possible overflow in returning the file size to PHP, even returning it as string. double is good enough as soon as it's ieee 754 complaint, so has 52 bit integer part. And it can be returned to php without overflow. Or how would you do it?

@johannes
Copy link
Member

johannes commented Jul 8, 2013

For exporting a large integer to PHP use a string. Using double's
imprecision might cause issues with further file processing. For fixed
64bit types look at int64_t from C99 which we provide via
win32/php_stdint.h and the PHP_CHECKTYPES config.m4 macro.

@weltling
Copy link
Contributor

weltling commented Jul 8, 2013

That's true. Plus minus one byte would corrupt a file. But what's the issue using just the integer part, like double d = .0; int i = 42; d += (double)i; ... 42 were converted to 42.0 even without explicit cast, or where is the pitfall?

@ralflang
Copy link
Contributor Author

I settled for int64_t - sorry for the delay but I was a little alien to windows build at first.
You seem to define an uint64_t too, but only for windows? Actually signed file sizes don't make too much sense but it seems like glibc does not have an atoull either. strtoull seems an option but I did not want to introduce too much change at once. 63bits + signed is probably enough for the next decade or so.

@weltling
Copy link
Contributor

Yep, that should already compile with VC. I'd however rewrite the check a bit more precise

#if defined(PHP_WIN32) && !defined(HAVE_ATOLL)

like that, to prevent possible future conflicts. You can ping me again when i can test on windows.

@weltling
Copy link
Contributor

or, as that's an isolated case, just do that in place, like

#ifdef PHP_WIN32
.... do __atoi64
#else
.... do atoll
#endif

@ralflang
Copy link
Contributor Author

I've setup a win7/VC11 build and it built. I'm currently reading/searching how to set up a windows test case with this build.

m6w6 added a commit to m6w6/php-src that referenced this pull request Aug 5, 2013
@pierrejoye
Copy link
Contributor

I do not really understand why this patch gets merged already. Not that it is not necessary :) but I do not think it is ready enough to get into master. There are a couple of areas that needs more work (userland side, handling for large filesize, only meta infos f.e.). As we discussed on IRC, it could way easier to get it via the 64bit support patch/RFC.

What was the reasoning

@m6w6
Copy link
Contributor

m6w6 commented Aug 16, 2013

It was ridiculous to not support 64bit upload size on platforms where long is large enough. The int64 change is a lot more work. I see that it is not perfect yet, but small steps are better than no progress at all. Actually, using int64_t instead of long would not be necessary, so I can change it back if you don't like it!

@ralflang
Copy link
Contributor Author

We had all this type changing back and forth (long, size_t, int64_t) in the patch to make sure it behaves consistently in linux and windows, 32 and 64 bit platforms.

signed long in 32 bit systems is only 32 bit, roughly 2G - not enough.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants