decompressing large zip files

Hi there

Once in a while, I get large zip files from clients. Since they are running Windows, their zip tool doesn't have the same 4GB limitation that the unzip command-line tool under Linux usually has. Today, I once again had to deal with a large file and had some time to look into a solution that wouldn't involve copying the file onto a Windows box for converting the archive.

If you have p7zip-full installed, you can use:

  7z x file.zip

If you have Java installed, you can use:

  jar xf file.zip

Found here:
https://serverfault.com/questions/235139/how-to-unzip-files-bigger-than-4gb

Cheers, Peter

--
Peter Reutemann
Dept. of Computer Science
University of Waikato, NZ
+64 (7) 858-5174
http://www.cms.waikato.ac.nz/~fracpete/
http://www.data-mining.co.nz/
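If neither p7zip nor a JDK is at hand, Python's standard zipfile module is another option that may be worth trying, since it reads archives that use the Zip64 extensions. The following is only a minimal sketch: the function name extract_all and the paths are made up for illustration, and nothing in this thread confirms that it copes with the particular archives discussed here.

```python
import zipfile


def extract_all(archive_path, dest_dir="."):
    """Extract every member of archive_path into dest_dir.

    Returns the list of member names found in the archive.
    (extract_all is a hypothetical helper, not part of any library.)
    """
    # zipfile understands the Zip64 extensions when reading an archive.
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(dest_dir)
        return zf.namelist()
```

Calling extract_all("file.zip", "out") would unpack into the directory "out". The module also has a command-line form, python3 -m zipfile -e file.zip out/, which does the same without writing any code.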

On Wed, 22 May 2019 09:29:55 +1200, Peter Reutemann wrote:
Once in a while, I get large zip files from clients. Since they are running Windows, their zip tool doesn't have the same 4GB limitation that the unzip command-line tool under Linux usually has.
I just tried zipping and unzipping this file

  -rw-r--r-- 1 ldo users 7769948160 Dec 10 2015 new/distros/CentOS-7-x86_64-Everything-1511.iso

which as you can see is over 7GB in size, using the standard Debian “zip” and “unzip” commands. Worked just fine.

(For what it’s worth, the zip archive compressed the file down by just 3.4%.)

Once in a while, I get large zip files from clients. Since they are running Windows, their zip tool doesn't have the same 4GB limitation that the unzip command-line tool under Linux usually has.
I just tried zipping and unzipping this file
-rw-r--r-- 1 ldo users 7769948160 Dec 10 2015 new/distros/CentOS-7-x86_64-Everything-1511.iso
which as you can see is over 7GB in size, using the standard Debian “zip” and “unzip” commands. Worked just fine.
(For what it’s worth, the zip archive compressed the file down by just 3.4%.)
unzip -q file.zip
warning [file.zip]: 810575875 extra bytes at beginning or within zipfile
  (attempting to process anyway)
error [file.zip]: start of central directory not found;
  zipfile corrupt.
  (please check that you have transferred or created the zipfile in the
  appropriate BINARY mode and that you have compiled UnZip properly)

The file size is 4.8GB (5106462267 bytes).

Cheers, Peter
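One way to look at what unzip is complaining about is to check which end-of-archive records are actually present near the end of the file. The sketch below searches the tail of the file for the three record signatures defined by the ZIP format. The function name find_end_records and the 1 MiB search window are assumptions chosen for illustration. A signature match is only a hint, since the same four bytes could also occur by chance inside compressed data.

```python
import os

# Record signatures defined by the ZIP format, as they appear on disk.
EOCD = b"PK\x05\x06"           # end of central directory record
ZIP64_LOCATOR = b"PK\x06\x07"  # Zip64 end of central directory locator
ZIP64_EOCD = b"PK\x06\x06"     # Zip64 end of central directory record


def find_end_records(path, tail_bytes=1 << 20):
    """Return the absolute offset of the last occurrence of each signature
    within the final tail_bytes of the file, or None if it is absent.

    (find_end_records is a hypothetical helper written for this example.)
    """
    size = os.path.getsize(path)
    start = max(0, size - tail_bytes)
    with open(path, "rb") as fh:
        fh.seek(start)
        tail = fh.read()
    found = {}
    for name, sig in (
        ("eocd", EOCD),
        ("zip64_locator", ZIP64_LOCATOR),
        ("zip64_eocd", ZIP64_EOCD),
    ):
        pos = tail.rfind(sig)
        found[name] = start + pos if pos >= 0 else None
    return found
```

For an archive larger than 4 GiB one would expect all three entries to be set. If "zip64_locator" or "zip64_eocd" came back as None for such a file, that would point at how the archive was written rather than at the unzip build.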

On Wed, 22 May 2019 10:19:58 +1200, Peter Reutemann wrote:
unzip -q file.zip warning [file.zip]: 810575875 extra bytes at beginning or within zipfile (attempting to process anyway) error [file.zip]: start of central directory not found; zipfile corrupt. (please check that you have transferred or created the zipfile in the appropriate BINARY mode and that you have compiled UnZip properly)
This is the version info for my zip and unzip executables. Note the “ZIP64_SUPPORT” flag:

----
ldo@theon:~> zip -v
Copyright (c) 1990-2008 Info-ZIP - Type 'zip "-L"' for software license.
This is Zip 3.0 (July 5th 2008), by Info-ZIP.
Currently maintained by E. Gordon. Please send bug reports to
the authors using the web page at www.info-zip.org; see README for details.

Latest sources and executables are at ftp://ftp.info-zip.org/pub/infozip,
as of above date; see http://www.info-zip.org/ for other sites.

Compiled with gcc 6.3.0 20170221 for Unix (Linux ELF).

Zip special compilation options:
    USE_EF_UT_TIME (store Universal Time)
    BZIP2_SUPPORT (bzip2 library version 1.0.6, 6-Sept-2010)
        bzip2 code and library copyright (c) Julian R Seward
        (See the bzip2 license for terms of use)
    SYMLINK_SUPPORT (symbolic links supported)
    LARGE_FILE_SUPPORT (can read and write large files on file system)
    ZIP64_SUPPORT (use Zip64 to store large files in archives)
    UNICODE_SUPPORT (store and read UTF-8 Unicode paths)
    STORE_UNIX_UIDs_GIDs (store UID/GID sizes/values using new extra field)
    UIDGID_NOT_16BIT (old Unix 16-bit UID/GID extra field not used)
    [encryption, version 2.91 of 05 Jan 2007] (modified for Zip 3)

Encryption notice:
    The encryption code of this program is not copyrighted and is
    put in the public domain. It was originally written in Europe
    and, to the best of our knowledge, can be freely distributed
    in both source and object forms from any country, including
    the USA under License Exception TSU of the U.S. Export
    Administration Regulations (section 740.13(e)) of 6 June 2002.

Zip environment options:
    ZIP: [none]
    ZIPOPT: [none]
ldo@theon:~> unzip -v
UnZip 6.00 of 20 April 2009, by Debian. Original by Info-ZIP.

Latest sources and executables are at ftp://ftp.info-zip.org/pub/infozip/ ;
see ftp://ftp.info-zip.org/pub/infozip/UnZip.html for other sites.

Compiled with gcc 8.2.1 20190204 for Unix (Linux ELF).

UnZip special compilation options:
    ACORN_FTYPE_NFS
    COPYRIGHT_CLEAN (PKZIP 0.9x unreducing method not supported)
    SET_DIR_ATTRIB
    SYMLINKS (symbolic links supported, if RTL and file system permit)
    TIMESTAMP
    UNIXBACKUP
    USE_EF_UT_TIME
    USE_UNSHRINK (PKZIP/Zip 1.x unshrinking method supported)
    USE_DEFLATE64 (PKZIP 4.x Deflate64(tm) supported)
    UNICODE_SUPPORT [wide-chars, char coding: UTF-8] (handle UTF-8 paths)
    LARGE_FILE_SUPPORT (large files over 2 GiB supported)
    ZIP64_SUPPORT (archives using Zip64 for large files supported)
    USE_BZIP2 (PKZIP 4.6+, using bzip2 lib version 1.0.6, 6-Sept-2010)
    VMS_TEXT_CONV
    WILD_STOP_AT_DIR
    [decryption, version 2.11 of 05 Jan 2007]

UnZip and ZipInfo environment options:
    UNZIP: [none]
    UNZIPOPT: [none]
    ZIPINFO: [none]
    ZIPINFOOPT: [none]
----

Also, info about my installed “zip” and “unzip” packages:

in zip amd64 3.0-11 Archiver for .zip files This is InfoZIP
in unzip amd64 6.0-22 De-archiver for .zip files InfoZIP's un

Here are my versions (details below):

dpkg-query --show zip
zip 3.0-11build1
dpkg-query --show unzip
unzip 6.0-21ubuntu1

Both are listed with ZIP64_SUPPORT but still can't handle large ZIP files generated on Windows.

Cheers, Peter

==== zip ====

zip -v
Copyright (c) 1990-2008 Info-ZIP - Type 'zip "-L"' for software license.
This is Zip 3.0 (July 5th 2008), by Info-ZIP.
Currently maintained by E. Gordon. Please send bug reports to
the authors using the web page at www.info-zip.org; see README for details.

Latest sources and executables are at ftp://ftp.info-zip.org/pub/infozip,
as of above date; see http://www.info-zip.org/ for other sites.

Compiled with gcc 6.3.0 20170415 for Unix (Linux ELF).

Zip special compilation options:
    USE_EF_UT_TIME (store Universal Time)
    BZIP2_SUPPORT (bzip2 library version 1.0.6, 6-Sept-2010)
        bzip2 code and library copyright (c) Julian R Seward
        (See the bzip2 license for terms of use)
    SYMLINK_SUPPORT (symbolic links supported)
    LARGE_FILE_SUPPORT (can read and write large files on file system)
    ZIP64_SUPPORT (use Zip64 to store large files in archives)
    UNICODE_SUPPORT (store and read UTF-8 Unicode paths)
    STORE_UNIX_UIDs_GIDs (store UID/GID sizes/values using new extra field)
    UIDGID_NOT_16BIT (old Unix 16-bit UID/GID extra field not used)
    [encryption, version 2.91 of 05 Jan 2007] (modified for Zip 3)

Encryption notice:
    The encryption code of this program is not copyrighted and is
    put in the public domain. It was originally written in Europe
    and, to the best of our knowledge, can be freely distributed
    in both source and object forms from any country, including
    the USA under License Exception TSU of the U.S. Export
    Administration Regulations (section 740.13(e)) of 6 June 2002.

Zip environment options:
    ZIP: [none]
    ZIPOPT: [none]

==== unzip ====

unzip -v
UnZip 6.00 of 20 April 2009, by Debian. Original by Info-ZIP.

Latest sources and executables are at ftp://ftp.info-zip.org/pub/infozip/ ;
see ftp://ftp.info-zip.org/pub/infozip/UnZip.html for other sites.

Compiled with gcc 6.3.0 20170415 for Unix (Linux ELF).

UnZip special compilation options:
    ACORN_FTYPE_NFS
    COPYRIGHT_CLEAN (PKZIP 0.9x unreducing method not supported)
    SET_DIR_ATTRIB
    SYMLINKS (symbolic links supported, if RTL and file system permit)
    TIMESTAMP
    UNIXBACKUP
    USE_EF_UT_TIME
    USE_UNSHRINK (PKZIP/Zip 1.x unshrinking method supported)
    USE_DEFLATE64 (PKZIP 4.x Deflate64(tm) supported)
    UNICODE_SUPPORT [wide-chars, char coding: UTF-8] (handle UTF-8 paths)
    LARGE_FILE_SUPPORT (large files over 2 GiB supported)
    ZIP64_SUPPORT (archives using Zip64 for large files supported)
    USE_BZIP2 (PKZIP 4.6+, using bzip2 lib version 1.0.6, 6-Sept-2010)
    VMS_TEXT_CONV
    WILD_STOP_AT_DIR
    [decryption, version 2.11 of 05 Jan 2007]

UnZip and ZipInfo environment options:
    UNZIP: [none]
    UNZIPOPT: [none]
    ZIPINFO: [none]
    ZIPINFOOPT: [none]

On Wed, 22 May 2019 11:38:46 +1200, Peter Reutemann wrote:
Both are listed with ZIP64_SUPPORT but still can't handle large ZIP files generated on Windows.
Are they using a different format for encoding large files, then? Are you able to send a Windows user a large file?

Both are listed with ZIP64_SUPPORT but still can't handle large ZIP files generated on Windows.
Are they using a different format for encoding large files, then?
Don't know. I'm not sure whether WinZip, 7-Zip, or the built-in Windows zip utility is used to generate these files.
Are you able to send a Windows user a large file?
It's usually the other way round, i.e., I receive large data dumps from the client (raw data for building models, SQL dumps, etc). The jar and 7z utilities solve that problem for me.

Cheers, Peter

The jar and 7z utilities solve that problem for me.
Another handy one you might like to try is unar. That will decode just about any format. Maybe it will handle these zips too.
Only partially:

unar -q file.zip
Archive parsing failed! (Unknown error.)

Ended up with 4.1 instead of 4.8GB of data decompressed.

Cheers, Peter

Only partially:
unar -q file.zip Archive parsing failed! (Unknown error.)
Ended up with 4.1 instead of 4.8GB of data decompressed.
Hmm, pity ...
Yeah, looked really promising... But I'll keep that utility in mind. :-)

Cheers, Peter

Only partially:
unar -q file.zip Archive parsing failed! (Unknown error.)
Ended up with 4.1 instead of 4.8GB of data decompressed.
Are you using a 32-bit copy of zip and/or Linux, perchance?

At $dayjob we encountered this quite a while ago and, if I recall correctly, we had a 32-bit version of zip and/or Linux.

(I'm not convinced this is the answer, as you've already mentioned that you have a copy with ZIP64_SUPPORT.)

Cheers, Warren.

Only partially:
unar -q file.zip Archive parsing failed! (Unknown error.)
Ended up with 4.1 instead of 4.8GB of data decompressed.
Are you using a 32-bit copy of zip and/or Linux, perchance?
At $dayjob we encountered this quite a while ago and, if I recall correctly, we had a 32-bit version of zip and/or Linux.
(I'm not convinced this is the answer, as you've already mentioned that you have a copy with ZIP64_SUPPORT)
Haven't been running a 32-bit distro for a very long time. :-) And jar/7z utilities handle the large file flawlessly.

Cheers, Peter

On Wed, 22 May 2019 14:59:10 +1200, Peter Reutemann wrote:
Haven't been running a 32-bit distro for a very long time. :-)
I doubt that that would matter, anyway. #include <sys/types.h> and all that.
And jar/7z utilities handle the large file flawlessly.
Looking for information on how WinZip (is that what they were using?) handles large files, I found this <http://kb.winzip.com/kb/entry/99/>, which quite clearly mentions the Zip64 extension. Unless there is some quirk about using that extension in a way that is not recognized by the Linux utilities (not setting the right status flag?) ...
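To follow up on the question of how the Windows-side tool encoded the archive, the per-entry fields stored in the central directory can be listed without extracting anything. This is a rough sketch using Python's zipfile module. The names describe_entries, METHOD_NAMES and LIMIT_32BIT are made up for this example, and it assumes zipfile can parse the central directory of the archive in question, which has not been verified here. According to the ZIP format specification, a "version needed" of 45 marks an entry that relies on Zip64, and method 9 is Deflate64.

```python
import zipfile

LIMIT_32BIT = 0xFFFFFFFF  # largest value that fits a classic 32-bit ZIP field

# A few compression method numbers from the ZIP format specification.
METHOD_NAMES = {0: "stored", 8: "deflate", 9: "deflate64", 12: "bzip2", 14: "lzma"}


def describe_entries(path):
    """Return one dict per archive member with the fields most relevant
    to large-file handling. (Hypothetical helper for illustration.)"""
    rows = []
    with zipfile.ZipFile(path) as zf:
        for info in zf.infolist():
            rows.append({
                "name": info.filename,
                "method": METHOD_NAMES.get(info.compress_type, str(info.compress_type)),
                "version_needed": info.extract_version,
                "flags": info.flag_bits,
                "size": info.file_size,
                "compressed": info.compress_size,
                # True when any size or offset exceeds the 32-bit fields,
                # meaning the entry can only be described with Zip64.
                "beyond_32bit": max(
                    info.file_size, info.compress_size, info.header_offset
                ) > LIMIT_32BIT,
            })
    return rows
```

Printing the rows for a problem archive next to those for one created by Debian's zip might show whether the two differ in method, flags or version needed.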
participants (3)
- Lawrence D'Oliveiro
- Peter Reutemann
- Warren Boyd