Hacking GitHub Large File Storage (LFS)
Jorge Martínez Garrido
September 5, 2023
Today, I had to work with various FITS files. These files can be very large, and even though I knew this, I decided to set up a repository for tracking my work.
The only measure I took was to compress all of them into a single fits.tar.gz file. Then, I added this file using git lfs and pushed the commit to GitHub. Here is the result: the push was rejected because the file exceeded GitHub's size limits.
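For reference, the tracking setup followed the standard Git LFS workflow, roughly as sketched below (the branch name and commit message are illustrative, not copied from the actual repository):
# install the LFS hooks and mark the archive to be stored via LFS
git lfs install
git lfs track "fits.tar.gz"
# commit both the tracking rule (.gitattributes) and the archive itself
git add .gitattributes fits.tar.gz
git commit -m "Add compressed FITS files"
git push origin main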
Why this limit?
GitHub imposes the following limits:
- Maximum file size without LFS: 100 MB
- Maximum file size with LFS: 5 GB (the maximum allowed, by GitHub Enterprise)
You can read more about these limits in the official GitHub documentation.
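If you want to check in advance which files in a working tree would hit the 100 MB non-LFS limit, a standard find invocation is enough (this check is my own suggestion, not part of the GitHub documentation):
# list files larger than 100 MB, ignoring Git's internal objects
find . -type f -size +100M -not -path "./.git/*"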
A solution to this situation
The solution is to split the file you want to push into several smaller files, each no larger than 100 MB.
To do so, you can use the split command:
split --bytes 100M -d -a <N> <file> <prefix>
The options in the previous command stand for:
- --bytes sets the maximum size of each output file
- -d uses numeric suffixes to identify the different part files
- -a sets the suffix length to N
The arguments in the previous command are:
- <N> is the desired number of digits in the numeric suffix
- <file> is the name of the file to be split
- <prefix> is the prefix used to name the generated part files
Example
Having all the FITS files in a compressed fits.tar.gz file, execute:
split --bytes 100M -d -a 2 fits.tar.gz "fits_part_"
The previous command generates the following files:
.
├── fits_part_00
├── fits_part_01
├── fits_part_02
├── fits_part_03
├── ...
└── fits_part_29
None of these files exceeds the 100 MB limit. You can now stage and commit them, and GitHub will not complain.
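To double-check the chunk sizes and push them as regular files (no LFS needed), the steps look roughly like this sketch (the commit message is just a placeholder):
# confirm that every chunk stays under the limit
du -h fits_part_*
# stage, commit, and push the chunks like any other files
git add fits_part_*
git commit -m "Add FITS archive split into parts under 100 MB"
git push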
Limitations of this method
Note that you are no longer uploading a single file but a set of files. If you wish to recover the single original file, you need a post-processing step that concatenates the parts after fetching them:
cat fits_part_* > fits.tar.gz && tar xvf fits.tar.gz
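If you are worried about a missing or corrupted part, one option is to store a checksum of the original archive next to the chunks and verify it after reassembly. This is an extra step I am sketching here, not something split does for you:
# before splitting: record a checksum of the original archive
sha256sum fits.tar.gz > fits.tar.gz.sha256
# after fetching and concatenating: verify before extracting
cat fits_part_* > fits.tar.gz
sha256sum -c fits.tar.gz.sha256 && tar xvf fits.tar.gz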
This is not ideal, but at least you can now track large files on GitHub without upgrading your plan.