
Hacking GitHub Large File Storage (LFS)

Jorge Martínez Garrido

September 5, 2023

Tags: hacking, github, git, lfs, linux


Today, I had to work with various FITS files. These files can be very large. Even though I knew this, I decided to set up a repository to track my work.

The only precaution I took was to compress all of them into a single fits.tar.gz file. Then, I added this file using git lfs and pushed the commit to GitHub. Here is the result:

A screenshot from my GitHub account indicating the LFS limit has been reached

Why this limit?

GitHub imposes a limit on:

- The size of individual files pushed with plain Git: files larger than 100 MiB are rejected unless they are tracked with Git LFS.
- The LFS storage and bandwidth quota per account, which is quite small on the free plan.

Read more information about these limits in the official GitHub documentation.

A solution to this situation

The solution is to split the file you want to push into several parts, each with a maximum size of 100 MB.

To do so, you can use the split command:

split --bytes 100M -d -a <N> <file> <prefix>

The options in the previous command stand for:

- --bytes 100M: split the input into chunks of at most 100 MiB each.
- -d: use numeric suffixes (00, 01, ...) instead of alphabetic ones.
- -a <N>: use suffixes of length N.

The arguments in the previous command are:

- <file>: the file to be split.
- <prefix>: the prefix used to name the resulting chunks.
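To see how these options interact, here is the same invocation scaled down to a small dummy file (demo.bin and the 2M chunk size are made up for illustration only):

```shell
# Create a 5 MiB dummy file, then split it into 2 MiB chunks with
# numeric suffixes (-d) of length 2 (-a 2), mirroring the command above.
dd if=/dev/zero of=demo.bin bs=1M count=5 status=none
split --bytes 2M -d -a 2 demo.bin "demo_part_"
# Three chunks result: demo_part_00, demo_part_01 (2 MiB each)
# and demo_part_02 (the remaining 1 MiB).
ls demo_part_*
```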

Example

Having all the FITS files in a fits.tar.gz compressed file, execute:

split --bytes 100M -d -a 2 fits.tar.gz "fits_part_"

The previous command generates the following files:

.
├── fits_part_00
├── fits_part_01
├── fits_part_02
├── fits_part_03
├── ...
└── fits_part_29

None of these files exceeds the 100 MB limit. You can now stage and commit them, and GitHub will not complain.
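Staging the chunks then looks like any other commit. A sketch, assuming git-lfs is installed and the remote branch is named main (both are assumptions, not part of the original workflow above):

```shell
# Track the chunk pattern with LFS, then stage and commit every part.
git lfs track "fits_part_*"
git add .gitattributes fits_part_*
git commit -m "Add FITS archive split into sub-100MB parts"
git push origin main   # branch name is an assumption
```

Since each part is under the per-file limit, tracking them with LFS is optional; plain git add also works if you prefer to avoid the LFS quota entirely.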

Limitations of this method

Note that you are no longer uploading a single file but a set of files. If you want a single file back, you need to concatenate the parts after fetching them:

cat fits_part_* > fits.tar.gz && tar xvf fits.tar.gz
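Because the archive is rebuilt from parts, it is worth verifying the reassembled file byte-for-byte against the original before extracting it. A minimal check using cmp (the fits_rebuilt.tar.gz name is just for illustration):

```shell
# Rebuild the archive from its parts and compare it with the original.
# cmp exits non-zero on the first differing byte, so the echo only runs
# if the two files are identical.
cat fits_part_* > fits_rebuilt.tar.gz
cmp fits.tar.gz fits_rebuilt.tar.gz && echo "archives are identical"
```

The shell expands fits_part_* in lexicographic order, which matches the numeric order here because all suffixes have the same width (-a 2).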

This is not ideal, but at least you can now track large files on GitHub without upgrading your plan.