Git LFS

Overview

Git LFS lets us keep large files in a repository and modify them without downloading every version of those files each time we clone, pull, or fetch. It achieves this by storing small pointers in the repository in place of the large files. The usual Git commands and workflow continue to work with Git LFS.

What is Git LFS?

Git is an open-source, distributed version control system. It is used to develop projects with a team of people. A project together with its version history stored in Git is known as a repository. When cloning a repository, the client has to download its entire history.

Repositories often contain large files, and this becomes a problem when those files are modified regularly. Imagine a client cloning a repository with 50 commits, 30 of which contain a modified version of a large file of about 1 GB. Cloning this repository consumes a huge amount of time and space, because every version of the file (up to roughly 30 GB in total) is downloaded by the client.

Git LFS, i.e. Large File Storage, helps in such situations. It is an open-source Git extension developed by Atlassian, GitHub, and other open-source contributors. It downloads the large files only during the checkout process rather than while cloning or fetching the repository.

Features

  • It is useful for versioning large files, even gigabytes in size, with Git itself.
  • As large files are stored externally, the repository itself stays small.
  • Cloning and fetching are faster because only pointers are cloned or fetched instead of all the versions of the large files.
  • The same Git workflow is used, so additional skills, commands, toolsets, or secondary storage systems are not required.
  • No new access controls or permissions are needed. They are taken from the repository on the remote host, such as GitHub.

Installing Git Large File Storage

To install Git Large File Storage on Windows, follow the steps given below :

  1. Visit the Git LFS website and click on Download.

  2. A file with the .exe extension will be downloaded to the computer. Open File Explorer, locate the downloaded file, and double-click it to start the installation.

  3. Windows will ask for permission to run the installer. Click Yes and agree to the terms to install Git LFS.

  4. Open Git Bash. To verify the installation, run the command git lfs install.

    If Git LFS is correctly installed, the command reports that Git LFS has been initialized.

  5. If it is not installed, please contact GitHub Support. Please mention the operating system you are installing on to get accurate help.
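The check from step 4 can be sketched as a small shell snippet. It uses git lfs version, which only prints the installed version and changes nothing, so it can be run as often as needed. The variable lfs_status and the echo messages are illustrative and are not Git LFS output.

```shell
# Report whether the Git LFS extension is reachable from this shell.
# The command is used as the condition, so a missing installation is
# reported instead of stopping the script.
if git lfs version; then
  lfs_status="available"
  echo "Git LFS is installed"
else
  lfs_status="missing"
  echo "Git LFS was not found; re-run the installer" >&2
fi
```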

How Do LFS Objects Affect Repository Size?

  • When an LFS object is added to our repository, GitLab first creates an LFS object.
  • It then associates this newly created object with the repository.
  • A job is then queued to recalculate the repository's statistics, including the repository storage size and the LFS object storage size.
  • When the repository is forked, the fork starts with the same size as the original repository, and the LFS objects of the original repository are also associated with the fork.
  • If LFS objects are added to or modified in the fork, the original repository's size is not affected.
  • Lastly, when the changes made in the fork are merged back into the original repository, the LFS objects associated with the fork become associated with the original repository as well, and its size changes.

How Does Git LFS Work?

Instead of storing entire files in the local repository, Git LFS stores pointers to the large files. These pointers occupy very little space compared to the large files. They are handled by LFS automatically, so we normally never see them while working.

Let us explore some cases to understand how Git LFS works behind the scenes.

  • Case - I :
    A file is added to the repository. In this case, the large file is replaced by a pointer in the repository, and the content of the file is stored in the local LFS cache. A pointer is a small text file that identifies the large file by the SHA-256 hash of its content and records its size. Git LFS uses this identifier to locate the file on the server.

  • Case - II :
    New commits are pushed to the server. When a new commit is pushed to the remote server, the large files that are modified in this commit are transferred from the local Git LFS cache to the server.

  • Case - III :
    An old commit containing LFS pointers is checked out. The commit itself contains only the pointers to the LFS files on the server. When we request the old commit using the git checkout command, Git LFS downloads the referenced files, unless they are already in the local cache, and places them in the working copy.

    In short, for all the large files in Git, we store only pointers locally. A file is downloaded only when we need to work on it.

    For this communication, Git LFS talks to the server (GitLab in this example) over HTTPS. When the client requests a file, the request is first authorized, and then the large file is fetched or pushed.
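To make the idea of a pointer concrete, the snippet below assembles the text of a pointer by hand for a tiny stand-in file. It is only a sketch of the pointer format (a spec version line, a SHA-256 object ID, and the size in bytes). Git LFS generates real pointers itself. The file name big.bin is a made-up example, and the snippet assumes the sha256sum utility from GNU coreutils is available.

```shell
# Work in a scratch directory so nothing else is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Stand-in for a large binary file (5 bytes, no trailing newline).
printf 'hello' > big.bin

# The object ID is the SHA-256 hash of the file content.
oid=$(sha256sum big.bin | cut -d ' ' -f 1)
# The size is recorded in bytes.
size=$(wc -c < big.bin | tr -d ' ')

# Assemble the three lines that make up a pointer.
pointer=$(printf 'version https://git-lfs.github.com/spec/v1\noid sha256:%s\nsize %s' "$oid" "$size")
echo "$pointer"
```

However large the real file is, the pointer stays at three short lines, which is why clones and fetches that transfer only pointers remain fast.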

Using Git LFS

Creating a New Git LFS Repository

First, create a regular Git repository by making a folder named Repo and running git init inside it. This initializes Repo as an empty Git repository.
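The repository creation described above can be written as the following commands. The folder name Repo comes from the text, and -p simply avoids an error if the folder already exists.

```shell
# Create the project folder and turn it into an empty Git repository.
mkdir -p Repo
cd Repo
git init
```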

Next, run git lfs install inside the repository to set it up for Git LFS. This installs a pre-push hook in the Repo repository, which transfers the large files to the server when we push.
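A minimal sketch of this step follows. So that it runs on its own, it recreates the Repo folder in a scratch directory first. In practice the git lfs install line is the only one needed, run inside the existing repository. The echo messages are illustrative.

```shell
# Recreate the Repo folder in a scratch location so the snippet is self-contained.
cd "$(mktemp -d)"
mkdir Repo && cd Repo && git init --quiet

# Set up Git LFS. The command is used as the condition so that a missing
# installation is reported instead of stopping the script.
if git lfs install; then
  # The pre-push hook is what uploads LFS objects when we push.
  [ -f .git/hooks/pre-push ] && echo "pre-push hook installed"
else
  echo "git lfs install failed; is Git LFS installed?" >&2
fi
```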

In Bitbucket Cloud repositories, LFS is already enabled.

Clone a Git LFS Repository

A Git LFS repository is cloned like a normal Git repository, using the git clone command. After the repository is cloned, Git checks out the default branch, and all the LFS files required for that checkout are downloaded automatically.

Pull From a Git LFS Repository

A Git LFS repository is pulled with the regular git pull command. It automatically checks for LFS files and downloads the required ones during the checkout process after the pull completes.

In some cases, the checkout fails for unexpected reasons. To resolve this, we can use the git lfs pull command to download any missing LFS files for the current commit.

Speeding Up Pulls

When cloning for the first time, or pulling after a long time, many large files may have to be downloaded one by one during the checkout process. It is faster to skip these downloads during the checkout and fetch the files afterwards in a single batch with git lfs pull. To do so, override the LFS filter settings in the Git config with the -c flag while invoking git pull, and then run git lfs pull.

If typing this every time is too cumbersome, the two steps can be stored in a Git alias and run as one command.

Because the files are then downloaded as a batch rather than one at a time, the whole process is faster.
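The two forms described in this section can be sketched as follows. The function name pull_then_lfs and the alias name plfs are hypothetical. The extra -c filter.lfs.process= override is an assumption for newer Git LFS versions, which use a process filter in addition to the smudge filter; on older versions it should be harmless. The alias is defined in a scratch repository here so that the snippet has no lasting effect.

```shell
# One-off form: pull commits with the LFS filters switched off, then download
# the LFS content for the checked-out commit in one batch.
pull_then_lfs() {
  git -c filter.lfs.smudge= -c filter.lfs.process= -c filter.lfs.required=false pull "$@" &&
    git lfs pull
}

# Reusable form: store the same two steps as a Git alias. Add --global to
# "git config" to make the alias available in every repository.
cd "$(mktemp -d)" && git init --quiet
git config alias.plfs '!git -c filter.lfs.smudge= -c filter.lfs.process= -c filter.lfs.required=false pull && git lfs pull'
```

In a clone of an LFS repository, calling pull_then_lfs, or git plfs once the alias is defined there, then takes the place of a plain git pull.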

Tracking Files

Git needs to be told which files in the repository are to be handled by LFS. The git lfs track command is used for this, and it accepts either a file name or a pattern. The pattern or file name should be enclosed in double quotes so that the shell does not expand it before Git LFS sees it.

Git LFS supports the same patterns as the .gitignore file, except for negative patterns.

For example :
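The following sketch tracks a few kinds of patterns in a scratch repository. The patterns and folder names (design, video) are made-up examples, and the snippet skips the tracking commands when Git LFS is not installed.

```shell
# Scratch repository so the example does not touch a real project.
cd "$(mktemp -d)" && git init --quiet

if git lfs version >/dev/null 2>&1; then
  # Quotes stop the shell from expanding the pattern before Git LFS sees it.
  git lfs track "*.png"            # every PNG file in the repository
  git lfs track "design/*.psd"     # PSD files inside the design folder
  git lfs track "video/intro.mp4"  # one specific file
  tracked=$(cat .gitattributes)
else
  tracked="(Git LFS is not installed)"
fi
echo "$tracked"
```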

As soon as the git lfs track command is called for the first time, Git LFS creates a special file named .gitattributes. It records all the tracked patterns so that Git gives the matching files special treatment. Git LFS updates .gitattributes automatically each time git lfs track is run. As the .gitattributes file is part of the repository, we have to commit it ourselves whenever the tracked patterns change.

To view all the patterns being tracked, we can simply run git lfs track without any arguments.

If we no longer wish to track a file, we can remove it from the tracking list. There are two ways of doing this. The first is to carefully remove the corresponding line from the .gitattributes file and commit the change. The second is to use the git lfs untrack command, for example with the pattern "*.png".
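A short sketch of the second method, again in a scratch repository and skipped when Git LFS is not installed:

```shell
cd "$(mktemp -d)" && git init --quiet

if git lfs version >/dev/null 2>&1; then
  git lfs track "*.png"    # start tracking PNG files
  git lfs untrack "*.png"  # stop tracking them again
  git lfs track            # list the patterns that remain
else
  echo "Git LFS is not installed; skipping"
fi
```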

Untracking the "*.png" pattern removes all .png files from the list of files being tracked.

Commit and Push

A Git LFS repository can be committed and pushed like a regular repository. To send the changes to the remote repository, the git push command is used. Along with the normal output, a few extra lines are printed for the LFS files being transferred.

If the transfer of the LFS files fails for any reason, the push is aborted and can simply be run again. The remote repository is therefore never left in an inconsistent state.

Download Extra Content for Other Recently Modified Branches

Git LFS only fetches the files required for the current commit. To also download the content of other recently modified branches and commits, we can use the git lfs fetch --recent command.

By default, branches and tags with commits from the last 7 days count as recent. The number of days can be changed with the lfs.fetchrecentrefsdays setting.
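As a sketch, the window can be widened with git config. The value 10 is an arbitrary example, and the setting is applied to a scratch repository so that the snippet runs on its own.

```shell
cd "$(mktemp -d)" && git init --quiet

# Treat branches and tags with commits from the last 10 days as recent
# (10 is an example value; the default is 7).
git config lfs.fetchrecentrefsdays 10
git config --get lfs.fetchrecentrefsdays

# In a clone that has a remote, the recent content is then downloaded with:
#   git lfs fetch --recent
```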

Delete Files From Your Local Git LFS

Because LFS files are large, we can free up storage by deleting them from the local cache with the git lfs prune command.

This deletes all the local LFS files that are considered old, i.e. files that are not referenced by the currently checked-out commit, by a recent commit, or by a commit that has not been pushed yet.

By default, a commit counts as recent for 10 days: the 7-day fetch window from the previous section plus an offset of 3 days. The offset can be changed with the lfs.pruneoffsetdays setting.

Unlike Git, LFS doesn't have automatic garbage collection. Therefore, the cache has to be cleaned or pruned manually. It is a good habit to do this regularly to keep the local repository small.

If we are not sure that only unwanted files will be deleted, we can do a dry run first by adding the --dry-run option to git lfs prune.

To list the files that are going to be removed, we can add the --verbose option. It prints the SHA-256 hashes, i.e. the object IDs, of the affected files.

We can also make prune check that a file is present in the remote repository before deleting it, using the --verify-remote option. If the file is present in the remote repository, it is safe to delete it locally.
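The options above can be combined as in the following sketch. It runs the preview in a scratch repository, where there is nothing to prune, so it only demonstrates the invocation. The offset of 21 days is an arbitrary example value, and the echo messages are illustrative.

```shell
cd "$(mktemp -d)" && git init --quiet

# Keep files for 21 days beyond the fetch window before they count as old.
git config lfs.pruneoffsetdays 21

# Preview which objects would be pruned, listing their object IDs, without
# deleting anything. Used as the condition so that a missing installation
# is reported instead of stopping the script.
if git lfs prune --dry-run --verbose; then
  echo "preview finished"
else
  echo "preview failed; is Git LFS installed?" >&2
fi

# In a real clone, the deletion itself would then be one of:
#   git lfs prune                  # delete old local LFS objects
#   git lfs prune --verify-remote  # delete only objects confirmed on the remote
```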

Deleting Remote Git LFS Files From the Server

Large files can be deleted from the local repository with the command-line client, but the client cannot be used to remove them from the remote server.

The method varies between hosting providers. In Bitbucket Cloud, there is a GUI in which we can select the items we wish to delete. Navigate to Repository Settings > Git LFS to delete the files.

It is important to note that multiple commits might be pointing to the same file. Before deleting a file, we can download it to preview it, or search for the commits that refer to it using its SHA-256 hash, i.e. its object ID.

Including/excluding Git LFS Files

Instead of downloading all the files or none of them, we can download selected files. This is done with patterns passed to git lfs fetch, so that only the matching files are downloaded.

To exclude the files matching a pattern and download the rest, pass the pattern with the -X (--exclude) option.

To include only the files matching a pattern, pass the pattern with the -I (--include) option.

Both options can be combined in the same command.

Instead of passing -I and -X each time, we can also store the patterns in the lfs.fetchinclude and lfs.fetchexclude settings.
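A sketch of both approaches follows. The patterns (an Assets folder, GIF and audio files) are example values. The config commands run in a scratch repository, while the fetch commands are shown as comments because they need a clone with a remote.

```shell
cd "$(mktemp -d)" && git init --quiet

# Persist the filters in the repository's Git config instead of passing
# -I and -X on every fetch.
git config lfs.fetchinclude "Assets/**"
git config lfs.fetchexclude "*.gif"
git config --get-regexp '^lfs\.fetch'

# One-off forms, to be run in a clone that has a remote:
#   git lfs fetch -X "Assets/**"             # everything except the Assets folder
#   git lfs fetch -I "*.ogg,*.wav"           # only the listed audio formats
#   git lfs fetch -I "Assets/**" -X "*.gif"  # the Assets folder, without GIFs
```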

Git LFS File Locking

Conflicts often arise when we try to merge binary files, and such conflicts are hard to resolve. There is no easy way to prevent them entirely, but we can lock binary files so that they are not overwritten by someone else while we work on them.

To lock a file, we first have to tell Git that files of its type are lockable. To do so, we use the --lockable flag with the git lfs track command. For example, git lfs track "*.psd" --lockable both stores PSD files in LFS and marks them as lockable.

This records the pattern in the .gitattributes file together with the lockable attribute.
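The effect on .gitattributes can be seen with the following sketch, which runs in a scratch repository. The echo message is illustrative.

```shell
cd "$(mktemp -d)" && git init --quiet

# Track PSD files with LFS and mark them as lockable. Used as the condition
# so that a missing installation is reported instead of stopping the script.
if git lfs track "*.psd" --lockable; then
  cat .gitattributes      # the *.psd rule now carries the lockable attribute
  git add .gitattributes  # stage the file so the rule can be committed
else
  echo "git lfs track failed; is Git LFS installed?" >&2
fi
```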

To finally lock a file on the Git server, run git lfs lock followed by the path of the file.

It returns a message saying the file is locked.

To remove the lock from the file, we can use the git lfs unlock command with the same path.

A lock held by someone else can be overridden by adding the --force flag to git lfs unlock, but you must be sure that this step is correct before using this flag.

Troubleshooting

Let us now see some of the commonly occurring errors and their simple solutions.

Error : Encountered n Files That Should have been Pointers

This error usually occurs when files that are expected to be tracked by LFS are stored in the repository as normal files. This happens, for example, when files are uploaded through a web interface, because such files are not tracked as LFS files. To resolve this problem, follow the steps given below :

  1. Migrate the file to LFS using the command git lfs migrate import --yes --no-rewrite "<your-file>"

  2. Push back to the remote repository using git push

  3. Optionally, clean the .git folder of the repository using git reflog expire --expire-unreachable=now --all followed by git gc --prune=now

Error : Repository or LFS Object not Found

There are multiple reasons why this error arises, a few of which are listed below :

  • You are not authorized to access the specific LFS objects. Check your permissions to find out whether you are allowed to push to or fetch from the repository.
  • The LFS object is not associated with the project anymore. This can occur if the object has been removed from the server.
  • The local Git repository is using a deprecated or older version of the LFS API.

Error : Invalid Status for URL (error 501)

To view the details of the error, we can check the Git LFS logs, where failures are recorded. The git lfs logs last command shows the most recent log.

The possible reasons for the error could be :

  • Git LFS is not enabled for the particular project. To enable it, go to the project settings.

  • Git LFS is enabled for the project, but LFS support is not enabled on the GitLab server itself. To resolve this issue, contact the server administrator.

  • The version of Git LFS on the client side is not supported by the GitLab server. The client version can be checked using the git lfs version command.

Error : getsockopt: Connection Refused

This error arises when we try to push an LFS object to the project. It happens because the GitLab instance is served over HTTP while the Git LFS client tries to reach it over HTTPS.

To prevent this, we can set the lfs.url option in the project's Git configuration explicitly to the HTTP URL of the LFS endpoint.
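A sketch of this setting follows. The host name and project path are placeholders for your own GitLab instance, and the option is set in a scratch repository so that the snippet runs on its own.

```shell
cd "$(mktemp -d)" && git init --quiet

# Point Git LFS at the HTTP endpoint explicitly (placeholder URL).
git config --add lfs.url "http://gitlab.example.com/group/my-sample-project.git/info/lfs"
git config --get lfs.url
```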

Error : Credentials Required

Git LFS authenticates the user with the HTTPS credentials every time we try to push something, so the credentials are required on each push. Git can, however, remember the credentials for our repositories so that we do not have to enter them every time.

To ask Git to keep the password in memory for some time, so that it does not have to be entered again on every push, enable the credential cache by setting the credential.helper option to cache.

By default, the cache expires after 15 minutes, after which you are again required to enter the credentials.
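The cache can be enabled as in the following sketch. It is applied to a scratch repository here. Adding --global to git config applies the setting to every repository of the current user.

```shell
cd "$(mktemp -d)" && git init --quiet

# Cache credentials in memory. Without a --timeout option, the cache helper
# forgets them after 900 seconds (15 minutes).
git config credential.helper cache
git config --get credential.helper

# A longer period can be requested explicitly, for example one hour:
#   git config credential.helper 'cache --timeout=3600'
```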

Error : LFS Objects are Missing

To manually push all the LFS files present locally, use the git lfs push --all command followed by the name of the remote.

When we push, GitLab checks the files for LFS pointers and verifies that the referenced LFS objects exist on the server. If any of them are missing, the push is rejected, and the LFS files have to be pushed manually.

Error : Hosting LFS Objects Externally

You may wish to host LFS objects externally with the help of git config -f .lfsconfig lfs.url [exampleurl.com]. This may cause pushes to fail, because GitLab cannot verify externally hosted LFS objects while LFS support is enabled for the project. To stop the failures, go to the project settings and disable Git LFS there.

Conclusion

  • Git repositories often contain large files, which are difficult to handle if every version of these files has to be downloaded with each clone or fetch.
  • This wastes a huge amount of storage and time. Git LFS was built to avoid this. It stores the files on a remote server, downloads only the large files that are needed, and uses pointers for the rest.
  • With Git LFS, the local repository stores only pointers for large files. It downloads only the files required for the commit being checked out. It also doesn't upload all the large files every time we push commits. It pushes only the large files that were modified.
  • It offers versioning of large files and faster cloning, pushing, and fetching of repositories, while using the same toolset and commands as Git.
  • Git LFS can be installed from its website. We run the git lfs install command in Git Bash to verify the installation.
  • Using Git LFS, we can create a new repository that supports LFS files, clone an existing repository, push and pull repositories, tell Git to track LFS files, etc.
  • We can also download the LFS files of recently modified branches using the fetch command, or selectively download particular files using its include and exclude options.
  • LFS files that are no longer being used can be deleted locally using the git lfs prune command. Remote files cannot be deleted from the command line, and the method for deleting them varies for each hosting provider.
  • Common errors are usually easy to troubleshoot, for example by enabling or disabling Git LFS, checking permissions, or providing proper credentials.