Git is a version control system. Imagine you're working on a project, like writing a report, developing software, or designing something, and you need to keep track of all the changes you make over time. Git helps you do just that by recording snapshots of your work at different points. These snapshots are called "commits."
Version control systems (VCS) are essential tools for tracking and managing changes to code and other files in software development. Git, one of the most popular VCS, provides powerful features for collaboration, history tracking, and code management.
Git was created by Linus Torvalds in 2005 to manage the development of the Linux kernel, and it has since been adopted far beyond the kernel, especially in software development.
On Linux, you can install it using your package manager. For example, on Ubuntu:
$ sudo apt-get install git
After installing Git, you need to tell it who you are. Open your terminal or command prompt and type:
$ git config --global user.name "Your Name"
$ git config --global user.email "you@example.com"
Go to the folder where you want to keep your project and type:
$ git init
Create or add files to your project. For example, you might create a text file called README.md.
To tell Git to start tracking the file, type:
$ git add README.md
Now, save a snapshot of your project by typing:
$ git commit -m "Add README file"
The -m flag lets you add a message describing what you did, like "Add README file."
If you're working with others, you might want to share your code using a service like GitHub, GitLab, or Bitbucket. You can push your commits to a remote repository and allow others to clone, pull, and push changes.
$ git remote add origin https://github.com/yourusername/yourproject.git
$ git push -u origin master
Here is my mini guide on version control with git. Some useful resources: git & github slide, git_cheatsheet.pdf, git_cheatsheet_2.pdf, GitHub Actions: CI/CD example ('.yaml' config file in repository).
Git is more than just a tool for version control; it's a sophisticated system that models project history using a data structure called a Directed Acyclic Graph (DAG). This structure allows Git to efficiently track changes and manage the state of your project over time. To understand Git's internals, let's dive into how it models history and stores data.
At the core of Git's data model are three types of objects: blobs, trees, and commits. These objects represent the content of files, directories, and snapshots of the project, respectively.
A blob is essentially a file in Git. It stores the content of a file as a sequence of bytes.
In code terms, you can think of it as:
type blob = array<byte>
<root> (tree)
|
+- foo (tree)
|  |
|  + bar.txt (blob, contents = "hello world")
|
+- baz.txt (blob, contents = "git is wonderful")
A tree is a directory in Git. It maps filenames to either blobs (files) or other trees (subdirectories). It represents the hierarchical structure of your project.
A commit is a snapshot of your project at a given point in time. It contains references to its parent commits (none for the very first commit, one for an ordinary commit, and two or more for a merge), metadata (such as the author and commit message), and the top-level tree that represents the state of the project.
How git models history:
// a file is a bunch of bytes
type blob = array<byte>
// a directory contains named files and directories
type tree = map<string, tree | blob>
// a commit has parents, metadata, and the top-level tree
type commit = struct {
    parents: array<commit>
    author: string
    message: string
    snapshot: tree
}
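The pseudocode above can be sketched in Python as follows. This is only an illustrative model, not how Git is implemented: the class names `Blob`, `Tree`, and `Commit`, the helper `list_paths`, and the author name "Alice" are assumptions made for the example. It rebuilds the small directory tree shown earlier and wraps it in two commits.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass(frozen=True)
class Blob:
    data: bytes  # a file is a bunch of bytes


@dataclass
class Tree:
    # a directory maps names to files (blobs) or subdirectories (trees)
    entries: Dict[str, Union[Tree, Blob]] = field(default_factory=dict)


@dataclass
class Commit:
    parents: List[Commit]  # empty for the first commit
    author: str
    message: str
    snapshot: Tree         # the top-level tree


def list_paths(tree: Tree, prefix: str = "") -> List[str]:
    """Return every file path in the tree, visiting names in sorted order."""
    paths = []
    for name, entry in sorted(tree.entries.items()):
        if isinstance(entry, Tree):
            paths.extend(list_paths(entry, prefix + name + "/"))
        else:
            paths.append(prefix + name)
    return paths


# The example tree from the diagram above.
root = Tree({
    "foo": Tree({"bar.txt": Blob(b"hello world")}),
    "baz.txt": Blob(b"git is wonderful"),
})

first = Commit(parents=[], author="Alice", message="Initial commit", snapshot=root)
second = Commit(parents=[first], author="Alice", message="Second commit", snapshot=root)

print(list_paths(second.snapshot))
```

Because each commit only points at its parents, following `parents` from any commit walks backwards through history, which is what makes the structure a directed acyclic graph.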
Git uses a unique identifier for each object, which is the SHA-1 hash of the object’s content. This ensures that every object is uniquely and immutably stored in Git's internal database.
An “object” is a blob, tree, or commit:
type object = blob | tree | commit
objects = map<string, object>
def store(object):
    id = sha1(object)
    objects[id] = object
def load(id):
    return objects[id]
When you store an object (whether it's a blob, tree, or commit), Git computes its SHA-1 hash and stores it in a map where the key is the hash and the value is the object.
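As a rough illustration of this content-addressed store, here is a small Python sketch. It follows Git's convention of hashing a short header (the object type and the size in bytes) followed by a NUL byte and the content, which is why the ID of an empty blob matches the one Git itself reports. The in-memory dictionary and the function signatures are simplifications chosen for the example; real Git also compresses objects and writes them under the .git directory.

```python
import hashlib

objects = {}  # maps hex object id -> raw object bytes


def store(kind: str, body: bytes) -> str:
    """Store an object under the SHA-1 of its header plus body and return the id."""
    raw = f"{kind} {len(body)}".encode() + b"\0" + body
    object_id = hashlib.sha1(raw).hexdigest()
    objects[object_id] = raw
    return object_id


def load(object_id: str) -> bytes:
    """Look an object up by its id."""
    return objects[object_id]


empty_id = store("blob", b"")
hello_id = store("blob", b"hello world\n")
again_id = store("blob", b"hello world\n")  # same content, so same id

print(empty_id)  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(hello_id == again_id, len(objects))
```

Storing the same content twice yields the same key, so identical files are kept only once no matter how many commits contain them.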
While objects in Git are stored using SHA-1 hashes, which are not human-friendly, Git also maintains references that map human-readable names (like branch names or tags) to these hashes.
references = map<string, string>
def update_reference(name, id):
    references[name] = id
def read_reference(name):
    return references[name]
def load_reference(name_or_id):
    if name_or_id in references:
        return load(references[name_or_id])
    else:
        return load(name_or_id)
When you create a new branch or tag, Git stores the associated commit hash in a map under the branch or tag name.
To load the content of a branch or tag, Git checks if the name corresponds to a reference. If so, it retrieves the associated commit; otherwise, it assumes the input is already a hash and directly loads the object.
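The lookup described above can be tried out with a few lines of Python. This is a minimal sketch under simplifying assumptions: objects are plain byte strings hashed without Git's header, and the names `main` and `v1.0` are just example references.

```python
import hashlib

objects = {}     # object id -> content
references = {}  # human-readable name -> object id


def store(content: bytes) -> str:
    object_id = hashlib.sha1(content).hexdigest()
    objects[object_id] = content
    return object_id


def update_reference(name: str, object_id: str) -> None:
    references[name] = object_id


def load_reference(name_or_id: str) -> bytes:
    # A known name is translated to its id first; anything else is treated as an id.
    if name_or_id in references:
        return objects[references[name_or_id]]
    return objects[name_or_id]


c1 = store(b"first commit")
update_reference("main", c1)
update_reference("v1.0", c1)  # a tag keeps pointing at c1

c2 = store(b"second commit")
update_reference("main", c2)  # the branch moves on, the objects stay untouched

print(load_reference("main"))
print(load_reference("v1.0"))
print(load_reference(c1))
```

Note that moving `main` only rewrites one entry in `references`; the stored objects themselves never change.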
In Git, everything ultimately boils down to storing and retrieving objects (blobs, trees, and commits) and managing references to these objects. When you make a commit, Git takes the current state of your project, stores it as a tree of blobs, creates a commit object that points to this tree and any parent commits, and then updates the reference for your current branch to point to this new commit.
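Putting the pieces together, the commit steps just described might look like this in Python. The sketch makes several assumptions that differ from real Git: objects are serialized as JSON before hashing, the project is a flat directory of text files, and the function names `write_tree` and `commit` as well as the author "Alice" are invented for the example.

```python
import hashlib
import json

objects = {}     # object id -> object (a plain dict here)
references = {}  # branch name -> commit id


def store(obj: dict) -> str:
    """Hash a JSON serialization of the object and keep it under that id."""
    raw = json.dumps(obj, sort_keys=True).encode()
    object_id = hashlib.sha1(raw).hexdigest()
    objects[object_id] = obj
    return object_id


def write_tree(files: dict) -> str:
    """Store each file as a blob, then a tree that maps names to blob ids."""
    entries = {name: store({"type": "blob", "data": text})
               for name, text in files.items()}
    return store({"type": "tree", "entries": entries})


def commit(branch: str, files: dict, message: str, author: str = "Alice") -> str:
    parent = references.get(branch)  # None for the very first commit
    commit_id = store({
        "type": "commit",
        "parents": [parent] if parent else [],
        "author": author,
        "message": message,
        "snapshot": write_tree(files),
    })
    references[branch] = commit_id   # move the branch to the new commit
    return commit_id


first = commit("main", {"README.md": "# My project"}, "Add README file")
second = commit("main", {"README.md": "# My project", "notes.txt": "todo"}, "Add notes")

print(objects[second]["parents"] == [first])
print(len(objects))
```

The unchanged README.md is stored only once: both trees refer to the same blob id, so the second commit adds just one new blob, one new tree, and one new commit object.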
In short, a reference is just a human-readable name (such as a branch or tag) that maps to the SHA-1 hash of a commit.
- git help <command>: get help for a git command
- git init: creates a new git repo, with data stored in the .git directory
- git status: tells you what’s going on
- git add <filename>: adds files to staging area
- git commit: creates a new commit
  - Write good commit messages!
  - Even more reasons to write good commit messages!
- git log: shows a flattened log of history
- git log --all --graph --decorate: visualizes history as a DAG
- git diff <filename>: show changes you made relative to the staging area
- git diff <revision> <filename>: shows differences in a file between snapshots
- git checkout <revision>: updates HEAD and current branch
- git branch: shows branches
- git branch <name>: creates a branch
- git checkout -b <name>: creates a branch and switches to it (same as git branch <name>; git checkout <name>)
- git merge <revision>: merges into current branch
- git mergetool: use a fancy tool to help resolve merge conflicts
- git rebase: rebase set of patches onto a new base
- git remote: list remotes
- git remote add <name> <url>: add a remote
- git push <remote> <local branch>:<remote branch>: send objects to remote, and update remote reference
- git branch --set-upstream-to=<remote>/<remote branch>: set up correspondence between local and remote branch
- git fetch: retrieve objects/references from a remote
- git pull: same as git fetch; git merge
- git clone: download repository from remote
- git commit --amend: edit a commit’s contents/message
- git reset HEAD <file>: unstage a file
- git checkout -- <file>: discard changes
- git config: Git is highly customizable
- git clone --depth=1: shallow clone, without entire version history
- git add -p: interactive staging
- git rebase -i: interactive rebasing
- git blame: show who last edited which line
- git stash: temporarily remove modifications to working directory
- git bisect: binary search history (e.g. for regressions)
- .gitignore: specify intentionally untracked files to ignore
A branch in Git is simply a pointer to a specific commit in the project history. It allows you to diverge from the main line of development and continue to work on your code independently. Each branch represents an isolated line of development.
Let’s say you have a project with a few commits on the main branch:
A---B---C (main)
Here, A, B, and C represent commits, and the main branch points to commit C.
Now, you want to add a new feature without affecting the main branch. You create a new branch called feature:
$ git checkout -b feature
This creates a new branch, feature, that also points to commit C:
A---B---C (main, feature)
When you start making new commits on the feature branch, the history diverges:
A---B---C (main)
         \
          D---E (feature)
Now, main remains on commit C, while feature has advanced to commit E.
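The idea that a branch is only a movable pointer can be modeled in a few lines of Python. This is a toy sketch, not Git's implementation: the `Repo` class, its method names, and the single-letter commit ids are assumptions that mirror the diagrams above.

```python
class Repo:
    def __init__(self):
        self.parents = {}   # commit id -> parent commit id (None for the root)
        self.branches = {}  # branch name -> commit id it points to
        self.head = "main"  # name of the currently checked-out branch

    def commit(self, commit_id):
        # The new commit's parent is wherever the current branch points;
        # then the current branch moves forward to the new commit.
        self.parents[commit_id] = self.branches.get(self.head)
        self.branches[self.head] = commit_id

    def checkout_new_branch(self, name):
        # Like `git checkout -b <name>`: copy the pointer, then switch to it.
        self.branches[name] = self.branches[self.head]
        self.head = name

    def history(self, branch):
        """Walk parent links from the branch tip back to the root."""
        out, current = [], self.branches[branch]
        while current is not None:
            out.append(current)
            current = self.parents[current]
        return out


repo = Repo()
for commit_id in ["A", "B", "C"]:
    repo.commit(commit_id)

repo.checkout_new_branch("feature")  # main and feature both point to C
repo.commit("D")
repo.commit("E")                     # only feature moves

print(repo.branches)
print(repo.history("feature"))
```

Creating the branch copies a single pointer and no commits, which is why branching in Git is cheap.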
Merging in Git is the process of integrating changes from one branch into another. The most common scenario is merging a feature branch into the main branch once the feature is complete.
- Fast-Forward Merge:
  - Occurs when the main branch hasn't changed since the feature branch was created.
  - Git simply moves the main branch pointer forward to the latest commit on the feature branch.
Before:
A---B---C (main)
         \
          D---E (feature)
After:
A---B---C---D---E (main, feature)
- Three-Way Merge:
  - Occurs when both the main branch and the feature branch have made changes after the feature branch was created.
  - Git creates a new merge commit that has two parents, representing the history of both branches.
Before:
A---B---C---F (main)
         \
          D---E (feature)
After:
A---B---C---F---G (main)
         \     /
          D---E (feature)
Here, G is the merge commit that combines changes from both F (on main) and E (on feature).
To merge the feature branch into the main branch:
$ git checkout main
$ git merge feature
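How the choice between a fast-forward and a three-way merge falls out of the commit graph can be sketched in Python. The sketch only looks at the graph and does not combine file contents or handle conflicts; the function names, the dictionary layout, and the merge id "G" are assumptions chosen to match the diagrams above.

```python
def ancestors(parents, commit):
    """Return the commit itself plus everything reachable through parent links."""
    seen, stack = set(), [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def merge(parents, branches, target, source, merge_id):
    ours, theirs = branches[target], branches[source]
    if theirs in ancestors(parents, ours):
        return "already up to date"      # nothing new on the source branch
    if ours in ancestors(parents, theirs):
        branches[target] = theirs        # just slide the pointer forward
        return "fast-forward"
    parents[merge_id] = [ours, theirs]   # new commit with two parents
    branches[target] = merge_id
    return "merge commit"


# Scenario 1: main has not moved since feature was created.
parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"], "E": ["D"]}
branches = {"main": "C", "feature": "E"}
first_result = merge(parents, branches, "main", "feature", merge_id="G")

# Scenario 2: main gained commit F in the meantime.
parents2 = {"A": [], "B": ["A"], "C": ["B"], "F": ["C"], "D": ["C"], "E": ["D"]}
branches2 = {"main": "F", "feature": "E"}
second_result = merge(parents2, branches2, "main", "feature", merge_id="G")

print(first_result, branches)
print(second_result, branches2, parents2["G"])
```

In the first scenario no new commit is created, while in the second the new commit G records both F and E as its parents.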
Rebasing in Git is an alternative to merging. It allows you to move or combine a sequence of commits from one branch onto another. Rebasing rewrites the commit history by applying your changes on top of another branch.
When you rebase a branch onto another, Git essentially:
- Finds the common ancestor between the branches.
- Applies the commits from the current branch onto the target branch one by one.
- Moves the branch pointer to the new commits.
If you rebase feature onto main:
$ git checkout feature
$ git rebase main
The feature branch commits D and E are replayed on top of main:
A---B---C---F (main)
             \
              D'---E' (feature)
Notice how the commits on feature (now D' and E') have new hashes because they are new commits.
- Cleaner History: Rebasing results in a linear history, making it easier to follow the progression of changes.
- No Merge Commits: Rebasing avoids creating merge commits, which can clutter the history.
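To make the replay step concrete, here is a minimal Python sketch of why rebased commits get new hashes. It assumes a deliberately simplified commit id, computed from the parent id and a one-letter "change" label, and it takes the list of feature commits as given rather than finding the common ancestor; the helper names `commit_id`, `build`, and `rebase` are invented for the example.

```python
import hashlib


def commit_id(parent_id: str, change: str) -> str:
    """A commit's id depends on its parent's id as well as on its own change."""
    return hashlib.sha1(f"{parent_id}:{change}".encode()).hexdigest()


def build(base_id: str, changes):
    """Apply changes one by one on top of base_id; return a list of (id, change)."""
    chain, parent = [], base_id
    for change in changes:
        parent = commit_id(parent, change)
        chain.append((parent, change))
    return chain


def rebase(branch_commits, new_base_id: str):
    """Replay the same changes, in the same order, on top of a new base."""
    return build(new_base_id, [change for _, change in branch_commits])


main_chain = build("", ["A", "B", "C"])
c_id = main_chain[-1][0]

feature = build(c_id, ["D", "E"])  # D and E were made on top of C
f_id = commit_id(c_id, "F")        # meanwhile main advanced to F

rebased = rebase(feature, f_id)    # D' and E' now sit on top of F

for (old_id, change), (new_id, _) in zip(feature, rebased):
    print(change, old_id[:7], "->", new_id[:7])
```

The changes themselves are the same, but because every id includes the parent's id, moving the base changes the id of D and therefore of E as well. This is also why rebasing commits that others have already pulled is usually avoided.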
[ Git MERGE vs REBASE, GIT: Working with Branches ]
- git init
- ls -a
- ls .git
- git help init
- git cat-file -p 42fb7a2 #change the hash to know about the commit
- git add :/ #adds all from top down in repository
- git log
- git log --all --graph --decorate
- git checkout hashvalue
- git diff hello.txt #changes_since_the_last_snapshot
- git diff 42fb7a2 hello.txt #changes_since_the_42fb7a2_snapshot
- git diff 42fb7a2 56hs32d hello.txt #changes_from_the_42fb7a2__snapshot__to_56hs32d
- git branch -vv
- git branch branchname
- git branch abc; git checkout abc (git checkout -b abc)
To merge these branches:
- git merge cat
- git mergetool
- vimdiff
- git merge --continue
- git log --all --graph --decorate
- git remote
- git branch --set-upstream-to=origin/master
- git fetch + git merge = git pull
- git config / vim ~/.gitconfig
- git clone --depth=1 <url> #shallow_clone_without_full_version_history
- git diff --cached
- git blame abc.txt
- git show 42fb7a2
- git stash #temporarily_hide_the_current_changes
- git stash pop
- git bisect #powerful_tool_to_find_the_last_commit_where_unittest_was_passing
- git config --global user.name "My Name"
- git reset files/newcopy2.txt
- git rm '*.txt'
MIT Lecture 6: Version Control (git) (2020) | git assignments, Complete Git Guide: Understand and master Git and GitHub, The Git & Github Bootcamp, So You Think You Know Git - FOSDEM 2024, Git & GitHub Tutorial for Beginners #8 - Branches, Git Branches Tutorial, So You Think You Know Git Part 2 - DevWorld 2024
Resources: ProGit Book | markdown-cs, cli/cli | cli.github | git merge vs git rebase, Removing sensitive data from a repository | GitHub Actions - Now with built-in CI/CD! Live from GitHub HQ | Contributing to Open Source for the first time, Github Workflow | Github Workflow 2 | Github Actions | Github Actions 2 | Git and GitHub for Beginners Tutorial | Git Branches Tutorial | Git for Professionals Tutorial - Tools & Concepts for Mastering Version Control with Git | Advanced Git Tutorial - Interactive Rebase, Cherry-Picking, Reflog, Submodules and more | Complete Guide to Open Source - How to Contribute