.Dd June  9, 2021
.Dt SPARSE-CHECKOUT 7
.Os "Causal Agency"
.
.Sh NAME
.Nm Sparse Checkout
.Nd a cool git feature
.
.Sh DESCRIPTION
I was going to write a post about
.Xr git-subtree 1
(and I still plan to!)
but while talking about it
with a friend
I came across another command:
.Xr git-sparse-checkout 1 .
I got pretty excited because
I already had a use case for it.
.
.Pp
.Xr git-sparse-checkout 1
does pretty much what it sounds like.
It lets you only have
a subset of files in the repository actually
.Dq checked out .
This is really useful
for huge repositories
where you are only interested in
some part of them.
Any operation touching the working tree
is much faster because
it can skip all the files you don't care about.
.
.Pp
My use case is with the
.Fx
.Xr ports 7
tree,
which recently moved to git
and contains almost 14 thousand files.
Working with the whole repository
was super painful.
.Xr git-status 1 ,
which I run as a habit
when my shell is idle,
would take dozens of seconds
to check the whole working tree
and report back.
(I didn't get any real time measurements
before enabling
.Xr git-sparse-checkout 1 ,
and I'm not about to disable it now,
since it'd have to check out
all those files again.)
I'm only actually working on
a small handful of ports,
so all that work is wasted.
Time to turn on sparse checkout:
.Bd -literal -offset indent
git sparse-checkout init --cone
.Ed
.
.Pp
The
.Fl \-cone
option here
(which I keep reading as
.Dq clone
because it's git)
restricts the kinds of patterns
you can use to select files to check out,
but makes the calculation more efficient.
Basically it means you can only select
paths along with everything below them,
which I think is pretty much
always what you want anyway.
Enabling sparse checkout
can take quite a while
because it has to do a lot of un-checking-out.
I should mention
that you can pass
.Fl \-sparse
to
.Xr git-clone 1
to avoid ever checking out
the whole tree.
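.
.Pp
As a rough sketch of that
.Fl \-sparse
route,
the script below builds a throwaway repository
and then clones it sparsely.
The temporary paths and the
.Pa www/example
directory are made up for illustration;
with a real remote,
only the
.Xr git-clone 1
line would matter.
.Bd -literal -offset indent
```shell
set -e
tmp=$(mktemp -d)
# Hypothetical stand-in for a large upstream repository.
git init -q "$tmp/upstream"
mkdir -p "$tmp/upstream/irc/catgirl" "$tmp/upstream/www/example"
echo top > "$tmp/upstream/Makefile"
echo port > "$tmp/upstream/irc/catgirl/Makefile"
echo other > "$tmp/upstream/www/example/Makefile"
git -C "$tmp/upstream" add .
git -C "$tmp/upstream" -c user.name=example -c user.email=example@example.org commit -q -m init
# Clone with sparse checkout already enabled.
git clone -q --sparse "$tmp/upstream" "$tmp/ports"
# Look at what actually landed in the working tree.
ls "$tmp/ports"
```
.Ed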
.
.Pp
The default selection when you run
.Cm init
is to check out all the files
at the root of the repository,
but none of the subdirectories.
For
.Xr ports 7 ,
I also want to check out
the shared scripts and Makefiles:
.Bd -literal -offset indent
git sparse-checkout add Keywords Mk Templates Tools
.Ed
.
.Pp
And then I can selectively check out
just the ports I'm working on:
.Bd -literal -offset indent
git sparse-checkout add irc/catgirl irc/pounce
.Ed
.
.Pp
After enabling sparse checkout,
.Xr git-status 1
takes what I'd call
a normal amount of time.
I also did this on
a couple-weeks-out-of-date copy of the
.Xr ports 7
tree,
and when I ran
.Xr git-pull 1
it was also really quick,
because it didn't have to bother
updating all those files
I'm not interested in.
It still downloads all the git objects,
of course,
so you can just add any new paths you need
to the sparse checkout list.
My disk usage also went down
by about a gigabyte.
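.
.Pp
Adding a path later on
might look something like the sketch below.
It runs against a throwaway repository
so that it's self-contained;
the temporary paths and the
.Pa www/example
directory are made up for illustration.
.Bd -literal -offset indent
```shell
set -e
tmp=$(mktemp -d)
# Hypothetical stand-in for the ports tree.
git init -q "$tmp/ports"
cd "$tmp/ports"
mkdir -p irc/catgirl irc/pounce www/example
for port in irc/catgirl irc/pounce www/example; do
	echo "$port" > "$port/Makefile"
done
git add .
git -c user.name=example -c user.email=example@example.org commit -q -m init
# Start out with a single port selected.
git sparse-checkout init --cone
git sparse-checkout add irc/catgirl
# Later, select one more port.
git sparse-checkout add irc/pounce
# Show which paths are currently selected.
git sparse-checkout list
```
.Ed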
.
.Pp
I'm super pleased to discover this part of git,
because it makes working with huge
and/or monorepo-style repositories
so much more feasible!
You can see how I came across it,
since
.Xr git-subtree 1
is also a useful tool for monorepos.
Stay tuned for that post,
I guess :)
.
.Sh AUTHORS
.An june Aq Mt june@causal.agency