[svnbook commit] r2712 - trunk/src/en/book
cmpilato
noreply at red-bean.com
Mon Feb 26 20:13:14 CST 2007
Author: cmpilato
Date: Mon Feb 26 20:13:14 2007
New Revision: 2712
Modified:
trunk/src/en/book/ch05-repository-admin.xml
Log:
* src/en/book/ch05-repository-admin.xml
Move the guts of dump-filtering into its own new section, leaving only
a stub description of svndumpfilter in the Administrator's Toolkit
section. Also create a new stub in that section for svnsync.
Modified: trunk/src/en/book/ch05-repository-admin.xml
==============================================================================
--- trunk/src/en/book/ch05-repository-admin.xml (original)
+++ trunk/src/en/book/ch05-repository-admin.xml Mon Feb 26 20:13:14 2007
@@ -1019,14 +1019,48 @@
by Subversion's own tools.</para>
<!-- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -->
+ <sect3 id="svn.reposadmin.maint.tk.svnadmin">
+ <title>svnadmin</title>
+
+ <para>The <command>svnadmin</command> program is the
+ repository administrator's best friend. Besides providing
+ the ability to create Subversion repositories, this program
+ allows you to perform several maintenance operations on
+ those repositories. The syntax of
+ <command>svnadmin</command> is similar to that of other
+ Subversion command-line programs:</para>
+
+ <screen>
+$ svnadmin help
+general usage: svnadmin SUBCOMMAND REPOS_PATH [ARGS & OPTIONS ...]
+Type 'svnadmin help <subcommand>' for help on a specific subcommand.
+Type 'svnadmin --version' to see the program version and FS modules.
+
+Available subcommands:
+ crashtest
+ create
+ deltify
+…
+</screen>
+
+ <para>We've already mentioned <command>svnadmin</command>'s
+ <literal>create</literal> subcommand (see <xref
+ linkend="svn.reposadmin.basics.creating"/>). Most of the
+ others we will cover as they become topically relevant later
+ in this chapter. And you can consult <xref
+ linkend="svn.ref.svnadmin" /> for a full rundown of
+ subcommands and what each of them offers.</para>
+
+ </sect3>
+
+ <!-- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -->
<sect3 id="svn.reposadmin.maint.tk.svnlook">
<title>svnlook</title>
<para><command>svnlook</command> is a tool provided by
Subversion for examining the various revisions and
transactions in a repository. No part of this program
- attempts to change the repository—it's a
- <quote>read-only</quote> tool. <command>svnlook</command>
+ attempts to change the repository. <command>svnlook</command>
is typically used by the repository hooks for reporting the
changes that are about to be committed (in the case of the
<command>pre-commit</command> hook) or that were just
@@ -1054,19 +1088,9 @@
itself, or how it differs from the previous revision of the
repository. You use the <option>--revision</option> and
<option>--transaction</option> options to specify which
- revision or transaction, respectively, to examine. Note
- that while revision numbers appear as natural numbers,
- transaction names are alphanumeric strings. Keep in mind
- that the filesystem only allows browsing of uncommitted
- transactions (transactions that have not resulted in a new
- revision). Most repositories will have no such
- transactions, because transactions are usually either
- committed (in which case, you should access them as revision
- with the <option>--revision</option> option) or aborted and
- removed.</para>
-
- <para>In the absence of both the <option>--revision</option>
- and <option>--transaction</option> options,
+ revision or transaction, respectively, to examine. In the
+ absence of both the <option>--revision</option> and
+ <option>--transaction</option> options,
<command>svnlook</command> will examine the youngest (or
<quote>HEAD</quote>) revision in the repository. So the
following two commands do exactly the same thing when 19 is
@@ -1087,6 +1111,15 @@
$ svnlook youngest /path/to/repos
19
</screen>
+
+ <note>
+ <para>Keep in mind that the only transactions you can browse
+ are uncommitted ones. Most repositories will have no such
+ transactions, because transactions are usually either
+ committed (in which case, you should access them as
+ revisions with the <option>--revision</option> option) or
+ aborted and removed.</para>
+ </note>
<para>Output from <command>svnlook</command> is designed to be
both human- and machine-parsable. Take as an example the output
@@ -1123,7 +1156,7 @@
<para>This output is human-readable, meaning items like the
datestamp are displayed using a textual representation
instead of something more obscure (such as the number of
- nanoseconds since the Tasty Freeze guy drove by). But this
+ nanoseconds since the Tasty Freeze guy drove by). But the
output is also machine-parsable—because the log
message can contain multiple lines and be unbounded in
length, <command>svnlook</command> provides the length of
@@ -1134,88 +1167,14 @@
in the event that this output is not the last bit of data in
the stream.</para>
- <para>Another common use of <command>svnlook</command> is to
- actually view the contents of a revision or transaction
- tree. The <command>svnlook tree</command> command displays
- the directories and files in the requested tree. If you
- supply the <option>--show-ids</option> option, it will also
- show the filesystem node revision IDs for each of those
- paths (which is generally of more use to developers than to
- users).</para>
-
- <screen>
-$ svnlook tree /path/to/repos --show-ids
-/ <0.0.1>
- A/ <2.0.1>
- B/ <4.0.1>
- lambda <5.0.1>
- E/ <6.0.1>
- alpha <7.0.1>
- beta <8.0.1>
- F/ <9.0.1>
- mu <3.0.1>
- C/ <a.0.1>
- D/ <b.0.1>
- gamma <c.0.1>
- G/ <d.0.1>
- pi <e.0.1>
- rho <f.0.1>
- tau <g.0.1>
- H/ <h.0.1>
- chi <i.0.1>
- omega <k.0.1>
- psi <j.0.1>
- iota <1.0.1>
-</screen>
-
- <para>Once you've seen the layout of directories and files in
- your tree, you can use commands like <command>svnlook
- cat</command>, <command>svnlook propget</command>, and
- <command>svnlook proplist</command> to dig into the details
- of those files and directories.</para>
-
<para><command>svnlook</command> can perform a variety of
- other queries, displaying subsets of bits of information
- we've mentioned previously, reporting which paths were
- modified in a given revision or transaction, showing textual
- and property differences made to files and directories, and
- so on. See <xref linkend="svn.ref.svnlook" /> for a full
- reference of <command>svnlook</command>'s features.</para>
-
- </sect3>
-
- <!-- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -->
- <sect3 id="svn.reposadmin.maint.tk.svnadmin">
- <title>svnadmin</title>
-
- <para>The <command>svnadmin</command> program is the
- repository administrator's best friend. Besides providing
- the ability to create Subversion repositories, this program
- allows you to perform several maintenance operations on
- those repositories. The syntax of
- <command>svnadmin</command> is similar to that of
- <command>svnlook</command>:</para>
-
- <screen>
-$ svnadmin help
-general usage: svnadmin SUBCOMMAND REPOS_PATH [ARGS & OPTIONS ...]
-Type 'svnadmin help <subcommand>' for help on a specific subcommand.
-Type 'svnadmin --version' to see the program version and FS modules.
-
-Available subcommands:
- crashtest
- create
- deltify
-…
-</screen>
-
- <para>We've already mentioned <command>svnadmin</command>'s
- <literal>create</literal> subcommand (see <xref
- linkend="svn.reposadmin.basics.creating"/>). Most of the
- others we will cover as they become topically relevant later
- in this chapter. And you can consult <xref
- linkend="svn.ref.svnadmin" /> for a full rundown of
- subcommands and what each of them offers.</para>
+ other queries: displaying subsets of bits of information
+ we've mentioned previously, recursively listing versioned
+ directory trees, reporting which paths were modified in a
+ given revision or transaction, showing textual and property
+ differences made to files and directories, and so on. See
+ <xref linkend="svn.ref.svnlook" /> for a full reference of
+ <command>svnlook</command>'s features.</para>
</sect3>
@@ -1223,61 +1182,12 @@
<sect3 id="svn.reposadmin.maint.tk.svndumpfilter">
<title>svndumpfilter</title>
- <para>Since Subversion stores your versioned history using, at
- the very least, binary differencing algorithms and data
- compression (optionally in a completely opaque database
- system), attempting manual tweaks is unwise, if not quite
- difficult, and at any rate strongly discouraged. And once
- data has been stored in your repository, Subversion
- generally doesn't provide an easy way to remove that data.
- <footnote>
- <para>That's rather the reason you use version control at
- all, right?</para>
- </footnote>
- But inevitably, there will be times when you would like to
- manipulate the history of your repository. You might need
- to strip out all instances of a file that was accidentally
- added to the repository (and shouldn't be there for whatever
- reason).
- <footnote>
- <para>Conscious, cautious removal of certain bits of
- versioned data is actually supported by real use-cases.
- That's why an <quote>obliterate</quote> feature has been
- one of the most highly requested Subversion features,
- and one which the Subversion developers hope to soon
- provide.</para>
- </footnote>
- Or, perhaps you have multiple projects sharing a
- single repository, and you decide to split them up into
- their own repositories. To accomplish tasks like this,
- administrators need a more manageable and malleable
- representation of the data in their repositories—the
- Subversion repository dump format.</para>
-
- <para>The Subversion repository dump format is a
- human-readable representation of the changes that you've
- made to your versioned data over time. You use the
- <command>svnadmin dump</command> command to generate the
- dump data, and <command>svnadmin load</command> to populate
- a new repository with it (see <xref
- linkend="svn.reposadmin.maint.migrate"/>). The great thing about the
- human-readability aspect of the dump format is that, if you
- aren't careless about it, you can manually inspect and
- modify it. Of course, the downside is that if you have three
- years' worth of repository activity encapsulated in what is
- likely to be a very large dump file, it could take you a
- long, long time to manually inspect and modify it.</para>
-
<para>While it won't be the most commonly used tool at the
administrator's disposal, <command>svndumpfilter</command>
provides a very particular brand of useful
functionality—the ability to quickly and easily modify
- that dump data by acting as a path-based filter. Simply
- give it either a list of paths you wish to keep, or a list
- of paths you wish to not keep, then pipe your repository
- dump data through this filter. The result will be a
- modified stream of dump data that contains only the
- versioned paths you (explicitly or implicitly) requested.</para>
+ streams of Subversion repository history data by acting as a
+ path-based filter.</para>
<para>The syntax of <command>svndumpfilter</command> is as
follows:</para>
@@ -1287,7 +1197,7 @@
general usage: svndumpfilter SUBCOMMAND [ARGS & OPTIONS ...]
Type "svndumpfilter help <subcommand>" for help on a specific subcommand.
Type 'svndumpfilter --version' to see the program version.
-
+
Available subcommands:
exclude
include
@@ -1316,230 +1226,52 @@
</varlistentry>
</variablelist>
- <para>Let's look a realistic example of how you might use this
- program. We discuss elsewhere (see <xref
- linkend="svn.reposadmin.projects.chooselayout"/>) the
- process of deciding how to choose a layout for the data in
- your repositories—using one repository per project or
- combining them, arranging stuff within your repository, and
- so on. But sometimes after new revisions start flying in,
- you rethink your layout and would like to make some changes.
- A common change is the decision to move multiple projects
- which are sharing a single repository into separate
- repositories for each project.</para>
-
- <para>Our imaginary repository contains three projects:
- <literal>calc</literal>, <literal>calendar</literal>, and
- <literal>spreadsheet</literal>. They have been living
- side-by-side in a layout like this:</para>
+ <para>You can learn more about these subcommands and
+ <command>svndumpfilter</command>'s unique purpose in <xref
+ linkend="svn.reposadmin.maint.filtering" />.</para>
- <screen>
-/
- calc/
- trunk/
- branches/
- tags/
- calendar/
- trunk/
- branches/
- tags/
- spreadsheet/
- trunk/
- branches/
- tags/
-</screen>
-
- <para>To get these three projects into their own repositories,
- we first dump the whole repository:</para>
-
- <screen>
-$ svnadmin dump /path/to/repos > repos-dumpfile
-* Dumped revision 0.
-* Dumped revision 1.
-* Dumped revision 2.
-* Dumped revision 3.
-…
-$
-</screen>
+ </sect3>
- <para>Next, run that dump file through the filter, each time
- including only one of our top-level directories, and
- resulting in three new dump files:</para>
+ <!-- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -->
+ <sect3 id="svn.reposadmin.maint.tk.svnsync">
+ <title>svnsync</title>
- <screen>
-$ cat repos-dumpfile | svndumpfilter include calc > calc-dumpfile
-…
-$ cat repos-dumpfile | svndumpfilter include calendar > cal-dumpfile
-…
-$ cat repos-dumpfile | svndumpfilter include spreadsheet > ss-dumpfile
-…
-$
-</screen>
+ <para>The <command>svnsync</command> program, which is new to
+ the 1.4 release of Subversion, provides all the
+ functionality required for maintaining a read-only mirror of
+ a Subversion repository. The program really has one
+ job—to transfer one repository's versioned history
+ into another repository. And while there are a few ways to do
+ that, its primary strength is that it can operate
+ remotely—the <quote>source</quote> and
+ <quote>sink</quote>
+ <footnote>
+ <para>Or is that, the <quote>sync</quote>?</para>
+ </footnote>
+ repositories may be on different computers from each other
+ and from <command>svnsync</command> itself.</para>
- <para>At this point, you have to make a decision. Each of
- your dump files will create a valid repository,
- but will preserve the paths exactly as they were in the
- original repository. This means that even though you would
- have a repository solely for your <literal>calc</literal>
- project, that repository would still have a top-level
- directory named <filename>calc</filename>. If you want
- your <filename>trunk</filename>, <filename>tags</filename>,
- and <filename>branches</filename> directories to live in the
- root of your repository, you might wish to edit your
- dump files, tweaking the <literal>Node-path</literal> and
- <literal>Node-copyfrom-path</literal> headers to no longer have
- that first <filename>calc/</filename> path component. Also,
- you'll want to remove the section of dump data that creates
- the <filename>calc</filename> directory. It will look
- something like:</para>
+ <para>As you might expect, <command>svnsync</command> has a
+ syntax that looks very much like every other program we've
+ mentioned in this chapter:</para>
<screen>
-Node-path: calc
-Node-action: add
-Node-kind: dir
-Content-length: 0
-
-</screen>
-
- <warning>
- <para>If you do plan on manually editing the dump file to
- remove a top-level directory, make sure that your editor is
- not set to automatically convert end-lines to the native
- format (e.g. \r\n to \n) as the content will then not agree
- with the metadata. This will render the dump file
- useless.</para>
- </warning>
-
- <para>All that remains now is to create your three new
- repositories, and load each dump file into the right
- repository:</para>
+$ svnsync help
+general usage: svnsync SUBCOMMAND DEST_URL [ARGS & OPTIONS ...]
+Type 'svnsync help <subcommand>' for help on a specific subcommand.
+Type 'svnsync --version' to see the program version and RA modules.
- <screen>
-$ svnadmin create calc; svnadmin load calc < calc-dumpfile
-<<< Started new transaction, based on original revision 1
- * adding path : Makefile ... done.
- * adding path : button.c ... done.
-…
-$ svnadmin create calendar; svnadmin load calendar < cal-dumpfile
-<<< Started new transaction, based on original revision 1
- * adding path : Makefile ... done.
- * adding path : cal.c ... done.
-…
-$ svnadmin create spreadsheet; svnadmin load spreadsheet < ss-dumpfile
-<<< Started new transaction, based on original revision 1
- * adding path : Makefile ... done.
- * adding path : ss.c ... done.
-…
+Available subcommands:
+ initialize (init)
+ synchronize (sync)
+ copy-revprops
+ help (?, h)
$
</screen>
- <para>Both of <command>svndumpfilter</command>'s subcommands
- accept options for deciding how to deal with
- <quote>empty</quote> revisions. If a given revision
- contained only changes to paths that were filtered out, that
- now-empty revision could be considered uninteresting or even
- unwanted. So to give the user control over what to do with
- those revisions, <command>svndumpfilter</command> provides
- the following command-line options:</para>
-
- <variablelist>
- <varlistentry>
- <term><option>--drop-empty-revs</option></term>
- <listitem>
- <para>Do not generate empty revisions at all—just
- omit them.</para>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term><option>--renumber-revs</option></term>
- <listitem>
- <para>If empty revisions are dropped (using the
- <option>--drop-empty-revs</option> option), change the
- revision numbers of the remaining revisions so that
- there are no gaps in the numeric sequence.</para>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term><option>--preserve-revprops</option></term>
- <listitem>
- <para>If empty revisions are not dropped, preserve the
- revision properties (log message, author, date, custom
- properties, etc.) for those empty revisions.
- Otherwise, empty revisions will only contain the
- original datestamp, and a generated log message that
- indicates that this revision was emptied by
- <command>svndumpfilter</command>.</para>
- </listitem>
- </varlistentry>
- </variablelist>
-
- <para>While <command>svndumpfilter</command> can be very
- useful, and a huge timesaver, there are unfortunately a
- couple of gotchas. First, this utility is overly sensitive
- to path semantics. Pay attention to whether paths in your
- dump file are specified with or without leading slashes.
- You'll want to look at the <literal>Node-path</literal> and
- <literal>Node-copyfrom-path</literal> headers.</para>
-
- <screen>
-…
-Node-path: spreadsheet/Makefile
-…
-</screen>
-
- <para>If the paths have leading slashes, you should
- include leading slashes in the paths you pass to
- <command>svndumpfilter include</command> and
- <command>svndumpfilter exclude</command> (and if they don't,
- you shouldn't). Further, if your dump file has an inconsistent
- usage of leading slashes for some reason,
- <footnote>
- <para>While <command>svnadmin dump</command> has a
- consistent leading slash policy—to not include
- them—other programs which generate dump data might
- not be so consistent.</para>
- </footnote>
- you should probably normalize those paths so they all
- have, or lack, leading slashes.</para>
-
- <para>Also, copied paths can give you some trouble.
- Subversion supports copy operations in the repository, where
- a new path is created by copying some already existing path.
- It is possible that at some point in the lifetime of your
- repository, you might have copied a file or directory from
- some location that <command>svndumpfilter</command> is
- excluding, to a location that it is including. In order to
- make the dump data self-sufficient,
- <command>svndumpfilter</command> needs to still show the
- addition of the new path—including the contents of any
- files created by the copy—and not represent that
- addition as a copy from a source that won't exist in your
- filtered dump data stream. But because the Subversion
- repository dump format only shows what was changed in each
- revision, the contents of the copy source might not be
- readily available. If you suspect that you have any copies
- of this sort in your repository, you might want to rethink
- your set of included/excluded paths.</para>
-
- <para>Finally, <command>svndumpfilter</command> takes path
- filtering quite literally. If you are trying to copy the
- history of a project rooted at
- <filename>trunk/my-project</filename> and move it into a
- repository of its own, you would, of course, use the
- <command>svndumpfilter include</command> command to keep all
- the changes in and under
- <filename>trunk/my-project</filename>. But the resulting
- dump file makes no assumptions about the repository into
- which you plan to load this data. Specifically, the dump
- data might begin with the revision which added the
- <filename>trunk/my-project</filename> directory, but it will
- <emphasis>not</emphasis> contain directives which would
- create the <filename>trunk</filename> directory itself
- (because <filename>trunk</filename> doesn't match the
- include filter). You'll need to make sure that any
- directories which the new dump stream expect to exist
- actually do exist in the target repository before trying to
- load the stream into that repository.</para>
+ <para>We talk more about replicating repositories with
+ <command>svnsync</command> in <xref
+ linkend="svn.reposadmin.maint.replication" />.</para>
</sect3>
@@ -2005,12 +1737,29 @@
various back-end data store files in a fashion generally
understood by (and of interest to) only the Subversion
developers themselves. However, circumstances may arise that
- call for all, or some subset, of that data to be collected
- into a single, portable, flat file format and copied or moved
- into another repository. Subversion provides such
- functionality, implemented in a pair of
- <command>svnadmin</command> subcommands:
- <literal>dump</literal> and <literal>load</literal>.</para>
+ call for all, or some subset, of that data to be copied or
+ moved into another repository.</para>
+
+ <para>Subversion provides such functionality by way of
+ repository dump streams. A repository dump stream (often
+ referred to as a <quote>dumpfile</quote> when stored as a file
+ on disk) is a portable, flat file format that describes the
+ various revisions in your repository—what was changed,
+ by whom, when, and so on. This dump stream is the primary
+ mechanism used to marshal versioned history—in whole or
+ in part, with or without modification—between
+ repositories.</para>
+
+ <warning>
+ <para>While the Subversion repository dump format contains
+ human-readable portions and a familiar structure (it
+ resembles an RFC-822 format, the same type of format used
+ for most email), it is <emphasis>not</emphasis> a plaintext
+ file format. The format should be treated as a binary file
+ format, highly sensitive to meddling. Many text editor
+ tools will corrupt the file's contents, often due to
+ automatic line ending character conversion.</para>
+ </warning>
<para>There are many reasons for dumping and loading Subversion
repository data. Early in Subversion's life, the most common
@@ -2025,16 +1774,17 @@
new OS or CPU architecture, or switching between the Berkeley
DB and FSFS back-ends.</para>
- <para>Whatever your reason, using the <command>svnadmin
- dump</command> and <command>svnadmin load</command>
- subcommands is straightforward. <command>svnadmin
- dump</command> will output a range of repository revisions
- that are formatted using Subversion's custom filesystem dump
- format. The dump format is printed to the standard output
- stream, while informative messages are printed to the standard
- error stream. This allows you to redirect the output stream
- to a file while watching the status output in your terminal
- window. For example:</para>
+ <para>Whatever your reason for migrating repository history,
+ using the <command>svnadmin dump</command> and
+ <command>svnadmin load</command> subcommands is
+ straightforward. <command>svnadmin dump</command> will output
+ a range of repository revisions that are formatted using
+ Subversion's custom filesystem dump format. The dump format
+ is printed to the standard output stream, while informative
+ messages are printed to the standard error stream. This
+ allows you to redirect the output stream to a file while
+ watching the status output in your terminal window. For
+ example:</para>
<screen>
$ svnlook youngest myrepos
@@ -2097,9 +1847,10 @@
repository—the same thing you get by making commits
against that repository from a regular Subversion client. And
just as in a commit, you can use hook programs to perform
- actions before and after each of the commits made during a load
- process. By passing the <option>--use-pre-commit-hook</option>
- and <option>--use-post-commit-hook</option> options to
+ actions before and after each of the commits made during a
+ load process. By passing the
+ <option>--use-pre-commit-hook</option> and
+ <option>--use-post-commit-hook</option> options to
<command>svnadmin load</command>, you can instruct Subversion
to execute the pre-commit and post-commit hook programs,
respectively, for each loaded revision. You might use these,
@@ -2109,8 +1860,9 @@
post-commit hook sends emails to a mailing list for each new
commit, you might not want to spew hundreds or thousands of
commit emails in rapid succession at that list for each of the
- loaded revisions! You can read more about the use of hook
- scripts in <xref linkend="svn.reposadmin.create.hooks"/>.</para>
+ loaded revisions! You can read more about the use of hook
+ scripts in <xref
+ linkend="svn.reposadmin.create.hooks"/>.</para>
<para>Note that because <command>svnadmin</command> uses
standard input and output streams for the repository dump and
@@ -2188,8 +1940,8 @@
$ svnadmin dump myrepos --revision 2001:3000 --incremental > dumpfile3
</screen>
- <para>These dump files could be loaded into a new repository with
- the following command sequence:</para>
+ <para>These dump files could be loaded into a new repository
+ with the following command sequence:</para>
<screen>
$ svnadmin load newrepos < dumpfile1
@@ -2213,10 +1965,11 @@
<para>The dump format can also be used to merge the contents of
several different repositories into a single repository. By
- using the <option>--parent-dir</option> option of <command>svnadmin
- load</command>, you can specify a new virtual root directory
- for the load process. That means if you have dump files for
- three repositories, say <filename>calc-dumpfile</filename>,
+ using the <option>--parent-dir</option> option of
+ <command>svnadmin load</command>, you can specify a new
+ virtual root directory for the load process. That means if
+ you have dump files for three repositories, say
+ <filename>calc-dumpfile</filename>,
<filename>cal-dumpfile</filename>, and
<filename>ss-dumpfile</filename>, you can first create a new
repository to hold them all:</para>
@@ -2256,15 +2009,9 @@
repository dump format—conversion from a different
storage mechanism or version control system altogether.
Because the dump file format is, for the most part,
- human-readable,
- <footnote>
- <para>The Subversion repository dump format resembles
- an RFC-822 format, the same type of format used for most
- email.</para>
- </footnote>
- it should be relatively easy to describe generic sets of
- changes—each of which should be treated as a new
- revision—using this file format. In fact, the
+ human-readable, it should be relatively easy to describe
+ generic sets of changes—each of which should be treated
+ as a new revision—using this file format. In fact, the
<command>cvs2svn</command> utility (see <xref
linkend="svn.forcvs.convert"/>) uses the dump format to
represent the contents of a CVS repository so that those
@@ -2273,6 +2020,294 @@
</sect2>
<!-- =============================================================== -->
+ <sect2 id="svn.reposadmin.maint.filtering">
+ <title>Filtering Repository History</title>
+
+ <para>Since Subversion stores your versioned history using, at
+ the very least, binary differencing algorithms and data
+ compression (optionally in a completely opaque database
+ system), attempting manual tweaks is unwise, if not quite
+ difficult, and at any rate strongly discouraged. And once
+ data has been stored in your repository, Subversion
+ generally doesn't provide an easy way to remove that data.
+ <footnote>
+ <para>That's rather the reason you use version control at
+ all, right?</para>
+ </footnote>
+ But inevitably, there will be times when you would like to
+ manipulate the history of your repository. You might need
+ to strip out all instances of a file that was accidentally
+ added to the repository (and shouldn't be there for whatever
+ reason).
+ <footnote>
+ <para>Conscious, cautious removal of certain bits of
+ versioned data is actually supported by real use-cases.
+ That's why an <quote>obliterate</quote> feature has been
+ one of the most highly requested Subversion features,
+ and one which the Subversion developers hope to soon
+ provide.</para>
+ </footnote>
+ Or, perhaps you have multiple projects sharing a
+ single repository, and you decide to split them up into
+ their own repositories. To accomplish tasks like this,
+ administrators need a more manageable and malleable
+ representation of the data in their repositories—the
+ Subversion repository dump format.</para>
+
+ <para>As we described in <xref
+ linkend="svn.reposadmin.maint.migrate" />, the
+ Subversion repository dump format is a human-readable
+ representation of the changes that you've made to your
+ versioned data over time. You use the <command>svnadmin
+ dump</command> command to generate the dump data, and
+ <command>svnadmin load</command> to populate a new
+ repository with it (see <xref
+ linkend="svn.reposadmin.maint.migrate"/>). The great thing
+ about the human-readability aspect of the dump format is
+ that, if you aren't careless about it, you can manually
+ inspect and modify it. Of course, the downside is that if
+ you have three years' worth of repository activity
+ encapsulated in what is likely to be a very large dump file,
+ it could take you a long, long time to manually inspect and
+ modify it.</para>
+
+ <para>That's where <command>svndumpfilter</command> becomes
+ useful. This program acts as a path-based filter for
+ repository dump streams. Simply give it either a list of
+ paths you wish to keep, or a list of paths you wish to not
+ keep, then pipe your repository dump data through this
+ filter. The result will be a modified stream of dump data
+ that contains only the versioned paths you (explicitly or
+ implicitly) requested.</para>
+
+ <para>Let's look at a realistic example of how you might use this
+ program. We discuss elsewhere (see <xref
+ linkend="svn.reposadmin.projects.chooselayout"/>) the
+ process of deciding how to choose a layout for the data in
+ your repositories—using one repository per project or
+ combining them, arranging stuff within your repository, and
+ so on. But sometimes after new revisions start flying in,
+ you rethink your layout and would like to make some changes.
+ A common change is the decision to move multiple projects
+ which are sharing a single repository into separate
+ repositories for each project.</para>
+
+ <para>Our imaginary repository contains three projects:
+ <literal>calc</literal>, <literal>calendar</literal>, and
+ <literal>spreadsheet</literal>. They have been living
+ side-by-side in a layout like this:</para>
+
+ <screen>
+/
+ calc/
+ trunk/
+ branches/
+ tags/
+ calendar/
+ trunk/
+ branches/
+ tags/
+ spreadsheet/
+ trunk/
+ branches/
+ tags/
+</screen>
+
+ <para>To get these three projects into their own repositories,
+ we first dump the whole repository:</para>
+
+ <screen>
+$ svnadmin dump /path/to/repos > repos-dumpfile
+* Dumped revision 0.
+* Dumped revision 1.
+* Dumped revision 2.
+* Dumped revision 3.
+…
+$
+</screen>
+
+ <para>Next, run that dump file through the filter, each time
+ including only one of our top-level directories. This
+ results in three new dump files:</para>
+
+ <screen>
+$ cat repos-dumpfile | svndumpfilter include calc > calc-dumpfile
+…
+$ cat repos-dumpfile | svndumpfilter include calendar > cal-dumpfile
+…
+$ cat repos-dumpfile | svndumpfilter include spreadsheet > ss-dumpfile
+…
+$
+</screen>
+
+ <para>At this point, you have to make a decision. Each of
+ your dump files will create a valid repository,
+ but will preserve the paths exactly as they were in the
+ original repository. This means that even though you would
+ have a repository solely for your <literal>calc</literal>
+ project, that repository would still have a top-level
+ directory named <filename>calc</filename>. If you want
+ your <filename>trunk</filename>, <filename>tags</filename>,
+ and <filename>branches</filename> directories to live in the
+ root of your repository, you might wish to edit your
+ dump files, tweaking the <literal>Node-path</literal> and
+ <literal>Node-copyfrom-path</literal> headers to no longer have
+ that first <filename>calc/</filename> path component. Also,
+ you'll want to remove the section of dump data that creates
+ the <filename>calc</filename> directory. It will look
+ something like:</para>
+
+ <screen>
+Node-path: calc
+Node-action: add
+Node-kind: dir
+Content-length: 0
+
+</screen>
+
+ <warning>
+ <para>If you do plan on manually editing the dump file to
+ remove a top-level directory, make sure that your editor is
+ not set to automatically convert end-of-line characters to the
+ native format (e.g. \r\n to \n), because the content will then
+ no longer agree with the metadata. This will render the dump
+ file useless.</para>
+ </warning>
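+
+ <para>If hand-editing a large dump file is impractical, the
+ header rewrite described above can be roughly sketched with
+ <command>sed</command>. This is only a sketch, not a tool
+ that ships with Subversion: the function name
+ <literal>strip_top_dir</literal> and the output file name are
+ hypothetical, and it assumes a simple project name with no
+ regular-expression metacharacters. It does not remove the
+ node record that creates the top-level directory, and it
+ cannot tell headers apart from file contents.</para>

```shell
# Hypothetical helper (not part of Subversion): strip a leading
# "<project>/" component from the Node-path and Node-copyfrom-path
# headers of a dump stream read from stdin.
#
# Caveat: sed cannot distinguish dump headers from versioned file
# contents. A file containing a line that begins with
# "Node-path: <project>/" would be altered too, and its content would
# then no longer agree with the metadata recorded in the dump.
strip_top_dir() {
  project=$1
  # LC_ALL=C makes sed treat the stream as plain bytes.
  LC_ALL=C sed \
    -e "s|^Node-path: ${project}/|Node-path: |" \
    -e "s|^Node-copyfrom-path: ${project}/|Node-copyfrom-path: |"
}

# Example invocation (the output file name is made up):
#   strip_top_dir calc < calc-dumpfile > calc-rooted-dumpfile
```

+ <para>You would still need to delete the record that adds the
+ <filename>calc</filename> directory itself, and it is worth
+ test-loading the result into a scratch repository before
+ relying on it.</para>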
+
+ <para>All that remains now is to create your three new
+ repositories, and load each dump file into the right
+ repository:</para>
+
+ <screen>
+$ svnadmin create calc; svnadmin load calc < calc-dumpfile
+<<< Started new transaction, based on original revision 1
+ * adding path : Makefile ... done.
+ * adding path : button.c ... done.
+…
+$ svnadmin create calendar; svnadmin load calendar < cal-dumpfile
+<<< Started new transaction, based on original revision 1
+ * adding path : Makefile ... done.
+ * adding path : cal.c ... done.
+…
+$ svnadmin create spreadsheet; svnadmin load spreadsheet < ss-dumpfile
+<<< Started new transaction, based on original revision 1
+ * adding path : Makefile ... done.
+ * adding path : ss.c ... done.
+…
+$
+</screen>
+
+ <para>Both of <command>svndumpfilter</command>'s subcommands
+ accept options for deciding how to deal with
+ <quote>empty</quote> revisions. If a given revision
+ contained only changes to paths that were filtered out, that
+ now-empty revision could be considered uninteresting or even
+ unwanted. So to give the user control over what to do with
+ those revisions, <command>svndumpfilter</command> provides
+ the following command-line options:</para>
+
+ <variablelist>
+ <varlistentry>
+ <term><option>--drop-empty-revs</option></term>
+ <listitem>
+ <para>Do not generate empty revisions at all—just
+ omit them.</para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>--renumber-revs</option></term>
+ <listitem>
+ <para>If empty revisions are dropped (using the
+ <option>--drop-empty-revs</option> option), change the
+ revision numbers of the remaining revisions so that
+ there are no gaps in the numeric sequence.</para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>--preserve-revprops</option></term>
+ <listitem>
+ <para>If empty revisions are not dropped, preserve the
+ revision properties (log message, author, date, custom
+ properties, etc.) for those empty revisions.
+ Otherwise, empty revisions will only contain the
+ original datestamp, and a generated log message that
+ indicates that this revision was emptied by
+ <command>svndumpfilter</command>.</para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
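+
+ <para>As an illustration, the first two options are often
+ combined so that the filtered history has neither empty
+ revisions nor gaps in its numbering. The wrapper below is a
+ minimal sketch; the function name
+ <literal>filter_project</literal> and its argument order are
+ assumptions made for this example.</para>

```shell
# Hypothetical wrapper: keep one top-level project, drop the revisions
# that filtering leaves empty, and renumber the remaining revisions.
filter_project() {
  project=$1   # path prefix to keep, e.g. calc
  infile=$2    # dump of the whole repository
  outfile=$3   # filtered dump file to write
  svndumpfilter include --drop-empty-revs --renumber-revs "$project" \
    < "$infile" > "$outfile"
}

# Example invocation, using the file names from the text above:
#   filter_project calc repos-dumpfile calc-dumpfile
```

+ <para>Bear in mind that after renumbering, the revision numbers
+ in the new repository will no longer match those of the
+ original, so any references to old revision numbers (in log
+ messages, for example) will not line up.</para>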
+
+ <para>While <command>svndumpfilter</command> can be very
+ useful, and a huge timesaver, there are unfortunately a
+ few gotchas. First, this utility is overly sensitive
+ to path semantics. Pay attention to whether paths in your
+ dump file are specified with or without leading slashes.
+ You'll want to look at the <literal>Node-path</literal> and
+ <literal>Node-copyfrom-path</literal> headers.</para>
+
+ <screen>
+…
+Node-path: spreadsheet/Makefile
+…
+</screen>
+
+ <para>If the paths have leading slashes, you should
+ include leading slashes in the paths you pass to
+ <command>svndumpfilter include</command> and
+ <command>svndumpfilter exclude</command> (and if they don't,
+ you shouldn't). Further, if your dump file has an inconsistent
+ usage of leading slashes for some reason,
+ <footnote>
+ <para>While <command>svnadmin dump</command> has a
+ consistent leading slash policy—to not include
+ them—other programs which generate dump data might
+ not be so consistent.</para>
+ </footnote>
+ you should probably normalize those paths so they all
+ have, or lack, leading slashes.</para>
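+
+ <para>One quick way to find out which convention a dump file
+ follows is to count the path headers that begin with a slash.
+ The helper below is a rough sketch with a hypothetical name
+ (<literal>count_leading_slash_paths</literal>); like any
+ line-oriented search, it may also count lines inside versioned
+ file contents that happen to look like headers.</para>

```shell
# Hypothetical helper: count the Node-path and Node-copyfrom-path
# headers on stdin whose path begins with a slash.
count_leading_slash_paths() {
  # grep -c prints the count; it exits non-zero when the count is 0,
  # so "|| true" keeps the function from reporting that as a failure.
  LC_ALL=C grep -c -E '^Node-(copyfrom-)?path: /' || true
}

# Example invocation:
#   count_leading_slash_paths < repos-dumpfile
```

+ <para>A result of zero suggests the paths have no leading
+ slashes, which is what <command>svnadmin dump</command>
+ produces. A non-zero result means at least some paths do, and
+ the file deserves a closer look before filtering.</para>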
+
+ <para>Also, copied paths can give you some trouble.
+ Subversion supports copy operations in the repository, where
+ a new path is created by copying some already existing path.
+ It is possible that at some point in the lifetime of your
+ repository, you might have copied a file or directory from
+ some location that <command>svndumpfilter</command> is
+ excluding, to a location that it is including. In order to
+ make the dump data self-sufficient,
+ <command>svndumpfilter</command> needs to still show the
+ addition of the new path—including the contents of any
+ files created by the copy—and not represent that
+ addition as a copy from a source that won't exist in your
+ filtered dump data stream. But because the Subversion
+ repository dump format only shows what was changed in each
+ revision, the contents of the copy source might not be
+ readily available. If you suspect that you have any copies
+ of this sort in your repository, you might want to rethink
+ your set of included/excluded paths.</para>
+
+ <para>Finally, <command>svndumpfilter</command> takes path
+ filtering quite literally. If you are trying to copy the
+ history of a project rooted at
+ <filename>trunk/my-project</filename> and move it into a
+ repository of its own, you would, of course, use the
+ <command>svndumpfilter include</command> command to keep all
+ the changes in and under
+ <filename>trunk/my-project</filename>. But the resulting
+ dump file makes no assumptions about the repository into
+ which you plan to load this data. Specifically, the dump
+ data might begin with the revision which added the
+ <filename>trunk/my-project</filename> directory, but it will
+ <emphasis>not</emphasis> contain directives which would
+ create the <filename>trunk</filename> directory itself
+ (because <filename>trunk</filename> doesn't match the
+ include filter). You'll need to make sure that any
+ directories which the new dump stream expects to exist
+ actually do exist in the target repository before trying to
+ load the stream into that repository.</para>
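+
+ <para>One way to satisfy that requirement is to create the
+ missing parent directory in the new repository before loading.
+ The following is a sketch under stated assumptions: the
+ function name <literal>load_under_parent</literal> is
+ hypothetical, the repository path must be absolute so that it
+ forms a valid <literal>file://</literal> URL, and only a single
+ level of parent directory is handled.</para>

```shell
# Hypothetical helper: create a new repository, add the parent
# directory that the filtered dump stream expects, then load the dump.
load_under_parent() {
  repos=$1      # absolute path of the repository to create
  parent=$2     # directory the dump expects to exist, e.g. trunk
  dumpfile=$3   # filtered dump file to load
  svnadmin create "$repos" &&
  svn mkdir -m "Create $parent before loading filtered history" \
    "file://$repos/$parent" &&
  svnadmin load "$repos" < "$dumpfile"
}

# Example invocation (paths are made up):
#   load_under_parent /path/to/my-project-repos trunk my-project-dumpfile
```

+ <para>Note that the <command>svn mkdir</command> commit
+ occupies a revision of its own in the new repository, so the
+ loaded revisions will be numbered after it.</para>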
+
+ </sect2>
+
+ <!-- =============================================================== -->
<sect2 id="svn.reposadmin.maint.replication">
<title>Repository Replication</title>
@@ -2286,35 +2321,19 @@
distribute heavy Subversion load across multiple servers, use
as a soft-upgrade mechanism, and so on.</para>
- <para>The <command>svnsync</command> program, which is new to
- the 1.4.0 release of Subversion, provides all the
- functionality required for maintaining a read-only mirror of a
- Subversion repository.</para>
-
- <screen>
-$ svnsync help
-general usage: svnsync SUBCOMMAND DEST_URL [ARGS & OPTIONS ...]
-Type 'svnsync help <subcommand>' for help on a specific subcommand.
-Type 'svnsync --version' to see the program version and RA modules.
-
-Available subcommands:
- initialize (init)
- synchronize (sync)
- copy-revprops
- help (?, h)
-$
-</screen>
-
- <para><command>svnsync</command> works by essentially asking the
+ <para>As of version 1.4, Subversion provides a program for
+ managing scenarios like
+ these—<command>svnsync</command>.
+ <command>svnsync</command> works by essentially asking the
Subversion server to <quote>replay</quote> revisions, one at a
time. It then uses that revision information to mimic a
commit of the same to another repository. Neither repository
- needs to be locally accessible to
- <command>svnsync</command>—its parameters are repository
- URLs, and it does all its work through Subversion's repository
- access interfaces. All you need is read access to the source
- repository; commit access and revision property modification
- access to the destination repository.</para>
+ needs to be locally accessible to the machine on which
+ <command>svnsync</command> is running—its parameters are
+ repository URLs, and it does all its work through Subversion's
+ repository access (RA) interfaces. All it requires is read
+ access to the source repository and read/write access to the
+ destination repository.</para>
<note>
<para>When using <command>svnsync</command> against a remote