[svnbook commit] r2712 - trunk/src/en/book

cmpilato noreply at red-bean.com
Mon Feb 26 20:13:14 CST 2007


Author: cmpilato
Date: Mon Feb 26 20:13:14 2007
New Revision: 2712

Modified:
   trunk/src/en/book/ch05-repository-admin.xml

Log:
* src/en/book/ch05-repository-admin.xml
  Move the guts of dump-filtering into its own new section, leaving only 
  a stub description of svndumpfilter in the Administrator's Toolkit
  section.  Also created a new stub in that section for svnsync.


Modified: trunk/src/en/book/ch05-repository-admin.xml
==============================================================================
--- trunk/src/en/book/ch05-repository-admin.xml	(original)
+++ trunk/src/en/book/ch05-repository-admin.xml	Mon Feb 26 20:13:14 2007
@@ -1019,14 +1019,48 @@
         by Subversion's own tools.</para>
 
       <!-- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -->
+      <sect3 id="svn.reposadmin.maint.tk.svnadmin">
+        <title>svnadmin</title>
+
+        <para>The <command>svnadmin</command> program is the
+          repository administrator's best friend.  Besides providing
+          the ability to create Subversion repositories, this program
+          allows you to perform several maintenance operations on
+          those repositories.  The syntax of
+          <command>svnadmin</command> is similar to that of other
+          Subversion command-line programs:</para>
+
+        <screen>
+$ svnadmin help
+general usage: svnadmin SUBCOMMAND REPOS_PATH  [ARGS & OPTIONS ...]
+Type 'svnadmin help <subcommand>' for help on a specific subcommand.
+Type 'svnadmin --version' to see the program version and FS modules.
+
+Available subcommands:
+   crashtest
+   create
+   deltify
+…
+</screen>
+
+        <para>We've already mentioned <command>svnadmin</command>'s
+          <literal>create</literal> subcommand (see <xref
+          linkend="svn.reposadmin.basics.creating"/>).  Most of the
+          others we will cover as they become topically relevant later
+          in this chapter.  And you can consult <xref
+          linkend="svn.ref.svnadmin" /> for a full rundown of
+          subcommands and what each of them offers.</para>
+
+      </sect3>
+
+      <!-- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -->
       <sect3 id="svn.reposadmin.maint.tk.svnlook">
         <title>svnlook</title>
             
         <para><command>svnlook</command> is a tool provided by
           Subversion for examining the various revisions and
           transactions in a repository.  No part of this program
-          attempts to change the repository—it's a
-          <quote>read-only</quote> tool.  <command>svnlook</command>
+          attempts to change the repository.  <command>svnlook</command>
           is typically used by the repository hooks for reporting the
           changes that are about to be committed (in the case of the
           <command>pre-commit</command> hook) or that were just
@@ -1054,19 +1088,9 @@
           itself, or how it differs from the previous revision of the
           repository.  You use the <option>--revision</option> and
           <option>--transaction</option> options to specify which
-          revision or transaction, respectively, to examine.  Note
-          that while revision numbers appear as natural numbers,
-          transaction names are alphanumeric strings.  Keep in mind
-          that the filesystem only allows browsing of uncommitted
-          transactions (transactions that have not resulted in a new
-          revision).  Most repositories will have no such
-          transactions, because transactions are usually either
-          committed (in which case, you should access them as revision
-          with the <option>--revision</option> option) or aborted and
-          removed.</para>
-
-        <para>In the absence of both the <option>--revision</option>
-          and <option>--transaction</option> options,
+          revision or transaction, respectively, to examine.  In the
+          absence of both the <option>--revision</option> and
+          <option>--transaction</option> options,
           <command>svnlook</command> will examine the youngest (or
           <quote>HEAD</quote>) revision in the repository.  So the
           following two commands do exactly the same thing when 19 is
@@ -1087,6 +1111,15 @@
 $ svnlook youngest /path/to/repos
 19
 </screen>
+
+        <note>
+          <para>Keep in mind that the only transactions you can browse
+            are uncommitted ones.  Most repositories will have no such
+            transactions, because transactions are usually either
+            committed (in which case, you should access them as
+            revisions with the <option>--revision</option> option) or
+            aborted and removed.</para>
+        </note>
             
         <para>Output from <command>svnlook</command> is designed to be
           both human- and machine-parsable.  Take as an example the output
@@ -1123,7 +1156,7 @@
         <para>This output is human-readable, meaning items like the
           datestamp are displayed using a textual representation
           instead of something more obscure (such as the number of
-          nanoseconds since the Tasty Freeze guy drove by).  But this
+          nanoseconds since the Tasty Freeze guy drove by).  But the
           output is also machine-parsable—because the log
           message can contain multiple lines and be unbounded in
           length, <command>svnlook</command> provides the length of
@@ -1134,88 +1167,14 @@
           in the event that this output is not the last bit of data in
           the stream.</para>
 
-        <para>Another common use of <command>svnlook</command> is to
-          actually view the contents of a revision or transaction
-          tree.  The <command>svnlook tree</command> command displays
-          the directories and files in the requested tree.  If you
-          supply the <option>--show-ids</option> option, it will also
-          show the filesystem node revision IDs for each of those
-          paths (which is generally of more use to developers than to
-          users).</para>
-
-        <screen>
-$ svnlook tree /path/to/repos --show-ids
-/ <0.0.1>
- A/ <2.0.1>
-  B/ <4.0.1>
-   lambda <5.0.1>
-   E/ <6.0.1>
-    alpha <7.0.1>
-    beta <8.0.1>
-   F/ <9.0.1>
-  mu <3.0.1>
-  C/ <a.0.1>
-  D/ <b.0.1>
-   gamma <c.0.1>
-   G/ <d.0.1>
-    pi <e.0.1>
-    rho <f.0.1>
-    tau <g.0.1>
-   H/ <h.0.1>
-    chi <i.0.1>
-    omega <k.0.1>
-    psi <j.0.1>
- iota <1.0.1>
-</screen>
-
-        <para>Once you've seen the layout of directories and files in
-          your tree, you can use commands like <command>svnlook
-          cat</command>, <command>svnlook propget</command>, and
-          <command>svnlook proplist</command> to dig into the details
-          of those files and directories.</para>
-
         <para><command>svnlook</command> can perform a variety of
-          other queries, displaying subsets of bits of information
-          we've mentioned previously, reporting which paths were
-          modified in a given revision or transaction, showing textual
-          and property differences made to files and directories, and
-          so on.  See <xref linkend="svn.ref.svnlook" /> for a full
-          reference of <command>svnlook</command>'s features.</para>
-
-      </sect3>
-
-      <!-- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -->
-      <sect3 id="svn.reposadmin.maint.tk.svnadmin">
-        <title>svnadmin</title>
-
-        <para>The <command>svnadmin</command> program is the
-          repository administrator's best friend.  Besides providing
-          the ability to create Subversion repositories, this program
-          allows you to perform several maintenance operations on
-          those repositories.  The syntax of
-          <command>svnadmin</command> is similar to that of
-          <command>svnlook</command>:</para>
-
-        <screen>
-$ svnadmin help
-general usage: svnadmin SUBCOMMAND REPOS_PATH  [ARGS & OPTIONS ...]
-Type 'svnadmin help <subcommand>' for help on a specific subcommand.
-Type 'svnadmin --version' to see the program version and FS modules.
-
-Available subcommands:
-   crashtest
-   create
-   deltify
-…
-</screen>
-
-        <para>We've already mentioned <command>svnadmin</command>'s
-          <literal>create</literal> subcommand (see <xref
-          linkend="svn.reposadmin.basics.creating"/>).  Most of the
-          others we will cover as they become topically relevant later
-          in this chapter.  And you can consult <xref
-          linkend="svn.ref.svnadmin" /> for a full rundown of
-          subcommands and what each of them offers.</para>
+          other queries: displaying subsets of bits of information
+          we've mentioned previously, recursively listing versioned
+          directory trees, reporting which paths were modified in a
+          given revision or transaction, showing textual and property
+          differences made to files and directories, and so on.  See
+          <xref linkend="svn.ref.svnlook" /> for a full reference of
+          <command>svnlook</command>'s features.</para>
 
       </sect3>
 
@@ -1223,61 +1182,12 @@
       <sect3 id="svn.reposadmin.maint.tk.svndumpfilter">
         <title>svndumpfilter</title>
 
-        <para>Since Subversion stores your versioned history using, at
-          the very least, binary differencing algorithms and data
-          compression (optionally in a completely opaque database
-          system), attempting manual tweaks is unwise, if not quite
-          difficult, and at any rate strongly discouraged.  And once
-          data has been stored in your repository, Subversion
-          generally doesn't provide an easy way to remove that data.
-          <footnote>
-            <para>That's rather the reason you use version control at
-              all, right?</para>
-          </footnote>
-          But inevitably, there will be times when you would like to
-          manipulate the history of your repository.  You might need
-          to strip out all instances of a file that was accidentally
-          added to the repository (and shouldn't be there for whatever
-          reason).
-          <footnote>
-            <para>Conscious, cautious removal of certain bits of
-              versioned data is actually supported by real use-cases.
-              That's why an <quote>obliterate</quote> feature has been
-              one of the most highly requested Subversion features,
-              and one which the Subversion developers hope to soon
-              provide.</para>
-          </footnote>
-          Or, perhaps you have multiple projects sharing a
-          single repository, and you decide to split them up into
-          their own repositories.  To accomplish tasks like this,
-          administrators need a more manageable and malleable
-          representation of the data in their repositories—the
-          Subversion repository dump format.</para>
-
-        <para>The Subversion repository dump format is a
-          human-readable representation of the changes that you've
-          made to your versioned data over time.  You use the
-          <command>svnadmin dump</command> command to generate the
-          dump data, and <command>svnadmin load</command> to populate
-          a new repository with it (see <xref
-          linkend="svn.reposadmin.maint.migrate"/>).  The great thing about the
-          human-readability aspect of the dump format is that, if you
-          aren't careless about it, you can manually inspect and
-          modify it.  Of course, the downside is that if you have three
-          years' worth of repository activity encapsulated in what is
-          likely to be a very large dump file, it could take you a
-          long, long time to manually inspect and modify it.</para>
-
         <para>While it won't be the most commonly used tool at the
           administrator's disposal, <command>svndumpfilter</command>
           provides a very particular brand of useful
           functionality—the ability to quickly and easily modify
-          that dump data by acting as a path-based filter.  Simply
-          give it either a list of paths you wish to keep, or a list
-          of paths you wish to not keep, then pipe your repository
-          dump data through this filter.  The result will be a
-          modified stream of dump data that contains only the
-          versioned paths you (explicitly or implicitly) requested.</para>
+          streams of Subversion repository history data by acting as a
+          path-based filter.</para>
 
         <para>The syntax of <command>svndumpfilter</command> is as
           follows:</para>
@@ -1287,7 +1197,7 @@
 general usage: svndumpfilter SUBCOMMAND [ARGS & OPTIONS ...]
 Type "svndumpfilter help <subcommand>" for help on a specific subcommand.
 Type 'svndumpfilter --version' to see the program version.
-
+  
 Available subcommands:
    exclude
    include
@@ -1316,230 +1226,52 @@
           </varlistentry>
         </variablelist>
 
-        <para>Let's look a realistic example of how you might use this
-          program.  We discuss elsewhere (see <xref
-          linkend="svn.reposadmin.projects.chooselayout"/>) the
-          process of deciding how to choose a layout for the data in
-          your repositories—using one repository per project or
-          combining them, arranging stuff within your repository, and
-          so on.  But sometimes after new revisions start flying in,
-          you rethink your layout and would like to make some changes.
-          A common change is the decision to move multiple projects
-          which are sharing a single repository into separate
-          repositories for each project.</para>
-
-        <para>Our imaginary repository contains three projects:
-          <literal>calc</literal>, <literal>calendar</literal>, and
-          <literal>spreadsheet</literal>.  They have been living
-          side-by-side in a layout like this:</para>
+        <para>You can learn more about these subcommands and
+          <command>svndumpfilter</command>'s unique purpose in <xref
+          linkend="svn.reposadmin.maint.filtering" />.</para>
 
-        <screen>
-/
-   calc/
-      trunk/
-      branches/
-      tags/
-   calendar/
-      trunk/
-      branches/
-      tags/
-   spreadsheet/
-      trunk/
-      branches/
-      tags/
-</screen>
-
-        <para>To get these three projects into their own repositories,
-          we first dump the whole repository:</para>
-
-        <screen>
-$ svnadmin dump /path/to/repos > repos-dumpfile
-* Dumped revision 0.
-* Dumped revision 1.
-* Dumped revision 2.
-* Dumped revision 3.
-…
-$
-</screen>
+      </sect3>
 
-        <para>Next, run that dump file through the filter, each time
-          including only one of our top-level directories, and
-          resulting in three new dump files:</para>
+      <!-- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -->
+      <sect3 id="svn.reposadmin.maint.tk.svnsync">
+        <title>svnsync</title>
 
-        <screen>
-$ cat repos-dumpfile | svndumpfilter include calc > calc-dumpfile
-…
-$ cat repos-dumpfile | svndumpfilter include calendar > cal-dumpfile
-…
-$ cat repos-dumpfile | svndumpfilter include spreadsheet > ss-dumpfile
-…
-$
-</screen>
+        <para>The <command>svnsync</command> program, which is new to
+          the 1.4 release of Subversion, provides all the
+          functionality required for maintaining a read-only mirror of
+          a Subversion repository.  The program really has one
+          job—to transfer one repository's versioned history
+          into another repository.  And while there are a few ways to do
+          that, its primary strength is that it can operate
+          remotely—the <quote>source</quote> and
+          <quote>sink</quote>
+          <footnote>
+            <para>Or is that, the <quote>sync</quote>?</para>
+          </footnote>
+          repositories may be on different computers from each other
+          and from <command>svnsync</command> itself.</para>
 
-        <para>At this point, you have to make a decision.  Each of
-          your dump files will create a valid repository,
-          but will preserve the paths exactly as they were in the
-          original repository.  This means that even though you would
-          have a repository solely for your <literal>calc</literal>
-          project, that repository would still have a top-level
-          directory named <filename>calc</filename>.  If you want
-          your <filename>trunk</filename>, <filename>tags</filename>,
-          and <filename>branches</filename> directories to live in the
-          root of your repository, you might wish to edit your
-          dump files, tweaking the <literal>Node-path</literal> and
-          <literal>Node-copyfrom-path</literal> headers to no longer have
-          that first <filename>calc/</filename> path component.  Also,
-          you'll want to remove the section of dump data that creates
-          the <filename>calc</filename> directory.  It will look
-          something like:</para>
+        <para>As you might expect, <command>svnsync</command> has a
+          syntax that looks very much like every other program we've
+          mentioned in this chapter:</para>
 
         <screen>
-Node-path: calc
-Node-action: add
-Node-kind: dir
-Content-length: 0
-
-</screen>
-
-        <warning>
-          <para>If you do plan on manually editing the dump file to
-            remove a top-level directory, make sure that your editor is
-            not set to automatically convert end-lines to the native
-            format (e.g. \r\n to \n) as the content will then not agree
-            with the metadata.  This will render the dump file
-            useless.</para>
-        </warning>
-
-        <para>All that remains now is to create your three new
-          repositories, and load each dump file into the right
-          repository:</para>
+$ svnsync help
+general usage: svnsync SUBCOMMAND DEST_URL  [ARGS & OPTIONS ...]
+Type 'svnsync help <subcommand>' for help on a specific subcommand.
+Type 'svnsync --version' to see the program version and RA modules.
 
-        <screen>
-$ svnadmin create calc; svnadmin load calc < calc-dumpfile
-<<< Started new transaction, based on original revision 1
-     * adding path : Makefile ... done.
-     * adding path : button.c ... done.
-…
-$ svnadmin create calendar; svnadmin load calendar < cal-dumpfile
-<<< Started new transaction, based on original revision 1
-     * adding path : Makefile ... done.
-     * adding path : cal.c ... done.
-…
-$ svnadmin create spreadsheet; svnadmin load spreadsheet < ss-dumpfile
-<<< Started new transaction, based on original revision 1
-     * adding path : Makefile ... done.
-     * adding path : ss.c ... done.
-…
+Available subcommands:
+   initialize (init)
+   synchronize (sync)
+   copy-revprops
+   help (?, h)
 $
 </screen>
 
-        <para>Both of <command>svndumpfilter</command>'s subcommands
-          accept options for deciding how to deal with
-          <quote>empty</quote> revisions.  If a given revision
-          contained only changes to paths that were filtered out, that
-          now-empty revision could be considered uninteresting or even
-          unwanted.  So to give the user control over what to do with
-          those revisions, <command>svndumpfilter</command> provides
-          the following command-line options:</para>
-
-        <variablelist>
-          <varlistentry>
-            <term><option>--drop-empty-revs</option></term>
-            <listitem>
-              <para>Do not generate empty revisions at all—just
-                omit them.</para>
-            </listitem>
-          </varlistentry>
-          <varlistentry>
-            <term><option>--renumber-revs</option></term>
-            <listitem>
-              <para>If empty revisions are dropped (using the
-                <option>--drop-empty-revs</option> option), change the
-                revision numbers of the remaining revisions so that
-                there are no gaps in the numeric sequence.</para>
-            </listitem>
-          </varlistentry>
-          <varlistentry>
-            <term><option>--preserve-revprops</option></term>
-            <listitem>
-              <para>If empty revisions are not dropped, preserve the
-                revision properties (log message, author, date, custom
-                properties, etc.) for those empty revisions.
-                Otherwise, empty revisions will only contain the
-                original datestamp, and a generated log message that
-                indicates that this revision was emptied by
-                <command>svndumpfilter</command>.</para>
-            </listitem>
-          </varlistentry>
-        </variablelist>
-        
-        <para>While <command>svndumpfilter</command> can be very
-          useful, and a huge timesaver, there are unfortunately a
-          couple of gotchas.  First, this utility is overly sensitive
-          to path semantics.  Pay attention to whether paths in your
-          dump file are specified with or without leading slashes.
-          You'll want to look at the <literal>Node-path</literal> and
-          <literal>Node-copyfrom-path</literal> headers.</para>
-
-        <screen>
-…
-Node-path: spreadsheet/Makefile
-…
-</screen>
-
-        <para>If the paths have leading slashes, you should
-          include leading slashes in the paths you pass to
-          <command>svndumpfilter include</command> and
-          <command>svndumpfilter exclude</command> (and if they don't,
-          you shouldn't).  Further, if your dump file has an inconsistent
-          usage of leading slashes for some reason,
-          <footnote>
-            <para>While <command>svnadmin dump</command> has a
-              consistent leading slash policy—to not include
-              them—other programs which generate dump data might
-              not be so consistent.</para>
-          </footnote>
-          you should probably normalize those paths so they all
-          have, or lack, leading slashes.</para>
-
-        <para>Also, copied paths can give you some trouble.
-          Subversion supports copy operations in the repository, where
-          a new path is created by copying some already existing path.
-          It is possible that at some point in the lifetime of your
-          repository, you might have copied a file or directory from
-          some location that <command>svndumpfilter</command> is
-          excluding, to a location that it is including.  In order to
-          make the dump data self-sufficient,
-          <command>svndumpfilter</command> needs to still show the
-          addition of the new path—including the contents of any
-          files created by the copy—and not represent that
-          addition as a copy from a source that won't exist in your
-          filtered dump data stream.  But because the Subversion
-          repository dump format only shows what was changed in each
-          revision, the contents of the copy source might not be
-          readily available.  If you suspect that you have any copies
-          of this sort in your repository, you might want to rethink
-          your set of included/excluded paths.</para>
-
-        <para>Finally, <command>svndumpfilter</command> takes path
-          filtering quite literally.  If you are trying to copy the
-          history of a project rooted at
-          <filename>trunk/my-project</filename> and move it into a
-          repository of its own, you would, of course, use the
-          <command>svndumpfilter include</command> command to keep all
-          the changes in and under
-          <filename>trunk/my-project</filename>.  But the resulting
-          dump file makes no assumptions about the repository into
-          which you plan to load this data.  Specifically, the dump
-          data might begin with the revision which added the
-          <filename>trunk/my-project</filename> directory, but it will
-          <emphasis>not</emphasis> contain directives which would
-          create the <filename>trunk</filename> directory itself
-          (because <filename>trunk</filename> doesn't match the
-          include filter).  You'll need to make sure that any
-          directories which the new dump stream expect to exist
-          actually do exist in the target repository before trying to
-          load the stream into that repository.</para>
+        <para>We talk more about replicating repositories with
+          <command>svnsync</command> in <xref
+          linkend="svn.reposadmin.maint.replication" />.</para>
 
       </sect3>
 
@@ -2005,12 +1737,29 @@
         various back-end data store files in a fashion generally
         understood by (and of interest to) only the Subversion
         developers themselves.  However, circumstances may arise that
-        call for all, or some subset, of that data to be collected
-        into a single, portable, flat file format and copied or moved
-        into another repository.  Subversion provides such
-        functionality, implemented in a pair of
-        <command>svnadmin</command> subcommands:
-        <literal>dump</literal> and <literal>load</literal>.</para>
+        call for all, or some subset, of that data to be copied or
+        moved into another repository.</para>
+
+      <para>Subversion provides such functionality by way of
+        repository dump streams.  A repository dump stream (often
+        referred to as a <quote>dumpfile</quote> when stored as a file
+        on disk) is a portable, flat file format that describes the
+        various revisions in your repository—what was changed,
+        by whom, when, and so on.  This dump stream is the primary
+        mechanism used to marshal versioned history—in whole or
+        in part, with or without modification—between
+        repositories.</para>
+
+      <warning>
+        <para>While the Subversion repository dump format contains
+          human-readable portions and a familiar structure (it
+          resembles an RFC-822 format, the same type of format used
+          for most email), it is <emphasis>not</emphasis> a plaintext
+          file format.  The format should be treated as a binary file
+          format, highly sensitive to meddling.  Many text editor
+          tools will corrupt the file's contents, often due to
+          automatic line ending character conversion.</para>
+      </warning>
 
       <para>There are many reasons for dumping and loading Subversion
         repository data.  Early in Subversion's life, the most common
@@ -2025,16 +1774,17 @@
         new OS or CPU architecture, or switching between the Berkeley
         DB and FSFS back-ends.</para>
 
-      <para>Whatever your reason, using the <command>svnadmin
-        dump</command> and <command>svnadmin load</command>
-        subcommands is straightforward.  <command>svnadmin
-        dump</command> will output a range of repository revisions
-        that are formatted using Subversion's custom filesystem dump
-        format.  The dump format is printed to the standard output
-        stream, while informative messages are printed to the standard
-        error stream.  This allows you to redirect the output stream
-        to a file while watching the status output in your terminal
-        window.  For example:</para>
+      <para>Whatever your reason for migrating repository history,
+        using the <command>svnadmin dump</command> and
+        <command>svnadmin load</command> subcommands is
+        straightforward.  <command>svnadmin dump</command> will output
+        a range of repository revisions that are formatted using
+        Subversion's custom filesystem dump format.  The dump format
+        is printed to the standard output stream, while informative
+        messages are printed to the standard error stream.  This
+        allows you to redirect the output stream to a file while
+        watching the status output in your terminal window.  For
+        example:</para>
 
       <screen>
 $ svnlook youngest myrepos
@@ -2097,9 +1847,10 @@
         repository—the same thing you get by making commits
         against that repository from a regular Subversion client.  And
         just as in a commit, you can use hook programs to perform
-        actions before and after each of the commits made during a load
-        process.  By passing the <option>--use-pre-commit-hook</option> 
-        and <option>--use-post-commit-hook</option> options to
+        actions before and after each of the commits made during a
+        load process.  By passing the
+        <option>--use-pre-commit-hook</option> and
+        <option>--use-post-commit-hook</option> options to
         <command>svnadmin load</command>, you can instruct Subversion
         to execute the pre-commit and post-commit hook programs,
         respectively, for each loaded revision.  You might use these,
@@ -2109,8 +1860,9 @@
         post-commit hook sends emails to a mailing list for each new
         commit, you might not want to spew hundreds or thousands of
         commit emails in rapid succession at that list for each of the
-        loaded revisions!  You can read more about the use of hook 
-        scripts in <xref linkend="svn.reposadmin.create.hooks"/>.</para>
+        loaded revisions!  You can read more about the use of hook
+        scripts in <xref
+        linkend="svn.reposadmin.create.hooks"/>.</para>
 
       <para>Note that because <command>svnadmin</command> uses
         standard input and output streams for the repository dump and
@@ -2188,8 +1940,8 @@
 $ svnadmin dump myrepos --revision 2001:3000 --incremental > dumpfile3
 </screen>
 
-      <para>These dump files could be loaded into a new repository with
-        the following command sequence:</para>
+      <para>These dump files could be loaded into a new repository
+        with the following command sequence:</para>
 
       <screen>
 $ svnadmin load newrepos < dumpfile1
@@ -2213,10 +1965,11 @@
 
       <para>The dump format can also be used to merge the contents of
         several different repositories into a single repository.  By
-        using the <option>--parent-dir</option> option of <command>svnadmin
-        load</command>, you can specify a new virtual root directory
-        for the load process.  That means if you have dump files for
-        three repositories, say <filename>calc-dumpfile</filename>,
+        using the <option>--parent-dir</option> option of
+        <command>svnadmin load</command>, you can specify a new
+        virtual root directory for the load process.  That means if
+        you have dump files for three repositories, say
+        <filename>calc-dumpfile</filename>,
         <filename>cal-dumpfile</filename>, and
         <filename>ss-dumpfile</filename>, you can first create a new
         repository to hold them all:</para>
@@ -2256,15 +2009,9 @@
         repository dump format—conversion from a different
         storage mechanism or version control system altogether.
         Because the dump file format is, for the most part,
-        human-readable,
-        <footnote>
-          <para>The Subversion repository dump format resembles
-            an RFC-822 format, the same type of format used for most
-            email.</para>
-        </footnote>
-        it should be relatively easy to describe generic sets of
-        changes—each of which should be treated as a new
-        revision—using this file format.  In fact, the
+        human-readable, it should be relatively easy to describe
+        generic sets of changes—each of which should be treated
+        as a new revision—using this file format.  In fact, the
         <command>cvs2svn</command> utility (see <xref
         linkend="svn.forcvs.convert"/>) uses the dump format to
         represent the contents of a CVS repository so that those
@@ -2273,6 +2020,294 @@
     </sect2>
 
     <!-- =============================================================== -->
+    <sect2 id="svn.reposadmin.maint.filtering">
+      <title>Filtering Repository History</title>
+
+      <para>Since Subversion stores your versioned history using, at
+        the very least, binary differencing algorithms and data
+        compression (optionally in a completely opaque database
+        system), attempting manual tweaks is unwise, if not quite
+        difficult, and at any rate strongly discouraged.  And once
+        data has been stored in your repository, Subversion
+        generally doesn't provide an easy way to remove that data.
+        <footnote>
+          <para>That's rather the reason you use version control at
+            all, right?</para>
+        </footnote>
+        But inevitably, there will be times when you would like to
+        manipulate the history of your repository.  You might need
+        to strip out all instances of a file that was accidentally
+        added to the repository (and shouldn't be there for whatever
+        reason).
+        <footnote>
+          <para>Conscious, cautious removal of certain bits of
+            versioned data is actually supported by real use-cases.
+            That's why an <quote>obliterate</quote> feature has been
+            one of the most highly requested Subversion features,
+            and one which the Subversion developers hope to soon
+            provide.</para>
+        </footnote>
+        Or, perhaps you have multiple projects sharing a
+        single repository, and you decide to split them up into
+        their own repositories.  To accomplish tasks like this,
+        administrators need a more manageable and malleable
+        representation of the data in their repositories—the
+        Subversion repository dump format.</para>
+
+      <para>As we described in <xref
+        linkend="svn.reposadmin.maint.migrate" />, the
+        Subversion repository dump format is a human-readable
+        representation of the changes that you've made to your
+        versioned data over time.  You use the <command>svnadmin
+        dump</command> command to generate the dump data, and
+        <command>svnadmin load</command> to populate a new
+        repository with it (see <xref
+        linkend="svn.reposadmin.maint.migrate"/>).  The great thing
+        about the human-readability aspect of the dump format is
+        that, if you are careful about it, you can manually
+        inspect and modify it.  Of course, the downside is that if
+        you have three years' worth of repository activity
+        encapsulated in what is likely to be a very large dump file,
+        it could take you a long, long time to manually inspect and
+        modify it.</para>
+
+      <para>That's where <command>svndumpfilter</command> becomes
+        useful.  This program acts as a path-based filter for
+        repository dump streams.  Simply give it either a list of
+        paths you wish to keep, or a list of paths you wish to not
+        keep, then pipe your repository dump data through this
+        filter.  The result will be a modified stream of dump data
+        that contains only the versioned paths you (explicitly or
+        implicitly) requested.</para>
+
+      <para>Let's look at a realistic example of how you might use this
+        program.  We discuss elsewhere (see <xref
+        linkend="svn.reposadmin.projects.chooselayout"/>) the
+        process of deciding how to choose a layout for the data in
+        your repositories—using one repository per project or
+        combining them, arranging stuff within your repository, and
+        so on.  But sometimes after new revisions start flying in,
+        you rethink your layout and would like to make some changes.
+        A common change is the decision to move multiple projects
+        which are sharing a single repository into separate
+        repositories for each project.</para>
+
+      <para>Our imaginary repository contains three projects:
+        <literal>calc</literal>, <literal>calendar</literal>, and
+        <literal>spreadsheet</literal>.  They have been living
+        side-by-side in a layout like this:</para>
+
+      <screen>
+/
+   calc/
+      trunk/
+      branches/
+      tags/
+   calendar/
+      trunk/
+      branches/
+      tags/
+   spreadsheet/
+      trunk/
+      branches/
+      tags/
+</screen>
+
+      <para>To get these three projects into their own repositories,
+        we first dump the whole repository:</para>
+
+      <screen>
+$ svnadmin dump /path/to/repos > repos-dumpfile
+* Dumped revision 0.
+* Dumped revision 1.
+* Dumped revision 2.
+* Dumped revision 3.
+…
+$
+</screen>
+
+      <para>Next, run that dump file through the filter, each time
+        including only one of our top-level directories, and
+        resulting in three new dump files:</para>
+
+      <screen>
+$ cat repos-dumpfile | svndumpfilter include calc > calc-dumpfile
+…
+$ cat repos-dumpfile | svndumpfilter include calendar > cal-dumpfile
+…
+$ cat repos-dumpfile | svndumpfilter include spreadsheet > ss-dumpfile
+…
+$
+</screen>
+
+      <para>At this point, you have to make a decision.  Each of
+        your dump files will create a valid repository,
+        but will preserve the paths exactly as they were in the
+        original repository.  This means that even though you would
+        have a repository solely for your <literal>calc</literal>
+        project, that repository would still have a top-level
+        directory named <filename>calc</filename>.  If you want
+        your <filename>trunk</filename>, <filename>tags</filename>,
+        and <filename>branches</filename> directories to live in the
+        root of your repository, you might wish to edit your
+        dump files, tweaking the <literal>Node-path</literal> and
+        <literal>Node-copyfrom-path</literal> headers to no longer have
+        that first <filename>calc/</filename> path component.  Also,
+        you'll want to remove the section of dump data that creates
+        the <filename>calc</filename> directory.  It will look
+        something like:</para>
+
+      <screen>
+Node-path: calc
+Node-action: add
+Node-kind: dir
+Content-length: 0
+  
+</screen>
+
+      <warning>
+        <para>If you do plan on manually editing the dump file to
+          remove a top-level directory, make sure that your editor is
+          not set to automatically convert line endings to the native
+          format (e.g. \r\n to \n) as the content will then not agree
+          with the metadata.  This will render the dump file
+          useless.</para>
+      </warning>
+
+      <para>All that remains now is to create your three new
+        repositories, and load each dump file into the right
+        repository:</para>
+
+      <screen>
+$ svnadmin create calc; svnadmin load calc < calc-dumpfile
+<<< Started new transaction, based on original revision 1
+     * adding path : Makefile ... done.
+     * adding path : button.c ... done.
+…
+$ svnadmin create calendar; svnadmin load calendar < cal-dumpfile
+<<< Started new transaction, based on original revision 1
+     * adding path : Makefile ... done.
+     * adding path : cal.c ... done.
+…
+$ svnadmin create spreadsheet; svnadmin load spreadsheet < ss-dumpfile
+<<< Started new transaction, based on original revision 1
+     * adding path : Makefile ... done.
+     * adding path : ss.c ... done.
+…
+$
+</screen>
+
+      <para>Both of <command>svndumpfilter</command>'s subcommands
+        accept options for deciding how to deal with
+        <quote>empty</quote> revisions.  If a given revision
+        contained only changes to paths that were filtered out, that
+        now-empty revision could be considered uninteresting or even
+        unwanted.  So to give the user control over what to do with
+        those revisions, <command>svndumpfilter</command> provides
+        the following command-line options:</para>
+
+      <variablelist>
+        <varlistentry>
+          <term><option>--drop-empty-revs</option></term>
+          <listitem>
+            <para>Do not generate empty revisions at all—just
+              omit them.</para>
+          </listitem>
+        </varlistentry>
+        <varlistentry>
+          <term><option>--renumber-revs</option></term>
+          <listitem>
+            <para>If empty revisions are dropped (using the
+              <option>--drop-empty-revs</option> option), change the
+              revision numbers of the remaining revisions so that
+              there are no gaps in the numeric sequence.</para>
+          </listitem>
+        </varlistentry>
+        <varlistentry>
+          <term><option>--preserve-revprops</option></term>
+          <listitem>
+            <para>If empty revisions are not dropped, preserve the
+              revision properties (log message, author, date, custom
+              properties, etc.) for those empty revisions.
+              Otherwise, empty revisions will only contain the
+              original datestamp, and a generated log message that
+              indicates that this revision was emptied by
+              <command>svndumpfilter</command>.</para>
+          </listitem>
+        </varlistentry>
+      </variablelist>
+      
+      <para>While <command>svndumpfilter</command> can be very
+        useful, and a huge timesaver, there are unfortunately a
+        couple of gotchas.  First, this utility is overly sensitive
+        to path semantics.  Pay attention to whether paths in your
+        dump file are specified with or without leading slashes.
+        You'll want to look at the <literal>Node-path</literal> and
+        <literal>Node-copyfrom-path</literal> headers.</para>
+
+      <screen>
+…
+Node-path: spreadsheet/Makefile
+…
+</screen>
+
+      <para>If the paths have leading slashes, you should
+        include leading slashes in the paths you pass to
+        <command>svndumpfilter include</command> and
+        <command>svndumpfilter exclude</command> (and if they don't,
+        you shouldn't).  Further, if your dump file has an inconsistent
+        usage of leading slashes for some reason,
+        <footnote>
+          <para>While <command>svnadmin dump</command> has a
+            consistent leading slash policy—to not include
+            them—other programs which generate dump data might
+            not be so consistent.</para>
+        </footnote>
+        you should probably normalize those paths so they all
+        have, or lack, leading slashes.</para>
+
+      <para>Also, copied paths can give you some trouble.
+        Subversion supports copy operations in the repository, where
+        a new path is created by copying some already existing path.
+        It is possible that at some point in the lifetime of your
+        repository, you might have copied a file or directory from
+        some location that <command>svndumpfilter</command> is
+        excluding, to a location that it is including.  In order to
+        make the dump data self-sufficient,
+        <command>svndumpfilter</command> needs to still show the
+        addition of the new path—including the contents of any
+        files created by the copy—and not represent that
+        addition as a copy from a source that won't exist in your
+        filtered dump data stream.  But because the Subversion
+        repository dump format only shows what was changed in each
+        revision, the contents of the copy source might not be
+        readily available.  If you suspect that you have any copies
+        of this sort in your repository, you might want to rethink
+        your set of included/excluded paths.</para>
+
+      <para>Finally, <command>svndumpfilter</command> takes path
+        filtering quite literally.  If you are trying to copy the
+        history of a project rooted at
+        <filename>trunk/my-project</filename> and move it into a
+        repository of its own, you would, of course, use the
+        <command>svndumpfilter include</command> command to keep all
+        the changes in and under
+        <filename>trunk/my-project</filename>.  But the resulting
+        dump file makes no assumptions about the repository into
+        which you plan to load this data.  Specifically, the dump
+        data might begin with the revision which added the
+        <filename>trunk/my-project</filename> directory, but it will
+        <emphasis>not</emphasis> contain directives which would
+        create the <filename>trunk</filename> directory itself
+        (because <filename>trunk</filename> doesn't match the
+        include filter).  You'll need to make sure that any
+        directories which the new dump stream expects to exist
+        actually do exist in the target repository before trying to
+        load the stream into that repository.</para>
+
+    </sect2>
+  
+    <!-- =============================================================== -->
     <sect2 id="svn.reposadmin.maint.replication">
       <title>Repository Replication</title>
 
@@ -2286,35 +2321,19 @@
         distribute heavy Subversion load across multiple servers, use
         as a soft-upgrade mechanism, and so on.</para>
 
-      <para>The <command>svnsync</command> program, which is new to
-        the 1.4.0 release of Subversion, provides all the
-        functionality required for maintaining a read-only mirror of a
-        Subversion repository.</para>
-
-      <screen>
-$ svnsync help
-general usage: svnsync SUBCOMMAND DEST_URL  [ARGS & OPTIONS ...]
-Type 'svnsync help <subcommand>' for help on a specific subcommand.
-Type 'svnsync --version' to see the program version and RA modules.
-
-Available subcommands:
-   initialize (init)
-   synchronize (sync)
-   copy-revprops
-   help (?, h)
-$
-</screen>
-
-      <para><command>svnsync</command> works by essentially asking the
+      <para>As of version 1.4, Subversion provides a program for
+        managing scenarios like
+        these—<command>svnsync</command>.
+        <command>svnsync</command> works by essentially asking the
         Subversion server to <quote>replay</quote> revisions, one at a
         time.  It then uses that revision information to mimic a
         commit of the same to another repository.  Neither repository
-        needs to be locally accessible to
-        <command>svnsync</command>—its parameters are repository
-        URLs, and it does all its work through Subversion's repository
-        access interfaces.  All you need is read access to the source
-        repository; commit access and revision property modification
-        access to the destination repository.</para>
+        needs to be locally accessible to the machine on which
+        <command>svnsync</command> is running—its parameters are
+        repository URLs, and it does all its work through Subversion's
+        repository access (RA) interfaces.  All it requires is read
+        access to the source repository and read/write access to the
+        destination repository.</para>
 
       <note>
         <para>When using <command>svnsync</command> against a remote
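
Reviewer's sketch (not part of r2712): the new filtering section suggests
hand-editing the Node-path and Node-copyfrom-path headers to drop the
leading calc/ component, and warns that editors can corrupt the file.
The Python below is one possible way to script that edit instead. It
assumes the dump stream is a sequence of header blocks, each ended by a
blank line and followed by exactly Content-length bytes of body, and it
drops any record whose Node-path is exactly the prefix. The function
names are hypothetical, and it has not been checked against dump files
produced by tools other than svnadmin dump.

```python
def strip_prefix(path: bytes, prefix: bytes) -> bytes:
    """Remove a leading 'prefix/' component from a repository path."""
    if path.startswith(prefix + b"/"):
        return path[len(prefix) + 1:]
    return path


def rewrite_dump(src, dst, prefix: bytes) -> None:
    """Copy a dump stream from src to dst, re-rooting paths under prefix.

    src and dst must be binary file objects.  Record bodies are copied
    byte for byte, so their declared lengths stay valid.
    """
    while True:
        line = src.readline()
        if not line:
            break  # end of stream
        if line == b"\n":
            dst.write(line)  # blank separator between records
            continue

        # Collect one header block, up to its terminating blank line.
        headers = [line]
        while True:
            line = src.readline()
            if line in (b"", b"\n"):
                terminator = line
                break
            headers.append(line)

        length = 0
        drop = False
        rewritten = []
        for header in headers:
            name, _, value = header.rstrip(b"\n").partition(b": ")
            if name == b"Content-length":
                length = int(value)
            if name == b"Node-path" and value == prefix:
                drop = True  # the record for the old top-level directory
            if name in (b"Node-path", b"Node-copyfrom-path"):
                header = name + b": " + strip_prefix(value, prefix) + b"\n"
            rewritten.append(header)

        # Read the body without interpreting it, even if it is dropped.
        body = src.read(length)
        if drop:
            continue
        dst.writelines(rewritten)
        dst.write(terminator)
        dst.write(body)
```

A call might look like rewrite_dump(open("calc-dumpfile", "rb"),
open("calc-rooted-dumpfile", "wb"), b"calc"), where the output file name
is made up for illustration. Both files are opened in binary mode so
that no line-ending conversion takes place.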



