Thoughts on Chapter 5

C. Michael Pilato cmpilato at
Sat Feb 24 18:00:00 CST 2007

Brian W. Fitzpatrick wrote:
> OK.  This chapter covers a *ton* of data about an arcane subject and
> it's a nice fluid read, but reading the chapter end-to-end, I felt
> like I had to wade through a *ton* of BDB minutiae that 99% of the
> repository admins won't ever have to deal with.  I don't have a
> solution in mind for this, but I found it to be distracting and wonder
> if we can't better title sections that are BDB specific so that FSFS
> admins don't have to read all the way through just to find out that it
> doesn't apply to them.

Yes, that was something we talked about doing -- I simply forgot to do
it.  :-)

> You may need to take some of these comments with a grain of salt as I
> personally don't recommend that people use bdb at all.  Aren't we
> going to prescribe one over the other?

Even if we prescribe one, the book should still contain the information
necessary to assist those who chose the other one.  I personally would
still stay on this side of an outright prescription of FSFS; but yeah, I
think we should be able to say something like, "These days, most folks
choose FSFS for its flexibility in various deployment scenarios and ease
of administration."

> "Planning Your Repository Organization":
> - one other reason to have separate repositories is when you have
> completely different types of data in each project: eg, one project
> has source code, and another has 100MB Photoshop files in it.

Really?  Why is that?  (I can't quickly think of a reason why that would
actually matter.)

> - The last example of repository organization is one that I've rarely
> seen used.  Shouldn't we recommend that most folks use the 1st example
> for multiple projects in a single repo (i.e., I'm not seeing a lot of
> "prescription" here, but mostly "description")?

I'm not sure how you missed the prescription-ness of that section.  And
I do think it useful to point out "the other way" (which yes, does still
get used).

> "Choosing a Data Store":
> In the table:
> - "Scalability: repository size": I don't understand what this
> means--does this mean that fsfs repositories take up less space on
> disk or that you can't use it for repositories with tons of data (and
> if it's the latter, I think it's incorrect--Apache uses fsfs).

That could be more clear, yes?  I'm pretty sure that when Ben added
this, he was talking about space consumed on disk.

> - "Performance": Isn't BDB < 10% faster than FSFS in checking out the
> latest revision?  I thought ghudson mailed stats on this to the list
> that showed it's a negligible difference.

I'll have to dig that up to verify.  (Does anyone else on this list have
a pointer to some stats?)

> - We should note that BDB has an extra dependency: BDB itself.


> - Also, doesn't FSFS deal better with mixed repository access
> mechanisms (http:// + svn://)?  Should we mention this?

Well, it deals better with mixed access by different *OS users*.  BDB has no
problem doing http:// + svn:// if httpd and svnserve run as the same
user.  But I dunno how to make this fit into a smallish table.  :-)

> - Footnote starting "Berkeley DB requires": Maybe mention that *no*
> remote filesystem implementation currently does this right?

Why?  It's flatly untrue.

> - BDB & FSFS subsections: Maybe these could be divided into a
> "summary" and "gritty details" part?  I really doubt that most admins
> give a hoot that BDB directory mods are O(n^2) and FSFS's are O(n).

Oh, I'm happy to toss that little bit altogether.

> - FSFS subsection:  fsfs really isn't "immature" any more, and it's
> been stress tested a lot.  I'd say that this paragraph is mostly FUD
> and should go.

Agreed.  (Though, it's hard not to remember two relatively recent data
lossage bugs in the backend ... something we've never had with BDB.)

> "Creating the Repository"
> - Maybe move the 1st tip up a little bit?
> - Make the Warning more threatening?  We had some dude on the #svn
> channel talking about how he edited one of his rev files (I am *not*
> kidding).


> "svndumpfilter":
> - 1st footnote:  I used to agree that the inability to obliterate a
> rev is a feature, but after talking to dozens of people in various
> roles (open source, closed source, including the BSD dudes), I now
> think that it *is* a missing feature.  FreeBSD *can't* be expected to do
> something that would require thousands of people to recheckout huge
> working copies (eg the ports tree).


> "removing dead transaction"
> - Isn't this BDB only?  I thought these were no-ops in fsfs...

Gosh, I should hope not.  It has file-based transactions, too, which
could get left around on disk.  svnadmin rmtxns doesn't have any
BDB-specific code in it.
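
To make the point about file-based transactions concrete, here is a
minimal sketch that lists leftover transaction directories in an FSFS
repository.  It assumes the on-disk layout
<repos>/db/transactions/<name>.txn, and the function name
list_fsfs_txns is hypothetical; it is an illustration only, not a
description of how svnadmin itself finds transactions.

```python
import os


def list_fsfs_txns(repos_path):
    """Return the names of leftover transactions in an FSFS repository.

    Assumption: each uncommitted transaction lives in a directory named
    <repos>/db/transactions/<name>.txn.  This is a read-only sketch and
    not a replacement for `svnadmin lstxns`.
    """
    txn_dir = os.path.join(repos_path, "db", "transactions")
    if not os.path.isdir(txn_dir):
        # Not an FSFS repository (or not a repository at all).
        return []
    suffix = ".txn"
    return sorted(
        entry[: -len(suffix)]
        for entry in os.listdir(txn_dir)
        if entry.endswith(suffix)
        and os.path.isdir(os.path.join(txn_dir, entry))
    )
```

In practice the supported route is still "svnadmin lstxns REPOS_PATH"
to list them and "svnadmin rmtxns REPOS_PATH TXN_NAME..." to remove
them, for either backend.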

> "repository recovery":
> - This should be specified as BDB specific in the title


> "repository replication":
> - The 'svn>' prompt confused me--I thought it was some sort of weird
> svn shell at first.

Yeah, that's not necessary.  I'll drop it.

> - using the username 'syncprop' in your examples is extremely
> confusing--reminds me of properties.  Can't we use harry or sally?

I thought I used "syncproc" (as in "synchronization process").  I don't
want to use harry or sally because I go out of my way to recommend that
you set up a custom user for sync stuffs.  Oops!  I see now that
sometimes I typed "syncprop" by accident.  Will fix.  Maybe I'll just
make everything use "syncuser", which is more clear.
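
As a rough illustration of why a dedicated sync user is handy, here is
a minimal sketch of a pre-revprop-change hook for a mirror repository
that lets only that user change revision properties.  It assumes the
user is named "syncuser" and relies on the hook being invoked with the
arguments REPOS-PATH REV USER PROPNAME ACTION; the function names are
hypothetical, and this is one possible approach rather than the book's
exact script.

```python
import sys

# Assumed name of the dedicated synchronization user.
SYNC_USER = "syncuser"


def allow_revprop_change(user, sync_user=SYNC_USER):
    """Return True only when the change comes from the sync user."""
    return user == sync_user


def main(argv):
    """Hook entry point; returns the exit code (0 allows, non-zero denies).

    argv mirrors sys.argv: script name, then REPOS-PATH, REV, USER,
    PROPNAME, and ACTION as passed by Subversion.
    """
    if len(argv) < 4:
        sys.stderr.write("usage: pre-revprop-change REPOS REV USER PROPNAME [ACTION]\n")
        return 1
    user = argv[3]
    if allow_revprop_change(user):
        return 0
    sys.stderr.write("Only the %s user may change revision properties\n" % SYNC_USER)
    return 1

# An installed hook script would finish with: sys.exit(main(sys.argv))
```

Using harry or sally here would blur exactly the distinction the hook
depends on, which is the reason for a separate user name.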

Thanks, dude.

C. Michael Pilato <cmpilato at>

"The Christian ideal has not been tried and found wanting.  It has
 been found difficult; and left untried."  -- G. K. Chesterton

More information about the svnbook-dev mailing list