Book Revisions with LaTeX and Git

As anticipated, a quiet summer around these parts as I revise my manuscript on the theory and mechanisms of midcentury fiction. A quick technical update and a couple of questions for those with experience using Git source control for writing projects.

I spent a chunk of the day today getting my head around Git. I’d been thinking about using it for a while and was helped along by my decision to dump Word in favor of LaTeX a couple of months ago; Word’s binary blobs aren’t well suited to version control (though that’s the least of Word’s problems, really). I also use Dropbox, which does basic automatic versioning, so I hadn’t had much reason to mess with the complexity of Git until now. But Dropbox (reasonably enough) only keeps a finite number of old versions of a file, and it doesn’t let you flag any of them to let your future self know what changed in any given rev. And there are a lot of revs, since it creates a new one every time you save a file (there’s no notion of a commit). This is all totally reasonable for Dropbox, which is a dead simple tool that’s made my working life better in every way. But I wanted more control as I hack away at my very long, slightly disorganized, heavily commented, totally in flux mid-revision book.

So … Git. What’s both cool and terrifying about Git is that it morphs the live files in your working directory as you switch from one branch or revision to another. See this concise explanation of the process from Ben Lynn. (Note to self: Do not switch branches while a file is open in your editor.) Git’s worth a look if you haven’t dealt with modern revision control systems before; it’s much easier and niftier than my brief encounters with CVS years ago had led me to believe.
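To make the "morphing" concrete, here is a minimal sketch in a throwaway repository. The file name, branch name, and author identity are all made up for illustration; the point is only that the same path on disk holds different text depending on which branch is checked out.

```shell
set -e
# Scratch repository in a temporary directory (every name below is hypothetical).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # call the first branch master whatever the local default is
git config user.name "Example Author"
git config user.email "author@example.com"

# Commit a long version of the chapter on master.
echo "Long version of the chapter." > chapter.tex
git add chapter.tex
git commit -q -m "Long version"

# Branch off and commit a shortened version.
git checkout -q -b compact
echo "Compact version of the chapter." > chapter.tex
git commit -q -a -m "Compact version"

# Switching branches rewrites chapter.tex in place.
git checkout -q master
on_master=$(cat chapter.tex)
git checkout -q compact
on_compact=$(cat chapter.tex)
echo "master:  $on_master"
echo "compact: $on_compact"
```

An editor that still has chapter.tex open will not necessarily notice the change underneath it, hence the note to self above.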

Anyway, two questions for those more experienced with this stuff than I:

  1. I’m planning to use branches for the major edits to each chapter, so that I can easily go back and consult or restore the large sections that are inevitably hacked off along the way. Does this make sense? Are tags or clones more appropriate? Are branches overkill? Should I just trust my commented commits on a single trunk? What does your workflow for writing and revising with Git look like?
  2. Is there any reason not to combine Git and Dropbox? I’ve put my .git directory inside my current project directory, which already lives in my Dropbox folder. I can’t see any harm in this beyond a bit of redundancy, but I’d welcome any warnings from hard-won experience.

Two last things:

One, I’ll put the full manuscript on GitHub or similar once it’s no longer filled with embarrassing and/or libelous comments.

Two, tomorrow’s project is to merge the massive changes between the existing chapter on William Gaddis and the much more compact version that’s been accepted by Contemporary Literature. This is a good problem to have, but trying to manage it is the proximate cause of all this version control business.

Oh, and DH 2010 starts the day after tomorrow. Very sorry not to be in London, but I’ll have the #dh2010 firehose open next to TeXShop for the next few days.

10 thoughts on “Book Revisions with LaTeX and Git”

  1. I’ve found branches for major changes help. At minimum they’re a nice cognitive reminder about what you’re doing. And I tag major releases (usually after I merge a big branch).

    Somewhat off topic, but if people are starting to write using TeX, it seems that the barriers to a (digital) humanities arXiv-like repository are getting lower.

  2. Thanks for the thoughts, Allen. Makes good sense. I’m also finding that formulating a reasonably detailed commit message at the end of the day helps me remember where I left off, even when I don’t go back to look at it. I used to do this in the body of the text, but of course would then overwrite it the next morning.

    I’d love to see something like arXiv for the humanities (or an arXiv section for the humanities). Like an institutional repository for the whole field. Would you agree that the barriers aren’t really technical but social? Is that what you meant by the move to TeX making such a thing easier, i.e., that it signals an openness to some of the more useful working practices of the physical sciences?

    On that front, I’d also add group meetings for graduate students and advisors. Easily one of the most useful social practices I saw in my time as a scientist.

  3. For my own future reference, here’s the workflow I’m currently using. Assume we begin on the master branch and want to work on major revisions or additions.

    # Create a new branch and switch to it
    git checkout -b [branch]

    # Edit away, commit at end of section and/or day
    git add .
    git commit -a

    # Finish a major piece, merge back to master, delete branch
    git checkout master
    git merge [branch]
    git branch -d [branch]

    # Tag the resulting state for future reference
    git tag -s [tag]

    Both commit and tag prompt for a message, which can also be supplied on the command line with -m "[message]". I find it easier to write detailed commit messages in a text editor.
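    A small sketch of the two ways of supplying a message, run in a scratch repository. The -m form is the one mentioned above; reading the message from a file with -F is an addition here, shown as one way to keep a longer end-of-day note written in a text editor. All file names and messages are invented.

    ```shell
    set -e
    # Scratch repository (hypothetical names throughout).
    repo=$(mktemp -d)
    cd "$repo"
    git init -q
    git config user.name "Example Author"
    git config user.email "author@example.com"

    # Short message given directly on the command line.
    echo "Draft." > chapter.tex
    git add chapter.tex
    git commit -q -m "Start chapter"

    # Longer message kept in a file outside the repository, then passed with -F.
    msg=$(mktemp)
    printf 'Cut the second section\n\nLeft off at the transition into section three.\n' > "$msg"
    echo "Draft, revised." > chapter.tex
    git commit -q -a -F "$msg"

    # First line is the subject; the rest is the body.
    subject=$(git log -1 --format=%s)
    body=$(git log -1 --format=%b)
    echo "$subject"
    echo "$body"
    ```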

  4. I’d say your branching strategy doesn’t actually gain you anything. Branches are good for making a sandbox for this ongoing piece of work so that it doesn’t conflict with other ongoing work.

    However, in a single-author project with a typical workflow, while the branch is being used, you’re not using the main branch. Then when you merge and delete the branch, you’re not using any other branch. Thus, there’s only one branch that’s ever useful or being used, so there’s no gain.

    If you want to revert or recover things, it doesn’t matter whether you’ve branched or not. All you need is to find the relevant commit (e.g. by searching git log -p, perhaps with the help of tags), and then perhaps do an interactive rebase (sexy), or checkout-copy-checkout-paste-commit (boring) or whatever.

    Tags are useful for marking a point you might want to go back to some time (like the first draft, or the point at which some external contribution occurred).
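    A rough sketch of the "boring" recovery route on a single branch, in a scratch repository. The tag name, file names, and text are invented, and the -S search option and the tag:path form of git show are additions for illustration rather than anything prescribed above.

    ```shell
    set -e
    # Scratch repository (hypothetical names throughout).
    repo=$(mktemp -d)
    cd "$repo"
    git init -q
    git config user.name "Example Author"
    git config user.email "author@example.com"

    # First draft, tagged so it is easy to find later.
    printf 'Opening paragraph.\nA long digression on entropy.\n' > chapter.tex
    git add chapter.tex
    git commit -q -m "First draft"
    git tag first-draft

    # A later revision hacks off the digression.
    printf 'Opening paragraph.\n' > chapter.tex
    git commit -q -a -m "Cut the digression"

    # List the commits that added or removed the phrase.
    git log --oneline -S'digression on entropy' -- chapter.tex

    # Copy the tagged version out to a side file without touching chapter.tex.
    git show first-draft:chapter.tex > chapter-first-draft.tex
    recovered=$(grep 'digression' chapter-first-draft.tex)
    current=$(cat chapter.tex)
    echo "$recovered"
    ```

    The cut passage can then be pasted back from the side file by hand, which is the copy-paste step described above without the two checkouts.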

    • Thanks, Mark – that makes good sense. I guess I can imagine a (very strained) case where branches could be useful, but you’re right that for the scenario I’m presenting here it’s hard to see how they help. And the added complexity just means more opportunities for confusion.

      Branches are just so nifty, though!

      • I have a similar workflow as well. Since only one branch is being worked on at a time, it can look as though the extra branches gain you nothing, but that is not necessarily the case.

        Imagine sending a good rough draft to your publisher. Then, you get a crazy idea! You want to start changing some core concepts, re-work some major characters etc. etc. So you branch off and start working. Your master branch is always in a “releasable” state (or as close as you are in that moment). So while your other branch is crazy and has some drastic changes, if another publisher wants to see what you have, or you’re a student submitting to a conference, the master branch is always releasable, ready to go (or ready to show your advisor). If your PhD advisor wants to see the draft first thing in the morning, yes you could stash/stage/commit your current changes, use tags or search through the log, but why not keep separate branches?! Like you said they are nifty and with git they are cheap!
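        A minimal sketch of that scenario in a scratch repository: master holds the releasable draft, a second branch holds the drastic rework, and the releasable copy is exported without leaving the rework branch. Branch, file, and directory names are invented for illustration.

        ```shell
        set -e
        # Scratch repository (hypothetical names throughout).
        repo=$(mktemp -d)
        cd "$repo"
        git init -q
        git symbolic-ref HEAD refs/heads/master   # call the first branch master whatever the local default is
        git config user.name "Example Author"
        git config user.email "author@example.com"

        # The releasable draft lives on master.
        echo "Releasable draft." > book.tex
        git add book.tex
        git commit -q -m "Draft sent to publisher"

        # The crazy idea gets its own branch.
        git checkout -q -b rework
        echo "Half-finished rework of core concepts." > book.tex
        git commit -q -a -m "WIP: rework core concepts"

        # Export master's copy for the advisor while staying on the rework branch.
        outdir=$(mktemp -d)
        git show master:book.tex > "$outdir/book-for-advisor.tex"
        current_branch=$(git rev-parse --abbrev-ref HEAD)
        exported=$(cat "$outdir/book-for-advisor.tex")
        echo "still on: $current_branch"
        echo "exported: $exported"
        ```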
