Quick thoughts on journalism and version control (now known as GitHub for News)

(Update: Read Daniel Bachhuber’s notes from the #bcniphilly 2011 session I led on GitHub for News)

Today at work I got my own sandbox for web development. In a Skype conversation, the lead developer said I could get a branch off the trunk for certain projects I might work on.

Basically, I would have a testing environment (sandbox) and it would make sense to also have a place (branch) to work on something separate from core development code (trunk) that the development team uses.

The larger theme here is version control, which I’ve known about generally for a while and am now exploring further as I start using — and reading up on — Subversion.

Version control is one of the parallels I drew between journalism and programming concepts in my second post inspired by computational thinking. Specifically:

Version control: When creating software, a core principle is keeping track of each iteration of the project. In the editing workflow of a news organization, you ideally keep track of different revisions, either on a single document (for The Hurricane, that would be in the WordPress admin) with a history or by saving a new document and noting who last saw it (as The Hurricane did before switching to WordPress).

Now that I better understand the tree, trunk and branch terms in the context of version control, I wonder why we don’t apply these principles to journalism — and how we could.

  • How can we best apply principles of version control to journalism?
  • How much easier would it be to collaborate on journalistic projects if we did?
  • How could it help open up the process, both inside and outside the newsroom?

In short, this is another instance of rethinking how we practice journalism and which other fields can better inform what we do.

A conceptual example

Say I’m a reporter working on a story about the Gulf of Mexico oil spill. That story would have its own “trunk.” The journalists in my newsroom would all work off that trunk. But we only have limited resources and we’re not personally experiencing the disaster, so we open up branches for business owners and residents affected by the spill to help them tell their own stories.

Or maybe I’m working on another ongoing story, such as unemployment. There’s a trunk that the newsroom works on and branches for unemployed workers to also contribute.

In either case, you can bring in others to directly collaborate in some way. The branches could be working versions of the content the collaborators are producing, which would need to be checked before being added to the trunk.

This is already being done in many places where people submit personal stories, photos, videos, etc. Applying version control concepts would be a way to better incorporate outside material (from other newsrooms or amateur contributors) on a level field, rather than relegating it to a separate and/or lesser space or doing so haphazardly.
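To make the trunk-and-branch idea a bit more concrete, here is a minimal sketch in Python. It is only an illustration of the concept, not a real tool: the `Story` and `Contribution` classes, the `checked` flag and the branch name are all hypothetical, and the contribution text is placeholder content.

```python
from dataclasses import dataclass, field


@dataclass
class Contribution:
    """One piece of content submitted on a branch (hypothetical model)."""
    author: str
    text: str
    checked: bool = False  # assumed flag: True once an editor has verified it


@dataclass
class Story:
    """A story with a newsroom trunk and one branch per group of contributors."""
    topic: str
    trunk: list = field(default_factory=list)
    branches: dict = field(default_factory=dict)

    def open_branch(self, name):
        """Create an empty working space for outside contributors."""
        self.branches.setdefault(name, [])

    def submit(self, branch, contribution):
        """Add working content to a branch; the trunk is not touched."""
        self.branches[branch].append(contribution)

    def merge(self, branch):
        """Move checked contributions into the trunk; keep the rest on the branch."""
        pending = self.branches[branch]
        approved = [c for c in pending if c.checked]
        self.trunk.extend(approved)
        self.branches[branch] = [c for c in pending if not c.checked]
        return len(approved)


# Placeholder usage: one verified and one unverified submission.
story = Story(topic="Gulf of Mexico oil spill")
story.open_branch("business-owners")
story.submit("business-owners", Contribution("contributor A", "placeholder account", checked=True))
story.submit("business-owners", Contribution("contributor B", "placeholder account"))
merged = story.merge("business-owners")
```

In this sketch only the checked contribution reaches the trunk, while the unchecked one stays on its branch until someone in the newsroom reviews it.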

The tool

Maybe the platform could be a simple system, like the revision histories that Wikipedia, WordPress or Google Docs show. But it could also be as advanced as using Subversion itself, which is

a general system that can be used to manage any collection of files. For you, those files might be source code—for others, anything from grocery shopping lists to digital video mixdowns and beyond.

The ideal would probably be something in the middle that’s more robust than the three examples mentioned above yet simpler and more user-friendly than Subversion.
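As a rough illustration of the simple end of that range, the sketch below keeps every saved revision of a document along with who saved it, and shows what changed between two revisions. It uses Python’s standard `difflib` module; the `Document` and `Revision` classes and the sample text are hypothetical placeholders, not how any of the tools above actually work.

```python
import difflib
from dataclasses import dataclass, field


@dataclass
class Revision:
    """One saved version of the text and who saved it (hypothetical model)."""
    editor: str
    text: str


@dataclass
class Document:
    """Keeps every revision instead of overwriting the previous one."""
    revisions: list = field(default_factory=list)

    def save(self, editor, text):
        """Store a new revision and return its number."""
        self.revisions.append(Revision(editor, text))
        return len(self.revisions) - 1

    def diff(self, old, new):
        """Return a line-by-line unified diff between two revision numbers."""
        a, b = self.revisions[old], self.revisions[new]
        return list(difflib.unified_diff(
            a.text.splitlines(),
            b.text.splitlines(),
            fromfile=f"rev {old} ({a.editor})",
            tofile=f"rev {new} ({b.editor})",
            lineterm="",
        ))


# Placeholder usage: a reporter's draft, then an editor's revision.
doc = Document()
first = doc.save("reporter", "First draft of the lead.\nSecond paragraph.")
second = doc.save("editor", "Revised lead.\nSecond paragraph.")
changes = doc.diff(first, second)
print("\n".join(changes))
```

Lines starting with `-` were removed and lines starting with `+` were added, which is roughly the information a revision history view presents in a friendlier form.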

Applying version control, and other programming concepts, to journalism makes sense to me because of shared fundamentals such as working collaboratively, checking each other’s work and updating/revising.

Anyway, it’s an idea that popped into my head and, to reiterate the questions, made me wonder: could we apply version control concepts to journalism? And, if so, how could we best do this?

UPDATE: (7-11-10) Two posts relevant to the version control and journalism discussion that I’d completely forgotten about: Version Control for Campaign Promises by Brian Boyer and ProPublica’s ChangeTracker Lets You Watch Government’s Moves by Megan Taylor, which is about one of Brian’s projects.

12 thoughts on “Quick thoughts on journalism and version control (now known as GitHub for News)”

  1. Regardless of how usable some concepts from VCS would be for journalism (I like the “bringing branches from outside contributors back into trunk” metaphor!), I’m very wary of using version control software itself as a repository for story revisions, as http://messagecms.com/ does, for example. Textual content just doesn’t lend itself to working on the same stuff simultaneously as well as source code does: it’s very difficult to keep a text a coherent whole if multiple people are messing around with it.

    Cheers!

    1. Thanks, Stijn. I had a similar concern about text content being worked on simultaneously. Nevertheless, I think that the concepts could possibly apply at a more abstract level and that it could be feasible to incorporate some kind of version control system — perhaps one custom-built for journalistic purposes.

    1. Yeah, Dropbox is a great tool — even more interesting considering some of the ideas Dave Winer has written about. Thanks for that “Subversion for Writers” link!

  2. Two thoughts:

    First, git is better, as is mercurial. Either one will make your life easier. Consider the git parable.

    Second, the concept of “the story” as a single block of text is tricky to work into version control’s concept of branching and merging, but the topic (which I think is what you’re actually getting at when you say “story”) might work nicely. That’s definitely an area where lots of threads are going in tandem, merging into a richer narrative.

  3. I think you can work on writing simultaneously only using the trunk, as long as you have small commits and are smart about it. The problem is branching.

    Branching wouldn’t really work for writing because of flow. You need to be able to merge your branch back to the head of the trunk eventually. This works with code because it’s so modular and branches usually work on completely separate files. But if you have everyone writing their own version of one article, there’s no way it would fit together on a merge.

    It makes more sense to have a master branch (trunk), like WordPress. I think anything more advanced would just increase editing overhead.

    Also keep in mind the difference between central and distributed version control. Something like git’s workflow might be more what you’re looking for.

  4. I just upgraded a GitHub repository to the new git-backed wiki system that GitHub is rolling out (see: http://github.com/blog/699-making-github-more-open-git-backed-wikis ). With this upgrade, each wiki is a set of simple text files that can be written in any of the common markup formats, e.g., Textile, Markdown, etc.

    They also rolled out a neat little tool called Gollum (http://github.com/github/gollum), which makes it easy to view and edit these wiki pages when offline.

    This brings a lot of what you describe above easily within reach, as stories could be contained in these wiki repositories. Each of these can be cloned and managed just like any git repository: branches, pull requests and all. More importantly, the Gollum tool makes it easy for non-technical folk to edit pages offline and make commits without needing to learn much (any?) of the Git lingo.

    Phillip.
