THATCamp SE 2012: Getting Data to Play Well Together

I greatly enjoyed this year’s THATCamp SE, largely because I got to meet some great new Campers (@micahvandegrift, @Musebrarian, @DonnaLanclos, @georgiawebgurl, and others). I also had many stimulating and productive discussions. Leeann Hunter’s “Technology of Human Interaction” opened up a necessary reflection about the value of face-to-face interaction in the digital humanities and digital pedagogy; Paul Fyfe’s “Rebooting Graduate Education” explored the possibility of integrating #alt-ac and DH skills into the graduate curriculum; and Amanda French and Micah Vandegrift even started a THATCamp Bibliography project that will hopefully become the beginning of a movement to archive the vast amount of data emerging weekly from THATCamps across the globe.

The conversation surrounding Linked Open Data (LOD) was my favorite, though. I say this not because I proposed the session (I actually considered cancelling it before it happened), but because I enjoyed the conversation that emerged among Paul Bogen, Richard Urban, Laurie Taylor, Mark Sullivan, Patrick Murray-John (on Twitter at least), myself, and others over how practical LOD is and what hurdles might emerge in the quest to create an open data web. You can see a Storify of that conversation here. If you look closely, what emerges is a very interesting moment where my poststructuralist/semiotic training breaks down completely.

[blackbirdpie id=”178868405531721729″]

[blackbirdpie id=”178868468655980546″]

[blackbirdpie id=”178868728908353536″]

The difficulty concerned my understanding of uniform resource identifiers (URIs). For those of you who don’t know, URIs are unique signifiers of data objects. If URLs uniquely reference locations, URIs (of which URLs are one kind) refer to objects. The problem comes when one realizes that URIs not only have to refer to the same object regardless of social context, they also have to do so throughout time. There are, as Patrick Murray-John pointed out, practical solutions for this. Harry Halpin and Patrick Hayes explain that while URIs were initially going to be re-used for different data sets referring to the same thing, in reality:

URIs are simply minted anew for each identifier in a Linked Data set. As opposed to the simple exporting of data-sets into RDF, what puts the links in Linked Data is the use of what we term identity links – links that define two things to be identical or otherwise closely related – to link between diverse and heterogeneous data-sets. While there has been some research that deals with this problem, the scope of the problem is just beginning to be understood.

Some programmers use the OWL property owl:sameAs to link two separate URIs as identical or similar for the purposes of generating new datasets. The problem, as Halpin and Hayes argue, is that owl:sameAs has difficulty distinguishing between two individual URIs that simply share similar properties and two individual URIs whose shared property derives from their participation in the same class.
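To see why a loosely applied identity link matters, here is a minimal Python sketch of what a reasoner is entitled to do with owl:sameAs. In OWL, owl:sameAs is symmetric and transitive, so a chain of links collapses every URI along it into one identity. The three URIs below are hypothetical ones I made up for illustration, and the `same_as_classes` helper is my own, not part of any RDF library.

```python
# Hypothetical URIs minted independently by three datasets (not real identifiers).
SAME_AS_LINKS = [
    ("http://a.example/Qomolangma", "http://b.example/MountEverest"),
    ("http://b.example/MountEverest", "http://c.example/Sagarmatha"),
]


def same_as_classes(links):
    """Group URIs into the equivalence classes implied by owl:sameAs links."""
    parent = {}

    def find(uri):
        # Return the representative URI of the group that `uri` belongs to.
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # shorten the chain as we walk it
            uri = parent[uri]
        return uri

    for left, right in links:
        # Symmetry and transitivity: linking two URIs merges their whole groups.
        parent[find(left)] = find(right)

    classes = {}
    for uri in parent:
        classes.setdefault(find(uri), set()).add(uri)
    return list(classes.values())


classes = same_as_classes(SAME_AS_LINKS)
for group in classes:
    print(sorted(group))
```

Nobody ever linked the first dataset to the third, yet all three URIs end up in a single class. If even one of those links really meant "closely related" rather than "identical," the mistake spreads to everything in the chain.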

The authors use a real-world example of translation in demonstrating how this can quickly become a problem for programmers.

[N]ames from different languages cannot be substituted for each other often. Sentences like “Qomolangma is a word in Tibetan” mention a name, while the sentence “Mount Everest is the highest mountain in the world” uses the name. Obviously, “Mount Everest is a word in Tibetan” is false. Is “Qomolangma is the highest mountain in the world” true? This fact was not necessarily known in Tibet before the era of global geological surveys. So one could easily have a case of a geoscientist who has never visited the mountain knowing it is the highest mountain in the world and a Tibetan monk who lives not too far from the mountain not knowing – or caring – if it is the highest mountain in the world.
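The Everest example can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the prefixed names (`a:Qomolangma`, `b:MountEverest`) and the property names are hypothetical, and the function substitutes subjects only, which is a simplification of what full identity would license.

```python
SAME_AS = "owl:sameAs"

# Hypothetical triples from two datasets; every name here is illustrative.
triples = {
    ("a:Qomolangma", "a:isWordIn", "Tibetan"),  # really a fact about the name
    ("b:MountEverest", "b:isHighestMountainIn", "World"),  # a fact about the mountain
    ("a:Qomolangma", SAME_AS, "b:MountEverest"),  # the identity link
}


def substitute_identicals(triples):
    """Copy every statement across owl:sameAs links (subjects only)."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        links = [(s, o) for s, p, o in inferred if p == SAME_AS]
        for left, right in links:
            for s, p, o in list(inferred):
                if p == SAME_AS:
                    continue  # leave the identity links themselves alone
                for old, new in ((left, right), (right, left)):
                    if s == old and (new, p, o) not in inferred:
                        inferred.add((new, p, o))
                        changed = True
    return inferred


inferred = substitute_identicals(triples)
for triple in sorted(inferred - triples):
    print(triple)
```

The two new statements are exactly the ones Halpin and Hayes worry about: the mountain's URI inherits the claim about Tibetan, and the Tibetan name's URI inherits the survey claim, because the identity link cannot tell a mention of a name from a use of it.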

owl:sameAs depends upon the idea that we can code data with a consistent syntax, and that truth is derivable from correspondence. As we discussed RDF and owl:sameAs in the LOD session, I kept thinking about URIs as the programming equivalent of the Ariekei language in China Miéville’s Embassytown. If you haven’t read it, the Ariekei are an alien species in Miéville’s novel who make no distinction between words and objects. Consequently, they cannot tell lies and must conceptualize similes as physical objects (like people) engaged in events, and this helps them expand their understanding of the world around them. They cannot, for example, understand that something is like a tree unless they experience that tree – and their understanding of the tree is limited entirely to the tree as a unique object within a unique event. The tree is not just a tree, it is a “tree blowing in the wind on a dark night.” This is radically different from how poststructuralism and cultural studies conceptualize language, in which the meaning of language persists precisely because people lie, interpret things differently, or change the meaning of words and phrases to suit different contexts.

By contrast, Richard Urban mentioned to me that Tim Berners-Lee drew many of his ideas regarding the semantics of RDF from the Polish-American logician Alfred Tarski. This point is repeated by Graham Klyne when he argues that “[m]uch practice to formalize meaning [in semantic programming] builds upon Tarski’s work in semantics and formal logic, in particular the ideas of Model Theory, which have been adopted by some in the artificial intelligence (AI) community for representing knowledge about the physical world.”

What is Model Theory? Model theory, like much formal analytic philosophy, becomes complicated quickly. But it is based upon Tarski’s account of what makes a logical statement true. At its most basic level, Tarski expressed this account formally:

‘p’ is true if and only if p.

Tarski goes on to argue that the statement applies only to formal languages, not natural ones. But it makes me wonder if the distinction between formal and natural languages really holds up when talking about Linked Open Data, and whether the correspondence theory of truth that is foundational to RDF works when dealing with datasets from across different cultures. Stewart Varner mentioned to me on the drive back from THATCamp SE that programming languages sometimes have different “accents” in different parts of the world. There might be slight differences in how people in Asia signify loops in Python, for example. As an aside, there might be an entirely different element to this argument in considering how specific programming languages might exert imperial control over others, or colonize them in the name of providing clean links.
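For readers who want the model-theoretic idea in concrete form, here is a toy Python sketch. It assumes a drastically simplified picture of an interpretation: names denote things, a property denotes a set of pairs, and a triple is true when the denoted pair falls inside that set. The `Interpretation` class, the `ex:` names, and the two interpretations are all hypothetical; the actual RDF semantics is considerably more detailed.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]


class Interpretation:
    """A toy interpretation: names denote things, properties denote sets of pairs."""

    def __init__(
        self,
        denotes: Dict[str, str],
        extension: Dict[str, Set[Tuple[str, str]]],
    ):
        self.denotes = denotes
        self.extension = extension

    def is_true(self, triple: Triple) -> bool:
        # True exactly when the things named by subject and object
        # stand in the relation named by the predicate.
        s, p, o = triple
        pair = (self.denotes[s], self.denotes[o])
        return pair in self.extension.get(p, set())


statement = ("ex:Qomolangma", "ex:highestMountainIn", "ex:World")

# One reading takes the name to denote the mountain itself.
names_the_mountain = Interpretation(
    denotes={"ex:Qomolangma": "the mountain", "ex:World": "the Earth"},
    extension={"ex:highestMountainIn": {("the mountain", "the Earth")}},
)

# Another reading takes the same name to denote a Tibetan word.
names_the_word = Interpretation(
    denotes={"ex:Qomolangma": "the word", "ex:World": "the Earth"},
    extension={"ex:highestMountainIn": {("the mountain", "the Earth")}},
)

print(names_the_mountain.is_true(statement))
print(names_the_word.is_true(statement))
```

The same string of symbols comes out true under the first interpretation and false under the second. Truth, on this account, is always truth relative to an interpretation, which is where questions of context and culture can re-enter a supposedly formal system.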

Ultimately, though, I’m less interested in whether or not Tarski was correct in embracing formal correspondence, and more interested in how these conceptions of meaning impact the tensions apparent (for me) in the DH community. It will be immediately obvious to most of you that the comparison I’m making is a reductive binary. But then again, the beginnings of most conversations are necessarily reductive. I offer this idea not as an iron-clad argument, but more as the beginning of a (very) rough thought-experiment that addresses cultural tension and attempts to illustrate how theory can help us think through some of these tensions.

I’ll put forward the hypothesis that DH, in its participation in coding culture and in its tighter focus on hacking, is enmeshed in a history that embraces Tarski’s correspondence theory of truth. What kinds of insights could we glean from this? There are two potential answers. First, we could say that the correspondence theory makes it easier for DH practitioners to focus on constructing equivalencies between heterogeneous things: whether they be bits of computer code or people at a THATCamp. Second, and related, the abstract nature of Tarski’s formal logic could make it more difficult for DH to focus on questions of perspective or identity. Identity becomes, with Tarski, a kind of variable – a signifier that (sometimes) confuses class for property.

And I’ll suggest that critical approaches to DH (embodied in #transformdh or the exhortation to apply critical or cultural theory to DH) emerge from a continental tradition that concentrates on difference and perspective. This concentration leads to valuable insights about the power relationships that can be obscured by the drive to collaborate in DH. Yet, concentrating so much on difference and perspective also (sometimes) makes it more difficult to collaborate or to imagine objects and ideas that aren’t reducible to a specific perspective.

Again, things are undoubtedly more complicated than the picture I have painted here in two paragraphs. I’m definitely being Hegelian in my presentation of history as a play of ideas, and I haven’t yet considered the social conditions that might also contribute to this divide. However, I want to offer the idea that “theory” isn’t some abstract thing that can be dispensed with before we go off and build something. As Jean Bauer argued, it is embedded in our practices as digital humanists and in our code. Understanding code requires understanding the culture in which code is produced. Theory can, and does, have a direct and practical impact on how, why, or whether particular coding projects succeed.
