DH, Archival Silence, and Linked Open Data

The Linking Open Data Cloud Diagram, by Richard Cyganiak and Anja Jentzsch

I’m thinking through many of the interesting conversations that have taken place recently on Twitter and around the DH blogosphere. First, Miriam Posner wrote a really powerful post about learning to code and gender, in which she argues that the broad exhortation to code covers up gender and diversity inequity. Many of the institutions built around coding (she cites Wikipedia as an example) are overwhelmingly male-dominated. “[M]en — middle-class white men, to be specific — are far more likely to have been given access to a computer and encouraged to use it at a young age. I love that you learned BASIC at age ten. But please realize that this has not been the case for all of us.” In a particularly thoughtful response in the comments, Stephen Ramsay describes the environment in a meet-and-greet session with male developers as “like a locker room. I counted three women in a group of at least fifty men, but that wasn’t even the worst of it. Porn joke? Check. Sports and warfare metaphors? Abounding. Do-or-die, you-win-or-you-suck vibe? Very much in evidence.”

Meanwhile, both Katherine Harris and Lauren Klein show how archives are often silent when it comes to the representation of minorities, and Lauren (in one of the most powerful instances of topic modeling I’ve found in recent criticism) shows how digital technology can think through those silences. You can check out Natalia Cecire’s Storify of the conversation I had with them.

Ted Underwood’s response to our conversation opened up some really useful ways to think about issues of representation and archives in DH.

Humanists are used to approaching debates about historical representation as if they were zero-sum questions. I suppose we are on some level still imagining this as a debate about canonicity — which is, as John Guillory pointed out, really a debate about space on the syllabus. Space on the syllabus is a zero-sum game. But the process of building big data is not zero-sum; it is cumulative. Every single thing you digitize is more good news for me, even if I shudder at the tired 2007-vintage assumptions implicit in your research agenda.

I agree with much of what he says, and I feel that, overall, the issue is less who is represented in the “canon” of data (as if there were a single canon) than who has access to data and can manipulate it. This is one of the reasons why Miriam’s discussion is, in my mind, linked to the question of who is represented in large-scale text-mining projects. Thinking through not just what appears in archives but also how those archives work, and how we can use the data to make better archives, is to me the same conversation. This doesn’t mean that everyone needs to code, but rather that scholars should understand how digital archives are put together and participate in building and rebuilding those archives.

Archives aren’t representations of historical reality, but rather political compositions in the Latourian sense. Digital scholars as compositionists need to “take […] up the task of building a common world” but with “the certainty that this common world has to be built from utterly heterogeneous parts that will never make a whole, but at best a fragile, revisable, and diverse composite material.” Scholars can’t stand at a safe distance and simply critique, since historical representation isn’t about struggling over a space that already exists, but about creating a fragile, revisable world that works for our current political situation. For me, the DH approach to building or hacking isn’t about neglecting the political importance of critique, but about integrating critique into an ethos where participation is essential for progressive change. On one level, this may mean making new archives; on another, it may mean using new forms of political association to create the commons that many of us currently lack.

Detail of the Linking Open Data Diagram

To my mind, the advent of linked open data is so exciting because it represents a different way of thinking about archives and data. Classically, the web is conceived as a set of relationships between text documents created primarily for human consumption. The semantic web, another name for the Latourian composite material created by linked data, is conceived as a set of relations between data that can be read by human or machine. It thus represents a logic that runs parallel to the arguments of Actor-Network Theory and Object-Oriented Ontology regarding the importance of non-human actors in producing effects. Tom Heath and Christian Bizer, in their introduction to Linked Data: Evolving the Web into a Global Data Space, argue that the semantic web “presents a revolutionary opportunity for deriving insight and value from data. By enabling seamless connections between data sets, we can transform how drugs are discovered, create rich pathways through diverse learning resources, spot previously unseen factors in road traffic accidents, and scrutinize more effectively the operation of our democratic systems.” I’d argue that we can also scrutinize more effectively the kinds of political compositions we create, and that we have the ability to consciously recreate those compositions for politically progressive ends.

I’d like to present an example of linked open data, to show what people are already doing with it. DBpedia converts information on Wikipedia to RDF (the web standard used in linked data) with the intention of allowing users to generate sophisticated queries. The project cites several such queries on its website; to give one example, you could query Wikipedia for “All soccer players, who played as goalkeeper for a club that has a stadium with more than 40.000 seats and who are born in a country with more than 10 million inhabitants.” The site also shows how data from Wikipedia can be easily visualized in a Google Map. Currently, the possibilities are limited by Wikipedia’s editorial policies and the querying powers of DBpedia’s programmers, but imagine a situation where you combine data about sixteenth-century women authors from Open Library or Google Docs with the information contained on Wikipedia, query that information, then use the results to visualize the silences in any one of those individual archives.
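To give a rough sense of what a query like the goalkeeper example involves, here is a minimal sketch in Python. It is not DBpedia’s actual interface (DBpedia is queried with SPARQL over RDF), and everything in it is invented for illustration: the triples, the identifiers such as `player:A` and `club:X`, the predicate names, and the helper functions. The only idea it borrows from linked data is that facts are stored as subject, predicate, object statements that can be chained together.

```python
# Hypothetical mini-dataset of (subject, predicate, object) triples.
# None of these identifiers or numbers come from DBpedia.
TRIPLES = [
    ("player:A", "position", "goalkeeper"),
    ("player:A", "club", "club:X"),
    ("player:A", "birthCountry", "country:P"),
    ("player:B", "position", "goalkeeper"),
    ("player:B", "club", "club:Y"),
    ("player:B", "birthCountry", "country:P"),
    ("player:C", "position", "striker"),
    ("player:C", "club", "club:X"),
    ("player:C", "birthCountry", "country:P"),
    ("player:D", "position", "goalkeeper"),
    ("player:D", "club", "club:X"),
    ("player:D", "birthCountry", "country:Q"),
    ("club:X", "stadiumCapacity", 55000),
    ("club:Y", "stadiumCapacity", 12000),
    ("country:P", "population", 46000000),
    ("country:Q", "population", 3000000),
]


def objects(subject, predicate):
    """Return every object stated for a given subject and predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def subjects(predicate, obj):
    """Return every subject that has the given predicate and object."""
    return [s for s, p, o in TRIPLES if p == predicate and o == obj]


def goalkeepers_query(min_seats=40000, min_population=10000000):
    """Goalkeepers whose club has a large stadium and whose birth
    country has a large population, following links between triples."""
    results = []
    for player in subjects("position", "goalkeeper"):
        big_stadium = any(
            capacity > min_seats
            for club in objects(player, "club")
            for capacity in objects(club, "stadiumCapacity")
        )
        big_country = any(
            population > min_population
            for country in objects(player, "birthCountry")
            for population in objects(country, "population")
        )
        if big_stadium and big_country:
            results.append(player)
    return sorted(results)


if __name__ == "__main__":
    print(goalkeepers_query())
```

The point of the sketch is that the query never looks at a “page” about a player; it follows relations from player to club to stadium, and from player to country to population. Combining archives in this model would amount to adding more triples from another source to the same list.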

The point here is not utopian. Linked data won’t usher in a new age where the angels of the semantic web deliver us from oppression and make everyone magically equal. Quite the contrary, the struggle will become even more complicated. Critique is a necessary, but by no means sufficient, response to the political struggles of today. We must 1) struggle against the forces of proprietary software, which assume that what is a public good is theirs for the taking; 2) create the tools that help us in our contemporary struggles; and 3) not be afraid to revise the tactics and composite materials that no longer work for us.


  1. Cool. It’s ironic that I was implying that linked open data is a long-term project, since actually I work with SEASR / Meandre a fair amount, and I believe it uses the RDF standard to define data flows. So … maybe not *that* far away.

    1. I think most people are trying to format their data in RDF, but they don’t quite know what to do with it yet. It’s good that projects like DBpedia are showing the way. 
