Prosopography software




















We are currently starting the required public tender procedures to inform potential candidates in the private sector of this opportunity. The contract for this development project will be awarded via a European negotiated e-tender procedure with publication of a contract notice. Recent developments in the digital sphere have offered humanists new opportunities and challenges. Equipped with new digital methods of text analysis, scholars in various fields of the humanities are now trying to make sense of huge corpora of literary and historical texts.

Perhaps the most prominent of such attempts is the work of Franco Moretti and his abstract models for literary history, which trace long-term patterns in English fiction. Working with a corpus of 10 biographical collections (about printed volumes; 45, biographical accounts), I am developing an analytical tool that can later be used to study other biographical collections and, ideally, all of them together.

The analysis of different combinations of these data allows one to trace various social, religious and cultural patterns in time and space. I am particularly interested in how the Islamic world changed over the period of — CE: how cultural centers were shifting; how different social, professional and religious groups were replacing and displacing each other; how different regions were connected with each other and how these connections changed over time.

The results of my analysis will be presented in the form of graphs and geographical maps. Some current examples of my work can be found at www. Note that the boxes and their labels immediately tell the user what kinds of data the DPRR dataset holds that can be used for filtering, and even more filtering items are available off the bottom of this screenshot. This faceted search approach implements interface strategies used on other widely used sites such as Amazon's, and is designed to help users with limited knowledge of a field find the things they want.

The intent of the design is to allow historians of the Roman Republic to use this page effectively with what are nowadays only conventional web-access skills. Of course, there is, not surprisingly, more to the RDF server's web-oriented interface than this page alone, so only so much can be learned by examining it critically by itself. Nonetheless, even though we are looking here at only one of the pages that the RDF server can show us, one can quickly see that the RDF server's web interface is built under a very different set of assumptions about the kind of user who will be working with it.

Indeed, although there is a banner at the top of the screen that identifies it with DPRR, the DPRR RDF server's public interface is not specific to DPRR in the way that the facetted search browser presented earlier is, but represents instead a general kind of interface that could be used with any collection of RDF data on any subject. The web-browser interface for DPRR we looked at a moment ago has been tailored specifically to make front and centre how DPRR's materials are organised and to show under what semantic issues they operate.

In fact, this screen is part of rdf4j's RDF workbench interface, which has been specifically designed by rdf4j's developers to work usefully with any kind of RDF data.

The important point at this moment is that this browser interface says something about its intended audience: to use it one needs to have a solid technical familiarity with RDF and its technologies, and to be capable of exploiting materials presented in this way. Later, this article will look briefly at some of the other parts of its interface that are derived from the rdf4j workbench.

I have chosen to include in this article HTML links and forms that actually invoke the DPRR RDF Server, on the principle that sending readers to the server will better enable them to explore for themselves what the server is doing. I therefore recommend that you, the reader, click on the provided links and engage with the server directly. The links are set up to cause your browser to open them in a new tab or window. Thus, to return to this article, you can simply close the tab or window that the link or form opened when you are done with it.

Furthermore, if for some reason you are unable to make the links work, you can instead find screen captures in the appendix which show what appears in my Firefox browser when I click on the links. Each figure is linked to the spot in the article where it is needed. The first point to notice is that the server supports the fundamental principle stated by Berners-Lee and others about open data: that all entities in the data have globally defined URIs, and that if one gives the URI for any one of these entities to the web as a URL, one gets data about it back from the server.

A screen shot showing what my browser gives me in response is given in the appendix as Figure 3. The data about the same historical person, Cicero, is all included in the web page returned to the browser too, but it is wrapped in rather more complex HTML which has been tailored specifically to represent DPRR Person data, and which is designed to present visually well in a conventional browser for a human reader.

There are two ways to do this. This approach can be carried out relatively readily if you use the HTTP support available in most programming languages, such as Python or Java. However, if you are trying to use a web browser to fetch data as simple RDF, it is difficult to follow these W3C guidelines and to control the MIME type that the browser will specify in the HTTP request it generates for you.
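As a rough illustration of controlling the MIME type from a programming language, the following Python sketch builds an HTTP request whose `Accept` header asks for an RDF serialisation instead of HTML. The function names, the `text/turtle` default and the example URI are assumptions made for illustration; which MIME types the DPRR RDF server actually honours is not stated here and would need to be checked against the server itself.

```python
from urllib.request import Request, urlopen


def build_rdf_request(uri, mime_type="text/turtle"):
    """Build a GET request that asks the server for RDF rather than HTML.

    `mime_type` is an assumption: Turtle is a common RDF serialisation,
    but a given server may only support others (for example RDF/XML).
    """
    return Request(uri, headers={"Accept": mime_type})


def fetch_rdf(uri, mime_type="text/turtle", timeout=30):
    """Send the request and return the response body as text."""
    with urlopen(build_rdf_request(uri, mime_type), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Hypothetical entity URI, used only to show the shape of the call.
request = build_rdf_request("http://example.org/entity/Person/1")
print(request.get_header("Accept"))
```

Calling `fetch_rdf` with the URI of a real entity would then return whatever serialisation the server chooses to send for that `Accept` value.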

The packaged presentation of DPRR's reader-friendly view is clearly more straightforward for a non-technical DPRR user to understand: that is the intent of its design. Furthermore, the DPRR development team worked to combine together data from various related parts of the DPRR dataset to create a unified and concise presentation that appears assembled together on a single web page.

Since, as we have seen, the browser-oriented interface delivers information about Cicero in a more user-friendly way, who would want to use the RDF Server's representation? This question takes us to the point of the Semantic Web and Linked Open Data: that they express their materials in a highly structured form (RDF) that is suitable for further processing rather than just human viewing.

Whereas arguably the browser-oriented presentation is easier for a person to interpret, it is not as straightforward to use when the purpose is to gather data from it for further processing.

In contrast, RDF has straightforward and consistent structures that are easy to process in a programming language such as Python or Java. Why does having kinds of data other than just persons directly addressable via the WWW matter in what is, after all, a prosopography?
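To give a sense of how regular RDF is to process, here is a minimal Python sketch that reads a few statements in a simplified subset of the N-Triples syntax and groups them by subject. The sample data, the `example.org` URIs and the property names are invented for illustration and are not DPRR's vocabulary; a real project would more likely use an established RDF library than a hand-written parser like this one.

```python
import re
from collections import defaultdict

# Matches simple lines of the form: <subject> <predicate> <object-IRI or "literal"> .
# This covers only a small subset of N-Triples (no blank nodes, datatypes or escapes).
TRIPLE_RE = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+(?:<([^>]*)>|"([^"]*)")\s*\.\s*$')

# Invented sample data; not taken from DPRR.
SAMPLE = """
<http://example.org/person/1> <http://example.org/vocab/hasName> "Example Person" .
<http://example.org/person/1> <http://example.org/vocab/hasOffice> <http://example.org/office/1> .
<http://example.org/office/1> <http://example.org/vocab/hasName> "Example Office" .
"""


def parse_ntriples(text):
    """Return a list of (subject, predicate, object) tuples."""
    triples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = TRIPLE_RE.match(line)
        if match is None:
            raise ValueError(f"Unsupported line: {line!r}")
        subject, predicate, iri, literal = match.groups()
        triples.append((subject, predicate, iri if iri is not None else literal))
    return triples


def group_by_subject(triples):
    """Collect every predicate and its values under the subject they describe."""
    grouped = defaultdict(dict)
    for subject, predicate, obj in triples:
        grouped[subject].setdefault(predicate, []).append(obj)
    return dict(grouped)


for subject, properties in group_by_subject(parse_ntriples(SAMPLE)).items():
    print(subject, properties)
```

Because every statement has the same subject, predicate, object shape, the same few lines of processing apply to persons, offices or any other kind of entity.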

Being able to enter DPRR's data structures in any number of different ways makes possible fresh ways of looking at the data, something that would be difficult to achieve if you could only enter the data through persons.

In order to make good use of the different kinds of interconnected data that the DPRR RDF server makes available beyond persons, one needs to know in some detail what is there and how it is organised. This is where an rdf4j workbench mechanism available in the RDF server proves useful. Generally, one can navigate to the types display from the browser pages presented by the server via the menu of options on the left side.

DPRR is primarily built by harvesting data from 19th, 20th and 21st century scholarship. They allow one to develop a feel for the meaning of the data simply by browsing through the data itself. However, not all the types of data are immediately understandable in this way, and their relationships to one another can still be difficult to grasp.

It is time, therefore, to step away from its specifics to think about what this approach — providing a data-oriented historical site like DPRR as Linked Open Data in the sense that Tim Berners-Lee conceives of it — might mean for a humanities scholarly community.

So what is its connection, in and of itself, to the LOD perspective? One needs to start by thinking more about the two different kinds of engagement with LOD materials by web users which appear at different points in time in Berners-Lee's conception of Linked Data.

Here we see the authors proposing a data and semantically-rich extension to the already existing document-oriented web in a way that would allow ordinary folk without formal training in digital semantics to exploit this semantic richness.

The authors give a number of imagined examples of agent-based software that could automatically exploit formal semantic data across different sources. One example (see page 36) tells us of a user who sends her agent software off to make an appointment with a medical specialist for her mom. To do this requires the agent to find specialists that fit with mom's prescribed treatment, then match up the appointment calendars for mom and those specialists.

The software agent also needs to take into account other parameters such as distance to the appointment, and the need for physical therapists. Allowing a user's software agent to perform this kind of complex task reliably requires that the material it works with must be highly structured and have appropriate software-accessible semantics formally available so that the software agent can, on its own without human intervention, connect it together correctly and exploit it.

In the ideal Semantic Web described by Berners-Lee et al., a human user would be able to safely delegate this task to her agent software and would not need to worry about the details of how the agent did the job. If she were interested, though, she could ask the system how it went about carrying out the task and, since the computation would be based on structures that semantically mirror parts of our human understanding of the world, receive an answer that she could understand.

As a consequence there has been work in Computer Science to explore the somewhat simpler task of trying to make semantic web data help ordinary, non-technical users better search for things in which they are interested in the vast global internet-wide data graph.

Some of this work involves trying to find ways to enrich Google-like searching, which is centred primarily on very sophisticated natural language retrieval principles (NLP) applied to the WWW's text-oriented documents, with semantically structured material expressed in RDF and its associated technologies. When researchers tried to build systems that could jointly exploit RDF-like structured data as well as the text in web pages, they found it to be a real challenge.

A good summary of some of the thinking in this area from a few years ago can be found in [Freitas et al.]. Most Google users are not familiar with the range of material that the web possesses when they start a Google search, and they phrase their question without knowing the structure or vocabulary applied to materials on the web.

In this sense, their querying is intuitive. Similarly, some of the engines that Freitas et al. describe are meant to allow users to ask questions in a natural language without knowing much about the domains the data represents. These engines combine NLP techniques with a sophisticated understanding of the relevant RDF data and the ontologies that describe it, to provide a better query result than NLP could deliver on its own.

The aim is to allow users to come with what are intuitive text-oriented questions and get richer, more trustworthy, results than they would get from the NLP approaches against text-oriented documents alone.

There is a good summary of more recent thinking in this area in [Noy et al.]. Of course, recent work by Google and others has shown that text-oriented big data strategies can achieve remarkable things with only vast amounts of almost-raw text as data, without needing large amounts of hand-crafted formal semantic data at all.

Thus, it would seem that if the Semantic Web vision is ever going to be achieved, the emergence of platforms that have rich, widely available semantic data expressed in RDF and its associated technologies, combined with AI software of the kind envisioned here that can make use of it, is still something for the future.

Perhaps as the challenges of implementation of the ideas in the article became clearer, Berners-Lee began to think about the benefits of having the data without the sophisticated AI-like framework that would be needed to make the more sophisticated ideas of the article work.

In one of Berners-Lee's illustrations we see a person joining together data about what streets a new municipal water pumping station served with demographic data about those streets, and then being able to show how this town's new station was disproportionately serving the wealthier parts of the town.

Someone who wishes to join up data from different sources like this cannot be an intuitive user, taking an intuitive approach based on only a limited understanding of the data being queried. Instead, to join the sources together they need to understand in some detail the semantic structure and significance of their data sources, and know how to join them formally and correctly. The important point for us here is that the discovery made by the researcher in Berners-Lee's TED talk, the link between the new water plant and the people it served, was made not with the aid of an intuitive Google-like query, but by deliberately bringing together two sources of structured data in a way that no one else had done.
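The kind of deliberate join described in the water plant example can be sketched in a few lines of Python. Everything in this sketch is invented for illustration: the street names, the income figures and the function name are placeholders, not data from Berners-Lee's talk. The point is only that the join works because both sources share a key (the street) whose meaning the analyst understands.

```python
from statistics import mean

# Source 1 (hypothetical): streets served by the new pumping station.
STREETS_SERVED = {"Street A", "Street B"}

# Source 2 (hypothetical): a demographic figure recorded per street.
INCOME_BY_STREET = {
    "Street A": 80,
    "Street B": 60,
    "Street C": 30,
    "Street D": 20,
}


def compare_service_by_income(served, income_by_street):
    """Join the two sources on street name and compare average incomes.

    Returns (mean income of served streets, mean income of unserved streets).
    """
    served_incomes = [v for street, v in income_by_street.items() if street in served]
    unserved_incomes = [v for street, v in income_by_street.items() if street not in served]
    return mean(served_incomes), mean(unserved_incomes)


served_mean, unserved_mean = compare_service_by_income(STREETS_SERVED, INCOME_BY_STREET)
print(f"served: {served_mean}, unserved: {unserved_mean}")
```

Nothing in either source says anything about inequality on its own; the observation only appears once the two are brought together on the shared key.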

To achieve this, the data analyst needed, in some way, to be the opposite of intuitive. Furthermore, only in this way could the strength of the argument that arises from this water plant example come out of the semantic juxtaposition of the materials. These interactions with data are not like queries that are conceived of as Google-like semi-natural language expressions, where one cannot actually be sure either that the result one gets matches a natural human understanding of the query or that one gets all the material that a human would consider relevant to the question asked.

Instead, these crisp structured queries have a kind of processing model that allows one to be sure of the completeness and accuracy of the result, to the degree that the data being queried can be considered an accurate representation of its material (admittedly, an important qualification) and inasmuch as one can express what one is interested in within the formal terms of the query language.

Can a classical scholar engage with the formally based mechanisms of DPRR with an intention similar to that of Berners-Lee's water plant mashup example? SPARQL works by allowing the query creator to specify a pattern to look for in the RDF graph, and to display selected parts of whatever matches the pattern as results. A screenshot of what a browser shows when this is submitted is given in the appendix. The query looks for graph patterns in the DPRR RDF data that show women who are also recorded as holding offices, and displays the woman's name and the name of the office.
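The idea of matching a pattern against a graph can be illustrated without a SPARQL engine at all. The Python sketch below is a toy matcher, not SPARQL itself and not how rdf4j is implemented: it treats any term beginning with `?` as a variable and finds every way of binding the variables so that all patterns match some triple. The triples, property names and person and office labels are invented placeholders, not DPRR's vocabulary or data.

```python
# Invented sample graph as (subject, predicate, object) tuples.
TRIPLES = [
    ("person:1", "hasSex", "Female"),
    ("person:1", "hasName", "Example Woman"),
    ("person:1", "holdsOffice", "office:10"),
    ("person:2", "hasSex", "Male"),
    ("person:2", "hasName", "Example Man"),
    ("person:2", "holdsOffice", "office:10"),
    ("person:3", "hasSex", "Female"),
    ("person:3", "hasName", "Another Woman"),
    ("office:10", "hasName", "Example Office"),
]


def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the same value everywhere it appears.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def query(patterns, triples):
    """Return every variable binding that satisfies all patterns together."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# "Women who are also recorded as holding offices", as a graph pattern.
WOMEN_WITH_OFFICES = [
    ("?person", "hasSex", "Female"),
    ("?person", "hasName", "?name"),
    ("?person", "holdsOffice", "?office"),
    ("?office", "hasName", "?officeName"),
]

for result in query(WOMEN_WITH_OFFICES, TRIPLES):
    print(result["?name"], "|", result["?officeName"])
```

Only the one woman with a recorded office satisfies every line of the pattern at once, which is the sense in which such a query is crisp: a row is either fully matched by the data or absent from the result.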

Soon thereafter you should receive a response from the server showing, in a table, the names and offices of all women recorded as holding offices in the DPRR dataset (alternatively, a screen image of the beginning of the server's response to this query is in the appendix). Indeed, the query text shown here could be copied and pasted into that screen and run from there, and would produce essentially the same result as one gets from the above form.

However, the query can also be run so that it returns results in a structured form more suitable for further processing. Here is the same query set up in a form that causes the result to be returned in JSON, a format suitable for further processing by platforms such as Python or Java (if you are curious about JSON, a good starting point is Wikipedia's definition).
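To suggest what further processing of such a JSON result might look like, here is a small Python sketch. It assumes the response follows the W3C SPARQL 1.1 Query Results JSON Format, in which variable names are listed under `head.vars` and each solution appears under `results.bindings`; the sample response, its variable names and its values are invented placeholders, not output from the DPRR server.

```python
import json

# Invented sample shaped like the W3C SPARQL 1.1 Query Results JSON Format.
SAMPLE_RESPONSE = """
{
  "head": {"vars": ["name", "office"]},
  "results": {
    "bindings": [
      {"name": {"type": "literal", "value": "Example Woman"},
       "office": {"type": "literal", "value": "Example Office"}}
    ]
  }
}
"""


def rows_from_sparql_json(text):
    """Flatten a SPARQL JSON result into a list of {variable: value} dicts."""
    document = json.loads(text)
    variables = document["head"]["vars"]
    rows = []
    for binding in document["results"]["bindings"]:
        # A variable can be unbound in a given solution, hence the .get() calls.
        rows.append({var: binding.get(var, {}).get("value") for var in variables})
    return rows


for row in rows_from_sparql_json(SAMPLE_RESPONSE):
    print(row["name"], "|", row["office"])
```

Once the result is in this form it can be counted, sorted, or joined with other data using ordinary programming tools.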

Results can also be returned in CSV format, which can be opened as a spreadsheet, although this is not shown in this example. A screen capture of the beginning of the display generated by the query is shown in the appendix. Having now briefly seen SPARQL as a querying mechanism against the DPRR dataset, the reader may still not find it obvious how such a thing could be relevant to furthering the study of the Roman Republic. I can see three possible concerns:

Whereas Berners-Lee's examples draw data from disparate sources and join them together to make their point, DPRR is, by itself, a single source. It will then find connections within a very large pool of demographic data, and allow aggregate analysis. Ultimately, Prosop aims to make the various historical description and categorization schemes themselves the subject of research. This presentation treats a methodological issue: the techniques that we use to deal with the tremendous volume of data generated by Middle East microhistories.

I will describe my sense of the challenges and potentialities of this aspect of our work, and discuss ways that I think Prosop can support collaborative historical work.


