
Q&A: US Dept of Defense

How the world’s largest government department is deploying semantic technology

With an annual budget of $550 billion, the US Department of Defense is the largest government department in the world. It employs 700,000 civilians and around 2.5 million soldiers – more people than any other organisation on the planet. Since its inception in 1949, the DoD has played a pioneering role in information technology. Its research arm, DARPA, invented the precursor to the Internet, ARPAnet, making its record for world-changing innovation pretty much unbeatable.

But even with these credentials, the DoD has not escaped one of the most crippling afflictions in enterprise IT – siloed legacy data sources. As Dennis Wisnosky, chief architect and chief technology officer for the DoD’s Business Mission Area, explains below, a recent attempt to unify the human resources databases of the various military services took 11 years and $1 billion before being scrapped as a failure.

Now, though, under Wisnosky’s leadership, the DoD is solving that problem with one of DARPA’s own inventions: semantic technology. The technology was first developed to help intelligence agencies process data, and Wisnosky is rolling it out to allow users to query data across the legacy silos without the need to create a data warehouse. Here, Wisnosky tells Information Age how he came to discover semantic technology, his plan for gradually introducing it to the senior management at the Pentagon and what he expects the benefits to be.

Information Age: What was it that prompted the search for a new approach to data?

Dennis Wisnosky: There was a giant project here at the Department of Defense called DIMHRS (Defense Integrated Military Human Resources System), which was designed to be a single integrated personnel system that could access all the Army, Navy, Air Force and Marine Corps human resources databases around the world. But these were all static relational databases, and the cost and complexity of connecting them was extremely high. The services couldn’t agree on any shared definitions, and the database schemas that they used were changing even as we were trying to connect them together. After 11 years and about $1 billion had been spent on the project, it became clear that the interconnection problems were never going to be solved. In January 2010, former deputy secretary of defense Gordon England cancelled the DIMHRS programme, just as he announced he was leaving office.

What alternatives did you consider?

Mr England told Congress that, instead of DIMHRS, we were going to build an enterprise data warehouse. But it had always been part of the DIMHRS plan to build a data warehouse. It didn’t take a rocket scientist to figure out that if you’d tried to build one for 11 years and couldn’t, it wasn’t going to happen. So we looked for approaches that did not involve a data warehouse, and we came across something called data virtualisation. This means when you run a query, you reach into your authoritative data stores and aggregate it together in real time. The data doesn’t persist – it’s only there while you answer your question.

We thought that was a great idea, but data virtualisation uses translations – you translate the data into a common definition when you run the query. For example, in an Army database, a definition of a ‘soldier’ might be encoded in one way, but in the Marine Corps database, the definition of a ‘marine’ will be encoded in a different way, so you would have to translate them both to ‘service member’ when you run a query. Every time you do one of these translations it consumes processor cycles, and you end up with a very slow system, so we looked for other ways of doing it.

How did you come across semantic technology?

I was presenting at a conference, and I described the situation with DIMHRS and said that we needed something new. After the talk, someone came up to me and said ‘the Department of Defense already invented the new way’. He was referring to semantic technology, which was developed in the 1990s by DARPA [the Defense Advanced Research Projects Agency] for the intelligence community, which has the job of gathering and analysing both structured and unstructured data from around the world, in as near real time as possible.

How would you explain the technology?

Semantic technology involves defining the meaning of concepts. Using the previous example, it means understanding that the meaning of the words ‘soldier’ and ‘marine’ is ‘service member’. In semantic technology, the record of all meanings is the ontology, which is made up of RDF [Resource Description Framework] triples. They are called that because they have three components: the subject, the predicate and the object. Examples might be ‘[Dennis] [is] [a person]’ and ‘[Dennis] [works for] [the DoD]’. Put those together and you know that Dennis is a person who works for the DoD.

How does that link back to legacy systems?

DW: There is a standard from the World Wide Web Consortium (W3C) called R2RML, which can be used to translate relational databases into triple stores. We also have another technology called SPARQLizer, which converts SPARQL, the query language for RDF, into SQL, the query language for relational databases. So you run a query based on the definitions in your ontology, and SPARQLizer converts it into a SQL query and it goes into the relational database.
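The triple idea from the interview can be sketched in a few lines of Python. This is only an illustration, not the DoD's implementation: it uses a plain in-memory set rather than a real triple store, it does not use R2RML or SPARQL, and the names `TRIPLES`, `match`, `instances_of` and `describe`, the `means` predicate and the records for "Alice" and "Bob" are all hypothetical.

```python
from typing import Iterator, Optional

Triple = tuple[str, str, str]  # (subject, predicate, object)

TRIPLES: set[Triple] = {
    # The two example triples given in the interview.
    ("Dennis", "is", "person"),
    ("Dennis", "works for", "DoD"),
    # Hypothetical records, as two separate legacy sources might encode them.
    ("Alice", "is", "soldier"),
    ("Bob", "is", "marine"),
    # Hypothetical ontology statements: both terms mean "service member".
    ("soldier", "means", "service member"),
    ("marine", "means", "service member"),
}


def match(s: Optional[str] = None, p: Optional[str] = None,
          o: Optional[str] = None) -> Iterator[Triple]:
    """Yield every triple that fits the pattern; None acts as a wildcard."""
    for triple in TRIPLES:
        if all(want is None or want == got
               for want, got in zip((s, p, o), triple)):
            yield triple


def instances_of(concept: str) -> set[str]:
    """Return subjects typed as the concept or as any term that means it."""
    terms = {concept} | {s for s, _, _ in match(p="means", o=concept)}
    return {s for term in terms for s, _, _ in match(p="is", o=term)}


def describe(subject: str) -> dict[str, str]:
    """Put together all the triples about one subject."""
    return {p: o for _, p, o in match(s=subject)}


print(describe("Dennis"))
print(sorted(instances_of("service member")))
```

In this sketch the query for "service member" is answered from the ontology statements, so the records for "soldier" and "marine" are left in their original encoding rather than being rewritten into a common format.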

Implementation

How did you decide whether or not it would work for you?

After I heard about the technology, I went to the DoD’s deputy chief management officer and said I had this idea that we really need to try. I said I want to establish a little team, and come back to you every 90 days with some results. People thought that I was looney, because 90 days is the speed of light here in the Department of Defense. So we formed a team and set out to prove that we can take data from two separate sources, federate it easily and come back with a result. For example, one of the first proofs of concept we did was to answer the question: how many service members do we have in Afghanistan that can speak Arabic? It’s a simple question, but you need answers quickly. And it worked. So far, we’ve done eight 90-day proofs of delivery (PODs).

What is the implementation plan?

After the first proof of delivery, we set up a four-year plan and a detailed two-year plan. The plan has two parts: the first is building the ontology and keeping up to date with state-of-the-art semantic technology, and the second is looking at new business problems to solve.

On the ontology side, with every POD we add more data and use more of the standards that the community has developed. That keeps us on track with what the rest of the industry is doing – we don’t want to be behind the industry, but we also don’t want to be too far ahead.

On the business problem side, we are looking for areas where we can prove that this technology can help us. For example, during the Haiti earthquake, the DoD needed to find any service members that could speak Haitian Creole or Haitian French, and that could be deployed in 24 hours and had 12 months or more service time left. In one of the PODs, we showed that we could answer the question much more quickly than had been the case using the traditional methods.

When will the project go live?

So far, all of our PODs have not been using live data. But from March 30th 2012, the tools that certain senior personnel at the Pentagon use to answer questions – for example when they have to give information to Congress – will be querying the semantic information. But I don’t like to think of this as a ‘switchover’ to a new system – the DIMHRS project was about trying to do an instant switchover, and it didn’t work. This is about gradually getting familiar with this technology at a senior level, and a long-term conversion from one way of thinking to another.

Benefits

What do you anticipate to be the financial benefits?

Well, we know how much we spent doing this the old-fashioned way, and we know our percentage of success, which wasn’t very good, so when new data sets are created using these RDF triples, the cost of building and accessing these large data stores just has to go down. This is based on industry standards, so we can recruit people from outside the organisation to work on this.

Projects like [US open government portal] data. and [tax transparency site] are all beginning to use the same technology. We’re in talks all the time with people on the other side of the river here in Washington about how we can make sure that we’re moving in the same direction, and that our data can be linked when it needs to be.

What about the organisational benefits?

There is a concept in semantic technology called ‘provenance’. This is about building trust in the data that you have. Personally, I think this is what will have the biggest impact. We’re going to learn how to trust the answers that we get from data a whole lot more than we do today. Ultimately, I think we will be able to have unequivocal trust in the data that we convert into information (for a given point in time – all data is temporal). That will lead to faster decisions we need to make now, and more certainty in decisions we have to make for the future.

Source: Pete, “Q&A: US Dept of Defense”, www.information-age., December 12, 2011.