Web 2.0 Blog – Discovering Innovation Opportunities using Social Media

Posts Tagged ‘linked data’

Opportunity: Spending of government money should have a purpose, and that purpose should be to benefit someone, whether directly or indirectly. The benefit might be for an employee to work better, and that employee might in turn be working to benefit a group of citizens. The administration wishes to create a more transparent, effective and innovative government as well as to reduce the federal deficit. To do this, the administration must identify opportunities for innovation which can increase efficiency and decrease spending, and it must make the case to the American people that it is making more effective use of taxpayer funds. I want to make the case here that linking spending data to the benefits of that spending, in ways which are detailed, clear and relevant to large numbers of citizens, is the best way to find innovations that create a more effective government and to give transparency meaning and value for the average citizen.


  • Linking Spending to Benefits: Federal spending is reported in ways which do not clearly connect it to the benefits that specific expenditures provide. While certain dollar amounts may be reported as going toward ‘Defense’, that is not specific enough to judge whether a given expenditure is justifiable, and it doesn’t allow an expenditure or group of expenditures to be compared to alternative solutions for the same specific benefit-oriented goal. Therefore we must find ways to better connect specific spending to specific benefits.
  • Benefits of expenditures must be clear and relevant: Benefits must be stated in ways which are relevant and understandable for a large number of citizens. For example, a system which tracks resources in a government program is not relevant until it is connected to the benefits that program provides and to whom it provides those benefits. Often expenditures are reported as supporting a program, system or piece of equipment but are not clearly connected to an intermediary benefit it attempts to provide to a person, or to the outcome of the program or equipment and its end beneficiaries. What is relevant to the average citizen is not how systems support systems or programs support programs but how overall efforts affect people, in what way they affect those people, who those people are, and what it costs to provide that benefit. Take the case of a self-help kiosk at a federal office. The relevant benefit is not how it supports the agency’s program but how many citizens it serves, how frequently it serves them, how well it serves them and at what cost per citizen.
  • Providing Spending-to-Benefit visibility to a large audience will spur innovation. Making the links between spending and outcome visible to a large audience is a critical step in identifying opportunities for innovation that increase government’s effectiveness. Innovation comes from diverse people considering things in different ways (remember KIDFAD from The Wisdom of Crowds), so making connections between spending and benefits broadly relevant and visible will provide the greatest opportunity for innovation in creating more effective means to achieve similar benefits. Innovation also comes from novel approaches to overall goals, so providing information on the overall cost of an end benefit delivered to people provides the greatest opportunity to find other ways to provide that benefit. If, for instance, you simply focused on the cost of gas for a truck to travel 1000 miles, rather than the benefit of transporting chairs on that truck, you might miss the opportunity to send them by train. And if you focused on the goal of having chairs at a location, you might notice that it could be cheaper to purchase them at the destination than to pack them up and transport them back and forth.
  • Meaningful Transparency. Connecting benefits to spending in ways which the average citizen can understand and find relevant is required if government transparency is to have meaning for that citizen.

Approach: Identify, Find and Link Disparate Data Sources which can clarify the benefits of Government Expenditures

Datasets must be found which can connect government spending to both outcomes and benefits to people. For instance, compete.com provides data on how many visitors a website receives. Connecting the cost of a government website to the number of visitors it receives per year can give a cost per citizen served. Therefore getting the free data provided by Compete.com and linking it to the cost of a government website will provide more transparency and a clearer cost of the benefit provided. This can then be compared to other ways of providing that same benefit of information delivery.

Another example is connecting the expense of providing office furniture to the known number of employees in an agency. This makes clear the cost of providing office support per employee, which could then be compared to private sector data.
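
To make the arithmetic behind these two examples concrete, here is a minimal sketch in Python. The `Expenditure` class, the `cost_per_beneficiary` function and every figure below are my own made-up illustrations, not real agency or Compete.com data.

```python
from dataclasses import dataclass


@dataclass
class Expenditure:
    """One spending item linked to the people it benefits (all values hypothetical)."""
    name: str
    annual_cost: float   # dollars per year
    beneficiaries: int   # people served per year


def cost_per_beneficiary(item: Expenditure) -> float:
    """Return the annual cost divided by the number of people served."""
    if item.beneficiaries <= 0:
        raise ValueError(f"No beneficiary count linked to {item.name!r}")
    return item.annual_cost / item.beneficiaries


# Made-up figures, for illustration only.
website = Expenditure("agency website", annual_cost=250_000.0, beneficiaries=500_000)
furniture = Expenditure("office furniture", annual_cost=120_000.0, beneficiaries=400)

for item in (website, furniture):
    print(f"{item.name}: ${cost_per_beneficiary(item):,.2f} per person served")
```

The point of the error check is that a spending item with no linked beneficiary count cannot be expressed as a cost per person at all, which is exactly the gap the linking is meant to close.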

Connecting government expenditures to their benefits and making clear the cost per beneficiary in relevant ways can become a starting point for encouraging innovation to make a more effective government, as well as for giving the idea of government transparency meaning and value to the largest number of people.

Case for Using the Resource Description Framework Or Linked Data Model:

While linking data can be done in many different ways,  I do want to give a plug for the linked data model in this instance, because in the long term, I believe it is the best way to connect government spending with the benefits of that spending. 

Of course connecting spending to benefits is not always as simple as the examples I gave, nor is the data easy to find and easy to connect. In fact you may need to link multiple datasets in a chain to get the benefit information in a way which is relevant and broadly understandable. The Resource Description Framework (RDF), or Linked Data model, gives us a way to start collecting this kind of data in a distributed fashion without strict central control, and it does not even require the data to be on the same server or system in order to be linkable. This makes RDF or Linked Data an ideal candidate to complete the long-term vision of linking complex federal spending data with its outcomes and benefits in a way which can have meaning for the average American citizen.
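
As a rough illustration of linking datasets in a chain, here is a sketch in plain Python that mimics RDF-style (subject, predicate, object) triples with ordinary tuples. It does not use a real RDF library, and every URI, predicate name and number is hypothetical.

```python
# Each triple is (subject, predicate, object). The three lists stand in for
# datasets published separately by different parties. All URIs are made up.
spending_data = [
    ("http://example.gov/spend/123", "funds", "http://example.gov/program/kiosks"),
    ("http://example.gov/spend/123", "amount_usd", 90000),
]
program_data = [
    ("http://example.gov/program/kiosks", "operates", "http://example.gov/kiosk/7"),
]
usage_data = [
    ("http://example.gov/kiosk/7", "citizens_served_per_year", 30000),
]


def objects(triples, subject, predicate):
    """Return every object linked to the subject by the predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def follow(triples, start, predicates):
    """Follow a chain of predicates from start and return the values reached."""
    frontier = [start]
    for predicate in predicates:
        frontier = [o for node in frontier for o in objects(triples, node, predicate)]
    return frontier


# Shared URIs are what let the separate datasets be combined.
all_triples = spending_data + program_data + usage_data

spend = "http://example.gov/spend/123"
amount = objects(all_triples, spend, "amount_usd")[0]
served = sum(follow(all_triples, spend, ["funds", "operates", "citizens_served_per_year"]))
print(f"${amount / served:.2f} per citizen served")
```

None of the three lists needs to know about the others. The chain from spending to citizens served only works because each dataset reuses the same identifiers.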


Tim Berners-Lee’s concept of linked data is clearly a way to make data more usable, whether this is public data or data within a large enterprise. Linked data promises a future in which related data is more interoperable and discoverable, and it opens the door for innovation.

But how do we take large existing data stores and apply linked data principles to achieve these benefits? We currently have massive existing data stores with complex security regimes which many legacy applications depend upon. Making them available as Linked Data is a huge challenge, especially if we were to recreate these data stores in XML syntax using RDF/RDFa or even simpler XML schemas. This is coupled with the fact that many of the benefits of the reconstituted data have not yet been invented, so an ROI argument cannot clearly be made. Of course, they haven’t been invented yet because, while many can agree the data would be more usable, those uses must be discovered by fiddling with the data in linked form and seeing what emerges. Since the linked form doesn’t yet exist, we have the classic chicken-and-egg problem.

Perhaps there is a step we can take toward linked data without making large changes to the existing data stores in government and industry. Let’s review the principles of Linked Data first (as paraphrased from Wikipedia to add clarity):

  • Use URIs (Uniform Resource Identifiers) to identify things that you expose to the Web as resources.
  • Use HTTP URIs so that people can locate and look up (dereference) these things.
  • Provide useful information about the resource when its URI is dereferenced.
  • Include links to other, related URIs in the exposed data as a means of improving information discovery on the Web.

The striking thing about these principles is that they don’t mention XML or RDFa etc. but focus instead on linking data to definitions. So it would seem a hybrid solution between the linked data concept and existing databases is possible. We could add URIs as fields in existing databases for important elements and define a central location where we will track information about each element. For instance, in the US government there are lots of federal buildings used by multiple agencies, so I would assume many agencies have databases which refer to federal buildings. Why not establish a central location to define those buildings and assign each a URI? (A URI, by the way, is essentially a universal identifier for a real-world object. In practice it is a web page for each building, but the page would more likely contain data links than nice pictures. Oh, and some people refer to URIs as URNs, or Uniform Resource Names, in an effort to make them more human readable, which is nice too.)

So each federal building would have a URI/URN, and we could of course put more information about each building in a centrally defined schema, but that would start to be real work and have instant security issues. So why not initially just have URIs contain reciprocal links to databases which also contain that identifier? The links would carry brief descriptions, free of anything security-sensitive, of what type of data is stored in the linked database. This would remove the need to re-secure a lot of information to make it available across departments and agencies. And here is the other key to success for this type of solution: don’t require the back links to the databases to expose data unless they already do so. If we start requiring data to be exposed in this step, it opens up the security Pandora’s box. We need to avoid imposing a new security regime for centralized data, because it is a stumbling block which would create delays and costs. And if people do not clearly see the benefits of this step, then it would simply die in committee in most cases.
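
Here is a minimal sketch of what such a central location might record, written as an in-memory Python structure. The class names (`UriRegistry`, `ResourceEntry`, `BackLink`), the building URI and the database names are all hypothetical. A real system would need persistence and access control, which I am leaving out.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BackLink:
    """Pointer from a URI to a database that stores the same identifier."""
    database: str      # hypothetical database name
    description: str   # brief, non-sensitive note on the kind of data held


@dataclass
class ResourceEntry:
    """What the central location records about one URI: a label and back links only."""
    uri: str
    label: str
    back_links: List[BackLink] = field(default_factory=list)


class UriRegistry:
    """In-memory stand-in for the central location that defines URIs."""

    def __init__(self) -> None:
        self._entries: Dict[str, ResourceEntry] = {}

    def define(self, uri: str, label: str) -> ResourceEntry:
        entry = ResourceEntry(uri=uri, label=label)
        self._entries[uri] = entry
        return entry

    def add_back_link(self, uri: str, database: str, description: str) -> None:
        # Only a description is stored; the registry never copies the data itself.
        self._entries[uri].back_links.append(BackLink(database, description))

    def lookup(self, uri: str) -> ResourceEntry:
        return self._entries[uri]


registry = UriRegistry()
building = "http://example.gov/id/building/0042"  # made-up URI
registry.define(building, "Federal Building 0042")
registry.add_back_link(building, "facilities_db", "maintenance schedules")
registry.add_back_link(building, "hr_db", "employee office assignments")

for link in registry.lookup(building).back_links:
    print(f"{link.database}: {link.description}")
```

Notice that nothing in the entry exposes the underlying records. A reader learns only that two databases hold data about the building and what kind of data it is.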

So that is fine, you say. We have URIs for important data elements and for databases which contain those elements, but none of this exposes data, so where is the benefit? I think this stripped-down version of linked data would have 4 definite benefits:

  • Reference.  The URIs could serve as reference documents to find where similar information is stored. Users could then apply for security permissions on an as needed basis when they need to link to other databases.
  • Innovation. Users, who would now have a more complete map of available data, could begin to suggest more uses for linking the data.
  • Discoverability. Search engines (internal or external depending on the security decided upon for the URNs) could make existing databases more discoverable because the engines could discover important data elements in the databases. Search engines use links to determine relevance to searches and are often key to researching problems.
  • Interoperability.  The process of assigning URIs will begin to expose problems in data interoperability due to different definitions in different databases. The URI map would serve as a survey of issues in creating truly interoperable data.
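
To illustrate the interoperability point, here is a small Python sketch of how a URI map could double as a survey of definition mismatches. The `declarations` list and the `find_definition_conflicts` function are hypothetical. In practice each definition would come from the brief description a database owner registers against a URI.

```python
from collections import defaultdict

# (uri, database, how that database defines the element). All values are made up.
declarations = [
    ("http://example.gov/id/building/0042", "facilities_db", "street address of main entrance"),
    ("http://example.gov/id/building/0042", "security_db", "street address of main entrance"),
    ("http://example.gov/id/building/0107", "facilities_db", "whole campus"),
    ("http://example.gov/id/building/0107", "leasing_db", "single leased floor"),
]


def find_definition_conflicts(declarations):
    """Return the URIs whose linked databases disagree on what the URI means."""
    by_uri = defaultdict(dict)
    for uri, database, definition in declarations:
        by_uri[uri][database] = definition
    return {
        uri: definitions
        for uri, definitions in by_uri.items()
        if len(set(definitions.values())) > 1
    }


conflicts = find_definition_conflicts(declarations)
for uri, definitions in conflicts.items():
    print(uri)
    for database, definition in definitions.items():
        print(f"  {database}: {definition}")
```

Comparing definitions as exact strings is crude, and real descriptions would need human review. Still, even a rough report like this shows where two databases use the same identifier to mean different things.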

So now the readers of this blog are in at least 2 camps.

  • Those who feel this is a half measure and would be a distraction from advocating for more completely linked data.
  • Those who are still not clear on the benefits of bothering to start the process of linking data at all.

I am hoping there is a third camp which sees this as a doable step in large enterprises such as the US government, and as the first step toward data which is more linked and therefore more usable for both public and internal uses, and eventually interoperable.

Let me know which camp you are in!

This post is in beta. I am looking for help in better understanding the connection between policy and effort so we can discuss it at the upcoming Gov 2.0 camp.  I am not an expert by any means in this area, but am struggling to understand the problem from a data perspective.  The semantic web initiatives and in general the goal of a collaborative government drove me to seek this understanding of how policy is connected to effort.

One of the 3 things which the NAPA paper “Enabling Collaboration: Three Priorities for the New Administration” identifies as a barrier to a more collaborative government is ‘An inability to relate to information, and information to decision making.’ This hints at a critical problem in creating new initiatives: not having enough information to plan a path to implementation. I believe the solution is to map the connections between policy, responsibility, effort and procedure as critical pieces of data to inform decision making. This has the potential to speed progress in creating a more agile, innovative and collaborative government, just as mapping the genome has sped progress in genetics.

Specifically, I see missing connections between the policy, responsibility, procedure and effort required to create new initiatives. Let’s call it PREP (Policy-Responsibility-Effort-Procedure) data since everyone loves an acronym. So PREP is essentially a line connecting 4 points from policy to the person trying to create a new initiative: from a policy at a high level, to offices which have responsibility to ensure the policy is followed, to procedures created by those offices, and to the effort to follow those procedures. (I am sure in reality it’s more complicated than that, but let’s keep it simple for argument’s sake.) Of course each initiative has multiple policies it must comply with, so there are multiple lines between the effort and the policies. The procedures are often interdependent, yet created independently by separate offices, often in isolation from what other offices do. In the end you have a thick mesh to get through, one that needs to be rediscovered for each new initiative. I have come to the conclusion that mapping PREP data is critical to creating a more collaborative, agile and innovative government.

The problem starts with policy being handled as a 19th century invention, that is as an isolated document.   Then the document is passed to various departments with responsibility to make sure the policy is followed.  These departments create procedures to ensure the policy is followed. When someone wants to create a new initiative or project, they need to determine all procedures from all policies involved and then put in an effort to follow  these often disjointed procedures which often have hidden interdependencies.  This seems to be a primary cause of what we commonly call the ‘bureaucracy’.

Many well-intended policies come together to produce unintended, entangled procedures which form a barrier to quickly creating new initiatives. Essentially this is an emergent property of the many policies which have been implemented over the years, as well as the many offices created to follow those policies. The result is fewer or slower new initiatives, leading to less innovation and collaboration (since almost by definition collaborative efforts will involve new initiatives). A compounding problem is that new technology is causing procedures to be reconsidered and policies reinterpreted, which adds to the complexity.

There is data on policy-to-effort connections, but it does not seem to be centrally accessible or uniformly stored. And interpretations differ widely between individuals at every node in the PREP data and can change with every decision, which compounds the problem of understanding what is really happening.

New policies to instigate new initiatives are now being issued and fast results are expected, but because of this unseen mesh which holds up execution, the top levels are frustrated with the work not getting done.   Meanwhile the people in agencies feel that they are too constrained to get the new initiative started.  Since the mesh is invisible, solutions to change the system become confusing and difficult to follow because they normally add to the mesh rather than disentangle and streamline it.

Solution: Map the PREP (Policy-Responsibility-Effort-Procedure) data and use this map to create guidance on streamlining implementation of policies as well as to identify duplicate or unnecessary procedures. The data should include the ongoing or one-time man hours involved in the effort to follow a procedure, the average calendar delay caused by a procedure, and any interdependency with other procedures.
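
As a rough picture of what a PREP record might hold, here is a Python sketch with the fields listed above plus two simple calculations over the map. The `Procedure` class, the sample procedures and all the numbers are invented for illustration, and the delay calculation assumes the interdependencies contain no cycles.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set


@dataclass
class Procedure:
    """One PREP line: policy -> responsible office -> procedure -> effort."""
    name: str
    policy: str
    office: str
    effort_hours: float   # man hours needed to follow the procedure
    delay_days: float     # average calendar delay the procedure causes
    depends_on: List[str] = field(default_factory=list)


# Made-up sample data.
procedures: Dict[str, Procedure] = {p.name: p for p in [
    Procedure("privacy review", "Privacy Policy", "Privacy Office", 40, 30),
    Procedure("security scan", "Security Policy", "CIO Office", 16, 10,
              depends_on=["privacy review"]),
    Procedure("web launch approval", "Web Policy", "Public Affairs", 8, 5,
              depends_on=["security scan"]),
    Procedure("records schedule", "Records Policy", "Records Office", 12, 20),
]}


def required_procedures(start: Iterable[str]) -> Set[str]:
    """Return the starting procedures plus everything they depend on, directly or not."""
    seen: Set[str] = set()
    stack = list(start)
    while stack:
        name = stack.pop()
        if name not in seen:
            seen.add(name)
            stack.extend(procedures[name].depends_on)
    return seen


def chain_delay(name: str) -> float:
    """Calendar delay of a procedure plus its slowest prerequisite chain (no cycles assumed)."""
    proc = procedures[name]
    return proc.delay_days + max((chain_delay(d) for d in proc.depends_on), default=0)


needed = required_procedures(["web launch approval"])
total_hours = sum(procedures[n].effort_hours for n in needed)
print(sorted(needed), total_hours, chain_delay("web launch approval"))
```

Even this toy map surfaces the hidden mesh. A single approval drags in two other offices and most of the calendar delay comes from a procedure the initiative owner may never have heard of.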

How? Initially just collect the data in a standardized and centrally accessible format.   It will be almost immediately useful. Use collaborative techniques to collect a lot of data quickly even if it means lower quality data initially. Then gradually move the data to a Semantic/RDF storage system where it can be queried in many different ways and linked to the broader set of definitions such as law, case history etc.

This will be the start of making a more agile, collaborative and data centric government.

The Challenge to this approach: Data management aside, which is not too bad initially, a lot of these hidden paths are not 100% OK with 100% of interpretations of policies. So how do we create a collaborative environment without people worrying that the interpretations which allow them to get things done will be ruled incorrect? It seems this needs to be a research project whose findings cannot be used for that purpose.