Web 2.0 Blog – Discovering Innovation Opportunities using Social Media

Archive for May 2009

Tim Berners-Lee’s concept of linked data is clearly a way to make data more usable, whether it is public data or data within a large enterprise. Linked data promises a future in which related data is more interoperable and discoverable, and it opens the door for innovation.

But how do we take large existing data stores and apply linked data principles to achieve these benefits? We currently have massive data stores with complex security regimes, and many legacy applications depend on them. Making them available as Linked Data is a huge challenge, especially if we were to recreate these data stores in RDF/RDFa or even in simpler XML schemas. This is coupled with the fact that many of the benefits of the reconstituted data have not yet been invented, so an ROI argument cannot clearly be made. Of course, they haven’t been invented yet because, while many can agree the data would be more usable, those uses must be discovered by fiddling with the data in linked form and seeing what emerges. Since the linked form doesn’t yet exist, we have the classic chicken-and-egg problem.

Perhaps there is a step we can take toward linked data without making large changes to the existing data stores in government and industry. Let’s review the principles of Linked Data first (paraphrased from Wikipedia for clarity):

  • Use URIs (Uniform Resource Identifiers) to identify things that you expose to the Web as resources.
  • Use HTTP URIs so that people can locate and look up (dereference) these things.
  • Provide useful information about the resource when its URI is dereferenced.
  • Include links to other, related URIs in the exposed data as a means of improving information discovery on the Web.

The striking thing about these principles is that they don’t mention XML, RDFa, etc., but focus instead on linking data to definitions. So it would seem a hybrid solution between the linked data concept and existing databases is possible. We could add URIs as fields in existing databases for important elements and define a central location where we will track information about each element. For instance, in the US government there are lots of federal buildings used by multiple agencies, so I would assume many agencies have databases which refer to federal buildings. Why not establish a central location to define those buildings and assign each a URI? (A URI, by the way, is essentially a universal identifier for a real-world object. In practice it is a web page for each building, but the page would more likely contain data links than nice pictures. Some people also talk about URNs, or Uniform Resource Names, a kind of URI that names a thing rather than giving its location.)

So each federal building would have a URI/URN, and we could of course put more information about each building in a centrally defined schema, but that will start to be real work and have instant security issues. So why not initially just have the URIs contain reciprocal links to the databases which also contain that identifier? The links would have brief, non-security-breaking descriptions of what type of data is stored in the linked database. This would remove the need to re-secure a lot of information to make it available across departments and agencies. And here is the other key to success for this type of solution: don’t require the back links to the databases to expose data unless they already do so. If we start requiring data to be exposed in this step, it opens up the security Pandora’s box. We need to avoid imposing a new security regime for centralized data, because it is a stumbling block which would create delays and costs. And if people do not clearly see the benefits of this step, it would simply die in committee in most cases.
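To make the idea concrete, here is a minimal sketch of what one central registry entry might hold. The class names, field names, URIs and agencies are all hypothetical, invented only for illustration. The point is that the entry stores pointers and short descriptions, never the underlying data.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BackLink:
    """Pointer to a database that stores records keyed by the resource URI."""
    database_uri: str  # where the database itself is described (hypothetical)
    owner: str         # agency that controls access to the database
    description: str   # brief, non-sensitive note on the kind of data held


@dataclass
class ResourceEntry:
    """Central registry entry for one real-world object, such as a building."""
    uri: str
    label: str
    back_links: List[BackLink] = field(default_factory=list)

    def register(self, link: BackLink) -> None:
        """Add a back link unless that database is already listed."""
        if all(known.database_uri != link.database_uri for known in self.back_links):
            self.back_links.append(link)

    def owners(self) -> List[str]:
        """Agencies a user could approach for access, in alphabetical order."""
        return sorted({link.owner for link in self.back_links})


# Hypothetical example: one building referenced by two agency databases.
building = ResourceEntry(
    uri="https://data.example.gov/id/building/0001",
    label="Example Federal Building",
)
building.register(BackLink(
    database_uri="https://data.example.gov/db/leases",
    owner="Agency A",
    description="Lease and occupancy records",
))
building.register(BackLink(
    database_uri="https://data.example.gov/db/maintenance",
    owner="Agency B",
    description="Maintenance work orders",
))
```

Nothing in the entry requires either database to expose a single record. A user who finds the building learns only which databases mention it and whom to ask for access.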

So that is fine, you say. We have URIs for important data elements and for the databases which contain those elements, but none of this exposes data, so where is the benefit? I think this stripped-down version of linked data would have four definite benefits:

  • Reference.  The URIs could serve as reference documents to find where similar information is stored. Users could then apply for security permissions on an as-needed basis when they need to link to other databases.
  • Innovation.  Users, who would now have a more complete map of available data, could begin to suggest more uses for linking the data.
  • Discoverability.  Search engines (internal or external, depending on the security decided upon for the URNs) could make existing databases more discoverable because the engines could find important data elements in the databases. Search engines use links to determine relevance to searches and are often key to researching problems.
  • Interoperability.  The process of assigning URIs will begin to expose problems in data interoperability due to different definitions in different databases. The URI map would serve as a survey of issues in creating truly interoperable data.

So now the readers of this blog are in at least 2 camps.

  • Those who feel this is a half measure and would be a distraction from advocating for more completely linked data.
  • Those who are still not clear on the benefits of bothering to start the process of linking data at all.

I am hoping there is a third camp which sees this as a doable step in large enterprises such as the US government, and as the first step toward data which is more linked and therefore more usable for both public and internal uses, and eventually interoperable.

Let me know which camp you are in!


The future of the internet will involve more authentication than it does today, but here is a potential interim solution to provide some level of authentication for a Gov 2.0 presence on online social networks such as Facebook and Twitter. A standard policy of having a reciprocal link back to the Facebook fan page or Twitter account on the .Gov/.Mil website which the social network page points to could be a simple interim solution. I call it Reciprocal Link Authentication.

Government 2.0 includes a government presence on non-government websites such as online social networks (OSNs) (think Facebook fan pages and Twitter accounts) so that citizens can encounter government guidance and assistance where they ‘live’ in cyberspace. But how can citizens be certain that the government account or representative is authentic? If you run into someone in the street and they say they are working for the government, how do you know for certain? They provide you with a badge or ID right at the beginning of the conversation.

If we encounter government workers as official government representatives in non-government cyberspace, should we also be able to see some sort of identification? Since a cyberidentity is in many cases easier to assume than an alias in real life (especially on social networks), shouldn’t there be a way to verify the authenticity of someone claiming to represent a government? Government presences on OSNs, such as agency fan pages on Facebook or informational Twitter accounts, will often display an official seal or emblem. The problem is that copying or creating an image of a seal or official-looking emblem and putting it on an anonymous OSN account is trivial and relatively low-risk compared to duplicating a paper credential which someone might show you in person.

The commercial solution for authentication won’t work on social network pages. Here’s why.

Commercial websites sometimes provide SSL-encrypted links to independent authentication websites (VeriSign and GoDaddy, among others) to prove their authenticity. The problem with the government using this method is that it would add paperwork and costs to implement SSL badges, or require changes to the profile options of existing online social networks. Also, I don’t think there are products yet which work with both OSNs and the authenticators to verify anyone on social networks. Perhaps more importantly, the government would then be depending on a commercial company to prove its authenticity. Basically, it’s a non-starter if you want to actually achieve a Government 2.0 presence online in the near future, for reasons ranging from practicality to policy to politics to costs.

But wait, there may be a much easier and better way. .Gov and .Mil websites are already monitored and checked for authenticity, unlike .com and .org sites. So you don’t need an independent cyber authenticator such as VeriSign, because any .Gov or .Mil site can serve as that authenticator.

Reciprocal Link Authentication.

Why not have a simple policy that any online social network account or other non-.Gov/.Mil online presence must link to a .Gov/.Mil webpage which then links back to that same OSN account? So if someone wanted to verify a government Twitter account, they could simply click on the URL provided and easily find a link back to that same Twitter account on the .Gov/.Mil webpage they landed on. If the account is hijacked, a notice of the problem could be put up on the .Gov/.Mil page until the account identity is secured again. If this is done on all federal OSN accounts, the cybercommunity will quickly become accustomed to the authentication method, and if a hijacker removes the authentication link, visitors will know to dismiss the account. And if they see something which sounds a bit off, they can instantly verify it by following the link and checking for the link back to the OSN account. It would not mean much work, since online government representatives at non-.Gov/.Mil sites almost always have some .Gov/.Mil web space under their control.
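The check a visitor performs by clicking through could also be automated. Below is a rough sketch, assuming the HTML of both pages has already been fetched. The function names, URLs and sample markup are hypothetical, and the URL comparison is deliberately simple (host without "www.", path without a trailing slash), so treat it as an illustration of the idea rather than a hardened verifier.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


def normalize(url):
    """Reduce a URL to (host without 'www.', path without trailing slash)."""
    parts = urlparse(url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return host, parts.path.rstrip("/")


def is_official_host(url):
    """True if the URL's host sits under .gov or .mil."""
    host = (urlparse(url).hostname or "").lower()
    return host.endswith(".gov") or host.endswith(".mil")


def reciprocally_linked(account_url, account_html, official_url, official_html):
    """True only if the account links to the official page and it links back."""
    if not is_official_host(official_url):
        return False
    account_links = {normalize(u) for u in extract_links(account_html)}
    official_links = {normalize(u) for u in extract_links(official_html)}
    forward = normalize(official_url) in account_links
    back = normalize(account_url) in official_links
    return forward and back


# Hypothetical pages, inlined so the sketch runs without network access.
account_url = "https://social.example.com/ExampleAgency"
official_url = "https://www.agency.example.gov/social-media"
account_html = '<p>Official account. <a href="https://www.agency.example.gov/social-media/">Verify us</a></p>'
official_html = '<ul><li><a href="https://social.example.com/ExampleAgency">Follow us</a></li></ul>'

verified = reciprocally_linked(account_url, account_html, official_url, official_html)
```

Both directions must hold. An impostor can easily add a link to a real .Gov page on a fake account, but cannot make the .Gov page link back.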

Reciprocal Link Authentication seems easy and low cost, and it instantly provides a universal method to authenticate any online government representation without much effort. Sure, it’s not perfect from a cybersecurity point of view, but it goes a long way toward addressing several important concerns about government representation on non-government websites.