Web 2.0 Blog – Discovering Innovation Opportunities using Social Media

Archive for the ‘Cloud Computing’ Category

Tim Berners-Lee's concept of linked data is clearly a way to make data more usable, whether it is public data or data within a large enterprise. Linked data promises a future in which related data is more interoperable and discoverable, and it opens the door to innovation.

But how do we take large existing data stores and apply linked data principles to achieve these benefits? We currently have massive data stores with complex security regimes, and many legacy applications depend on them. Making them available as Linked Data is a huge challenge, especially if we had to recreate these data stores in XML syntax using RDF/RDFa or even simpler XML schemas. On top of that, many of the benefits of the reconstituted data have not yet been invented, so a clear ROI argument cannot be made. Of course, they haven't been invented yet because, while many can agree the data would be more usable, those uses must be discovered by fiddling with the data in linked form and seeing what emerges. Since the linked form doesn't yet exist, we have the classic chicken-and-egg problem.

Perhaps there is a step we can take toward linked data without making large changes to the existing data stores in government and industry. Let's review the principles of Linked Data first (as paraphrased from Wikipedia to add clarity):

  • Use URIs (Uniform Resource Identifiers) to identify things that you expose to the Web as resources.
  • Use HTTP URIs so that people can locate and look up (dereference) these things.
  • Provide useful information about the resource when its URI is dereferenced.
  • Include links to other, related URIs in the exposed data as a means of improving information discovery on the Web.
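To make the four principles concrete, here is a minimal sketch in Python. It is only an illustration: the `Resource` class, the in-memory `WEB` dictionary (standing in for real HTTP lookups) and the `data.example.gov` URIs are all made up for this example and are not part of any standard or library.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Resource:
    uri: str                 # principles 1 and 2: an HTTP URI identifies the thing
    description: str         # principle 3: useful information returned on lookup
    links: List[str] = field(default_factory=list)  # principle 4: related URIs


# Stand-in for the Web: a real system would answer HTTP GET requests instead.
WEB: Dict[str, Resource] = {}


def publish(resource: Resource) -> None:
    """Expose a resource, insisting on an HTTP(S) URI so it can be looked up."""
    if not resource.uri.startswith(("http://", "https://")):
        raise ValueError("a linked data resource needs an HTTP URI")
    WEB[resource.uri] = resource


def dereference(uri: str) -> Resource:
    """Look up a URI and return what is known about the thing it names."""
    return WEB[uri]


def related(uri: str) -> List[Resource]:
    """Follow the links of a resource to discover related resources."""
    return [WEB[link] for link in dereference(uri).links if link in WEB]


# Hypothetical example data.
publish(Resource("https://data.example.gov/agency/a", "An agency that uses buildings"))
publish(Resource(
    "https://data.example.gov/building/1001",
    "A federal office building",
    links=["https://data.example.gov/agency/a"],
))
```

Notice that nothing here depends on XML or RDF. The principles are about identifiers, lookups and links, and the format of what comes back is a separate choice.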

The striking thing about these principles is that they don't mention XML, RDFa, etc., but focus instead on linking data to definitions. So it would seem a hybrid solution between the linked data concept and existing databases is possible. We could add URIs as fields in existing databases for important elements and define a central location where we track information about each element. For instance, in the US government there are lots of federal buildings used by multiple agencies, so I would assume many agencies have databases which refer to federal buildings. Why not establish a central location to define those buildings and assign each a URI? (A URI, by the way, is essentially a universal identifier for a real-world object. Think of it as a web page for each building, but the page would more likely contain data links than nice pictures. Some people also talk about URNs, or Uniform Resource Names, which are URIs that name a thing persistently without saying where it lives, which is nice too.)

So each federal building would have a URI/URN. We could of course put more information about each building in a centrally defined schema, but that starts to be real work and raises instant security issues. So why not initially just have the URIs contain reciprocal links to the databases which also contain that identifier? The links would carry brief, non-security-breaking descriptions of what type of data is stored in the database they point to. This would remove the need to re-secure a lot of information just to make it available across departments and agencies. And here is the other key to success for this type of solution: don't require the back links to the databases to expose data unless they already do so. If we start requiring data to be exposed in this step, it opens up the security Pandora's box. We need to avoid imposing a new security regime for centralized data, because it is a stumbling block which would create delays and costs. And if people do not clearly see the benefits of this step, it would simply die in committee in most cases.
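A rough sketch of what such a central registry might look like, in Python. Everything here is hypothetical (the `Registry` class, the field names, the `data.example.gov` URI and the database names); the point is only that an entry holds descriptions of where data lives, never the data itself.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BackLink:
    database: str   # which database uses this URI as a field
    agency: str     # who to ask for access
    holds: str      # brief, non-sensitive description of the kind of data held


@dataclass
class BuildingEntry:
    uri: str
    name: str
    back_links: List[BackLink] = field(default_factory=list)


class Registry:
    """Central place that defines buildings and records who refers to them."""

    def __init__(self) -> None:
        self._entries: Dict[str, BuildingEntry] = {}

    def define(self, uri: str, name: str) -> BuildingEntry:
        entry = BuildingEntry(uri, name)
        self._entries[uri] = entry
        return entry

    def register_database(self, uri: str, database: str, agency: str, holds: str) -> None:
        # Only a description is stored; no records are copied or exposed.
        self._entries[uri].back_links.append(BackLink(database, agency, holds))

    def where_is(self, uri: str) -> List[BackLink]:
        """Answer: which databases hold information about this building?"""
        return list(self._entries[uri].back_links)


# Hypothetical usage.
registry = Registry()
building = "https://data.example.gov/building/1001"
registry.define(building, "Example Federal Building")
registry.register_database(building, "leases_db", "Agency A", "lease terms")
registry.register_database(building, "maintenance_db", "Agency B", "repair history")
```

Someone who finds `maintenance_db` through `where_is` still has to apply to Agency B for access under the rules that already exist, which is exactly why no new security regime is needed.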

So that is fine, you say. We have URIs for important data elements and for the databases which contain those elements, but it is not exposing data, so where is the benefit? I think this stripped-down version of linked data would have four definite benefits:

  • Reference.  The URIs could serve as reference documents for finding where similar information is stored. Users could then apply for security permissions on an as-needed basis when they need to link to other databases.
  • Innovation.  Users, who would now have a more complete map of available data, could begin to suggest more uses for linking the data.
  • Discoverability.  Search engines (internal or external, depending on the security decided upon for the URNs) could make existing databases more discoverable, because the engines could find the important data elements in the databases. Search engines use links to determine relevance to searches and are often key to researching problems.
  • Interoperability.  The process of assigning URIs will begin to expose problems in data interoperability caused by different definitions in different databases. The URI map would serve as a survey of the issues in creating truly interoperable data.
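The interoperability survey in the last bullet could start out as something very small. The sketch below assumes each database owner reports, for each URI it uses, a short statement of how it defines that element; the function name, the report format and the sample definitions are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def survey_definitions(reports: Iterable[Tuple[str, str, str]]) -> Dict[str, Dict[str, str]]:
    """Group (uri, database, definition) reports and keep URIs whose definitions differ."""
    by_uri: Dict[str, Dict[str, str]] = defaultdict(dict)
    for uri, database, definition in reports:
        # Normalize lightly so that case and stray spaces do not count as conflicts.
        by_uri[uri][database] = definition.strip().lower()
    return {
        uri: definitions
        for uri, definitions in by_uri.items()
        if len(set(definitions.values())) > 1
    }


# Hypothetical reports from three databases.
reports = [
    ("https://data.example.gov/building/1001", "leases_db", "One street address"),
    ("https://data.example.gov/building/1001", "maintenance_db", "One physical structure"),
    ("https://data.example.gov/building/1002", "leases_db", "One street address"),
    ("https://data.example.gov/building/1002", "security_db", "one street address "),
]
conflicts = survey_definitions(reports)
```

Here only building 1001 comes back as a conflict, because two databases mean different things by "building". That list of conflicts is the survey.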

So now the readers of this blog are in at least two camps.

  • Those who feel this is a half measure and would be a distraction from advocating for more completely linked data.
  • Those who are still not clear on the benefits of bothering to start the process of linking data at all.

I am hoping there is a third camp which sees this as a doable step in large enterprises such as the US government, and as the first step toward data which is more linked and therefore more usable for both public and internal uses, and eventually interoperable.

Let me know which camp you are in!

I noticed after writing this post that the underlying theme emerging from the fanciful thought droppings below is that it is best for the end user if data and applications are separate and interoperable.   The theme is starting to highlight for me the promise of semantic technology and open data standards.

I keep hearing: Will Facebook win? Will Google win? Will Microsoft ever get in the running? Will Twitter be bought, and by whom? I wanted to offer another option. Could the people win?

How would the people win?
Well, what is a social network anyway? It's a series of connections between people, and it has rules for distributing information to people based on their connections: mutually agreed friends, followers, and non-connected voyeurs following what you do and when you do it, as well as sharing with you. The connecting and sharing rules of the social network you choose determine what others see and, if you are up on the privacy settings, how you are connected with them.

Right now our choice is which networks to be on, and we make that choice based on the connection rules, the type of content and interactions that can be had, and where the people we want to connect with already are. As Facebook or another network becomes more popular, it becomes more difficult not to choose it.

But we pay a price for choosing an online social network.
1. We have to accept the interface which is chosen for us. And while more customizations and widgets are coming out, the essential choice of interface is in the control of the provider, not us.
2. We can't choose our ideal mix. For instance, what if we want a MySpace-style interface but with our Facebook friends feed? There are some configuration options available, but trying to match what we want with what is out there can still be a challenge.
3. We get targeted advertising based on our personal information. Maybe we want it, maybe we don't, but at any rate we are not in full control of the information which gets mined for these ads.
4. We can't move our information to another network or cross-link to people in other networks. This is changing some, but our information is still not in our control.
5. We can't create our own rules for connection and viewing. We have to rely on a central authority to do this, even if it allows some flexibility. Very non-Web 2.0.

So how do we win?

What if instead of our data residing on a social network server, it resided on our own private space in the cloud?

And what if we could choose or even create the applications which would allow our data to be seen by others, and with the rules which we decide on? So we could use a Facebook-style application to interact with our friends, but our friends wouldn't have to be "on" Facebook. They would simply have their own 'cloud space', and they could send Twitter-style updates back to us and not have to look at the vacation pics we just posted if they don't want to. But they could also choose to send some updates only to some people if they wanted, rather than having only the choice of tweeting to all or tweeting directly to one. Basically, the social network core of connections and activity between you and your friends could be managed by any number of applications and rule configurations more tailored to each individual. The way you want to interact with your friends, and who your friends could be, would not be determined by the popularity of a social network but by you.
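Here is one way the 'cloud space' idea could be sketched in Python. This is only a thought experiment in code: the `CloudSpace` class, its groups and the `feed` function are hypothetical names I made up, not the API of any existing service. The owner decides who can see each item, and the viewer decides which kinds of items to bother looking at.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple


@dataclass
class Item:
    kind: str            # e.g. "status" or "photo"
    body: str
    audience: Set[str]   # names of the owner's groups allowed to see this item


class CloudSpace:
    """One person's data plus that person's own sharing and viewing rules."""

    def __init__(self, owner: str) -> None:
        self.owner = owner
        self.groups: Dict[str, Set[str]] = {}   # group name -> members
        self.items: List[Item] = []
        self.muted_kinds: Set[str] = set()      # kinds the owner does not want to see

    def add_to_group(self, group: str, person: str) -> None:
        self.groups.setdefault(group, set()).add(person)

    def post(self, kind: str, body: str, audience: Iterable[str]) -> None:
        self.items.append(Item(kind, body, set(audience)))

    def visible_to(self, viewer: str) -> List[Item]:
        """Owner's rule: an item is visible if the viewer is in one of its audience groups."""
        allowed = {group for group, members in self.groups.items() if viewer in members}
        return [item for item in self.items if item.audience & allowed]


def feed(viewer: CloudSpace, friends: Iterable[CloudSpace]) -> List[Tuple[str, str, str]]:
    """Any application can build a feed: sharing rules first, then the viewer's preferences."""
    return [
        (space.owner, item.kind, item.body)
        for space in friends
        for item in space.visible_to(viewer.owner)
        if item.kind not in viewer.muted_kinds
    ]


# Hypothetical usage.
alice, bob, carol = CloudSpace("alice"), CloudSpace("bob"), CloudSpace("carol")
alice.add_to_group("family", "bob")
alice.add_to_group("friends", "bob")
alice.add_to_group("friends", "carol")
alice.post("photo", "Vacation pics", audience=["family"])
alice.post("status", "Back from vacation", audience=["friends"])
bob.muted_kinds.add("photo")   # Bob is allowed to see the pics but chooses not to
```

The `feed` function is the only "application" in this sketch, and it owns none of the data. A different application could read the same spaces and present them in a completely different way.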

Would this kill Facebook or Google? Facebook would probably be the most popular application people choose for interacting with their friends, and it could still get its ad revenue. Google could provide the cloud space to host our data securely, either for free with ads or for a small cost, as well as provide an interface application if you want it.

Twitter provides the first step in separating social data from the social application, and it is good evidence of why this approach would be so popular. I don't mean the asynchronous relationships or the 140-character limit, but the fact that anyone can build a Twitter application to interact with the "cloud space" of Twitter feeds. TweetDeck, TweetGrid and many other Twitter applications let people choose how to interact with their social connections and, to some extent, what their interface looks and feels like. What I am suggesting is widening this approach to include all of the personal information which you would potentially want to share, and putting you back in control of your own information.

So you could have one interface for your immediate family, another window for friends and another for interesting people you follow, or any combination you choose. Application vendors could make money through ads, but you would choose vendors based on their privacy policy about what those ads could find out about you. Or you could choose to keep everything very private and pay for a service and a place to keep your data. This is similar to what people refer to as interoperability between networks, but with the twist of separating our personal data from the network itself. So it's more of an interoperable data model for social networking than an interoperable social network model.

Would this work? Is part of what makes a social network the common rules and ways to connect which we have all agreed upon? If some people could stop sharing a lot of information except with their best friends, would the fabric of the social network be weakened and this whole idea result in a less networked world? I don't think it would, because the culture has started to discover the benefits of sharing, but it's definitely an open question.

So how do we get there? Hmmm. Not sure. Google's free App Engine could potentially power something like this. Something like the user rebellion which occurred when Facebook tried to change its privacy policy a couple of months ago might be the start of an online privacy movement. Right now, though, people seem to be having too much fun to worry about being in charge of their own information. Will this change? I guess it depends on what the social networks decide to do with all of the information they have about us.

At lunch Friday, I jokingly asked the question: what would be the economic impact of Google Mail going down?

But after thinking a lot about cloud computing and semantics this weekend, I started to wonder if that is a serious issue. Someone at the table did mention that Google Mail went down for 2 hours recently.

In the next 5 years or so, on the commercial side as well as the federal side, there will be a massive shift from single servers to cloud computing, as well as an increasing reliance on everything being always up because of the interwoven nature of the semantic web. Websites and web servers will no longer be individual and isolated but will exist on the 'cloud' or rely on it in one way or another.

By definition the cloud is supposed to be more reliable and more redundant than a single server. But is it more reliable and redundant than millions of individual servers?
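A bit of back-of-the-envelope arithmetic shows why both answers can be true at once. The numbers below are pure assumptions picked for illustration (1,000 sites, 99% availability for a lone server, 99.9% for a cloud), and the model is deliberately crude: individual servers fail independently of each other, while everything on one cloud fails together.

```python
def expected_sites_down(sites: int, availability: float) -> float:
    """Average number of sites unavailable at any given moment."""
    return sites * (1.0 - availability)


def prob_all_down_independent(sites: int, availability: float) -> float:
    """Chance that every site is down at once when each runs on its own server."""
    return (1.0 - availability) ** sites


def prob_all_down_shared(cloud_availability: float) -> float:
    """Chance that every site is down at once when all share a single cloud."""
    return 1.0 - cloud_availability


# Assumed, illustrative numbers only.
SITES = 1000
SERVER_AVAILABILITY = 0.99
CLOUD_AVAILABILITY = 0.999

print("average sites down, own servers:", expected_sites_down(SITES, SERVER_AVAILABILITY))
print("average sites down, shared cloud:", expected_sites_down(SITES, CLOUD_AVAILABILITY))
print("chance all down, own servers:", prob_all_down_independent(SITES, SERVER_AVAILABILITY))
print("chance all down, shared cloud:", prob_all_down_shared(CLOUD_AVAILABILITY))
```

Under these assumptions the cloud wins on average, with fewer sites down at any moment, but it loses on the worst case: the chance of everything being down at the same time is vanishingly small for independent servers and equal to the cloud's own downtime when everyone shares it.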

I don't pretend to understand cloud architecture, but I did note that Vivek Kundra said a few months ago that any data with a national security requirement would not exist on the new federal cloud. So what are the implications for the massive civilian clouds from Google, Amazon and Microsoft that business email, websites and data services would rely on?

So if most commercial data will be on the growing commercial civilian clouds, doesn't the economic impact of large outages start to pose a national economic security risk in itself? Especially if an outage could include data loss?

Of course, the temptation is to say the companies themselves will make sure that doesn't happen because they have such a large financial stake in reliability. That seems reasonable. After all, look how well that approach worked in banking.