Why The Web Isn’t Semantic Yet
The semantic web. You’ve heard about it. What is it? Chances are, you don’t have a completely clear idea. But why isn’t the web “semantic” yet?
First, it’s complex to understand. There are tons of standards around it – RDF, RDFa, RDFS, URI, OWL, SPARQL, etc. It’s hard to get started, because you have to absorb a lot of unfamiliar concepts. Compare that to a plain old HTML web page: you write some tags and you have a page. Pretty easy.
Complexity wouldn’t be that much of an issue if there were visible benefits. The semantic web relies on every website conforming to a semantic standard (e.g. using RDF). But what do you get if you expose your data in a structured way? Nothing. Yes, your page becomes easier for machines to make sense of, but you get nothing in return. Why would a web shop, a social network or a personal website be “semanticized”? It requires extra effort for invisible, hard-to-understand benefits. Note that I’m not saying there aren’t benefits – there certainly are, but they are too far from the mind of the average website or online business owner.
It’s a bit of a chicken-and-egg problem – there’s no immediate benefit, so people don’t make their websites semantic. And because there are so few semantic websites, there aren’t any popular products that would provide the desired benefit. And here is a great example – the Facebook Open Graph protocol. It is a proprietary solution (never mind the “open” in the name) that defines a way to provide metadata about a web page, something that RDF (among other things) is supposed to do. Yet Facebook considered RDF too complex (the first point above) and “invented” a new thing. And guess what – everyone is using it, because it provides an immediate benefit: your page becomes “Facebook-friendly”, and since millions of people are on Facebook, exposing your metadata matters. The Open Graph protocol is rather limited, of course, but it shows that for the web to become more semantic, website owners need a business incentive.
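To illustrate how lightweight this kind of metadata is, here is a minimal Python sketch (standard library only) that collects Open Graph properties from a page. It assumes the common markup of `<meta property="og:..." content="...">` tags in the page head; the sample page and its URL are hypothetical, and this is not Facebook’s own parser.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collects <meta property="og:..." content="..."> pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta ... /> tags also end up here via the default
        # handle_startendtag implementation.
        if tag != "meta":
            return
        attributes = dict(attrs)
        prop = attributes.get("property") or ""
        content = attributes.get("content")
        if prop.startswith("og:") and content is not None:
            # Keep the first value if a property repeats (e.g. several og:image tags).
            self.properties.setdefault(prop, content)


def extract_open_graph(html: str) -> dict:
    """Return a dict mapping og:* property names to their content values."""
    parser = OpenGraphParser()
    parser.feed(html)
    parser.close()
    return parser.properties


# Hypothetical sample page; the URL is illustrative only.
page = """<html><head>
<meta property="og:title" content="Why The Web Isn't Semantic Yet" />
<meta property="og:type" content="article" />
<meta property="og:url" content="https://example.com/semantic-web" />
<meta name="description" content="Not an Open Graph tag" />
</head><body></body></html>"""

print(extract_open_graph(page))
```

A handful of flat key-value tags is all a site owner has to add, which is part of why the barrier to adoption is so much lower than learning a full RDF toolchain.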
My last point is about openness and collaboration. The premise of the semantic web is that people want to make their data usable by others. This post from 2007 is quite skeptical about that. Many websites do not provide RSS feeds or APIs, hide their data behind logins, and ban scrapers (even ones that put no significant load on their servers). Their data is their asset, and they aren’t willing to give it away for free.
So, instead of relying on the web being semantic, companies and open source projects are appearing that try to make sense of the unstructured data on the web: data that is useful to third parties, but that website owners do not give out.
Combining the three factors, what we need is a way to incentivize website owners to adopt semantic technologies and to make their data public (because of that same incentive). And, of course, to make it a bit simpler to get started. All of that assumes it is good to have a semantic web, a web of data that makes sense to machines. It looks like the right cause, but we are not there yet.