How AI will make the Semantic Web possible
The year was 2001. I was in my teens, just taking my first steps into coding, when I first read about the "Semantic Web" in Scientific American. It completely blew my mind. The Semantic Web was supposed to be a revolution that would change everything.
It didn't happen at all.
The truth is, machines could not understand the meaning of words on the web. You can easily search for the word "car" on a webpage, and the computer shows you every occurrence of that word. But if you want to find "Ferrari" or "Tesla," you are out of luck, because the computer matches only the literal word "car," not words that name things that are cars. If we expressed information in a way that machines could understand, however, we could create robots that would navigate the web and do most tasks on our behalf.
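To make the "car" example concrete, here is a minimal Python sketch. The page text and the tiny "is a" table are invented for illustration; they are assumptions, not a real vocabulary or dataset. The sketch only contrasts literal matching with matching that knows a Ferrari is a kind of car.

```python
# Hypothetical page text and a hand-written "is a" table, for illustration only.
page = "The showroom had a Ferrari parked next to a Tesla."

IS_A = {"ferrari": "car", "tesla": "car"}  # assumed mapping, not a real vocabulary


def tokenize(text):
    """Split text into lowercase words with surrounding punctuation removed."""
    return [word.strip(".,!?\"'").lower() for word in text.split()]


def keyword_search(text, term):
    """Return the words that literally equal the search term."""
    return [w for w in tokenize(text) if w == term.lower()]


def semantic_search(text, term):
    """Return words that equal the term or are listed as a kind of it."""
    term = term.lower()
    return [w for w in tokenize(text) if w == term or IS_A.get(w) == term]


print(keyword_search(page, "car"))   # literal matching finds nothing
print(semantic_search(page, "car"))  # the lookup table finds both brands
```

The catch is that someone has to write and maintain that lookup table, and doing so for the entire web is exactly the work the Semantic Web asked of its authors.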
After reading the Semantic Web article, the web felt strangely primitive to me. Now, more than 20 years later, it still does. Think about the number of clicks and the amount of copying and pasting necessary for basic web research.
The unfulfilled potential of the Semantic Web
Web search operates over pages. However, the result pages are only a means to an end. What we want is the answers and information contained in those pages, and we still need to extract that information ourselves.
This becomes painful when your search is complex, such as creating a list of things. For example: "books that won the Hugo award for best novel and also talk about AI." Unless someone has already created a page compiling this list, you will need to copy and paste from many pages until you end up with the list you want.
Looking into multiple pages and compiling snippets is time-consuming and tedious. It's common to see executives outsource this "grunt" work to interns and VAs. Even with full-time assistants, the time it takes to compile information means that we don't ask for it all the time. Most people don't have assistants or interns and need to do the tedious parts themselves, which means that they only do it when necessary.
The tedium has an even worse side effect: it limits your creativity. Years of using Google have conditioned you to know which kinds of information you can find with a single search and which you can't. You don't even think about trying a complex search because you know it will not work.
Imagine using the whole web as a personal database and having your own robots navigating the web and finding information for you. The Semantic Web was supposed to turn this into reality, and it failed.
Why the Semantic Web failed
Computers couldn't read and understand texts as humans do. This was the central problem the Semantic Web was supposed to solve. Since computers could not understand human language, we would need to annotate the whole web to make it readable by machines. The annotations were written in XML-based formats such as RDF, and they would not be visible to the end user of a page, only to the machines navigating the web. This kind of annotation is metadata, which means "data about data": information like keywords, page length, title, word count, abstract, location, SKU, ISBN, and so on.
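As a rough illustration of what such an annotation buys a machine, the sketch below parses a small XML fragment with Python's standard library. The element names and values are made up for this example and do not follow any real Semantic Web vocabulary; the point is only that once the metadata exists, a program can answer a question about the page without reading its prose.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation: element names and values are invented for illustration
# and are not taken from any real standard or page.
ANNOTATION = """
<book>
  <title>Example Novel</title>
  <isbn>000-0-00-000000-0</isbn>
  <keyword>artificial intelligence</keyword>
  <keyword>science fiction</keyword>
</book>
"""


def read_metadata(xml_text):
    """Parse the annotation and return its fields as a plain dictionary."""
    root = ET.fromstring(xml_text.strip())
    return {
        "title": root.findtext("title"),
        "isbn": root.findtext("isbn"),
        "keywords": [k.text for k in root.findall("keyword")],
    }


meta = read_metadata(ANNOTATION)
# A program can now check the topic without understanding the page's text.
print("artificial intelligence" in meta["keywords"])
```

Note that this only works if the page author wrote the annotation and chose the same element names the program expects.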
The extra work of annotating the web for computers was never appealing enough. It offered most web content creators no clear short-term benefit, and it did nothing immediate for someone reading an annotated page. Another reason is that there are multiple ways to describe the same thing. People can disagree on how to describe things and still be correct. Making people agree on standard terminology is a losing game.
The Semantic Web failed because people creating websites didn't get on board with annotating their content. Without a rich set of websites annotated with metadata, the dream of autonomous agents roaming the web and doing tasks remained just a dream.
Why could the Semantic Web succeed now?
Basic reading comprehension in a machine was a pipe dream in 2001. Now it's a daily reality with apps like ChatGPT exploding in usage. How close are we now to realizing the Semantic Web promise?
Annotations and metadata were the initial focus of the Semantic Web because computers could not extract the "meaning" of a sentence. By "extracting meaning" I mean reading a passage and understanding it well enough to answer multiple questions about it, and grasping the underlying relationships between concepts, such as "Apple" is a "company."
Twenty years after the original dream of a Semantic Web, a new window of opportunity has opened. AI performance has improved dramatically since the 2000s, as you can see in this graph:
AI reading comprehension started to improve rapidly after 2015, reaching close to human performance in 2020 and even surpassing it on some benchmarks. These results, of course, come from established task benchmarks, and many critics point out that passing the benchmarks at a human level is not the same as reading at a human level.
The fact is that AI systems now have an unprecedented capability for understanding language. For the purpose of our exploration of the Semantic Web, the critical question is not whether they are as good as or better than humans but whether they are good enough to achieve our goal.
ChatGPT is one of the most capable language models available today. Let's see how good it is at reading comprehension.
Notice that in the original description of the book, there is no mention of AI directly. ChatGPT can infer that the book is about AI even though the word is not contained in the text.
ChatGPT cannot navigate the web yet, but it will soon. Combining the reading comprehension capability of modern AI systems with web navigation will allow us to have agents roaming the web and finding the information we want.
The Semantic Web was a hack to solve the lack of capability of AI systems. The hack is not necessary anymore. The growing capability of AI systems will build an even better web than the one dreamed of in the Semantic Web.