The Road to a Truly Semantic Web
Some of you might have heard of the Trees of Mystery in Klamath, California, where a 49-foot Paul Bunyan and Babe, his 35-foot blue ox, stare down at visitors. Look closely and you will notice a small door hidden in Paul’s left foot. It opens onto a series of pipes, ladders, and platforms that eventually lead to a single room, located somewhere in Paul’s chest and furnished with a microphone and a sink that doubles as the operator’s restroom. The statue may seem ordinary from the outside, but its inner structure is more complicated than it first appears. Let’s use the Trees of Mystery as an analogy for the complex structure of the Web and examine it in more depth.
To the left foot and through the trap door
The Web has its own talking Paul Bunyan statues: processes that we don’t really question because they just make sense, even though most of us have no understanding of how they work underneath. Now we’ll step through the tiny door into the Web’s basement and examine its hidden machinery.
Think big: YouTube, Amazon, Facebook. These sites are web giants because of the way their content fits together to paint a bigger picture.
Web developers, content managers, and content strategists spend a lot of time worrying about how to present their content, asking questions like “What form should this content take?” and “How should we structure this information?”
What is often overlooked is the content that doesn’t show up on a page: the small bridges of “content” that connect our meticulously crafted pages and form the basic foundation of the semantic web. Let’s change that.
The microphone and the sink
Let’s go back to YouTube, Amazon, and Facebook. All of these websites have a glaringly obvious Paul Bunyan statue: a series of related content items (“related videos” on YouTube, “related items” on Amazon, and “people you may know” on Facebook). However simple and seamless these features look to users, they too depend on hidden machinery to function.
Put simply, a computer cannot understand content. To a computer, “Plato” means “Plato” and nothing more. The string doesn’t tell the computer that we are talking about the philosopher who lived in Greece many years ago. The computer doesn’t even know that “Plato” sounds like “Play-Doh,” a modeling compound that children use for arts and crafts. It can read the string “Plato,” but it has no context for it.
To counteract this, web taxonomy systems were developed. Essentially, we tag content with “keywords” that describe it, and whenever separate pieces of content have matching keywords, a “connection” is formed programmatically between them. This “connection” exists within the context of a website, but it also gives the content a place among the rest of the Web (thanks, Google!).
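To make the idea of a programmatic “connection” concrete, here is a minimal sketch in Python. The content ids, the tags, and the function names are all made up for illustration, and real sites use far more elaborate systems; this only shows the basic move of linking items that share keywords.

```python
from collections import defaultdict

# Hypothetical content items, each tagged with descriptive keywords.
content_tags = {
    "plato-biography": {"philosophy", "ancient greece", "plato"},
    "allegory-of-the-cave": {"philosophy", "plato", "metaphysics"},
    "play-doh-crafts": {"crafts", "children"},
}


def build_tag_index(items):
    """Map each tag to the set of content ids that carry it."""
    index = defaultdict(set)
    for content_id, tags in items.items():
        for tag in tags:
            index[tag].add(content_id)
    return index


def related_items(content_id, items):
    """Return the other content ids that share at least one tag,
    ordered by number of shared tags (most first), then by id."""
    index = build_tag_index(items)
    shared_counts = defaultdict(int)
    for tag in items[content_id]:
        for other_id in index[tag]:
            if other_id != content_id:
                shared_counts[other_id] += 1
    return sorted(shared_counts, key=lambda other: (-shared_counts[other], other))


if __name__ == "__main__":
    print(related_items("plato-biography", content_tags))
```

Asking for the items related to plato-biography returns only allegory-of-the-cave. The Play-Doh page shares no keywords with it, so no connection is formed, even though the two titles sound alike.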
So how exactly do we start converting a meaningless web structure into a semantic web? We have some tools (our own versions of Paul’s microphone and sink) that we can use to add context to a page.
1. Using SEO-friendly URLs
This gives our content a place. A meaningless, unaltered URL like website.com/pageid=45 doesn’t tell the search engine very much. On the other hand, a URL like website.com/Plato lets the search engine know that, “OH! This web page is about Plato.” This brings us back to the dilemma mentioned earlier: a computer can only recognize the string “Plato” and isn’t aware of any further meaning. But besides having SEO-friendly URLs (i.e., URLs that are easier for a search engine to recognize and display when someone conducts a relevant search), we can let the search engine know what we are talking about by…
2. Using meta tags
Maybe someone doesn’t know who Plato is, but they want to learn about philosophy. We can add a couple of meta tags, such as
<meta name="description" content="This Web page is about Plato, a philosopher who lived in ancient Greece.">
<meta name="keywords" content="classical Greece, philosophy, Western philosophy, theory of forms, metaphysics, meaning of life, Aristotle, Socrates, Republic, Allegory of the Cave">
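and suddenly, the search engine knows about a few other concepts and people that are linked to Plato. With meta tags like these on multiple web pages, the search engine can begin to establish relationships between pages that it otherwise would not have recognized. One caveat: Google has said that it does not use the keywords meta tag for web search ranking, so how much weight these tags carry depends on the search engine.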
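For a rough picture of what a crawler that does read these tags has to work with, here is a small sketch using Python’s standard html.parser module. The class and function names are our own invention, and no real search engine is claimed to work this way; the sketch only shows how meta tags can be pulled out of a page and how two pages can be compared by the keywords they share.

```python
from html.parser import HTMLParser


class MetaTagReader(HTMLParser):
    """Collect name/content pairs from <meta> tags (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content:
            self.meta[name.lower()] = content


def read_meta(html):
    """Return a dict such as {"description": "...", "keywords": "..."}."""
    reader = MetaTagReader()
    reader.feed(html)
    reader.close()
    return reader.meta


def keyword_set(meta):
    """Split the comma-separated keywords string into a lowercase set."""
    raw = meta.get("keywords", "")
    return {word.strip().lower() for word in raw.split(",") if word.strip()}


def shared_keywords(html_a, html_b):
    """Keywords that two pages have in common, ignoring case."""
    return keyword_set(read_meta(html_a)) & keyword_set(read_meta(html_b))
```

Given one page whose keywords include philosophy and Socrates and another page about Socrates that lists philosophy as well, shared_keywords returns the overlap, which is the raw material for a “relationship” between the two pages.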
3. Using alt attributes
Alt text is standard practice for accessibility, but it is worth noting that it also adds context to a page. To a visitor, a picture of a statue of Plato brings a few obvious ideas to mind. A search engine, though, sees that a picture is embedded but has no idea what the picture shows unless we specify it, like so:
<img src="/images/Plato.jpg" alt="a bust of the classical Greek philosopher Plato" />
Doing so walks the search engine through the most obvious components of a picture. Keep in mind that a computer can’t simply read text that is baked into an image, either, so alt text is worth using there, too.
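If we want to check our own pages for pictures that say nothing to a machine, a short script can do it. The sketch below again uses Python’s standard html.parser module; the class name, the function name, and the banner.png file in the usage note are hypothetical. It simply sorts images into those with alt text and those without.

```python
from html.parser import HTMLParser


class AltTextAuditor(HTMLParser):
    """Sort <img> elements by whether they carry non-empty alt text."""

    def __init__(self):
        super().__init__()
        self.described = {}  # image src -> alt text
        self.missing = []    # image src values with no usable alt text

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src") or ""
        # A bare `alt` attribute has the value None, so fall back to "".
        alt = (attributes.get("alt") or "").strip()
        if alt:
            self.described[src] = alt
        else:
            # Note: an empty alt is legitimate for purely decorative images,
            # so treat this list as "worth a look", not "definitely wrong".
            self.missing.append(src)


def audit_images(html):
    """Return (described, missing) for every <img> found in the markup."""
    auditor = AltTextAuditor()
    auditor.feed(html)
    auditor.close()
    return auditor.described, auditor.missing
```

Run against the Plato image above plus a second image such as banner.png with no alt attribute, audit_images reports the bust under described and the banner under missing.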
4. Using purposeful, semantic markup
HTML elements carry meaning of their own. For example, our h1 through h6 tags are ranked in order of importance. Using the lower-numbered heading tags for more important content not only helps with search engine optimization, but also quickly gives the search engine a brief outline of our page. Likewise, we should use the newer HTML5 elements (nav, header, article, etc.) to house information that fits within those groups.
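To see the “brief outline” that headings hand to a machine, we can extract it ourselves. This is a minimal sketch with Python’s standard html.parser module; the class and function names are assumptions made for the example, and it handles only the h1 through h6 elements, not the full outline rules of the HTML specification.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class OutlineReader(HTMLParser):
    """Record each heading as a (level, text) pair in document order."""

    def __init__(self):
        super().__init__()
        self.outline = []
        self._level = None  # level of the heading currently open, if any
        self._text = []     # text fragments collected inside that heading

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._level = int(tag[1])
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open heading.
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self._level is not None:
            self.outline.append((self._level, "".join(self._text).strip()))
            self._level = None


def page_outline(html):
    """Return the list of (level, heading text) pairs found in the markup."""
    reader = OutlineReader()
    reader.feed(html)
    reader.close()
    return reader.outline


def format_outline(outline):
    """Indent each heading by its level to make the hierarchy visible."""
    return "\n".join("  " * (level - 1) + text for level, text in outline)
```

A page that uses h1 for its title and h2 and h3 for its sections produces a tidy, indented table of contents. A page built from unlabeled div elements produces nothing at all, which is roughly how much structure a machine can infer from it.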
Looking through the operator’s room
The Web is still fairly complex, and creating a consistent way to add context to Web pages is something we need to do now, before the Web becomes even more of an incomprehensible marsh of content. Adding context to our Web pages (building a semantic web) will benefit not only individual websites but also the Web as a whole and its visitors: you and your audience.