Q & A: What are Semantics?

You mentioned “Semantics” as part of the Data Architect Role; what is that?

In most of the many sandboxes where data architects play (integration, data cleaning, modeling), data is found and its quality is improved.  But once we have data, there still remains the question of what it means.  When the data is integrated, each piece of data is given a definition.  For example, if I’ve built a database in an insurance company, that company defines the data element “agent” as “a person or business that sells insurance to the public.”  But not everyone is privy to that beautiful definition.  So when the word “agent” appears on business forms from other industries or countries, confusion can occur, for an “agent” can be a chemical that causes a reaction, a campaign manager, a government role (as in FBI agent), or a person who performs a particular function.  In this situation we’re missing clarity on the data: we’re not sure what “agent” data means, because one word can have many meanings (such words, spelled the same but meaning different things, are called homonyms).

(Similarly, someone may say “I ‘licked’ the ice cream cone,” or “He doesn’t have a ‘lick’ of sense,” or “The Eagles ‘licked’ the Cardinals 10 to 7.”)

But another problem is that many words can mean the same thing (called synonyms).  For example, a migraine headache can also be called a splitting headache, a sinus headache, megrim, cephalalgia, or hemicrania.  Thus, if we’re using Google to find information on one of those words, we may miss thousands of results that would have come from searching on a different word.
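To make the problem of many words meaning the same thing concrete, here is a minimal sketch in Python of how a search could be widened to every equivalent word.  The names EQUIVALENT_TERMS and expand_query are hypothetical, invented for this illustration, and the word list is simply the headache example from above; it is not how Google or any real search engine works.

```python
# Hypothetical table of equivalent terms, taken from the headache example.
# Each key is a preferred label; the list holds every word treated as
# meaning the same thing.
EQUIVALENT_TERMS = {
    "migraine": [
        "migraine",
        "splitting headache",
        "sinus headache",
        "megrim",
        "cephalalgia",
        "hemicrania",
    ],
}


def expand_query(term, table=EQUIVALENT_TERMS):
    """Return every word that shares a group with `term`, sorted.

    If the term is not in any group, return it alone (lower-cased).
    """
    needle = term.strip().lower()
    for group in table.values():
        if needle in group:
            return sorted(group)
    return [needle]


if __name__ == "__main__":
    # Searching on "megrim" now also covers the other five words.
    print(expand_query("megrim"))
```

A search run on all of the returned words would no longer miss the results that were filed under a different word.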

The data architect basically performs the following steps to resolve these ambiguities:

1.        A model is built for a specific business, defining what a term means and does not mean.  It also defines word connections such as “the doctor prescribed some medicine,” which says that when “prescribe” appears with “doctor” it’s a different kind of “prescribe” than when a carpenter advises on a way to fix a hole in the roof.  Thus this model will limit such searches for the word “prescribe” to the medical industry.

2.        The data architect “publishes” this model on the World Wide Web in a special format so that the world knows these rules.

3.        Next, when any search engine such as Google sees this model, it records in its own index that “this is the place you need to go whenever anybody uses the word ‘prescribe’ and in the same request asks about ‘doctor’ and ‘medicine.’”

4.        Thus, as a researcher on the web, (a) your search time is reduced and (b) non-relevant “finds” are eliminated.
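The steps above can be sketched in a few lines of Python.  This is only a toy, in-memory stand-in for the model: the names Sense, MODEL, and resolve are hypothetical, the definitions are paraphrased from the “prescribe” example, and the special web format used for publishing is not shown.  The sketch picks the meaning of a word by counting how many of that meaning’s connected words appear in the same request.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sense:
    """One meaning of a term, plus the words that signal that meaning."""
    domain: str
    definition: str
    context_terms: frozenset


# Hypothetical model for one term, based on the "prescribe" example.
MODEL = {
    "prescribe": [
        Sense(
            domain="medical",
            definition="a doctor orders a medicine for a patient",
            context_terms=frozenset({"doctor", "medicine"}),
        ),
        Sense(
            domain="general",
            definition="someone advises a way to fix a problem",
            context_terms=frozenset({"carpenter", "roof"}),
        ),
    ],
}


def resolve(term, request, model=MODEL):
    """Return the Sense whose context words best match the request.

    Returns None when the term is unknown or no context word appears,
    which means the request is still ambiguous.
    """
    words = set(request.lower().split())
    best, best_score = None, 0
    for sense in model.get(term, []):
        score = len(sense.context_terms & words)
        if score > best_score:
            best, best_score = sense, score
    return best


if __name__ == "__main__":
    hit = resolve("prescribe", "what medicine did the doctor prescribe")
    print(hit.domain if hit else "ambiguous")
```

In this sketch a request that mentions “doctor” or “medicine” is routed to the medical meaning, while a request with no connected words stays unresolved rather than being guessed at.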