Q & A: What’s a Data Architect?

Q: What does a data architect do?

A: I think that most people would never ask a doctor, lawyer, carpenter, etc. what they do. They generally have at least a vague idea from either personal encounters, TV or books. But ask anyone what a data architect does and they probably don’t have a clue.

Let’s first get back to the medical profession. Why do we need doctors? We need them in this world simply because people get sick. Similarly, we need data architects because data gets sick. Data in businesses and homes and governments and science labs gets all messed up. And data architects determine the cause of this bad data and fix it.

Q: In what ways does data get so messed up?

A: Data is problematic in several ways. First, data gets duplicated. We may find that the address for the person “Mary Jones” has different values in different places. Or here’s another example: when we ask a company “What were your earnings last year?” we may get three different answers, all of which are correct depending on who you talk to.

Secondly, data can have poor quality. Perhaps it’s not defined well enough to be understandable, it’s spelled wrong, or it’s in the wrong place. If the data is about salary, the salary value may be wrong. Or the format is defined differently in different places. Or the data field can have several meanings. (The discussion of “Semantics” on this site will explore that further.)

And finally, data is difficult to find. Often we don’t even know where to start, or if we only knew who owns (creates) that data we might know where to begin. And if we “google” for the answer, we get many answers, of which 5% is the correct answer; the other 95% is trash. And of course data has become as convoluted as spaghetti. We wind through a maze of records which point to other records, usually losing track of where we are, with no authoritative person to point us in the right direction.

Q: Is this problem that important or critical?

A: Yes, it certainly is, mainly because it affects so many people. It affects them not only because they don’t have the data they need, but also because the companies start working more slowly or start making mistakes (due to bad data) that cause dollar losses. Unreliable data could affect people’s safety, or could result in arguments and personal stress. Let’s take an example.

Suppose I’m looking for Louise Johnson, who I know lives in St. Louis. Going through various records from her employers and residences, I find one record that says she’s married with one child and lives at 27th and Oak, and another that says she’s married with three children and lives at 1345 Crown Point. And guess what? A third record says she’s single with two children and lives in California.

How does this happen? One cause would be where the Personnel Department, Employee Benefits Department, and Payroll Department each have different physical records on Louise. These records are separated by thick walls or hundreds of miles, and are updated at different rates and for different reasons. Thus, we may not be able to reach this person in an emergency or when this person owes us money, or simply make reliable judgments about that person.
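To make the Louise Johnson example concrete, here is a minimal Python sketch of how such disagreements could be spotted once the records sit side by side. The record layout, the field names, and the find_conflicts helper are all assumptions made up for illustration; real department systems would look different and would need far more careful matching.

```python
from collections import defaultdict

# Hypothetical records for the same person, one per department system.
records = [
    {"source": "Personnel", "name": "Louise Johnson",
     "marital_status": "married", "children": 1, "address": "27th and Oak"},
    {"source": "Benefits", "name": "Louise Johnson",
     "marital_status": "married", "children": 3, "address": "1345 Crown Point"},
    {"source": "Payroll", "name": "Louise Johnson",
     "marital_status": "single", "children": 2, "address": "California"},
]


def find_conflicts(records, key="name", ignore=("source",)):
    """Return, per person, the fields whose values disagree across sources.

    Result shape: {person: {field: {value: [sources reporting that value]}}}
    """
    seen = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for rec in records:
        for field, value in rec.items():
            if field == key or field in ignore:
                continue
            seen[rec[key]][field][value].append(rec["source"])

    conflicts = {}
    for person, fields in seen.items():
        # A field is in conflict when more than one distinct value was reported.
        disagreeing = {f: dict(v) for f, v in fields.items() if len(v) > 1}
        if disagreeing:
            conflicts[person] = disagreeing
    return conflicts


conflicts = find_conflicts(records)
for field, values in conflicts["Louise Johnson"].items():
    print(field, values)
```

Note that the sketch only reports which sources disagree. It says nothing about which one is right, and that is the harder question.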

Q: How do data architects take care of this problem?

A: I thought you’d never ask. Data architects do a variety of things. Some specialize in data integration: taking data that is spread all over the place and putting it in one central location, eliminating duplications so there’s one authoritative source. Some build data warehouses (explained elsewhere on this site), which make it easy for people to get the reports they could never get before. But cleaning up data is an expensive process. So management is faced with spending time and money to totally rebuild their data to eliminate garbage. Some choose to select only the most important (or “strategic”) data and clean that up. Unfortunately, there are many companies or managers who simply close their eyes to the problem because they lack the resources or time to do this correctly, or because they see no immediate payoff. It’s a case of fix it now or fix it later.
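As a rough illustration of the “one authoritative source” idea, the Python sketch below collapses several department records into a single record per person. The “most recently updated record wins” rule, the updated dates, and the consolidate helper are assumptions invented for this example, not a method prescribed here. A real integration effort would also have to match people whose names are spelled differently and decide field by field which source to trust.

```python
from datetime import date

# Hypothetical department records; the "updated" dates are invented.
records = [
    {"source": "Personnel", "name": "Louise Johnson",
     "address": "27th and Oak", "updated": date(2019, 3, 1)},
    {"source": "Benefits", "name": "Louise Johnson",
     "address": "1345 Crown Point", "updated": date(2021, 6, 15)},
    {"source": "Payroll", "name": "Louise Johnson",
     "address": "California", "updated": date(2017, 11, 30)},
]


def consolidate(records, key="name"):
    """Keep one record per person: the most recently updated one (assumed rule)."""
    latest = {}
    for rec in records:
        current = latest.get(rec[key])
        if current is None or rec["updated"] > current["updated"]:
            latest[rec[key]] = rec
    return latest


master = consolidate(records)
print(master["Louise Johnson"]["address"])  # address from the newest record
```

Choosing the newest record is only one possible policy. A company might instead decide that Payroll always owns salary data and Personnel always owns addresses, which is the kind of ownership decision a data architect helps settle.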