Q & A: What is Data Modeling?

Can you describe what a Data Model is”?

In general, a data model is a document—either graphic, tabular or both—that describes business data either in business terms (called a Logical Model) or in technical or computerized terms (called a Physical Model).


OK, first describe more clearly the Physical model.

Well, are you familar with database software like Oracle or Microsoft SQL Server?  If you are,  then realize that this software deals with  physical databases (as opposed to logical databases).  And each of those physical databases has a model or picture of what that database has in it, called a “model”.  Thus, for example, if you have an Oracle database for your company’s human resources data, then there will be a model that describes all of the things in that database (employee tables, benefits tables, payroll tables, etc.), portrayed in a model that is usually graphic or as a picture.  We distinguish this model as “physical” (and not logical) because it represents how the data in this specific company with this specific version of Oracle; how it will be used on the computer (as opposed to mental business decisions or published to the public). 

This physical database may have some tables that are not part of the business (such as tables that control the performance and speed of the database or tables that control how tightly and neatly the data is stored—in other words, it’s just computer stuff.  On the other hand, there may be missing in this physical data base some tables that are in the business.  From a business point of view, management might want to see ALL data on people—both payroll and personnel, so he/she can compare between the two.  But only Payroll  got computerized, so it’s model is a little limited to the business people.  Thus a logical model would have ALL of the company’s data, but may not be totally computerized.  Some data is on paper or limited to one department or an on-line service.

Before you go on, tell us briefly what a “table” is.

Yes, think of it this way.  If you know what a spreadsheet is, then a table is a spreadsheet on steroids.  Thus, just like a spreadsheet it, the table has columns and rows.  The columns represent “fields” in the table, rows represents instances. Thus, columns in a human resources table could be name, address, salary, and sex.  Rows would represent each individual people in the company.  Bradley’s salary and address are found in row 17, while Callahan’s salary and address are found in row 1027.  All of this data on the Payroll is in a table.

OK, next tell me what a Logical model.

It is a model destined for the business.  It’s about things that are really found in the business but not necesarrily in any computerized database.  As such, the logical model doesn’t care if the data is integrated or unique or easily accessible or takes up much space;.  Rather it’s conerned with looking at business data discarding how it will be used or who needs it. The theory is that, whether the data is computerized or not, it can be a CANDIDATE for computerzing at now or at a later time time.  Thus, the more we know about the corporate data from a business model viewpoint--its constraints and instability to help us make decisions, the better and faster we can automate it..

Q: I’ve heard of the “Normalized” related to data models.  What’s that about?

That’s a good question., because normalization is very important to having good data.  Let’s suppose that you build a record on employees, their address, city state, etc.  And let’s say that you allow people type anything they want in the “city” field.  Then we might find “Houston” in one record,”Hou” in another, and “Huston,” in another.  A database llowing this type of data to be stored is called an un-normalized database. 

On the other hand, along with this Payroll table I could create a matching table called a City table.  This City table could hold all of the correct or legitimate values for Cities in the U.S.  Then a link between these tables, called a “relationship,” Whenever anyone entered a value in the city table, the database would do a look-up in the city table to see if that value exists.  If it does exist in the City table, then it’s accepted.  If not, a message is sent to the data clerk saying that there’s no such city in the database.  Thus accuracy is insured.

Another rule of normalization is that all data should be centralized.  Suppose that the employee’s address is contained in every payment (check) record.  Now suppose that a person’s address changes.  Would we change the address in every payroll record?  Absolutely not.  This would be a nightmare.  We should keep ONE ADDRESS in the Employee table and maintain it there.

Thus, we can define normalization is the building of relationships between all tables, as needed to assure accuracy, and the assigning of fields to their correct  owner (like assigning address to person, not to payment).


Q: Could you show me a graphic model that displays normalization.


Sure, look at Figure1 (Normalized Model) and Figure 2 (Un-normalized model) below.

In Figure 1, an order is showed to involved product (what’s ordered), Customer (who’s ordering the product), Country (where the product is made), and Continent (in one continent does the order occur).

Note the “Y-shaped” lines that represent the relationships.  The single-line side of leach line means “1” and the Y-shaped (often called crows-foot) end means many.  Thus for example, an order can only have one product but a product can be found in many orders.  You can make similar statements for the other three relationships.  The red “PK” means Primary Key—that is, the Product ID uniquely identifies every product and can no repeats, and the same for CustomerID, etc.  Finally, when an order is created, no product or country or customer or continent can be entered unless it finds a match in its Product, Customer, Country, or Continent master table.  So that’s Normalized.

In Figure 2 nothing is validated. The person managing orders can enter anything he/she wants.  A bad way to manage data, and the way things were done before normalization was invented (in the 1980s).  

Having said that, I will say something that may surprise you.  Having un-normalized data is OK if it’s in a Data Warehouse.   For an explanation, read the section on Data Warehousing in this website.





















Text Box: Figure  : Un-normalized Model