September 29, 2003
Last Revised: May 23, 2012 4:50 PM
This Company Policy is a continuation of an earlier one, located HERE. If you have not already read that Policy, you are advised to do so. That previous Policy explains where we had been, with our old database, and how we arrived at using MySQL as our new database system. That Policy explained some of the basic concepts of a MySQL database, but not all.
This is the Policy that explores a central concept behind modern database programming, including MySQL: the "object." The entire phrase to understand is "Object Oriented Programming."
This looks very new, but in fact the seeds of this concept were to be found in the file-based, dBase III programming described in the Policy called History of Programming Technology.
There were many other highly technical developments, and false starts, but two ideas have carried through to today. The "relational database" needed a language to go with it, and that was eventually developed as "Structured Query Language" (SQL). Alongside it grew the "object-oriented" approach, first to programming and then to databases, whose central idea, the "object," must be understood before much of the modern database world makes sense.
The first thesis on the feasibility of object-oriented databases, written by Copeland and Maier, appeared in 1984.
Object-oriented programming tools, from which object-oriented databases developed, are rooted in the search for new techniques for software development and software maintenance. Let's have a look at the most important concepts of object-oriented databases.
The concept of an object first appeared in the 1960s, in the simulation programming language Simula 67. The attractiveness of object-oriented programming lies in its ability to use abstract objects to represent actual things and events. Its strength is that it allows programmers to define whatever data types they need, together with the procedures that operate on them. This saves users the time spent converting data to fit fixed data formats, and has increased the range of things that can be represented on a computer. However, since programmers can now define procedures freely, it is their responsibility to verify that the results of those procedures are correct.
In the 1970's, the field of artificial intelligence started to use the concept of Frame theory, which incorporated the concept of inheritance. This proved an effective method for sharing and reusing code, as well as for organizing information.
Object-oriented programming sprang into the limelight in 1980 with Smalltalk-80, a programming language developed at Xerox's Palo Alto Research Center (PARC). Following this, related concepts such as multiple inheritance, introduced by Flavors, were developed as object-oriented programming gradually settled on its basic concepts.
These concepts emerged from research and experience with object-oriented programming techniques, and were not uniformly integrated into object-oriented databases. Because of the versatility of these programming techniques, a whole range of object-oriented databases with new functions appeared in the late 1980s. (source)
Strictly speaking, "Structured Query Language" is a language, not a database, and MySQL itself is a relational database system; but the object-oriented ideas that grew up alongside SQL shape how modern database programs are written. So the important new concept to understand is: "What is an Object?"
Keep in mind that MySQL is a client/server system: the data is managed by a database server program rather than kept in files that you open directly from your local hard disk (although that server can run on your own computer).
So, let's begin with that strange word, "object."
You may recall from the earlier Policy that even in dBase III there were certain categories of data and certain categories of instructions that fit those data categories only. I'll give an example:
Dates are usually stored, in dBase III, in a "date format." When you open up the database and look at a field for a date, it reads as if it were an English-language date:
September 30, 2003 is displayed as: "09/30/03."
However, suppose it is stored within the database as "09302003" (dBase actually writes a date to disk year first, as "20030930," but the principle is the same), which is then divided up with slashes for a more normal display. It may look like one number, but it is really three separate batches of "numbers" governed by a rule of how many days any one month can have, including allowance for leap years!
If you want to add one DAY to the date, the program adds one to the fourth character from the left, making 30 become 31. Before finishing this action, however, the program looks at the "09" part of the field, then checks internally for the maximum number that can be in the second group of numbers when the first group is "09." If this check shows that 31 would be too high for the second group when the first group is "09," it advances the "09" to "10" and places "01" in the third and fourth positions. If this seems complicated, that is because dBase chose to have a separate category of date formats so that it would be possible to add and subtract days, months or years.
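The date rule described above can be sketched in Python. This is a minimal illustration of the logic, assuming the MMDDYYYY layout used in this example; it is not how dBase III itself is written, and the function names are hypothetical.

```python
# Month lengths for a non-leap year, January through December.
DAYS_IN_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def is_leap_year(year):
    """Gregorian rule: every 4th year, except centuries not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def days_in_month(month, year):
    if month == 2 and is_leap_year(year):
        return 29
    return DAYS_IN_MONTH[month - 1]


def add_one_day(mmddyyyy):
    """Add one day to a date held as the 8-character string MMDDYYYY."""
    month = int(mmddyyyy[0:2])
    day = int(mmddyyyy[2:4])
    year = int(mmddyyyy[4:8])

    day += 1
    # The "internal check": is the new day too high for this month?
    if day > days_in_month(month, year):
        day = 1
        month += 1
        if month > 12:  # roll over into the next year
            month = 1
            year += 1
    return f"{month:02d}{day:02d}{year:04d}"


print(add_one_day("09302003"))  # 10012003
```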
(You cannot add the number ONE to the letter R!) They are two different forms of data, and the rule of "add" doesn't apply to each one in the same way. You could, of course, have a "rule" that says that when you add ONE to a letter you simply move one space along the alphabet. That, however, would not be the same rule as the one used for numerical data.
That "rule" says that when you add ONE to NINE you get TEN.
So the "rule" of adding one number to another number cannot be a "general rule" when it comes to dates, but must take into account that the "numbers" used in dates are governed by a special rule.
If you wanted to convert a dBase III date into characters, you would use what dBase III called a "function": "DTOC(datex)," which means "Date TO Characters, using DATEX as the source of the data." DTOC gives the short form, such as "09/30/03"; to get something like "September 30, 2003" you would build it from other date functions, such as CMONTH(), DAY() and YEAR().
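Modern languages keep the same idea of type-specific rules. The Python sketch below is an illustration only (Python is not dBase): adding one day to a date rolls the month over, a date can be turned into characters in more than one format, and adding a number to a letter is refused unless we supply our own rule.

```python
from datetime import date, timedelta

d = date(2003, 9, 30)

# Date arithmetic follows calendar rules, not ordinary number rules.
next_day = d + timedelta(days=1)
print(next_day)                    # 2003-10-01

# Turning a date into characters, in the spirit of dBase III's DTOC().
print(d.strftime("%m/%d/%y"))      # 09/30/03
print(d.strftime("%B %d, %Y"))     # September 30, 2003 (English-language locale)

# Numbers and letters follow different "add" rules.
print(9 + 1)                       # 10
print(chr(ord("R") + 1))           # S: a rule we chose (move one letter on)
try:
    "R" + 1                        # no built-in rule for adding ONE to R
except TypeError:
    print("cannot add the number ONE to the letter R")
```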
Thus you can see that even dBase III had special instructions that applied only to certain forms of data.
So it is with "Object Oriented Programming" ("OOP") but with far more room for expansion of data types and instructions. OOP allows for many different types of data, and for many different types of methods, or instructions, and simply groups the instructions for a particular type of data ALONG WITH that data.
I started writing these Policies just as I was, for the first time, also studying this concept of "Object Oriented Programming." It took me a few days to realize that I had traveled, intensely, down the wrong path in my journey to understand this concept.
If you are trying to understand "1, 2, 3, 4 and 5" and your study path takes you through "1, 2, 5, 4 and 3," you may well discover that understanding "3" was essential to understanding "4." You might stumble on the fact that trying to understand "5" when you have not yet studied "3" and "4" would leave you lost.
I have discovered that this has been the case and suggest that you, too, might suffer the same fate if you read the hundreds of articles and essays on the subject of "Object Oriented Programming." The problem is that all the explanations I could find placed the emphasis on the word, "object" and the definition of an "object." I worked on this for a long time until I realized that "class" was a word with senior importance.
There is, of course, a fault on the part of many authors who, themselves, probably understand all the terms, but who do NOT understand where the new person is coming from. There was also a fault on my part, as a student, for not recognizing earlier that things were not progressing smoothly in my own understanding and that, perhaps, there were missing elements that needed to be understood.
I can express that simply now.
I had been familiar with the "files" used to hold the database in dBase III. With MySQL you no longer open a "file" containing a database yourself; the server manages the storage for you. In object terms, the data can be thought of as "objects," and there can be many classes of objects. In most cases you could consider each class comparable to a single database (one .DBF file) in the old terminology.
An "object" is an instance (an example) of a "class." A database, in the old terminology, with 20 records in it would have, in the new terminology, 20 objects.
A "class" serves as the model from which many similar "objects" can be created. They could be created with values left "empty" or they could be created in the process of assigning values to the variables. There could be certain key variables that MUST have values assigned, or the object would not qualify as an object within that class.
Each object would be different from any other object, but all the objects would have the common characteristics of the class.
As an example, go back to my illustration about the "dog" for an object.
I described a dog object that had the characteristics of "weight," "height" and many more. Each of these characteristics could be represented as a variable within the object and the value of that variable could be changed. This meant that any one object could "mean" a different type of "dog" but it would always still be within a "class" of dogs.
So, the logical thing to do is to create a "class" before you create objects.
Create the class of "dog" and decide what will be the characteristics of any dog. You put those characteristics into the description of the class. These would include that a dog has weight, height, teeth, a state of hunger, and any number of other characteristics you can choose. Each of these characteristics, further, is assigned a variable, but not a value for that variable.
Thus, the height of the dog can be described with the variable "A13," and the weight of the dog with "A23." A "value" can then be assigned to these variables, such as "A13 = 20 inches." Perhaps several dogs have a 20-inch height? Fair enough, but when you take into account ALL the different characteristics, including, for instance, the "name of breed," it would be very unusual for any two objects to be exactly the same (and even then they would still be two separate objects).
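Here is a minimal Python sketch of a "dog" class. It assumes descriptive names (height_in, weight_lb) in place of the text's A13 and A23, and treats the name and breed as hypothetical "key variables" that every object must have before it qualifies as a dog.

```python
class Dog:
    """The class: a model listing the characteristics every dog object will have."""

    def __init__(self, name, breed, height_in=None, weight_lb=None):
        # Hypothetical "key variables": no object qualifies as a Dog without them.
        if not name or not breed:
            raise ValueError("a Dog needs a name and a breed")
        self.name = name
        self.breed = breed
        # Other characteristics may start out empty (None) and be filled in later.
        self.height_in = height_in   # plays the role of the text's A13
        self.weight_lb = weight_lb   # plays the role of the text's A23


rex = Dog("Rex", "Beagle", height_in=15)
fido = Dog("Fido", "Collie", height_in=24, weight_lb=60)
print(rex.weight_lb)   # None: not known yet
print(fido.weight_lb)  # 60
```

Both objects share the same list of characteristics from the class; only their values differ.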
The "class" determines what are the characteristics to be then placed in all the objects within that class.
We could decide that one of our "classes" is the "class of terminals." We would define "terminal" as a person with an address, and undoubtedly other parts of that definition. These parts would probably all also become characteristics of the class.
A class of "terminal" would include characteristics such as are in our old MASTER file: first name, last name, full name, name prefix, name suffix, salutation name, address 1, address 2 (assuming we figure that two lines of address information are "enough"), city, state, province, country, postal code.
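A Terminal class with MASTER-style fields might be sketched in Python as follows. The field names are assumptions based on the list above, and making "full name" a value computed from the other fields, rather than a stored one, is a choice made for this sketch only.

```python
from dataclasses import dataclass


@dataclass
class Terminal:
    """Hypothetical class built from fields like those in the old MASTER file."""
    last_name: str                 # required: it has no default value
    first_name: str = ""
    name_prefix: str = ""
    name_suffix: str = ""
    salutation_name: str = ""
    address_1: str = ""
    address_2: str = ""            # assumes two address lines are "enough"
    city: str = ""
    state: str = ""
    province: str = ""
    country: str = ""
    postal_code: str = ""

    @property
    def full_name(self):
        # Built from the other name fields, skipping any that are empty.
        parts = [self.name_prefix, self.first_name, self.last_name, self.name_suffix]
        return " ".join(p for p in parts if p)


t = Terminal(last_name="Smith", first_name="Jane", name_prefix="Dr.", city="Boston")
print(t.full_name)  # Dr. Jane Smith
```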
Each of these, and many more characteristics would be described for the CLASS of terminal.
Not only are the characteristics (types of data) included in the class, but also all the actions or methods that can be used on any object in this class. So, the class includes both data and actions, both referenced by assigning a variable. The action might be "replace" and might be given the variable "c33." In this illustration, data gets variables starting with capital letters and actions get variables starting with lower case letters.
A "method," or action, has its own definition. Thus c33 could be the action "REPLACE A44 with 'temporary variable xx'." This action would not "work" unless its two "parameters" were included. It might be written: [c33,A44,"text"]. If the program sees an instruction such as "c33," it immediately checks its class definition table and notes that "c33" must have two parameters, that one of them MUST be one of the variables already assigned within that class, and that the other must be one of the allowed "temporary variables" used to introduce new information into an object to replace old information. The program would also note that the variable "A44" holds character-type (as opposed to numeric-type) data, and it would not allow numeric data to replace data that is supposed to be character-type data.
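The checks the program makes before carrying out c33 can be sketched in Python. In this hypothetical example the class keeps a small table of its variables and their types, and a replace method (standing in for c33) refuses an unknown variable or data of the wrong type. The field names are illustrative only.

```python
class Terminal:
    """Hypothetical class that keeps a table of its variables and their types."""

    # Class definition table: variable name -> the type of data it may hold.
    FIELD_TYPES = {"last_name": str, "city": str}

    def __init__(self, last_name):
        self.last_name = last_name
        self.city = ""

    def replace(self, field, new_value):
        """Stand-in for c33: REPLACE <field> WITH <new_value>."""
        # Parameter 1 must be a variable already defined for this class.
        if field not in self.FIELD_TYPES:
            raise AttributeError(f"{field!r} is not a variable of this class")
        # Parameter 2 must be the right kind of data for that variable.
        expected = self.FIELD_TYPES[field]
        if not isinstance(new_value, expected):
            raise TypeError(f"{field!r} holds {expected.__name__} data")
        setattr(self, field, new_value)


t = Terminal("Smith")
t.replace("city", "Denver")      # accepted: character data for a character field
try:
    t.replace("city", 42)        # refused: numeric data for a character field
except TypeError as err:
    print(err)                   # 'city' holds str data
```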
One of the characteristics could well be what we now refer to as "Terminal Number," meaning whether the person is a customer, prospect, or any of several different types of "Terminal."
Once we have defined this Class, we can then use the Class to create new objects, but now the variables in that class would be assigned values, for that object.
So, Class includes the characteristic that A13 is the variable that describes height of the dog (well defined as to how measured) and within any object there would be "A13" with (or possibly without) a value assigned to it. (We might not yet know the height of some dog, or it might change?)
(Sometimes the word "class" is replaced by "template." This substitution is logical if you understand what is happening, but it is also very possible that you will see the word "template" and the word "class" and think that they are different!)
The class is NOT copied to make objects. Instead, the word used is "instantiating," and the process is different. A normal "copy" gives you an independent duplicate, but an "instantiated" object keeps a permanent connection to its Class: the object holds its own values, while the behavior defined in the class is shared through that link. In dynamic languages such as Smalltalk or Python, this means that if you add a method to a class, it becomes available to all the objects so far "instantiated"; in languages such as C++ or Java, classes generally cannot be changed while the program is running.
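The link between an object and its class can be seen in Python, a dynamic language in the same family as Smalltalk. In this minimal sketch a method is added to the class after an object has already been created, and the existing object can use it at once. The names Dog and bark are purely illustrative.

```python
class Dog:
    def __init__(self, name):
        self.name = name            # data: each object holds its own values


rex = Dog("Rex")                    # instantiated before the class changes


def bark(self):
    return f"{self.name} says woof"


Dog.bark = bark                     # add a new behavior to the class itself

print(rex.bark())                   # Rex says woof: rex finds it through its class
```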
It turns out that grouping the instructions in the same space as the data (within an "object") makes many improvements possible in the database system.
In the phrase "Object Oriented Programming," let's look at and define each of the words.
The word "object," as used here, means something like a dog, desk or car. In other words there is something "real" and it exists. The word "object" as used in this "electronic" sense is, of course, not a meat and bones dog, but could be a "virtual" representation of a dog -- a bundle of software that represents a "real dog."
A dog, whether meat and bones or electronic, has certain characteristics, often called its "state" in this usage. It is some number of inches tall, some color, weight and condition of its teeth. It also has a state of awake or asleep, hunger, emotion, thought, sight, etc. You could have an infinitely large number of characteristics for a dog, or it might be that, for your purposes, you only needed ten.
The object can also have actions, or "behaviors" as used in OOP. It barks, it drinks, eats, wags its tail, looks at a cat, smells something, and has other behaviors. Generally its state could be described with nouns while its behavior would be described with verbs.
In the terminology used in the earlier Policy, the state would be a "havingness" and the behavior would be a "doingness."
Notice how "grammar" again enters. In the very early, simple "file-based" database there were static pieces of data and dynamic actions of code. Now we are putting both the data and the code in the same object. This is the enormous difference between file-based and object-oriented databases.
The act of looking results in a state of an image in view. The "looking" is the behavior while the image is the noun that describes the condition in the dog at that instant. The data in an object is usually included in the form of "variables."
Thus, you could say, "Variable D55 equals the image being looked at." In this case "image" would have to be specially defined. Let's say that we have a system where the ONLY images were numbers, and only the numerical value of the number was wanted. So, the dog looks and sees "99" or some such, and the variable D55 then EQUALS "99," depending on how these views are to be formatted.
In this way the overall characteristics of this object can be changed. The object is in one state of existence when D55 equals "99," and in a different state when D55 equals "19492304" or some such. (In OOP terms it is still the same object, or "instance"; only its state has changed.)
If there were many variables in this object, in other words many possible characteristics, and many possible values for any one characteristic, then the object could be, indeed, a very complex creature.
When the dog is electronic instead of meat and bones, then there can be one "variable" which represents the color of the dog. That variable can have any number of possible colors assigned to it, within the allowed parameters. Another variable might be the weight, degree of hunger, etc. There could then be other variables assigned to different types of behavior or action. Thus, barking could have one variable while eating, looking, running, etc., could each be represented by different variables.
To put some flesh to these bones, let's say that variables for the condition or state of the dog all start with "a" while variables for the actions of the dog all start with "A."
Then a3 could stand for the color of the hair while a15 could stand for the number of teeth the dog has. Carrying this on, A3 could be the variable that represents eating while A10 stands for the action of barking.
All these variables, both action and condition, are then included in one object. This object, with any one set of values for the variables, represents one state of existence of the dog. If one of the actions of the dog is "running" then there might be a "state" such as "10 miles per hour." There could be, of course, virtually an infinite number of possible behaviors and states of existence, depending on how finely you divide them. "Walking" could be one behavior while "running" could be a different behavior. If the needs for this action are modest, then ONE variable could cover both walking and running, or any other form of "moving."
You can see how logic enters in. "Motion" is one concept and "moving" is a different one. We can probably see the logic of calling "motion" a state of existence, while "moving" is a behavior or action.
At one instant of time the dog could be running at 10 mph. Then there could be the action of "slowing," and the next instant of time might see the dog's condition as including running at 5 mph. These would be two different states of the same object. There could be, of course, an infinite number of states for this one object. Another way of putting this would be to say that every action that changes the state of the dog produces a new state.
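A minimal Python sketch of the running dog, assuming hypothetical names: speed_mph is a state (a noun), while run and slow are behaviors (verbs). The same object passes through different states as its actions are called.

```python
class Dog:
    def __init__(self, name):
        self.name = name
        self.speed_mph = 0          # state ("motion"): a noun-like value

    def run(self, mph):             # behavior ("moving"): a verb
        self.speed_mph = mph

    def slow(self, by_mph):         # behavior that changes the state
        self.speed_mph = max(0, self.speed_mph - by_mph)


rex = Dog("Rex")
rex.run(10)
print(rex.speed_mph)   # 10
rex.slow(5)
print(rex.speed_mph)   # 5: same object, new state
```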
The condition of the dog could be going through an infinite number of changes if you wanted to take note of every different heart beat (action) and movement of blood (action), resulting in a condition where certain blood cells are in different locations -- a different state of existence.
This is rather clear as long as you describe this object in terms of a real dog. It is probably OK to suggest that these actions, and these different states of existence could be virtual rather than real, and could then be represented by variables. A "variable," of course, is something which can vary or change.
So, the values that variables hold can now change. The variables that stand for the condition of the dog would be changed by the variables that stand for action.
Whether or not an ACTION is called forth (and how much) could depend on another internal action, but, just as in life, actions could also be triggered by factors external to this object.
The dog looks (an action) at food (external data/picture). The dog's internal (copied) image (a state) of food triggers another action "start salivating" and a state -- "hungry." Or, the dog looks at an empty plate and has the action of sitting, and waiting.
So, an "object" has within it variables. Some of the variables represent the state of existence for the object while other variables stand for the behavior of the object. Behavior of the object can and does affect the state of existence of the object itself, and can, of course, affect other (external) objects (dogs, cats, humans, etc.)
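The food-and-plate example can be sketched in Python. This is an illustration under simple assumptions: the "image" the dog sees is just a string passed in from outside the object, and the method and attribute names are hypothetical.

```python
class Dog:
    def __init__(self):
        self.last_seen = None    # state: the image currently in view
        self.salivating = False  # state
        self.hungry = False      # state
        self.sitting = False     # state

    def look_at(self, thing):
        """Behavior triggered by something outside the object."""
        self.last_seen = thing           # looking results in a state
        if thing == "food":
            self.start_salivating()      # one behavior triggers another
        elif thing == "empty plate":
            self.sitting = True          # sit and wait

    def start_salivating(self):
        self.salivating = True
        self.hungry = True


dog = Dog()
dog.look_at("food")
print(dog.salivating, dog.hungry)   # True True

other = Dog()
other.look_at("empty plate")
print(other.sitting, other.hungry)  # True False
```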
If you wanted to build a model of a dog and figure out how much food could be metabolized into how much energy, that into how much stamina for running at some particular speed, you might want to use an object-oriented method of storing data about a dog.
There would be action variables for eating, digesting, conversion of food into energy, message to muscles to move, and many more.
There could then be variables for the state of the dog. Some of these might be held constant, such as weight, temperature of surroundings, etc.
One state variable, such as the distance the dog can run, might vary depending on the value given to the action of eating xx pounds of food.
Oh yes, an action could be defined in detail -- not only "eating" but also "eating a dish of food," or drinking three ounces of water, etc.
If you got good at this you could build a model that would allow you to predict how much food would result in a certain dog being able to run a certain number of miles.
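A toy version of such a model, in Python, might look like this. Every conversion factor is a hypothetical parameter that you would have to supply from real measurements; the numbers in the usage line are made up purely to show the arithmetic (2 pounds x 1000 kcal x 0.5 / 100 kcal per mile = 10 miles).

```python
class DogModel:
    """Hypothetical model: every conversion factor must be supplied by the user."""

    def __init__(self, kcal_per_lb_food, efficiency, kcal_per_mile):
        self.kcal_per_lb_food = kcal_per_lb_food  # energy contained in the food
        self.efficiency = efficiency              # fraction usable by the muscles
        self.kcal_per_mile = kcal_per_mile        # energy cost of running one mile
        self.energy_kcal = 0.0                    # state: stored energy

    def eat(self, pounds):
        # Action: eating and digesting turn food into stored energy.
        self.energy_kcal += pounds * self.kcal_per_lb_food * self.efficiency

    def miles_possible(self):
        # Prediction: how far the stored energy would carry the dog.
        return self.energy_kcal / self.kcal_per_mile


model = DogModel(kcal_per_lb_food=1000, efficiency=0.5, kcal_per_mile=100)
model.eat(2)
print(model.miles_possible())  # 10.0 with these made-up numbers
```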
I have not used the word "class" much yet, thinking to keep this subject simple, but you need to know that "objects" are usually created by making them from a "class." You can have a class called "dog" and many objects created from that class. The class would contain all the types of data and behavior for every object within that class.
The class could, for instance, say that the "weight" of the dog is set to be the variable b3, and the action of barking for the dog is set to be the variable A6.
Then every object will have these same variables, all meaning the same thing, but each object can then be made different by assigning different values to the variables.
So, one object shows that the dog is "b3 = 40 pounds" and another object can show that "b3 = 10 pounds."
With a computer to keep track of the thousands of possible variables, and to change these in each object, you could represent many different dogs in a batch of "objects" of virtual data.
Perhaps this would seem more real if you replaced "dog" with "weather system." We have a "storm" as a thing that exists. It has a direction and speed -- more states. It speeds up (action) because it travels over warm water (external factor). It is pushed "east" (a behavior) that results in a new state ("east of where it had been").
Objects can send and receive "messages!" That's a big deal. Hit an object with some source of energy! The energy "goes in," moves through various variables and comes out, probably changed, to be a "response" to the incoming message (energy).
You are getting, finally, very close to "artificial intelligence," but long before you get there you have a "language" that allows us to handle far more types and quantities of data, with far more complex methods of manipulation.
That is not, of course, the entire story, but you have a good beginning of the history of how we got from dBase III to a modern form of database, called "MySQL." So, let's see what more this new creature has to offer.
The illustration above charts the evolution of content management as it has expanded its information modeling functionality and information management functionality. It is a useful comparison because it shows that the market has overwhelmingly preferred trusted management functionality to enhanced modeling capabilities. The early relational products were ridiculed by the then dominant network (CODASYL) DBMS vendors (you couldn't even model an organization chart let alone a parts assembly with the relational model!). The buying public decided that elegant modeling could reduce coding efforts and improve design, but management facilities were essential to running a business. Many years later, the early Object-Oriented (OO) DBMS vendors had visions of conquering the then multibillion-dollar DBMS market due to their revolutionary information model. Again, the market spoke unambiguously in favor of predictable and scalable platforms. (source)
I touched on "messages" that might be received by "objects" and originated by objects. Let's explore that a bit more.
Instead of writing a series of instructions that proceeded in order, object-oriented programmers created a set of objects that communicated with each other and asked each other for information.
OOP [object oriented programming] requires a lot of overhead processing, since it has a central switchboard keeping track of which method is doing what and routing messages between objects. (source)
Here is a simple example:
It's a revolution coming slowly. Even this database is sort of object-oriented. It's a series of cards with buttons that send messages to the main switchboard (HyperCard) and to each other. This text field is one object, containing data and the code that determines what happens if you double-click on it. (source)
In fact, a "behavior" could also be thought of as the response to a "message." These behaviors are triggered by receiving some message -- whether from inside the object, or outside. In turn, every behavior results in some sort of outgoing message -- whether aimed externally or internally. It is possible that some behavior isn't triggered until TWO or more incoming messages arrive, and it may sometimes be that after some behavior is done there are two or more outgoing messages.
You can see that this is a rather complicated system!
You might think of successive lines of code, in the old file-based system, as if each line were "triggered" by a message from the previous line of code. The line then "does its thing." That line of code, in doing its own thing, also sends a message to the next line of code: "do your thing." But, in an object-oriented system these messages can go to many different places, and the message coming in can come from any of many different places. So, it is almost like a three-dimensional model with an infinite number of possible sources of messages, and an infinite number of possible places for the outgoing message to be sent to. This takes a very sophisticated switchboard to keep track of all these messages. The fact that they are all taking place within one very tiny object makes for very fast communications.
These messages could be as simple as one-word commands. The word could be "copy" in A30. But the command could also be "copy Variable c15, unless it contains x." The receiving point of such a message would be A30 (a general-purpose "copy" command that needs additional instructions to tell it where and what to copy and what to do with it). The variable A30 could include the further instruction that after the desired item is copied, "replace C11 with the result," and then send a message to q3 that the behavior was completed. I've made this up. It may not be valid, but it seems to me that is what the logic of the system would call for.
A hundred different "behavior" variables could each contain the message to A30, and when that message was received at A30, the incoming message would have a "signature" identifying where it came from.
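The switchboard idea can be sketched in Python. This is a hypothetical design, not how any particular language or database routes messages: handlers are registered under the text's names A30 and q3, every message carries the sender's address as its "signature," the sender name A7 is invented for the example, and c15 and c11 stand in for the text's c15 and C11. Most object-oriented languages handle this through method calls looked up in each object's class, rather than through one central table like this one.

```python
class Switchboard:
    """Hypothetical central switchboard that routes messages between objects."""

    def __init__(self):
        self.receivers = {}

    def register(self, address, handler):
        self.receivers[address] = handler

    def send(self, sender, address, *args):
        # Every message carries a "signature": the sender's address.
        return self.receivers[address](sender, *args)


board = Switchboard()
log = []


def notify(sender, text):
    # Stand-in for q3: records completion notices and who sent them.
    log.append(f"{sender}: {text}")


def copy_unless(sender, source, target, unless):
    # Stand-in for A30: copy source into target unless it contains `unless`.
    if unless in source["value"]:
        return "skipped"
    target["value"] = source["value"]
    board.send("A30", "q3", f"copy requested by {sender} completed")
    return "done"


board.register("q3", notify)
board.register("A30", copy_unless)

c15 = {"value": "hello"}
c11 = {"value": ""}
print(board.send("A7", "A30", c15, c11, "x"))  # done
print(c11["value"])                           # hello
print(log)  # ['A30: copy requested by A7 completed']
```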
The many characteristics of an Object
I have covered several of the characteristics of an "object" as in an "object oriented database."
Here is yet another.
This is the cornerstone of the object-oriented data model. In procedural languages and for that matter relational databases, the data (i.e. Tables) and the code used to manipulate the data are kept completely separate from one another. Any piece of code may access any piece of data. The object-oriented model ties the data to the code [BOO94, SIL97, MCF92]. Objects have well defined interfaces which are used to manipulate the object’s state. This interface may also be used to inquire about the state of an object. Let us attempt a brief definition of an object of type Person with the attributes "name" and "age". (source)
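The quoted passage stops before giving its definition. A minimal Python sketch of such a Person object might look like this; the method names are assumptions, and Python only marks _name and _age as internal by convention rather than enforcing it.

```python
class Person:
    """An object of type Person with the attributes name and age."""

    def __init__(self, name, age):
        self._name = name   # leading underscore: internal by convention only
        self._age = age

    # Interface for inquiring about the object's state.
    def get_name(self):
        return self._name

    def get_age(self):
        return self._age

    # Interface for manipulating the object's state.
    def have_birthday(self):
        self._age += 1

    def set_age(self, age):
        if age < 0:
            raise ValueError("age cannot be negative")
        self._age = age


p = Person("Ann", 30)
p.have_birthday()
print(p.get_name(), p.get_age())  # Ann 31
```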
Let me, now, introduce a few more of these characteristics, with brief descriptions.
Class Of Objects
Going back to the earlier description of "objects," I used the example of a dog. You could create a "class" called "animal," and from it more specific classes such as "dog" and "cat," each with its own set of objects. All of these objects would have characteristics in common with the "class" called "animal."
Here are more:
Generalization is related to inheritance. However, it is not necessary to have inheritance in order to have the concept of generalization. Generalization refers to the ability of an object to specialize its behavior. For instance a "Square" may be thought of as a specialization of "Shape". This has an extraordinary benefit. Any function that accepts a pointer to a class "Base" must also be able to accept pointers to objects derived from Base without altering its behavior. Therefore, if Shape accepts a draw message, then when we pass a Square or a Circle object to an appropriate function, that function is completely unaware of the differences between Squares and Circles and can send either object type the draw message.
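The Shape example can be sketched in Python, which uses object references rather than the pointers mentioned in the passage. The render function below is a hypothetical helper: it sends the same draw message to every shape without knowing whether it is a Square or a Circle.

```python
class Shape:
    def draw(self):
        raise NotImplementedError("each kind of shape says how to draw itself")


class Square(Shape):
    def __init__(self, side):
        self.side = side

    def draw(self):
        return f"square with side {self.side}"


class Circle(Shape):
    def __init__(self, radius):
        self.radius = radius

    def draw(self):
        return f"circle with radius {self.radius}"


def render(shapes):
    # Sends the same "draw" message to every shape, whatever its exact kind.
    return [shape.draw() for shape in shapes]


print(render([Square(2), Circle(1)]))
# ['square with side 2', 'circle with radius 1']
```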
Inheritance is the concept that an object may acquire the behavior of a parent object. The object may then enhance or add to this inherited behavior. In the traditional and OODBMS domain, inheritance is a mechanism for code reuse. Furthermore, in the OODBMS domain it is used to group objects of similar types and can play a role in the development of a query language for an OODBMS. However, inheritance creates some special problems.
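Returning to the "animal" class described earlier, inheritance can be sketched in Python as follows. The class and method names are illustrative: Dog keeps the parent's eat behavior, replaces speak, and adds fetch, while Cat enhances the parent's speak rather than replacing it.

```python
class Animal:
    def __init__(self, name):
        self.name = name

    def eat(self):
        return f"{self.name} eats"

    def speak(self):
        return f"{self.name} makes a sound"


class Dog(Animal):
    # Inherits __init__ and eat() unchanged; replaces speak(); adds fetch().
    def speak(self):
        return f"{self.name} barks"

    def fetch(self):
        return f"{self.name} fetches the ball"


class Cat(Animal):
    def speak(self):
        # Enhances the parent's behavior instead of replacing it.
        return super().speak() + " (meow)"


print(Dog("Rex").eat())    # Rex eats
print(Cat("Tom").speak())  # Tom makes a sound (meow)
```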
Multiple Inheritance
Multiple inheritance is when one object inherits traits from more than one parent. There is a problem with multiple inheritance. We have the problem of object identity mentioned earlier. When operating in a system that allows multiple inheritance, we must make a special effort to determine what is meant by object equivalency. Do we use the value or the address? Even if we use the address we cannot be guaranteed to always be able to tell when two objects are identical. This is because a single object may have more than one address. Therefore, we must use object identifiers. (source)
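A short Python sketch of both problems follows, with hypothetical class names. When two parents define the same method, Python chooses one by a fixed search order; and two objects with equal values are still two different objects, which Python's is operator detects. (The "more than one address" problem the passage mentions arises in languages such as C++, not in Python.)

```python
from dataclasses import dataclass


class Swimmer:
    def move(self):
        return "swims"


class Walker:
    def move(self):
        return "walks"


@dataclass
class Duck(Swimmer, Walker):
    # Inherits from two parents. When both define move(), Python picks
    # one by a fixed search order, the method resolution order (MRO).
    name: str


d1 = Duck("Donald")
d2 = Duck("Donald")

print(d1.move())                           # swims: Swimmer is searched first
print([c.__name__ for c in Duck.__mro__])  # ['Duck', 'Swimmer', 'Walker', 'object']

# Value equality versus identity: same values, two distinct objects.
print(d1 == d2)   # True  (the dataclass compares field values)
print(d1 is d2)   # False (is compares object identity)
```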
One of our hopes was that we could design and develop a "system" that could be encoded, made proprietary, and thus serve as a source of sales and revenue.
That will be hard to do with MySQL:
"Other companies refer to this kind of license as Open Sourceware. You can use it and redistribute it if it's free. Products that use MySQL in embedded commercial systems... they have to use a license." (source)
And here is more on why this is not likely to change:
According to Widenius, as a result of the open source movement, the software business has changed forever. "The hardware business is what's pushing open source," he says, "because they can sell more hardware." MySQL benefits from this push, getting a lot of hardware for free. Recently Compaq gave him an Alpha DS20, just so that they could be sure that it would be compatible with MySQL. From the hardware manufacturer's point of view, free software is the best thing that ever happened to them. That's why you see companies like IBM getting behind the movement, he explains. (source)