QUANTITATIVE VERSUS QUALITATIVE DATA
When we ask what type of data we are dealing with, we usually mean whether it is
mostly quantitative or qualitative. This is likely the most common way of
describing the characteristics of a dataset.
For the most part, when we talk about quantitative data, we are usually (though not
always) talking about a structured dataset with a strict row/column structure, because
we do not assume that unstructured data has any such characteristics. This is one more
reason why the preprocessing step is so important.
These two data types can be defined as follows:
1. Quantitative data: This data can be described using numbers, and basic mathematical
procedures, including addition, are possible on the set.
2. Qualitative data: This data cannot be described using numbers and basic
mathematics. This data is generally thought of as being described using "natural"
categories and language.
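The two definitions above can be sketched in code. The following is a minimal illustration, not a definitive test: the records, the column names, and the is_quantitative helper are all hypothetical, and the rule it applies (a column is quantitative if every value is stored as a number) is only a rough heuristic.

```python
# Hypothetical records; shop names, columns, and values are invented for illustration.
records = [
    {"shop_name": "Bean There", "country": "USA", "revenue_thousands": 120.5, "zip_code": "94103"},
    {"shop_name": "Kaffee Haus", "country": "Germany", "revenue_thousands": 98.0, "zip_code": "10115"},
    {"shop_name": "Le Grain", "country": "France", "revenue_thousands": 143.2, "zip_code": "75001"},
]


def is_quantitative(values):
    """Rough heuristic (an assumption): every value is an int or float, not a bool."""
    return all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values)


# Label each column by checking all of its values.
column_kinds = {
    column: "quantitative" if is_quantitative([row[column] for row in records]) else "qualitative"
    for column in records[0]
}

# Addition is meaningful on a quantitative column.
total_revenue = sum(row["revenue_thousands"] for row in records)

print(column_kinds)
print(total_revenue)
```

Note that zip_code is stored as text here on purpose: although it is written with digits, adding two zip codes produces nothing meaningful, so it behaves as qualitative data.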
The four levels of data
It is generally understood that a specific characteristic (feature/column) of
structured data can be classified into one of four levels of data.
The levels are:
1. The nominal level
The first level of data, the nominal level (which also sounds like the word
name), consists of data that is described purely by name or category.
Basic examples include gender, nationality, species, or yeast strain in a
beer. They are not described by numbers and are therefore qualitative.
2. The ordinal level
The nominal level did not provide us with much flexibility in terms of
mathematical operations due to one seemingly unimportant fact—we
could not order the observations in any natural way. Data at the ordinal
level provides us with a rank order, or the means to place one observation
before the other; however, it does not provide us with relative differences
between observations, meaning that while we may order the
observations from first to last, we cannot add or subtract them to get any
real meaning.
3. The interval level
Now we are getting somewhere interesting. At the interval level, we are
beginning to look at data that can be expressed through very quantifiable
means, and where much more complicated mathematical formulas are
allowed. The basic difference between the ordinal level and the interval
level is, well, just that—difference. Data at the interval level allows
meaningful subtraction between data points.
4. The ratio level
Finally, we will take a look at the ratio level. After moving through three
different levels with differing levels of allowed mathematical operations,
the ratio level proves to be the strongest of the four. Not only can we
define order and difference, but the ratio level also allows us to multiply
and divide. This might not seem like much to make a fuss over, but
it changes almost everything about the way we view data at this level.
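The operations each level permits can be sketched as follows. This is a minimal illustration under stated assumptions: all sample values are invented, and treating Celsius temperature as interval data and an account balance as ratio data are common textbook choices rather than examples taken from the text above.

```python
from collections import Counter

# Nominal: only equality checks and counting make sense.
yeast_strains = ["ale", "lager", "ale", "wild"]  # invented sample
strain_counts = Counter(yeast_strains)

# Ordinal: observations can be ranked, but gaps between ranks are undefined.
RANK_ORDER = ["low", "medium", "high"]  # assumed natural ordering
responses = ["high", "low", "medium", "low"]
ranked_responses = sorted(responses, key=RANK_ORDER.index)

# Interval: subtraction is meaningful (a 10-degree gap is a real difference),
# but 20 degrees Celsius is not "twice as hot" as 10 degrees Celsius.
temp_monday_c, temp_tuesday_c = 10.0, 20.0
temp_difference = temp_tuesday_c - temp_monday_c

# Ratio: a true zero exists, so multiplication and division are meaningful.
balance_a, balance_b = 50.0, 100.0
times_larger = balance_b / balance_a

print(strain_counts.most_common(1))
print(ranked_responses)
print(temp_difference)
print(times_larger)
```

Each step keeps everything the previous level allowed and adds one more kind of operation: counting, then ordering, then subtraction, then division.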
As we move down the list, we gain more structure and, therefore, more
returns from our analysis. Each level comes with its own accepted practice in
measuring the center of the data. We usually think of the mean/average as being
an acceptable form of center; however, this is only true for specific types of data.
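One common convention, sketched here as an assumption rather than a rule stated above, is to use the mode at the nominal level, the median at the ordinal level, and the mean at the interval and ratio levels. The center_for_level helper and the sample data are hypothetical; the functions it calls come from Python's standard statistics module.

```python
from statistics import mean, median_low, mode


def center_for_level(values, level):
    """Return a measure of center suited to the given level (assumed convention)."""
    if level == "nominal":
        # Categories can only be counted, so report the most frequent one.
        return mode(values)
    if level == "ordinal":
        # Ranks can be ordered; median_low always returns an actual observation.
        return median_low(values)
    if level in ("interval", "ratio"):
        # Differences are meaningful, so the arithmetic mean is defined.
        return mean(values)
    raise ValueError(f"unknown level: {level}")


# Invented sample data for each level.
print(center_for_level(["ale", "lager", "ale"], "nominal"))
print(center_for_level([1, 2, 4, 5], "ordinal"))  # satisfaction ranks coded 1 to 5
print(center_for_level([10.0, 20.0, 30.0], "interval"))
```

Taking the mean of the nominal sample would fail outright, and taking the mean of the ordinal ranks would run but would treat the gaps between ranks as equal, which the ordinal level does not guarantee.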