Lesson - 2A6

Factors are a special type of vector used to represent categorical data -
The categories can be ordered or unordered -

Factors can be thought of as an integer vector in which each integer has a label -

Using factors is generally better than using bare integers because the labels make factors self-describing -

    e.g.: Male and Female are more descriptive than 1 and 2
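
The "integer with a name" idea can be sketched as follows. This is only an illustration: the variable names sex_codes and sex and the sample values are assumptions, not part of the lesson data -

```r
# Hypothetical data stored as bare integer codes (1 = Male, 2 = Female)
sex_codes <- c(1, 2, 2, 1)

# Attach a label to each code so the data describes itself
sex <- factor(sex_codes, levels = c(1, 2), labels = c("Male", "Female"))

print(sex)              # shows the labels rather than 1 and 2
print(as.integer(sex))  # the integer codes stored underneath
print(levels(sex))      # the labels attached to codes 1 and 2
```

The labels travel with the data, so a reader does not need a separate lookup table to know what 1 and 2 mean -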

Factors are treated specially by modeling functions such as lm() and glm(), which will be presented later in the lessons -

The following example uses several functions; make note of how each is used and what it outputs -

f <- factor(c("yes","no","yes","yes","no"))

Above we created a variable "f" as a factor, built from a character vector of "yes" and "no" values -

[1] yes no  yes yes no 
Levels: no yes

We display the factor by typing the variable name "f", which implicitly calls the print() function -
The output lists each value stored in "f", in order -
The levels of "f" are "no" and "yes", sorted alphabetically by default -
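
As a small sketch, the levels of the same factor can also be inspected directly with levels() and nlevels(), which are standard base R functions not otherwise used in this lesson -

```r
f <- factor(c("yes", "no", "yes", "yes", "no"))

print(f)             # same display as typing f at the prompt
print(is.factor(f))  # confirms that f is a factor
print(levels(f))     # the distinct levels, sorted alphabetically by default
print(nlevels(f))    # how many levels there are
```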

 no yes 
  2   3 

Using the table() function we can get a quick frequency count of the values in a factor -
The values are grouped by level and counted, giving 2 for "no" and 3 for "yes" -
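
A brief sketch of working with the result of table(). The variable name counts is an assumption chosen for illustration, and prop.table() is a base R function added here only to show one possible next step -

```r
f <- factor(c("yes", "no", "yes", "yes", "no"))

counts <- table(f)          # one count per level, in level order
print(counts)
print(counts[["yes"]])      # the count for a single level
print(prop.table(counts))   # counts converted to proportions
```
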

[1] 2 1 2 2 1
attr(,"levels")
[1] "no"  "yes"

The unclass() function removes the factor class, exposing the underlying representation -
The result is an integer vector with a "levels" attribute, coding "yes" as 2 and "no" as 1 -
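
The relationship between the integer codes and the labels can be sketched like this. The variable name codes is an assumption for illustration, and as.integer() is shown only as an alternative way to get the codes without the attribute -

```r
f <- factor(c("yes", "no", "yes", "yes", "no"))

codes <- unclass(f)       # integer vector that keeps a "levels" attribute
print(codes)
print(as.integer(f))      # the codes alone, without the attribute
print(levels(f)[codes])   # map each code back to its label
```

Indexing levels(f) by the codes recovers the original labels, which shows that a factor stores only the codes plus one copy of each label -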

The order of the levels can be set with the levels argument to the factor() function -
This matters most in linear modeling because the first level is used as the baseline -
By default the levels are sorted alphabetically, so "no" comes before "yes" and becomes the baseline, which may not be what you want -

linear <- factor(c("yes","yes","no","yes"), levels = c("yes", "no"))
[1] yes yes no  yes
Levels: yes no

The example above uses the levels argument to make "yes" the first level, and therefore the baseline -
The main reason to designate your own baseline is that, without direction, the level that comes first alphabetically is used as the baseline -
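
As a hedged sketch of why the baseline matters, the code below compares the default level order with an explicit one. The variable names answers, default_order, custom_order and releveled are assumptions for illustration. relevel() and model.matrix() are base R functions not covered above; model.matrix() is used here only to peek at the columns a modeling function would build, assuming R's default treatment contrasts -

```r
answers <- c("yes", "yes", "no", "yes")

default_order <- factor(answers)                            # levels: no, yes
custom_order  <- factor(answers, levels = c("yes", "no"))   # levels: yes, no

# relevel() moves a chosen level to the front of an existing factor
releveled <- relevel(default_order, ref = "yes")

print(levels(default_order))
print(levels(custom_order))
print(levels(releveled))

# With default treatment contrasts the baseline level gets no column
# of its own; only the remaining level appears next to the intercept
print(colnames(model.matrix(~ default_order)))
print(colnames(model.matrix(~ custom_order)))
```

In the first design matrix the extra column is for "yes" (baseline "no"), and in the second it is for "no" (baseline "yes") -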