Factor Analysis and Its Applications
Understanding Factor Analysis
Let's say your dataset contains 200 variables.
Can you imagine how cumbersome it would be to analyse the dataset using all 200 variables?
Using Factor Analysis, you can reduce a large number of variables to a smaller set of variables (factors) that can explain most of the observed variance in the original variables.
In short, Factor Analysis summarizes your large dataset so that relationships and patterns can be easily interpreted and understood.
Five steps to Factor Analysis:
 Create a correlation matrix for all the variables
 Factor Extraction
 Calculate Initial Factor Loadings
 Factor Rotation
 Calculation of Factor Scores
Correlation Matrix
 The correlation matrix is used to find variables that are strongly correlated with each other.
 If the correlations between variables are relatively small, it is very unlikely that they share a common factor.
 Factor Analysis aims to extract factors that account for as much variation in the observed variables as possible.
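As a rough illustration of this first step, here is a minimal sketch that builds a correlation matrix by hand. The exam scores and the helper names (pearson, correlation_matrix) are made up for the example and are not part of any particular library.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def correlation_matrix(columns):
    """Return {(name_a, name_b): r} for every ordered pair of named columns."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}

# Hypothetical exam scores for five students (illustrative data only).
scores = {
    "Algebra":  [82, 75, 91, 60, 68],
    "Geometry": [80, 72, 95, 58, 70],
    "Biology":  [55, 90, 62, 85, 70],
}

corr = correlation_matrix(scores)
for (a, b), r in corr.items():
    if a < b:  # print each unordered pair once
        print(f"{a:9s} vs {b:9s}: r = {r:+.2f}")
```

In this toy data, Algebra and Geometry move together while Biology does not, which is the kind of pattern that hints at a shared factor.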
Factor Extraction
 The main purpose of Factor Analysis is to identify combinations of variables, and those combinations are called factors.
 Different Factor Extraction methods:
 Maximum Likelihood
 Principal axis factoring
 Unweighted Least Squares
 Generalized Least Squares
 Image Factoring
How to decide the number of factors?
 Look at the Factor Correlation: If the correlation between two factors is too high (> 0.7), the factors are probably very similar, so consider merging them.
 Easily Explainable? Can you easily interpret and explain the items associated with each factor?
 The more items load on a factor, the stronger the case for keeping it for further analysis.
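The factor correlation check above can be sketched in a few lines. The factor correlation values below are hypothetical, and factors_to_merge is just an illustrative helper name; the 0.7 cut-off is the rule of thumb from the text.

```python
from itertools import combinations

def factors_to_merge(factor_corr, threshold=0.7):
    """Return pairs of factors whose absolute correlation exceeds the threshold."""
    names = list(factor_corr)
    return [
        (a, b)
        for a, b in combinations(names, 2)
        if abs(factor_corr[a][b]) > threshold
    ]

# Hypothetical correlations between three extracted factors.
factor_corr = {
    "Factor1": {"Factor1": 1.00, "Factor2": 0.82, "Factor3": 0.15},
    "Factor2": {"Factor1": 0.82, "Factor2": 1.00, "Factor3": 0.21},
    "Factor3": {"Factor1": 0.15, "Factor2": 0.21, "Factor3": 1.00},
}

candidates = factors_to_merge(factor_corr)
print(candidates)
```

Here only Factor1 and Factor2 are flagged, which suggests trying a solution with one factor fewer.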
Factor Loadings
 A Factor Loading represents the correlation between a factor and a variable.
 It tells you how much a factor explains a variable.
 Factor Loadings close to:
=> -1 or 1 indicate that the factor strongly influences the variable
=> 0 indicate that the factor has a weak influence on the variable
 For example, let's say we have nine variables: Algebra, Chemistry, Geometry, Physics, Game theory, Number theory, Set Theory, Probability and Biology.
Subjects         Factor1   Factor2
Algebra          0.788     0.542
Chemistry        0.368     0.912
Geometry         0.729     0.367
Physics          0.541     0.875
Game theory      0.891     0.333
Number theory    0.795     0.412
Set Theory       0.832     0.390
Probability      0.955     0.324
Biology          0.289     0.816

 Algebra, Geometry, Game theory, Number theory, Set Theory and Probability have high Factor Loadings on Factor1.
 Chemistry, Physics and Biology have high Factor Loadings on Factor2.
 The items of Factor1 share a common latent relationship and can be labelled 'Mathematics'; similarly, Factor2 can be labelled 'Science'.
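The grouping described here can be reproduced with a short sketch that assigns each subject to the factor on which it loads most strongly. The loadings are the ones from the table; the helper names are illustrative. The variance_share helper assumes uncorrelated factors, in which case a squared loading is the share of a variable's variance explained by that factor.

```python
# Factor loadings from the table: subject -> (Factor1, Factor2).
loadings = {
    "Algebra":       (0.788, 0.542),
    "Chemistry":     (0.368, 0.912),
    "Geometry":      (0.729, 0.367),
    "Physics":       (0.541, 0.875),
    "Game theory":   (0.891, 0.333),
    "Number theory": (0.795, 0.412),
    "Set Theory":    (0.832, 0.390),
    "Probability":   (0.955, 0.324),
    "Biology":       (0.289, 0.816),
}

def dominant_factor(row):
    """1-based index of the factor with the largest absolute loading."""
    return max(range(len(row)), key=lambda j: abs(row[j])) + 1

def variance_share(loading):
    """Share of variance explained by one factor (assumes uncorrelated factors)."""
    return loading ** 2

# Group subjects by their dominant factor, keeping table order.
groups = {}
for subject, row in loadings.items():
    groups.setdefault(f"Factor{dominant_factor(row)}", []).append(subject)

for factor, subjects in groups.items():
    print(factor, "->", ", ".join(subjects))
```

For instance, variance_share(0.788) is about 0.62, so under that assumption Factor1 accounts for roughly 62% of the variance in Algebra.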
Factor Rotation
 Once the Initial Factor Loadings have been calculated, the factors are rotated.
 Rotation is the process of adjusting the factor axes to achieve a simpler and more meaningful factor solution.
 Rotation creates a simpler factor structure and makes the factors more clearly distinguishable.
 Orthogonal Rotation: It assumes that factors are not correlated.
 Oblique Rotation: Unlike Orthogonal Rotation, it allows for factor correlation.
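To make orthogonal rotation a little more concrete, here is a minimal sketch for the two-factor case. It is not how statistical packages implement rotation: the unrotated loadings are hypothetical, and the angle is picked by a simple grid search over whole degrees that maximizes the raw varimax criterion (the variance of the squared loadings within each factor).

```python
import math

def rotate(loadings, theta):
    """Rotate two-factor loadings by angle theta (radians); an orthogonal rotation."""
    c, s = math.cos(theta), math.sin(theta)
    return [(a * c + b * s, -a * s + b * c) for a, b in loadings]

def varimax_criterion(loadings):
    """Sum over both factors of the variance of the squared loadings."""
    n = len(loadings)
    total = 0.0
    for j in range(2):
        squared = [row[j] ** 2 for row in loadings]
        mean = sum(squared) / n
        total += sum((v - mean) ** 2 for v in squared) / n
    return total

# Hypothetical unrotated loadings for four variables: every variable
# loads on both factors, so the structure is hard to read.
unrotated = [(0.70, 0.50), (0.65, 0.55), (0.60, -0.50), (0.55, -0.60)]

# Grid search over 0..90 degrees for the angle with the highest criterion.
best_theta = max(
    (math.radians(d) for d in range(0, 91)),
    key=lambda t: varimax_criterion(rotate(unrotated, t)),
)
rotated = rotate(unrotated, best_theta)

for before, after in zip(unrotated, rotated):
    print(before, "->", tuple(round(v, 3) for v in after))
```

After rotation each variable loads mainly on one factor, while the sum of squared loadings for each variable stays the same because the axes were only turned, not stretched. The sign of a rotated factor is arbitrary.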
Factor Scores
 Factor Scores are the estimated values of the factors for each observation.
 They are used to prioritize and rank the factors.
 With the help of Factor Scores, you can decide which factors are more important or need more attention.
 In most cases, you look for Factor Scores (positive or negative) with an absolute value of at least 0.7.
 The Factor Scores obtained initially can be low, but they can improve after some iterations.
Deciding questions before using Factor Analysis
 Are there any outliers in the data? Factor Analysis assumes that there are none.
 Is there any multicollinearity between the variables? Factor Analysis requires that there is no perfect multicollinearity between the variables.
 What is the minimum number of factors that can explain the variation in the dataset?
 How well do these factors describe all the data?
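One possible way to check the multicollinearity question is to look at the determinant of the correlation matrix: with perfect multicollinearity the matrix is singular and its determinant is zero. The sketch below uses plain Python and two hypothetical 3x3 correlation matrices; the determinant helper is illustrative, not a library function.

```python
def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]  # work on a copy
    n = len(m)
    det = 1.0
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return 0.0  # singular matrix
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det

# Hypothetical correlation matrices (illustrative values only).
healthy = [[1.0, 0.6, 0.3],
           [0.6, 1.0, 0.4],
           [0.3, 0.4, 1.0]]
perfect = [[1.0, 1.0, 0.3],   # variables 1 and 2 are perfectly correlated
           [1.0, 1.0, 0.3],
           [0.3, 0.3, 1.0]]

print("healthy:", determinant(healthy))
print("perfect:", determinant(perfect))
```

A determinant very close to zero is a sign that some variables are nearly redundant and that one of them may need to be dropped before running Factor Analysis.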