Custom Principle Component Analysis (PCA) Plots in R

There are several functions that calculate principal component statistics in R. Two of these are “prcomp()” and “princomp()”. The “prcomp()” function has fewer features, but is numerically more stable than “princomp()”. Both of these functions can be invoked by simply passing in a suitable data frame, in which case all columns will be used:

    pca1 = prcomp(d)
    pca2 = princomp(d)

Alternatively, the columns to be used can be specified using a formula notation:

    pca1 = prcomp(~ temp + humidity + elev, data=d)
    pca2 = princomp(~ temp + humidity + elev, data=d)

where, “temp”, “humidity”, and “elev” are named columns in the data frame “d”. The object returned from the call to “prcomp()” has the following members:

  • sdev
  • rotation
  • center
  • scale
  • x
  • call

while the object returned from the call to “princomp()” has the following members:

  • sdev
  • loadings
  • center
  • scale
  • n.obs
  • scores
  • call

As shown in the internal documentation, as well as various other places on the web, you can pass the “prcomp()” and “princomp()” objects to “plot()” and “biplot” to provide quick plots of the loadings and the scores of the PCA. You can also call “summary()” and “loadings()” on these objects to get more information about the analysis, and, of course, you are free to inspect the attributes of these objects directly. However, what I wanted was a scatter plot of the scores, with the points color-coded by a particular category (in my case, the model that generated the datum). There are probably a number of ways to do this, but the way I finally settled on was to directly plot the values given in the “x” attribute of the “prcomp()“ object or “scores” attribute of the “princomp()” object. These attributes are R matrices, with the rows corresponding to the rows of the data frame, the columns corresponding to the principal component axes, and the value for in row i and column j corresponding to the score of datum i on principal component j. Thus, to plot the first and second components on the x- and y-axes using the native R scatter plot function:

    plot(pca1$x[,1], pca1$x[,2])
    plot(pca2$scores[,1], pca2$scores[,2])

Of course, this does not get us the color coding. One flexible way to do this is to define a function that returns the approprite color:

    pca.point.color = function(model.id)  else if (model.id == "frag")  else if (model.id == "step")  else
    }

< p>This function will be used to generate a vector of color values for each row in the data frame (using, in the case, the column “model.id”), and this vector will be passed in as the “col“ argument of the “plot()” function:

    plot(pca1$x[,1], pca1$x[,2], col=sapply(d$model.id, pca.point.color))
    plot(pca2$score[,1], pca2$score[,2], col=sapply(d$model.id, pca.point.color))
Share