Our PCA calculator takes in data with multiple dimensions, transforms it into principal components (scores), and then generates a biplot and scree plot.
Principal component analysis (PCA) is a technique that reduces the number of dimensions in data while minimizing the loss of information. The method works by rotating the axes so that the first new axis lies along the direction of greatest variance, the second along the direction of greatest remaining variance, and so on. The data is then transformed into principal component values, also known as scores. The principal components serve as the new axes, and the PC scores are the projections of the original data points onto those axes.
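The rotation and projection described above can be sketched with NumPy. This is a minimal illustration, not necessarily how the calculator is implemented: it assumes the data is only mean-centered (not scaled), and the function name `pca_scores` and the toy data are hypothetical.

```python
import numpy as np

def pca_scores(data):
    """Return (scores, components, eigenvalues) for a samples-by-dimensions array."""
    X = np.asarray(data, dtype=float)
    # Center each dimension on its mean (assumption: no scaling to unit variance).
    centered = X - X.mean(axis=0)
    # Covariance matrix of the dimensions (columns).
    cov = np.cov(centered, rowvar=False)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    # Reorder so that PC1 has the largest eigenvalue.
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    components = eigenvectors[:, order]
    # Project the centered data points onto the new axes to get the scores.
    scores = centered @ components
    return scores, components, eigenvalues

# Hypothetical toy data: 6 samples, 3 dimensions.
data = np.array([
    [2.5, 2.4, 0.5],
    [0.5, 0.7, 1.9],
    [2.2, 2.9, 0.4],
    [1.9, 2.2, 0.9],
    [3.1, 3.0, 0.1],
    [1.1, 0.9, 1.6],
])
scores, components, eigenvalues = pca_scores(data)
```

Each column of `components` is one new axis, and the sample variance of each column of `scores` equals the corresponding eigenvalue.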
PCA orders the principal components by the amount of variance they explain, with PC1 being the component that explains the most variation in the data, followed by PC2, and so on. By only considering the first few principal components, such as the first two, a large percentage of the variance in the data can often be explained. This enables the representation of high-dimensional data on a two-dimensional chart.
When there are more samples than dimensions, the number of principal components is the same as the number of dimensions.
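The statement above can be checked numerically. The sketch below counts the components that carry non-negligible variance, using random data as a stand-in. The helper name `count_components` and the tolerance are assumptions made for illustration. The second case, with fewer samples than dimensions, is included only for contrast and relies on the general fact that centered data with n samples has rank at most n - 1.

```python
import numpy as np

def count_components(data, tol=1e-10):
    """Count principal components with non-negligible variance."""
    X = np.asarray(data, dtype=float)
    centered = X - X.mean(axis=0)
    # Singular values of the centered data; squared and divided by (n - 1),
    # they give the eigenvalues of the covariance matrix.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    eigenvalues = singular_values**2 / (X.shape[0] - 1)
    return int(np.sum(eigenvalues > tol))

rng = np.random.default_rng(0)
tall = rng.normal(size=(10, 4))  # more samples than dimensions
wide = rng.normal(size=(3, 5))   # fewer samples than dimensions

n_tall = count_components(tall)  # expected to equal the number of dimensions
n_wide = count_components(wide)  # expected to be at most samples - 1
```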
The scree plot is a graphical representation of the eigenvalues of the principal components, which indicate the amount of variation explained by each component. The plot is arranged so that the eigenvalues are listed in descending order, from the highest to the lowest.
In our scree plot, the columns represent the eigenvalues, and a line is plotted to show the cumulative percentage of variation explained by the principal components. For example, if the line reaches 93% at PC2, it means that the first two components together explain 93% of the variance in the data. If the data is represented on a two-dimensional chart using these two components, only 7% of the variance is left out.
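The cumulative line is simple arithmetic on the eigenvalues. The following sketch uses hypothetical eigenvalues chosen so that the first two components reach roughly 93%, mirroring the example above. The function name `cumulative_explained` is made up for illustration.

```python
from itertools import accumulate

def cumulative_explained(eigenvalues):
    """Return (percent per component, cumulative percent), largest eigenvalue first."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    percent = [100.0 * value / total for value in ordered]
    return percent, list(accumulate(percent))

# Hypothetical eigenvalues for four principal components.
eigenvalues = [2.79, 0.93, 0.20, 0.08]
percent, cumulative = cumulative_explained(eigenvalues)

# Variance left out when only PC1 and PC2 are kept.
lost_with_two = 100.0 - cumulative[1]
```

Here `percent` corresponds to the column heights expressed as percentages, and `cumulative` corresponds to the points on the line.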
A biplot is a graphical representation of multidimensional data that displays both the samples and the variables in a single two-dimensional plot. In this representation, the principal component (PC) scores are represented by dots, one per sample, and the loading vectors are represented by lines, one per original variable. Together, these elements show how the samples group and how each variable contributes to the first two principal components.
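As a rough sketch of what a biplot draws, the code below computes the two sets of coordinates without plotting them. It assumes mean-centered, unscaled data and uses the unscaled principal axes as the loading vectors. Biplot conventions differ on how the lines are scaled, so the calculator may use another convention. The function name `biplot_coordinates` and the toy data are hypothetical.

```python
import numpy as np

def biplot_coordinates(data):
    """Return (dots, arrows): PC1/PC2 scores per sample and loadings per variable."""
    X = np.asarray(data, dtype=float)
    centered = X - X.mean(axis=0)
    # SVD of the centered data: rows of vt are the principal axes,
    # ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    axes = vt[:2].T          # shape (dimensions, 2): PC1 and PC2 directions
    dots = centered @ axes   # shape (samples, 2): one dot per sample
    arrows = axes            # one line per original variable, drawn from the origin
    return dots, arrows

# Hypothetical toy data: 6 samples, 3 variables.
data = np.array([
    [2.5, 2.4, 0.5],
    [0.5, 0.7, 1.9],
    [2.2, 2.9, 0.4],
    [1.9, 2.2, 0.9],
    [3.1, 3.0, 0.1],
    [1.1, 0.9, 1.6],
])
dots, arrows = biplot_coordinates(data)
```

Each row of `dots` would be plotted as a point, and each row of `arrows` as a line from the origin labeled with its variable.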