Today, we are going to talk about how Principal Components Analysis (PCA) can be used to solve complex problems that involve accommodating multiple human dimensions at the same time. PCA is a computational technique that transforms a set of possibly correlated variables into a new set of variables that are uncorrelated with each other. Sound complex? It can be, but it’s easier to understand through specific examples.
Suppose you are designing a doorway to fit a population. You could design the door to accommodate everyone’s height and body width, but that would be expensive. To save money, you could accommodate only 90% of the population rather than all of it. This works well, and PCA is not needed here. But most design problems are considerably more complex than the doorway.
Suppose you want to accommodate 90% of a population for sitting height in a car’s interior. You would exclude the 5% of people with the tallest sitting heights and the 5% with the shortest, leaving the central 90% of the distribution accommodated. Simple enough, or so it seems. Now suppose you also need to accommodate leg length so the driver can reach the pedals. You might take the central 90% of leg length just as you did for sitting height. But in a sample of males, the accommodation rate is then not 90% but 83%. Where did the other 7% go? Some of the 10% excluded for leg length are different people from the 10% excluded for sitting height, so together the two cuts exclude more than 10% of the sample. As a result, the percentage accommodated drops with each new dimension added. And the less correlated the dimensions are with each other (hip breadth, for example, is not well correlated with sitting height), the greater the drop.
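The sitting-height and leg-length effect can be reproduced with a small simulation. This is a sketch under stated assumptions: the two dimensions are treated as standard normal variables with a chosen correlation `rho` (synthetic numbers, not the male sample behind the 83% figure), and the function name `joint_accommodation` is hypothetical.

```python
import numpy as np

def joint_accommodation(rho, n=200_000, seed=0):
    """Fraction of simulated people inside the central 90% of BOTH dimensions.

    Assumes two standard-normal body dimensions with correlation rho
    (hypothetical synthetic data, not a real anthropometric survey).
    """
    rng = np.random.default_rng(seed)
    z1 = rng.standard_normal(n)
    z2 = rng.standard_normal(n)
    sitting_height = z1
    # Mix in z1 so that corr(sitting_height, leg_length) is approximately rho
    leg_length = rho * z1 + np.sqrt(1.0 - rho**2) * z2

    def central_90(x):
        lo, hi = np.percentile(x, [5, 95])  # drop lowest 5% and highest 5%
        return (x >= lo) & (x <= hi)

    both = central_90(sitting_height) & central_90(leg_length)
    return both.mean()

for rho in (0.0, 0.5, 0.9):
    print(f"rho={rho:.1f}: {joint_accommodation(rho):.1%} accommodated")
```

Each dimension on its own accommodates 90% of the simulated people, yet the joint rate falls to roughly 81% when the two dimensions are uncorrelated and climbs back toward 90% as the correlation rises.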
More complex needs call for a more complex analytical approach. In an airplane cockpit, a number of dimensions are critical: the pilot needs to be able to see over the nose of the plane and reach the ejection handle, the foot controls, the joystick, and the fuses overhead. The pilot’s helmet shouldn’t hit the canopy, and the knees can’t be so far forward that they are injured during an ejection. At least 12 body dimensions matter for designing that cockpit. With that many dimensions, the traditional percentile approach of capturing the central 90% of each one breaks down, because the share of pilots who fall inside every range at once ends up well below 90%. Here is where PCA is especially helpful.
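To see why the percentile approach breaks down with many dimensions, the same idea can be extended to `k` dimensions. This sketch assumes, purely for illustration, that every pair of dimensions shares the same correlation `rho`; real cockpit dimensions have a messier correlation structure, and `accommodation_rate` is a hypothetical helper.

```python
import numpy as np

def accommodation_rate(k, rho, n=200_000, seed=1):
    """Share of simulated people inside the central 90% of all k dimensions.

    Hypothetical equicorrelated normal data: each dimension is
    sqrt(rho) * shared_factor + sqrt(1 - rho) * own_noise,
    so every pair of dimensions has correlation rho.
    """
    rng = np.random.default_rng(seed)
    shared = rng.standard_normal((n, 1))
    own = rng.standard_normal((n, k))
    dims = np.sqrt(rho) * shared + np.sqrt(1.0 - rho) * own

    lo = np.percentile(dims, 5, axis=0)   # per-dimension 5th percentile
    hi = np.percentile(dims, 95, axis=0)  # per-dimension 95th percentile
    inside_all = np.all((dims >= lo) & (dims <= hi), axis=1)
    return inside_all.mean()

for k in (1, 2, 4, 12):
    print(f"{k:2d} dimensions: {accommodation_rate(k, rho=0.5):.1%}")
```

With fully independent dimensions (`rho=0`) the rate would simply be 0.9 raised to the 12th power, about 28%; correlation between dimensions softens the drop but does not stop it.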
PCA reorganizes the variation in a data set more efficiently. For any number of input variables (dimensions), PCA produces the same number of principal components. For the cockpit example, PCA would produce 12 new variables: the principal components (PCs). You can think of a single dimension, such as sitting height, as a line on which each person in your data set is a point. There’s a minimum value, a maximum value, and everybody else is in between. A principal component has the same characteristic: everybody in the data set has a value for the principal component and can be placed on the line (Figure 1). The difference is that a person’s PC value is calculated as a weighted combination of all 12 of their original measurements.
Figure 1. Sample data set along a single principal component (PC) line
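The idea that each person gets one value on each PC, computed from all 12 measurements, can be sketched with NumPy. The data here are synthetic: `make_hypothetical_measurements` is an invented stand-in for a real 12-dimension survey, and the PCA is done via an eigendecomposition of the covariance matrix, which is one standard way to compute it and not necessarily the method used in the full article.

```python
import numpy as np

def make_hypothetical_measurements(n_people=500, n_dims=12, seed=0):
    """Synthetic stand-in for 12 correlated body dimensions (not real data)."""
    rng = np.random.default_rng(seed)
    latent = rng.standard_normal((n_people, 3))   # 3 hidden size/shape factors
    weights = rng.standard_normal((3, n_dims))    # how factors feed each dimension
    noise = 0.3 * rng.standard_normal((n_people, n_dims))
    return latent @ weights + noise

data = make_hypothetical_measurements()
centered = data - data.mean(axis=0)

# Eigenvectors of the covariance matrix give the PC weights ("loadings")
eigvals, eigvecs = np.linalg.eigh(np.cov(centered, rowvar=False))
order = np.argsort(eigvals)[::-1]          # eigh returns ascending order
eigvals, loadings = eigvals[order], eigvecs[:, order]

scores = centered @ loadings               # one value per person per PC
print(scores.shape)                        # 12 input dimensions give 12 PCs

# Person 0's PC1 value is a weighted sum of their 12 centered measurements
manual_pc1 = sum(w * x for w, x in zip(loadings[:, 0], centered[0]))
print(np.isclose(scores[0, 0], manual_pc1))
print(scores[:, 0].min(), scores[:, 0].max())  # ends of the PC1 "line"
```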
PCA is calculated so that the first PC captures as much of the total variance in the data set as possible. The second PC is orthogonal (at right angles) to the first and captures as much of the remaining variance as possible. If you were to plot the two, PC1 and PC2 would form the two axes of a graph (Figure 2), and the individual people would be dots scattered across the area.
Figure 2. Sample data set plotted with two PCs
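A quick numerical check of the two properties described above (PC1 carries the most variance, and PC2 sits at right angles to it), again on synthetic data with invented variable names. This is a sketch, not the analysis behind the figures.

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical correlated data: 500 people x 12 dimensions driven by 3 hidden factors
data = rng.standard_normal((500, 3)) @ rng.standard_normal((3, 12))
data += 0.3 * rng.standard_normal((500, 12))
centered = data - data.mean(axis=0)

eigvals, eigvecs = np.linalg.eigh(np.cov(centered, rowvar=False))
order = np.argsort(eigvals)[::-1]          # sort PCs by variance, largest first
loadings = eigvecs[:, order]
scores = centered @ loadings

pc1_dir, pc2_dir = loadings[:, 0], loadings[:, 1]
print("PC1 . PC2 =", pc1_dir @ pc2_dir)    # ~0: the axes are orthogonal
print("var(PC1) =", scores[:, 0].var(ddof=1))
print("var(PC2) =", scores[:, 1].var(ddof=1))

# No other unit-length direction captures more variance than PC1
random_dir = rng.standard_normal(12)
random_dir /= np.linalg.norm(random_dir)
print("var(random direction) =", (centered @ random_dir).var(ddof=1))
```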
Like PC1 (and every other PC), PC2 is a weighted combination of all 12 original dimensions. It has the additional characteristic that it is completely uncorrelated with PC1, which means that none of the variation accounted for by PC2 overlaps with the variation accounted for by PC1. The third PC is orthogonal to both of the first two, so if you plotted it, it would add a third axis and the dots would be arranged in three dimensions. Because each PC is uncorrelated with the previous PCs and captures as much of the remaining variance as it can, the variance in the data set ends up organized much more efficiently. In fact, although there are always as many PCs as original dimensions, typically most of the variance in the data set is accounted for by the first 3 or 4 PCs, particularly when the original dimensions are strongly correlated with each other.
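Both claims in the paragraph above, that PC scores are uncorrelated and that variance piles up in the first few PCs, can be checked on synthetic data. The assumption here is that the 12 hypothetical dimensions are driven by 3 hidden factors plus noise, which is exactly the kind of strongly correlated structure where the first few PCs dominate; real anthropometric data will not split this cleanly.

```python
import numpy as np

rng = np.random.default_rng(7)
# Hypothetical data: 12 correlated dimensions driven by 3 hidden factors plus noise
data = rng.standard_normal((1000, 3)) @ rng.standard_normal((3, 12))
data += 0.3 * rng.standard_normal((1000, 12))
centered = data - data.mean(axis=0)

eigvals, eigvecs = np.linalg.eigh(np.cov(centered, rowvar=False))
order = np.argsort(eigvals)[::-1]
eigvals, loadings = eigvals[order], eigvecs[:, order]
scores = centered @ loadings

# Correlations between different PCs' scores: off-diagonal entries are ~0
score_corr = np.corrcoef(scores, rowvar=False)
off_diag = score_corr[~np.eye(12, dtype=bool)]
print("largest |corr| between different PCs:", np.abs(off_diag).max())

# Share of total variance carried by each PC, largest first
share = eigvals / eigvals.sum()
print("share carried by first 3 PCs:", share[:3].sum())
```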
In one example using 12 cockpit dimensions from a sample data set, the first 3 PCs account for 86% of the total variance. The next 3 PCs account for less than 4% each, and the last 6 PCs account for less than 2% each. Instead of worrying about 12 dimensions for design, you can focus on the first 3 PCs. Your design task has just become considerably easier.
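Deciding how many PCs to carry forward is often done by looking at cumulative explained variance. The helper below, `n_components_for` (a hypothetical name), returns the smallest number of leading PCs whose combined share of variance reaches a chosen threshold. The variance shares in the example are invented for illustration and are not the cockpit figures quoted above.

```python
import numpy as np

def n_components_for(variance_share, threshold):
    """Smallest number of leading PCs whose cumulative share >= threshold.

    variance_share: per-PC fractions of total variance, largest first.
    """
    cumulative = np.cumsum(variance_share)
    idx = int(np.searchsorted(cumulative, threshold))  # first index reaching threshold
    return min(idx + 1, len(variance_share))           # never more PCs than exist

# Hypothetical shares for 6 PCs (summing to 1.0), not real cockpit data
shares = np.array([0.55, 0.20, 0.11, 0.06, 0.05, 0.03])
print(n_components_for(shares, 0.85))
print(n_components_for(shares, 0.95))
```

The `min(...)` clamp covers thresholds that floating-point rounding (or a value above 1.0) makes unreachable; in that case the helper simply returns all PCs.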
Moving from the PCA to an actual design is the next step, and we’ll cover one approach to that in a future blog post. Contact us today, and we will send you the full article that this blog post is based on.