Testing a Biometric Age Estimation System

In this exercise we will conduct an experiment using a biometric system found on the internet. Because many biometric systems are available online, other similar experiments can easily be designed.

This exercise uses Microsoft's Age Estimator, which estimates a person's age from a photo. For a reasonable statistical analysis, the exercise works best with a class or group of at least 10 people, preferably 20-30 or more.

Before performing the exercise, we can discuss possible hypotheses and agree on a particular hypothesis to test.
By choosing a hypothesis in advance, we are using the **hypothesis-testing experimental design**.
Several plausible hypotheses are listed here, but we will test the first one in this exercise.

**Possible hypotheses:**

- For adult users, the age estimator tends to underestimate a person's age so they feel younger and good about themselves.
- For young people, the age estimator tends to overestimate a person's age so they think they look older and more important.
- For a group with a reasonable age distribution, the age estimation error increases as a person's age increases.
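
To make the first hypothesis measurable, it helps to express it as a signed error: the estimated age minus the actual age, so that a negative value means the estimator guessed too young. The following is a minimal Python sketch of that idea, not the spreadsheet's actual calculation. The name `records` and the age pairs in it are made-up values for illustration only, not real measurements.

```python
# Hypothetical (actual_age, estimated_age) pairs for adult participants.
# These values are invented for illustration; replace them with class data.
records = [(34, 30), (45, 41), (52, 55), (28, 27), (61, 54), (39, 39)]


def signed_errors(pairs):
    """Return estimated minus actual age for each pair (negative = underestimate)."""
    return [estimated - actual for actual, estimated in pairs]


errors = signed_errors(records)

# Count how often the estimator guessed too young, too old, or exactly right.
underestimates = sum(1 for e in errors if e < 0)
overestimates = sum(1 for e in errors if e > 0)
exact = sum(1 for e in errors if e == 0)

print("signed errors:", errors)
print("under / over / exact:", underestimates, overestimates, exact)
```

If the first hypothesis holds, most signed errors for adult participants should be negative. The third hypothesis would instead look at how the size of the error (its absolute value) changes with actual age.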

There are several ways to conduct the exercise -- we will use the first:

- We will use the Google doc so that all participants can edit the document simultaneously and then copy the data into their own spreadsheet. Once the data are entered into the Excel spreadsheet, all calculations are performed automatically.
- Alternatively, in a classroom environment the teacher could have:
  - One or several students taking photos, easily done with smartphones -- each person taking a selfie, or one or a few phones used to take photos of everyone.
  - One or several students entering the photos into the estimator to obtain the age estimates.
  - Another student entering the data into the spreadsheet.

Student learning outcomes:

- Students learn about a biometric system, in this case an age estimator.
- Students learn about experimental design, such as the hypothesis-testing design. They can also learn about experimental controls that minimize extraneous variables -- in this experiment, for example, photo capture could be limited to one camera/smartphone so as not to introduce variations due to different camera resolutions, etc.
- Students learn about hypothesis testing. From the histogram and the mean, median, and mode values it is easy to get a rough idea of whether the data support the hypothesis. More precise statistical hypothesis tests quantify the evidence by computing a p-value, the probability of obtaining results at least as extreme as those observed if the null hypothesis were true; a p-value below 0.05 (5%) is the usual threshold for statistical significance (a Google search will find material on statistical hypothesis testing).
- Students learn mathematical measures used in statistics: min, max, mean (average), median, mode, and standard deviation.
- Students learn about histograms. Probability distributions can also be taught here, since an empirical probability distribution is obtained directly from the histogram by dividing each count by the total number of age estimates.
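
The statistics named in these outcomes can also be computed outside the spreadsheet. The sketch below uses only the Python standard library and is one possible way to do it, not a description of the spreadsheet's formulas. The `errors` list holds made-up signed errors (estimated minus actual age), the histogram uses one bin per whole year rather than wider bins, and the one-sided sign test is an assumed choice of hypothesis test for the first hypothesis; other tests would also be reasonable.

```python
import math
import statistics
from collections import Counter

# Hypothetical signed errors (estimated minus actual age), invented for illustration.
errors = [-4, -4, 3, -1, -7, 0, -2, -5, 1, -3]

# Descriptive statistics listed in the learning outcomes.
summary = {
    "min": min(errors),
    "max": max(errors),
    "mean": statistics.mean(errors),
    "median": statistics.median(errors),
    "mode": statistics.mode(errors),
    "stdev": statistics.stdev(errors),  # sample standard deviation
}

# Histogram with one bin per whole year of error, then the empirical
# probability distribution: each count divided by the number of estimates.
histogram = Counter(errors)
probabilities = {value: count / len(errors) for value, count in sorted(histogram.items())}


def sign_test_p_value(values):
    """One-sided sign test: probability of seeing at least this many negative
    values if negative and positive errors were equally likely (null hypothesis).
    Zero errors (ties) are dropped, as is conventional for the sign test."""
    negatives = sum(1 for v in values if v < 0)
    positives = sum(1 for v in values if v > 0)
    n = negatives + positives
    tail = sum(math.comb(n, k) for k in range(negatives, n + 1))
    return tail / 2 ** n


p_value = sign_test_p_value(errors)

print(summary)
print(probabilities)
print("one-sided sign test p-value:", round(p_value, 4))
```

With these made-up values, 7 of the 9 non-tied errors are negative and the p-value is about 0.09, so the tendency to underestimate would not reach the 0.05 threshold. A larger group gives the test more power, which is one reason the exercise works best with 20-30 or more participants.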