ResNet-18

ResNet-18 Performance

For ResNet-18 on the FER2013 dataset, the baseline model achieved an accuracy and F1 Score of 0.548 and 0.542 respectively. Partial layer training showed slight improvements, bringing accuracy and F1 Score to 0.611 and 0.608 respectively. Data augmentation yielded the most substantial improvement, with accuracy increasing to 0.619 and the F1 Score to 0.641 on the same dataset.
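
Throughout this section, accuracy and F1 Score are the evaluation metrics. The averaging scheme for F1 is not stated in the text, so the sketch below assumes a macro average; the label values are hypothetical:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions matching the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred):
    """Macro-averaged F1: per-class F1 scores averaged with equal weight."""
    classes = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
    return sum(f1s) / len(f1s)

# Toy example with made-up emotion labels:
y_true = ["happy", "sad", "happy", "angry"]
y_pred = ["happy", "sad", "sad", "angry"]
acc = accuracy(y_true, y_pred)   # 0.75
f1 = macro_f1(y_true, y_pred)
```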

On the Dartmouth Database, baseline figures were 0.604 (accuracy) and 0.597 (F1 Score). Partial layer training led to slight improvements, achieving an accuracy of 0.653 and an F1 Score of 0.655. Data augmentation had a significant impact, increasing accuracy to 0.757 and F1 Score to 0.754.

Vision Transformer

Vision Transformer Performance

The Vision Transformer model had a baseline accuracy of 0.637 and F1 Score of 0.593 on the FER2013 dataset. Partial layer training marginally increased these to 0.657 (accuracy) and 0.652 (F1 Score). Data augmentation significantly boosted performance, yielding an accuracy of 0.697 and F1 Score of 0.694.

On the Dartmouth dataset, the baseline yielded an accuracy and F1 Score of 0.642 and 0.631 respectively. The application of partial layer training improved the accuracy to 0.668 and the F1 Score to 0.661. Notably, data augmentation provided the greatest boost, resulting in an accuracy of 0.729 and an F1 Score of 0.728.

EfficientNetB0

EfficientNetB0 Performance

EfficientNetB0 had a baseline accuracy of 0.515 and F1 Score of 0.511 on the FER2013 dataset. Partial layer training slightly increased accuracy to 0.530 but reduced the F1 Score to 0.496. Data augmentation improved both metrics to an accuracy of 0.571 and F1 Score of 0.534.

On the Dartmouth dataset, the baseline showed an accuracy of 0.520 and F1 Score of 0.511. With partial layer training, these metrics improved to 0.558 (accuracy) and 0.542 (F1 Score). Data augmentation positively impacted the results, enhancing accuracy to 0.585 and F1 Score to 0.549.

Common Performance Trends

EfficientNetB0 Performance

Top: FER2013 Accuracy and F1 Score Performance. Bottom: Dartmouth Accuracy and F1 Score Performance.

The application of data augmentation showed a significant improvement across the facial expression recognition models. By including more training data variation in terms of lighting and partial occlusions, every model saw a boost in both accuracy and F1 Score. This demonstrates the importance of a diverse dataset that allows the models to generalize better. Partial layer training, on the other hand, produced less drastic performance improvements than data augmentation; however, this method took advantage of the pre-trained models' ability to learn deep features while updating only a subset of layers.
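
The exact augmentation pipeline is not specified here; as a rough NumPy sketch of the two variations mentioned (lighting and partial occlusion), with hypothetical function names, patch sizes, and image dimensions:

```python
import numpy as np

def random_brightness(img, rng, max_delta=0.2):
    """Shift all pixel intensities by a random offset to vary lighting.
    img: float array with values in [0, 1]."""
    delta = rng.uniform(-max_delta, max_delta)
    return np.clip(img + delta, 0.0, 1.0)

def random_occlusion(img, rng, patch=8):
    """Zero out a random square patch to simulate partial occlusion."""
    h, w = img.shape[:2]
    y = rng.integers(0, h - patch + 1)
    x = rng.integers(0, w - patch + 1)
    out = img.copy()
    out[y:y + patch, x:x + patch] = 0.0
    return out

rng = np.random.default_rng(0)
face = rng.random((48, 48))  # stand-in for a 48x48 grayscale face crop
augmented = random_occlusion(random_brightness(face, rng), rng)
```

In a real training loop these transforms would be applied on the fly each epoch, so the model never sees exactly the same image twice.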

The different performance and improvements observed between ResNet-18, Vision Transformer, and EfficientNetB0 demonstrate the importance of architecture in facial expression recognition problems. ResNet-18 and Vision Transformer were very responsive to our optimization techniques, showing that they are better suited for capturing children's expressions. In contrast, EfficientNetB0's poorer performance scores highlight that model architecture matters when targeting a specific task.
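
To make the partial layer training idea concrete: the sketch below freezes a stand-in "backbone" (a random projection playing the role of pre-trained layers, since the actual frozen/unfrozen split is not detailed here) and trains only a softmax head by gradient descent. All sizes and names are illustrative, not the configuration used in the experiments.

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-in "backbone": a frozen random projection in place of
# pre-trained layers whose weights are never updated.
W_backbone = rng.standard_normal((2304, 64)) * 0.02  # 48*48 pixels -> 64 features

def features(x):
    """Forward pass through the frozen backbone (ReLU, no weight updates)."""
    return np.maximum(x @ W_backbone, 0.0)

# Trainable "head": only these weights receive gradients, mirroring
# partial layer training on top of frozen deep features.
n_classes = 7  # FER2013 emotion classes
W_head = np.zeros((64, n_classes))

def train_head(x, y, lr=0.1, steps=200):
    """Softmax-regression head trained with plain gradient descent."""
    global W_head
    f = features(x)                 # backbone output, computed once and frozen
    onehot = np.eye(n_classes)[y]
    for _ in range(steps):
        logits = f @ W_head
        p = np.exp(logits - logits.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        grad = f.T @ (p - onehot) / len(x)  # gradient w.r.t. head only
        W_head -= lr * grad

x = rng.random((32, 2304))                    # toy batch of flattened faces
y = rng.integers(0, n_classes, size=32)       # toy emotion labels
train_head(x, y)
preds = np.argmax(features(x) @ W_head, axis=1)
```

In a framework like PyTorch the same effect is typically achieved by setting `requires_grad = False` on the frozen parameters before fine-tuning.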