Challenge Description
The goal of this mini challenge is to identify the attribute labels depicted in a facial photograph. The data for this task comes from the CelebA dataset [1], which contains roughly 200 thousand images, each annotated with 40 binary attribute labels. Specifically, the challenge data for this course consists of 160,000 images for training, 20,000 images for validation and 20,000 images for testing. The images are pre-cropped and aligned to make the data more manageable.
For each image, algorithms will produce a prediction for every one of the possible attribute labels. The quality of a labeling will be evaluated based on the label that best matches the ground truth label for the image. The idea is to allow an algorithm to identify multiple attribute labels in an image, given that humans often describe a face using several different words (e.g. black hair, big eyes, smiling).
Assessment Criteria
We will evaluate and rank the performance of each submitted solution based on the average accuracy across all attributes on a test set.
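The ranking metric described above can be sketched as follows. This is only an illustration under stated assumptions: labels are encoded as 1 / -1 as in the submission format, the function names `attribute_accuracies` and `mean_attribute_accuracy` are hypothetical, and the official scoring script may differ in detail.

```python
from typing import List, Sequence


def attribute_accuracies(
    y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]
) -> List[float]:
    """Return one accuracy per attribute (rows = images, columns = attributes)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    num_attrs = len(y_true[0])
    correct = [0] * num_attrs
    for true_row, pred_row in zip(y_true, y_pred):
        if len(true_row) != num_attrs or len(pred_row) != num_attrs:
            raise ValueError("every row must contain one label per attribute")
        for j, (t, p) in enumerate(zip(true_row, pred_row)):
            correct[j] += int(t == p)  # count exact matches per attribute
    return [c / len(y_true) for c in correct]


def mean_attribute_accuracy(
    y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]
) -> float:
    """Average the per-attribute accuracies into a single number."""
    accs = attribute_accuracies(y_true, y_pred)
    return sum(accs) / len(accs)


# Toy data: 4 images, 3 attributes (the real task has 40 attributes).
truth = [[1, -1, 1], [1, 1, -1], [-1, -1, 1], [-1, 1, 1]]
preds = [[1, -1, -1], [1, 1, -1], [-1, 1, 1], [-1, 1, -1]]

per_attribute = attribute_accuracies(truth, preds)
average = mean_attribute_accuracy(truth, preds)
print(per_attribute, average)
```

The per-attribute list is also what the report is asked to contain, so computing both values from the same function keeps the reported numbers consistent.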
The higher your prediction accuracy is, the higher the score you will receive. In general, scores will be awarded based on the Table below.
Accuracy   ≥ 90%   ≥ 88%   ≥ 86%   ≥ 84%   ≥ 82%   ≥ 80%   < 80%
Score      20      18      16      14      12      10      0
Notes:
- We will award bonus marks (up to 2 marks) if the solution is interesting or novel.
- Marks will be deducted if the submitted files are not complete, e.g., important parts of your core code are missing or you do not submit a short report.
Submission Guideline
Students should improve the classification accuracy of their network models.
- Download the dataset from http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html and use the images in “img_align_celeba.zip” together with the attribute labels.
- Train your network using the training set of CelebA.
- Tune the hyper-parameters using the validation set of CelebA.
- Submit predictions on the test set for evaluation. Note that the test set is different from the one released on the CelebA website. The new test set will be made available one week before the deadline (this is common practice in major computer vision challenges).
- No external data is allowed in this mini challenge. Only ImageNet pre-trained models are allowed.
- You should not use an ensemble of models.
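As a starting point for the data-loading step, the attribute labels can be read with a few lines of plain Python. The sketch below assumes the layout of CelebA's `list_attr_celeba.txt` (first line: number of images; second line: attribute names; then one row per image with the file name followed by 1 / -1 values). The function name `parse_attr_file` is hypothetical, and the sample text is toy data with three attributes, not real labels.

```python
from typing import Dict, List, Tuple


def parse_attr_file(text: str) -> Tuple[List[str], Dict[str, List[int]]]:
    """Parse attribute-file text into (attribute names, {image name: labels})."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    declared_count = int(lines[0])   # line 1: number of images
    names = lines[1].split()         # line 2: attribute names
    labels: Dict[str, List[int]] = {}
    for ln in lines[2:]:
        parts = ln.split()
        image_name, values = parts[0], [int(v) for v in parts[1:]]
        if len(values) != len(names):
            raise ValueError(f"{image_name}: expected {len(names)} labels")
        labels[image_name] = values
    if len(labels) != declared_count:
        raise ValueError("image count does not match the header line")
    return names, labels


# Toy sample in the same layout (values are made up for illustration).
SAMPLE = """3
Bald Smiling Young
000001.jpg -1 1 1
000002.jpg -1 1 -1
000003.jpg 1 -1 1
"""

names, labels = parse_attr_file(SAMPLE)
print(names, labels["000002.jpg"])
```

For the real file, the same function could be applied to the file's contents read as text; how the course's 160,000 / 20,000 / 20,000 split is defined is not specified here, so the sketch does not attempt to reproduce it.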
Each student may turn in only one submission. Resubmission is allowed, but only the latest submission will be counted.
Submit the following files (all in a single zip file) to NTU Learn before the deadline:
- A short report in PDF format of no more than five A4 pages (Arial, 10 pt font) describing the model that you use, the loss functions, and any processing or operations that you have used to obtain your results. Report the accuracy for each attribute and the average accuracy that you have obtained.
- A folder that contains the predictions for the test set, as well as the code for training and testing your model.
o The predictions of the test set should be a txt file named “predictions.txt”, with each row: <test_image_name> <predicted_attribute_labels>.
Example:
000001.jpg -1 1 1 -1 -1 -1 -1 -1 -1 -1 -1 1 -1 -1 -1 -1 -1 -1 1 1 -1 1 -1 -1 1 -1 -1 1 -1 -1 -1 1 1 -1 1 -1 1 -1 -1 1
where “1” indicates positive prediction while “-1” indicates negative prediction for each attribute.
- A Readme.txt containing the following info:
o Description of the files you have submitted.
o References to the third-party libraries you are using in your solution (leave blank if you are not using any of them).
o Any details you want the person who tests your solution to know when he/she tests your solution, e.g., which script to run.
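The “predictions.txt” row format required above (image name followed by one 1 / -1 value per attribute, separated by spaces) can be produced along these lines. This is a minimal sketch: the helper names are hypothetical, and thresholding per-attribute probabilities at 0.5 is an assumption, not a requirement of the challenge.

```python
import io
from typing import Iterable, List, Sequence, TextIO, Tuple


def to_labels(scores: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Map per-attribute probabilities to 1 (positive) or -1 (negative)."""
    return [1 if s >= threshold else -1 for s in scores]


def format_row(image_name: str, labels: Sequence[int]) -> str:
    """Build one row: <test_image_name> <predicted_attribute_labels>."""
    if any(v not in (1, -1) for v in labels):
        raise ValueError("labels must be 1 or -1")
    return image_name + " " + " ".join(str(v) for v in labels)


def write_predictions(
    out: TextIO, predictions: Iterable[Tuple[str, Sequence[int]]]
) -> None:
    """Write one row per test image to an open text file."""
    for image_name, labels in predictions:
        out.write(format_row(image_name, labels) + "\n")


# Toy example with 3 attributes instead of 40; StringIO stands in for a file.
buffer = io.StringIO()
write_predictions(
    buffer,
    [
        ("000001.jpg", to_labels([0.1, 0.9, 0.7])),
        ("000002.jpg", to_labels([0.8, 0.2, 0.4])),
    ],
)
print(buffer.getvalue(), end="")
```

In an actual submission the same function could be called with a file opened as `open("predictions.txt", "w")`, with 40 labels per row.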
Tips
Refer to reference [2] to get started.
Use the following techniques to boost the recognition accuracy:
- data augmentation, e.g. random flip [3]
- deeper model, e.g. ResNet-50 [4]
- advanced loss functions, e.g. focal loss [5]
Computational Resource
You can use the computational resources assigned by the MSAI course. Alternatively, you can use Amazon’s EC2 or Google Colab for computation. As a student, you can sign up to receive a free $100 credit through the AWS Educate program. We encourage students to use g2.2xlarge instances running Ubuntu for ease of installation. Note that $100 of Amazon credit allows you to run a g2.2xlarge GPU instance for approximately 6 days without interruption (you should keep it on only while using it).
References
[1] Z. Liu et al., Deep Learning Face Attributes in the Wild, ICCV 2015
[2] Face attribute prediction: https://github.com/d-li14/face-attribute-prediction
[3] T. He et al., Bag of Tricks for Image Classification with Convolutional Neural Networks, arXiv 2018
[4] K. He et al., Deep Residual Learning for Image Recognition, CVPR 2016
[5] T.-Y. Lin et al., Focal Loss for Dense Object Detection, ICCV 2017
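As an illustration of the focal loss tip (Lin et al., Focal Loss for Dense Object Detection), the per-attribute loss can be written as FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), where p_t is the predicted probability of the true class. The sketch below is plain Python for readability, not a training-ready implementation; the function names are hypothetical, targets use the 1 / -1 encoding of this challenge, and the defaults gamma = 2 and alpha = 0.25 are the settings commonly quoted from that paper, which may not be the best choice for face attributes.

```python
import math
from typing import Sequence


def binary_focal_loss(
    p: float, target: int, gamma: float = 2.0, alpha: float = 0.25
) -> float:
    """Focal loss for one attribute; p is the predicted P(attribute is present)."""
    eps = 1e-7
    p = min(max(p, eps), 1.0 - eps)      # clamp to avoid log(0)
    if target == 1:
        p_t, alpha_t = p, alpha          # positive attribute
    else:
        p_t, alpha_t = 1.0 - p, 1.0 - alpha  # negative attribute (-1)
    # (1 - p_t)^gamma down-weights examples that are already classified well.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def mean_focal_loss(
    probs: Sequence[float], targets: Sequence[int], gamma: float = 2.0,
    alpha: float = 0.25,
) -> float:
    """Average the focal loss over all attributes of one image."""
    losses = [binary_focal_loss(p, t, gamma, alpha) for p, t in zip(probs, targets)]
    return sum(losses) / len(losses)


easy = binary_focal_loss(0.9, 1)   # confident, correct prediction
hard = binary_focal_loss(0.6, 1)   # less confident prediction
print(easy, hard, mean_focal_loss([0.9, 0.2, 0.6], [1, -1, 1]))
```

With gamma = 0 and alpha = 0.5 the expression reduces to half the ordinary binary cross-entropy, which is a convenient sanity check when porting it to a deep learning framework.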



