Hands-on data de-identification

Statistical Disclosure Control seeks to reduce the risk of disclosing confidential information by de-identifying data. De-identification is achieved through privacy-preserving techniques (PPTs). However, de-identification usually results in a loss of information, which can reduce the precision of data analysis and the predictive performance of models. This course covers the de-identification process, which aims to protect individuals’ privacy while maintaining the interpretability of the data (i.e., its usefulness). The program includes:

  • Introduction to Python
  • Introduction to data privacy
  • Predictive modeling
  • Privacy-preserving techniques
  • Training
  • Current directions on data privacy
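
To make the trade-off between privacy and usefulness concrete, here is a minimal sketch in Python. It is an illustration only, not taken from the course material: the toy records, the choice of generalization as the privacy-preserving technique, the use of k-anonymity group size as the risk measure, and the helper names `generalize`, `smallest_group`, and `distinct_quasi_identifiers` are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical toy records: (age, zip_code, diagnosis). All values are made up.
records = [
    (23, "4150", "flu"),
    (27, "4151", "asthma"),
    (25, "4152", "flu"),
    (41, "4200", "diabetes"),
    (45, "4201", "flu"),
    (48, "4202", "asthma"),
]


def generalize(record):
    """Coarsen the quasi-identifiers: age to a decade band, zip code to a 2-digit prefix."""
    age, zip_code, diagnosis = record
    decade = (age // 10) * 10
    return (f"{decade}-{decade + 9}", zip_code[:2] + "**", diagnosis)


def smallest_group(rows):
    """Size of the smallest group of rows sharing the same (age, zip_code) values.

    This is the k in k-anonymity: a larger k means each individual
    is hidden among more look-alike records.
    """
    counts = Counter((age, zip_code) for age, zip_code, _ in rows)
    return min(counts.values())


def distinct_quasi_identifiers(rows):
    """Number of distinct (age, zip_code) combinations, a crude proxy for retained detail."""
    return len({(age, zip_code) for age, zip_code, _ in rows})


generalized = [generalize(r) for r in records]

k_before = smallest_group(records)
k_after = smallest_group(generalized)
detail_before = distinct_quasi_identifiers(records)
detail_after = distinct_quasi_identifiers(generalized)

print(f"k before: {k_before}, k after: {k_after}")
print(f"distinct quasi-identifier combinations: {detail_before} -> {detail_after}")
```

In this sketch every original record is unique on age and zip code, so each person could be singled out. After generalization the records fall into larger groups, which lowers the re-identification risk, but the number of distinct age and zip code combinations drops as well, which is the loss of information described above.
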
Machine Learning Researcher

My main research interests include machine learning systems and data privacy constraints.