Title: Differentially Private Data Release for Mixed-type Data via Latent Factor Models

Speaker: Prof. Zhang Yanqing (张艳青), Yunnan University

Venue: Room 319, Weige Hall (维格堂)

Time: December 3, 9:30-10:30

Abstract: Differential privacy is a privacy-preserving framework that allows synthetic data or statistical analysis results to be released while disclosing minimal private information about individual records. Balancing privacy protection against utility remains a central challenge for differential privacy, especially in synthetic data generation. In this paper, we propose a differentially private data synthesis algorithm for correlated mixed-type data based on latent factor models. The proposed method adds a relatively small amount of noise to the synthetic data under a given level of privacy protection while still capturing correlation information. Moreover, the algorithm generates synthetic data with the same data types as the original mixed-type data, which greatly improves the utility of the synthetic data. The key idea is to perturb the factor matrix and the factor loading matrix to construct a synthetic data generation model, and to use privacy-protected link functions so that the synthetic data types are consistent with those of the original data. The method can generate privacy-preserving synthetic data at low computational cost even when the original data are high-dimensional. In theory, we establish the differential privacy properties of the proposed method. Our numerical studies also demonstrate the strong performance of the proposed method in terms of the utility of statistical analyses based on the privacy-preserving synthetic data.
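
To make the key idea in the abstract concrete, here is a toy sketch of its general shape: factorize the data, add noise to the factor and loading vectors, and map the result back through link functions so each column keeps its type. It is not the speaker's algorithm. The function names, the rank-one power iteration used in place of a fitted latent factor model, the -1/+1 encoding of binary columns, and `noise_scale` are all assumptions made for illustration. In particular, `noise_scale` is not calibrated to any sensitivity or privacy budget, so the code as written carries no differential privacy guarantee.

```python
import math
import random


def laplace(rng, scale):
    # The difference of two i.i.d. Exp(1) draws follows Laplace(0, 1).
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def rank_one_factors(Z, iters=50):
    """Power iteration giving f (length n) and unit-norm l (length p), Z ~ f l^T."""
    n, p = len(Z), len(Z[0])
    l = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        f = [sum(Z[i][j] * l[j] for j in range(p)) for i in range(n)]
        g = [sum(Z[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in g)) or 1.0
        l = [v / norm for v in g]
    # Recompute the factor scores so they match the final loadings.
    f = [sum(Z[i][j] * l[j] for j in range(p)) for i in range(n)]
    return f, l


def synthesize(rows, is_binary, noise_scale=0.1, seed=0):
    """Hypothetical helper: perturb factors and loadings, then apply link functions."""
    rng = random.Random(seed)
    # Encode binary columns as -1/+1 so every column is real-valued (an assumption).
    Z = [[(2.0 * v - 1.0) if b else float(v) for v, b in zip(row, is_binary)]
         for row in rows]
    f, l = rank_one_factors(Z)
    # Perturbation step. The scale is a placeholder, NOT a calibrated DP noise level.
    f_noisy = [v + laplace(rng, noise_scale) for v in f]
    l_noisy = [v + laplace(rng, noise_scale) for v in l]
    synthetic = []
    for fi in f_noisy:
        out = []
        for lj, b in zip(l_noisy, is_binary):
            eta = fi * lj  # latent value for this cell
            if b:
                # Logistic link followed by a Bernoulli draw keeps the column binary.
                prob = 1.0 / (1.0 + math.exp(-eta))
                out.append(1 if rng.random() < prob else 0)
            else:
                # Identity link keeps the column continuous.
                out.append(eta)
        synthetic.append(out)
    return synthetic


# Two continuous columns and one binary column.
rows = [[1.2, 0.8, 1], [0.9, 1.1, 1], [-1.0, -0.7, 0], [-1.3, -0.9, 0]]
is_binary = [False, False, True]
syn = synthesize(rows, is_binary)
```

The sketch only shows why perturbing a low-dimensional factorization is attractive: noise is added to n + p numbers in the rank-one case rather than to all n × p cells, and the link step guarantees that binary columns stay binary in the output.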

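Speaker bio: Professor and doctoral supervisor in the Department of Statistics, School of Mathematics and Statistics, Yunnan University; recipient of the National Natural Science Foundation of China Excellent Young Scientists Fund (Overseas) and of the Yunnan Province Xingdian Talent youth program. Her research focuses on differential privacy, recommender systems, tensor data analysis, missing data analysis, and Bayesian analysis, and she has published multiple papers in leading machine learning and statistics journals, including the Journal of Machine Learning Research and the Journal of the American Statistical Association.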

Invited by: Liu Fang (刘芳), Xu Libai (徐礼柏)