Differential privacy based feature selection for classification

Esra Var, Ali İnan

Abstract


One of the most important preliminary steps in data mining and machine learning is choosing a suitable subset of the attributes of the data to be analyzed. For classification methods, this selection is made by measuring how strongly each attribute is associated with the class attribute. Many privacy-preserving classification solutions exist, but no feature selection solutions have been developed for them. This study presents novel feature selection methods based on differential privacy, the most comprehensive and secure solution known in statistical database security. The proposed methods were integrated with WEKA, a widely used data mining library, and experimental results show that they have a positive effect on classification performance.
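To make the idea concrete, the following is a minimal sketch of one way feature scoring can be made differentially private. It is not the method proposed in the paper, whose details are not given in this abstract. It assumes categorical attributes with publicly known value domains, uses information gain as the association measure, and perturbs each contingency-table cell with Laplace noise of scale 1/ε. All function names and parameters (`noisy_info_gain`, `select_features`, `domains`, `epsilon`) are hypothetical and chosen only for illustration.

```python
import math
import random
from itertools import product


def laplace(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is a Laplace(0, scale) random variable.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def entropy(counts):
    # Shannon entropy (in bits) of a list of non-negative counts.
    total = sum(counts)
    if total <= 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def noisy_info_gain(values, labels, value_domain, label_domain, epsilon, rng):
    # Exact contingency table over the (assumed public) domains.
    table = {cell: 0 for cell in product(value_domain, label_domain)}
    for v, y in zip(values, labels):
        table[(v, y)] += 1
    # Adding or removing one record changes exactly one cell by 1,
    # so Laplace(1/epsilon) noise per cell is used. Clamping at zero
    # is post-processing and does not affect the privacy guarantee.
    noisy = {cell: max(0.0, n + laplace(1.0 / epsilon, rng))
             for cell, n in table.items()}
    class_totals = [sum(noisy[(v, y)] for v in value_domain)
                    for y in label_domain]
    total = sum(class_totals)
    if total <= 0:
        return 0.0
    conditional = 0.0
    for v in value_domain:
        row = [noisy[(v, y)] for y in label_domain]
        conditional += (sum(row) / total) * entropy(row)
    return entropy(class_totals) - conditional


def select_features(columns, labels, domains, label_domain, k, epsilon, seed=None):
    rng = random.Random(seed)
    # Assumption: the total budget is split evenly across features
    # (sequential composition), since every feature queries the same records.
    eps_each = epsilon / len(columns)
    scores = {
        name: noisy_info_gain(col, labels, domains[name], label_domain, eps_each, rng)
        for name, col in columns.items()
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores
```

A call such as `select_features(columns, labels, domains, [0, 1], k=5, epsilon=1.0)` returns the names of the five highest-scoring attributes together with their noisy scores. Smaller values of `epsilon` give stronger privacy but noisier rankings. Python's `random` module is used here only for readability; a real deployment would need a secure noise source.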

Keywords


differential privacy, classification, feature selection







Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.