Web Content Classification Using Artificial Neural Networks
DOI:
https://doi.org/10.15612/BD.2008.332Keywords:
Artificial neural networks, Text categorization, Content classification, Web page categorization, Information managementAbstract
Recent developments and widespread usage of the Internet have made business and processes to be completed faster and easily in electronic media. The increasing size of the stored, transferred and processed data brings many problems that affect access to information on the Web. Because of users’ need get to access to the information in electronic environment quickly, correctly and appropriately, different methods of classification and categorization of data are strictly needed. Millions of search engines should be supported with new approaches every day in order for users to get access to relevant information quickly. In this study, Multilayered Perceptrons (MLP) artificial neural network model is used to classify the web sites according to the specified subjects. A software is developed to select the feature vector, to train the neural network and finally to categorize the web sites correctly. It is considered that this intelligent approach will provide more accurate and secure platform to the Internet users for classifying web contents precisely.
Downloads
References
Apte, C., Damerau, F. ve Weiss, S.M. (1994). Automated learning of decision rules for text categorization. ACM Transactions on Information Systems, 12, 233–251.
Google. (2007). 20 Nisan 2007 tarihinde http://www.google.com.tr/intl/tr/why_use.html adresinden erişildi.
Haykin, S. (1994). Neural networks: A comprehensive foundation. New York: Macmillan College.
Joachims, T. (1997). Text categorization with support vector machines: Learning with many relevant features (Technical Report LS-8 Report: 23). Dortmund: University of Dortmund.
Joachims, T. (1998). Text categorization with support vector machines: Learning with many relevant features. C. N'edellec ve C. Rouveirol (Ed.), Proceedings of the European Conference on Machine Learning içinde (s. 137-142). Berlin: Springer.
Levenberg, K. (1944). A method for the solution of certain nonlinear problems in least squares. Quarterly of Applied Mathematics, 2, 164-168.
Lewis, D. ve Ringuette, M. (1994). A comparison of two learning algorithms for text categorization. Proceedings of the Third Annual Symposium on Document Analysis and Information Retrieval (SDAIR’94) içinde (s. 81-93). Las Vegas.
Lewis, D.D., Schapire, R.E., Callan, J.P. ve Papka, R. (1996). Training algorithms for linear text classifiers. Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval içinde (s. 298-306). New York: ACM.
Marquardt, D.W. (1963). An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial and Applied Mathematics, 11, 431-441.
McCallum, A. ve Nigam, K. (1998). A comparison of event models for naive Bayes text classification. Learning for Text Categorization: Papers from the 1998 Workshop içinde (s. 41-48). San Francisco, CA: AAAI Press.
Miniwatts International Inc. Internet Usage Statistics: The Big Picture. (2006). 01 Aralık 2006 tarihinde http://www.internetworldstats.com/stats.htm adresinden erişildi.
Moulinier, I. ve Ganascia, J.G. (1996). Applying an existing machine learning algorithm to text categorization. S. Wermter, E. Riloff ve G. Scheler (Ed.), Connectionist, statistical, and symbolic approaches to learning for natural language processing içinde (s. 343-354). Heidelberg: Springer Verlag.
Ng, H.T., Goh, W.B. ve Low, K.L. (1997). Feature selection, perceptron learning, and a usability case study for text categorization. N.J. Belkin, A.D. Narasimhalu, P. Willett ve W. Hersh (Ed.), Proceedings of the 20th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval içinde (s. 67-73). Philadelphia, PA: ACM.
Ruiz, M.E. ve Srinivasan, P. (2002). Hierarchical text categorization using neural networks. Information Retrieval, 5, 87-118.
Sağıroğlu, Ş., Beşdok, E. ve Erler, M. (2003). Mühendislikte yapay zekâ uygulamaları I: Yapay sinir ağları. Kayseri: Ufuk Kitabevi.
Shanks, V. ve Williams, H.E. (2001). Fast categorisation of large document collections. Proceedings: Eight Symposium on String Processing and Information Retrieval November 13-15, Laguna de San Rafael, Chile içinde (s. 194-204). San Rafael, Chile: IEEE Computer Society.
Wiener, E.D., Pedersen, J.O. ve Weigend, A.S. (1995). A neural network approach to topic spotting. Proceedings of the Fourth Annual Symposium on Document Analysis and Information Retrieval (SDAIR'95) içinde (s. 317-332). Las Vegas.
Witten, I.H., Moffat, A. ve Bell, T.C. (1999). Managing gigabytes: Compressing and indexing documents and images. San Francisco, CA: Morgan Kaufmann.
Yang, Y. ve Pedersen, J.O. (1997). A comparative study on feature selection in text categorization. Proceedings of the Fourteenth International Conference on Machine Learning (ICML’97) içinde (s. 412-420). San Francisco, CA: Morgan Kaufmann.
Yu, E.S. ve Liddy, E.D. (1999). Feature selection in text categorization using the Baldwin effect. Proceedings of IJCNN '99 (International Joint Conference on Neural Networks) içinde (s. 2924-2927). Washington, DC: IEEE Press.
Published
How to Cite
Issue
Section

This work is licensed under a Creative Commons Attribution 4.0 International License.