PERFORMANCE EVALUATION OF SEMANTIC BASED AND ONTOLOGY BASED TEXT DOCUMENT CLUSTERING TECHNIQUES

Ashish Punya

Department of Computer Science & Engineering, KEC Ghaziabad

Meenakshi Rana

Department of Computer Science & Engineering, KEC Ghaziabad

181-190

Vol: 5, Issue: 3, 2015

Receiving Date: 2015-06-24

Acceptance Date: 2015-07-21

Publication Date: 2015-08-22

Abstract

Text clustering typically operates in a high-dimensional feature space, which makes it difficult in virtually all practical settings. In addition, given a particular clustering result, it is usually very hard to explain why the clusters were constructed the way they were. In this paper, we propose an approach that applies background knowledge during preprocessing in order to improve clustering results and to allow selection between results. We build several views of the documents by basing the selection of text features on a hierarchy of concepts. From these aggregations, we compute multiple clustering results using K-Means. The results can be distinguished and explained by the corresponding selection of concepts in the ontology, and they compare favourably with a sophisticated baseline preprocessing strategy. The amount of digital information that is created and used is growing steadily along with the development of sophisticated hardware and software. This has increased the need for powerful algorithms that can interpret such data and extract interesting knowledge from it. Data mining has been successfully exploited for this purpose. Text mining, a category of data mining, considers only digital documents or text. Text clustering is the process of grouping documents such that documents in the same cluster are similar to one another and dissimilar from those in other clusters. This paper studies two sophisticated algorithms. The first is a hybrid method that combines a pattern recognition process with semantics-driven methods for clustering documents, while the second uses an ontology-based approach to cluster documents. Through experiments, the performance of both algorithms is analyzed in terms of clustering efficiency and clustering speed.
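
The concept-hierarchy idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy hierarchy `PARENT`, the sample documents `DOCS`, and the helper names `generalize`, `concept_vectors`, and `kmeans` are all hypothetical, and the plain K-Means with farthest-first initialisation is an assumption chosen only to keep the example deterministic. Each term is replaced by an ancestor concept a fixed number of levels up, the counts are aggregated per concept, and K-Means is run on each resulting view.

```python
from collections import Counter

# Hypothetical toy hierarchy (not from the paper): term -> parent concept.
PARENT = {
    "apple": "fruit", "banana": "fruit",
    "carrot": "vegetable", "potato": "vegetable",
    "fruit": "food", "vegetable": "food",
    "car": "vehicle", "truck": "vehicle",
    "vehicle": "machine",
}

def generalize(term, steps):
    """Climb up to `steps` levels in the hierarchy, stopping at a root."""
    for _ in range(steps):
        if term not in PARENT:
            break
        term = PARENT[term]
    return term

def concept_vectors(docs, steps):
    """Aggregate term counts into concept counts, one vector per document."""
    counts = [Counter(generalize(t, steps) for t in doc) for doc in docs]
    vocab = sorted(set().union(*counts))
    return vocab, [[float(c[v]) for v in vocab] for c in counts]

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20):
    """Plain K-Means; farthest-first initialisation keeps it deterministic."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        )
    labels = []
    for _ in range(iters):
        # Assignment step: nearest center for every point.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                  for p in points]
        # Update step: each center becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical tokenised documents.
DOCS = [
    ["apple", "banana", "apple"],
    ["carrot", "potato"],
    ["car", "truck", "car"],
    ["truck", "car"],
]

# One clustering per view: 0 = raw terms, 1 and 2 = increasingly general concepts.
for steps in (0, 1, 2):
    vocab, vectors = concept_vectors(DOCS, steps)
    print(steps, vocab, kmeans(vectors, k=2))
```

Each view comes with its own concept vocabulary, so a clustering can be described by the concepts that were selected for it, which is the explanatory benefit the abstract refers to.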

Keywords: communication; Repositories; powerful new technology; warehouses
