Ashish Punya
Department of Computer Science & Engineering, KEC Ghaziabad
Meenakshi Rana
Department of Computer Science & Engineering, KEC Ghaziabad
Download PDFText clustering typically involves clustering in a high dimensional space, which appears difficult with regard to virtually all practical settings. In addition, given a particular clustering result it is typically very hard to come up with a good explanation of why the text clusters have been constructed the way they are. In this paper, we propose a new approach for applying background knowledge during pre processing in order to improve clustering results and allow for selection between results. We built various views basing our selection of text features on a hierarchy of concepts. Based on these aggregations, we compute multiple clustering results using K-Means. The results may be distinguished and explained by the corresponding selection of concepts in the ontology. Our results compare favourably with a sophisticated baseline pre processing strategy. The amount of digital information is created and used is steadily growing along with the development of sophisticated hardware and software. This has increased the need for powerful algorithms that can interpret and extract interesting knowledge from these data. Data mining is a technique that has been successfully exploited for this purpose. Text mining, a category of data mining, considers only digital documents or text. Text Clustering is the process of grouping text or documents such that the document in the same cluster are similar and are dissimilar from the one in other clusters. This paper studies the working of two sophisticated algorithms. The first work is a hybrid method that combines pattern recognition process with semantic driven methods for clustering documents, while the second uses an ontology-based approach to cluster documents. Through experiments, the performance of both the selected algorithms is analyzed in terms of clustering efficiency and speed of clustering.
Keywords: communication; Repositories; powerful new technology; warehouses
Disclaimer: All papers published in IJRST will be indexed on Google Search Engine as per their policy.