Developing an Integrated Model to Enhance the Efficiency in the Detection and Erasing of Duplicate Files from Cloud

Rishit Garkhel


Vol: 10, Issue: 4, 2020

Receiving Date: 2020-08-19 Acceptance Date:


Publication Date:




Detecting and eliminating duplicate records is one of the major problems in the broad area of data cleaning and data quality. A single logical real-world entity often has multiple representations in a data warehouse. Duplicate elimination is difficult because duplicates arise from several kinds of errors, such as typographical mistakes and differing representations of the same logical value. The primary aim of this study is to detect both exact and inexact duplicates by applying duplicate detection and elimination rules. This approach is intended to improve the efficiency of data handling. The importance of data accuracy and quality has grown with the explosion in data volume. In the duplicate elimination step, only one copy of each set of exact duplicate records or files is retained, and the other copies are removed. The elimination process is essential for producing clean data. Before elimination, similarity threshold values are calculated for all records in the data set; these threshold values are central to the elimination process.

Keywords: Duplicate record detection; Duplication; data linkage
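
The elimination step described in the abstract can be sketched as follows. This is only an illustrative sketch, not the model developed in the paper: the function names, the sample records, the use of Python's difflib.SequenceMatcher as the similarity measure, and the threshold of 0.9 are all assumptions chosen for the example. Each record is compared with the records already kept, and it is retained only if no kept record reaches the similarity threshold.

```python
from difflib import SequenceMatcher


def normalize(record: str) -> str:
    """Lowercase and collapse whitespace so trivial variations compare equal."""
    return " ".join(record.lower().split())


def similarity(a: str, b: str) -> float:
    """Return a similarity score in [0, 1] for two records (assumed measure)."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def eliminate_duplicates(records: list[str], threshold: float = 0.9) -> list[str]:
    """Keep the first record of each group of exact or near-duplicate records."""
    kept: list[str] = []
    for record in records:
        # A record is a duplicate if it is similar enough to one already kept.
        if not any(similarity(record, k) >= threshold for k in kept):
            kept.append(record)
    return kept


# Hypothetical sample data: one exact duplicate and one typographical variant.
records = [
    "John Smith, 12 Baker Street, London",
    "John Smith, 12 Baker Street, London",
    "Jon Smith, 12 Baker Street, London",
    "Maria Garcia, 4 Elm Road, Leeds",
]
unique = eliminate_duplicates(records)
print(unique)
```

With this sample data, the exact copy and the typographical variant are both dropped, so only the first and last records are retained. The pairwise comparison is quadratic in the number of records, so a larger data set would need blocking or hashing to narrow the candidate pairs first.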



Disclaimer: All papers published in IJRST will be indexed on Google Search Engine as per their policy.