Abstract:
This paper presents the results of an experimental study of conventional document clustering techniques used in commercial settings to date. In particular, we compare the two main approaches to document clustering: agglomerative hierarchical clustering and K-means. We also develop and implement a checker algorithm that detects duplication between the content of a document and the rest of the documents in the cloud, and an algorithm for classifying cloud data. Classification is based on the date on which the data was uploaded and on how often the client accesses that data. We take the ratio of these two vectors to generate a score that rates each document in the classification. We propose an explanation for these results based on an analysis of the specifics of the clustering algorithms and the nature of document data.
Keywords: algorithm, commercial, classification, hierarchical, nature.
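
The classification score mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which quantity is the numerator, so the version below assumes the score is the access count divided by the document's age in days; the function name document_score, the sample file names, and the sample values are all hypothetical and are not taken from the paper.

```python
from datetime import date


def document_score(uploaded: date, access_count: int, today: date) -> float:
    """Hypothetical score: number of accesses per day since upload.

    Assumption: the ratio is access count over age in days; the abstract
    only says that a ratio of the two quantities is taken.
    """
    # Treat documents uploaded today as one day old to avoid dividing by zero.
    age_days = max((today - uploaded).days, 1)
    return access_count / age_days


# Illustrative data only: file name -> (upload date, access count).
docs = {
    "report.txt": (date(2024, 1, 1), 300),
    "notes.txt": (date(2024, 3, 1), 10),
}
today = date(2024, 3, 31)

scores = {name: document_score(up, n, today) for name, (up, n) in docs.items()}

# Rank documents from highest to lowest score.
ranked = sorted(scores, key=scores.get, reverse=True)
```

Under this assumption, a document that is accessed often relative to its age ranks above one that is accessed rarely; a different orientation of the ratio would simply reverse the ordering.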