Emc research survey redwood city

12/15/2023

With the emergence of 5G technologies, a tremendous amount of data is being generated very quickly, accumulating into the massive volumes termed Big Data. The attributes of Big Data, such as huge volume, a diverse variety of data, high velocity and multivalued data, make data analytics difficult, and extracting meaningful information from such volumes of data is not an easy task (Bhadani & Jothimani, 2016). As an indispensable tool of data mining, clustering algorithms play an essential role in Big Data analysis. Therefore, this survey aims to present a comprehensive summary of the previous studies in the field of Big Data clustering using Apache Spark during the span of 2010–2020. It also highlights new research directions in the field of clustering massive data.

Traditional clustering methods are greatly challenged by the recent massive growth of data. Therefore, several research works have proposed novel designs for clustering methods that leverage the benefits of Big Data platforms such as Apache Spark, which is designed for fast, distributed processing of massive data. However, Spark-based clustering research is still in its early days, and to the best of our knowledge, no survey has been conducted on Spark-based clustering of Big Data. In this systematic survey, we investigate the existing Spark-based clustering methods in terms of their support for the characteristics of Big Data. Moreover, we propose a new taxonomy for the Spark-based clustering methods.

Clustering is a popular unsupervised learning method that is extensively used in data mining, machine learning and pattern recognition. The procedure groups individual, distinct points in such a way that points in the same cluster are similar to each other and dissimilar to points of other clusters.
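
To make the idea of grouping similar points concrete, here is a minimal single-machine sketch of k-means, one familiar clustering method. The text above does not single out any particular algorithm, so the choice of k-means, the function name `kmeans`, the evenly spaced initial centroids and the toy data are all assumptions made for illustration. It is not a Spark implementation.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain Lloyd-style k-means on a list of equal-length numeric tuples."""
    # Simplifying assumption: take evenly spaced points as the initial
    # centroids so the example is deterministic. Real implementations
    # usually use random or k-means++ initialisation.
    step = len(points) // k
    centroids = [points[i * step] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                mean = tuple(sum(axis) / len(members) for axis in zip(*members))
                new_centroids.append(mean)
            else:
                # Keep the old centroid if a cluster ends up empty.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # Centroids stopped moving, so the labels are stable.
        centroids = new_centroids
    return labels, centroids


# Toy data (hypothetical): two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The assignment step is independent for every point, which is the part a distributed platform such as Apache Spark can spread across workers. How each surveyed method actually does this is not covered by this sketch.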
