Big Data Analysis: Challenges, Issues and Tools


Every day, we create 2.5 quintillion bytes of data, so much that 90% of the data in the world today has been created in the last two years alone. Big data analysis mines previously untouched data to derive new insights that can be integrated into business operations. However, as the amount of data increases exponentially, current techniques are becoming obsolete. Dealing with big data requires comprehensive coding skills, domain knowledge, and statistics.

In the digital world, data are generated from various sources, and the rapid shift to digital technologies has led to the growth of big data. Big data provides evolutionary breakthroughs in many fields through the collection of large datasets. In general, the term refers to collections of large and complex datasets that are difficult to process using traditional database management tools or data processing applications. Volume is the amount of data being generated every day, whereas velocity is the rate of growth and how fast the data are gathered for analysis. Variety describes the types of data involved, such as structured, unstructured, and semi-structured data. The fourth V, veracity, covers availability and accountability. The prime objective of big data analysis is to process data of high volume, velocity, variety, and veracity using various traditional and computationally intelligent techniques. Some of these extraction methods for obtaining helpful information were discussed by Gandomi and Haider.

Extracting precise knowledge from the available big data is therefore a foremost issue. Most existing approaches in data mining are not able to handle large datasets successfully. A key problem in the analysis of big data is the lack of coordination between database systems and analysis tools such as data mining and statistical analysis. These challenges generally arise when we wish to perform knowledge discovery and representation for practical applications. A fundamental problem is how to quantitatively describe the essential characteristics of big data, and there is a need to consider the epistemological implications of the data revolution. Additionally, studying the complexity theory of big data will help us understand the essential characteristics and formation of complex patterns in big data, simplify its representation, achieve better knowledge abstraction, and guide the design of computing models and algorithms for big data.


Challenges in Big Data Analytics

In recent years, big data has accumulated in several domains such as health care, public administration, retail, biochemistry, and other interdisciplinary scientific research. Web-based applications encounter big data frequently, for example in social computing, Internet text and documents, and Internet search indexing. Social computing includes social network analysis, online communities, recommender systems, reputation systems, and prediction markets, whereas Internet search indexing includes ISI, IEEE Xplore, Scopus, and Thomson; together these pose an unprecedented challenge for researchers, because existing algorithms may not always respond in adequate time when dealing with such high-dimensional data. Automating this process and developing new machine learning algorithms that ensure consistency is a major challenge of recent years. In addition, the clustering of large datasets, which helps in analyzing big data, is of prime concern. The major challenge here is to pay more attention to designing storage systems and to developing efficient data analysis tools that provide guarantees on the output when the data come from different sources. Furthermore, the design of machine learning algorithms to analyze data is essential for improving efficiency and scalability.
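To make the clustering challenge above concrete, here is a minimal sketch of clustering a dataset in mini-batches rather than in one pass, assuming scikit-learn and NumPy are available; the synthetic data and the chosen chunk sizes are illustrative only and are not part of the original discussion.

    # A minimal sketch of clustering a large dataset in mini-batches,
    # assuming scikit-learn and NumPy; the data here are synthetic.
    import numpy as np
    from sklearn.cluster import MiniBatchKMeans

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1_000_000, 8))          # stand-in for a large feature matrix

    model = MiniBatchKMeans(n_clusters=10, batch_size=10_000, random_state=0)
    for start in range(0, X.shape[0], 100_000):  # feed the data in chunks
        model.partial_fit(X[start:start + 100_000])

    labels = model.predict(X[:5])                # cluster assignments for a few rows
    print(labels)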

Data Storage and Analysis

In recent years, the size of data has grown exponentially through various means such as mobile devices, aerial sensing technologies, remote sensing, radio-frequency identification readers, etc. Storing these data is costly, and much of it is ultimately ignored or deleted because there is not enough space to keep it. Therefore, the first challenge for big data analysis concerns storage media and input/output speed. In such cases, data ingestion tools can support data accessibility, which must be a top priority for knowledge discovery and representation; the prime reason is that data must be accessible easily and promptly for further analysis. In past decades, analysts used hard disk drives to store data, but these offer slower random input/output performance than sequential input/output. To overcome this limitation, solid-state drives (SSD) and phase-change memory (PCM) were introduced. However, available storage technologies still cannot deliver the performance required for processing big data.
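As a rough illustration of the sequential-versus-random input/output gap mentioned above, the following sketch times both access patterns on a local file; the file name, file size, and block size are arbitrary choices, and operating-system caching will affect the absolute numbers.

    # A rough sketch contrasting sequential and random reads on a local file;
    # "sample.bin" is a hypothetical file, and timings depend on the medium.
    import os, random, time

    path = "sample.bin"
    with open(path, "wb") as f:                      # create a 64 MB test file
        f.write(os.urandom(64 * 1024 * 1024))

    block = 4096
    with open(path, "rb") as f:
        t0 = time.perf_counter()
        while f.read(block):                         # sequential scan
            pass
        seq = time.perf_counter() - t0

        size = os.path.getsize(path)
        t0 = time.perf_counter()
        for _ in range(10_000):                      # random 4 KB reads
            f.seek(random.randrange(0, size - block))
            f.read(block)
        rnd = time.perf_counter() - t0

    print(f"sequential scan: {seq:.3f}s, 10k random reads: {rnd:.3f}s")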

Scalability and Visualization of Data

The most important challenges for big data analysis techniques are scalability and security. Because data size is scaling much faster than CPU speed, there has been a dramatic shift in processor technology towards chips with an increasing number of cores. This shift in processors has driven the development of parallel computing. Real-time applications such as navigation, social networks, finance, and Internet search require both parallel computing and timeliness. The objective of visualizing data is to present it more adequately using techniques from graph theory; graphical visualization provides the link between the data and its proper interpretation. Online marketplaces such as Flipkart, Amazon, and eBay have millions of users and billions of goods for sale each month, which generates a great deal of data. To this end, some companies use the tool Tableau for big data visualization; it can transform large and complex data into intuitive pictures, helping a company's employees visualize search relevance, monitor the latest customer feedback, and analyze customer sentiment.
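To illustrate the parallel-computing point above, here is a minimal sketch that spreads a CPU-bound aggregation across several cores using Python's standard multiprocessing module; the workload and the number of workers are made up for the example.

    # A minimal sketch of spreading an aggregation over CPU cores with the
    # standard-library multiprocessing module; the workload here is synthetic.
    from multiprocessing import Pool

    def partial_sum(chunk):
        # CPU-bound work done independently in each worker process
        return sum(x * x for x in chunk)

    if __name__ == "__main__":
        data = list(range(2_000_000))
        chunks = [data[i::8] for i in range(8)]     # split the data across 8 workers
        with Pool(processes=8) as pool:
            total = sum(pool.map(partial_sum, chunks))
        print(total)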

Open Research Issues in Big Data Analytics

Big data analytics and data science are becoming the research focal point in industries and academia. Data science aims at researching big data and knowledge extraction from data. Applications of big data and data science include information science, uncertainty modeling, uncertain data analysis, machine learning, statistical learning, pattern recognition, data warehousing, and signal processing. Effective integration of technologies and analysis will result in predicting the future drift of events.

The research issues pertaining to big data analysis are classified into broad categories, namely the Internet of Things (IoT), cloud computing, bio-inspired computing, and quantum computing.

IoT for Big Data Analytics

The Internet has restructured global interrelations, the art of business, cultural revolutions, and an unbelievable number of personal characteristics. Currently, machines are getting in on the act, controlling innumerable autonomous gadgets via the Internet and creating the Internet of Things (IoT). Thus, appliances are becoming users of the Internet, just like humans with web browsers. The Internet of Things is attracting the attention of researchers for its promising opportunities and challenges, and it will have an important economic and societal impact on the future construction of information, network, and communication technology. The new rule of the future will be that, eventually, everything will be connected and intelligently controlled.

Bio-Inspired Computing for Big Data Analytics

Bio-inspired computing is a technique inspired by nature to address complex real-world problems. Biological systems are self-organized, without central control. A bio-inspired cost-minimization mechanism searches for and finds the optimal data-service solution, considering the cost of data management and service maintenance. These techniques use biological molecules such as DNA and proteins to conduct computations involving the storing, retrieving, and processing of data. A significant feature of such computing is that it integrates biologically derived materials to perform computational functions and achieve intelligent performance. These systems are well suited to big data applications.

Bio-inspired computing techniques play a key role in intelligent data analysis and its application to big data. These algorithms help in performing data mining on large datasets because of their use in optimization. Their main advantage is their simplicity and their rapid convergence to an optimal solution when solving service-provisioning problems. Some applications along these lines using bio-inspired computing were discussed in detail by Cheng et al. From those discussions, we can observe that bio-inspired computing models provide smarter interactions, help cope with inevitable data losses, and assist in handling ambiguities. Hence, it is believed that in the future bio-inspired computing may help in handling big data to a large extent.
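As one concrete, deliberately simplified instance of a bio-inspired cost-minimization mechanism, the sketch below runs a toy particle swarm optimization over a made-up "service cost" function; the cost model, swarm size, and coefficients are assumptions for illustration, not the specific mechanism described above.

    # A toy particle swarm optimization (PSO) sketch minimizing a made-up
    # "service cost" function; the cost model and parameters are illustrative only.
    import numpy as np

    def cost(x):                                   # hypothetical data-service cost
        return np.sum((x - 3.0) ** 2, axis=1)

    rng = np.random.default_rng(1)
    pos = rng.uniform(-10, 10, size=(30, 5))       # 30 particles, 5 decision variables
    vel = np.zeros_like(pos)
    best_pos = pos.copy()
    best_val = cost(pos)
    g_best = best_pos[np.argmin(best_val)]

    for _ in range(200):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = 0.7 * vel + 1.5 * r1 * (best_pos - pos) + 1.5 * r2 * (g_best - pos)
        pos += vel
        val = cost(pos)
        improved = val < best_val
        best_pos[improved], best_val[improved] = pos[improved], val[improved]
        g_best = best_pos[np.argmin(best_val)]

    print("minimum cost found:", best_val.min())   # converges towards x = 3 everywhere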

Tools for Big Data Processing

A large number of tools are available to process big data. In this section, I discuss some current techniques for analyzing big data, with emphasis on three important emerging tools, namely MapReduce, Apache Spark, and Storm. Most of the available tools concentrate on batch processing, stream processing, or interactive analysis. Most batch processing tools are based on the Apache Hadoop infrastructure, such as Mahout and Dryad. Stream data applications are mostly used for real-time analytics; examples of large-scale streaming platforms are Storm and Splunk. Interactive analysis allows users to interact directly, in real time, with their own analysis; Dremel and Apache Drill are examples of such tools.

Apache Hadoop and MapReduce

Hadoop works with two kinds of nodes: a master node and worker nodes. The master node divides the input into smaller subproblems and distributes them to the worker nodes in the map step; thereafter, the master node combines the outputs of all the subproblems in the reduce step. Together, Hadoop and MapReduce form a powerful software framework for solving big data problems, and they also help with fault-tolerant storage and high-throughput data processing.
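The division of work described above can be mimicked locally in plain Python; the sketch below imitates the map, shuffle, and reduce steps on a toy word-count problem, whereas real Hadoop distributes these steps across master and worker nodes.

    # A plain-Python imitation of the map -> shuffle -> reduce flow described above;
    # real Hadoop distributes these steps across nodes, this just mimics the idea.
    from collections import defaultdict

    documents = ["big data needs big tools", "data tools for big data"]

    # Map step: each "worker" emits (word, 1) pairs for its input split
    mapped = [(word, 1) for doc in documents for word in doc.split()]

    # Shuffle step: group intermediate pairs by key
    groups = defaultdict(list)
    for word, count in mapped:
        groups[word].append(count)

    # Reduce step: combine the values for each key
    word_counts = {word: sum(counts) for word, counts in groups.items()}
    print(word_counts)    # {'big': 3, 'data': 3, ...}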

Apache Spark

Apache Spark is an open-source big data processing framework built for fast processing and sophisticated analytics. It is easy to use; it was originally developed in 2009 at UC Berkeley's AMPLab and open-sourced in 2010 as an Apache project. Spark lets you quickly write applications in Java, Scala, or Python. In addition to MapReduce operations, it supports SQL queries, streaming data, machine learning, and graph data processing. Spark runs on top of the existing Hadoop Distributed File System (HDFS) infrastructure to provide enhanced and additional functionality. Spark consists of three components, namely the driver program, the cluster manager, and the worker nodes. The driver program serves as the starting point of execution of an application on the Spark cluster. A major advantage is the support for deploying Spark applications in an existing Hadoop cluster.
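A minimal word-count example in PySpark, Spark's Python interface, looks roughly like this; the HDFS input path is a placeholder, and a working Spark installation is assumed.

    # A minimal PySpark word count, assuming the pyspark package is installed;
    # "hdfs:///data/input.txt" is a placeholder path.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("WordCount").getOrCreate()

    lines = spark.read.text("hdfs:///data/input.txt").rdd.map(lambda row: row[0])
    counts = (lines.flatMap(lambda line: line.split())
                   .map(lambda word: (word, 1))
                   .reduceByKey(lambda a, b: a + b))

    for word, count in counts.take(10):            # inspect a few results on the driver
        print(word, count)

    spark.stop()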

Apache Drill

Apache Drill is another distributed system for interactive analysis of big data. It offers more flexibility in the query languages, data formats, and data sources it supports, and it is specially designed to exploit nested data. It also aims to scale to 10,000 servers or more and to process petabytes of data and trillions of records in seconds. Drill uses HDFS for storage and MapReduce to perform batch analysis. Apache Drill is thus one of the big data platforms that support interactive analysis, and such tools help us in developing big data projects.
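A hedged sketch of how such an interactive query might be submitted: Drill exposes a REST interface, and the snippet below posts a SQL query to a local instance using the requests library; the port, endpoint, response shape, and the JSON file being queried are assumptions based on Drill's documented defaults rather than details given in the text.

    # A sketch of submitting a SQL query to a local Drill instance over its REST
    # interface; the port (8047), endpoint, response shape, and file path are
    # assumptions, and the JSON file is hypothetical.
    import requests

    query = "SELECT name, amount FROM dfs.`/data/orders.json` LIMIT 10"
    resp = requests.post("http://localhost:8047/query.json",
                         json={"queryType": "SQL", "query": query})
    resp.raise_for_status()
    for row in resp.json().get("rows", []):        # each row comes back as a dict
        print(row)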

Suggestions for Future Work

The amount of data collected from various applications all over the world, across a wide variety of fields, is expected to double every two years. These data have no utility unless they are analyzed to obtain useful information, which necessitates the development of techniques that facilitate big data analysis. The development of powerful computers is a boon for implementing these techniques and building automated systems. The transformation of data into knowledge is by no means an easy task for high-performance, large-scale data processing, which includes exploiting the parallelism of current and upcoming computer architectures for data mining. Moreover, these data may involve uncertainty in many different forms; for example, the data collected may have missing values. These values need to be imputed, or the tuples containing them must be eliminated from the dataset before analysis. More importantly, these new challenges may compromise, and sometimes even degrade, the performance, efficiency, and scalability of dedicated data-intensive computing systems.
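The two ways of handling missing values mentioned above, filling them in or discarding the affected tuples, can be sketched with pandas as follows; the CSV file name is hypothetical.

    # A small pandas sketch of the two options above for missing values:
    # impute them or drop the affected tuples; "sensor_readings.csv" is hypothetical.
    import pandas as pd

    df = pd.read_csv("sensor_readings.csv")

    imputed = df.fillna(df.mean(numeric_only=True))   # option 1: fill gaps with column means
    cleaned = df.dropna()                             # option 2: discard incomplete tuples

    print(len(df), len(cleaned), imputed.isna().sum().sum())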

Conclusion

In recent years, data have been generated at a dramatic pace, and analyzing these data is challenging for the average user. To this end, this paper has surveyed the research issues, challenges, and tools used to analyze big data. From this survey, it is understood that every big data platform has its own focus: some are designed for batch processing whereas others are good at real-time analytics, and each platform also has specific functionality. Different techniques are used for the analysis, including statistical analysis, machine learning, data mining, intelligent analysis, cloud computing, quantum computing, and data stream processing. I believe that in the future researchers will pay more attention to these techniques to solve problems of big data effectively and efficiently.
