Class 12 AI Unit 5 Introduction to Big Data and Data Analytics Book Solution

EXERCISES Solution
A. Multiple Choice questions
1. What does “Volume” refer to in the context of big data?
a) The variety of data types
b) The speed at which data is generated
c) The amount of data generated
d) The veracity of the data
Ans: c) The amount of data generated
2. Which of the following is a key characteristic of big data?
a) Structured format
b) Easily manageable size
c) Predictable patterns
d) Variety
Ans: d) Variety
3. Which of the following is NOT one of the V’s of big data?
a) Velocity
b) Volume
c) Verification
d) Variety
Ans: c) Verification
4. What is the primary purpose of data preprocessing in big data analytics?
a) To increase data volume
b) To reduce data variety
c) To improve data quality
d) To speed up data processing
Ans: c) To improve data quality
5. Which technique is commonly used for analyzing large datasets to discover patterns and relationships?
a) Linear regression
b) Data mining
c) Decision trees
d) Naive Bayes
Ans: b) Data mining
6. Which term describes the process of extracting useful information from large datasets?
a) Data analytics
b) Data warehousing
c) Data integration
d) Data virtualization
Ans: a) Data analytics
7. Which of the following is a potential benefit of big data analytics?
a) Decreased data security
b) Reduced operational efficiency
c) Improved decision-making
d) Reduced data privacy
Ans: c) Improved decision-making
8. What role does Hadoop play in big data processing?
a) Hadoop is a programming language used for big data analytics.
b) Hadoop is a distributed file system for storing and processing big data.
c) Hadoop is a data visualization tool.
d) Hadoop is a NoSQL database management system.
Ans: b) Hadoop is a distributed file system for storing and processing big data.
9. What is the primary challenge associated with the veracity aspect of big data?
a) Handling large volumes of data
b) Ensuring data quality and reliability
c) Dealing with diverse data types
d) Managing data processing speed
Ans: b) Ensuring data quality and reliability
B. True or False
1. Big data refers to datasets that are too large to be processed by traditional database systems.
Ans: True
2. Structured data is the primary type of data processed in big data analytics, making up the majority of datasets.
Ans: False
3. Veracity refers to the trustworthiness and reliability of data in big data analytics.
Ans: True
4. Real-time analytics involves processing and analyzing data as it is generated, without any delay.
Ans: True
5. Cloud computing is the only concept used in Big Data Analytics.
Ans: False
6. A CSV file is an example of structured data.
Ans: False
7. “Positive, Negative, and Neutral” are terms related to Sentiment Analysis.
Ans: True
8. Data preprocessing is a critical step in big data analytics, involving cleaning, transforming, and aggregating data to prepare it for analysis.
Ans: True
9. To analyze vast collections of textual materials to capture key concepts, trends, and hidden relationships, the concept of Text mining is used.
Ans: True
C. Short answer questions
1. Define the term Big Data.
Ans: Big Data refers to a vast collection of data that is characterized by its immense volume, which continues to expand rapidly over time.
2. What does the term Volume refer to in Big Data?
Ans: Volume refers to the quantity of data to be stored. In the case of big data, a huge amount of data is generated in a very short period.
For example, Walmart handles more than 1 million customer transactions every hour, importing more than 2.5 petabytes of data into its database.
3. Mention some important benefits of big data in the health sector.
Ans:
- Predictive analysis to forecast disease outbreaks, assess patients, and flag other health risks
- Personalized medicine
- Clinical decision support
- Healthcare resource management
4. Enlist the four types of Big Data Analytics.
Ans: The four types of Big Data Analytics are:
- Descriptive Analytics: Summarizes historical data to identify patterns and trends.
- Diagnostic Analytics: Analyses past data to understand the reasons behind specific outcomes.
- Predictive Analytics: Uses historical data to forecast future events or trends.
- Prescriptive Analytics: Recommends actions to achieve desired outcomes based on data insights.
These types are designed to provide insights at different levels of decision-making and problem-solving.
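The four types can be seen side by side on a toy dataset. The sketch below is only an illustration: the sales figures, the discount flags, and the naive forecasting rule are all invented for this example and are not from the textbook.

```python
from statistics import mean

# Hypothetical monthly sales (units sold), invented for illustration.
monthly_sales = {"Jan": 120, "Feb": 135, "Mar": 150, "Apr": 90, "May": 165}
# Hypothetical flag: did a discount campaign run in that month?
discount_ran = {"Jan": True, "Feb": True, "Mar": True, "Apr": False, "May": True}

# Descriptive: what happened? Summarize the historical data.
average_sales = mean(monthly_sales.values())

# Diagnostic: why did it happen? Compare months with and without the discount.
with_discount = mean(v for m, v in monthly_sales.items() if discount_ran[m])
without_discount = mean(v for m, v in monthly_sales.items() if not discount_ran[m])

# Predictive: what is likely to happen? A naive forecast that extends the
# average month-to-month change (a real project would use a proper model).
values = list(monthly_sales.values())
average_change = (values[-1] - values[0]) / (len(values) - 1)
forecast_next_month = values[-1] + average_change

# Prescriptive: what should be done? Turn the insight into a recommendation.
recommendation = (
    "run the discount campaign"
    if with_discount > without_discount
    else "skip the discount campaign"
)

print("Descriptive  - average sales:", average_sales)
print("Diagnostic   - with vs without discount:", with_discount, without_discount)
print("Predictive   - forecast for next month:", forecast_next_month)
print("Prescriptive - recommendation:", recommendation)
```

Each step builds on the one before it: the summary describes the past, the comparison suggests a cause, the forecast looks ahead, and the recommendation proposes an action.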
D. Long answer questions
1. Explain the 6 V’s related to Big data.
Ans: The 6 V’s of Big Data are:
- I) Volume: Refers to the massive amount of data generated daily, ranging from terabytes to exabytes. For example, an estimated 328.77 million terabytes of data are created every day.
- II) Velocity: Describes the speed at which data is generated, delivered, and analyzed. For instance, Google processes over 40,000 search queries per second.
- III) Variety: Indicates the different forms of data, including structured (e.g., databases), semi-structured (e.g., XML files), and unstructured data (e.g., videos, images, and social media posts).
- IV) Veracity: Focuses on the accuracy, quality, and trustworthiness of data. It involves ensuring that data is reliable and suitable for analytical models by addressing inconsistencies or inaccuracies.
- V) Value: Refers to the insights and business benefits that can be extracted from Big Data. Without deriving value, the other characteristics hold little significance.
- VI) Variability: Highlights the inconsistencies or unpredictability in data flow, requiring systems to adapt and extract meaningful insights even in dynamic conditions.
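To get a feel for the Volume and Velocity figures quoted above, they can be converted into other units. This is a small arithmetic sketch that assumes decimal units (1 exabyte = 1,000,000 terabytes); the variable names are chosen for this example.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds in a day

# Velocity: 40,000 search queries per second, as quoted in the answer.
queries_per_second = 40_000
queries_per_day = queries_per_second * SECONDS_PER_DAY

# Volume: 328.77 million terabytes per day, as quoted in the answer.
terabytes_per_day = 328.77e6
exabytes_per_day = terabytes_per_day / 1e6  # decimal units assumed
terabytes_per_second = terabytes_per_day / SECONDS_PER_DAY

print(f"Queries per day: {queries_per_day:,}")
print(f"Exabytes per day: {exabytes_per_day:.2f}")
print(f"Terabytes per second: {terabytes_per_second:,.1f}")
```

At 40,000 queries per second, the daily total comes to 3,456,000,000 queries, which shows why velocity needs systems built for continuous processing.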
2. Explain the differences between structured, semi-structured, and unstructured data.
Ans:
Aspect | Structured Data | Semi-structured Data | Unstructured Data |
---|---|---|---|
Definition | Quantitative data with a defined structure | A mix of quantitative and qualitative properties | No inherent structures or formal rules |
Data Model | Dedicated data model | May lack a specific data model | Lacks a consistent data model |
Organization | Organized in clearly defined columns | Less organized than structured data | No organization; exhibits variability over time |
Accessibility | Easily accessible and searchable | Accessible but may be harder to analyze | Accessibility depends on the specific data format |
Examples | Transaction information, Names, Dates, Addresses | XML, CSV, HTML, JSON, Email, Web pages | PDFs, Images, Text files, Social media posts, Video, Audio files |

3. Explain the process of Big Data Analytics.
Ans: The process of Big Data Analytics can be divided broadly into four major steps. They are as follows:
- Step 1. Gather Data
Each company has a unique approach to data collection. Organizations can now collect structured and unstructured data from various sources, including cloud storage, mobile apps, and IoT sensors.
- Step 2. Process Data
Once data is collected and stored, it must be processed properly to get accurate results on analytical queries, especially when it’s large and unstructured. The analysis can be done either batch wise or stream wise.
- Step 3. Clean Data
Scrubbing all data, regardless of size, improves quality and yields better results. Correct formatting and elimination of duplicate or irrelevant data are essential. Dirty data can lead to inaccurate insights.
- Step 4. Analyze Data
Getting big data into a usable state takes time. Once it’s ready, advanced analytics processes can turn big data into big insights.
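The four steps can be sketched as a tiny pipeline. This is a simplified illustration, not a real big data system: the order records are invented, the function names are hypothetical, and a real pipeline would read from actual sources and run on distributed tools.

```python
import json
from collections import Counter


def gather():
    """Step 1: collect raw records. Hard-coded JSON lines stand in for data
    arriving from apps, sensors, or cloud storage."""
    return [
        '{"order_id": 1, "city": "Delhi ", "amount": "250"}',
        '{"order_id": 2, "city": "mumbai", "amount": "400"}',
        '{"order_id": 2, "city": "mumbai", "amount": "400"}',  # duplicate
        '{"order_id": 3, "city": "Delhi", "amount": null}',    # missing amount
        '{"order_id": 4, "city": "Mumbai", "amount": "150"}',
    ]


def process(raw_lines):
    """Step 2: parse the raw text into dictionaries (batch-wise processing)."""
    return [json.loads(line) for line in raw_lines]


def clean(records):
    """Step 3: fix formatting and drop duplicate or incomplete records."""
    seen_ids, cleaned = set(), []
    for record in records:
        if record["amount"] is None or record["order_id"] in seen_ids:
            continue
        seen_ids.add(record["order_id"])
        cleaned.append({
            "order_id": record["order_id"],
            "city": record["city"].strip().title(),
            "amount": float(record["amount"]),
        })
    return cleaned


def analyze(records):
    """Step 4: compute the total order amount per city."""
    totals = Counter()
    for record in records:
        totals[record["city"]] += record["amount"]
    return dict(totals)


result = analyze(clean(process(gather())))
print(result)
```

Without the cleaning step, "Delhi " and "mumbai" would be counted as separate cities and the duplicate order would be added twice, which is how dirty data leads to inaccurate insights.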
4. Why is Big Data Analytics important in modern industries and decision-making processes?
Ans: Big Data Analytics is important in modern industries and decision-making processes because it:
- Enables Data-Driven Decisions: By analyzing vast and diverse datasets, organizations can make informed decisions based on insights and trends.
- Improves Efficiency and Productivity: Identifying inefficiencies and optimizing resource allocation helps streamline processes.
- Enhances Customer Insights: Understanding customer behavior and preferences enables personalized marketing and improved customer experiences.
- Provides Competitive Advantage: Leveraging analytics helps organizations uncover market trends, identify opportunities, and stay ahead of competitors.
- Fosters Innovation and Growth: Insights derived from data analysis drive the development of new products, services, and business models.
5. A healthcare company is using Big Data analytics to manage patient records, predict disease outbreaks, and personalize treatments. However, the company is facing challenges regarding data privacy, as patient information is highly sensitive. What are the potential risks to patient privacy when using Big Data in healthcare, and how can these be mitigated?
Ans: Potential Risks to Patient Privacy:
- Unauthorized Access: Sensitive patient information could be accessed by unauthorized individuals, leading to breaches of confidentiality.
- Data Breaches: Cyberattacks could expose patient data to malicious actors.
- Misuse of Personal Information: Patient data might be used for purposes beyond its intended scope, such as marketing or profiling.
- Regulatory Non-Compliance: Failing to comply with data protection laws like GDPR or the Digital Personal Data Protection Act, 2023, could lead to legal and financial penalties.
Mitigation Strategies:
- Data Encryption: Encrypt data during storage and transmission to protect against unauthorized access.
- Access Controls: Implement strict access controls to ensure that only authorized personnel can access sensitive data.
- Anonymization: Remove personally identifiable information (PII) from datasets to safeguard patient identity during analysis.
- Regular Audits: Conduct regular security audits to identify and address vulnerabilities.
- Compliance with Regulations: Adhere to data protection laws to ensure ethical handling of sensitive information.
- Employee Training: Educate staff about data privacy practices and the importance of protecting patient information.
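One of the mitigation strategies above, removing identifying details before analysis, can be sketched in a few lines. The example below is only an illustration under stated assumptions: the record fields, the `pseudonymize` helper, and the secret key are hypothetical. It replaces the patient ID with a keyed hash, which is pseudonymization rather than full anonymization, because the remaining fields (such as age and diagnosis) could still help identify a person in a small dataset.

```python
import hashlib
import hmac

# Hypothetical secret key; in practice it would be stored securely, not in code.
SECRET_KEY = b"replace-with-a-securely-stored-key"

# Fields that directly identify a patient and are dropped before analysis.
DIRECT_IDENTIFIERS = {"name", "phone"}


def pseudonymize(record):
    """Return a copy of the record without direct identifiers, with the
    patient ID replaced by a keyed hash (HMAC-SHA256)."""
    token = hmac.new(
        SECRET_KEY, record["patient_id"].encode("utf-8"), hashlib.sha256
    ).hexdigest()[:16]
    safe = {
        key: value
        for key, value in record.items()
        if key not in DIRECT_IDENTIFIERS and key != "patient_id"
    }
    safe["patient_token"] = token
    return safe


patient = {
    "patient_id": "P-1001",
    "name": "Asha Verma",
    "phone": "555-0100",
    "age": 42,
    "diagnosis": "asthma",
}
safe_patient = pseudonymize(patient)
print(safe_patient)
```

The same patient ID always maps to the same token, so analysts can still link a patient's visits together without seeing the name or phone number.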
6. Given the following list of data types, categorize each as Structured, Unstructured, or Semi-Structured:
a) A customer database with fields such as Name, Address, Phone Number, and Email.
Ans: Structured
b) A JSON file containing product information with attributes like name, price, and specifications.
Ans: Semi-Structured
c) Audio recordings of customer service calls.
Ans: Unstructured
d) A sales report in Excel format with rows and columns.
Ans: Structured
e) A collection of social media posts, including text, images, and hashtags.
Ans: Unstructured
f) A CSV file with daily temperature readings for the past year.
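Ans: Structured when the file has a fixed set of columns, although the CSV format itself is listed under semi-structured data in the comparison table in question 2.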
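The difference between the three categories shows up in how a program reads each one. The sketch below uses invented sample data and Python's standard library: a database table for structured data, a JSON document for semi-structured data, and a free-text post for unstructured data.

```python
import json
import sqlite3

# Structured: every row follows the same fixed columns, so it can be queried directly.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, phone TEXT, email TEXT)")
conn.execute(
    "INSERT INTO customers VALUES (?, ?, ?)",
    ("Asha", "555-0100", "asha@example.com"),
)
emails = [row[0] for row in conn.execute("SELECT email FROM customers")]
conn.close()

# Semi-structured: the data carries its own labels, but fields can be nested
# or missing, so the code has to check for them.
product = json.loads(
    '{"name": "Laptop", "price": 55000, "specifications": {"ram_gb": 16}}'
)
ram_gb = product.get("specifications", {}).get("ram_gb")
colour = product.get("colour")  # this field is absent, so the result is None

# Unstructured: no fields at all; meaning has to be extracted from raw text.
post = "Loving my new laptop! Battery life could be better though. #tech"
hashtags = [word for word in post.split() if word.startswith("#")]

print(emails, ram_gb, colour, hashtags)
```

The amount of checking and extraction work grows from the first block to the last, which matches the accessibility row of the comparison table.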
E. Competency-Based Questions:
1. A retail clothing store is experiencing a decline in sales despite strong marketing campaigns. You are tasked with using big data analytics to identify the root cause.
- a. What types of customer data can be analyzed?
- b. How can big data analytics be used to identify buying trends and customer preferences?
- c. Can you recommend specific data visualization techniques to present insights to stakeholders?
- d. How might these insights be used to personalize customer experiences and improve sales?
Ans:
- a. Analyze purchase history (items bought together, frequency, time of purchase), demographics (age, location, income), and browsing behavior (clicks, time spent on product pages) of the customer.
- b. Big data analytics can help
- i. identify items that are frequently purchased together to optimize product placement and promotions.
- ii. group customers based on demographics and buying habits.
- iii. track customer journeys on the website and identify areas for improvement (e.g., the checkout process).
- c. Dashboards can present the key metrics (sales by category, customer demographics) for easy stakeholder comprehension, and heatmaps can show customer browsing behavior on the website (hotspots indicate the items of interest).
- d. These insights will help the application to
- i. recommend relevant products based on a customer’s purchase history and browsing behavior.
- ii. tailor promotions and advertisements to specific customer segments.
- iii. adjust prices based on demand and customer demographics.
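Finding items that are frequently purchased together, as mentioned in part b, can be sketched by counting item pairs across purchases. The baskets below are invented for illustration, and this simple pair count is a simplified stand-in for the market basket analysis methods a real retailer would use.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase baskets, one set of items per transaction.
baskets = [
    {"jeans", "t-shirt", "belt"},
    {"jeans", "belt"},
    {"t-shirt", "cap"},
    {"jeans", "t-shirt", "belt"},
]

# Count how often each pair of items appears in the same basket.
pair_counts = Counter()
for basket in baskets:
    # Sorting gives each pair a consistent order, e.g. ("belt", "jeans").
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

top_pair, top_count = pair_counts.most_common(1)[0]
print(top_pair, top_count)
```

Here belts and jeans appear together in three of the four baskets, so the store might place them near each other or offer them as a bundle.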
2. A research institute is conducting a study on public sentiment towards environmental conservation efforts. They aim to gather insights from various data sources to understand public opinions and perceptions. They collect data from diverse sources such as news articles, online forums, blog posts, and social media comments. Which type of data does this description represent?
Ans: Unstructured data
3. A global e-commerce platform is experiencing rapid growth in its user base, with millions of transactions occurring daily across various product categories. As part of their data analytics efforts, they are focused on improving the speed and efficiency of processing incoming data to provide real-time recommendations to users during their browsing and purchasing journeys. Identify the specific characteristic of big data (6V’s of Big Data) that is most relevant in the above scenario and justify your answer.
Ans: The most relevant characteristic of big data among the 6 V’s in this scenario is Velocity.
Millions of transactions occur daily, so data is generated at high speed, and the platform must process it quickly enough to provide real-time recommendations during a user’s browsing and purchasing journey.
Delays in processing could lead to missed opportunities to influence customer decisions.
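Handling velocity means updating results as each event arrives instead of waiting for a complete batch. The sketch below is a minimal illustration with a hypothetical `LiveRecommender` class and invented events; a real platform would use a stream processing system and a far richer recommendation model.

```python
from collections import Counter, defaultdict


class LiveRecommender:
    """Keeps per-user category counts up to date as events stream in."""

    def __init__(self):
        self.views = defaultdict(Counter)

    def on_event(self, user_id, category):
        # Called once per incoming event; the update is a single increment.
        self.views[user_id][category] += 1

    def recommend(self, user_id):
        # Suggest the category the user has viewed most so far.
        counts = self.views.get(user_id)
        if not counts:
            return None
        return counts.most_common(1)[0][0]


recommender = LiveRecommender()
events = [("u1", "shoes"), ("u1", "shirts"), ("u1", "shoes"), ("u2", "bags")]
for user_id, category in events:
    recommender.on_event(user_id, category)

print(recommender.recommend("u1"))
print(recommender.recommend("u2"))
```

Because every event updates the counts immediately, a recommendation is available at any moment during the browsing session.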