Exploratory Data Analysis Class 12 Questions and Answers
Objective Type Questions
Please choose the correct option in the questions below.
1. You need to check the relationship between two variables. Which graph would you use?
a) Histogram
b) Pair plot
c) Box plot
d) None of the above
Answer: b) Pair plot
2. You need to check if a variable has outliers. Which graph would you use?
a) Histogram
b) Pair plot
c) Box plot
d) None of the above
Answer: c) Box plot
3. You need to perform a multivariate analysis. Which graph will you use?
a) Contour plot
b) Scatter plot
c) Box plot
d) None of the above
Answer: a) Contour plot
4. You need to perform a univariate analysis. Which graph will you use?
a) Scatter plot
b) Histogram
c) Contour plot
d) Both a and b
Answer: b) Histogram
5. What is a data cleaning step?
a) Removing duplicates
b) Removing outliers
c) All of the above
Answer: c) All of the above
Standard Questions
Please answer the questions below in no less than 100 words.
1. What are some of the differences between univariate and multivariate analysis? Give some examples.
Univariate analysis studies one variable at a time, while multivariate analysis studies several variables together to understand the relationships between them. Univariate methods typically include histograms, box plots, and summary statistics such as the mean and median. For example, looking at the distribution of students' test scores is a univariate analysis. Multivariate methods include scatter plots, correlation, and regression. For example, studying the relationship between test scores and the time students spent studying is a multivariate analysis.
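The difference can be illustrated with a minimal sketch in Python using pandas. The data, the column names `score` and `study_hours`, and the variable names are all made up for illustration.

```python
import pandas as pd

# Hypothetical data: eight students' test scores and weekly study hours.
students = pd.DataFrame({
    "score": [55, 62, 68, 71, 75, 80, 86, 93],
    "study_hours": [2, 3, 4, 4, 5, 6, 7, 9],
})

# Univariate analysis: summarize one variable on its own.
score_mean = students["score"].mean()
score_median = students["score"].median()

# Multivariate (here bivariate) analysis: how two variables move together.
correlation = students["score"].corr(students["study_hours"])

print(f"mean score: {score_mean:.2f}, median score: {score_median:.1f}")
print(f"correlation between score and study hours: {correlation:.2f}")
```

The mean and median describe `score` alone, while the correlation describes how `score` and `study_hours` vary together.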
2. What are the ways to handle missing data?
Methods for handling missing data may include:
Deletion: Remove rows that contain missing values, which works best when only a small share of the data is missing.
Imputation: Replace missing values with the mean, median, or mode of the variable.
Model-based imputation: Use more advanced techniques, such as regression imputation or k-nearest neighbors, to predict missing values from the other variables.
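Deletion and simple imputation might look like the sketch below, assuming pandas and NumPy are available. The small dataset and its column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical dataset with gaps in both columns.
df = pd.DataFrame({
    "age": [16.0, 17.0, np.nan, 18.0, 17.0],
    "city": ["Pune", None, "Delhi", "Delhi", "Delhi"],
})

# Option 1: deletion. Drop every row that has at least one missing value.
dropped = df.dropna()

# Option 2: imputation. Fill numeric gaps with the median and
# categorical gaps with the mode (the most frequent value).
imputed = df.copy()
imputed["age"] = imputed["age"].fillna(imputed["age"].median())
imputed["city"] = imputed["city"].fillna(imputed["city"].mode()[0])

print(dropped)
print(imputed)
```

Deletion keeps only complete rows, so it shrinks the dataset. Imputation keeps every row but replaces the gaps with estimated values.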
3. What are some of the methods for univariate analysis?
Methods of univariate analysis include:
Histograms: Show the shape of a numeric variable's distribution.
Box plots: Summarize a distribution through its median, quartiles, and outliers.
Summary statistics: Measures such as mean, median, mode, variance, and standard deviation.
Bar charts: Show the frequency of each category in categorical data.
Pie charts: Show each category's proportion of the whole.
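As a rough illustration, the summary statistics and the frequency counts behind a bar or pie chart can be computed with Python's standard library alone. The `scores` and `grades` lists are invented sample data.

```python
import statistics
from collections import Counter

scores = [62, 68, 71, 71, 75, 80, 86, 93]   # hypothetical numeric variable
grades = ["A", "B", "B", "C", "B", "A"]      # hypothetical categorical variable

summary = {
    "mean": statistics.mean(scores),
    "median": statistics.median(scores),
    "mode": statistics.mode(scores),
    "variance": statistics.variance(scores),   # sample variance
    "std_dev": statistics.stdev(scores),       # sample standard deviation
}

# Frequency table: the counts a bar chart or pie chart would display.
frequency = Counter(grades)

print(summary)
print(frequency)
```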
4. What are the steps for cleaning raw data?
Steps for cleaning raw data include:
Handling missing data: Determine appropriate imputation methods or delete rows with missing values.
Removing duplicates: Drop duplicate rows so that no record is counted more than once.
Outlier detection and handling: Identify outliers that might distort the analysis, then decide whether to keep, correct, or remove them.
Standardization: Convert data into a consistent format and unit where necessary.
Data type conversion: Verify that each variable has the appropriate type (numerical or categorical).
Removing irrelevant columns: Drop columns that are not needed for the analysis.
Fixing inconsistencies: Correct errors such as misspellings and mismatched labels so that the data is reliable and accurate.
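Several of these steps can be chained together, as in the following sketch with pandas. The `raw` table and its column names are hypothetical, and outlier handling is left out to keep the example short.

```python
import pandas as pd

# Hypothetical raw data: a duplicate row, a missing value, numbers stored
# as text, inconsistent labels, and a column the analysis does not need.
raw = pd.DataFrame({
    "name": ["Asha", "Ravi", "Ravi", "Meera", "John"],
    "marks": ["78", "85", "85", None, "91"],
    "gender": ["F", "m", "m", "f", "M"],
    "row_note": ["ok", "ok", "ok", "check", "ok"],
})

clean = raw.drop_duplicates()                     # remove duplicate rows
clean = clean.drop(columns=["row_note"])          # drop an irrelevant column
clean["marks"] = pd.to_numeric(clean["marks"])    # convert text to numbers
clean["marks"] = clean["marks"].fillna(clean["marks"].median())  # impute the gap
clean["gender"] = clean["gender"].str.upper()     # make labels consistent
clean = clean.reset_index(drop=True)

print(clean)
```

The order of the steps is a design choice: converting `marks` to a numeric type first makes it possible to compute the median used for imputation.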
Higher Order Thinking Skills (HOTS)
Please answer the questions below in no less than 200 words.
1. What problems can outliers cause?
Outliers are data points that differ greatly from the majority of the data in a dataset. They may occur naturally or result from errors, and they sometimes carry valuable information. However, they can also cause problems during data analysis:
- Skewed Analysis: Outliers can distort data distributions, so that measures of central tendency and dispersion (e.g. mean and standard deviation) misrepresent the true characteristics of a dataset.
- Biased Models: When training machine learning models, outliers may cause the model to focus on extreme values rather than on the patterns found in most of the data. This can reduce the model's predictive performance and lead to less accurate predictions.
- Reduced Robustness: Outliers may decrease the robustness of statistical and machine learning algorithms, making them less reliable in real-world situations.
- Misinterpretation: Outliers can obscure the true trends and patterns in the data, so that insights drawn during Exploratory Data Analysis (EDA) are misinterpreted.
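The skewing effect can be seen in a small sketch using Python's standard library. The `incomes` list is invented, and the 1.5 × IQR rule used here is one common convention for flagging outliers (it is also the usual rule for box plot whiskers), not the only possible choice.

```python
import statistics

# Hypothetical monthly incomes (in thousands); the last value is an outlier.
incomes = [30, 32, 35, 36, 38, 40, 41, 400]

mean_with = statistics.mean(incomes)
median_with = statistics.median(incomes)

# Flag outliers with the 1.5 * IQR rule.
q1, _, q3 = statistics.quantiles(incomes, n=4)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in incomes if x < lower or x > upper]

# Recompute the mean after setting the flagged values aside.
trimmed = [x for x in incomes if x not in outliers]
mean_without = statistics.mean(trimmed)

print("mean with outlier:", mean_with)
print("median with outlier:", median_with)
print("outliers:", outliers)
print("mean without outlier:", mean_without)
```

A single extreme value pulls the mean far above the typical income, while the median barely moves.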
2. Why should irrelevant observations be removed from the data?
Irrelevant observations, commonly referred to as noise, can impede data analysis and produce misleading results. Removing them is important for several reasons:
- Improved Accuracy: Irrelevant observations can introduce errors and noise into an analysis, leading to inaccurate predictions or conclusions. By eliminating them, data becomes more representative of its underlying patterns.
- Increased Efficiency: Eliminating unnecessary observations can reduce the size of the dataset, speeding up analysis and modeling processes – especially with large datasets.
- Clearer Insights: Eliminating irrelevant observations allows the analysis to focus on only relevant features and relationships within the data, making it easier for people to spot meaningful patterns and trends.
- Simplified Interpretation: Unwanted observations can complicate interpretation, making it more challenging to glean actionable insights from data.
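As a small illustration, suppose an analysis is only concerned with Class 12 students. The sketch below, which assumes pandas and uses a made-up `survey` table, filters out rows from other classes and shows how the summary changes.

```python
import pandas as pd

# Hypothetical survey data; the analysis is only about Class 12 students.
survey = pd.DataFrame({
    "student": ["A", "B", "C", "D", "E"],
    "grade_level": [12, 12, 10, 12, 11],
    "score": [78, 85, 40, 91, 55],
})

# Keep only the observations that are relevant to the question being asked.
relevant = survey[survey["grade_level"] == 12]

mean_all = survey["score"].mean()
mean_relevant = relevant["score"].mean()

print(len(survey), "rows before,", len(relevant), "rows after")
print("mean score, all rows:", round(mean_all, 2))
print("mean score, Class 12 only:", round(mean_relevant, 2))
```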
3. How can we use unsupervised learning for EDA?
Unsupervised learning can be used in exploratory data analysis (EDA) to uncover patterns, relationships and clusters within data without labeled target variables. Some ways unsupervised learning can be employed include:
- Clustering: Algorithms like K-means or hierarchical clustering group similar data points into clusters, helping to identify hidden subgroups or patterns that were not previously recognized.
- Dimensionality Reduction: Techniques such as Principal Component Analysis (PCA) or t-SNE reduce the number of dimensions in high-dimensional data while preserving important structure, which makes the data easier to visualize and understand.
- Anomaly Detection: Unsupervised algorithms can assist in the detection of unusual data points that might indicate anomalies, errors, or potential outliers within a dataset.
- Visualization: The output of unsupervised techniques, such as cluster assignments or reduced dimensions, can be plotted as scatter plots or heatmaps, giving a more intuitive view of complex relationships in the data.
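Dimensionality reduction can be sketched with NumPy alone. The example below implements PCA through a singular value decomposition rather than a library routine, and the randomly generated data and all variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: two groups of 50 points in 5 dimensions, no labels used.
group_a = rng.normal(loc=0.0, scale=1.0, size=(50, 5))
group_b = rng.normal(loc=4.0, scale=1.0, size=(50, 5))
X = np.vstack([group_a, group_b])

# PCA via singular value decomposition of the centered data.
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
explained_ratio = S**2 / np.sum(S**2)

# Project onto the first two principal components for a 2-D scatter plot.
X_2d = X_centered @ Vt[:2].T

print("variance explained by first two components:", explained_ratio[:2].round(2))
print("projected shape:", X_2d.shape)
```

Plotting the two columns of `X_2d` as a scatter plot would be one way to look for the two groups visually, even though no labels were supplied.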