# Chapter 3: Classification Algorithms I (Class 12 Data Science) Questions and Answers

## Classification Algorithms I Class 12 Questions and Answer

### Objective Type Questions

1. Which of the following are parts of a Decision Tree?

a) Decision Node

b) Leaf Node

c) Branch

d) All of the above

Answer: d) All of the above

2. Which of the following statements is false?

a) Decision Trees can contain only one branch.

b) Decision Trees can be used for classification and regression.

c) Random Forests algorithm uses Decision Trees.

d) None of the above

Answer: a) Decision Trees can contain only one branch.

3. Which of the following is a use case for Decision Trees?

a) Classification

b) Regression

c) Both of the above

Answer: c) Both of the above

4. A decision tree can be divided into further sub-trees.

a) True

b) False

Answer: a) True

5. Decision Trees are easier to understand and interpret.

a) True

b) False

Answer: a) True

### Standard Questions

1. Write a short note on the application of classification algorithms.

Classification algorithms are used in machine learning to assign data to predetermined classes or labels: they analyze the input features of each example and assign it to a class based on patterns in the data.

Applications of classification algorithms range from spam detection in emails and customer reviews to medical diagnosis and credit risk evaluation. Popular classification algorithms include logistic regression, support vector machines (SVM), random forests and k-nearest neighbors (KNN) which learn from labeled training data and then classify new unseen data based on what patterns they recognized during training.
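As a concrete illustration of one such algorithm, here is a toy k-nearest neighbors classifier in plain Python. The two-feature "spam"/"ham" data points are invented for the example; a real spam filter would use many more features and samples.

```python
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # train: list of (features, label) pairs; distance is plain Euclidean
    neighbors = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Toy labeled dataset: two numeric features per email-like sample
train = [((1.0, 1.0), "ham"), ((1.2, 0.8), "ham"),
         ((5.0, 5.0), "spam"), ((4.8, 5.2), "spam")]

print(knn_predict(train, (4.9, 5.1)))  # query near the spam cluster -> spam
```

The classifier "learns" nothing ahead of time; it simply stores the labeled training data and, at prediction time, lets the closest examples vote, which is why KNN is often the first classification algorithm taught.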


2. In your own words, write down the steps to create a decision tree.

Steps for Building a Decision Tree:

1. Data Collection: Secure relevant data from reliable sources and ensure it contains both input features and the associated target labels/classes.
2. Data Preprocessing: Clean your data by handling missing values, eliminating duplicates, and converting categorical variables to numerical format.
3. Data Splitting: Split the dataset into training set and testing/validation sets in order to assess your model’s performance.
4. Decision Tree Construction: Begin at the root node and split the data recursively on the feature that maximizes information gain (or minimizes Gini impurity) until a stopping criterion is met.
5. Stopping Criterion: Stop tree growth when it reaches a specific depth or when the number of samples in any node drops below a predefined threshold.
6. Tree Pruning (Optional): To prevent overfitting, prune the decision tree by removing nodes that do not significantly improve accuracy on the validation set.
7. Classification: Use the trained decision tree to classify new data instances by traversing the tree from the root node down to a leaf node, which gives the predicted class.
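The splitting step above relies on an impurity measure. A minimal sketch of Gini impurity and how a candidate split is scored (the "yes"/"no" labels are made up for illustration):

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_gini(left, right):
    """Weighted average impurity of the two child nodes of a split."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

pure = ["yes", "yes", "yes"]
mixed = ["yes", "no", "yes", "no"]
print(gini(pure))    # 0.0 -- a perfectly pure node
print(gini(mixed))   # 0.5 -- maximally impure for two balanced classes
print(split_gini(["yes", "yes"], ["no", "no"]))  # 0.0 -- an ideal split
```

During construction, the algorithm evaluates every candidate split this way and keeps the one with the lowest weighted impurity (equivalently, the highest information gain).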


3. Write two advantages of using a decision tree.

Benefits of Utilizing a Decision Tree:

1. Interpretability: Decision trees are easy to understand and interpret. Their tree structure and clear decision rules can be visualized, making the decision-making process accessible even to non-experts, unlike many other models.
2. Versatility: Decision trees can handle both categorical and numerical data, making them suitable for diverse datasets. They can also be used for both classification and regression tasks, providing solutions to multiple kinds of problems.
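The interpretability advantage is easy to see: a trained decision tree is equivalent to a set of readable if/else rules. This hypothetical fruit classifier (the features, thresholds, and labels are invented for illustration) is one such tree written out by hand:

```python
def classify_fruit(weight_g, texture):
    """A tiny decision tree expressed as plain if/else rules."""
    if texture == "smooth":        # decision node: test the texture feature
        if weight_g > 150:         # decision node: test the weight feature
            return "apple"         # leaf node
        return "cherry"            # leaf node
    return "orange"                # leaf node on the "rough" branch

print(classify_fruit(180, "smooth"))  # -> apple
print(classify_fruit(90, "rough"))    # -> orange
```

Anyone can trace a prediction from the root test down to a leaf, which is exactly the transparency that models such as neural networks lack.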

### Higher Order Thinking Skills(HOTS)

1. Write two disadvantages of using a decision tree.

Cons of Using a Decision Tree:

1. Overfitting: Decision trees are prone to overfitting when their branches grow too deep and complex, capturing noise in the training data that does not generalize; this results in poor performance on new data and reduced predictive accuracy. Pruning and well-chosen stopping criteria can mitigate overfitting, though choosing them well remains challenging.
2. Instability: Decision trees are sensitive to small changes in the data; minor variations can produce completely different tree structures and predictions. This instability makes decision trees less reliable for critical applications, especially with noisy data or data that is updated frequently.

To address these drawbacks, ensemble methods such as Random Forests and boosting algorithms like AdaBoost are frequently employed; they combine multiple decision trees for improved predictive performance, better generalization, and reduced overfitting.


2. Write a short note on the Random Forest algorithm.

Random Forest is an ensemble learning method based on decision trees that combines multiple decision trees during training to produce one final prediction. Key steps involved in its algorithm include:

1. Data Collection: Accumulate a labeled dataset consisting of input features and their target labels/classes.
2. Random Sampling of Training Data: Randomly select subsets of the training data (with replacement); each subset is used to build one decision tree.
3. Decision Tree Construction: For each subset, construct a decision tree, considering only a random subset of features at each node split; this substantially increases diversity among the trees.
4. Voting: At prediction time, each decision tree in the forest makes its own prediction; the overall prediction is then decided by majority voting (classification) or by averaging (regression).
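The two ingredients that distinguish Random Forest from a single tree, sampling with replacement (step 2) and combining tree outputs by vote (step 4), can be sketched in a few lines of plain Python. The "spam"/"ham" labels here are illustrative placeholders:

```python
import random
from collections import Counter

def bootstrap(data, rng):
    """Step 2: sample with replacement, same size as the original data."""
    return [rng.choice(data) for _ in data]

def majority_vote(predictions):
    """Step 4: the forest's class is the most common tree prediction."""
    return Counter(predictions).most_common(1)[0][0]

data = ["spam", "ham", "spam", "spam", "ham"]
rng = random.Random(0)  # seeded so the resampling is reproducible

print(bootstrap(data, rng))                      # one resampled training set
print(majority_vote(["spam", "ham", "spam"]))    # -> spam
```

Each tree in the forest would be trained on its own bootstrap sample, and `majority_vote` would then be applied to the per-tree predictions for each new instance.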

Random Forest offers several advantages over single decision trees, including reduced overfitting and increased accuracy and stability. Furthermore, Random Forest can handle large datasets and high-dimensional feature spaces effectively while providing an estimate of feature importance that assists with model performance evaluation and interpretability. Overall, Random Forest is a powerful machine learning algorithm widely used for classification and regression tasks.