MACHINE LEARNING EXAM PREP
ETCS-402 MACHINE LEARNING EIGHTH SEMESTER [B.TECH] 2023
PYQ 2020 [2.5 MKS]
a) Explain the goals of machine learning.
Machine learning (ML) is the study of algorithms and statistical models that computers use to learn from and make predictions or decisions based on data. The primary goal of machine learning is to enable computers to learn and improve their performance on a specific task without being explicitly programmed. The key objectives of machine learning are:
Prediction: To make accurate predictions based on historical data.
Classification: To automatically classify or group data based on certain characteristics.
Pattern Recognition: To automatically identify patterns or regularities in data.
Optimization: To optimize a particular objective function, such as minimizing error or maximizing accuracy.
b) What is bagging?
Bagging stands for Bootstrap Aggregation. It is an ensemble learning technique used to improve the stability and accuracy of machine learning algorithms. Bagging trains multiple models of the same type. Each model is trained on a bootstrap sample of the training data, which is a random sample of the same size drawn with replacement. Their predictions are then combined to produce the final prediction: by majority vote for classification, or by averaging for regression. Because each model sees slightly different data, the models make different errors. Aggregating them reduces the variance of the final model and improves its accuracy.
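Below is a minimal sketch of bagging in plain Python. It assumes 1-D inputs, 0/1 labels, and a decision stump (a single-threshold rule) as the base model. The helper names `fit_stump`, `bagging_fit` and `bagging_predict` are hypothetical, and any base learner could be swapped in. Each stump is trained on its own bootstrap sample, and the ensemble predicts by majority vote. The toy data are also hypothetical.

```python
import random
from collections import Counter


def fit_stump(xs, ys):
    """Fit a 1-D decision stump: predict `pol` if x >= t, else 1 - pol."""
    best = None
    for t in sorted(set(xs)):
        for pol in (1, 0):
            errors = sum(1 for x, y in zip(xs, ys)
                         if (pol if x >= t else 1 - pol) != y)
            if best is None or errors < best[0]:
                best = (errors, t, pol)
    _, t, pol = best
    return lambda x: pol if x >= t else 1 - pol


def bagging_fit(xs, ys, n_models=25, seed=0):
    """Train n_models stumps, each on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample of size n
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def bagging_predict(models, x):
    """Aggregate the individual predictions by majority vote."""
    votes = Counter(m(x) for m in models)
    return votes.most_common(1)[0][0]


xs = list(range(20))
ys = [0] * 10 + [1] * 10  # hypothetical toy labels: class 1 when x >= 10
models = bagging_fit(xs, ys)
print(bagging_predict(models, 3), bagging_predict(models, 15))
```

For regression, the vote in `bagging_predict` would be replaced by the mean of the models' outputs.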
c) What is the role of a kernel in Support Vector Machine classifier?
A kernel is a function that takes two data points as input and returns a measure of their similarity. This value equals the inner product of the two points after they have been mapped into a (usually higher-dimensional) feature space. In Support Vector Machines (SVM), the kernel lets the classifier find a separating hyperplane in that feature space without computing the mapping explicitly; this is known as the kernel trick. Data that are not linearly separable in the original space may become separable in the feature space. The kernel function therefore determines the shape of the decision boundary in the original input space. Commonly used kernel functions include the linear, polynomial, and radial basis function (RBF) kernels.
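The sketch below makes the kernel trick concrete, assuming 2-D inputs; the function names are illustrative. It checks that the degree-2 polynomial kernel, computed in the original 2-D space, gives the same number as an explicit dot product in a 3-D feature space. It also defines the RBF kernel, whose implicit feature space is infinite-dimensional and therefore cannot be computed explicitly in this way.

```python
import math


def poly2_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel: K(x, z) = (x . z)^2."""
    return sum(a * b for a, b in zip(x, z)) ** 2


def phi(x):
    """Explicit feature map for 2-D inputs whose dot product equals poly2_kernel."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def rbf_kernel(x, z, gamma=0.5):
    """RBF kernel: K(x, z) = exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


x, z = (1.0, 2.0), (3.0, -1.0)
implicit = poly2_kernel(x, z)                           # computed in the 2-D input space
explicit = sum(a * b for a, b in zip(phi(x), phi(z)))  # computed in the 3-D feature space
print(implicit, explicit, rbf_kernel(x, z))
```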
d) What is boosting?
Boosting is an ensemble learning technique used to improve the accuracy of machine learning algorithms. It trains multiple weak models sequentially, and each new model focuses on correcting the errors of the models before it. For example, AdaBoost increases the weights of the training examples that were misclassified so far. The final prediction is a weighted combination of all the models. Because each new model targets the remaining errors, boosting mainly reduces the bias of the final model and improves its accuracy.
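Boosting can be illustrated with AdaBoost, one specific boosting algorithm. The sketch below assumes 1-D inputs, labels in {+1, -1}, and decision stumps as the weak learners; all names are hypothetical. The toy labels are positive only for 3 ≤ x ≤ 6, so no single stump fits them: the best single stump reaches 0.7 training accuracy on this data. A weighted vote of a few stumps can fit them exactly.

```python
import math


def fit_weighted_stump(xs, ys, w):
    """Best stump h(x) = pol if x >= t else -pol under sample weights w."""
    best = None
    for t in sorted(set(xs)):
        for pol in (1, -1):
            err = sum(wi for x, y, wi in zip(xs, ys, w)
                      if (pol if x >= t else -pol) != y)
            if best is None or err < best[0]:
                best = (err, t, pol)
    return best


def adaboost(xs, ys, n_rounds):
    n = len(xs)
    w = [1.0 / n] * n                      # start with uniform sample weights
    ensemble = []                          # list of (alpha, t, pol)
    for _ in range(n_rounds):
        err, t, pol = fit_weighted_stump(xs, ys, w)
        err = min(max(err, 1e-10), 1 - 1e-10)   # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this stump
        ensemble.append((alpha, t, pol))
        # Raise the weights of misclassified points, lower the rest, renormalise.
        w = [wi * math.exp(-alpha * y * (pol if x >= t else -pol))
             for x, y, wi in zip(xs, ys, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * (pol if x >= t else -pol) for a, t, pol in ensemble)
    return 1 if score >= 0 else -1


xs = list(range(10))
ys = [1 if 3 <= x <= 6 else -1 for x in xs]  # hypothetical labels no single stump fits
model = adaboost(xs, ys, n_rounds=3)
acc = sum(predict(model, x) == y for x, y in zip(xs, ys)) / len(xs)
print(acc)
```

On this data, three rounds are enough for the weighted vote to classify all ten training points correctly.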
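e) What is a perceptron? Explain in brief.
A perceptron is the simplest neural network model and is used for binary classification tasks. It consists of a single artificial neuron. The neuron computes a weighted sum of the input features plus a bias, then passes the result through a step (threshold) function to produce an output of 0 or 1. The perceptron learning algorithm updates the weights whenever a training example is misclassified. Each update moves the weights in the direction that reduces the error between the predicted output and the true output. The algorithm is guaranteed to converge only if the data are linearly separable.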
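Below is a minimal sketch of the perceptron learning rule in plain Python. It assumes 0/1 labels, a learning rate of 1, and zero initial weights, all of which are illustrative choices. The perceptron is trained on the logical AND function. AND is linearly separable, so training stops once an epoch produces no mistakes.

```python
def train_perceptron(X, y, lr=1.0, max_epochs=100):
    """Rosenblatt perceptron rule with a step activation; labels are 0/1."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            pred = 1 if sum(wj * xj for wj, xj in zip(w, xi)) + b > 0 else 0
            err = yi - pred            # 0 if correct, +1 or -1 if wrong
            if err != 0:
                w = [wj + lr * err * xj for wj, xj in zip(w, xi)]
                b += lr * err
                mistakes += 1
        if mistakes == 0:              # converged: every training point is correct
            break
    return w, b


def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else 0


X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 0, 0, 1]                       # logical AND (linearly separable)
w, b = train_perceptron(X, y)
print(w, b, [predict(w, b, x) for x in X])
```

On XOR, which is not linearly separable, the same loop would never reach an epoch without mistakes and would stop only at `max_epochs`.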
f) Explain KNN classifier.
KNN (k-nearest neighbors) is a non-parametric machine learning algorithm used for both classification and regression tasks. It assumes that similar data points, meaning points that are close in feature space (for example, by Euclidean distance), tend to have the same label. To classify a test point, it finds the k nearest training points and assigns the majority class among those neighbors. For regression, it averages their values instead. KNN has no explicit training phase: it simply stores the training data.
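Below is a minimal KNN classifier sketch. It assumes Euclidean distance, k = 3, and a small hypothetical 2-D dataset. Ties in the vote are resolved by `Counter.most_common`, which favours the label encountered first; this is an arbitrary choice here.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Sort training points by Euclidean distance to the query and keep the k closest.
    nearest = sorted(zip(train_X, train_y),
                     key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


train_X = [(1, 1), (1, 2), (2, 1), (6, 6), (6, 7), (7, 6)]
train_y = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(train_X, train_y, (1.5, 1.5)))  # → A
print(knn_predict(train_X, train_y, (6.5, 6.2)))  # → B
```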
g) What is Linear Quadratic Regulation?
Linear Quadratic Regulation (LQR) is a control theory technique for designing controllers for linear dynamical systems. It minimizes a cost function defined as the sum over time of quadratic costs on the state and on the control input, for example x^T Q x + u^T R u. Solving the associated Riccati equation yields the optimal linear state-feedback gain K, so the control law is u = -Kx, which achieves the desired trade-off between regulating the state and limiting control effort.
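In the infinite-horizon, discrete-time case, the optimal gain can be obtained by iterating the Riccati recursion until it converges. The sketch below assumes NumPy, dynamics x_{t+1} = A x_t + B u_t, and the cost sum_t (x_t^T Q x_t + u_t^T R u_t). The double-integrator matrices are purely illustrative. Production code would more typically call a dedicated Riccati solver such as SciPy's `scipy.linalg.solve_discrete_are`.

```python
import numpy as np


def dlqr(A, B, Q, R, n_iter=1000):
    """Discrete-time infinite-horizon LQR by iterating the Riccati recursion.

    Minimises sum_t (x_t' Q x_t + u_t' R u_t) for x_{t+1} = A x_t + B u_t and
    returns the gain K of the optimal feedback u_t = -K x_t, plus the matrix P.
    """
    P = Q.copy()
    for _ in range(n_iter):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)  # Riccati update
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return K, P


# Hypothetical example: a double integrator (position, velocity) driven by a force.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
K, P = dlqr(A, B, Q, R)
print(K, np.abs(np.linalg.eigvals(A - B @ K)).max())  # closed loop is stable if < 1

# Scalar check with A = B = Q = R = 1.
K_scalar, _ = dlqr(np.eye(1), np.eye(1), np.eye(1), np.eye(1))
print(K_scalar)
```

For the scalar system with A = B = Q = R = 1, the recursion converges to K = (√5 − 1)/2 ≈ 0.618.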
h) What is Direct Policy Search?
Direct Policy Search (DPS) is a reinforcement learning technique used to learn the optimal policy for a given task. DPS directly searches for the optimal policy function using optimization algorithms, rather than explicitly estimating the value function or the transition probabilities.
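The sketch below illustrates searching the policy directly. It tunes the single parameter k of a linear policy u = -k x for the hypothetical scalar system x' = x + u with per-step cost x² + u². The optimizer is simple random-search hill climbing, one of many possible choices; policy-gradient or evolutionary methods are others. The search treats the rollout cost as a black box and never estimates a value function or transition model.

```python
import random


def rollout_cost(k, x0=1.0, horizon=50):
    """Total cost of the linear policy u = -k*x on the scalar system x' = x + u."""
    x, cost = x0, 0.0
    for _ in range(horizon):
        u = -k * x                 # the policy being searched over
        cost += x * x + u * u      # quadratic state + control cost
        x = x + u
    return cost


def direct_policy_search(n_iter=500, step=0.1, seed=0):
    """Random-search hill climbing over the policy parameter k."""
    rng = random.Random(seed)
    k = 0.5
    best = rollout_cost(k)
    for _ in range(n_iter):
        candidate = k + rng.gauss(0.0, step)  # perturb the policy parameter
        c = rollout_cost(candidate)
        if c < best:                          # keep the change only if it lowers cost
            k, best = candidate, c
    return k, best


k, cost = direct_policy_search()
print(k, cost)
```

For this system, the infinite-horizon LQR gain is k ≈ 0.618, so the search should end close to that value.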
i) What is logistic regression?
Logistic Regression is a statistical model used for binary classification problems. It models the probability of the outcome of a binary variable (i.e., a variable that takes on one of two possible values) based on one or more predictor variables. Logistic regression uses the logistic function (also known as the sigmoid function) to transform a linear combination of the predictor variables into a probability value between 0 and 1. This probability value is then used to make a binary classification decision.
j) What is the difference between supervised and unsupervised learning?
Supervised learning and unsupervised learning are two broad categories of machine learning algorithms. In supervised learning, the algorithm is trained on labeled data, which means that the input data has corresponding output labels. The goal of supervised learning is to learn a mapping between the input data and the output labels, so that the algorithm can make accurate predictions on new, unseen data. Examples of supervised learning algorithms include linear regression, logistic regression, decision trees, and neural networks.
In unsupervised learning, the algorithm is trained on unlabeled data, which means that there are no corresponding output labels for the input data. The goal of unsupervised learning is to find patterns or structure in the data, without any prior knowledge of what the structure might be. Examples of unsupervised learning algorithms include clustering algorithms (such as k-means clustering), dimensionality reduction algorithms (such as principal component analysis), and generative models (such as variational autoencoders and generative adversarial networks).
PYQ 2018 [5MKS]
a) Explain the Goals of Machine learning.
Goals of Machine Learning:
The main goal of machine learning is to develop intelligent algorithms and models that can learn from data and make predictions or decisions on new data. The goals of machine learning can be broadly classified into three categories:
Prediction: The primary goal of machine learning is to make accurate predictions on new, unseen data. For example, predicting whether an email is spam or not, predicting stock prices, or predicting whether a customer will churn.
Description: Machine learning can be used to identify patterns and relationships in data, and to gain insights and understanding from the data. For example, identifying which features are most important in predicting a certain outcome, or identifying groups or clusters of similar data points.
Control: Machine learning can be used to control systems and make decisions based on the learned patterns and relationships. For example, controlling a robot or autonomous vehicle, or making decisions in a game like chess or Go.
b) What is the difference between linear and nonlinear discriminative classification?
Linear discriminative classification models assume that the decision boundary that separates different classes is a linear function of the input features. These models include logistic regression, linear SVM, and perceptron. Nonlinear discriminative classification models allow for more complex decision boundaries that can be nonlinear functions of the input features. These models include decision trees, random forests, and neural networks.
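c) Explain in brief the Bellman equation.
The Bellman equation is a recursive equation that expresses the value of a state as the immediate reward plus the discounted expected value of the next state. It is used in reinforcement learning and dynamic programming to compute the optimal value function, from which the optimal policy for an agent acting in an environment can be derived. The Bellman optimality equation is given as:
$$V^*(s) = \max_{a} Q^*(s, a), \qquad Q^*(s, a) = R(s) + \gamma \sum_{s'} P(s' \mid s, a)\, V^*(s')$$
where V*(s) is the optimal value of state s, Q*(s, a) is the value of taking action a in state s and acting optimally afterwards, R(s) is the immediate reward for being in state s, P(s' | s, a) is the probability of moving to state s' after taking action a in state s, and γ (0 ≤ γ < 1) is the discount factor that determines the importance of future rewards. The optimal policy picks, in each state, the action a that attains the maximum.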
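Value iteration turns the Bellman optimality equation into an algorithm. It starts from V = 0 and repeatedly applies the right-hand side of the equation until the values stop changing. The minimal sketch below assumes a small hypothetical three-state MDP, with transitions given as (probability, next state) lists and γ = 0.9.

```python
def value_iteration(states, actions, P, R, gamma=0.9, tol=1e-10):
    """Solve V(s) = R(s) + gamma * max_a sum_{s'} P(s'|s,a) V(s') by fixed-point iteration.

    P[s][a] is a list of (probability, next_state) pairs.
    """
    V = {s: 0.0 for s in states}
    while True:
        new_V = {
            s: R[s] + gamma * max(sum(p * V[s2] for p, s2 in P[s][a]) for a in actions)
            for s in states
        }
        if max(abs(new_V[s] - V[s]) for s in states) < tol:
            return new_V
        V = new_V


def greedy_policy(states, actions, P, V):
    """In each state, pick the action with the highest expected next-state value."""
    return {s: max(actions, key=lambda a: sum(p * V[s2] for p, s2 in P[s][a]))
            for s in states}


# Hypothetical 3-state corridor: moves are deterministic, only state 2 is rewarded.
states, actions = [0, 1, 2], ["left", "right"]
P = {
    0: {"left": [(1.0, 0)], "right": [(1.0, 1)]},
    1: {"left": [(1.0, 0)], "right": [(1.0, 2)]},
    2: {"left": [(1.0, 1)], "right": [(1.0, 2)]},
}
R = {0: 0.0, 1: 0.0, 2: 1.0}
V = value_iteration(states, actions, P, R)
policy = greedy_policy(states, actions, P, V)
print(V, policy)
```

Here V converges to 8.1, 9 and 10 for states 0, 1 and 2, and the greedy policy chooses "right" in every state.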
d) What is bagging and boosting?
Bagging and boosting are two ensemble learning techniques used in machine learning. Bagging stands for bootstrap aggregation and involves training multiple models on different subsets of the training data, and then combining their predictions by averaging or voting. Boosting, on the other hand, involves sequentially training multiple models, with each subsequent model trying to correct the errors made by the previous model. Boosting methods include AdaBoost, gradient boosting, and XGBoost.
e) How is KNN different from K-means clustering?
K-Nearest Neighbors (KNN) is a supervised learning algorithm that is used for classification and regression tasks. It involves finding the k nearest neighbors of a given data point in the feature space and using their labels or values to predict the label or value of the new data point.
K-Means clustering is an unsupervised learning algorithm that is used to group similar data points together based on their features. It involves finding k clusters in the feature space, such that the sum of the squared distances of each data point to its nearest cluster center is minimized. K-Means is often used for exploratory data analysis and can help identify patterns and relationships in the data.
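Below is a minimal NumPy sketch of Lloyd's algorithm, the standard iterative heuristic for K-Means. It assumes farthest-point initialisation, a simple deterministic stand-in for k-means++, and a hypothetical two-blob dataset. Unlike KNN, the algorithm uses no labels: the cluster indices come from the data alone.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate nearest-centre assignment and centre updates."""
    rng = np.random.default_rng(seed)
    # Farthest-point initialisation: random first centre, then the farthest points.
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        # Assignment step: index of the nearest centre for every point.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centre to the mean of its assigned points.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    inertia = ((X - centers[labels]) ** 2).sum()  # sum of squared distances to centres
    return labels, centers, inertia


rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.5, size=(20, 2)),   # hypothetical blob near (0, 0)
               rng.normal(5.0, 0.5, size=(20, 2))])  # hypothetical blob near (5, 5)
labels, centers, inertia = kmeans(X, k=2)
print(labels, centers, inertia)
```

Lloyd's algorithm only guarantees a local minimum of the sum of squared distances, which is why the choice of initial centres matters.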
PYQ 2017 [5 Marks]
a) What are the key tasks of machine learning?
The key tasks of machine learning are as follows:
Data preprocessing: It involves data cleaning, feature selection, and feature engineering.
Training the model: It involves selecting an appropriate model, training the model on the training data, and evaluating the performance of the model on the validation data.
Model evaluation: It involves evaluating the performance of the trained model on the test data.
Model optimization: It involves tuning the model's hyperparameters (for example, the learning rate, the regularization strength, or k in KNN) to improve its performance.
b) Discuss the Naive Bayes theorem.
Naive Bayes is a probabilistic classification algorithm based on Bayes' theorem. It is called "naive" because it makes the simplifying assumption that the features are conditionally independent of each other given the class. Bayes' theorem states that the probability of a class given the evidence (the observed features) is proportional to the probability of the evidence given the class, multiplied by the prior probability of the class. Under the independence assumption, this becomes:
$$P(C \mid x_1, \dots, x_n) \propto P(C) \prod_{i=1}^{n} P(x_i \mid C)$$
The classifier predicts the class C with the highest score.
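Below is a minimal categorical Naive Bayes sketch in plain Python. It assumes discrete feature values and uses Laplace (add-one) smoothing so that unseen values do not produce zero probabilities. It works with log probabilities to avoid numerical underflow. The weather-style dataset and the function names are hypothetical.

```python
import math
from collections import Counter


def fit_naive_bayes(X, y, alpha=1.0):
    """Categorical Naive Bayes with Laplace (add-alpha) smoothing."""
    class_counts = Counter(y)
    n_features = len(X[0])
    # counts[c][j][v] = how often feature j takes value v among samples of class c
    counts = {c: [Counter() for _ in range(n_features)] for c in class_counts}
    values = [set() for _ in range(n_features)]
    for xi, c in zip(X, y):
        for j, v in enumerate(xi):
            counts[c][j][v] += 1
            values[j].add(v)
    return class_counts, counts, values, alpha, len(y)


def predict_naive_bayes(model, x):
    class_counts, counts, values, alpha, n = model
    scores = {}
    for c, nc in class_counts.items():
        log_p = math.log(nc / n)                 # log prior P(C = c)
        for j, v in enumerate(x):                # + sum_j log P(x_j | C = c)
            log_p += math.log((counts[c][j][v] + alpha) /
                              (nc + alpha * len(values[j])))
        scores[c] = log_p
    return max(scores, key=scores.get)


X = [("sunny", "weak"), ("sunny", "weak"), ("overcast", "weak"),
     ("rainy", "strong"), ("rainy", "strong"), ("sunny", "strong")]
y = ["yes", "yes", "yes", "no", "no", "no"]
model = fit_naive_bayes(X, y)
print(predict_naive_bayes(model, ("rainy", "strong")))   # → no
print(predict_naive_bayes(model, ("overcast", "weak")))  # → yes
```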
c) What is the difference between supervised learning and unsupervised learning?
The main difference between supervised learning and unsupervised learning is that in supervised learning, the model is trained on labeled data, while in unsupervised learning, the model is trained on unlabeled data. In supervised learning, the model learns from the labeled data to make predictions on new data, while in unsupervised learning, the model learns to find patterns or structure in the data.
d) Explain in brief logistic regression.
Logistic regression is a statistical algorithm used for binary classification problems. It is a linear model: it computes a weighted sum of the input features and passes it through the sigmoid function to obtain the probability that the input belongs to the positive class. This probability lies between 0 and 1 and can be thresholded (typically at 0.5) to make binary predictions. The model is trained on labeled data by minimizing the log-loss (cross-entropy) between the predicted probabilities and the true labels, which is equivalent to maximizing the likelihood of the training data.
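Below is a minimal NumPy sketch of training logistic regression by batch gradient descent on the mean log-loss. The learning rate, the iteration count, and the 1-D toy data are arbitrary illustrative choices.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Batch gradient descent on the mean log-loss (cross-entropy)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_iter):
        p = sigmoid(X @ w + b)           # predicted P(y = 1 | x)
        grad_w = X.T @ (p - y) / len(y)  # gradient of the mean log-loss w.r.t. w
        grad_b = np.mean(p - y)          # ... and w.r.t. b
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def log_loss(X, y, w, b):
    p = np.clip(sigmoid(X @ w + b), 1e-12, 1 - 1e-12)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))


X = np.array([[-3.0], [-2.0], [-1.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
w, b = fit_logistic(X, y)
probs = sigmoid(X @ w + b)
preds = (probs >= 0.5).astype(int)   # threshold the probabilities at 0.5
print(probs.round(3), preds)
```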
e) What is the Independent Component Analysis? Discuss.
Independent Component Analysis (ICA) is a technique used to separate a multivariate signal into its independent, non-Gaussian components. It is used for blind signal separation problems where the sources are unknown and the observed signals are mixtures of the sources. ICA works by finding a linear transformation of the observed signals that maximizes the independence of the transformed signals.
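Below is a minimal sketch of FastICA, one common ICA algorithm (deflation variant with the tanh non-linearity), assuming NumPy. Two hypothetical non-Gaussian sources, a sine wave and a square wave, are mixed by a made-up 2 × 2 matrix. The algorithm first whitens the mixtures, then finds unmixing directions that maximize non-Gaussianity. As with any ICA method, the components are recovered only up to order, sign, and scale.

```python
import numpy as np


def fast_ica(X, n_iter=200, tol=1e-8, seed=0):
    """Minimal FastICA (deflation, tanh non-linearity). X has shape (n_signals, n_samples)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # 1. Centre and whiten so the mixtures are uncorrelated with unit variance.
    X = X - X.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(np.cov(X))
    Z = np.diag(1.0 / np.sqrt(d)) @ E.T @ X
    # 2. Find one unmixing direction at a time, maximising non-Gaussianity.
    W = np.zeros((n, n))
    for i in range(n):
        w = rng.normal(size=n)
        w /= np.linalg.norm(w)
        for _ in range(n_iter):
            wz = w @ Z
            g, g_prime = np.tanh(wz), 1.0 - np.tanh(wz) ** 2
            w_new = (Z * g).mean(axis=1) - g_prime.mean() * w  # fixed-point update
            w_new -= W[:i].T @ (W[:i] @ w_new)  # stay orthogonal to earlier components
            w_new /= np.linalg.norm(w_new)
            converged = abs(abs(w_new @ w) - 1.0) < tol
            w = w_new
            if converged:
                break
        W[i] = w
    return W @ Z  # estimated independent components


t = np.linspace(0, 8, 2000)
S = np.vstack([np.sin(2 * t), np.sign(np.sin(3 * t))])  # two non-Gaussian sources
A = np.array([[1.0, 0.5], [0.5, 2.0]])                  # hypothetical mixing matrix
X = A @ S                                               # observed mixtures
S_hat = fast_ica(X)
corr = np.corrcoef(np.vstack([S_hat, S]))[:2, 2:]  # recovered vs true sources
print(np.abs(corr).round(3))
```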