1. What is Data Science?
Data Science is a blend of statistics, technical skills and business vision that is used to analyze available data and predict future trends.
2. How is it different from Big Data and Data Analytics?
| Big Data | Data Science | Data Analytics |
| --- | --- | --- |
| Deals with huge volumes of data: structured, unstructured and semi-structured | Deals with slicing and dicing the data | Provides operational insights into complex business scenarios |
| Requires a basic knowledge of statistics and mathematics | Requires in-depth knowledge of statistics and mathematics | Requires a moderate amount of statistics and mathematics |
3. Differentiate between Data Science, Machine Learning and AI.
| Data Science | Machine Learning | Artificial Intelligence |
| --- | --- | --- |
| Not exactly a subset of machine learning, but uses machine learning to analyze data and make future predictions. | A subset of AI that focuses on a narrow range of activities. | A wide term that covers applications ranging from robotics to text analysis. |
4. What is logistic regression? Or, give an example of when you have used logistic regression recently.
Logistic regression, often referred to as the logit model, is a technique for predicting a binary outcome from a linear combination of predictor variables; the logistic (sigmoid) function turns that linear combination into a probability between 0 and 1. For example, suppose you want to predict whether a particular political leader will win an election. In this case, the outcome of the prediction is binary, i.e. 1 (win) or 0 (lose). The predictor variables here would be the amount of money spent on the candidate's election campaign, the amount of time spent campaigning, etc.
5. Compare R and Python.
R: The best part about R is that it is an open-source tool, which is why it is widely used by academia and the research community. It is a robust tool for statistical computation, graphical representation and reporting. Because it is open source, it is continually updated with the latest features, which are then readily available to everybody.
Python: Python is a powerful open-source programming language that is easy to learn and works well with most other tools and technologies. The best part about Python is that it has innumerable libraries and community-created modules, which make it very robust. It has functions for statistical operations, model building and more.
R and Python are two of the most widely used programming languages for machine learning.
6. What is Linear Regression?
Linear regression is a statistical technique where the score of a variable Y is predicted from the score of a second variable X. X is referred to as the predictor variable and Y as the criterion variable.
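The answer does not say how the line is fitted; the usual choice is ordinary least squares, which picks the intercept a and slope b of Y = a + bX that minimize the sum of squared prediction errors. A minimal sketch in plain Python, with a made-up dataset and a hypothetical helper name `fit_line`:

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (a, b) for Y = a + b*X."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Slope = covariance(X, Y) / variance(X); the 1/n factors cancel out
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar  # the fitted line passes through (x_bar, y_bar)
    return a, b

# Hypothetical data: X = predictor scores, Y = criterion scores
xs = [1, 2, 3, 4, 5]
ys = [2.9, 5.1, 7.0, 9.2, 10.8]
a, b = fit_line(xs, ys)
print(f"Y = {a:.2f} + {b:.2f} * X")
print("Predicted Y for X = 6:", a + b * 6)
```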
7. What are interpolation and extrapolation?
Interpolation is estimating an unknown value that lies between two known values in a list of values. Extrapolation is approximating a value beyond the range of the known values by extending a known set of values or facts.
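Both ideas can be sketched with a straight line through two known points: the same formula interpolates when the target lies between them and extrapolates when it lies outside. Linear estimation is an assumption here (polynomial or spline fits are common alternatives), and the function names and sales figures are hypothetical.

```python
def linear_estimate(x0, y0, x1, y1, x):
    """Estimate y at x from the straight line through (x0, y0) and (x1, y1)."""
    if x0 == x1:
        raise ValueError("x0 and x1 must differ")
    slope = (y1 - y0) / (x1 - x0)
    return y0 + slope * (x - x0)

def kind_of_estimate(x0, x1, x):
    """Interpolation inside the known range [x0, x1], extrapolation outside it."""
    lo, hi = min(x0, x1), max(x0, x1)
    return "interpolation" if lo <= x <= hi else "extrapolation"

# Hypothetical known values: 10 units sold on day 2, 20 units sold on day 4
print(kind_of_estimate(2, 4, 3), linear_estimate(2, 10, 4, 20, 3))  # → interpolation 15.0
print(kind_of_estimate(2, 4, 6), linear_estimate(2, 10, 4, 20, 6))  # → extrapolation 30.0
```

Extrapolated values deserve more caution, because they assume the trend continues beyond the data that was actually observed.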
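8. What is power analysis?
An experimental design technique for determining the sample size needed to detect an effect of a given size with a given probability (the statistical power), or, conversely, the power that a given sample size provides.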
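As a sketch, assume the common case of comparing two group means with a two-sided test, where the effect size d is the difference in means divided by the common standard deviation. The normal approximation then gives the sample size per group from the significance level and the desired power. The helper names are hypothetical, and dedicated tools that use the t distribution give slightly larger answers (64 rather than 63 per group for d = 0.5).

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-sample test of means.

    Normal approximation: n = 2 * ((z_{1-alpha/2} + z_{power}) / d) ** 2
    """
    z_alpha = Z.inv_cdf(1 - alpha / 2)
    z_beta = Z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)

def power_for_sample_size(effect_size, n_per_group, alpha=0.05):
    """Approximate power achieved with n_per_group observations in each group."""
    z_alpha = Z.inv_cdf(1 - alpha / 2)
    # Ignores the negligible chance of rejecting in the wrong tail
    return Z.cdf(effect_size * math.sqrt(n_per_group / 2) - z_alpha)

print(sample_size_per_group(0.5))  # → 63 (d = 0.5, alpha = 0.05, power = 0.8)
print(power_for_sample_size(0.5, 30))  # power with only 30 per group
```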
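9. What is K-means? How can you select K for K-means?
k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, which serves as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells. To select K, a common approach is the elbow method: run k-means for a range of values of K, plot the within-cluster sum of squares against K, and choose the value at which the decrease starts to level off (the "elbow").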
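A minimal sketch of k-means using Lloyd's algorithm (assign each point to its nearest mean, then recompute the means, until nothing moves), in plain Python. It also computes the within-cluster sum of squares, often called inertia, which is the quantity the elbow method plots against K. The data points, helper names and the choice of 10 random restarts are illustrative assumptions; restarts are used because a single random start can get stuck in a poor local optimum.

```python
import math
import random

def inertia(centroids, clusters):
    """Within-cluster sum of squared distances (lower means tighter clusters)."""
    return sum(math.dist(p, c) ** 2 for c, cl in zip(centroids, clusters) for p in cl)

def lloyd(points, k, rng, max_iter=100):
    """One run of Lloyd's algorithm, starting from k randomly chosen data points."""
    centroids = rng.sample(points, k)
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster with the nearest mean
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid)
        new_centroids = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: the means no longer move
        centroids = new_centroids
    return centroids, clusters

def kmeans(points, k, n_init=10, seed=0):
    """Keep the lowest-inertia result out of n_init random restarts."""
    rng = random.Random(seed)
    runs = [lloyd(points, k, rng) for _ in range(n_init)]
    return min(runs, key=lambda run: inertia(*run))

# Hypothetical 2-D data containing three well-separated groups
points = [(1, 1), (1.5, 2), (2, 1), (8, 8), (9, 8.5), (8.5, 9), (1, 8), (1.5, 9), (2, 8.5)]
for k in range(1, 6):
    centroids, clusters = kmeans(points, k)
    print(k, round(inertia(centroids, clusters), 2))
```

Reading the printed values from K = 1 upward, the elbow is the K after which adding more clusters stops reducing the inertia by much.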
10. What is collaborative filtering?
The filtering process used by most recommender systems to find patterns or information through collaboration among multiple viewpoints, data sources and agents.
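The answer above is generic. One common concrete form is user-based collaborative filtering: measure how similar users' ratings are, then predict a user's rating for an unseen item as a similarity-weighted average of other users' ratings for it. The sketch below assumes cosine similarity over co-rated items; the users, movies and ratings are made up, and `cosine_similarity` and `predict` are hypothetical helper names.

```python
import math

# Hypothetical user -> {item: rating} data on a 1-5 scale
ratings = {
    "alice": {"matrix": 5, "inception": 4, "titanic": 1},
    "bob": {"matrix": 4, "inception": 5, "titanic": 2, "notebook": 1},
    "carol": {"matrix": 1, "titanic": 5, "notebook": 5},
    "dave": {"inception": 4, "matrix": 5, "notebook": 2},
}

def cosine_similarity(a, b):
    """Cosine similarity between two users over the items both have rated."""
    common = a.keys() & b.keys()
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)

def predict(user, item, ratings):
    """Similarity-weighted average of other users' ratings for item (None if nobody rated it)."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        sim = cosine_similarity(ratings[user], their)
        num += sim * their[item]
        den += abs(sim)
    return num / den if den else None

print(round(predict("alice", "notebook", ratings), 2))  # → 2.08
```

Alice's ratings closely match Bob's and Dave's, who both rated "notebook" low, so the prediction stays low even though Carol rated it 5. Real systems often mean-center ratings (Pearson correlation) or use matrix factorization rather than raw cosine similarity.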