101 data science interview questions

1. What is Data Science?

Data Science is a blend of statistics, technical skills and business vision that is used to analyze available data and predict future trends.

2. How is it different from Big Data and Data Analytics?

Big Data: Huge volumes of data (structured, unstructured and semi-structured). Requires a basic knowledge of statistics and mathematics.

Data Science: Deals with slicing and dicing the data. Requires in-depth knowledge of statistics and mathematics.

Data Analytics: Contributes operational insights into complex business scenarios. Requires a moderate amount of statistics and mathematics.

3. Differentiate between Data Science, Machine Learning and AI.

Data Science: Not exactly a subset of machine learning, but it uses machine learning to analyse data and make future predictions.

Machine Learning: A subset of AI that focuses on a narrow range of activities.

AI: A wide term that focuses on applications ranging from robotics to text analysis.

4. What is logistic regression? Or state an example when you have used logistic regression recently.

Logistic regression, often referred to as the logit model, is a technique for predicting a binary outcome from a linear combination of predictor variables. For example, suppose you want to predict whether a particular political leader will win an election. In this case, the outcome of the prediction is binary, i.e. 1 or 0 (win/lose). The predictor variables here would be the amount of money spent on the candidate's election campaign, the amount of time spent campaigning, etc.
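The election example can be sketched in a few lines of plain Python. This is only an illustrative sketch: the data is made up, and the function names (fit_logistic, predict_proba), the learning rate and the number of epochs are assumptions chosen for the example, not part of any library.

```python
import math

def sigmoid(z):
    """Map any real number to a probability in (0, 1), avoiding overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(X, y, lr=0.05, epochs=10000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, n_features = len(X), len(X[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            # Predicted probability minus the true label.
            err = sigmoid(sum(w * x for w, x in zip(weights, xi)) + bias) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def predict_proba(weights, bias, x):
    """Probability of the positive class (win) for one candidate."""
    return sigmoid(sum(w * xj for w, xj in zip(weights, x)) + bias)

# Hypothetical data: [money spent (millions), months spent campaigning]
X = [[1.0, 2.0], [2.0, 1.0], [1.5, 3.0], [6.0, 8.0], [7.0, 6.0], [8.0, 9.0]]
y = [0, 0, 0, 1, 1, 1]  # 1 = win, 0 = lose

weights, bias = fit_logistic(X, y)
p_low = predict_proba(weights, bias, [1.0, 1.0])
p_high = predict_proba(weights, bias, [8.0, 8.0])
print(f"P(win | small campaign) = {p_low:.3f}")
print(f"P(win | large campaign) = {p_high:.3f}")
```

The model outputs a probability; turning it into a win/lose prediction means picking a threshold, commonly 0.5.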

5. Compare R and Python programming.

R: The best part about R is that it is an open source tool and hence widely used by academia and the research community. It is a robust tool for statistical computation, graphical representation and reporting. Because it is open source, it is frequently updated with new features, which are then readily available to everybody.

Python: Python is a powerful open source programming language that is easy to learn and works well with most other tools and technologies. The best part about Python is that it has a very large number of libraries and community-created modules, which makes it very robust. It has functions for statistical operations, model building and more.

R and Python are two of the most important programming languages for implementing machine learning algorithms.

6. What is Linear Regression?

Linear regression is a statistical technique in which the value of a variable Y is predicted from the value of a second variable X by fitting a straight line to the data. X is referred to as the predictor variable and Y as the criterion variable.
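For a single predictor, the fitted line can be computed directly with the ordinary least squares formulas. The sketch below uses made-up data, and the helper name fit_line is an assumption for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of X and Y divided by variance of X.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data that lies exactly on y = 2x + 1
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]

slope, intercept = fit_line(xs, ys)
predicted = slope * 6.0 + intercept  # predict Y for a new X
print(slope, intercept, predicted)   # 2.0 1.0 13.0
```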

7. What is Interpolation and Extrapolation?

Interpolation is estimating a value that lies between two known values in a list of values. Extrapolation is approximating a value outside the known range by extending a known set of values or facts.
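With a straight line through two known points, the same formula covers both cases; the only difference is whether the query point lies inside or outside the known range. A minimal sketch, with a hypothetical helper name and made-up points:

```python
def linear_estimate(x0, y0, x1, y1, x):
    """Value at x on the straight line through (x0, y0) and (x1, y1)."""
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

# Known points: (1, 10) and (3, 30)
inside = linear_estimate(1.0, 10.0, 3.0, 30.0, 2.0)   # interpolation: 1 <= 2 <= 3
outside = linear_estimate(1.0, 10.0, 3.0, 30.0, 5.0)  # extrapolation: 5 > 3
print(inside, outside)  # 20.0 50.0
```

Extrapolated values are generally less reliable, because nothing guarantees that the trend continues beyond the observed data.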

8. What is power analysis?

An experimental design technique for determining the sample size required to detect an effect of a given size with a given degree of confidence.
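One common use of power analysis is working out how many observations each group needs before an experiment is run. The sketch below assumes a two-sided comparison of two group means and uses the normal approximation; the function name and the default significance level (0.05) and power (0.8) are illustrative assumptions.

```python
import math
from statistics import NormalDist

def sample_size_per_group(effect_size, alpha=0.05, power=0.8):
    """Approximate n per group for a two-sided, two-sample comparison of means.

    effect_size is the standardized difference (Cohen's d).
    Uses the normal approximation: n = 2 * ((z_alpha/2 + z_power) / d) ** 2.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

n_medium = sample_size_per_group(0.5)  # medium effect
n_small = sample_size_per_group(0.2)   # small effect needs far more data
print(n_medium, n_small)
```

An exact calculation based on the t distribution gives slightly larger numbers, so this approximation is best treated as a lower bound.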

9. What is K-means? How can you select K for K-means?

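K-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, which serves as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells. A common way to select K is the elbow method: run K-means for several values of K, plot the within-cluster sum of squares against K, and pick the K beyond which the improvement levels off.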
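The partitioning can be sketched with the standard iterative procedure (often called Lloyd's algorithm): assign each point to its nearest centroid, then move each centroid to the mean of its points. The code below is a simplified sketch on made-up 2-D points; taking the first k points as initial centroids is an assumption made to keep the example deterministic, and real implementations use better initialisation. The inertia helper computes the within-cluster sum of squares, the quantity that one common heuristic for picking K, the elbow method, compares across several values of K.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, max_iter=100):
    """Return (centroids, assignments) after iterating to convergence."""
    centroids = [list(p) for p in points[:k]]  # naive initialisation (assumption)
    assignments = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid for every point.
        new_assignments = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignments

def inertia(points, centroids, assignments):
    """Within-cluster sum of squared distances."""
    return sum(sq_dist(p, centroids[a]) for p, a in zip(points, assignments))

# Hypothetical data: two visibly separate groups
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]

centroids, assignments = kmeans(points, k=2)
print(assignments)  # [0, 0, 0, 1, 1, 1]

# Elbow heuristic: inspect how inertia drops as K grows.
for k in (1, 2, 3):
    cents, assign = kmeans(points, k)
    print(k, round(inertia(points, cents, assign), 2))
```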

10. What is Collaborative filtering?

The filtering process used by most recommender systems to find patterns or information by combining the viewpoints of many users, various data sources and multiple agents.
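One simple variant is user-based collaborative filtering: to guess how a user would rate an unseen item, average the ratings other users gave it, weighted by how similar those users are. The sketch below uses made-up ratings and cosine similarity over co-rated items; the user names, item names and function names are all hypothetical, and this is just one of several ways to do it.

```python
import math

# Hypothetical ratings: user -> {item: rating on a 1-5 scale}
ratings = {
    "alice": {"A": 5, "B": 4, "C": 1},
    "bob": {"A": 5, "B": 5, "C": 1, "D": 5},
    "carol": {"A": 1, "B": 1, "C": 5, "D": 1},
}

def cosine(u, v):
    """Cosine similarity between two users over the items both have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)

def predict_rating(user, item):
    """Similarity-weighted average of other users' ratings for the item."""
    num = den = 0.0
    for other, their_ratings in ratings.items():
        if other == user or item not in their_ratings:
            continue
        sim = cosine(ratings[user], their_ratings)
        num += sim * their_ratings[item]
        den += abs(sim)
    return num / den if den else None

predicted = predict_rating("alice", "D")
print(round(predicted, 2))
```

Because alice's ratings resemble bob's far more than carol's, the prediction for item D is pulled towards bob's rating.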

101 Best Data Science Interview Questions 2018
