I have often used the terms statistical modeling and machine learning interchangeably, without being sure about their similarities and differences. So I went through a few resources, and I am sharing my findings here.

Let's start with basic definitions.

*A statistical model is a formalization of relationships between variables in the form of mathematical equations.*

*Machine learning is a subfield of computer science and artificial intelligence which deals with building systems that can learn from data, instead of explicitly programmed instructions.*

Let's explore what books and courses say in their first chapter/lecture about both fields.

**From the book “An Introduction to Statistical Learning” by Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani**

A relation between the response variable and the predictor(s) can be written as,

Y = f(X) + e

Where,

f() : An unknown function of X

X : An input vector with components X1, X2, …, Xn

Y : Output

e : Random error

Statistical learning refers to a set of approaches for estimating f().
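As a rough illustration of what "estimating f()" can look like, here is a minimal sketch in plain Python. It assumes a made-up true function `true_f(x) = 2x + 1` and Gaussian noise, neither of which comes from the book, and it restricts the estimate to a straight line fitted by ordinary least squares. The names `true_f`, `simulate` and `fit_line` are hypothetical helpers for this example only.

```python
import random


def true_f(x):
    # Hypothetical "true" relationship, chosen only for illustration.
    # In a real problem f is unknown.
    return 2.0 * x + 1.0


def simulate(n, noise_sd, seed=0):
    # Generate data from Y = f(X) + e with Gaussian random error e.
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [true_f(x) + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    # Ordinary least squares for a single predictor:
    # slope = Sxy / Sxx, intercept = mean(y) - slope * mean(x).
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


xs, ys = simulate(n=500, noise_sd=1.0)
slope, intercept = fit_line(xs, ys)
print(f"estimated f(x) = {slope:.2f} * x + {intercept:.2f}")
```

With enough data the fitted slope and intercept should land close to the values used in `true_f`, but never exactly on them, because of the random error e.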

**From the notes of the Caltech course “Learning from Data” by Yaser S. Abu-Mostafa**

Machine learning requires,

Input (X)

Output (Y)

Target function f : X -> Y

Data (x.1, y.1), (x.2, y.2), (x.3, y.3) … (x.n, y.n)

Hypothesis g : X -> Y
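The components listed above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the target function `target_f` (a threshold at 0.3) is invented so that the example can generate labels, and the hypothesis set (one-dimensional threshold classifiers) and the helper names `make_hypothesis` and `training_error` are hypothetical, not taken from the course notes.

```python
import random


def target_f(x):
    # Hypothetical target function f : X -> Y, used only to label the data.
    # In a real learning problem f is unknown.
    return 1 if x >= 0.3 else -1


# Data (x.1, y.1), ..., (x.n, y.n)
rng = random.Random(1)
inputs = [rng.uniform(-1.0, 1.0) for _ in range(200)]
data = [(x, target_f(x)) for x in inputs]


def make_hypothesis(threshold):
    # One candidate hypothesis: predict +1 at or above the threshold, else -1.
    return lambda x: 1 if x >= threshold else -1


def training_error(h, data):
    # Fraction of training examples that h labels incorrectly.
    return sum(1 for x, y in data if h(x) != y) / len(data)


# Hypothesis set: thresholds from -1.00 to 1.00 in steps of 0.01.
candidates = [i / 100 for i in range(-100, 101)]

# The learning step: pick the hypothesis g with the lowest training error.
best_threshold = min(
    candidates, key=lambda t: training_error(make_hypothesis(t), data)
)
g = make_hypothesis(best_threshold)

print("chosen threshold:", best_threshold)
print("training error of g:", training_error(g, data))
```

The point of the sketch is the division of roles: f stays hidden behind the data, and g is whatever the learning procedure selects from the hypothesis set to approximate it.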

**The notes from Andrew Ng’s machine learning class at Stanford also cover the same basic concepts.**

So we can say that both fields deal with **data**, trying to find some **function** which takes data as **input** and produces the desired **output**.

Let's see what other people think about both fields.

Larry Wasserman, a statistician and professor at CMU, thinks there is no difference: “*They are both concerned with the same question: how do we learn from data?*” In his blog post he lists how the same concepts have different names in the two fields,

- Estimation~Learning
- Classifier~Hypothesis
- Data point~Example/Instance
- Regression~Supervised Learning
- Classification~Supervised Learning
- Covariate~Feature
- Response~Label

However, he also mentions a few points that suggest a difference, like

- *Machine learning is a comparatively new field that evolved in the computer age, whereas statistical data analysis practices existed long before computers were invented.*
- *Statistics emphasizes statistical inference (confidence intervals, hypothesis tests, optimal estimators) in low dimensional problems, and machine learning emphasizes high dimensional prediction problems.*
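The difference in emphasis between inference and prediction can be made concrete with a small sketch that looks at the same fitted line in two ways. Everything here is an assumption made for illustration: the data come from a made-up process `y = 0.5 * x + noise`, the interval uses a normal approximation (1.96 standard errors) rather than an exact t quantile, and `make_data` and `fit_line` are hypothetical helpers.

```python
import random

rng = random.Random(42)


def make_data(n):
    # Hypothetical data-generating process: y = 0.5 * x + Gaussian noise.
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [0.5 * x + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    # Ordinary least squares for one predictor; also returns Sxx,
    # which is needed for the standard error of the slope.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept, sxx


train_x, train_y = make_data(200)
test_x, test_y = make_data(100)
slope, intercept, sxx = fit_line(train_x, train_y)

# Inference view: how uncertain is the estimated slope?
residuals = [y - (slope * x + intercept) for x, y in zip(train_x, train_y)]
sigma2 = sum(r * r for r in residuals) / (len(train_x) - 2)
se_slope = (sigma2 / sxx) ** 0.5
ci = (slope - 1.96 * se_slope, slope + 1.96 * se_slope)  # approximate 95% interval

# Prediction view: how well does the fitted line do on data it has not seen?
test_mse = sum(
    (y - (slope * x + intercept)) ** 2 for x, y in zip(test_x, test_y)
) / len(test_x)

print(f"slope = {slope:.3f}, approx. 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"held-out mean squared error = {test_mse:.3f}")
```

The model and the fitting procedure are identical in both views; only the question being asked of it changes.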

Robert Tibshirani, a statistician and machine learning expert at Stanford, calls machine learning a glamorous version of statistics in his class notes.

Brendan O’Connor, Assistant Professor at University of Massachusetts Amherst, also wrote along similar lines in a blog post back in 2008. He says, “*Statistics and machine learning aren’t very different fields.*” He later added an update to his post saying, “*Statistics, not machine learning, is the real deal, but unfortunately suffers from bad marketing.*” He explains the difference by mentioning a few techniques which exist in only one of the two fields,

"There are definitely a number of topics in ML that aren’t very related to statistics or probability. Max-margin methods: if all we care about is prediction, why bother using a probability model at all? Why not just optimize the spatial geometry instead? SVM’s don’t require a lick of probability theory to understand. (Of course probability-based approaches are huge in ML, but it’s important to remember they’re not the only game in town, and there is no necessary reason they must be.) And then there are non-traditional settings such as online learning, reinforcement learning, and active learning, where the structure of access to information is in play. There are certainly plenty of things in statistics that aren’t considered part of ML — say, regression diagnostics and significance testing."

Andrew Gelman, a statistician and professor at Columbia University, replied to this on his blog with the following points,

- It's better to have two fields trying to solve similar problems.
- He reiterates the point that statistics generally deals with low dimensional data as compared to machine learning.
- Machine learning has made great progress on hard problems.

To summarize,

- Both fields are trying to solve similar problems. Unfortunately, statistics suffers from bad marketing.
- Statistics is a much older field which evolved from mathematics, while machine learning is fairly new and evolved from computer science/artificial intelligence.
- Though there is a huge overlap between the two fields, each has a few unique techniques.
- Machine learning uses computers extensively, which helps in solving many complex problems.
- Statistics generally deals with low dimensional data, whereas machine learning is generally associated with high dimensional data.

Follow the discussion on Reddit.
