Sunday, August 31, 2014

Statistical Modeling vs Machine Learning

I have often used the terms “statistical modeling techniques” and “machine learning techniques” interchangeably, but I was never sure about their similarities and differences. So I went through a few resources, and I am sharing my findings here.


Let's start with the basic definitions.

A statistical model is a formalization of relationships between variables in the form of mathematical equations.

Machine learning is a subfield of computer science and artificial intelligence that deals with building systems that learn from data rather than following explicitly programmed instructions.


Let's explore what books and courses say about both fields in their first chapter or lecture.

From the book “An Introduction to Statistical Learning” by Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani:



The relationship between a response variable and its predictor(s) can be written as

Y = f(X) + e

Where,
f() : Function of X
X : An input vector with components X1, X2, …, Xn
Y : Output (response)
e : Random error term

Statistical learning refers to a set of approaches for estimating f().
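
To make the idea of "estimating f()" concrete, here is a minimal sketch in Python. It is only an illustration, not something taken from the book: the linear form of true_f, the noise level and the helper name fit_line are all assumptions made up for this example. It simulates data from Y = f(X) + e and then recovers an estimate of f() by ordinary least squares.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares with one predictor: returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def true_f(x):
    # Hypothetical "true" f(); in a real problem this is unknown.
    return 2.0 + 3.0 * x


random.seed(0)
xs = [random.uniform(0, 10) for _ in range(200)]
# Y = f(X) + e, with e drawn as Gaussian noise (an assumption for this sketch).
ys = [true_f(x) + random.gauss(0, 1) for x in xs]

intercept_hat, slope_hat = fit_line(xs, ys)
print(f"estimated f(x) = {intercept_hat:.2f} + {slope_hat:.2f} * x")
```

With enough data the estimated intercept and slope should land close to the values used in true_f, which is all "estimating f()" means in this simple linear case.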

From the notes of the Caltech course “Learning from Data” by Yaser S. Abu-Mostafa:


Machine learning requires:

Input (X)
Output (Y)
Target function f : X -> Y
Data (x1, y1), (x2, y2), (x3, y3) … (xn, yn)
Hypothesis g : X -> Y
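
The components above can be sketched in a few lines of Python. This is a toy illustration under assumptions of my own, not an example from the course: the threshold target target_f, the hypothesis set of one-dimensional thresholds and the helper learn_threshold are all hypothetical. The learner only sees the data and picks the hypothesis g that makes the fewest mistakes on it.

```python
import random


def target_f(x):
    # Hypothetical target function f : X -> Y; unknown to the learner in practice.
    return 1 if x > 0.6 else -1


def learn_threshold(data):
    """Choose g from the hypothesis set {x -> +1 if x > t else -1}
    by minimizing the number of errors on the training data."""
    xs = sorted(x for x, _ in data)
    # Candidate thresholds: below all points, between neighbours, above all points.
    candidates = (
        [xs[0] - 1.0]
        + [(a + b) / 2 for a, b in zip(xs, xs[1:])]
        + [xs[-1] + 1.0]
    )

    def errors(t):
        return sum(1 for x, y in data if (1 if x > t else -1) != y)

    best_t = min(candidates, key=errors)
    return lambda x: 1 if x > best_t else -1


random.seed(1)
# Data (x1, y1) … (xn, yn), labelled by the target function.
data = [(x, target_f(x)) for x in (random.random() for _ in range(50))]

g = learn_threshold(data)
train_error = sum(g(x) != y for x, y in data) / len(data)
print("training error of g:", train_error)
```

Here g fits the training data perfectly because the target happens to be in the hypothesis set; on real problems that is rarely the case, and how well g matches f outside the training data is the interesting question.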


The notes from Andrew Ng’s Machine Learning class at Stanford also cover the same basic concepts.



So we can say that both fields deal with data and try to find a function that takes data as input and produces the desired output.


Let's see what other people think about both fields.

Larry Wasserman, a statistician and professor at CMU, thinks there is no difference: “They are both concerned with the same question: how do we learn from data?” In his blog post he shows how the same concepts go by different names in the two fields (statistics term on the left, machine learning term on the right):
  • Estimation~Learning
  • Classifier~Hypothesis
  • Data point~Example/Instance
  • Regression~Supervised Learning
  • Classification~Supervised Learning
  • Covariate~Feature
  • Response~Label

However, he also mentions a few points that suggest differences:
  • Machine learning is a comparatively new field that evolved in the computer age, whereas statistical data analysis was practiced long before computers were invented.
  • Statistics emphasizes formal statistical inference (confidence intervals, hypothesis tests, optimal estimators) in low-dimensional problems, while machine learning emphasizes high-dimensional prediction problems.


Robert Tibshirani, a statistician and machine learning expert at Stanford, describes machine learning as a glamorous version of statistics in his class notes.



Brendan O’Connor, Assistant Professor at the University of Massachusetts Amherst, wrote along similar lines in a blog post back in 2008. He says, “Statistics and machine learning aren’t very different fields.” He later added an update to his post saying, “Statistics, not machine learning, is the real deal, but unfortunately suffers from bad marketing.” He explains the difference by pointing to a few techniques that exist in only one of the two fields:

“There are definitely a number of topics in ML that aren’t very related to statistics or probability. Max-margin methods: if all we care about is prediction, why bother using a probability model at all? Why not just optimize the spatial geometry instead? SVM’s don’t require a lick of probability theory to understand. (Of course probability-based approaches are huge in ML, but it’s important to remember they’re not the only game in town, and there is no necessary reason they must be.) And then there are non-traditional settings such as online learning, reinforcement learning, and active learning, where the structure of access to information is in play. There are certainly plenty of things in statistics that aren’t considered part of ML — say, regression diagnostics and significance testing.”

Andrew Gelman, a statistician and professor at Columbia University, replied to this on his blog with the following points:
  • It's better to have two fields trying to solve similar problems.
  • He reiterates the point that statistics generally deals with lower-dimensional data than machine learning does.
  • Machine learning has made great progress on hard problems.

There is also an interesting discussion in the comments section of that post.




Summary of findings:
  • Both fields are trying to solve similar problems. Unfortunately, statistics suffers from bad marketing.
  • Statistics is a much older field that evolved from mathematics, while machine learning is fairly new and evolved from computer science and artificial intelligence.
  • Though there is a huge overlap between the two fields, each has a few techniques of its own.
  • Machine learning uses computers extensively, which helps in solving many complex problems.
  • Statistics generally deals with low-dimensional data, whereas machine learning is generally associated with high-dimensional data.

Follow the discussion on Reddit.
