# Ask GH

I have the marketing and development background, but I'm failing at the analytics. I continually hear about top-level growth hackers creating models and crunching big sets of data into answers for their hypotheses.

Does anyone have good (free if possible) resources or a personal curriculum for someone to develop a fundamental understanding of data science and statistics as it relates to growth, to a level where he or she can apply it? I've seen a few nice Python and SQL riddles to solve, like how to calculate your survival rate if you were on the Titanic. Those are useful, but limited.

Or what are your favorite calculations to perform on a set of data once you get your hands on it? E.g. if I'm given a huge set of numbers for social shares/comments, how do you then process that into trends and takeaways?

Thank you everyone! Long time reader, first time poster - as the cliche goes!

## Shana Carp

First learn some solid statistical stuff. By solid statistical stuff I mean first year undergrad solid statistical stuff. This is mostly because you need to look at a bunch of data and figure out what you are looking at.

Then I think the next step would be manually setting up an A/A test and then an A/B test. By manually, I mean the multistep process of

1) understanding the limits of the tools you are using

2) understanding the limits of the model you are testing - what actually can you control, what can't you.

3) understanding what is a hypothesis and how to properly set one up, as well as the philosophical problems of what you can and cannot say with a hypothesis.

4) understanding what can go wrong.

5) understanding how to check that the math is actually showing you an answer, and what counts as illegal math.

The goal here is to understand what a sterile testing environment is and what an experiment is. You need to treat your marketing test as if you were testing a drug for cancer. Fundamentally, to do this right you have to model the world, then understand how to set up a world that can prove you right or wrong, and then understand the limitations of what you proved. You should understand what errors are out there, and how to simplify down to get rid of errors (not ignore them, get rid of them). You should also understand the limits of your testing technologies and how to handle them (so that you handle them, and they don't handle you).
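One way to make the A/A-then-A/B routine concrete is a small simulation. This is only a sketch under stated assumptions: the two-proportion z-test is one common choice rather than the only valid one, the function names are made up for illustration, and the 10% and 13% conversion rates and the 5,000-visitor group size are invented numbers.

```python
import math
import random


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two_sided_p) for the difference between two conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both groups share one true rate.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def simulate_conversions(rng, n, rate):
    """Count how many of n simulated visitors convert at the given true rate."""
    return sum(1 for _ in range(n) if rng.random() < rate)


rng = random.Random(42)  # fixed seed so the simulation is repeatable
n = 5000                 # assumed visitors per group

# A/A test: both groups get the same experience (assumed true rate of 10%).
aa = two_proportion_z_test(simulate_conversions(rng, n, 0.10), n,
                           simulate_conversions(rng, n, 0.10), n)

# A/B test: group B's true rate is assumed to be 13%, purely for illustration.
ab = two_proportion_z_test(simulate_conversions(rng, n, 0.10), n,
                           simulate_conversions(rng, n, 0.13), n)

print("A/A: z=%.2f p=%.3f" % aa)
print("A/B: z=%.2f p=%.3f" % ab)
```

The A/A run is the check on the testing environment itself. If identical experiences keep producing "significant" differences far more often than the roughly 5% of the time that chance alone would give at p < 0.05, the problem is in the tooling or the setup, not in the marketing.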

Now we have cookies to celebrate how far we've come along.

I guess from here the question is how do I think about and model the world. And there are lots of unknowns. It takes lots of practice and putting things to paper, and questioning fundamental assumptions (even unsaid ones).

For marketing specifically - I'd probably look up a lot of pre-1995 marketing papers and books, particularly those that involve media buying and the econometrics of marketing. There is a lot of re-inventing the wheel in digital land.

About data science in particular - start talking to PhD candidates. If you want to understand marketing, it may not be necessary for you to understand the mechanics of how to simplify different types of networks so that you can compute the probability of something spreading among different nodes in the network, but you should know that this is a workable problem.

## Shana Carp

Actually, I lied: it may not be possible right now to figure out how information is transmitted between human nodes. We're working on it. We have partial solutions.

## Josh Davis

Awesome answer @shanac! Thank you so much. I've started taking free online classes on statistics (it's been a while) and data analysis. The "data fear" comes from being given a 5,000-row Excel sheet with a ton of data and told to find information in it.

If I interpreted part of your answer correctly, you're saying come up with a model first then collect data. Which would help my above dilemma far more.

Thank you!

## Shana Carp

If your data does not conform to your model of the world (which is effectively what hypothesis testing is) then your model is wrong.

Learning what kinds of models are possible, what tests are possible, and then what data can come out of them (basically patterns and groupings that can make a different post) is basically how this works.

But first you need a model of the world in your head.

## Josh Davis

@shanac So when someone mentions they are developing a "model", are they saying they are creating a hypothesis (let's say about blogs with the word Hack in the title and a CTA with a green button) and then pulling data to find out whether the hypothesis is correct (so I'd pull all my blog articles and compare blogs without the word Hack in the title, with the word Hack and a green button, and with the word Hack and a yellow button)?

I was under the impression that when analysts create models, they're set algorithms used to find information from data. And that there's a bunch of these models most people, especially in marketing, use. But I guess that's just another way of saying your model = hypothesis statement.

I really appreciate the lesson! Sorry for my rookie questions.

## Shana Carp

We need to understand what a hypothesis is. This gets at the issue of testability and falsifiability (http://en.wikipedia.org/wiki/Falsifiability ). A hypothesis needs to conform to these principles. Hypotheses don't get "because" statements unless the "because" parts also conform to the same two things (testability and falsifiability).

In order to get at the testable part - we need to have a functional way of saying "this is how we view the world" for the sake of seeing if this is how things work.

So a good hypothesis: Purple titles on my blog will cause people to read for longer periods of time than red titles.

We can then set up a structure for my blog, have a way of changing the color of the titles, find a way of running the test, measure how long people spend reading, check that they are actually reading and not doing something else, etc.

Then we can see if the hypothesis is true - but only if we have a model of all the pieces that go into reading and my site.
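As a rough illustration of checking the purple-versus-red hypothesis once reading times have been measured, here is a minimal permutation-test sketch. The reading times are invented for illustration, `permutation_test` is a hypothetical helper name, and the permutation test is just one reasonable choice of test. It says nothing about whether time on page really measures reading, which is the modeling question raised above.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=10000, seed=0):
    """One-sided p-value for 'group_a has a larger mean than group_b'."""
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    combined = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Shuffle the labels: if title color does not matter, any split is as likely.
        rng.shuffle(combined)
        diff = mean(combined[:n_a]) - mean(combined[n_a:])
        if diff >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical reading times in seconds, invented for illustration only.
purple_titles = [62, 75, 58, 90, 81, 77, 69, 88, 73, 95]
red_titles = [55, 61, 49, 72, 66, 58, 70, 52, 64, 60]

observed_diff, p_value = permutation_test(purple_titles, red_titles, n_permutations=2000)
print(f"purple - red mean difference: {observed_diff:.1f} s")
print(f"one-sided permutation p-value: {p_value:.4f}")
```

The hypothesis is falsifiable here in a concrete sense: if shuffled labels produce a difference as large as the observed one fairly often, the data give no support for "purple titles cause longer reading".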

## Josh Davis

Fantastic clarification. Thank you so much!

## Shana Carp

Also, you can find some correlations in 5,000 rows of data, but you can't really draw any conclusions from them.
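A small simulation hints at why. In this sketch every column is pure random noise, so any correlation it "finds" is an accident. The 40-column count, the seed, and the 1.96 / sqrt(n) cutoff (a rough 5% significance threshold when the true correlation is zero) are all assumptions chosen for illustration.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(7)  # fixed seed so the run is repeatable
n_rows, n_columns = 5000, 40

# Every column is independent random noise: there is nothing real to find.
target = [rng.gauss(0, 1) for _ in range(n_rows)]
columns = [[rng.gauss(0, 1) for _ in range(n_rows)] for _ in range(n_columns)]

# Rough 5% significance cutoff for |r| when the true correlation is zero.
cutoff = 1.96 / math.sqrt(n_rows)

flagged = [i for i, col in enumerate(columns) if abs(pearson_r(col, target)) > cutoff]
print(f"{len(flagged)} of {n_columns} noise columns look 'significant' at the 5% level")
```

On average about 5% of pure-noise columns (roughly 2 of 40) would be expected to cross that cutoff, which is why a correlation dug out of a spreadsheet with no prior hypothesis is a starting point for a test, not a conclusion.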

## Morgan Brown

I'm working on a 50-70 page online guide/ebook called "The Data Science Survivor's Guide for Marketers". I'll be happy to give you early access as soon as it's ready.

Would love some critical review of it before I publish it, too.

## Shana Carp

How about this: I'll read it and then pass it on to one of my best friends (ABD, PhD at U Maryland in CompSci, in Machine Learning - which surprisingly has a super competitive program in Machine Learning). She may not be able to give it a read (ABD people are insanely busy because PhD programs are dumb in a specific way).

A lot of the stuff written by engineers on blogs about Data Science is extremely different than the stuff coming out in math papers, statistics papers, Comp Sci papers, and physics papers.

Also, you should ask @grayj - he dropped out of a PhD program, and specializes in a lot of the math behind image processing, which shares a lot of the math behind data science, if I remember correctly.

## Morgan Brown

Hey @shanac thanks for the offer. I'll definitely send it to you when it's ready for your review. I reached out to @grayj at the outset and he provided really solid directional advice to get the project rolling.

Thanks!

## Josh Davis

@morgan I'd love to read over it. I think I can help from being your ideal target audience as well as the fact I've edited plenty of articles, white papers, ebooks, etc. over the years so I'm always happy to offer an editor's eye.

Let me know when you want eyes and they're yours! Thanks for the heads up, and I'm glad you're doing this. I think it's very needed.

## Dylan La Com

Also would love to hear the community's thoughts on this. @shanac would probably have some good advice :)

## Shana Carp

Whatever I say will be radical.

## Shana Carp

Ooh, lastly - you should know what NP-hard and NP-complete are, since a number of marketing problems are NP-hard and some are NP-complete. There are definitely limits to what is out there.

## Josh Davis

@Shanac I'm Googling those terms right now!

## Shana Carp

Basically it means that you can model them, but it isn't clear that you could write a program that would solve the answer, but you'll probably live longer than the time it would take to get the answer

## Shana Carp

The other comment is wrong - it's the reverse - for a large instance, it could take billions of years to solve an NP-complete problem by brute force. Some marketing problems are model-able, but the model you'll get and then the tests you'll have to run are in the NP-hard/NP-complete field of problems (I'm tired?)
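To put rough numbers on the brute-force point, here is a minimal sketch. It assumes a hypothetical subset-selection problem (for example, choosing which of n ad placements to buy, a knapsack-style problem) where brute force means checking every one of the 2^n possible subsets, and it assumes a machine that checks one billion subsets per second. Both are illustrative assumptions, not measurements.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def brute_force_years(n_items, checks_per_second=1_000_000_000):
    """Years needed to check every subset of n_items at the given speed."""
    n_subsets = 2 ** n_items  # each item is either in or out of the subset
    return n_subsets / checks_per_second / SECONDS_PER_YEAR


for n in (20, 40, 60, 80, 100):
    print(f"{n:>3} items: {brute_force_years(n):.3g} years")
```

Under these assumptions 20 items finish in about a millisecond, while 100 items would need trillions of years, so adding a handful of options changes the problem from trivial to hopeless. That is why such problems are usually attacked with approximations and heuristics rather than exhaustive search.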

## Josh Davis

I see. I feel my path toward understanding data science/analysis is getting longer, wider, and a bit scarier. But I'll trudge on!

Thanks for the enlightenment @shanac!

## Shana Carp

I feel that way every day. A lot of people confuse data science with business analysis.

Having a good handle on basic business analysis is also a good thing though.

## Josh Davis

I'll work on both! Thank you again for your help.