
Daniel Tunkelang is a data science and engineering executive who has built and led some of the strongest teams in the software industry.

He studied computer science and math at MIT and has a PhD in computer science from CMU. He was a founding employee and chief scientist of Endeca, a search pioneer that Oracle acquired for $1.1B. He led a local search team at Google. He was a director of data science and engineering at LinkedIn, and he established their query understanding team. 

Daniel is a widely recognized writer and speaker. He is frequently invited to speak at academic and industry conferences, particularly in the areas of information retrieval, web science, and data science. He has written the definitive textbook on faceted search (now a standard for e-commerce sites), established an annual symposium on human-computer interaction and information retrieval, and authored 24 US patents. His social media posts have attracted over a million page views.

Daniel advises and consults for companies that can benefit strategically from his expertise. His clients range from early-stage startups to "unicorn" technology companies like Etsy and Flipkart. He helps companies make decisions around algorithms, technology, product strategy, hiring, and organizational structure.

You can follow/connect with him: Twitter: @dtunkelang, LinkedIn: linkedin.com/in/dtunkelang

He will be live on Mar 1 at 9:30 AM PT for one and a half hours, during which he will answer as many questions as possible.

  • SS

    Sky Stack

    over 4 years ago #

    Hey Daniel, thanks for taking the time.
    What are some organizational KPIs that you believe are worth collecting, but that most data scientists would disagree with?

    • DT

      Daniel Tunkelang

      over 4 years ago #

      I don't think there's any consensus among data scientists about what KPIs are worth collecting, so it's hard to take a contrarian position.

      But one thing I've frequently noticed is what's known as the "streetlight effect": people -- including data scientists -- measure the quantities that are easiest to measure, rather than focusing on the quantities that are most important to measure. It's like the drunkard looking for his keys under the streetlight because that's where the light is.

      In particular, I've seen data scientists focus too much on business metrics and not enough on metrics that represent value creation for users. For example, they measure how often people engage with content, but they don't try to find ways to measure the value people obtain from consuming that content.

      Measuring value creation is much harder than measuring engagement. But it's ultimately the reason for your product or service to exist, and ultimately you can only monetize the value you create.

      So, insofar as I can give general advice, avoid the streetlight effect and focus on metrics that represent value creation.

      • AA

        Anuj Adhiya

        over 4 years ago #

        THIS! And wow - what a way to kick off the AMA.
        If this was the only Q&A in this AMA I'd have walked away with enough to chew on for the rest of the week.

      • ES

        Edward Stephens

        over 4 years ago #

        Amen, this is an awesome answer - loved the drunkard and streetlight example.

  • YS

    yassin shaar

    over 4 years ago #

    Daniel, thank you for doing this and looking forward to your responses.

    What "data skills" should product managers & growth leads develop in order to move faster in their product development & testing?

    thank you

    • DT

      Daniel Tunkelang

      over 4 years ago #

      "Data skills" is a broad term -- it could mean anything from basic mathematics to fluency in particular languages or platforms.

      That said, I believe the most important data skill for anyone in the space is to be able to properly frame all projects as experiments and apply the scientific method to them.

      That means that you treat each project as testing a hypothesis. And I mean that in a formal sense: you need to model the hypothesis in a way that is quantifiable and falsifiable, or there's no way to objectively measure success.

      The biggest mistake I see product managers make -- and repeat -- is to evaluate experiments after the fact, rather than committing to goals beforehand. It's easy to win when you can move the goal posts, and it's all too human to do so.

      A less critical but related skill is using historical data for opportunity analysis to prioritize projects. Analysis of historical data -- which you can do using any number of off-the-shelf tools -- is great for generating hypotheses as well as estimating impact. Just remember that generating a hypothesis is very different from testing one. Analysis of historical data is never a substitute for experimentation.
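
      To make "committing to goals beforehand" concrete, here is a minimal sketch in Python of a pre-registered A/B test evaluation. The metric, significance level, minimum detectable lift, and all counts are illustrative assumptions, not numbers from this AMA:

      ```python
      # Sketch of a pre-registered experiment: goals are fixed *before* results exist.
      from statsmodels.stats.proportion import proportions_ztest

      # Committed up front, before looking at any data (assumed values):
      ALPHA = 0.05                # significance threshold
      MIN_DETECTABLE_LIFT = 0.01  # smallest conversion lift we care about

      # Observed only after the experiment ends (hypothetical counts):
      conversions = [1180, 1250]   # control, treatment
      visitors = [10000, 10000]

      z_stat, p_value = proportions_ztest(conversions, visitors)
      lift = conversions[1] / visitors[1] - conversions[0] / visitors[0]

      # Success is judged strictly against the pre-committed goals -- no moving the goal posts.
      success = p_value < ALPHA and lift >= MIN_DETECTABLE_LIFT
      print(f"lift={lift:.3f}, p={p_value:.3f}, success={success}")
      ```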

  • AS

    Alex Sherstinsky

    over 4 years ago #

    Hi again, Daniel! One more question from me (you can imagine why I may want to know!). By definition, the Growth Team has to execute high-tempo testing -- to the tune of at least three to five tests per week. What are some of the reliable techniques you are aware of that allow the Growth Engineering Team to ship experiments fast -- experiments that often require small- and medium-sized product changes -- without negatively impacting the product's stability and without interfering with the broader Product Development Team's release plans?

    • DT

      Daniel Tunkelang

      over 4 years ago #

      There's no free lunch -- changes take time, testing takes time, and there's always the risk of negative impact on your users.

      But the most important investment you can make as a company is a platform that makes experiments as cheap to deploy as possible. Doing so reduces the cost of experimentation to the bare bones: developing experiments and running them. If you haven't done so already, then build, buy, or borrow one.

      With that in place, keep your experiments small and incremental. An experiment should test exactly one hypothesis, and it should be the minimal experiment to test that hypothesis.

      One way to keep experiments small is to drop the requirement that they be production-ready. An experiment just has to be good enough to test your hypothesis and not cause serious damage (such as taking down your site).

      Finally, use your experimentation bandwidth wisely. While you can't rely on offline analysis to tell you which experiments will succeed, it's often a cheap way to find out which experiments will fail.

      But don't go overboard with analysis. At some point, experimentation is cheaper than analysis, so just run the experiment. It's hard to know which frogs will be princes, so usually it's best to kiss as many frogs as you can, as quickly as possible.
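
      As a concrete illustration of the "cheap experimentation platform" point above, here is a minimal sketch of deterministic variant assignment, the core primitive of most in-house platforms. The experiment name and variant labels are hypothetical:

      ```python
      # Hash-based bucketing: stable, storage-free experiment assignment.
      import hashlib

      def assign_variant(user_id, experiment, variants=("control", "treatment")):
          """Hash user + experiment so a user's assignment is stable and
          independent across experiments, with no assignment table needed."""
          digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
          return variants[int(digest, 16) % len(variants)]

      # The same user always lands in the same bucket for a given experiment:
      print(assign_variant("user-42", "new-checkout-flow"))
      ```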

      • DT

        Daniel Tunkelang

        over 4 years ago #

        To answer Anuj's question about whether you're going overboard with analysis: ask yourself how much time and effort it would take to obtain equivalent information -- that is, enough information to make a decision -- by running an experiment. If you're spending more than that on analysis, then you're probably doing it wrong.

      • AA

        Anuj Adhiya

        over 4 years ago #

        How do you know whether you're going overboard with analysis?

    • HQ

      Hila Qu

      over 4 years ago #

      Love this question! Look forward to Daniel's answer

  • SJ

    Sebastian Johansson

    over 4 years ago #

    I have no question. Just wanna say that this was one of the best AMAs I've ever read.

  • AH

    Agnes Haryuni

    over 4 years ago #

    If you have to choose, what are the top 3 languages/tools that someone should learn to become a data scientist? PHP? R? NoSQL?

    • DT

      Daniel Tunkelang

      over 4 years ago #

      Everyone has their own opinions about languages and tools, and I'm not particularly religious about either.

      That said, I think data scientists should learn Python, which includes the standard packages like NumPy, SciPy, scikit-learn, etc. I'm not saying that Python is better than R -- you can make a case for either, and ideally you should be comfortable with both. But I favor Python because of its flexibility: it's a great general-purpose programming language that also happens to have a solid toolset for data science.

      And you should certainly feel comfortable with SQL, whether you end up using PostgreSQL, MySQL, or a NoSQL database like Redis or MongoDB. The more time you spend with any particular tool, the more you'll learn to make it sing and avoid its pitfalls. Nonetheless, I believe that most of the intuitions and best practices are transferable across databases, and SQL is the closest thing to a universal language for querying data.
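
      As a small, hedged taste of that Python + SQL combination -- using only the standard library's sqlite3 plus pandas, with a made-up events table:

      ```python
      # SQL does the aggregation; pandas picks up the result for analysis.
      import sqlite3

      import pandas as pd

      conn = sqlite3.connect(":memory:")
      conn.executescript("""
          CREATE TABLE events (user_id INTEGER, action TEXT);
          INSERT INTO events VALUES (1, 'search'), (1, 'click'), (2, 'search');
      """)

      df = pd.read_sql_query(
          "SELECT action, COUNT(*) AS n FROM events GROUP BY action", conn
      )
      print(df)
      ```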

  • AS

    Alex Sherstinsky

    over 4 years ago #

    Hi Daniel! Thanks for doing an AMA with us. After our last 2-hour+ coffee, I feel like asking you a million more questions, but let's just start with one. :) In your experience, how has the composition and size of the growth team at companies you worked in evolved as the growth objectives were met and the given company grew?

    • DT

      Daniel Tunkelang

      over 4 years ago #

      The main difference I've seen as a company grows is the shift in emphasis from ad hoc methods to more systematic ones.

      When you're just starting, you don't have much data and you can't afford to have a team of rock-star data scientists. So you need to rely heavily on your intuition and experience to come up with ideas that will drive growth. You still test them -- intuition is for generating hypotheses, not validating them! -- but you don't make meaningful investments in analysis of historical data to generate ideas.

      Later on, you have historical data, and hopefully you can afford to hire those rock-star data scientists. At that point, you rely heavily on analysis of historical data, and you systematically use machine learning and other statistical techniques to drive your growth strategy.

      What does that mean in terms of the composition and size of the growth team? In the early days, you need people with product sense, intuition, and the courage to take intelligent risks. Those people don't need to have much mathematical sophistication — they just have to be right often enough that you make progress. But later on, as growth becomes more data-driven, you want data scientists and machine learning experts who can run a rigorous process to achieve steady, incremental growth.

      Also note: the early days may be more fun as you start from scratch, but the biggest gains often come later. Never underestimate the power of incremental progress: slow and steady wins the race.

  • AS

    Alex Sherstinsky

    over 4 years ago #

    Hi Daniel, my next question concerns tools and technologies. What tools and technologies have the growth teams you worked with used to automate and streamline their work? Which products did they use off the shelf, and what tools and technologies had to be developed internally? Has there been a dedicated product development team focused solely on servicing the growth team?

    • DT

      Daniel Tunkelang

      over 4 years ago #

      I've never worked with growth teams directly, so my experience here is a bit second-hand. But if you look at LinkedIn, you can see that the company put an enormous and highly successful emphasis on growth.

      LinkedIn has had a dedicated growth team -- with engineering, data science, and product leadership -- for as long as I can remember, and that team is perhaps the highest-profile and most successful team at the company.

      As for tools, they mostly use internally built tools to analyze data and perform experiments. I believe this is the norm for large technology companies that can afford to build their own platforms. Smaller companies have to rely on off-the-shelf tools (Google Analytics, Mixpanel, Optimizely, etc), if only because it's too expensive to build their own.

      I believe that the value of customized tools is secondary. The incremental gains are worth it when you're a big company, just because the multiplier is so large. But I don't think smaller companies are held back by a lack of tools. Between open-source and SaaS offerings, everyone today can afford to analyze data and run experiments.

  • GK

    Gabe Kwakyi

    over 4 years ago #

    Thank you for joining, Daniel!

    What do you think the next big frontier for identifying and monetizing customer intent is, beyond written/typed language?

    Context (based on my emails, calendar, traffic trends, searches, etc., I need an Uber to the airport on Tuesday at 7am)
    Voice (OK Google, I talk about traveling to Aruba and probably need an Airbnb)
    Location (work is over, it's Friday, I'm headed home, but there's a cool art exhibit on the way)
    Neural (I'm feeling blue so chocolate and kittens would be nice)

    • DT

      Daniel Tunkelang

      over 4 years ago #

      Location is a no-brainer — we're already seeing location-based alerting and advertising. And time of day is a gimme. If you're just thinking beyond text, then location and time are the best and most easily obtained contextual signals for passively inferring intent.

      Other forms of context are interesting, but most of them are far more invasive. You can certainly extract useful intent signals from someone's email, calendar, credit card statements, etc., but it's hard to do so without creeping people out. As someone once said, you don't want to cross the creepy line!

      Voice interfaces are clearly gaining traction, but I have to confess some ambivalence about them as a general-purpose interface. Voice is great in your car or your living room, but not so great in a noisy or public environment.

      And neural interfaces? I know there's work happening in university labs, but I think it's a little bit early. And more than a little bit creepy!

  • MM

    martín medina

    over 4 years ago #

    Daniel,

    I have been reading a lot lately about the use of data in health sciences and how it is revolutionizing health and health care. As a data scientist who has worked with health data, you have a unique insight into this industry, and I imagine tons of great stories. I was wondering: what are some interesting trends you have noticed, and how can each of us personally use data to help improve our health?

    • DT

      Daniel Tunkelang

      over 4 years ago #

      I did spend several months working in health care, but my focus was largely on more mundane problems like compliance, security, and operations. Almost all of my work on data science has been in other domains.

      Nonetheless, my brief experience in health care has shown me how much harder it is to work with data there than in other domains. The organizations that hold the most valuable data aren't eager to share it (to put it mildly), and they have strong legal and business incentives not to do so.

      That said, I'm excited by the various attempts to bring data to health care, and I believe it's just a matter of time until we move to a far more data-driven approach to ensuring and improving our health. You can see it from sites where people volunteer their own data (e.g., PatientsLikeMe, 23andMe) all the way up to the Obama Administration's Precision Medicine Initiative, for which the Administration has enlisted US Chief Data Scientist (and my former boss) DJ Patil.

      I also think it's great that so many people are using their phones and wearables to collect data about themselves, and I'm optimistic about efforts like ResearchKit to connect researchers to that data.

      It's still early days, though, and there's a lot more data collection than actionable insight coming out of that data. But we're on the right track, so we'll just have to be patient to see the rewards of collecting it.

  • HQ

    Hila Qu

    over 4 years ago #

    Hi Daniel,

    At the GrowthHackers Conference 2 weeks ago, @JamesCurrier talked about different ways to achieve defensibility.

    Do you think data becomes another way for companies to build defensibility?

    From the companies you have advised/worked with, is there a mix of companies that built data strategy into their core product, and companies that figured it out later and added data to the core business? What are some lessons for new companies to take from that?

    • DT

      Daniel Tunkelang

      over 4 years ago #

      I believe that data has always been a way for companies to build defensibility -- in fact, it's been one of the best ways to do so. A head start on data collection creates a moat that is incredibly difficult to overcome.

      Look at Google, Amazon, Facebook, and LinkedIn as examples. Those companies have done a lot of things right, but their greatest defensibility comes from the data assets they've painstakingly built. And that's because they've built data strategy into their core products from day one. Moreover, they continue to do so — many of these companies' strategic acquisitions and product decisions are driven by data strategy.

      I'm hard-pressed to think of any companies that figured it out later and added data into their core business as an afterthought. That's not to say that it's impossible or hasn't been done, but my inability to think of any suggests to me that it isn't the norm.

      Consider that a data strategy requires investments in infrastructure, instrumentation, specialized talent, etc. It's not a decision a company can make overnight, and some lost opportunities can never be restored.

      My recommendation is that, even if you don't think that data science is your top priority, at least hedge your bets by collecting data from day one. Preserve your optionality. Your talent can be poached, and your product can be copied. But the data assets you create are your most defensible assets.
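
      A minimal sketch of what "collecting data from day one" can look like in practice -- append-only, structured, timestamped event logging. The schema and file format here are assumptions, not a prescription:

      ```python
      # Cheap instrumentation now; analyzable data assets later.
      import json
      import time

      def log_event(user_id, action, properties=None, path="events.jsonl"):
          """Append one structured, timestamped event per line (JSON Lines)."""
          event = {
              "ts": time.time(),
              "user_id": user_id,
              "action": action,
              "properties": properties or {},
          }
          with open(path, "a") as f:
              f.write(json.dumps(event) + "\n")

      log_event("user-42", "search", {"query": "running shoes"})
      ```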

  • SE

    Sean Ellis

    over 4 years ago #

    Hi Daniel, thanks for doing an AMA with us. What's the most unreasonable request you've ever had for a data team and how did you respond?

    • DT

      Daniel Tunkelang

      over 4 years ago #

      Unreasonable requests all seem to have the same form: someone wants insights for which we simply don't have the right data. And there's only one way to respond to such requests: here is the data we'll need, and here's what it will cost in time and money to obtain it. Conversations like these are painful, but eventually they cure people of the illusion that data science is magic.

      More specifically, I've found that most people don't understand the mechanics and limitations of crowdsourcing. They propose hard data collection problems and then expect to be able to farm them out to a global, low-skilled labor force for pennies a task. Again, a little education — and sometimes a bit of painful experience — goes a long way to cure people's magical thinking.

  • KB

    Kevin Bull

    over 4 years ago #

    Hi Daniel,

    Thank you for your time.

    Are there any legal issues that can arise from web scraping a third-party website and then distributing the publicly available information to your own email subscribers?

    Thank you,
    Kevin

    • DT

      Daniel Tunkelang

      over 4 years ago #

      I'm not a lawyer, but even I know enough to tell you there are legal issues around scraping other people's content. For a recent example, see how Craigslist reacted to PadMapper scraping its real estate listings. There are several other examples of high-profile scraping cases, and those are only the ones prominent enough to gain public attention. Building a business on top of other people's proprietary data is a dangerous game.

      Data science doesn't give you a free pass to ignore copyright law, the Computer Fraud and Abuse Act, etc. If you think what you're doing might be illegal, then talk to a lawyer about it. Because this isn't the way you want your company to get a Wikipedia entry.

      Legal issues aside, companies that depend on their data assets for defensibility often invest in technical defenses to protect those assets. Which means that scrapers have to fight a two-front war against engineers and lawyers. Surely there's an easier way to make an honest living!

  • AG

    Alberto Grande

    over 4 years ago #

    Hey Daniel, thanks for doing this AMA!

    Have you worked with data science and faceted search to grow the sourcing and hiring process of a company?

    • DT

      Daniel Tunkelang

      over 4 years ago #

      Not exactly sure what you have in mind, but I highly recommend using LinkedIn's faceted search for sourcing and hiring. In fact, when I worked at Google (before going to LinkedIn), I taught Google's sourcers a class that emphasized the use of faceted search on LinkedIn. I hope it helped them! Sourcing is an exploratory search process — precisely the class of information-seeking problem that calls for faceted search.

      As for using data science for hiring, I think it's a great idea, to the extent you have enough data from which to draw signal. And you can start small. For example, identify the most common reasons you reject candidates, and then make the process more efficient by front-loading those parts of the interview process and short-circuiting when a candidate doesn't pass them.

      Sourcing and hiring is a funnel, and all funnels can be optimized. And if you are hiring at scale, you should have lots of opportunities to optimize that funnel -- not only for effectiveness and efficiency, but also for fairness.
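
      To illustrate that funnel-optimization idea, here is a toy sketch in Python. The stages, counts, and rejection reasons are all hypothetical:

      ```python
      # Pass-through rates per stage, plus the rejection reason to front-load.
      from collections import Counter

      stages = [("applied", 1000), ("phone screen", 300), ("onsite", 90), ("offer", 25)]
      for (name, n), (next_name, next_n) in zip(stages, stages[1:]):
          print(f"{name} -> {next_name}: {next_n / n:.0%} pass-through")

      # Front-load the interview component that rejects the most candidates:
      rejection_reasons = Counter({"coding": 120, "system design": 60, "culture fit": 30})
      print("front-load:", rejection_reasons.most_common(1)[0][0])
      ```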

  • ES

    Edward Stephens

    over 4 years ago #

    Hi Daniel,

    Thank you so much for doing the AMA! Really exciting to have you on....

    A couple of questions from me:

    1) What does the future of search look like across mobile and web? How do you see players such as What3Words changing how we leverage voice search? (For geo-location, they've cut the world into 57 trillion 3m x 3m squares, all with a unique 3-word nomenclature.)

    2) Human-computer interaction: what symbioses are you banking on over the next 5, 10, 15 years, and do we get to general AI in the next 50 years?

    • DT

      Daniel Tunkelang

      over 4 years ago #

      I don't know anything about What3Words, so I have no idea if it will catch on as an alternative to more conventional ways of specifying location. It's an interesting approach.

      As for HCI innovations, I'm certainly curious what will happen when virtual reality (VR) and augmented reality (AR) go mainstream — and that doesn't seem that far off anymore. Those modalities enable much richer bidirectional communication between human and machine than anything we have today, and I expect the results to be revolutionary.

      And as for general AI, I'm a skeptic. I'm all for being inspired by science fiction, but in this case I think people have let their imaginations run wild and get ahead of their critical thinking. I'm excited that we're using machine learning to tackle increasingly sophisticated problems, but I don't think we're even close to general AI. Feel free to use this against me in 50 years. :-)

  • AK

    Andrea Kopitz

    over 4 years ago #

    Hello Daniel,

    I'm curious... do you have something you believe that no one else does?

    Given your history, I'm curious if/how your work as a data scientist has influenced your interests/beliefs.

    • DT

      Daniel Tunkelang

      over 4 years ago #

      I don't know that any of my beliefs are unique, but working in data science has reinforced my strong belief in empiricism.

      With that said, data can never tell you what is morally right. You have to decide on the destination. But data can tell you whether what you're doing can get you there.

  • AL

    Arsene Lavaux

    over 4 years ago #

    Bonjour Daniel,

    Thanks for doing this AMA.

    Is there still art in the data science powering growth?
    If so, in which form(s)?

    • DT

      Daniel Tunkelang

      over 4 years ago #

      Yes. The main activities of data science are hypothesis testing and hypothesis generation.

      Hypothesis testing is the more scientific part: you run your experiments and rigorously analyze the results to determine which experiments are successful.

      Hypothesis generation is more of an art. You can and should apply scientific techniques to analyze historical data, but you also draw on your intuition and domain knowledge to come up with ideas.

      I'm oversimplifying the process, but I'd say there's a strong art to hypothesis generation in all data science contexts, and growth is no exception.

  • AA

    Anuj Adhiya

    over 4 years ago #

    Hey Daniel
    So excited to have you on!

    When I analyze and experiment, is there a way (easy way?) to tell that the test results are a reliable predictor of future performance (for any time frame)?
    (How) would I know that at the time of analysis and/or after the fact?

    • DT

      Daniel Tunkelang

      over 4 years ago #

      As Niels Bohr said, prediction is very difficult, especially about the future.

      There is no way to be sure that past results predict future performance, unless you can be certain that past data is representative of future data.

      With that said, sometimes you have principled reasons to believe that the quantities you are measuring don't vary over time. In that case, relying on past data is fair game.

      But understand that you are taking a leap of faith. And trust but verify. You can always validate your assumptions after the fact by comparing distributions.
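
      One way to do that after-the-fact validation is a two-sample Kolmogorov-Smirnov test comparing the metric's past and current distributions. This sketch uses scipy with placeholder synthetic data:

      ```python
      # Check whether past data still looks representative of current data.
      import numpy as np
      from scipy.stats import ks_2samp

      rng = np.random.default_rng(0)
      past_metric = rng.normal(loc=10.0, scale=2.0, size=5000)
      current_metric = rng.normal(loc=10.1, scale=2.0, size=5000)

      result = ks_2samp(past_metric, current_metric)
      if result.pvalue < 0.05:
          print("Distributions differ; past results may not predict the future.")
      else:
          print("No evidence of drift; relying on past data looks safer.")
      ```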

  • AN

    Anu Nigam

    over 4 years ago #

    Great AMA. You gave really good answers. Really appreciate it.

    An issue I struggled with in my startup was getting "Clean Data". It felt like we were running tests and then later realizing that the instrumentation was broken, so the tests and analysis were a waste. It's hard to have a cadence when things are broken. Eventually, we had to rearchitect most of the code to move to a growth A/B test model.

    What % of the engineering team's time would you allocate to constantly making sure the data is good? As a follow-on, are there tools/processes you'd recommend for maintaining "Clean Data"?

    • DT

      Daniel Tunkelang

      over 4 years ago #

      Thanks!

      As for data quality assurance, it's a requirement. You simply can't pursue a data-driven strategy if you don't believe in your data, and your belief in your data has to be grounded in some kind of quality assurance.

      How much time do you need to invest? Hopefully not much in the steady state. It's a lot like software testing: if you already have a good suite of unit tests and integration tests in place, along with a continuous test automation framework, then the incremental overhead isn't too bad. If you don't, then the first step is a doozy.

      Indeed, you need to treat all data processing the same way you treat production code and test it accordingly. Instrumentation needs to be tested, and instrumentation bugs need to be treated with the same severity as functionality bugs. The same goes for any code that builds dashboards, reports, etc. That may sound simplistic, but I believe that if you adopt that attitude as an organization, the details will fall into place.
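
      A minimal sketch of what "testing instrumentation like production code" might look like -- unit tests over logged events, runnable with pytest. The event schema and invariants are assumptions:

      ```python
      # Data-quality checks treated with the same severity as functionality tests.
      REQUIRED_FIELDS = {"ts", "user_id", "action"}

      def validate_event(event):
          """Return a list of data-quality violations for one logged event."""
          errors = [f"missing field: {f}" for f in REQUIRED_FIELDS - event.keys()]
          if "ts" in event and event["ts"] <= 0:
              errors.append("non-positive timestamp")
          return errors

      def test_well_formed_event_passes():
          assert validate_event({"ts": 1.0, "user_id": "u1", "action": "click"}) == []

      def test_broken_event_is_caught():
          assert "missing field: action" in validate_event({"ts": 1.0, "user_id": "u1"})
      ```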
