Really curious as to what everyone is doing and any recommended methodologies or resources.

  • CS

    Chris Sanfilippo

    over 4 years ago #

    I do a lot of scraping for SEO, sometimes 10-20 sites a day! My goal is to discover profitable keywords so I can start creating lots of content around those topics. Scraping is only good if you take action on what you learn. :relaxed: Here are the details:

    1. Use SEMrush to identify the top 5-10 competitors for your target keywords.
    2. Put the competitors' sites through Screaming Frog, one at a time.
    3. Export the scrape to CSV.
    4. Delete all columns except the page title.
    5. Use a formula to compare page titles against two other columns: one with a list of all US cities (for local SEO), and another with all the major industry keywords (at least 1,000) from SEMrush, Ubersuggest, and Google KWP. (For a scripted take on this matching step, see the sketch after this comment.)
    6. Then I sort by the count, so I can identify not only which keywords/locations a site is targeting, but exactly how many pages they've built around those keywords.
    7. The next step is to analyze those pages to see how they ranked, and then replicate. That's outside of scraping, so I'll stop here.

    For small sites you can accomplish the same thing by searching Google for site:www.example.com and manually looking through the results, or site:www.example.com intitle:"keywordhere", but then you need to know the keyword already.

    That's the beauty of this scraping technique: you not only get to discover all the keywords they're targeting, but you can also quantify the number of pages optimized for each keyword, and then sort by it.
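
    For anyone who'd rather run steps 5-6 outside of Excel, here's a rough Python/pandas sketch of the same matching-and-counting idea. The file names and column labels are assumptions, not Screaming Frog's actual export format:

    # Rough sketch of steps 5-6: match page titles against city and
    # keyword lists, then count matching pages. File names and column
    # labels are hypothetical placeholders.
    import pandas as pd

    titles = pd.read_csv("screaming_frog_export.csv")["Title"].dropna().str.lower()
    cities = pd.read_csv("us_cities.csv")["city"].str.lower()
    keywords = pd.read_csv("industry_keywords.csv")["keyword"].str.lower()

    def count_matches(terms):
        # For each term, count how many page titles contain it
        return pd.Series(
            {term: int(titles.str.contains(term, regex=False).sum()) for term in terms}
        )

    keyword_counts = count_matches(keywords).sort_values(ascending=False)
    city_counts = count_matches(cities).sort_values(ascending=False)

    print(keyword_counts.head(20))  # top keywords by number of optimized pages
    print(city_counts.head(20))     # top targeted locations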

    • MB

      Morgan Brown

      over 4 years ago #

      Knowledge bomb!

      Thanks for sharing Chris.

    • ND

      Nichole Elizabeth DeMeré

      over 4 years ago #

      Chris, I love this - a great way to show how an SEO expert thinks. :) I use ScreamingFrog for initial SEO audits but hadn't considered it for other purposes. Thanks for sharing!

      • CS

        Chris Sanfilippo

        over 4 years ago #

        Thank you both for the kind words :)

      • SC

        Shana Carp

        over 4 years ago #

        I use it for checking whether certain tags exist. Since it crawls an entire site, you can check a large number of pages for tags using custom variables.

    • DL

      David Lin

      over 4 years ago #

      This is awesome! Thanks for sharing!

    • GA

      Gonzalo Armendariz

      over 4 years ago #

      Chris, you hit the nail on the head here!

      I work for a startup in Miami, FL, DoYouRemember.com ( http://www.doyouremember.com ). We create original content focused on nostalgia primarily targeting baby boomers. I began interning here a year ago, and have since taken charge of our marketing efforts.

      My question: I'm having trouble identifying our competitors. Can I get some guidance on how to home in on a few? I understand there are plenty of factors that go into this, but I really want to know whether I'm supposed to be looking at our site as another pop culture/entertainment news site or, more specifically, as a site that generates content for baby boomers.

  • AM

    Adrien Montcoudiol

    over 4 years ago #

    I've used it a lot to get links and email addresses very fast for business development or PR.

    The three best tools, in my opinion:
    - http://www.kimonolabs.com
    - https://import.io
    - http://www.mozenda.com

  • CB

    Chris Bolman

    over 4 years ago #

    For those who know JavaScript, I'm a huge fan of Cheerio (a Node.js library for scraping). Cheerio lets you target DOM elements individually (so you can scrape based on CSS classes or any HTML structure on the page). Think jQuery on the server side. Really simple and powerful.

    Cheerio on Github: https://github.com/cheeriojs/cheerio
    node-crawler-cheerio: https://github.com/virushuo/node-crawler-cheerio

    • JB

      Jason Bates

      over 4 years ago #

      +1 for Cheerio. Combined with Request.js (another Node library), it will handle sessions, cookies, redirects, etc.

      It's a formidable combo if you want to do something more bespoke and complex. Very flexible.

    • DH

      Danny Halarewich

      over 4 years ago #

      Cheerio looks like a great tool. Thanks for sharing!

  • ND

    Nate Desmond

    over 4 years ago #

    I've done a lot of web scraping recently to analyze communities (like GH.com) to find the best websites to guest post on, and to analyze those websites to choose the best topics to write about.

    I've used:

    - https://import.io
    - http://monchito.com/blog/bulk-social-counts (to pull social stats in bulk)

  • ND

    Nichole Elizabeth DeMeré

    over 4 years ago #
    • JB

      Jon Bishop

      over 4 years ago #

      How was the course?

      • ND

        Nichole Elizabeth DeMeré

        over 4 years ago #

        I'd recommend it for beginners.

        It doesn't require any technical know-how. He shows viewers how to use import.io for web scraping rather than the alternatives: Python, regex, and XPath.

        There's a step-by-step walkthrough of how to extract data from Twitter, Reddit, Yelp, and Hacker News. It's like sitting next to someone while they're using their computer to show you how to use the app.

        Here's a breakdown of the course:

        An Introduction to Web Scraping
        - Introduction to Web Scraping
        - Strategies for Effective Web Scraping
        - Installing Import.io

        Connector Basics: Creating a PR News Dashboard of top industry blogs
        - PR News Dashboard Part 1
        - PR News Dashboard Part 2
        - PR News Dashboard Part 3

        Extractor Basics: Tracking your Competitors using Extractors
        - Competitive Intelligence Part 1
        - Competitive Intelligence Part 2

        Crawler Basics: Crawling for the Top 1000 Twitter Users
        - Extracting the Top 1000 Twitter Users

        Reddit Scrapes
        - Crawling Reddit for Subreddits
        - Crawling Subreddits for Moderators
        - Crawling for Moderator Submissions

        Yelp Scrapes
        - Yelp URL Spreadsheet
        - Crawling Yelp Search Results Page
        - Crawling Yelp Business Profiles

        Twitter Scrapes
        - Twitter Crawl Spreadsheet
        - Crawling for Targeted Lists of Twitter Users

        Hacker News Scrapes
        - Extracting the Top 100 Hacker News Users
        - Crawling for a Top User's Submissions

        • MV

          Margaux Viola

          over 4 years ago #

          Great overview Nichole. I also found this course very helpful as an intro to web scraping. I primarily use and like Import.io (especially now that they support authenticated APIs) and would recommend Kimono as well.

  • DB

    Dean Brown

    over 4 years ago #

    You may also want to look at the Brody episode on Growthhackertv.com (episode 114).

  • AD

    Ali Dinani

    over 4 years ago #

    I've always been a fan of having complete control, so personally I use Ruby and Mechanize.

    • ND

      Nichole Elizabeth DeMeré

      over 4 years ago #

      Thanks, Ali! What's a basic rundown of how you're using them?

      • AD

        Ali Dinani

        over 4 years ago #

        I usually analyze the site with a Chrome extension called SelectorGadget to find the right selectors. Then I use Mechanize to collect the data and output it in a usable format (such as JSON). I can either post the JSON to a server that can use it, or generate it when it's needed; it depends on the application at hand.

  • KK

    Katie Kent

    over 4 years ago #

    Absolutely! How technical are you?

    There's an awesome tool called Kimono that just came out of YC. It's pretty easy to use even if you don't write much code yourself: https://www.kimonolabs.com/

    If you are technical and want to learn the whole 9 yards of this stuff, we have a workshop coming up at Zipfian Academy: http://zipfianmlworkshop.eventbrite.com

    • ND

      Nate Desmond

      over 4 years ago #

      This is great Katie!

      A good friend of mine, Adam Gibson, is actually leading that workshop you mentioned, and I highly recommend it. He's brilliant at machine learning, so I think that course will be well worth attending.

    • ND

      Nichole Elizabeth DeMeré

      over 4 years ago #

      Oh yeah, that's the course I wish I was attending! I'm all the way in Florida, though. I'm very envious of those who get to attend, haha.

  • FS

    Franz Allan See

    over 4 years ago #

    Surprisingly, nobody has mentioned Python's BeautifulSoup yet. Combine it with the Requests library and you get super easy, super clean code.
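
    A minimal sketch of that combo, for anyone curious; the URL and the CSS selector are placeholder assumptions:

    # Minimal Requests + BeautifulSoup sketch; the URL and selector
    # are hypothetical placeholders.
    import requests
    from bs4 import BeautifulSoup

    resp = requests.get("https://example.com/blog", timeout=10)
    resp.raise_for_status()

    soup = BeautifulSoup(resp.text, "html.parser")

    # Print every post title and its link
    for link in soup.select("h2.post-title a"):
        print(link.get_text(strip=True), link.get("href"))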

  • VB

    vincent barr

    over 4 years ago #

    It depends on your use case, but you can accomplish quite a bit with the Chrome extension Web Scraper (https://chrome.google.com/webstore/detail/web-scraper/jnhgnonknehpejjnehehllkliplmbmhn?hl=en) and a little XML knowledge/googling.

  • RG

    Ramya Gogineni

    over 4 years ago #

    I've scraped Pinterest for some stats for an ecommerce company I was doing a project for. I used a Google Chrome tool called Scraper. The autoscroll on Pinterest makes many scraping tools useless, which is why I chose this plain and simple tool. Read more about it here: http://to.pbs.org/PdD66H

  • DL

    Dylan La Com

    over 4 years ago #

    Good question, Nichole. I've dabbled with Import.io but have never actually applied it to any projects. Curious to hear how others have used scraping...

    • BP

      Brandon Pindulic

      over 4 years ago #

      Same here. The only thing I use somewhat frequently is the Google Chrome Scraper extension. Works like a charm in just a few clicks for basic scrapes.

  • TG

    Todd Giannattasio

    over 4 years ago #

    Here's a great tutorial for beginners (this was the first time I'd done something like this) - http://findmyblogway.com/scraping-communities-with-xpath/

    We're using this for clients when we find niche social media sites with profile pages.

    Then we use some proven outreach scripts we've pieced together to make contact and build relationships. Here's a good presentation that covers various needs - http://summit.grouphigh.com/assets/producers/V423grouphigh_223456291822641435.pdf
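
    For anyone who'd rather script the XPath approach directly instead of using Excel or a spreadsheet, here's a minimal sketch with Python's lxml; the URL and XPath expressions are placeholder assumptions:

    # Minimal XPath scraping sketch with lxml; the URL and the
    # XPath expressions are hypothetical placeholders.
    import requests
    from lxml import html

    resp = requests.get("https://example.com/community/members", timeout=10)
    tree = html.fromstring(resp.content)

    # Pull member names and profile links from the listing page
    names = tree.xpath('//div[@class="member"]/a/text()')
    links = tree.xpath('//div[@class="member"]/a/@href')

    for name, link in zip(names, links):
        print(name.strip(), link)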

  • DR

    dave rigotti

    over 4 years ago #

    Assuming you're scraping for sales/marketing growth, the easiest thing to look for is certain scripts that define your target. E.g., if you are a marketing automation platform, you may want to look for companies investing in AdWords that aren't using another marketing automation platform. (A rough sketch of this script-detection idea follows at the end of this comment.)

    http://nerdydata.com/ is a good out-of-the-box script search engine. Spyfu.com is also decent if you're looking for contact information too and you want people spending on AdWords.

    Two ideas of what you can do once you have the list:
    - Develop a LinkedIn company targeting campaign for whitepaper content
    - Run custom audience Facebook Ad campaign against the email addresses
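
    As a rough illustration of the script-detection idea above, you could fetch each candidate site and check its source for known script fingerprints. A minimal Python sketch; the fingerprint strings are illustrative examples, not an authoritative list:

    # Rough sketch of script-fingerprint detection; the fingerprint
    # strings are illustrative examples, not an authoritative list.
    import requests

    FINGERPRINTS = {
        "AdWords conversion tracking": "googleadservices.com/pagead/conversion",
        "HubSpot": "js.hs-scripts.com",
        "Marketo": "munchkin.js",
    }

    def detect_scripts(url):
        page = requests.get(url, timeout=10).text
        return [name for name, sig in FINGERPRINTS.items() if sig in page]

    # Example: flag sites running AdWords but no marketing automation
    for site in ["https://example.com"]:
        found = detect_scripts(site)
        if found == ["AdWords conversion tracking"]:
            print(site, "-> potential lead")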

  • BL

    Brian Lynn

    over 4 years ago #

    For those of us on Rails, simply install the Nokogiri gem (http://nokogiri.org/) and run it as a rake task (put the rake file under /lib/tasks, then run "rake [task_name]" in the terminal). Nokogiri's doc.css() function allows you to easily access HTML elements based on their CSS selectors (which are often unique). You can also easily get attributes like href, value, name, etc.

  • VP

    Valentina Porcu

    about 4 years ago #

    Hi! I'd like to share a coupon code for my course on data extraction and web scraping:

    https://www.udemy.com/tips-and-tricks-for-the-data-extraction-and-web-scraping/?couponCode=GRHAEN1

    My main interest is extracting users' comments for sentiment and buzz analysis.

    I'd love your insights!

  • DC

    Dan Cave

    about 4 years ago #

    10,000 leads in 10 minutes - written by import.io founder Andrew Fogg and presented at Hustle Con 2014:
    http://blog.import.io/post/10000-leads-in-10-minutes

    Web Scraping for Sales & Growth Hacking Udemy course (7441 students enrolled):
    https://www.udemy.com/learn-web-scraping-in-minutes/?dtcode=E6YnShu1yf5P

    FYI: I am a growth type person at import.io

  • NR

    Nick Rizzuti

    over 4 years ago #

    While working with a restaurant loyalty startup, we built a custom scraper to scrape competitors' sites and Yelp. We used the data from the competitors' sites to track their growth rates and feed leads into our CRM system. We then used the Yelp data to compare the quality of the businesses the competitors were bringing on and to set goals for our sales team. Feel free to message me if you have questions.

  • DM

    Depesh Mandalia

    over 4 years ago #

    This is quite a broad area. For example, when I was at a global retailer we scraped prices daily/hourly to ensure we remained competitive. I run affiliate marketing websites on which I scrape offers/voucher codes to list on my own sites. As someone has mentioned here, I also scrape the HTML code of websites for SEO purposes, to identify opportunities. I've even scraped LinkedIn results pages to pull out potential job candidates through a search; the scope for scraping is extremely broad.

    I think it really depends on what you're looking to hack for growth :-)

    Here's another one for the list: scraping news outlets for related stories, which is great for supporting blog writing in any niche:

    http://seogadget.com/content-strategy-generator-tool-v2-update/

  • TG

    Todd Giannattasio

    over 4 years ago #

    What are some tools for Mac that can help with scraping?

    I have a netbook that we keep in the office just in case we need a PC for something, so I've been using SEO Tools for Excel and XPath there. But it would be way easier on our main computers.

  • BR

    Bartolome Rodriguez

    over 4 years ago #

    My last scraping project was very useful. We identified a kind of directory with leads for our business, got the URLs of their websites, and then sent them to BuiltWith to see if they had a certain CMS and plugin.

    My scraping tools come with my OS (Debian): I can do almost any scrape with curl, sed, cut, grep, sort, and uniq. Lots of fun.

  • EK

    Eugene K

    over 4 years ago #

    Depends on the complexity of the scraping itself, e.g. how "protected" (in a good sense; we're not hacking anything here!) the data you're looking for is, and what technology the data source uses. That is to say, with solutions like Kimono Labs/Mozenda it is quite hard to get at data behind sophisticated ASP forms/logins or systems that implement anti-crawler measures, or to do much post-processing, so I would tag them as "entry level" scraping, although they cover the majority of scraping activities.

    Some time ago I executed a project in the legal field where we needed to get through millions of court cases daily to identify potential leads before the competitors did. We used the framework http://scrapy.org/; it's very flexible and configurable, and suitable for large projects when an "entry level" solution is not enough.

    P.S. Fun fact: during the project I mentioned, we streamlined a 30+ person manual data-entry department down to just 5 support people, saving thousands yearly and greatly improving profits. It was quite an interesting growth hack; I'm thinking of writing an article about it sometime.
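
    For a flavor of Scrapy, here's a minimal spider sketch; the start URL and CSS selectors are hypothetical placeholders, not the actual project's code:

    # Minimal Scrapy spider sketch; run with:
    #   scrapy runspider spider.py -o results.json
    # The start URL and CSS selectors are hypothetical placeholders.
    import scrapy

    class CaseSpider(scrapy.Spider):
        name = "cases"
        start_urls = ["https://example.com/cases?page=1"]

        def parse(self, response):
            # One record per listing row
            for row in response.css("div.case-row"):
                yield {
                    "title": row.css("a.title::text").get(),
                    "url": response.urljoin(row.css("a.title::attr(href)").get()),
                }
            # Follow pagination until there is no next link
            next_page = response.css("a.next::attr(href)").get()
            if next_page:
                yield response.follow(next_page, callback=self.parse)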

  • PS

    Pietro Saccomani

    over 4 years ago #

    I use the Dataminer Chrome extension to quickly grab a CSV of anything in a list format, e.g. Google SERPs.

    https://chrome.google.com/webstore/detail/dataminer/nndknepjnldbdbepjfgmncbggmopgden

  • SB

    Seth Baum

    over 4 years ago #

    Any "dangers" of scraping? Like CAN SPAM, or companies sueing, or whatever? Not trying to be a wet blanket on great techniques - just didn't know if there were things to "watch out for" or "you'll get penalized for doing this" or "that is downright illegal in some states" etc.

  • AS

    Amit Sonawane

    over 4 years ago #

    Wow I just learned so much from this thread. Thanks everyone! Ballers.

  • WP

    Wilson Peng

    almost 4 years ago #

    I'm a little different. I'm familiar with import.io, Google Docs, scraperbox, etc., but I love using traditional Ruby/Python code to do it. Much more effective, imo.

  • GE

    genna elvin

    over 4 years ago #

    TaDaweb is a great data extractor: it can get data directly from APIs and RSS feeds, but it can also get info directly from websites. There is an SDK which enables you to create web data recipes for extracting precise data. It isn't so much web scraping; it's more like web data mashup.
