Ask GH: How have you used web scraping for growth hacking? What are some methodologies or resources you'd recommend for someone who wants to get started?
Really curious as to what everyone is doing and any recommended methodologies or resources.
I do a lot of scraping for SEO, sometimes about 10-20 sites a day! My goal is to discover profitable keywords so I can start creating lots of content around those topics. Scraping is only good if you take action on what you learn :) Here are the details:
1. Use SEMrush to identify top 5-10 competitors for target keywords
2. Put competitors' sites through Screaming Frog, one at a time.
3. Export the scrape to CSV
4. Delete all columns except page title
5. Use a formula to compare page titles to 2 other columns, 1 with a list of all US cities (for local SEO), and another column with all the major industry keywords (at least 1000) from SEMrush, Ubersuggest, and Google KWP.
6. Then I sort by count, so I can identify not only what keywords/locations a site is targeting, but exactly how many pages they’ve built around those keywords
7. Next step is to analyze those pages to see how they ranked, and then replicate. This is outside of scraping now so I’ll stop here.
For small sites you can accomplish the same thing by searching Google for site:www.example.com and manually looking through the results, or by searching site:www.example.com intitle:"keywordhere", but then you need to know the keyword.
The beauty of this scrape technique is that you not only get to discover all the keywords they’re targeting, but you can also quantify the number of pages optimized for each keyword, and then sort them.
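For anyone who would rather script steps 4 to 6 than build spreadsheet formulas, here is a minimal Python sketch of the counting step. It is not the formula described above, just one way to get the same kind of count. The `Title 1` column name, the sample rows, and the keyword and city lists are assumptions made up for illustration; adjust them to match your own export.

```python
import csv
import io
import re
from collections import Counter


def read_titles(csv_file, column="Title 1"):
    """Read the page-title column from a crawl export (column name is an assumption)."""
    return [row[column] for row in csv.DictReader(csv_file) if row.get(column)]


def count_title_matches(titles, terms):
    """Count how many page titles mention each term (case-insensitive, whole words)."""
    counts = Counter()
    for term in terms:
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        counts[term] = sum(1 for title in titles if pattern.search(title))
    return counts


# Made-up sample standing in for the exported CSV of one competitor.
sample_export = io.StringIO(
    "Address,Title 1\n"
    "https://www.example.com/a,Emergency Plumber in Miami | Example Co\n"
    "https://www.example.com/b,Water Heater Repair Miami\n"
    "https://www.example.com/c,Emergency Plumber in Tampa\n"
)

titles = read_titles(sample_export)
keywords = ["emergency plumber", "water heater repair"]  # hypothetical keyword list
cities = ["Miami", "Tampa", "Orlando"]                   # hypothetical city list

counts = count_title_matches(titles, keywords + cities)

# Sort by count so the most heavily targeted terms come first.
for term, n in counts.most_common():
    print(f"{n}\t{term}")
```

With a real export you would pass an open file to `read_titles` instead of the `io.StringIO` sample, and load the keyword and city lists from your own files.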
Thanks for sharing Chris.
Chris, I love this - a great way to show how an SEO expert thinks. :) I use ScreamingFrog for initial SEO audits but hadn't considered it for other purposes. Thanks for sharing!
Thank you both for the kind words :)
I use it for checking whether certain tags exist. Since it technically looks at an entire site, you can check a large number of pages for tags using custom variables.
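Outside of Screaming Frog, the same kind of tag check can be sketched in a few lines of Python using only the standard library. This is an illustration, not how Screaming Frog works internally; the `has_tag` helper, the sample pages, and the choice of a canonical link tag are all hypothetical.

```python
from html.parser import HTMLParser


class TagFinder(HTMLParser):
    """Record whether a tag with the required attribute values appears in a page."""

    def __init__(self, tag, required_attrs=None):
        super().__init__()
        self.tag = tag
        self.required_attrs = required_attrs or {}
        self.found = False

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == self.tag and all(
            attr_map.get(name) == value
            for name, value in self.required_attrs.items()
        ):
            self.found = True


def has_tag(html, tag, required_attrs=None):
    """Return True if the HTML contains the tag with the given attributes."""
    finder = TagFinder(tag, required_attrs)
    finder.feed(html)
    return finder.found


# Made-up pages standing in for a crawl of a whole site.
pages = {
    "/": '<html><head><link rel="canonical" href="https://www.example.com/"></head></html>',
    "/about": "<html><head><title>About</title></head></html>",
}

# List every page that is missing the tag we care about.
missing = [
    path for path, html in pages.items()
    if not has_tag(html, "link", {"rel": "canonical"})
]
print(missing)
```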
This is awesome! Thanks for sharing!
Chris, you nail it pretty hard on the head here!
I work for a startup in Miami, FL, DoYouRemember.com ( http://www.doyouremember.com ). We create original content focused on nostalgia primarily targeting baby boomers. I began interning here a year ago, and have since taken charge of our marketing efforts.
My question: I'm having trouble identifying our competitors. Can I get some guidance on how to home in on a few? I understand there are plenty of factors that go into this, but I really want to know whether I should treat our site as another pop culture/entertainment news site or, more specifically, as a site that generates content for baby boomers.
Oh and more importantly, thanks so much for sharing!
Scrapebox is amazing for all sorts of things: www.matthewwoodward.co.uk/tutorials/scrapebox-tutorial/
Scrape with Google Docs: http://www.annielytics.com/blog/google-docs/how-to-scrape-the-web-using-google-docs/
a few more:
Thanks, Morgan! Good list of resources to review.
I used it a lot to get links and email addresses very fast for Business Development or PR.
The best 3 tools to me:
I was going to recommend Kimono!
Kimono is still in beta, and it's not working properly. From what I have experienced, import.io is much easier to use and further along in development.
Thanks, Adrien. Do you have any related case studies?
I did have that first one. Looks pretty good, same basis.
I'll trade you http://tubes.io/ for it. Same tool basically, but maybe the UI will make more sense to you..
For more advanced work, of course, you can use http://scrapy.org
http://textract.me was another awesome tool, but seems to be down..
Cheerio on Github: https://github.com/cheeriojs/cheerio
+1 for Cheerio, combined with Request.js (another Node library), which will handle sessions, cookies, redirects, etc.
It's a formidable combo if you want to do something more bespoke and complex. Very flexible.
Cheerio looks like a great tool. Thanks for sharing!
I've done a lot of web scraping recently to analyze communities (like GH.com), both to find the best websites to guest post on and to analyze those websites to choose the best topics to write about.
- http://monchito.com/blog/bulk-social-counts (to pull social stats in bulk)
So would you say you're using scraping for curation types of purposes?
I have this book:
And I've taken this course:
How was the course?
I'd recommend it for beginners.
It doesn't require any technical know-how. He shows viewers how to use import.io for web scraping rather than alternatives such as Python, regex, and XPath.
There's a step-by-step walkthrough of how to extract data from Twitter, Reddit, Yelp, and Hacker News. It's like sitting next to someone while they're using their computer to show you how to use the app.
Here's a breakdown of the course:
An Introduction to Web Scraping
- Introduction to Web Scraping
- Strategies for Effective Web Scraping
- Installing Import.io
Connector Basics: Creating a PR News Dashboard of top industry blogs
- PR News Dashboard Part 1
- PR News Dashboard Part 2
- PR News Dashboard Part 3
Extractor Basics: Tracking your Competitors using Extractors
- Competitive Intelligence Part 1
- Competitive Intelligence Part 2
Crawler Basics: Crawling for the Top 1000 Twitter Users
- Extracting the Top 1000 Twitter Users
- Reddit Scrapes
- Crawling Reddit for Subreddits
- Crawling Subreddits for Moderators
- Crawling for Moderator Submissions
- Yelp URL Spreadsheet
- Crawling Yelp Search Results Page
- Crawling Yelp Business Profiles
- Twitter Crawl Spreadsheet
- Crawling for Targeted Lists of Twitter Users
Hacker News Scrapes
- Extracting the Top 100 Hacker News Users
- Crawling for a Top User's Submissions
Great overview Nichole. I also found this course very helpful as an intro to web scraping. I primarily use and like Import.io (especially now that they support authenticated APIs) and would recommend Kimono as well.
You may also want to look at the Brody episode on Growthhackertv.com. Episode 114.
Thanks, Dean. I've watched that episode and also recommend it. Of course, it led me to purchase his book. :)
I've always been a fan of having complete control. Personally I use Ruby and Mechanize
Thanks, Ali! What's a basic rundown of how you're using them?
I usually analyze the site using a Chrome extension called SelectorGadget to find the right selectors. Then I use Mechanize to collect the data and output it in a usable format (such as JSON). I can either post the JSON data to a server that can use it, or generate it when it's needed. It depends on the application at hand.
Hadn't heard of SelectorGadget. Very cool, thanks for sharing.
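The workflow above is Ruby and Mechanize. As a rough illustration of the same idea (pick a selector, collect the matching elements, emit JSON), here is a standard-library Python sketch. It swaps Mechanize for Python's `html.parser` and only matches on a single class name, an assumption made to keep it short. The `title` class and the sample HTML are made up.

```python
import json
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class ClassTextExtractor(HTMLParser):
    """Collect the text of every element carrying a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.depth = 0      # greater than 0 while inside a matching element
        self.current = []   # text fragments of the element being read
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth:
            self.depth += 1
        elif self.css_class in classes:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.results.append("".join(self.current).strip())
            self.current = []

    def handle_data(self, data):
        if self.depth:
            self.current.append(data)


# Made-up page; in practice this would be the HTML you fetched.
sample_html = (
    '<ul>'
    '<li class="title"><a href="/a">First <b>post</b></a></li>'
    '<li class="title">Second post</li>'
    '<li class="ad">Sponsored</li>'
    '</ul>'
)

extractor = ClassTextExtractor("title")
extractor.feed(sample_html)
payload = json.dumps({"titles": extractor.results})
print(payload)
```

The resulting JSON string can then be posted to a server or written to disk, depending on the application.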
Absolutely! How technical are you?
There's an awesome tool called Kimono that just came out of YC. It's pretty easy to use even if you don't write much code yourself: https://www.kimonolabs.com/
If you are technical and want to learn the whole 9 yards of this stuff, we have a workshop coming up at Zipfian Academy: http://zipfianmlworkshop.eventbrite.com
This is great Katie!
A good friend of mine, Adam Gibson, is actually leading that workshop you mentioned, and I highly recommend it. He's brilliant at machine learning, so I think that course will be well worth attending.
Oh yeah, that's the course I wish I was attending! I'm all the way in Florida, though. I'm very envious of those who get to attend, haha.
Surprisingly, nobody has mentioned Python's BeautifulSoup yet. Combine it with the Requests library and you get code that is super easy to write and super clean.
I use it and love it. I actually wrote code to automate parts of http://www.austenallred.com/the-hackers-guide-to-getting-press/ if you want it.
It depends on your use case, but you can accomplish quite a bit with the Chrome extension Web Scraper (https://chrome.google.com/webstore/detail/web-scraper/jnhgnonknehpejjnehehllkliplmbmhn?hl=en) and a little XML knowledge/googling.
I've scraped Pinterest for some stats for an ecommerce company I was doing a project for. I used a Google Chrome tool called Scraper. The autoscroll on Pinterest makes many scraping tools useless, so I chose this plain-text tool. Read more about it here: http://to.pbs.org/PdD66H
Good question Nichole. I've dabbled with Import.io but have never actually applied it to any projects. Curious to hear how others have used scraping..
Same here. The only thing I use somewhat frequently is the Google Chrome scrape extension. It works like a charm in just a few clicks for basic scrapes.
Here's a great tutorial for beginners (this was the first time I'd done something like this) - http://findmyblogway.com/scraping-communities-with-xpath/
We're using this for clients when we find niche social media sites with profile pages.
Then we use some proven outreach scripts we've pieced together to make contact and build relationships. Here's a good presentation that covers various needs - http://summit.grouphigh.com/assets/producers/V423grouphigh_223456291822641435.pdf
Assuming you are scraping for sales / marketing growth, the easiest approach is to look for certain scripts on a site to define your target. E.g., if you are a marketing automation platform, you may want to look for companies that are investing in AdWords but aren't using another marketing automation platform.
http://nerdydata.com/ is a good out of the box script search engine. Spyfu.com is also decent if you are looking for contact information too and you want people spending on AdWords.
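If you want to run this kind of script check yourself on pages you have already downloaded, a minimal Python sketch could look like the following. The signature strings are assumptions for illustration only (the `automation-vendor.example` host is entirely hypothetical), so verify real signatures against pages you know use the tools in question.

```python
from html.parser import HTMLParser

# Assumed signatures; replace with strings you have verified on real pages.
ADS_SIGNATURES = ["googleadservices.com"]
AUTOMATION_SIGNATURES = ["tracker.automation-vendor.example"]  # hypothetical


class ScriptCollector(HTMLParser):
    """Gather script src URLs and inline script bodies from a page."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.snippets = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True
            src = dict(attrs).get("src")
            if src:
                self.snippets.append(src)

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        if self.in_script:
            self.snippets.append(data)


def is_target(html):
    """True if the page shows an ads signature but no automation signature."""
    collector = ScriptCollector()
    collector.feed(html)
    blob = " ".join(collector.snippets).lower()
    uses_ads = any(sig in blob for sig in ADS_SIGNATURES)
    uses_automation = any(sig in blob for sig in AUTOMATION_SIGNATURES)
    return uses_ads and not uses_automation


# Made-up pages: one runs only the ads script, the other runs both.
prospect = '<script src="https://www.googleadservices.com/pagead/conversion.js"></script>'
already_served = prospect + (
    '<script src="https://tracker.automation-vendor.example/t.js"></script>'
)

print(is_target(prospect), is_target(already_served))
```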
Two ideas of what you can do once you have the list:
- Develop a LinkedIn company targeting campaign for whitepaper content
- Run custom audience Facebook Ad campaign against the email addresses
For those of us on Rails, simply install the Nokogiri gem (http://nokogiri.org/), and run it as a rake task (put the rake file under /lib/tasks, then in the terminal run "rake [task_name]"). The Nokogiri library's doc.css() function lets you easily access HTML elements based on their CSS selectors (which are often unique). You can also easily get attributes like href, value, name, etc.
Hi! I'd like to share a coupon code for my course about data extraction and web scraping:
My main interest is extracting users' comments for sentiment and buzz analysis.
I'd love your insights!
10000 leads in 10 minutes - written by import.io founder Andrew Fogg and presented at Hustle Con 2014:
Web Scraping for Sales & Growth Hacking Udemy course (7441 students enrolled):
FYI: I am a growth type person at import.io
While working with a restaurant loyalty startup, we built a custom scraper to scrape competitors' sites and Yelp. We used the data from the competitors' sites to track their growth rates and feed leads into our CMS. We then used the Yelp data to compare the quality of business the competitors were bringing on and to set goals for our sales team. Feel free to message me if you have questions.
Thanks for sharing a real-world example, Nick. What went into the decision process to build a custom scraper?
This is quite a broad area. For example when I was at a global retailer we scraped prices daily/hourly to ensure we remainded competitive. I run affiliate marketing websites in which I scrape offers/voucher codes to list on my own sites. As someone has mentioned here I also scrape the HTML code of websites for SEO purposes, to identify opportunities. I've even scraped
I think it really depends on what you're looking to hack for growth :-)
Here's another one for the list for scraping news outlets for related stories, which is great to support blog writing in any niche
typo 1: "to ensure we *remained* competitive"
typo 2: "I’ve even scraped" ... "... LinkedIn results pages to pull out potential job candidates through a search - the extent for scraping is extremely broad"
What are some tools for Mac that can help with scraping?
I have a netbook that we keep in the office just in case we need a PC for something, so I've been using SEO Tools for Excel and XPath there. But it would be way easier on our main computers.
My last scrape was very useful. We identified a directory with leads for our business, collected the URLs of their websites, and then sent them to BuiltWith to see whether they used a certain CMS and plugin.
My scraping tools come with my OS (Debian): I can do almost any scrape with curl, sed, cut, grep, sort, and uniq. Great fun.
It depends on the complexity of the scraping itself, e.g. how "protected" (in a good sense, we are not hacking anything here!) the data you are looking for is, and what technology the data source uses. That is to say, with solutions like kimonolabs/mozenda it is quite hard to get data that sits behind sophisticated ASP forms/logins or systems that implement anti-crawler measures, plus some post-processing, so I would tag them as "entry level" scraping, although they cover the majority of scraping activities.
Some time ago I ran a project in the legal field where we needed to get through millions of court cases daily to identify potential leads before the competitors did. We used this framework: http://scrapy.org/. It's very flexible and configurable, and suitable for large projects if an "entry level" solution is not enough.
P.S. Fun fact: during the project I mentioned, we went from a department of 30+ manual data entry people to only 5 support people, saving thousands yearly and improving profits greatly. It was quite an interesting growth hack; I am thinking of writing an article about it sometime.
I use the Dataminer Chrome extension to quickly grab a CSV of anything in a list format, e.g. Google SERPs.
Any "dangers" of scraping? Like CAN-SPAM, or companies suing, or whatever? Not trying to be a wet blanket on great techniques - just didn't know if there were things to "watch out for" or "you'll get penalized for doing this" or "that is downright illegal in some states" etc.
Wow I just learned so much from this thread. Thanks everyone! Ballers.
I'm a little different. I'm familiar with import.io, Google Docs, Scrapebox, etc., but I love using traditional Ruby/Python code to do it. Much more effective imo.
TaDaweb is a great data extractor as it can get data directly from APIs and RSS feeds, but it can also get info directly from websites. There is an SDK which enables you to create web data recipes for extracting precise data. It isn't so much web scraping as web data mashup.