Scrape Google Using Another Google Product

John Q. Public owns a website and would like to get a list of all pages on his site that are indexed in Google search. He would also like to monitor his web page rankings in Google for particular search keywords vis-a-vis other rival websites.

There are powerful command-line tools like curl and wget that can download Google search result pages automatically. The HTML pages can then be parsed using Python’s Beautiful Soup library or PHP’s Simple HTML DOM parser, but these methods are technical and involve coding. The other issue is that Google is very likely to temporarily block your IP address if you send it several automated requests in quick succession.

Web Scraping Google using Google Docs

If you ever need to extract data from Google search results, Google offers a free tool that might just do the job. It’s called Google Docs, and since Docs will be fetching Google search pages from within Google’s own network, the scraping requests are less likely to get blocked.

The idea is simple. The Google Sheet fetches and imports Google search results using the built-in ImportXML function. It then extracts the page titles and URLs using XPath expressions and grabs the favicon of each web domain using Google’s own favicon converter.

You can further customize the Google Search results by changing the sort order (by relevance or by date published) and by restricting results to pages published in the last hour, week, month or year. The number of results returned can be modified as well.

To get started, open this Google sheet and choose File -> Make a copy to clone the sheet in your Google Drive. You can now play with the various parameters in cells that are highlighted in light blue color.

Spreadsheet Functions for Scraping Web Pages

Writing a scraping tool with Google Sheets is simple and involves a few formulas and built-in functions. Here’s how it was done:

1. Construct the Google Search URL with the search query and sorting parameters. You can also use advanced Google search operators like site, inurl, around and others.
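For readers who want to see what step 1 amounts to outside a spreadsheet, here is a minimal Python sketch of assembling such a URL. The function name `build_search_url` is hypothetical, and the parameter names `num` and `tbs` (with `qdr:` and `sbd:1` values) are assumptions about Google's query string rather than anything taken from the sheet itself.

```python
from urllib.parse import urlencode

def build_search_url(query, num_results=10, time_range=None, sort_by_date=False):
    """Assemble a Google search URL from a query and optional filters.

    Parameter names (num, tbs) and their values are assumptions about
    Google's query string, not taken from the original sheet.
    """
    params = {"q": query, "num": num_results}
    tbs = []
    if time_range:
        # Assumed codes: "h" (hour), "w" (week), "m" (month), "y" (year)
        tbs.append("qdr:" + time_range)
    if sort_by_date:
        # Assumed flag for sorting by date instead of relevance
        tbs.append("sbd:1")
    if tbs:
        params["tbs"] = ",".join(tbs)
    # urlencode percent-encodes operators such as "site:" safely
    return "https://www.google.com/search?" + urlencode(params)

url = build_search_url("site:example.com scraping", num_results=20, time_range="w")
print(url)
```

Search operators such as site or inurl simply go into the query string; the encoding step takes care of the colon and spaces.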

2. Get the titles of pages in search results using the XPath //h3 (in Google search results, all titles are served inside the H3 tag). The formula below narrows the match to H3 elements with the class r.

=IMPORTXML(STEP1, "//h3[@class='r']")

You can find the XPath of any element using Chrome Dev Tools

3. Get the URLs of pages in search results using another XPath expression.

=IMPORTXML(STEP1, "//h3/a/@href")
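To make the two XPath expressions in steps 2 and 3 concrete, the sketch below runs equivalent queries against a small hand-written snippet using Python's standard library. The sample markup is invented for illustration and is not real Google output; it only mimics the H3 and link structure that the formulas assume.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup that mimics the structure the formulas expect
SAMPLE = """<div>
  <h3 class="r"><a href="/url?q=https://example.com/a&amp;sa=U">First result</a></h3>
  <h3 class="r"><a href="/url?q=https://example.org/b&amp;sa=U">Second result</a></h3>
  <h3>Unrelated heading</h3>
</div>"""

root = ET.fromstring(SAMPLE)

# Equivalent of //h3[@class='r']: collect the text of each matching heading
titles = ["".join(h3.itertext()) for h3 in root.findall(".//h3[@class='r']")]

# Equivalent of //h3/a/@href: ElementTree cannot select attributes in the
# path itself, so select the <a> elements and read href from each one
links = [a.get("href") for a in root.findall(".//h3/a")]

print(titles)
print(links)
```

ElementTree supports only a subset of XPath and needs well-formed input, so real search pages would call for a more forgiving HTML parser.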

4. All external URLs in Google Search results have tracking enabled, so we’ll use a regular expression to extract the clean URLs.

=REGEXEXTRACT(STEP3, "\/url\?q=(.+)&sa")
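The same pattern can be tried out in Python to see what it captures. This is a sketch only: the helper name `clean_url` and the sample link are hypothetical, and the percent-decoding step is an addition that the spreadsheet formula does not perform.

```python
import re
from urllib.parse import unquote

# Same pattern as the REGEXEXTRACT formula: capture everything between
# "/url?q=" and the "&sa" tracking parameter
TRACKING_RE = re.compile(r"/url\?q=(.+)&sa")

def clean_url(href):
    """Return the destination URL from a tracked link, or None if no match."""
    match = TRACKING_RE.search(href)
    if match is None:
        return None
    # Extra step (not in the formula): decode percent-escapes in the target
    return unquote(match.group(1))

print(clean_url("/url?q=https://example.com/page%3Fid%3D1&sa=U&ved=abc"))
```

Because `(.+)` is greedy, the capture runs up to the last `&sa` in the string, which matters only if the destination URL itself contains that sequence.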

5. Now that we have the page URL, we can use another regular expression to extract the website domain from the URL.

=REGEXEXTRACT(STEP4, "https?:\/\/(.[^\/]+)")
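Here is the domain pattern from step 5 applied in Python, as a quick way to check its behavior on a few inputs. The function name `extract_domain` and the sample URLs are hypothetical.

```python
import re

# Same pattern as the formula: after the scheme, capture everything up to
# the next slash (one arbitrary character, then a run of non-slash characters)
DOMAIN_RE = re.compile(r"https?://(.[^/]+)")

def extract_domain(url):
    """Return the host part of an http(s) URL, or None if it does not match."""
    match = DOMAIN_RE.search(url)
    return match.group(1) if match else None

print(extract_domain("https://www.example.com/path/page.html"))
print(extract_domain("http://example.org"))
```

Note that the capture keeps any `www.` prefix and port number, since it stops only at the first slash after the scheme.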

6. Finally, we can pass this domain to Google’s S2 Favicon converter to show the favicon image of the website in the sheet.
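The formula for this step is not shown here, so the following is only an illustration of the idea in Python. It assumes the favicon service is reachable at `https://www.google.com/s2/favicons` and takes a `domain` query parameter; the helper name `favicon_url` is hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint of the S2 favicon service; not taken from the sheet
FAVICON_ENDPOINT = "https://www.google.com/s2/favicons"

def favicon_url(domain):
    """Build the image URL that would show the favicon for a domain."""
    return FAVICON_ENDPOINT + "?" + urlencode({"domain": domain})

print(favicon_url("www.example.com"))
```

In a spreadsheet, a URL like this would typically be handed to an image function so the icon renders inside the cell.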


Now that you have the Google Search results inside the sheet, you can export the data as a CSV file or publish the sheet as an HTML page (it will refresh automatically). You can also go a step further and write a Google Script that sends you the sheet as a PDF daily.

This story, Scrape Google Using Another Google Product, was originally published at Digital Inspiration on 13/03/2014 under Google, Google Docs, Internet

from Digital Inspiration Technology Blog

