Monday, 10 November 2014

Review: import.io’s New Scraping Process and Features

Web scraping and data platform import.io announced last week that it has secured $3M in funding from investors who include the founders of Yahoo! and MySQL.

The company also released a new beta version of its extraction tool, with some new features and a much cleaner, faster user experience.

First Impression

I’ve been using the tool for a week and can say it is an improvement over the old version, which was a bit bulky and awkward. While the process is still not exactly intuitive, the development team at import.io has managed to slim down what was a relatively button-heavy workflow without sacrificing any of the functionality. In that sense, they have made the new workflow both simpler and more complicated at the same time.

The new version has a simple toolbar across the top instead of the space-hogging table and wizard from before, and its look is a big improvement on the pink and white of the previous version.

True, the loss of the wizard means there isn’t as much guidance as before (the pop-up help only appears on the first use), but the undo button means you don’t really need it. You can click around and experiment a bit with the different extraction options before settling down to do some real work.

Data Extraction

Once you’ve figured out how it works, the new version requires far fewer mouse clicks to get from a web page to a table of data or an API, as shown in the video on their homepage.

All you need to do now is navigate to a website and click a single piece of data on the page, such as a price, image or URL. The app will then find all the other examples of similar data on the website, immediately creating a structured table of data.


This latest version of the extractor also includes an important new feature labeled “Suggest Data”. It’s important because it lets you extract all the data from a page, instantly creating a table of data that can be published as an API. This makes import.io fast and exciting to use; I spent a long time playing with it, and it worked on the majority of sites.

Advanced Features

Most web scrapers aimed at non-programmers struggle with complex sites that use JavaScript or iframes, but import.io now handles these too. In the basic mode you can toggle JavaScript and CSS on and off to help you see your data better.

If that doesn’t work, you can switch into an ‘advanced mode’ where import.io lets you write your own XPath and regular expressions (RegExp). They’ve also added a source code view, though without the ability to click on an element in the page and inspect it (as you can in Chrome), this feature isn’t particularly useful.
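To give a sense of what an XPath-plus-RegExp rule does, here is a minimal Python sketch that pairs the two: XPath selects the elements, and a regular expression pulls the number out of their text. It uses only the standard library's ElementTree (which supports a limited subset of XPath) and `re`. The page markup, the class names and the `extract_rows` helper are hypothetical, and this is not import.io's own syntax or implementation.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical product listing. ElementTree parses XML, so real-world HTML
# usually needs cleaning up into well-formed markup before this would work.
PAGE = """
<html><body>
  <div class="product"><span class="name">Kettle</span><span class="price">Price: $24.99</span></div>
  <div class="product"><span class="name">Toaster</span><span class="price">Price: $39.50</span></div>
</body></html>
"""

# Captures the numeric part of a dollar amount such as "$24.99".
PRICE_RE = re.compile(r"\$(\d+(?:\.\d{2})?)")


def extract_rows(markup):
    """Return one dict per product: the XPath picks elements, the regex cleans values."""
    root = ET.fromstring(markup.strip())
    rows = []
    # ElementTree's XPath subset supports descendant search and attribute predicates.
    for product in root.findall(".//div[@class='product']"):
        name = product.findtext("span[@class='name']")
        price_text = product.findtext("span[@class='price']") or ""
        match = PRICE_RE.search(price_text)
        rows.append({"name": name, "price": float(match.group(1)) if match else None})
    return rows


if __name__ == "__main__":
    for row in extract_rows(PAGE):
        print(row)
```

Keeping selection (XPath) and value cleanup (regex) as separate steps mirrors the way the two are offered side by side in advanced mode.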

API Integration

Once you’ve created your scraper, there are a number of options for what you can do with it.

If you want, you can simply copy and paste the data into a spreadsheet or download it as a CSV. You can also push your data directly into Google Sheets using a formula that import.io generates for you.

For the rest of us, they have exposed both the GET and POST requests and added a JSON view that shows how the data is returned, which is handy.
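As a rough illustration of what you might do with the returned JSON, the sketch below converts a response body into CSV using Python's standard library. The `results` key, the field names and the `results_to_csv` helper are assumptions for illustration only, not import.io's documented response format; in practice the body would come from the GET or POST request shown on the API page.

```python
import csv
import io
import json

# Hypothetical response body; real field names would depend on the columns
# you trained in the extractor and on the API's actual format.
SAMPLE_RESPONSE = (
    '{"results": [{"name": "Kettle", "price": "24.99"},'
    ' {"name": "Toaster", "price": "39.50"}]}'
)


def results_to_csv(body):
    """Turn a JSON body with a list of row objects into CSV text."""
    rows = json.loads(body)["results"]
    # Column order is taken from the first row; assumes all rows share the same keys.
    fieldnames = list(rows[0].keys()) if rows else []
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(results_to_csv(SAMPLE_RESPONSE))
```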

All this functionality is nice, and it’s clear they’re trying to cater to all technical levels. However, it has made the API page somewhat messy and potentially confusing for newer or less technical users, though they should still be able to get what they need.

Good, with Lots of Potential

Their new tool certainly isn’t perfect. There are still a few sites where manual row training is required, and you can’t access the authentication feature (though it is still available in the old version) or pagination.

Even if it’s not quite there yet, if import.io continues like this, it is well on its way to becoming the best data scraping platform on the market, especially when you consider the “free for life” price tag.

Source: http://scraping.pro/review-import-ios-new-scraping-tools-features/
