x-ray

The next web scraper. See through the <html> noise
Thanks to your financial contributions, we are operating on an estimated annual budget of $70
Today's Balance
$65.80
Estimated Annual Budget
$70

About

The next web scraper. See through the <html> noise

Features

Flexible schema: Supports strings, arrays, arrays of objects, and nested object structures. The schema is not tied to the structure of the page you're scraping, allowing you to pull the data in the structure of your choosing.

Composable: The API is entirely composable, giving you great flexibility in how you scrape each page.

Pagination support: Paginate through websites, scraping each page. X-ray also supports a request delay and a pagination limit. Scraped pages can be streamed to a file, so if there's an error on one page, you won't lose what you've already scraped.

Crawler support: Start on one page and move to the next easily. The flow is predictable, following a breadth-first crawl through each of the pages.

Responsible: X-ray has support for concurrency, throttles, delays, timeouts and limits to help you scrape any page responsibly.

Pluggable drivers: Swap in different scrapers depending on your needs. Currently supports HTTP and PhantomJS driver drivers. In the future, I'd like to see a Tor driver for requesting pages through the Tor network.

Team

Meet the awesome people that are bringing the community together! 🙌
Matthew Mueller
Collective Admin since October 2014

Budget

Current balance: $65.80

Latest transactions

Monthly donation to x-ray (Backers)

🎉
$2
USD
Scraper API | 21 days ago | View Details

Monthly donation to x-ray (Backers)

🎉
$2
USD

Monthly donation to x-ray (Backers)

🎉
$2
USD
ProxyCrawl | 21 days ago | View Details

Monthly donation to x-ray (Backers)

🎉
$2
USD
Netho.me | 21 days ago | View Details

Monthly donation to x-ray (Backers)

🎉
$2
USD
Scraper API | 2 months ago | View Details