Home | Javadocs | Quickstart Tutorial | FAQ | Download Jaunt

Jaunt Java Web Scraping & Automation

Introduction
Sept 29, 2014
0.9.9.2 Release!

Test drive Jaunt today and leave feedback in the forum to help shape the next release!
Jaunt Beta is a new, free Java library for web-scraping and web-automation tasks. Because the library provides an ultra-light headless browser (i.e., a browser with no GUI), your Java programs can easily perform browser-level, document-level, and DOM-level operations. Jaunt is the ideal tool when JavaScript support is not required, including:
  • filling out and submitting forms.
  • creating web-bots or web-scraping programs.
  • interfacing with REST APIs or web-apps (HTML, XHTML or XML).
  • automated testing.
Code example: Google scraper - search for 'butterflies'
import com.jaunt.*;                              //UserAgent, Elements, Element, JauntException

try{
  UserAgent userAgent = new UserAgent();         //create new userAgent (headless browser)
  userAgent.visit("http://google.com");          //visit google
  userAgent.doc.apply("butterflies");            //apply form input (starting at first editable field)
  userAgent.doc.submit("Google Search");         //click submit button labelled "Google Search"

  Elements links = userAgent.doc.findEvery("<h3 class=r>").findEvery("<a>");      //find search result links
  for(Element link : links) System.out.println(link.getAt("href"));               //print results
}
catch(JauntException e){                         //thrown if a request, search, or attribute lookup fails
  System.err.println(e);
}
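To make the chained query in the example above more concrete, here is a minimal sketch of the same two-step idea (find every <h3 class=r> block, then every <a> inside each block) written with only java.util.regex, so it runs without the Jaunt jar. This is not how Jaunt is implemented, and regular expressions are not a robust HTML parser. The class name ChainedSearchSketch, the method resultLinks, and the sample HTML are all hypothetical and used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ChainedSearchSketch {

    // Step 1: each <h3 class=r> ... </h3> block (quotes around r are optional).
    private static final Pattern H3_RESULT = Pattern.compile(
        "<h3\\s+class=[\"']?r[\"']?\\s*>(.*?)</h3>",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Step 2: the href value of each <a> tag inside a block matched in step 1.
    private static final Pattern ANCHOR_HREF = Pattern.compile(
        "<a\\s[^>]*?href=[\"']([^\"']*)[\"']",
        Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: returns the hrefs of anchors nested in <h3 class=r> blocks.
    public static List<String> resultLinks(String html) {
        List<String> hrefs = new ArrayList<>();
        Matcher h3 = H3_RESULT.matcher(html);
        while (h3.find()) {
            // The second search is scoped to the content of the first match.
            Matcher a = ANCHOR_HREF.matcher(h3.group(1));
            while (a.find()) {
                hrefs.add(a.group(1));
            }
        }
        return hrefs;
    }

    public static void main(String[] args) {
        // Made-up sample page: one anchor outside any result block, two inside.
        String html = "<html><body>"
            + "<a href=\"/ignored\">not a result</a>"
            + "<h3 class=r><a href=\"http://example.com/one\">One</a></h3>"
            + "<h3 class=\"r\"><a href=\"http://example.com/two\">Two</a></h3>"
            + "</body></html>";
        for (String href : resultLinks(html)) {
            System.out.println(href);
        }
    }
}
```

Because the second search only looks inside the text matched by the first, the anchor outside the <h3 class=r> blocks is skipped. That scoping is what chaining findEvery calls expresses in the Jaunt example.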
Jaunt API Features:
Jaunt Beta is free [see product comparison]. Features include:
  • HTML, XHTML, XML parsing.
  • Protocols: HTTP, HTTPS, basic auth.
  • Form fill-out via field labels/names/sequence.
  • Automatic form permutation.
  • File downloading/uploading.
  • Saving complete web page (images, js, css, etc).
  • Table data extraction.
  • DOM navigation, search & search chaining.
  • Regex-enabled querying.
  • HTTP header/cookie manipulation.
  • HTTP/HTTPS proxy support.
  • Customizable caching.
  • Customizable content handlers.
  • 100% Java (no dependencies).
