Home | Javadocs | Web-Scraping Tutorial | JSON Querying Tutorial | FAQ | Download

Jaunt Java Web Scraping & JSON Querying

Introduction
Oct 31, 2024
Jaunt 1.6.1 release

Test drive Jaunt today and leave feedback in the forum to help shape the next release!
Jaunt is a Java library for web scraping, web automation, and JSON querying. The library provides a fast, ultra-light, "headless" browser (i.e., one with no GUI). The browser provides web-scraping functionality, access to the DOM, and control over each HTTP request/response, but it does not support JavaScript*. Jaunt enables your Java programs to:
  • perform web-scraping and JSON data extraction
  • work with forms and tables
  • control/process individual HTTP Requests/Responses
  • interface with REST APIs or web-apps (JSON, HTML, XHTML, or XML).
*Now you can automate Chrome, Firefox, Safari, Internet Explorer, etc., with full JavaScript support, using Jauntium.
Code example: Google scraper - search for 'butterflies'
//assumes "import com.jaunt.*;" and that this snippet runs inside a method
try{
  UserAgent userAgent = new UserAgent();         //create new userAgent (headless browser)
  userAgent.visit("http://google.com");          //visit google
  userAgent.doc.apply("butterflies").submit();   //apply form input and submit

  Elements links = userAgent.doc.findEvery("<h3>").findEvery("<a>");  //find search result links
  for(Element link : links) System.out.println(link.getAt("href"));   //print results
}
catch(JauntException e){                         //connection, response, or search errors
  System.err.println(e);
}
Features:
Jaunt is free [see product comparison]. Features include:
  • Parsing dirty HTML, XHTML, XML, and JSON.
  • Support for HTTP, HTTPS & basic auth.
  • Form fill-out via field labels/names/sequence.
  • Generating form data permutations.
  • File downloading/uploading.
  • Saving complete web pages (images, JS, CSS, etc.).
  • Table data extraction.
  • Fluent DOM navigation & search (search chaining).
  • Regex-enabled querying in DOM & JSON.
  • HTTP header/cookie manipulation.
  • HTTP/HTTPS proxy support.
  • Customizable caching & content handlers.
  • Web pagination discovery.
  • 100% Java (no dependencies).
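
As background for the basic auth item in the list above, here is a minimal plain-JDK sketch of how an HTTP Basic `Authorization` header value is formed (per RFC 7617). It does not use or describe Jaunt's own API; the class name `BasicAuthSketch` and the method `basicAuthHeader` are hypothetical names chosen for illustration, and the credentials are the sample pair from the RFC.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Illustration only: not part of Jaunt. Shows what "basic auth" means on the wire.
public class BasicAuthSketch {

    // Builds the value of the Authorization header: "Basic " + base64("user:password").
    static String basicAuthHeader(String user, String password) {
        String credentials = user + ":" + password;
        byte[] raw = credentials.getBytes(StandardCharsets.UTF_8);
        return "Basic " + Base64.getEncoder().encodeToString(raw);
    }

    public static void main(String[] args) {
        // Sample credentials from RFC 7617.
        // Prints: Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
        System.out.println("Authorization: " + basicAuthHeader("Aladdin", "open sesame"));
    }
}
```

Because the credentials are only Base64-encoded, not encrypted, this scheme is normally used over HTTPS.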
