Home | Javadocs | Web-Scraping Tutorial | JSON Querying Tutorial | FAQ | Download

Jaunt JSON Querying Tutorial

Overview of the JSON Parser

The Jaunt package contains the class UserAgent, which represents a headless browser. The UserAgent is capable of parsing and querying JSON, allowing it to interact with RESTful applications. Consider the following JSON for a book:

            {
                "title": "The Big Book of Inventions",
                "editions": [1985, 1999, 2005, 2010],
                "author": {
                    "firstName": "Greg",
                    "lastName": "Gilmore"
                },
                "inStock" : false
            }
This JSON is a tree structure composed of objects, arrays, and primitives (Strings, booleans, numbers). Unlike primitives, objects and arrays can contain other objects, arrays, and primitives. Each of these is represented by the class JNode ("JSON Node"). Every JNode has a type, which tells whether the JNode represents an object, an array, or one of the primitive types (see JNode.getType(), which returns Type.ARRAY, Type.OBJECT, Type.NUMBER, etc).

In the book example, the root JNode is of Type.OBJECT. It is a parent to four child JNodes, each associated with a name ("title", "editions", "author", "inStock"). The child node named "title" is a JNode of Type.STRING that represents the String "The Big Book of Inventions". The "editions" child is a JNode (of Type.ARRAY) which contains four child JNodes, each of which represents a number. The "author" child is a JNode (of Type.OBJECT) that represents a JSON object, which has two child JNodes of its own, both Strings. Finally, the "inStock" child is a JNode (of Type.BOOLEAN) that represents the boolean value false. As you can see, when working with a JNode, it's important to be cognizant of its type.
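
The tree described above can be pictured with a small stand-alone model. The following is only a sketch in plain Java and does not use Jaunt; the class TypedNodeSketch and its fields are hypothetical names chosen for illustration, not part of the Jaunt API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a typed JSON tree node; this is NOT Jaunt's JNode.
class TypedNodeSketch {
  enum Type { OBJECT, ARRAY, STRING, NUMBER, BOOLEAN }

  final String name;    // null for the root and for array elements
  final Type type;
  final Object value;   // primitive value, or null for OBJECT/ARRAY nodes
  final List<TypedNodeSketch> children = new ArrayList<>();

  TypedNodeSketch(String name, Type type, Object value) {
    this.name = name;
    this.type = type;
    this.value = value;
  }

  TypedNodeSketch add(TypedNodeSketch child) {
    children.add(child);
    return this;        // returning this allows chained construction
  }

  // Builds the tree for the book JSON shown at the top of this tutorial.
  static TypedNodeSketch book() {
    TypedNodeSketch editions = new TypedNodeSketch("editions", Type.ARRAY, null);
    for (int year : new int[] {1985, 1999, 2005, 2010}) {
      editions.add(new TypedNodeSketch(null, Type.NUMBER, year));
    }
    TypedNodeSketch author = new TypedNodeSketch("author", Type.OBJECT, null)
      .add(new TypedNodeSketch("firstName", Type.STRING, "Greg"))
      .add(new TypedNodeSketch("lastName", Type.STRING, "Gilmore"));
    return new TypedNodeSketch(null, Type.OBJECT, null)
      .add(new TypedNodeSketch("title", Type.STRING, "The Big Book of Inventions"))
      .add(editions)
      .add(author)
      .add(new TypedNodeSketch("inStock", Type.BOOLEAN, false));
  }

  public static void main(String[] args) {
    // Prints each child of the root together with its type.
    for (TypedNodeSketch child : book().children) {
      System.out.println(child.name + " -> " + child.type);
    }
  }
}
```

The point of the model is that only OBJECT and ARRAY nodes carry children, which is why code that handles a node usually needs to check its type first.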

The class JNode provides intuitive and powerful search methods for finding nodes. Queries resemble the JSON object that they are intended to match (see Examples 8 and 9). When/if the parser encounters dirty JSON, it attempts to correct the data and is guaranteed to create a parse tree. To begin using Jaunt, download and extract the zip file. The zip file contains the licensing agreement, javadocs documentation, example files, release notes, and a jar file (Java 1.6). Include the jar file in your classpath/project, at which point you will be able to recompile and/or run the example files.

Example 1: Create a UserAgent, send a GET request, print JSON response.
try{
  UserAgent userAgent = new UserAgent();         //create new userAgent (headless browser).
  userAgent.sendGET("http://jsonplaceholder.typicode.com/posts/1");   //send request
  System.out.println(userAgent.json);	         //print the retrieved JSON object
  System.out.println("Other response data: " + userAgent.response); //response metadata, including headers.
}
catch(JauntException e){         //if an HTTP/connection error occurs, handle JauntException.
  System.err.println(e);
}
This example illustrates creating a UserAgent, sending a GET request, and printing the JSON response.

When the userAgent sends the request (line 3), it creates a JNode (userAgent.json) to represent the root node of the JSON object in the response. On line 4, the JSON object is printed. Malformed JSON is automatically corrected by the parser, so the printed output may not be identical to the original/unparsed data. The UserAgent method getSource() can be called to access the original/unaltered JSON.

On line 5, the response object is printed. The response object contains metadata related to the HTTP response, including header information (see the class HttpResponse for accessing specific headers, etc). On line 7 we catch JauntException, which is the superclass of all other Jaunt-related Exceptions. Example 3 demonstrates handling HTTP/connection errors separately from search-related errors.

In addition to sendGET(...), the UserAgent class has methods for performing the most common HTTP request types: sendPOST(...), sendDELETE(...), sendPUT(...), and sendHEAD(). See the Extra Topics Tutorial for examples of these methods.

Example 2: UserAgent settings, searching using findFirst.
try{
  UserAgent userAgent = new UserAgent();         //create new userAgent (headless browser)
  System.out.println("SETTINGS:\n" + userAgent.settings);      //print the userAgent's default settings.
  userAgent.settings.autoSaveJSON = true;        //change settings to autosave last visited page.
   
  userAgent.sendGET("http://jsonplaceholder.typicode.com/posts/1");   //send request
  JNode title = userAgent.json.findFirst("title"); 
  System.out.println("title: " + title);	

  JNode body = userAgent.json.findFirst("body");
  System.out.println("body:" + body);      
}
catch(JauntException e){                         //if an HTTP/connection error occurs, handle JauntException.
  System.err.println(e);
}
This example illustrates printing and changing the UserAgent's settings, sending a request to a REST endpoint, and searching the JSON response with findFirst.

On line 4 autosaving is enabled, which means that anytime a JSON response is received it will be autosaved in the file LAST_VISITED.json in the directory specified by settings.outputPath.

On lines 7 and 10, the method findFirst(String) is invoked on the root node of the JSON response. The findFirst method accepts a query that (in the simplest case) is the name of some JSON node. It recursively searches the JSON until it finds a JNode with a matching name, and then returns that JNode. Note that the name portion of the query (title) is actually a regular expression, which provides a powerful syntax for pattern matching. For example, the query "head(er|ing)" would match a JNode named header or heading. Example 8 provides a full account of the query syntax.
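
To see how a name pattern such as "head(er|ing)" behaves, the sketch below uses only java.util.regex and does not call Jaunt. It assumes the pattern must match the whole node name; this tutorial does not state whether Jaunt requires a whole-name or a partial match, so treat that as an assumption. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class NameRegexSketch {
  // Assumption: the entire node name must match the pattern, not just part of it.
  static boolean nameMatches(String queryRegex, String nodeName) {
    return Pattern.matches(queryRegex, nodeName);
  }

  public static void main(String[] args) {
    System.out.println(nameMatches("head(er|ing)", "header"));   // true
    System.out.println(nameMatches("head(er|ing)", "heading"));  // true
    System.out.println(nameMatches("head(er|ing)", "head"));     // false
    System.out.println(nameMatches("title", "subtitle"));        // false under the whole-name assumption
  }
}
```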

Example 3: Detecting HTTP errors and connection errors with the Response object.
try{
  UserAgent userAgent = new UserAgent();      
  userAgent.sendGET("http://jsonplaceholder.typicode.com/posts/1");   
  System.out.println("Response:\n" + userAgent.response);  //print response data
}
catch(ResponseException e){                                //catch HTTP/Connection error
  HttpResponse response = e.getResponse();                 //or check userAgent.response
  if(response != null){                                    //print response data field by field
    System.err.println("Requested url: " + response.getRequestedUrlMsg()); //print the requested url
    System.err.println("HTTP error code: " + response.getStatus());        //print HTTP error code
    System.err.println("Error message: " + response.getMessage());         //print HTTP status message
  }
  else{
    System.out.println("Connection error, no response!");
  }
} 
When the UserAgent attempts to visit a url, it's possible that the connection to the webserver will fail or that an HTTP error code will be returned. The HttpResponse object (userAgent.response) contains information about the webserver response, including response headers. If no error occurs, UserAgent.response can be examined for details regarding the response, as on line 4.

If the connection fails or an HTTP error occurs, the UserAgent.sendGET(String) method will throw a ResponseException. The ResponseException also contains a reference to the response; however, in this case it's possible that the response is null (indicating that no response was received due to a connection error). Since the response object could be null, that possibility is checked on line 8 before invoking any of its methods for the printing steps. A simpler alternative would be to print the ResponseException e, which would show the same (and more) information.

In some cases, a webserver response will redirect the UserAgent to visit another url. In the case of a sequence of redirected requests and responses, userAgent.response represents the most recent response in the chain.

Example 4: Opening JSON from a file, accessing a JNode's type and parent
{
  "title": "The Big Book of Inventions",
  "editions": [1985, 2001, 2005, 2013],
  "author": {
    "firstName": "Greg",
    "lastName": "Gilmore"
  }
}
try{
  UserAgent userAgent = new UserAgent();         
  userAgent.openJSON(new File("example4.json"));  //open JSON from a file
  
  JNode node = userAgent.json.findFirst("firstName");
  System.out.println("node name: " + node.getName());
  System.out.println("node type: " + node.getType());
  System.out.println("parent node name: " + node.getParent().getName());
  System.out.println("node as string: " + node.toString());
  System.out.println("------------");
 
  node = userAgent.json.getFirst("author");
  System.out.println("node name: " + node.getName());
  System.out.println("node type: " + node.getType());
  System.out.println("node as string:\n" + node.toString());  
  System.out.println("last name: " + node.getFirst("lastName"));  //or node.get("lastName")
  System.out.println("------------");
  
  node = userAgent.json.getFirst("editions");
  System.out.println("node name: " + node.getName());
  System.out.println("node type: " + node.getType());
  System.out.println("node as string:\n" + node.toString());   
}
catch(JauntException e){         
  System.err.println(e);
}
This example illustrates opening JSON from a local file, searching for specific JNodes of different types, and printing various attributes and properties of those JNodes.

On line 5, the findFirst method is called on the root node. It searches all descendant nodes for the first JNode named "firstName". On line 6 the node's name is printed, on line 7 its type (Type.STRING), and on line 8 the name of its parent node. On line 9 the value of the JNode is printed. Since the JNode is of Type.STRING, it is simply a String value when printed.

On line 12, the getFirst(String) method is called. The getFirst method differs from the findFirst method in that it searches only child nodes (rather than all descendant nodes). If a matching child were not found, a NotFound exception would be thrown and caught in the catch block. In this case, the "author" node is found, and on the next few lines its name and type (Type.OBJECT) are reported, and its value is then printed. Since it's a JSON object, its value is represented as such when the toString() method is called. On line 16 the getFirst(String) method is called on the "author" node to search its child nodes for the first node named "lastName". It's also possible to retrieve a child using get(String), where the String parameter is the child's name. But beware that this parameter is not a query; it does not support regular expressions or the full query syntax introduced in later examples.
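
The difference between searching children only and searching all descendants can be sketched without Jaunt, using nested java.util.Map objects as a stand-in for JSON objects. The helper names firstChild and firstDescendant are hypothetical, names are compared as plain strings rather than regular expressions, and the sketch returns null where the tutorial's getFirst and findFirst would throw a NotFound exception.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ChildVsDescendantSketch {
  // Children-only lookup: inspects the direct entries of obj and nothing deeper.
  static Object firstChild(Map<String, Object> obj, String name) {
    return obj.get(name);   // null when there is no such child
  }

  // Depth-first lookup: inspects children first to last, descending into nested objects.
  @SuppressWarnings("unchecked")
  static Object firstDescendant(Map<String, Object> obj, String name) {
    for (Map.Entry<String, Object> e : obj.entrySet()) {
      if (e.getKey().equals(name)) return e.getValue();
      if (e.getValue() instanceof Map) {
        Object hit = firstDescendant((Map<String, Object>) e.getValue(), name);
        if (hit != null) return hit;
      }
    }
    return null;
  }

  // A reduced version of the book JSON from this example.
  static Map<String, Object> book() {
    Map<String, Object> author = new LinkedHashMap<>();
    author.put("firstName", "Greg");
    author.put("lastName", "Gilmore");
    Map<String, Object> root = new LinkedHashMap<>();
    root.put("title", "The Big Book of Inventions");
    root.put("author", author);
    return root;
  }

  public static void main(String[] args) {
    Map<String, Object> root = book();
    System.out.println(firstChild(root, "firstName"));       // null: not a direct child of the root
    System.out.println(firstDescendant(root, "firstName"));  // Greg
  }
}
```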

On line 19, the getFirst method is used to retrieve a JNode named "editions". Its name and type (Type.ARRAY) are then printed. Since it's a JSON array, it's represented as such when the toString method is called. Other possible types for JNode are Type.NUMBER, Type.BOOLEAN, and Type.UNDEFINED (see JNode.Type).

Example 5: Opening JSON from a String, working with JSON arrays
try{ 
  UserAgent userAgent = new UserAgent();
  userAgent.openJSON("{ \"editions\": [1985, 2003, 2010, 2014] }");  //open JSON from String

  JNode editionsArray = userAgent.json.getFirst("editions");  
  System.out.println("size of array: " + editionsArray.size());
  JNode firstArrayElement = editionsArray.get(0);
  int value = firstArrayElement.toInt();
  System.out.println("first array value: " + value);  // or firstArrayElement.toString()
  
  System.out.println("all array elements:");
  for(JNode node : editionsArray){
    System.out.println("edition year: " + node);
  }  
}
catch(JauntException e){
  System.err.println(e);
}
This example illustrates working with a JNode of Type.ARRAY. On line 5, the "editions" node is retrieved, which represents a JSON array of numbers. On line 6 the size of the array is determined, and on the next line the first node within the array is retrieved. Because this JNode is of Type.NUMBER, the method toInt() can be called to access the primitive value. The following methods of JNode can also be called when the value is numeric: toDouble(), toFloat(), and toLong(). On line 12 a for-loop is used to iterate through the array and print each value.

Example 6: Searching using findEvery, search chaining.
{
  "title": "Scary Movies",
  "movies": [
    { "title": "Dawn of the Zombies" },
    { "title": "Night of the Undead" },
    { "title": "Awakening of the Zombies"}
  ]
}
try{ 
  UserAgent userAgent = new UserAgent(); 
  userAgent.openJSON(new File("example6.json"));  
  
  //find every title 
  JNode searchResults = userAgent.json.findEvery("title");
  System.out.println("Search results for every title:\n" + searchResults);
  System.out.println("number of results: " + searchResults.size());
  System.out.println("------------------");

  //find every title in the movies section
  searchResults = userAgent.json.getFirst("movies").findEvery("title");
  System.out.println("Search results for every movie title:\n" + searchResults);
  System.out.println("number of results: " + searchResults.size());
}
catch(JauntException e){
  System.err.println(e);
}
This example illustrates using findEvery(String), where the String parameter is a simple search query. On line 6, the findEvery method is invoked on the root node to search for every JNode named "title". Because the method is invoked on the root node, the entire data structure is searched. In general, when invoked on a JNode, the search is restricted to the JNode's descendant nodes. The (four) search results are returned as children of the JNode searchResults, which is a JNode of Type.ARRAY.

Although the searchResults JNode contains results from the search, it is not considered the true parent. In other words, calling getParent() on any of the four nodes would return its parent node in the JSON data; it would not return the searchResults node.

One benefit of the searchResults container itself being a JNode is that it can also be searched, allowing for search chaining. On line 12, for example, the root node is searched for the first "movies" node, and the search result is searched for "title" nodes. This search yields only three results. If the findEvery(String) method does not locate any matches for the query, an empty JNode container of Type.ARRAY is returned.
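
The counts of four and three can be reproduced with a stand-alone sketch that does not use Jaunt. Nested java.util.Map and java.util.List objects stand in for JSON objects and arrays, names are compared as plain strings, and the class FindEverySketch and its findEvery helper are hypothetical; they only mimic the behaviour described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FindEverySketch {
  // Collects the value of every entry named `name` anywhere at or below `node`.
  static List<Object> findEvery(Object node, String name) {
    List<Object> out = new ArrayList<>();
    collect(node, name, out);
    return out;   // empty list when nothing matches
  }

  private static void collect(Object node, String name, List<Object> out) {
    if (node instanceof Map) {
      for (Map.Entry<?, ?> e : ((Map<?, ?>) node).entrySet()) {
        if (name.equals(e.getKey())) out.add(e.getValue());
        collect(e.getValue(), name, out);   // keep searching below every entry
      }
    } else if (node instanceof List) {
      for (Object item : (List<?>) node) collect(item, name, out);
    }
  }

  static Map<String, Object> movie(String title) {
    Map<String, Object> m = new LinkedHashMap<>();
    m.put("title", title);
    return m;
  }

  // Mirrors the JSON used in this example.
  static Map<String, Object> sample() {
    Map<String, Object> root = new LinkedHashMap<>();
    root.put("title", "Scary Movies");
    root.put("movies", Arrays.asList(
      movie("Dawn of the Zombies"),
      movie("Night of the Undead"),
      movie("Awakening of the Zombies")));
    return root;
  }

  public static void main(String[] args) {
    Map<String, Object> root = sample();
    System.out.println(findEvery(root, "title").size());                // 4
    System.out.println(findEvery(root.get("movies"), "title").size());  // 3 (chained search)
    System.out.println(findEvery(root, "director").isEmpty());          // true
  }
}
```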

Example 7: Searching using findEach vs findEvery
{
  "fullname": "Luke Lowe",
  "father":  {
    "fullName": "Stan Rowe",
    "father": {  "fullName": "Mike Lowe"  },
    "mother": {  "fullName": "Janice Lowe" }
  },
  "mother": {
    "fullName": "Emily Rowe",
    "father": {  "fullName": "Greg Baker" },
    "mother": {  "fullName": "Linda Baker" }
  }
}
try{ 
  UserAgent userAgent = new UserAgent(); 
  userAgent.openJSON(new File("example7.json"));
 
  //find every node named "father" or "mother"
  JNode searchResults = userAgent.json.findEvery("father|mother");
  System.out.println("Results for every father or mother:\n" + searchResults);
  
  //find every non-nested node named "father" or "mother"
  searchResults = userAgent.json.findEach("father|mother"); 
  System.out.println("Results for each father or mother:\n" + searchResults);
}
catch(JauntException e){
  System.err.println(e);
}
The findEach(String) method searches descendant nodes for all nodes matching the specified query, but any such matching nodes are not themselves searched further. So for example searchResults (line 10) will contain all the non-nested nodes that are named "father" or "mother" (ie, it will not include "father" or "mother" nodes that exist within other "father" or "mother" nodes).

In contrast, the findEvery(String) method on line 6 returns searchResults containing every node that is named "father" or "mother" (six results). Note that as with findEvery, findEach returns a JNode of Type.ARRAY, and if no JNodes are found that match the search query, an empty JNode of Type.ARRAY is returned.
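
The only difference between the two traversals is whether the search continues inside a node that has already matched. The sketch below shows this with nested maps and java.util.regex; it does not use Jaunt, its class and method names are hypothetical, and it assumes whole-name regex matching.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

class EachVsEverySketch {
  // searchInsideMatches = true behaves like "every"; false behaves like "each" (non-nested).
  static List<Object> search(Map<String, Object> obj, Pattern name, boolean searchInsideMatches) {
    List<Object> out = new ArrayList<>();
    walk(obj, name, searchInsideMatches, out);
    return out;
  }

  @SuppressWarnings("unchecked")
  private static void walk(Map<String, Object> obj, Pattern name,
                           boolean searchInsideMatches, List<Object> out) {
    for (Map.Entry<String, Object> e : obj.entrySet()) {
      boolean hit = name.matcher(e.getKey()).matches();
      if (hit) out.add(e.getValue());
      // Descend unless this node matched and nested matches are excluded.
      if (e.getValue() instanceof Map && (!hit || searchInsideMatches)) {
        walk((Map<String, Object>) e.getValue(), name, searchInsideMatches, out);
      }
    }
  }

  static Map<String, Object> person(String fullName, Map<String, Object> father,
                                    Map<String, Object> mother) {
    Map<String, Object> p = new LinkedHashMap<>();
    p.put("fullName", fullName);
    if (father != null) p.put("father", father);
    if (mother != null) p.put("mother", mother);
    return p;
  }

  // Same shape as the family JSON in this example.
  static Map<String, Object> family() {
    return person("Luke Lowe",
      person("Stan Rowe", person("Mike Lowe", null, null), person("Janice Lowe", null, null)),
      person("Emily Rowe", person("Greg Baker", null, null), person("Linda Baker", null, null)));
  }

  public static void main(String[] args) {
    Pattern p = Pattern.compile("father|mother");
    System.out.println(search(family(), p, true).size());   // 6 (every match)
    System.out.println(search(family(), p, false).size());  // 2 (non-nested matches only)
  }
}
```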

Example 8: Searching using getEach, query syntax, search method summary
{
  "name": "Return of the Flesh-Eating Zombies",
  "producer":  {
    "lastName": "Richards",
    "title": "Mr", 
    "email": null
  },
  "assistant producer": {
    "lastName": "Kosser",
    "title": "Mr",
    "email": "kosser@paramount.com"
  },
  "director": {
    "lastName": "Ryan",
    "title": "Mrs" 
  } 
}
try{ 
  UserAgent userAgent = new UserAgent(); 
  userAgent.openJSON(new File("example8.json"));  

  JNode searchResults = userAgent.json.findEvery("producer: { email: }");  //quotes around String values optional
  System.out.println("Found producers having email: " + searchResults.size() + " result(s)");
 
  searchResults = userAgent.json.findEvery("'producer': { 'title': 'Mr' }");  //apostrophes can replace quotes
  System.out.println("Found producers having title 'Mr': " + searchResults.size() + " result(s)");

  searchResults = userAgent.json.findEvery("{ title:, email: }");
  System.out.println("Found objects having both title and email: " + searchResults.size() + " result(s)");
  
  searchResults = userAgent.json.findEach("producer { title: }");
  System.out.println("Found non-nested producers having a title: " + searchResults.size() + " result(s)");
  
  searchResults = userAgent.json.getEach("{ title: }");
  System.out.println("Found child objects having a title: " + searchResults.size() + " result(s)");
}
catch(JauntException e){
  System.err.println(e);
}
This example illustrates performing search queries using the full query syntax, which is JSON-like. The query on line 5 can be read as "Search for JSON objects named 'producer' having a child node named 'email'". The number of search results (1) is printed on the subsequent line.

The query on line 8 can be read as "Search for JSON objects named 'producer' having a child node named 'title', where the child node has a String value of 'Mr'". This search yields one result.

The query on line 11 can be read as "Search for JSON objects of any name having both a child node named 'title' and a child node named 'email'". This search yields two results.

The query on line 14 can be read as "Search for non-nested JSON objects named 'producer' having a child node named 'title'". This search yields two results.

The query on line 17 can be read as "Search children for JSON objects of any name having a child node named 'title'". This search yields three results.

Although the query syntax resembles JSON, it is limited in the following ways: it is restricted to having no more than one set of curly braces, and may not contain square brackets (ie arrays). If the query does contain such extra terms, they are ignored when the query is processed.

Search Method Summary: a table of search methods
The following table summarizes the most important search methods covered in previous examples.

        First                      Each                      Every                      Scope
get     getFirst(String query)     getEach(String query)     --                         searches children only
find    findFirst(String query)    findEach(String query)    findEvery(String query)    searches children/descendants to any depth

First: searches for the first JNode that matches the query; returns a JNode or throws a NotFound exception.
Each: searches for every matching, non-nested JNode; results are returned in a JNode of Type.ARRAY.
Every: searches for every matching JNode; results are returned in a JNode of Type.ARRAY.

A search query has the general form:
"nodeNameRegex": {"attributeName":"attributeValueRegex"}
where the curly braces section of the query can be omitted. Within the braces, multiple attributeName/attributeValueRegex pairs are comma-separated and quotes are optional or can be substituted with apostrophes. In order for the query to match a candidate JNode, all specified fields of the query must match.
nodeNameRegex:
if specified, the node name is matched as a regular expression. The colon is not required if the query consists of only the nodeNameRegex.
attributeName:
if specified, the attribute name is matched as a case-sensitive string, or as a regular expression if enclosed in parentheses. If no attributeName/attributeValueRegex pairs are specified in the query, the query matches a candidate object regardless of its attributes.
attributeValueRegex:
if specified, the attribute value is matched as a regular expression. If the attributeValueRegex is omitted, the attributeName is matched against candidate attribute names irrespective of their values.
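
The matching rules above can be summarized as a small predicate. This is a sketch in plain Java, not Jaunt's implementation: the class QueryMatchSketch and its helpers are hypothetical, the query is passed in already split into its parts rather than parsed from a string, attribute names are compared only as case-sensitive strings (the parenthesized-regex form is left out), and whole-value regex matching is assumed.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

class QueryMatchSketch {
  // nameRegex == null means "any node name".
  // attrs maps attribute name -> value regex; a null regex means "any value".
  static boolean matches(String nodeName, Map<String, ?> candidate,
                         String nameRegex, Map<String, String> attrs) {
    if (nameRegex != null && (nodeName == null || !Pattern.matches(nameRegex, nodeName))) {
      return false;
    }
    for (Map.Entry<String, String> want : attrs.entrySet()) {
      if (!candidate.containsKey(want.getKey())) return false;  // every listed attribute must exist
      String valueRegex = want.getValue();
      if (valueRegex != null) {
        Object actual = candidate.get(want.getKey());
        if (actual == null || !Pattern.matches(valueRegex, actual.toString())) return false;
      }
    }
    return true;  // all specified fields matched
  }

  // Builds an ordered map from alternating key/value arguments; values may be null.
  static Map<String, String> pairs(String... kv) {
    Map<String, String> m = new LinkedHashMap<>();
    for (int i = 0; i < kv.length; i += 2) m.put(kv[i], kv[i + 1]);
    return m;
  }

  public static void main(String[] args) {
    Map<String, String> producer = pairs("lastName", "Richards", "title", "Mr", "email", null);
    Map<String, String> director = pairs("lastName", "Ryan", "title", "Mrs");
    System.out.println(matches("producer", producer, "producer", pairs("email", null)));  // true
    System.out.println(matches("director", director, null, pairs("title", "Mrs?")));      // true
    System.out.println(matches("director", director, null, pairs("title", "Mr")));        // false
    System.out.println(matches("director", director, null, pairs("email", null)));        // false
  }
}
```

In this sketch an attribute whose value is null still counts as present, which is consistent with the single result reported for the query "producer: { email: }" in Example 8.
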
Example 9: Searching with regular expressions
{
  "name": "Return of the Flesh-eating Zombies",
  "producer":  {
    "lastName": "Richards",
    "title": "Mr",
    "email": null
  },
  "assistant producer": {
    "lastName": "Kosser",
    "title": "Mr",
    "email": "kosser@paramount.com"
  },
  "director": {
    "lastName": "Ryan",
    "title": "Mrs" 
  } 
}
try{ 
  UserAgent userAgent = new UserAgent(); 
  userAgent.openJSON(new File("example9.json"));  

  //Search for every director or assistant producer object having 'email' attribute
  JNode searchResults = userAgent.json.findEvery("(director|assistant\\s*producer): { email: }");  
  System.out.println("Number of results: " + searchResults.size());
 
  //Search for every producer object having 'email' or 'Email' attribute 
  searchResults = userAgent.json.findEvery("producer: { (email|Email): }");   //quotes optional
  System.out.println("Number of results: " + searchResults.size());
  
  //Search for every object having title 'Mr' or 'Mrs'
  searchResults = userAgent.json.findEvery("{ title: Mrs? }");
  System.out.println("Number of results: " + searchResults.size() + " results");
}
catch(JauntException e){
  System.err.println(e);
}
This example illustrates search queries that use regular expressions for matching string values. The query on line 6 can be read as "Search for objects named 'director' or 'assistant producer' (with zero or more whitespace characters between the words), having a child JNode with a name matching 'email'". Remember that inside a Java string literal a backslash must itself be escaped, so a regular expression escape sequence is written with two backslashes rather than one: \\s rather than \s to match a whitespace character. This search yields one result.
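
The escaping rule can be checked in isolation with java.util.regex, without Jaunt. In the sketch below the class name EscapeSketch is hypothetical and whole-name matching is assumed; the node-name pattern is the one from line 6 of the example.

```java
import java.util.regex.Pattern;

class EscapeSketch {
  // In Java source, "\\s" denotes two characters: a backslash followed by 's'.
  static final String NAME_REGEX = "(director|assistant\\s*producer)";

  // Assumption: the whole name must match the pattern.
  static boolean nameMatches(String name) {
    return Pattern.matches(NAME_REGEX, name);
  }

  public static void main(String[] args) {
    System.out.println(NAME_REGEX);                          // (director|assistant\s*producer)
    System.out.println("\\s".length());                      // 2
    System.out.println(nameMatches("assistant producer"));   // true
    System.out.println(nameMatches("assistantproducer"));    // true (zero whitespace characters)
    System.out.println(nameMatches("director"));             // true
    System.out.println(nameMatches("producer"));             // false
  }
}
```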

The query on line 10 can be read as "Search for objects named 'producer' having a child node named 'email' or 'Email'". The search yields one result.

The query on line 14 can be read as "Search for objects of any name whose title is 'Mr' or 'Mrs'". The search yields three results.

