Contents

Overview

The web 2.0 trend has shifted users' expectations: web applications have to be as interactive as typical desktop applications. Old refresh-style webmails have been wiped out by GMail, Windows Live Mail or Yahoo! Mail.

In this setting, JavaScript is becoming a crucial web development language and is getting attention of the research community. Recently, Siek and Taha used JavaScript as a mean to introduce the concept of Gradual Typing. Drossopoulou et al. are proposing techniques for doing type inferencing in JavaScript. JavaScript is a widely used language which is used for complex AJAX-driven applications, and starting to be used as a server-side scripting language (Zimki).

Our goal is to create a JavaScript compiler and use it as an analysis tool. Static JavaScript analysis is an interesting topic, given the quirks of the language and the lack of open source analysis tools.

As a proof of concept, the first analysis we will implement is the identification of browser incompatibilities, using as much static information as possible. This analysis would help cut back the annoyances of cross-browser development, especially as JavaScript code bases continue to increase in size.

The compiler's design is meant to be simple, clean and modular allowing other people to extend it easily.

Browser Incompatibilities

The JavaScript language has been implemented by various organizations in different browser versions. Early in the language's existence, no standard was set forth and implementors pionneered APIs. Typically, the same semantic interaction (such as 'get the width of the displayed page') are accessible in different ways. Programmers use control-flow structures to dynamically select which value to access depending on the environment in which the code is ran. These techniques use a common feature of JavaScript that returns a special 'undefined' value when accessing non-existing properties. In comparison, using unsafe Java reflection one can call a non-existing property (or method) but the runtime throws a NoSuchFieldException (respectively NoSuchMethodException).

For instance, the inner width viewport property is supported differently on Internet Explorer (IE) version 5, IE6, Mozilla (Firefox), and Safari. Here is typical browser independent function to get the inner width:

function iw() {
  if (self.innerWidth) {
    return self.innerWidth;
  } else if (document.documentElement &&
           document.documentElement.clientWidth) {
    return document.documentElement.clientWidth;
  } else {
    return document.body.clientWidth;
  }
}

Here, the first if statement detects all Internet Explorer versions (except IE6) as these are the only one to define the self.innerWidth property. The second if detects IE6 in strict mode and finally the last if handles all other browsers (Mozilla, Safari, Opera, ...).

Other browser detection techniques are used such as arithmetic combination of partial results, use of exceptions to alter the control-flow or context sensitive dynamic script loading. Our tool is not designed to detect or handle these.

Browser Compatibility Verification

Compiler Structure

Rhino
Enlarge
Rhino

Our compiler is structured as a standard optimizing compiler. Parsing is delegated to Rhino which produces a very raw abstract-syntax tree, difficult to analyse and to reason about.

This raw AST is thus transformed into a clean AST resembling the concrete syntax of a typical functional programming language. For efficiency, this clean AST is never created in memory but we instead use pattern matching to give an abstract view of this AST.

With the help of this representation, a control-flow graph per function is created. The control-flow graph has one instruction per basic block to ease the implementation of optimizations.

A dataflow analyis framework has been written and constant folding and dead-code elimination have been implemented. These two optimizations greatly reduce the size of the CFG due to the simple generation phase we use and expose additional expressions for analysis.

In addition to this interprocedural analysis framework, we have implemented a call graph factory and a tailored intraprocedural analysis.

Raw AST
Enlarge
Raw AST
Clean AST
Enlarge
Clean AST
CFG
Enlarge
CFG
CFG with analyses
Enlarge
CFG with analyses

Analyses

In this section, we detail all the analyses used for the browser compatibility verification.

Selections

A "selection" is a field or method request in JavaScript, for example:

  • self.screenX
  • document.documentElement["clientWidth"]

Selections are the basic unit in our browser compatibility matrices.

Selection Folding (intraprocedural, forward data-flow analysis)

This analysis folds selections to make them apparent at any point in the program. It is comparable to constant folding.

  • domain = set of selection expressions. (selection ::= expression | expression '.' selection)
  • meet = s1\vee s2 = s1\quad\textrm{if}\quad s1 = s2\quad\textrm{otherwise}\quad\bot
  • transfer function = propagate selection operation, discard the rest

Program-Point Context (intraprocedural, forward data-flow analysis)

We start by computing the set of browsers supported by every program point.

  • domain = set of browsers (IE, FF, Safari)
  • meet = \cup
  • transfer function of node n =
    • use of a selection s
      • get output(n)=input(n)\cap compatibility(s)
      • false path of a branch output(n)=\mathcal{U}\setminus compatibility(s)
    • otherwise output(n) = input(n)

CFG (or Function) Context (intraprocedural, backward data-flow analysis)

Using program-point context information (context(n)) we backward-propagate the information and make a summary for the control-flow graph. The result of this analysis is thus only interesting at the input of entry node of the CFG.

  • domain = set of browsers (IE, FF, Safari)
  • meet = \cup
  • transfer function of node n = output(n)=input(n)\cap context(n)

Global Context (interprocedural)

The final analysis considers the whole program and finalises the results gathered by the previous CFG context analysis. If a function foo compatible on \left\{A,B,C\right\} calls another function bar compatible on \left\{C\right\} only then foo cannot be compatible with \left\{A,B\right\}.

Example: the program consists of four functions m1 through m4. After the intraprocedural CFG context analysis, the function m1 is reported compatible with {A,B,C}, m2 compatible with {A,B,C}, m3 compatible with {B} and finally m4 compatible with {A,B}.

image:proj_jsc_globalanalysis.png

Since m4 calls m3, it is as compatible as m3 thus only compatible with {B}. This result propagates to m1 since it is the caller of m4.

The implementation of this analysis is rather elegant. We model the call graph as a CFG and reuse the previous analysis on this model. By backpopulating the results, we have an interprocedural analysis. Of course, building the call graph is the tricky part and we discuss this point later.

Examples

To test our browser incompatibility analysis, we used Google codesearch to find real-world JavaScript code. We discovered two interesting examples by searching for browser viewport fields that were included in our tool's compatibility matrices.

Our tool is run from the main project directory as follows:

browser.sh MyCode.js

The output is a report detailing the browser compatibility of each control-flow graph in MyCode.js for the 3 browsers our tools supports: Firefox (Mozilla), Internet Explorer 6, and Safari. Note that the tool does not distinguish between various browser modes (for example, IE6 can run in 'strict' mode or 'quirks' mode, as explained on quirksmode). In addition, whenever a browser incompatibility is detected during the forward analysis, the relevant line number and problematic browser field is reported.

open-xchange nmail_spell_main.htm

We ran our tool on nmail_spell_main.htm, from the open-xchange project.

The output was the following:

openPersonalDict: line 290 not compatible with [IE6] ('screenX')
openPersonalDict: compatible with [Safari, Mozilla]

The problematic code is the function openPersonalDict:

function openPersonalDict() {
  newWordSet = false;
  var w = 320, h = 300;
  var x = self.screenX + (self.outerWidth - w) / 2;
  var y = self.screenY + (self.outerHeight - h) / 2;
  parent.pdictWin = window.open(...);
  parent.pdictWin.focus();
}
[...]
</script>
[...]
<td>
  <input type="button" id="editButton" value="[spell_edit]"
         class="button-style-1" style="width:120px;"
         onClick="openPersonalDict();">
</td>

The use of self.screenX is incompatible with Internet Explorer 6, thus the control-flow graph for openPersonalDict is only compatible with Firefox and Safari, as reported by the tool. In fact, one can see from the HTML later in the page that this function is called without any particular browser checking, so this code would not work properly on IE6.

ipplan layersmenu-library.js

We then ran our tool on layersmenu-libary.js from the IPplan project.

The output was the following:

getWindowWidth: line 189 not compatible with [IE6, Mozilla] ('clientWidth')
getWindowWidth: compatible with [Safari, IE6, Mozilla]

getWindowXOffset: line 207 not compatible with [Safari] ('scrollLeft')
getWindowXOffset: line 210 not compatible with [IE6] ('scrollLeft')
getWindowXOffset: compatible with [Safari, IE6, Mozilla]

getWindowYOffset: line 240 not compatible with [Safari] ('scrollTop')
getWindowYOffset: line 243 not compatible with [IE6] ('scrollTop')
getWindowYOffset: compatible with [Safari, IE6, Mozilla]

getWindowHeight: line 222 not compatible with [IE6, Mozilla] ('clientHeight')
getWindowHeight: compatible with [Safari, IE6, Mozilla]

Some of the problematic code is the function getWindowHeight:

function getWindowHeight() {
  var value = 0;
  if ((DOM && !IE) || NS4 || Konqueror || Opera) {
    value = window.innerHeight;
  } else { // IE
    if (document.documentElement && document.documentElement.clientHeight) {
      value = document.documentElement.clientHeight;
    } else if (document.body) {
      value = document.body.clientHeight;
    }
  }
  if (isNaN(value)) {
    value = window.innerHeight;
  }
  return (value);
}

The use of document.documentElement.clientHeight is the problem here, as it is not compatible with IE6 or Firefox. The whole CFG for getWindowHeight is reported as fully-compatible, due to the union operation performed on if statements during our backwards analysis. (This isn't always correct, as discussed in bugs). The line number report is in fact a false positive, as the developer gated the code with various browser checks performed elsewhere (stored in the variables IE, Konquerer, ...), and made careful use of the 'undefined' value returned when a field isn't compatible. These false positives represent flaws in our analysis tool.

AST
Enlarge
AST
Zoom on the AST
Enlarge
Zoom on the AST
CFG
Enlarge
CFG
Zoom on the CFG
Enlarge
Zoom on the CFG

Results

We successfully built a JavaScript compiler and static analysis tool to perform the following tasks:

  • Use Rhino's parser to generate an abstract-syntax tree.
  • Process the AST into a cleaner form for verification and analysis.
  • Generate control-flow graphs from the AST.
  • Perform constant folding and dead-code elimination to simplify each CFG.
  • Encode the browser viewport incompatibilities from quirksmode.
  • Use forward and backward data analysis to determine browser incompatibilities.
  • Report the incompatibilities to the user.

We applied the tool to several examples. The tool worked as we expected, but due to a shortcoming in our analysis reported false positives.

Given the time constraints of the project, we were very satisfied with the result. Even though the browser incompatibility detection wasn't perfect, we did produce a clean, general purpose static analysis framework for JavaScript. This framework could easily be extended to other forms of analysis, and was a great learning experience.

Bugs

There were some bugs discovered as the project deadline approached, and we didn't have time to fix them:

On certain code examples, the tool gets stuck in an infinite loop while generating the control-flow graphs.

The union operation performed during our backwards analysis turned out to be insufficient. When the expression of the if statement yields an incompatibility, we propagate the compatible browsers to the true branch, and the incompatible browsers to the false branch. This handles code of the form:

if (self.screenX)
  print "This code should execute on Firefox and Safari";
else
  print "This code should execute on IE6.";

This was the case we considered while developing the analysis, but it became clear that the union operation wasn't always correct. Consider the following:

if (/*some complex expression*/)
  x = self.screenX;  // incompatible with IE6
else
  x = 5;

Assume all browsers are compatible going into the if statement. The true branch will eliminate IE6 compatibility. However, the union in the backwards analysis will merge the two branches, and report that the outer context is fully-compatible. This may be appropriate if the developer had carefully gated the browsers in the if expression, but otherwise the incompatibility should be reported. Proper evaluation of the expression would be required to determine the correct merge operation in all cases, as discussed in future work. Without such evaluation, however, we should be conservative with our merging operation and perform an intersection so the incompatibility is reported.

Future Work

  • interpreter We have written a simple interpreter to unit-test behaviorally our compiler's optimizations. Currently, this interpreter is rather limited. We wish to add support for native functions such as print allowing an interaction between Java and JavaScript.
  • call graph Computing the exact call graph for a functional language is an undecidable problem and we must therefore do approximations. Currently, we build a call graph based on the syntax. This is an under-approximation which means that our tool reports fewer errors than it could. In practice this is not an issue as functional programming is not widely used by JavaScript programmers.
  • global context analysis Currently, the call-site context is discarded. If the programmer guarded a call to a partially-compatible function our tool would report a false positive. Again, this problem is interesting but we do not believe it is crucial in practice.

References

  • [1] Jeremy G. Siek and Walid Taha, Gradual Typing for Objects. ECOOP'07: 21st European Conference on Object-Oriented Programming.
  • [2] Christopher Anderson, Paola Giannini, and Sophia Drossopoulou, Towards Type Inference for JavaScript. ECOOP'05: 19th European Conference on Object-Oriented Programming.
  • [3] Peter-Paul Koch, Viewport properties. quirksmode.org.
  • [4] Rhino: JavaScript for Java, mozilla.org.
  • [5] Graphviz - Graph Visualization Software, AT&T.

Source

The Java project can be downloaded from http://www.stanford.edu/~joshwise/jsc.zip.

Project Log

Milestone 1 - Parsing

  • Parse JavaScript using Rhino

Milestone 2 - CFG

  • Generate Control-Flow Graph

Milestone 3 - Interpreter

  • Build simple interpreter for CFG to help verify existing code.
    • Basic instruction interpeter

Milestone 4 - Interprocedural Analysis

  • Build a data-flow analysis framework for CFG

Milestone 5 - Optimizations

  • Constant folding
  • Dead code elimination

Milestone 6 - Browser Compatibility Analysis

  • Build intraprocedural browser analysis
    • Build per-browser compatibility structures, filled with all compatibility matrices from quirksmode
    • Build forward data-flow analysis to determine browser support at each program-point in CFG
    • Build backward data-flow analysis to determine browser support per CFG
    • Compute the call graph and determine the transitive browser compatibilty

Milestone 7 - Run in the wild

  • Run the tool on various sample code and determine its usefulness
Last modified June 12, 2007 5:18 pm / Skin by Kevin Hughes
MediaWiki