Automate Migration Assessment With XML Linter
Discover how ZK Team developed an XML linter to detect potential compatibility issues in existing codebases to assist migration decision-making.
When people think of linting, the first thing that comes to mind is usually static code analysis for programming languages, but rarely for markup languages.
In this article, I would like to share how our team developed ZK Client MVVM Linter, an XML linter that automates migration assessment for our new Client MVVM feature in the upcoming ZK 10 release. The basic idea is to compile a catalog of known compatibility issues as lint rules to allow users to assess the potential issues flagged by the linter before committing to the migration.
For those unfamiliar with ZK, ZK is a Java framework for building enterprise applications; ZUL (ZK User Interface Markup Language) is its XML-based language for simplifying user interface creation. Through sharing our experience developing ZK Client MVVM Linter, we hope XML linters can find broader applications.
File Parsing
The Problem
Like other popular linters, our ZUL linter starts by parsing source code into an AST (abstract syntax tree). Although Java provides several libraries for XML parsing, they discard the original line and column numbers of elements during parsing. Since the subsequent analysis stage needs this positional information to report compatibility issues precisely, our first task is to find a way to obtain and store the original line and column numbers in the AST.
How We Address This
After exploring different online sources, we found a Stack Overflow solution that leverages the event-driven nature of the SAX parser to store the end position of each start tag in the AST. Its key observation is that the parser invokes the startElement method whenever it encounters the closing '>' character of a start tag. The position returned by the locator at that moment is therefore the end position of the start tag, making startElement the perfect place to create new AST nodes and store their end positions.
public static Document parse(File file) throws Exception {
    Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
    SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
    parser.parse(file, new DefaultHandler() {
        private Locator _locator;
        private final Stack<Node> _stack = new Stack<>();

        @Override
        public void setDocumentLocator(Locator locator) {
            _locator = locator;
            _stack.push(document);
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            // Create a new AST node
            Element element = document.createElement(qName);
            for (int i = 0; i < attributes.getLength(); i++)
                element.setAttribute(attributes.getQName(i), attributes.getValue(i));
            // Store its end position
            int lineNumber = _locator.getLineNumber(), columnNumber = _locator.getColumnNumber();
            element.setUserData("position", lineNumber + ":" + columnNumber, null);
            _stack.push(element);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            Node element = _stack.pop();
            _stack.peek().appendChild(element);
        }
    });
    return document;
}
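To see the locator behavior in isolation, here is a minimal standalone sketch (the class and method names are ours, chosen for illustration) that records the line on which each start tag's closing '>' is encountered:

```java
import org.xml.sax.Attributes;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;
import javax.xml.parsers.SAXParserFactory;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class LocatorDemo {
    /**
     * Parses the XML and returns "qName@line" for each start tag, using the
     * position the SAX Locator reports inside startElement.
     */
    public static List<String> startTagLines(String xml) throws Exception {
        List<String> result = new ArrayList<>();
        SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                new DefaultHandler() {
                    private Locator _locator;

                    @Override
                    public void setDocumentLocator(Locator locator) {
                        _locator = locator;
                    }

                    @Override
                    public void startElement(String uri, String localName,
                                             String qName, Attributes attributes) {
                        // The locator now points at the end of this start tag.
                        result.add(qName + "@" + _locator.getLineNumber());
                    }
                });
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(startTagLines("<root>\n  <child attr=\"v\"/>\n</root>"));
    }
}
```

Running this on a two-element document confirms that each startElement call sees the position of the tag it has just finished reading.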
Building on the solution above, we implemented a more sophisticated parser capable of storing the position of each attribute. Our parser uses the end positions returned by the locator as reference points, reducing the task to finding each attribute's position relative to that end position. Initially, we started with the simple idea of iteratively finding and removing the last occurrence of each attribute-value pair from a buffer. For example, if <elem attr1="value" attr2="value"> ends at 3:34 (line 3, column 34), our parser performs the following steps:
1. Initialize buffer = <elem attr1="value" attr2="value">
2. Find buffer.lastIndexOf("value") = 28 → Update buffer = <elem attr1="value" attr2="
3. Find buffer.lastIndexOf("attr2") = 21 → Update buffer = <elem attr1="value"
4. Find buffer.lastIndexOf("value") = 14 → Update buffer = <elem attr1="
5. Find buffer.lastIndexOf("attr1") = 7 → Update buffer = <elem
From steps 5 and 3, we can conclude that attr1 and attr2 start at 3:7 and 3:21, respectively.
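The trimming loop for the single-line case can be sketched as follows. This is a simplified, standalone version of the idea: the class and method names are ours, and it assumes each value appears after its attribute name on a single line.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AttrColumns {
    /**
     * Recovers the 1-based start column of each attribute in a single-line
     * start tag by repeatedly trimming the buffer from the right.
     */
    public static Map<String, Integer> columnsOf(String startTag, List<String> attrNames,
                                                 List<String> attrValues) {
        StringBuilder buffer = new StringBuilder(startTag);
        Map<String, Integer> columns = new LinkedHashMap<>();
        // Walk attributes from last to first, trimming the buffer as we go.
        for (int i = attrNames.size() - 1; i >= 0; i--) {
            // Remove the last occurrence of the value, then of the name.
            buffer.delete(buffer.lastIndexOf(attrValues.get(i)), buffer.length());
            int index = buffer.lastIndexOf(attrNames.get(i));
            buffer.delete(index, buffer.length());
            columns.put(attrNames.get(i), index + 1); // 0-based index -> 1-based column
        }
        return columns;
    }
}
```

On the example above, this recovers column 7 for attr1 and column 21 for attr2, matching the steps listed in the text.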
Then, we further improved the mechanism to handle other formatting variations, such as a single start tag spanning multiple lines or multiple start tags on a single line, by introducing two stacks, startIndexes and leadingSpaces, which store the buffer indices where new lines start and the number of leading spaces on each line. For example, consider a start tag that starts on line 1 and ends at 3:20 (line 3, column 20):
<elem attr1="value
across 2 lines"
attr2 = "value">
Our parser will perform the following steps:
1. Initialize buffer = <elem attr1="value across 2 lines" attr2 = "value">
2. Initialize startIndexes = [0, 19, 35] and leadingSpaces = [0, 4, 4]
3. Find buffer.lastIndexOf("value") = 45
4. Find buffer.lastIndexOf("attr2") = 36
   → lineNumber = 3, startIndexes = [0, 19, 35], and leadingSpaces = [0, 4, 4]
   → columnNumber = 36 - startIndexes.peek() + leadingSpaces.peek() = 5
5. Find buffer.lastIndexOf("value across 2 lines") = 14
6. Find buffer.lastIndexOf("attr1") = 7
   → Update lineNumber = 1, startIndexes = [0], and leadingSpaces = [0]
   → columnNumber = 7 - startIndexes.peek() + leadingSpaces.peek() = 7
From steps 6 and 4, we can conclude that attr1 and attr2 start at 1:7 and 3:5, respectively.
This mechanism is implemented in the code below:
public void startElement(String uri, String localName, String qName, Attributes attributes) {
    // initialize buffer, startIndexes, and leadingSpaces
    int endLineNumber = _locator.getLineNumber(), endColNumber = _locator.getColumnNumber();
    for (int i = 0; _readerLineNumber <= endLineNumber; i++, _readerLineNumber++) {
        startIndexes.push(buffer.length());
        if (i > 0) _readerCurrentLine = _reader.readLine();
        buffer.append(' ').append((_readerLineNumber < endLineNumber ? _readerCurrentLine :
                _readerCurrentLine.substring(0, endColNumber - 1)).stripLeading());
        leadingSpaces.push(countLeadingSpaces(_readerCurrentLine));
    }
    _readerLineNumber--;
    // recover attribute positions
    int lineNumber = endLineNumber, columnNumber;
    Element element = document.createElement(qName);
    for (int i = attributes.getLength() - 1; i >= 0; i--) {
        String[] words = attributes.getValue(i).split("\\s+");
        for (int j = words.length - 1; j >= 0; j--)
            buffer.delete(buffer.lastIndexOf(words[j]), buffer.length());
        buffer.delete(buffer.lastIndexOf(attributes.getQName(i)), buffer.length());
        while (buffer.length() < startIndexes.peek()) {
            lineNumber--; leadingSpaces.pop(); startIndexes.pop();
        }
        columnNumber = leadingSpaces.peek() + buffer.length() - startIndexes.peek();
        Attr attr = document.createAttribute(attributes.getQName(i));
        attr.setUserData("position", lineNumber + ":" + columnNumber, null);
        element.setAttributeNode(attr);
    }
    // recover element position
    buffer.delete(buffer.lastIndexOf(element.getTagName()), buffer.length());
    while (buffer.length() < startIndexes.peek()) {
        lineNumber--; leadingSpaces.pop(); startIndexes.pop();
    }
    columnNumber = leadingSpaces.peek() + buffer.length() - startIndexes.peek();
    element.setUserData("position", lineNumber + ":" + columnNumber, null);
    _stack.push(element);
}
File Analysis
Now that we have a parser that converts ZUL files into ASTs, we are ready to move on to the file analysis stage. Our ZulFileVisitor class encapsulates the AST traversal logic and delegates the responsibility of implementing specific checking mechanisms to its subclasses. This design allows lint rules to be easily created by extending the ZulFileVisitor class and overriding the visit method for the node type the lint rule needs to inspect.
public class ZulFileVisitor {
    private Stack<Element> _currentPath = new Stack<>();

    protected Stack<Element> getCurrentPath() {
        return _currentPath;
    }

    protected void report(Node node, String message) {
        System.err.println(node.getUserData("position") + " " + message);
    }

    protected void visit(Node node) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            Element element = (Element) node;
            _currentPath.push(element);
            visitElement(element);
            NamedNodeMap attributes = element.getAttributes();
            for (int i = 0; i < attributes.getLength(); i++)
                visitAttribute((Attr) attributes.item(i));
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++)
            visit(children.item(i));
        if (node.getNodeType() == Node.ELEMENT_NODE) _currentPath.pop();
    }

    protected void visitAttribute(Attr node) {}

    protected void visitElement(Element node) {}
}
Conclusion
The Benefits
For simple lint rules such as "row elements not supported," developing an XML linter may seem like overkill when manual checks would suffice. However, as the codebase expands or the number of lint rules grows over time, the advantages of linting quickly become apparent compared to manual checks, which are both time-consuming and prone to human error.
class SimpleRule extends ZulFileVisitor {
    @Override
    protected void visitElement(Element node) {
        if ("row".equals(node.getTagName()))
            report(node, "`row` not supported");
    }
}
On the other hand, complicated rules involving ancestor elements are where XML linters truly shine. Consider a rule that applies only to elements nested inside certain ancestors, such as "row elements not supported outside rows elements." Our linter can efficiently identify the countless variations that violate such a rule, which cannot be done manually or with a simple file search.
class ComplexRule extends ZulFileVisitor {
    @Override
    protected void visitElement(Element node) {
        if ("row".equals(node.getTagName())) {
            boolean outsideRows = getCurrentPath().stream()
                    .noneMatch(element -> "rows".equals(element.getTagName()));
            if (outsideRows) report(node, "`row` not supported outside `rows`");
        }
    }
}
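To illustrate the ancestor check outside the ZK codebase, the same logic can be condensed into a self-contained sketch over a plain DOM document (the class and method names here are ours, not part of ZK Client MVVM Linter):

```java
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import javax.xml.parsers.DocumentBuilderFactory;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Stack;

public class AncestorRuleDemo {
    /**
     * Standalone condensation of the ComplexRule logic: returns the tag names
     * of `row` elements that have no `rows` ancestor.
     */
    public static List<String> rowsOutsideRows(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        List<String> violations = new ArrayList<>();
        visit(doc.getDocumentElement(), new Stack<>(), violations);
        return violations;
    }

    private static void visit(Node node, Stack<Element> path, List<String> violations) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            Element element = (Element) node;
            // Check the current ancestor path before descending.
            boolean insideRows = path.stream().anyMatch(e -> "rows".equals(e.getTagName()));
            if ("row".equals(element.getTagName()) && !insideRows)
                violations.add(element.getTagName());
            path.push(element);
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++)
            visit(children.item(i), path, violations);
        if (node.getNodeType() == Node.ELEMENT_NODE) path.pop();
    }
}
```

A row nested anywhere under a rows element passes the check, no matter how many layers sit in between, while a row outside any rows is flagged.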
Now It's Your Turn
Although XML linting is not widely adopted in the software industry, we hope ZK Client MVVM Linter, which automates our migration assessment, demonstrates the benefits of XML linting and perhaps even inspires you to develop your own XML linter.