AST - From Refactor to Lint in (J/T)S(x?) codebases

Jul 12, 2022 Edited: Mar 25, 2023

Disclaimer

This “post” is intended as a second-brain-style report written while exploring this specific topic. At first I simply tried to resolve all of the reported problems without doing any repetitive monkey work, but after I attended the amazing talk about AST by Michele Riva (links reported in the specific paragraph) @ JSDay 2022 in Verona, I wanted to make sure that what I saw matched what I had already done, hoping to share all of this with anybody else.

This content is the stem (revisited) upon which an internal company talk branches. (Prepared, never went live ATM)

Intro

From time to time, we find ourselves struggling with large codebases, innocent victims of constant code stratification, missing developer guidelines and personal coding styles/flavors overlapping to create a perfect nightmare for both staged and staging developers.

Some of these behaviors can be corrected over time and then kept in check by both automatic rules (linting) and manual ones (code reviews), but what about the code that is already there? Is there something we can do about it?

Refactor - Semantic vs Syntactic

Well, all of us think that “refactor” stands for “holy magic” that requires at least... a god, a saint or a bunch of good developers. Something that needs the human touch, unthinkable to leave to mere machine intelligence.

Even with Machine Learning assisted AI, the results are (ATM, at least) quite impressive but still largely insufficient to be generically usable.

The main problem is that when a generic snippet of code inside a comparably larger codebase gets refactored, the resulting code needs to be concurrently correct (both semantically and syntactically) - And it should also work as intended!

So, what other chances do we have? Luckily, an increasingly large number of use cases can be handled today.

The Lord of the Code - The Runtime Journey

To “make sense” to a computer, every single programming language (of mid-high-ish level) needs to be scanned, parsed and then interpreted/compiled back to runnable machine code.

For the most part, these steps are pretty much common and equivalent across all the programming languages, with some ending / starting differences, but still the same overall.

Read more: https://en.wikipedia.org/wiki/Lexical_analysis

Understanding the source code - From Source to AST

Some of these steps include parsing the input source code, splitting this input into evaluable and understandable bits called “tokens”.

These tokens are then arranged following their built-in meaning and their usage inside the source code, generating the first syntactical structure (please read as: “Oh, this line of code means something!”): “The Concrete Syntax Tree”
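To get a feeling for what “splitting into tokens” means, here is a minimal sketch of a toy lexer in plain JavaScript. It is an assumption-laden illustration, not how a real JS engine or parser tokenizes: the `TOKEN_PATTERNS` table and the `tokenize` function are made up for this example, they only know a handful of token kinds, and real tokenizers classify some words (like `from`) differently.

```javascript
// Toy lexer: only the general idea, NOT a real JavaScript tokenizer.
// Each entry is [token type, pattern anchored at the start of the remaining input].
const TOKEN_PATTERNS = [
	['Whitespace', /^\s+/],
	['Keyword', /^(?:import|from|const|let|var)\b/], // simplification: `from` is not a reserved word
	['Identifier', /^[A-Za-z_$][\w$]*/],
	['String', /^'(?:[^'\\]|\\.)*'|^"(?:[^"\\]|\\.)*"/],
	['Punctuator', /^[{}();,=]/],
];

function tokenize(source) {
	const tokens = [];
	let rest = source;
	while (rest.length > 0) {
		// Try every pattern in order and keep the first one that matches
		const match = TOKEN_PATTERNS
			.map(([type, pattern]) => ({ type, text: (pattern.exec(rest) || [])[0] }))
			.find((candidate) => candidate.text);
		if (!match) throw new SyntaxError(`Unexpected character: ${rest[0]}`);
		// Whitespace separates tokens but is not a token itself
		if (match.type !== 'Whitespace') tokens.push({ type: match.type, value: match.text });
		rest = rest.slice(match.text.length);
	}
	return tokens;
}

const tokens = tokenize("import { getElementCoordinates } from '../../../../../utils';");
console.log(tokens.map((t) => `${t.type}(${t.value})`).join(' '));
```

The flat list that comes out has no structure yet: arranging those tokens into a tree is the parser's job.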

The very next thing is to jump from “This line means something!” to “This line means THIS, and it’s related to THAT and WANTS to do this other thing!”.

Every component of the Concrete Syntax Tree gets a special identity, which holds the tokens composing it together with a name and a type.

The AST is born.

What about AST? (semi-quote Tatoo)

AST stands for “Abstract Syntax Tree” (A Successful Trerrot)

It’s a tree (really?) data structure that holds every line of code inside and keeps every line, identifier, token of the source code typed and organized correctly.

This structure is no different from other tree structured data, and can be visualized/serialized freely in more comfortable formats (XML, JSON etc.)... or human readable ones like images and diagrams.
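As a small sketch of that idea, the snippet below hand-writes the tree for a single import statement, following the ESTree node shapes that ESLint-style parsers produce (position info such as `loc` and `range` is left out to keep it short). The `outline` helper is a made-up name for this example; it just prints the node types as an indented list.

```javascript
// Hand-written, trimmed ESTree-style AST for:
//   import { getElementCoordinates } from '../../../../../utils';
const ast = {
	type: 'Program',
	sourceType: 'module',
	body: [
		{
			type: 'ImportDeclaration',
			specifiers: [
				{
					type: 'ImportSpecifier',
					imported: { type: 'Identifier', name: 'getElementCoordinates' },
					local: { type: 'Identifier', name: 'getElementCoordinates' },
				},
			],
			source: { type: 'Literal', value: '../../../../../utils', raw: "'../../../../../utils'" },
		},
	],
};

// Like any other tree, it serializes to JSON without any special treatment
const json = JSON.stringify(ast, null, 2);

// ...and it can be walked recursively: anything with a string `type` is a node
function outline(node, depth = 0, lines = []) {
	lines.push(`${'  '.repeat(depth)}${node.type}`);
	for (const value of Object.values(node)) {
		const children = Array.isArray(value) ? value : [value];
		for (const child of children) {
			if (child && typeof child.type === 'string') outline(child, depth + 1, lines);
		}
	}
	return lines;
}

console.log(outline(ast).join('\n'));
```

Pasting the same import into AST Explorer should show the same node types, with a lot more detail attached to each node.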

Read more: AST Explorer: Amazing tool for AST study and even codemod practice (and visual debugging)

Read more: AST Viewer with Typescript presets

Read more: AST Definition from ESLint Docs

Read more: Javascript VISUAL AST Visualizer!

Knowing, through the AST, how the single pieces of a line of code relate to each other, to the rest of the line and to the whole source file in which the line is written means having programmatic access to the program itself.

Some of the most curious tools, such as the ones migrating from one programming language to another, are based on AST.

Okay… so what can we use AST for, and why do I have to deal with it?

Using the AST to refer to our source code is just inspecting the medium chosen to solve the original problem, in an explorable/editable manner.

This generic and complete representation of our source code enables us to solve various problems, even ones not directly connected to the final produced code, while gaining extreme granularity without ditching the code context.

You may have used AST-based features for a long time without noticing, and actually loving these functionalities. You don’t believe me, do you?

Well… Many IDEs (and sometimes, even text editors) do have something very useful when it comes to changing a variable name, for example (in VS Code it’s “Rename Symbol” (wow, you don’t say, uh?)). That’s a good example of AST usage inside your code editor. It would be a nightmare to implement this using string manipulation techniques, and keeping it consistent for the entire project altogether.
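A tiny sketch of why string manipulation hurts here: renaming `total` to `sum` with a regex also rewrites the text inside a string literal, while a walk over the tree only touches `Identifier` nodes. The tree below is hand-written and simplified, and `renameIdentifier` is a hypothetical helper, not what any editor actually runs: a real “Rename Symbol” is also scope-aware and would not blindly rename every identifier with the same name.

```javascript
const source = "const total = 1; console.log('total', total);";

// Naive approach: the label inside the string literal gets renamed too
const naive = source.replace(/total/g, 'sum');

// Hand-written, simplified ESTree-style AST of the same source
const tree = {
	type: 'Program',
	body: [
		{
			type: 'VariableDeclaration',
			kind: 'const',
			declarations: [
				{
					type: 'VariableDeclarator',
					id: { type: 'Identifier', name: 'total' },
					init: { type: 'Literal', value: 1 },
				},
			],
		},
		{
			type: 'ExpressionStatement',
			expression: {
				type: 'CallExpression',
				callee: {
					type: 'MemberExpression',
					computed: false,
					object: { type: 'Identifier', name: 'console' },
					property: { type: 'Identifier', name: 'log' },
				},
				arguments: [
					{ type: 'Literal', value: 'total' },
					{ type: 'Identifier', name: 'total' },
				],
			},
		},
	],
};

// Rename Identifier nodes only; Literal nodes are never touched.
// Returns how many identifiers were renamed.
function renameIdentifier(node, from, to) {
	if (!node || typeof node.type !== 'string') return 0;
	let renamed = 0;
	if (node.type === 'Identifier' && node.name === from) {
		node.name = to;
		renamed += 1;
	}
	for (const value of Object.values(node)) {
		for (const child of Array.isArray(value) ? value : [value]) {
			renamed += renameIdentifier(child, from, to);
		}
	}
	return renamed;
}

const renamedCount = renameIdentifier(tree, 'total', 'sum');
console.log(naive, renamedCount);
```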

How cool is that? And it gets even cooler than this! Let’s see…

Refactor - Real World Problem

In the inspiring talk about AST and code refactoring by Michele Riva, there are common refactoring use cases that are very complex to carry out when a large team and a large codebase are impacted.

There is also an amazing focus on which approach is better when it comes to applying the refactor change. We’ll talk a bit about this later on.

Read more: Refactoring Large Javascript Codebases - Michele Riva

But is there an example closer to our experience?

One of my company’s projects is a React application structured as an (atypical) monorepo. Basically, it’s a huge PoC that made it to production.

Throughout the entire formation process, the various macro-parts of our app started to grow inside our codebase. Then, these parts became complete enough to be ready to be moved inside different packages or modules inside the monorepo. (So proud to watch our disastrous child)

Over time (and lines of code) our imports started to look a bit like this:

import { getElementCoordinates } from '../../../../../utils';

What’s wrong with this code? Well, cosmetic reasons apart, it becomes challenging to understand not only what the file ../../../../../utils actually contains, but even how getElementCoordinates fits inside the current source file.

Let me explain this better: imagine having adopted in your project a common Controller pattern or maybe a Plugin pattern to better manage both your code and your program flow. In these (and other) patterns it is pretty common to have exported symbols that are bound by an interface to be NAMED ALL IN THE SAME WAY.

Using Typescript in nearly any project, a better path structure can be obtained using the paths option inside the tsconfig.json file of the project (root/main one). Easy, just set the wanted paths for a given folder structure and you’re set! But who should update ALL the imports inside a project made during years of work?

This is also a machine learning / AI empowerable problem, but it can be solved just as well by an imperative script made with a codemod tool, therefore using AST!

A “simple” script should do that, catching particular cases when run.

And that’s done! An entire job for multiple developers that should’ve taken weeks of work, done in minutes by a single dev/codeowner.

Real World Problem #2 - Making our source code grow strong and healthy

Enforcing standards and rules in code is quite a complex matter. Especially when you are one and the others are… well… more than one and pretty aggressive about defending their habits too!

For the mentality… well, this is not covered here 😀

What about enforcing standards inside the code? How can I help developers to feel “home” even inside an unknown project? And how can a developer respect those rules consistently in every project without leaving banana peels inside their commits?

Can AST help us solve this?

Yes, it definitely can!

In fact, the most widely used tools leveraging the power of AST are linters, such as ESLint. This obscure piece of software is not so obscure now that we know it uses AST, and writing custom rules for it can be done intuitively.

Let’s see a special use case.

Referring back to that React project, we have a particular problem that happens only on some legacy platforms when using a specific import from a specific library (framer-motion).

The library is in reality a custom fork of the original framer-motion that adds some legacy features such as CSS prefix aware transform properties. Both our fork and the original are packed the same way, aiming for a modern runtime to execute them, and are therefore exported in a modern and fancy module style when “released”. Since this library mimics every HTMLElement to be exported from the main index file, it uses both a special construct + the JS Proxy feature to provide an on-demand generated framer component in the form of the requested HTML element (e.g. motion.div, where motion is a proxy).

This works fine in dev mode, and runs well on modern platforms… but crashes with a confusing error stack on older ones.

Luckily there is a workaround (that actually is how the library originally worked in older versions): we can manually create a special motion element using the createMotionElement utility provided by the lib.
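To make the two flavours concrete, here is a rough sketch of the idea, not the library's real code. The `createMotionElement` below is a hypothetical stand-in (its real signature is not shown in this post) that just returns a plain object, and `motionComponents` is an assumed shape for the workaround module.

```javascript
// Hypothetical stand-in for the library utility (assumption: real signature unknown)
function createMotionElement(tag) {
	return { kind: 'motion-component', tag };
}

// Proxy flavour: nothing exists up front, every `motion.<tag>` access
// builds the component on demand and caches it for the next access.
const cache = new Map();
const motion = new Proxy({}, {
	get(_target, tag) {
		if (typeof tag !== 'string') return undefined; // ignore symbol lookups
		if (!cache.has(tag)) cache.set(tag, createMotionElement(tag));
		return cache.get(tag);
	},
});

// Explicit flavour (the workaround): build only the elements the app uses,
// so no Proxy is needed at runtime on the legacy platforms.
const motionComponents = {
	div: createMotionElement('div'),
	span: createMotionElement('span'),
};

console.log(motion.section.tag, motionComponents.div.tag);
```

The explicit version is less magic and needs a new line for every new element, but it only relies on plain objects.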

Simply, instead of doing this:

import { motion } from '@my-corp-scope/framer-motion';

We should do

import motion from 'my-path/motionComponents';

Very cool! But how can I make this an error with my linters?

Well… A few LoCs are all it takes:

/**
 * @fileoverview Rule to disallow motion import from package, enforcing direct import
 * @author Daniele Lubrano
 */

'use strict';

//------------------------------------------------------------------------------
// Rule Definition
//------------------------------------------------------------------------------

/** @type {import('eslint').Rule.RuleModule} */
module.exports = {
	meta: {
		type: 'problem',

		docs: {
			description: 'Disallow motion import from package, enforcing direct import',
			category: 'Possible Errors',
			recommended: true,
		},
		fixable: 'code',
	},
	create(context) {
		return {
			// Called for every `import ... from '...'` statement in the linted file
			ImportDeclaration(node) {
				// Only imports coming from the forked package are of interest
				if (node.source.value !== '@my-corp-scope/framer-motion') {
					return;
				}
				// Look for the named import: import { motion } from '...'
				const specifier = node.specifiers.find(
					(spc) => spc.type === 'ImportSpecifier' && spc.imported.name === 'motion'
				);
				if (specifier) {
					context.report({
						node: specifier,
						message: 'Do not import "motion" directly from "@my-corp-scope/framer-motion"',
					});
				}
			},
		};
	},
};

As you can see, the AST is plainly visible in the implementation: the ImportDeclaration callback runs when ESLint's walker (using the visitor pattern) trips onto an … well… ImportDeclaration 😄
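To see the visitor pattern at work without installing anything, the sketch below drives a condensed copy of the rule's check with a made-up `walk` function and a fake `context`. This is only a simplified stand-in for ESLint's traversal (the real one also handles scopes, selectors, exit events and much more), and the two import nodes are hand-written ESTree-style objects built by a hypothetical `importNode` helper.

```javascript
// Simplified stand-in for ESLint's traversal: visit every node and
// call the handler named after the node type, if the rule defines one.
function walk(node, visitors) {
	if (!node || typeof node.type !== 'string') return;
	if (visitors[node.type]) visitors[node.type](node);
	for (const value of Object.values(node)) {
		for (const child of Array.isArray(value) ? value : [value]) walk(child, visitors);
	}
}

// Condensed copy of the rule's create(): same check as in the rule above
function create(context) {
	return {
		ImportDeclaration(node) {
			const specifier = node.specifiers.find(
				(spc) => spc.type === 'ImportSpecifier' && spc.imported.name === 'motion'
			);
			if (node.source.value === '@my-corp-scope/framer-motion' && specifier) {
				context.report({
					node: specifier,
					message: 'Do not import "motion" directly from "@my-corp-scope/framer-motion"',
				});
			}
		},
	};
}

// Hypothetical helper building the AST of: import { <name> } from '<from>';
const importNode = (name, from) => ({
	type: 'ImportDeclaration',
	specifiers: [
		{
			type: 'ImportSpecifier',
			imported: { type: 'Identifier', name },
			local: { type: 'Identifier', name },
		},
	],
	source: { type: 'Literal', value: from },
});

const program = {
	type: 'Program',
	body: [
		importNode('motion', '@my-corp-scope/framer-motion'), // should be reported
		importNode('motion', 'my-path/motionComponents'), // should pass
	],
};

// Fake context: just collect whatever the rule reports
const reports = [];
walk(program, create({ report: (problem) => reports.push(problem) }));
console.log(reports.map((problem) => problem.message));
```

For a real rule, ESLint ships a `RuleTester` class that plays the same role with actual parsing, and that is the tool to reach for in a test suite.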

Other Tools using AST

Madge:

It lets you track the dependency tree inside a JS/TS project, even outputting an image/SVG of the files involved. It also implements a circular dependency option, which is very useful!

Read more: https://github.com/pahen/madge

const madge = require('madge');

madge('./src/index.tsx', { tsConfig: './tsconfig.json' }).then((res) => {
	console.log(res.circularGraph());
	// Needs graphviz installed
	return res.image('circularImage.svg', true);
});

~LBRDan