🧩 Parsing .re files
After getting the feel of Re4 right - just writing count++ without needing useState, .value, or setCount - I had to answer a bigger question:
How do I actually parse this thing?
I didn't want wrappers like signal() or $state() - I wanted syntax like:
component Counter {
  state count = 0
  computed doubled = count * 2
}
Something the compiler could track easily and we could write naturally.
Which meant I needed to parse .re files with new keywords (state, component, computed, etc.) while still fully supporting JavaScript, TypeScript, and JSX.
First Attempt: Loose Tokenizing
I started off lazy. I didn't write a full parser - just a loose tokenizer. It scanned for important keywords like state, component, etc. using an enum:
export enum TokenKind {
  Name,
  Component,
  State,
  Computed,
  Eq,
  LCurly,
  RCurly,
  JsKeyword,
  Eos,
  Unknown,
}
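To make the idea concrete, here is a minimal sketch of what such a loose tokenizer could look like. It is not the original implementation: the LooseToken shape, the looseTokenize name, and the regex are assumptions made for illustration, and plain string kinds stand in for the TokenKind enum so the block runs on its own.

```typescript
// Hypothetical sketch: string literals stand in for the TokenKind enum members.
type Kind = "Name" | "Component" | "State" | "Computed" | "Eq" | "LCurly" | "RCurly" | "Unknown";

interface LooseToken {
  kind: Kind;
  text: string;
  start: number; // offset of the first character
  end: number; // offset just past the last character
}

const KEYWORD_KINDS = new Map<string, Kind>([
  ["component", "Component"],
  ["state", "State"],
  ["computed", "Computed"],
]);

const PUNCTUATION_KINDS = new Map<string, Kind>([
  ["=", "Eq"],
  ["{", "LCurly"],
  ["}", "RCurly"],
]);

function looseTokenize(source: string): LooseToken[] {
  const tokens: LooseToken[] = [];
  // Match an identifier-like word, or any single non-whitespace character.
  const pattern = /[A-Za-z_$][\w$]*|\S/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    const text = match[0];
    const isWord = /^[A-Za-z_$]/.test(text);
    const kind: Kind = isWord
      ? KEYWORD_KINDS.get(text) ?? "Name"
      : PUNCTUATION_KINDS.get(text) ?? "Unknown";
    tokens.push({ kind, text, start: match.index, end: match.index + text.length });
  }
  return tokens;
}

const sampleTokens = looseTokenize("component Counter { state count = 0 }");
```

Everything the sketch does not understand (numbers, quotes, angle brackets) falls into Unknown, which is exactly what makes the approach "loose".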
Whenever I saw one of my keywords, I paused and let a real JS parser take over:
component Counter {
  state count = 1
  prop name = "counter"
  return <div>{name} count: {1}</div>
}
When I hit state, I'd parse it manually:
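function parseState(): Stmt[] {
  expect(TokenKind.State); // the block must start with the state keyword
  const jsCode = eatJs(); // raw text up to the next Re4 keyword or end of block
  const src = `let ${jsCode}`; // `state count = 1` becomes `let count = 1`
  const program = parse(src, { jsx: true }); // hand off to a real JS parser
  const [decl, ...rest] = program.body;
  if (!isVariableDeclaration(decl)) raiseError();
  return [decl, ...rest];
}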
The eatJs() function would just consume everything until the next keyword or end of block:
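// Rough idea: consume JS tokens until we hit another Re4 keyword
function eatJs() {
  const start = curToken.span.end; // JS starts right after the keyword we just matched
  let last = curToken;
  // `token` is assumed to be the upcoming token that has not been consumed yet
  while (isInComponentBlockScope() && !isRe4Keyword(token)) {
    last = nextToken();
  }
  const end = last.span.end; // end of the last token we consumed
  return source.slice(start, end);
}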
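For readers who want something they can run, here is a self-contained sketch of the same idea. Everything in it is hypothetical: the Tok shape, the tokenize helper, and the brace-depth counter are not from the original code. It only shows how the raw text after a keyword gets sliced out so it can be handed to a JS parser.

```typescript
// Hypothetical token shape: the text plus its offsets in the source.
interface Tok {
  text: string;
  start: number;
  end: number;
}

const RE4_KEYWORDS = new Set(["state", "prop", "computed", "effect", "mount", "unmount"]);

// Split into identifier-like words and single non-whitespace characters.
function tokenize(source: string): Tok[] {
  const tokens: Tok[] = [];
  const pattern = /[A-Za-z_$][\w$]*|\S/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    tokens.push({ text: match[0], start: match.index, end: match.index + match[0].length });
  }
  return tokens;
}

// Slice out the JS that follows the keyword at `keywordIndex`.
// Stops at the next Re4 keyword or at the `}` that closes the component block.
function eatJs(source: string, tokens: Tok[], keywordIndex: number): { js: string; next: number } {
  const keyword = tokens[keywordIndex];
  if (!keyword) throw new Error("keywordIndex is out of range");
  let end = keyword.end;
  let depth = 0; // nesting of { } inside the JS being consumed
  let i = keywordIndex + 1;
  for (; i < tokens.length; i++) {
    const tok = tokens[i];
    if (!tok) break;
    if (depth === 0 && (RE4_KEYWORDS.has(tok.text) || tok.text === "}")) break;
    if (tok.text === "{") depth++;
    if (tok.text === "}") depth--;
    end = tok.end;
  }
  return { js: source.slice(keyword.end, end).trim(), next: i };
}

const body = 'state count = 1\nprop name = "counter"\n}';
const bodyTokens = tokenize(body);
const first = eatJs(body, bodyTokens, 0); // js: 'count = 1'
const second = eatJs(body, bodyTokens, first.next); // js: 'name = "counter"'
```

Prefixing first.js with let gives let count = 1, which a regular JS parser accepts. Note that this sketch knows nothing about strings or JSX, so a keyword appearing inside either one would cut the slice short.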
It worked! For a while.
JSX Broke Everything
Then JSX showed up. This broke everything:
<div state="1">hello</div>
My tokenizer saw state and thought it was a keyword. But it was just an attribute.
Worse:
<div>That's bad</div>
The ' was treated as a JS string start, but it was just JSX text.
Even:
<component></component>
would trigger my component block parser, even though it was just a tag.
I kept patching:
- Marking some tokens as Unknown
- Skipping others
- Trying to guess if I was inside JSX or JS
But it was clear: loose tokenizing wasn't going to scale.
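The false positives are easy to reproduce. The sketch below is a deliberately naive scanner (findKeywordHits and KeywordHit are names made up for this example, not the original tokenizer) that flags every word matching a Re4 keyword, with no idea whether it sits in JS, in a JSX tag, or in JSX text.

```typescript
const RE4_KEYWORDS = new Set(["component", "state", "prop", "computed"]);

interface KeywordHit {
  word: string;
  offset: number; // where the word starts in the source
}

// Naive scan: any word that looks like a keyword is reported as one.
function findKeywordHits(source: string): KeywordHit[] {
  const hits: KeywordHit[] = [];
  const wordPattern = /[A-Za-z_$][\w$]*/g;
  let match: RegExpExecArray | null;
  while ((match = wordPattern.exec(source)) !== null) {
    if (RE4_KEYWORDS.has(match[0])) {
      hits.push({ word: match[0], offset: match.index });
    }
  }
  return hits;
}

const realHits = findKeywordHits("state count = 1"); // one genuine keyword
const attributeHits = findKeywordHits('<div state="1">hello</div>'); // attribute flagged as keyword
const tagHits = findKeywordHits("<component></component>"); // both tag names flagged
```

The scanner reports one hit for the real declaration, but also one for the attribute and two for the tag. Telling those cases apart needs to know what surrounds the word, which is parser territory.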
Why I Thought It'd Be Easy
Honestly, it felt like it should be. component {} looks like try {} or function {} - I figured I could treat it like a block and move on.
But we need something more context-aware.
For example:
<div class="flex">hello</div>
Even though class is a keyword, here it is just an attribute name. Totally valid. Similar to await, which is only a keyword in async functions.
Loose parsing couldn't tell when keywords were actually keywords.
I wasn't building a tokenizer anymore. I was faking a parser. So I stopped.
Exploring Real Parsers
I looked into:
- Oxc: Fast, great TS+JSX support, Rust-based
- SWC: Similar, Rust-based
- Babel: Heavy
- Acorn: Tiny, readable, plugin-friendly
I loved Oxc, but I didn't want to maintain a Rust fork just to parse a few keywords. So I circled back to Acorn.
🧠 Enter Acorn
Acorn is small, simple, and has a TypeScript plugin.
So my checklist:
- JS ✅
- TS ✅
- JSX ✅
Now I just needed to support my syntax. So I wrote a plugin.
It added support for:
- component, state, prop, computed
- effect, mount, unmount
Acorn Plugins
An Acorn plugin is just a function that takes a parser class and returns a class extending it:
type Plugin = (BaseParser: typeof acorn.Parser) => typeof acorn.Parser;
function re4Plugin(BaseParser: typeof acorn.Parser): typeof acorn.Parser {
  return class extends BaseParser {
    // add logic here
  };
}
You can chain multiple plugins like this:
const MyParser = Parser.extend(tsPlugin, jsxPlugin, re4Plugin);
const ast = MyParser.parse('code', {
  /* options */
});
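To show what chaining does to method overrides, here is a sketch with a hypothetical MiniParser standing in for acorn.Parser (MiniPlugin, tsLikePlugin, and re4LikePlugin are also made up). As far as I understand Acorn's extend, it applies each plugin to the class produced by the previous one, so the result is an ordinary inheritance chain with the last plugin on top.

```typescript
// Hypothetical stand-in for acorn.Parser, just enough to show the chaining.
class MiniParser {
  parseStatement(word: string): string {
    return `js:${word}`;
  }

  // Apply each plugin to the class produced by the previous one.
  static extend(...plugins: MiniPlugin[]): typeof MiniParser {
    let cls: typeof MiniParser = this;
    for (const plugin of plugins) {
      cls = plugin(cls);
    }
    return cls;
  }
}

type MiniPlugin = (Base: typeof MiniParser) => typeof MiniParser;

const tsLikePlugin: MiniPlugin = (Base) =>
  class extends Base {
    override parseStatement(word: string): string {
      return word === "interface" ? "ts:interface" : super.parseStatement(word);
    }
  };

const re4LikePlugin: MiniPlugin = (Base) =>
  class extends Base {
    override parseStatement(word: string): string {
      // Handle our keyword, defer everything else to the plugin below us.
      return word === "component" ? "re4:component" : super.parseStatement(word);
    }
  };

const Combined = MiniParser.extend(tsLikePlugin, re4LikePlugin);
const combinedParser = new Combined();
const fromRe4 = combinedParser.parseStatement("component"); // "re4:component"
const fromTs = combinedParser.parseStatement("interface"); // "ts:interface"
const fromBase = combinedParser.parseStatement("let"); // "js:let"
```

The last plugin sees every call first. Earlier plugins only get a turn when the later one calls super, which is why an override that forgets to delegate silently disables whatever sits beneath it.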
Plugin Composition Pain
Acorn plugins are chained like:
Parser.extend(tsPlugin, re4Plugin);
But internally they override methods like parseStatement(), so the last plugin's override wins unless it explicitly defers to super.
Also, acorn-typescript overrides readWord() to detect TS keywords. If I override it for Re4, I lose TS support.
So I forked acorn-typescript and added a hook system:
class TsParser extends Parser {
  readWordHooks: ((word: string) => TokenType | undefined)[] = [];
  readWord() {
    const word = this.readWord1();
    // test with hooks
    for (const hook of this.readWordHooks) {
      const type = hook(word);
      if (type) {
        return this.finishToken(type, word);
      }
    }
    // .. original code
    let type = tt.name;
    if (this.keywords.test(word)) {
      type = jsTokens[word];
    } else if (new RegExp(tsKeywordsRegex).test(word)) {
      type = tsTokens[word];
    }
    return this.finishToken(type, word);
  }
}
Now I can inject my keywords without breaking TS:
// extends the forked TsParser, which is where readWordHooks lives
class Re4Parser extends TsParser {
  constructor(...args: any[]) {
    super(...args);
    this.readWordHooks.push(readWordHook);
  }
}
function readWordHook(word: string) {
  if (re4Keywords.has(word)) return re4KeywordTokenTypes[word];
  return undefined;
}
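The hook mechanism can be tried out without Acorn at all. In the sketch below, BaseWordReader, Re4WordReader, and the three keyword tables are hypothetical stand-ins (the real fork works on Acorn's token types and calls finishToken); only the lookup order mirrors the idea: hooks first, then the built-in JS and TS keywords, then a plain name.

```typescript
// Hypothetical stand-in for Acorn's TokenType.
interface TokenType {
  label: string;
}
type ReadWordHook = (word: string) => TokenType | undefined;

const nameType: TokenType = { label: "name" };
const jsKeywordTypes = new Map<string, TokenType>([["const", { label: "const" }]]);
const tsKeywordTypes = new Map<string, TokenType>([["interface", { label: "interface" }]]);
const re4KeywordTypes = new Map<string, TokenType>([
  ["state", { label: "re4:state" }],
  ["component", { label: "re4:component" }],
]);

class BaseWordReader {
  readWordHooks: ReadWordHook[] = [];

  classifyWord(word: string): TokenType {
    // Hooks get the first chance to claim the word.
    for (const hook of this.readWordHooks) {
      const type = hook(word);
      if (type) return type;
    }
    // Otherwise fall back to the built-in keyword tables.
    return jsKeywordTypes.get(word) ?? tsKeywordTypes.get(word) ?? nameType;
  }
}

class Re4WordReader extends BaseWordReader {
  constructor() {
    super();
    this.readWordHooks.push((word) => re4KeywordTypes.get(word));
  }
}

const reader = new Re4WordReader();
const stateType = reader.classifyWord("state").label; // "re4:state"
const interfaceType = reader.classifyWord("interface").label; // "interface"
const plainType = reader.classifyWord("count").label; // "name"
```

Because the hook only returns a type for words it owns, the TS keyword lookup is left untouched for everything else.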
Done. Acorn recognizes our tokens.
Parsing Component Blocks
The heart of Re4 is the component block.
So I override parseStatement() to support it only at the top level:
if (isTopLevel() && isComponentKeyword()) {
  return this._parseComponent(this.startNode());
}
Now if it tries to parse statements inside a component block, it will parse state, prop, computed, mount, unmount, effect, etc.:
// method on the Re4 parser class
parseStatement(...args) {
  if (isInComponentRootLevel()) {
    // same impl for prop and computed
    if (isContextual(token, re4Tokens.state)) {
      // `as`, not `satisfies`: the fresh node needs the Re4 type so reKind can be assigned
      const node = this.startNode() as Re4VariableDeclaration;
      node.reKind = 'state';
      return this.parseVarStatement(node, 'const'); // rest will be handled by Acorn
    }
    // Handle lifecycle blocks
    if (isLifeCycleBlockToken(token)) {
      return this._parseLifecycleBlock(this.startNode());
    }
  }
  return super.parseStatement(...args); // allow Acorn to handle the rest
}
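Stripped of the Acorn details, the override is a routing decision based on two things: where the parser currently is, and which word starts the statement. Here is a minimal sketch of that decision table. ParseContext, Route, and routeStatement are hypothetical names for this example, and the real parser tracks its position through Acorn's scope and context stacks rather than a string.

```typescript
type ParseContext = "top-level" | "component-root" | "nested";
type Route = "component" | "state" | "prop" | "computed" | "lifecycle" | "plain-js";

const DECLARATION_ROUTES = new Map<string, Route>([
  ["state", "state"],
  ["prop", "prop"],
  ["computed", "computed"],
]);
const LIFECYCLE_WORDS = new Set(["effect", "mount", "unmount"]);

// Decide who parses the statement that starts with `firstWord`.
function routeStatement(firstWord: string, context: ParseContext): Route {
  // Component blocks are only recognized at the top level of the file.
  if (context === "top-level" && firstWord === "component") return "component";

  // Re4 declarations and lifecycle blocks only exist directly inside a component.
  if (context === "component-root") {
    const declaration = DECLARATION_ROUTES.get(firstWord);
    if (declaration) return declaration;
    if (LIFECYCLE_WORDS.has(firstWord)) return "lifecycle";
  }

  // Everything else is ordinary JS/TS/JSX and goes to the base parser.
  return "plain-js";
}

const atRoot = routeStatement("state", "component-root"); // "state"
const inNested = routeStatement("state", "nested"); // "plain-js"
const lifecycle = routeStatement("mount", "component-root"); // "lifecycle"
```

In this sketch the same word routes differently depending on context, which is the piece the loose tokenizer could never get right.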
Parse Component
// method on the Re4 parser class
_parseComponent(node: ComponentStatement) {
  this.next(); // consume 'component'
  node.id = this.parseIdent();
  // allow return keyword inside component blocks
  this.enterScope(AcornScopes.SCOPE_FUNCTION);
  this.context.push(componentContext);
  // Parse the component body
  node.body = this.parseBlock() as Re4BlockStatement;
  // Pop component context after parsing
  this.context.pop();
  this.exitScope();
  return this.finishNode(node, 'ComponentDeclaration');
}
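Why does entering a function scope make return legal? The sketch below is a simplified model of a parser's scope stack, not Acorn's actual implementation (ScopeStack and ScopeKind are invented for this example). The assumption it illustrates is that a return statement is accepted when the nearest non-block scope is a function, so pushing a function scope around the component body is enough.

```typescript
// Simplified model of a parser's scope stack; not Acorn's real data structure.
type ScopeKind = "top" | "function" | "block";

class ScopeStack {
  private scopes: ScopeKind[] = ["top"];

  enterScope(kind: ScopeKind): void {
    this.scopes.push(kind);
  }

  exitScope(): void {
    // Never pop the top-level scope.
    if (this.scopes.length > 1) this.scopes.pop();
  }

  // `return` is allowed when the nearest non-block scope is a function.
  inFunction(): boolean {
    for (let i = this.scopes.length - 1; i >= 0; i--) {
      const kind = this.scopes[i];
      if (kind !== "block") return kind === "function";
    }
    return false;
  }
}

const scopes = new ScopeStack();
const atTopLevel = scopes.inFunction(); // false: return would be rejected here
scopes.enterScope("function"); // what the component parser does before the body
scopes.enterScope("block"); // e.g. an if block inside the component
const insideComponent = scopes.inFunction(); // true: return is accepted
scopes.exitScope();
scopes.exitScope();
const afterComponent = scopes.inFunction(); // false again once the component is closed
```

The matching exitScope call matters just as much: without it, a stray return after the component would be accepted too.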
Parse Lifecycle Blocks
For effect, mount, and unmount, I use the same trick:
// method on the Re4 parser class
_parseLifecycleBlock(node) {
  node.kind = getLifeCycleNodeKind(token); // read the kind before consuming the keyword
  this.next(); // consume 'effect', 'mount', or 'unmount'
  node.body = this.parseBlock();
  return this.finishNode(node, 'LifecycleBlock');
}
Again, Acorn handles everything - I just route the keywords to the right behavior.
I also overrode:
// allow export component {}
shouldParseExportStatement() {
  return this.type === re4KwTokenTypes.component || super.shouldParseExportStatement();
}
// allow export default component {}
parseExportDefaultDeclaration() {
  if (this.type === re4KwTokenTypes.component) {
    return this._parseComponent(this.startNode());
  }
  return super.parseExportDefaultDeclaration();
}
And that was it. Fully working parser.
Why Acorn Worked
- No manual tokenization
- Full control over scopes, blocks, and keywords
- Clean AST for compilation
- TS + JSX work thanks to acorn-typescript
- No guessing, no edge cases, no hacks
TL;DR:
I built a parser for Re4 using Acorn with JSX + TS support. I first tried a loose tokenizing strategy, which failed on edge cases, then forked acorn-typescript and built a plugin to parse Re4 syntax.
Up Next
Parsing was step one. Next: the compiler, where count++ becomes a tracked signal and DOM updates happen without boilerplate.
No .value. No setCount. No boilerplate. Just JavaScript, supercharged.
If you're into:
- Compilers
- Framework internals
- UI reactivity experiments
Feel free to follow along. I'll be posting updates as things evolve.
- Aadi