
🧩 Parsing .re files ​


After getting the feel of Re4 right - just writing count++ without needing useState, .value, or setCount - I had to answer a bigger question:

How do I actually parse this thing?

I didn’t want wrappers like signal() or $state() - I wanted syntax like:

re
component Counter {
  state count = 0
  computed doubled = count * 2
}

Something the compiler could track easily and we could write naturally.

Which meant I needed to parse .re files with new keywords (state, component, computed, etc.) while still fully supporting JavaScript, TypeScript, and JSX.


First Attempt: Loose Tokenizing ​

I started off lazy. I didn’t write a full parser - just a loose tokenizer. It scanned for important keywords like state, component, etc. using an enum:

typescript
export enum TokenKind {
  Name,
  Component,
  State,
  Computed,
  Eq,
  LCurly,
  RCurly,
  JsKeyword,
  Eos,
  Unknown,
}

Whenever I saw one of my keywords, I paused and let a real JS parser take over:

re
component Counter {
  state count = 1
  prop name = "counter"
  return <div>{name} count: {1}</div>
}

When I hit state, I’d parse it manually:

typescript
function parseState(): Stmt[] {
  const stateToken = expect(TokenKind.State);

  const jsCode = eatJs();

  const src = `let ${jsCode}`;

  const program = parse(src, { jsx: true });

  const [decl, ...rest] = program.body;

  if (!isVariableDeclaration(decl)) raiseError();

  return [decl, ...rest];
}

The eatJs() function would just consume everything until the next keyword or end of block:

ts
// Rough idea: consume JS tokens until we hit another re4 keyword
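function eatJs() {
  // curToken is the re4 keyword we just stopped on; the JS starts right after it
  const start = curToken.span.end;
  let last = curToken;
  // token is the upcoming, not yet consumed token
  while (isInComponentBlockScope() && !isRe4Keyword(token)) {
    last = nextToken();
  }
  return source.slice(start, last.span.end);
}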

It worked! For a while.


JSX Broke Everything ​

Then JSX showed up. This broke everything:

tsx
<div state="1">hello</div>

My tokenizer saw state and thought it was a keyword. But it was just an attribute.

Worse:

tsx
<div>That's bad</div>

The ' was treated as the start of a JS string - but it was just JSX text.

Even:

tsx
<component></component>

Would trigger my component block parser, even though it was just a tag.

I kept patching:

  • Marking some tokens as Unknown
  • Skipping others
  • Trying to guess if I was inside JSX or JS

But it was clear: Loose tokenizing wasn’t going to scale.
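To make the failure concrete, here is a minimal sketch of a context-free keyword scan. The `scanForKeywords` helper and its keyword list are assumptions made up for illustration, not the tokenizer described above. The only point is that a scan with no notion of JSX reports hits inside attributes and tag names.

```typescript
// Hypothetical keyword list, for illustration only.
const RE4_WORDS = ["component", "state", "prop", "computed"];

// Returns every whole-word keyword match, with no idea whether the word sits
// in JS code, a JSX attribute, or a JSX tag name.
function scanForKeywords(source: string): string[] {
  const pattern = new RegExp("\\b(" + RE4_WORDS.join("|") + ")\\b", "g");
  const hits: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    hits.push(match[1]);
  }
  return hits;
}

// A real declaration: one legitimate hit.
const inDeclaration = scanForKeywords("state count = 1"); // ["state"]

// A JSX attribute and a JSX tag: the scan still reports keywords.
const inAttribute = scanForKeywords('<div state="1">hello</div>'); // ["state"]
const inTagName = scanForKeywords("<component></component>"); // ["component", "component"]
```

All three inputs look the same to the scanner, which is why every patch ended up being a guess about the surrounding context.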


Why I Thought It’d Be Easy ​

Honestly, it felt like it should be. component {} looks like try {} or function {} - I figured I could treat it like a block and move on.

But I needed something more context-aware.

For example:

tsx
<div class="flex">hello</div>

Even though class is a keyword, here it is just an attribute name. Totally valid. It's similar to await, which is only reserved inside async functions and modules.

Loose parsing couldn’t tell when keywords were actually keywords.

I wasn’t building a tokenizer anymore. I was faking a parser. So I stopped.
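JavaScript itself shows how much keyword status depends on context. The sketch below asks the engine's own parser, through the `Function` constructor, whether a snippet is valid. The `parses` helper is a name introduced here for illustration. It compiles each snippet as a sloppy-mode function body without running it.

```typescript
// Returns true when the engine accepts `source` as a function body.
// new Function() only compiles the code here; it is never called.
function parses(source: string): boolean {
  try {
    new Function(source);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

// `await` is an ordinary identifier outside async functions and modules...
const awaitAsName = parses("var await = 1;"); // true
// ...but reserved inside an async function.
const awaitInAsync = parses("async function f() { var await = 1; }"); // false

// `class` is fine as a property key, but not as a variable name.
const classAsKey = parses("var el = { class: 'flex' };"); // true
const classAsName = parses("var class = 'flex';"); // false
```

The same word flips between identifier and keyword depending on where it appears, and only a parser that tracks that position can decide.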


Exploring Real Parsers ​

I looked into:

  • Oxc: Fast, great TS+JSX support, Rust-based
  • SWC: Similar, Rust-based
  • Babel: Heavy
  • Acorn: Tiny, readable, plugin-friendly

I loved Oxc, but I didn’t want to maintain a Rust fork just to parse a few keywords. So I circled back to Acorn.


🧠 Enter Acorn ​

Acorn is small, simple, and has a TypeScript plugin.

So my checklist:

  • JS ✅
  • TS ✅
  • JSX ✅

Now I just needed to support my syntax. So I wrote a plugin.

It added support for:

  • component, state, prop, computed
  • effect, mount, unmount

🔌 Acorn Plugins ​

An Acorn plugin is just a function that takes a parser class and returns a new class extending it:

typescript
type Plugin = (BaseParser: typeof acorn.Parser) => typeof acorn.Parser;

function re4Plugin(BaseParser: typeof acorn.Parser) {
  return class extends BaseParser {
    // add logic here
  };
}

You can chain multiple plugins like this:

ts
const MyParser = Parser.extend(tsPlugin, jsxPlugin, re4Plugin);
const ast = MyParser.parse('code', {
  /* options */
});
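Chaining works because extend() folds the plugins left to right, handing each one the class produced by the one before it. Here is a toy model of that fold. `ToyParser`, `tsLike`, `re4Like`, and `greedy` are stand-ins invented for this sketch, not Acorn's classes or real plugins.

```typescript
type ToyPlugin = (Base: typeof ToyParser) => typeof ToyParser;

// Stand-in for the base parser; only the extend() fold is modelled.
class ToyParser {
  parseStatement(word: string): string {
    return `js:${word}`;
  }

  static extend(...plugins: ToyPlugin[]): typeof ToyParser {
    let cls: typeof ToyParser = this;
    for (const plugin of plugins) cls = plugin(cls);
    return cls;
  }
}

// Each plugin handles its own word and defers everything else to super.
const tsLike: ToyPlugin = (Base) =>
  class extends Base {
    parseStatement(word: string): string {
      return word === "interface" ? "ts:interface" : super.parseStatement(word);
    }
  };

const re4Like: ToyPlugin = (Base) =>
  class extends Base {
    parseStatement(word: string): string {
      return word === "state" ? "re4:state" : super.parseStatement(word);
    }
  };

// A plugin that never calls super shadows everything before it.
const greedy: ToyPlugin = (Base) =>
  class extends Base {
    parseStatement(word: string): string {
      return `greedy:${word}`;
    }
  };

const Combined = ToyParser.extend(tsLike, re4Like);
const parser = new Combined();
const results = ["state", "interface", "let"].map((w) => parser.parseStatement(w));
// ["re4:state", "ts:interface", "js:let"]

const Shadowing = ToyParser.extend(tsLike, greedy);
const shadowed = new Shadowing().parseStatement("interface");
// "greedy:interface"
```

The order of the arguments is the order of wrapping: the last plugin sees a call first, and earlier ones only run when it passes the call along.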

Plugin Composition Pain ​

Acorn plugins are chained like:

ts
Parser.extend(tsPlugin, re4Plugin);

But internally they override methods like parseStatement(). When two plugins override the same method, the last one wins unless it explicitly defers to super.

Also, acorn-typescript overrides readWord() to detect TS keywords. If I override it for Re4, I lose TS support.

So I forked acorn-typescript. And added a hook system:

ts
class TsParser extends Parser {
  readWordHooks: ((word: string) => TokenType | undefined)[] = [];

  readWord() {
    const word = this.readWord1();

    // give registered hooks the first chance to classify the word
    for (const hook of this.readWordHooks) {
      const type = hook(word);
      if (type) {
        return this.finishToken(type, word);
      }
    }

    // .. original code
    let type = tt.name;

    if (this.keywords.test(word)) {
      type = jsTokens[word];
    } else if (new RegExp(tsKeywordsRegex).test(word)) {
      type = tsTokens[word];
    }

    return this.finishToken(type, word);
  }
}

Now I can inject my keywords without breaking TS:

ts
class Re4Parser extends TsParser {
  constructor(...args: any[]) {
    super(...args);
    this.readWordHooks.push(readWordHook);
  }
}

function readWordHook(word: string) {
  if (re4Keywords.has(word)) return re4KeywordTokenTypes[word];
  return undefined;
}

Done. Acorn recognizes our tokens 🎉
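The hook pattern also works outside Acorn. The sketch below is a self-contained toy, assuming simplified string labels in place of real token types. `ToyTsTokenizer` and `ToyRe4Tokenizer` are hypothetical names and their keyword sets are deliberately tiny.

```typescript
type ReadWordHook = (word: string) => string | undefined;

// Toy stand-in for the forked TS parser: hooks run before the built-in checks.
class ToyTsTokenizer {
  readWordHooks: ReadWordHook[] = [];
  private jsKeywords = new Set(["let", "const", "return"]);
  private tsKeywords = new Set(["interface", "type", "enum"]);

  readWord(word: string): string {
    for (const hook of this.readWordHooks) {
      const label = hook(word);
      if (label !== undefined) return label;
    }
    if (this.jsKeywords.has(word)) return `js:${word}`;
    if (this.tsKeywords.has(word)) return `ts:${word}`;
    return "name";
  }
}

// The subclass registers a hook instead of overriding readWord(),
// so the JS and TS keyword handling stays untouched.
class ToyRe4Tokenizer extends ToyTsTokenizer {
  constructor() {
    super();
    const re4Keywords = new Set(["component", "state", "prop", "computed"]);
    this.readWordHooks.push((word) =>
      re4Keywords.has(word) ? `re4:${word}` : undefined
    );
  }
}

const tokenizer = new ToyRe4Tokenizer();
const labels = ["state", "interface", "let", "count"].map((w) => tokenizer.readWord(w));
// ["re4:state", "ts:interface", "js:let", "name"]

const withoutHook = new ToyTsTokenizer().readWord("state");
// "name"
```

A hook that returns undefined simply falls through, so any number of extensions can register without knowing about each other.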


Parsing Component Blocks ​

The heart of Re4 is the component block.

So I override parseStatement() to support it only at the top level:

typescript
if (isTopLevel() && isComponentKeyword()) {
  return this._parseComponent(this.startNode());
}

Inside a component block, the same override routes state, prop, and computed declarations and the lifecycle keywords (effect, mount, unmount) to their own handlers, and leaves everything else to Acorn:

typescript
// a method on the plugin class (super only works inside a class)
parseStatement(...args) {
  if (isInComponentRootLevel()) {
    // same impl for prop and computed
    if (isContextual(token, re4Tokens.state)) {
      const node = this.startNode() as Re4VariableDeclaration;
      node.reKind = 'state';
      // parseVarStatement consumes the keyword, then Acorn parses the declarators
      return this.parseVarStatement(node, 'const');
    }

    // Handle lifecycle blocks
    if (isLifeCycleBlockToken(token)) {
      return this._parseLifecycleBlock(this.startNode());
    }
  }

  return super.parseStatement(...args); // allow Acorn to handle the rest
}

Parse Component ​

typescript
_parseComponent(node: ComponentStatement) {
  this.next(); // consume 'component'
  node.id = this.parseIdent();

  // allow return keyword inside component blocks
  this.enterScope(AcornScopes.SCOPE_FUNCTION);

  this.context.push(componentContext);

  // Parse the component body
  node.body = this.parseBlock() as Re4BlockStatement;

  // Pop component context after parsing
  this.context.pop();
  this.exitScope();
  return this.finishNode(node, 'ComponentDeclaration');
}
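Entering a function scope matters because the parser rejects return unless the current scope is flagged as a function. Here is a toy model of that check. `ToyScopeParser` and its flag values are assumptions chosen for illustration; it only mirrors the enterScope/exitScope pairing used above.

```typescript
// Illustrative bit flags; the concrete values are an assumption of this sketch.
const SCOPE_TOP = 1;
const SCOPE_FUNCTION = 2;

class ToyScopeParser {
  private scopeStack: number[] = [SCOPE_TOP];

  enterScope(flags: number): void {
    this.scopeStack.push(flags);
  }

  exitScope(): void {
    this.scopeStack.pop();
  }

  get inFunction(): boolean {
    const top = this.scopeStack[this.scopeStack.length - 1] ?? SCOPE_TOP;
    return (top & SCOPE_FUNCTION) !== 0;
  }

  parseReturn(): string {
    if (!this.inFunction) {
      throw new SyntaxError("'return' outside of function");
    }
    return "ReturnStatement";
  }

  // Mirrors the component parser: function scope on entry, popped on exit.
  parseComponentBody(): string {
    this.enterScope(SCOPE_FUNCTION);
    try {
      return this.parseReturn();
    } finally {
      this.exitScope();
    }
  }
}

const scopeParser = new ToyScopeParser();

let topLevelError = "";
try {
  scopeParser.parseReturn();
} catch (err) {
  topLevelError = (err as Error).message;
}
// topLevelError is "'return' outside of function"

const insideComponent = scopeParser.parseComponentBody();
// "ReturnStatement"
```

Popping the scope in a finally block keeps the stack balanced even when the body fails to parse.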

Parse Lifecycle Blocks ​

For effect, mount, and unmount, I use the same trick:

typescript
_parseLifecycleBlock(node) {
  // read the kind before next() moves past the keyword
  node.kind = getLifeCycleNodeKind(token);
  this.next(); // consume 'effect' / 'mount' / 'unmount'
  node.body = this.parseBlock();
  return this.finishNode(node, 'LifecycleBlock');
}

Again, Acorn handles everything - I just route the keywords to the right behavior.
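To give a feel for what comes out the other end, here is a rough sketch of the tree for the Counter component and a small walker over it. The interfaces are simplified assumptions based on the field names in the snippets above (reKind, kind, ComponentDeclaration, LifecycleBlock). Initializers, lifecycle bodies, and source positions are left out, and `namesOfKind` is a hypothetical helper.

```typescript
interface Identifier {
  type: "Identifier";
  name: string;
}

interface Re4VariableDeclaration {
  type: "VariableDeclaration";
  reKind: "state" | "prop" | "computed";
  declarations: { id: Identifier }[];
}

interface LifecycleBlock {
  type: "LifecycleBlock";
  kind: "effect" | "mount" | "unmount";
}

interface ComponentDeclaration {
  type: "ComponentDeclaration";
  id: Identifier;
  body: {
    type: "BlockStatement";
    body: (Re4VariableDeclaration | LifecycleBlock)[];
  };
}

// Collects the declared names of one kind, e.g. every `state` variable.
function namesOfKind(
  component: ComponentDeclaration,
  kind: Re4VariableDeclaration["reKind"]
): string[] {
  const names: string[] = [];
  for (const stmt of component.body.body) {
    if (stmt.type === "VariableDeclaration" && stmt.reKind === kind) {
      for (const decl of stmt.declarations) names.push(decl.id.name);
    }
  }
  return names;
}

// Hand-built tree for: component Counter { state count; computed doubled; mount {} }
const counter: ComponentDeclaration = {
  type: "ComponentDeclaration",
  id: { type: "Identifier", name: "Counter" },
  body: {
    type: "BlockStatement",
    body: [
      { type: "VariableDeclaration", reKind: "state", declarations: [{ id: { type: "Identifier", name: "count" } }] },
      { type: "VariableDeclaration", reKind: "computed", declarations: [{ id: { type: "Identifier", name: "doubled" } }] },
      { type: "LifecycleBlock", kind: "mount" },
    ],
  },
};

const stateNames = namesOfKind(counter, "state"); // ["count"]
const computedNames = namesOfKind(counter, "computed"); // ["doubled"]
```

Because each declaration carries its reKind, a later compiler pass can find the reactive variables without re-deriving them from syntax.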

I also overrode two export hooks so components can be exported:

typescript
// allow export component {}
shouldParseExportStatement() {
  return this.type === re4KwTokenTypes.component || super.shouldParseExportStatement();
}
// allow export default component {}
parseExportDefaultDeclaration() {
  if (this.type === re4KwTokenTypes.component) {
    return this._parseComponent(this.startNode());
  }
  return super.parseExportDefaultDeclaration();
}

And that was it. Fully working parser 🎉


Why Acorn Worked ​

  • No manual tokenization
  • Full control over scopes, blocks, and keywords
  • Clean AST for compilation
  • TS + JSX work thanks to acorn-typescript
  • No guessing, no edge cases, no hacks

TL;DR: ​

I built a parser for Re4 using Acorn with JSX + TS support. Tried a loose parsing strategy, failed with edge cases, then forked acorn-typescript and built a plugin to parse Re4 syntax.


🚀 Up Next ​

Parsing was step one. Next: the compiler, where count++ becomes a tracked signal and DOM updates happen without boilerplate.

No .value. No setCount. No boilerplate. Just JavaScript - supercharged.

If you're into:

  • Compilers
  • Framework internals
  • UI reactivity experiments

Feel free to follow along. I'll be posting updates as things evolve.

- Aadi (Follow on X)