Using LLMs To Explore Project 2025
Click here to explore Project 2025 by its recommendations and topics
I added this app to test opencode against the ad hoc framework used to create this site. This site is a less-than-straightforward combination of surplus.js with a custom React-style renderer (typed-html) wired in through tsconfig’s jsxFactory option, plus a lot of custom scaffolding to combine them into one. Since Project 2025 was on my reading list, but I really didn’t want to read the thing, I figured I could write an app to analyze/explore it instead. Predictably, I spent far more time on the tool than it would have taken to read the actual thing, but oh well. Still had fun.
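For reference, the tsconfig mechanism involved looks something like this. typed-html’s documentation suggests `elements.createElement` as the factory; the exact wiring this site uses may differ:

```jsonc
{
  "compilerOptions": {
    // Compile JSX through a custom factory instead of React's,
    // so <div>…</div> becomes elements.createElement("div", …).
    "jsx": "react",
    "jsxFactory": "elements.createElement"
  }
}
```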
Difficulties with surplus.js and writing valid code
I really liked surplus.js when I found it, but the author got bored with it or got busy with life. It wasn’t actively maintained and never attracted an audience like other libraries did. It’s similar to React in the way Solid is similar: both have JSX, but really different lifecycle and tracking mechanics. The first pass from opencode was kind of rough, like 80–90% working. General stuff was fine, but the built-in React assumptions messed other stuff up. ref in particular tended to generate broken code, since Surplus’s ref is far less featureful than React’s. It got a little better after I added some warnings about this to a skill, but the results are still less than ideal.
Consistent styling with the rest of website
The first version missed a lot of styling details: themes, button treatments, background choices. I’ve started adding some information to skills, but it really needs a lot more spelled out before we can get things that match the rest of the site.
Challenges of analyzing the Project 2025 document
My first attempt at analyzing the Project 2025 document went poorly. The model defaulted to a high-level overview and missed a lot of topics and recommendations. I split the document into 10-page chunks and asked it to focus on each chunk in turn. That helped it perform a more thorough analysis, but really increased usage: I was surprised that these chunks plus a directive to think hard meant 4–5 chunks could blow through the Claude hourly limit. I ended up needing to scrap the first analysis because it was inconsistent and incomplete, with the usage limits kicking in at inappropriate times. Dialing down “ultra think” helped a bit on the next run.
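The chunking step itself is simple. A minimal sketch, assuming the document has already been split into an array of per-page strings (the 10-page size matches what I used; everything else is illustrative):

```typescript
// Split an array of page texts into fixed-size chunks so each
// analysis pass sees only a small, focused slice of the document.
function chunkPages(pages: string[], size = 10): string[][] {
  const chunks: string[][] = [];
  for (let i = 0; i < pages.length; i += size) {
    chunks.push(pages.slice(i, i + size));
  }
  return chunks;
}
```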
A lot of cleanup work needed
It was pretty impressive to see what came out of the first analysis: lots of interesting recommendations and topics. There were various issues, though. We never quite figured out page mapping, so there are lots of off-by-one errors. They can probably be fixed, but I’m a little worried the offsets aren’t consistent.
The topics had a problem with an invented category (meta) that captured things like titles and footnotes. That was easy to remove, but the recommendations ended up worse: the page splitting caused the model to form multiple recommendations per paragraph. Fortunately, this was resolved by pointing out the problem and asking the LLM to compare recommendations that shared the same page and paragraph number. This is where the agent style shines: it writes some Python code to programmatically extract content so it can present it to the LLM as a single body, which significantly improves the success rate on tasks like this.
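The grouping half of that dedup pass can be sketched roughly like this. The agent’s real script was Python against its own data format; this TypeScript version, including the `Rec` shape, is purely illustrative:

```typescript
// A recommendation as extracted from the analysis, keyed by where
// it appeared in the document. (Hypothetical shape for illustration.)
interface Rec { page: number; paragraph: number; text: string; }

// Bucket recommendations by page + paragraph so entries that chunk
// boundaries split into duplicates end up in the same group.
function groupByParagraph(recs: Rec[]): Map<string, Rec[]> {
  const groups = new Map<string, Rec[]>();
  for (const r of recs) {
    const key = `${r.page}:${r.paragraph}`;
    const bucket = groups.get(key) ?? [];
    bucket.push(r);
    groups.set(key, bucket);
  }
  return groups;
}

// Groups with more than one entry are the suspected duplicates that
// get handed back to the LLM to compare and merge.
const suspects = (recs: Rec[]): Rec[][] =>
  [...groupByParagraph(recs).values()].filter((g) => g.length > 1);
```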
Final attempt
I think the second attempt was still a failure. There were a lot of problems mapping pages and paragraphs correctly. To resolve this, paragraph markers should be added directly via <a> tags before analyzing. I made one more go at this.
| Metric | Before | After |
|---|---|---|
| Total Recommendations | 3002 | 2134 |
| Total Topics | 2115 | 2103 |
| Total Topic Tags | 95 | 44 |
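The paragraph-marker idea can be sketched like this, assuming per-page HTML where paragraphs are `<p>` tags; the `p{page}-{n}` id scheme is made up for illustration:

```typescript
// Prefix every paragraph on a page with an <a> anchor carrying a
// stable id, so the model's page/paragraph references can't drift.
function markParagraphs(pageHtml: string, page: number): string {
  let n = 0;
  return pageHtml.replace(/<p>/g, () => `<a id="p${page}-${++n}"></a><p>`);
}
```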
Struggles to perform deep analysis
With a document like this, the goal was to really do a deep inspection of everything. That was challenging: I found the LLM was always trying to summarize and generalize, and was prone to skipping work. One of the first things I did was chunk the main document into 10-page sections to force it to analyze one specific section at a time.
There were many times when it wanted to reach for programmatic analysis techniques, and I had to force it to do a raw reading of the content so the analysis actually came from the LLM. I found this was a much better way to detect topics and tags. I ended up with lots of duplication, but since the LLM loves to generalize, I let it do so as a final cleanup step.

Pillar II is a personnel database
LLM-generated posts
I find the writing of LLMs is still awful. It picks up every worst habit of tech writing, which has been abysmal for years. The space is saturated with resume-padding fluff, and I think that’s seriously impaired the writing quality of these LLMs. I let it draft an initial pass at this post, but I threw it all away in the end.
It loves to namedrop words like “performance” without explaining the actual constraints or what was done to address them.
> Technically, this was an interesting challenge. The full document is quite large, so I had to think carefully about how to structure the data and components for good performance. I ended up using:

> The search implementation was probably the trickiest part.

Search was, in fact, broken and had to be removed.

> The split-pane resizing was also fun to implement — had to handle mouse events carefully to make the drag interaction feel smooth while also updating the layout state reactively.
This is another good example of useless fluff. \<feature\> was fun. “Had to handle mouse events carefully”? Browsers took over most of the drag-related performance work after years of developers getting it wrong. “Feel smooth” is empty prose, too. Is it higher fps, steadier fps, less layout thrash? And why does “careful” handling make it smoother, exactly? It feels like your typical junior/intern writing a Medium article.
What did go well
It’s been a while since I worked on major features for this site, and it’s been pretty hard to adapt back to its mental model when it’s so different from the day job. opencode was a helpful way to get back into the site after so long. I’d forgotten many of the exact places to look for things, so watching the LLM search was helpful. And after it was done, I could use the session to kick-start a skill to improve quality on future requests.
It feels like there’s a lot less friction around adding and improving features. I want to remove the scss dependency and the reliance on live rendering via Chrome, and I’m well on the way to that goal now. I was able to add an sx prop to dynamically generate CSS on the statically typed HTML pages. A couple more tweaks and that dynamic CSS will be inlined into the generated HTML pages.
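The site’s actual sx implementation isn’t shown here, so this is just a guess at the shape of the idea: turn a style object into a generated class name and collect the CSS rules so the build can inline them into the static HTML. Every name below is hypothetical:

```typescript
// Hypothetical sketch of an sx-style prop for a static-site build:
// convert a camelCase style object into a CSS rule, derive a stable
// class name from it, and stash the rule for later inlining.
type Sx = Record<string, string>;

const collected = new Map<string, string>();

function sxToClass(sx: Sx): string {
  const body = Object.entries(sx)
    .map(([prop, value]) =>
      `${prop.replace(/[A-Z]/g, (c) => "-" + c.toLowerCase())}:${value}`)
    .join(";");
  // Toy hash of the rule body, just to get a deterministic class name.
  let h = 0;
  for (const ch of body) h = (h * 31 + ch.charCodeAt(0)) >>> 0;
  const cls = `sx-${h.toString(36)}`;
  if (!collected.has(cls)) collected.set(cls, `.${cls}{${body}}`);
  return cls;
}

// At build time, everything collected would be emitted into a <style>
// tag inlined into the generated page.
const inlineCss = (): string => [...collected.values()].join("\n");
```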
I was also able to add new statically built pages, like these gallery pages for static content folders that have a lot of images (example: roguelike refs gallery).