Using LLMs To Explore Project 2025
Click here to explore Project 2025 by its recommendations and topics
I added this app to test opencode against the ad hoc framework used to create this site. This site is a less-than-straightforward combination of surplus.js with a custom React-style renderer (typed-html) wired in through tsconfig’s jsxFactory option, plus a lot of custom scaffolding to combine them into one. Since Project 2025 was on my reading list, but I really didn’t want to read the thing, I figured I could write an app to analyze/explore it instead. Predictably, I spent far more time on the tool than it would have taken to read the actual thing, but oh well. Still had fun.
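For reference, the tsconfig mechanism involved looks something like this. typed-html’s documentation suggests `elements.createElement` as the factory; the exact wiring this site uses may differ:

```jsonc
{
  "compilerOptions": {
    // Compile JSX through a custom factory instead of React's,
    // so <div>…</div> becomes elements.createElement("div", …).
    "jsx": "react",
    "jsxFactory": "elements.createElement"
  }
}
```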
Difficulties with surplus.js and writing valid code
I really liked surplus.js when I found it, but the author got bored with it or got busy with life. It wasn’t actively maintained and never attracted an audience like other libraries did. It’s similar to React in the way Solid is similar: both have JSX, but really different lifecycle and tracking mechanics. The first pass from opencode was kind of rough, like 80–90% working. General stuff was fine, but the built-in React assumptions messed other stuff up. ref in particular tended to generate broken code, since Surplus’s ref is far less featureful than React’s. It got a little better after I added some warnings about this to a skill, but the results are still less than ideal.
Consistent styling with the rest of website
The first version missed a lot of styling details: themes, button treatments, background choices. I’ve started adding some information to skills, but it really needs a lot more spelled out before we can get things that match the rest of the site.
Challenges of analyzing the Project 2025 document
My first attempt at analyzing the Project 2025 document went poorly. The model defaulted to a high-level overview and missed a lot of topics and recommendations. I split the document into 10-page chunks and asked it to focus on each chunk in turn. That helped it perform a more thorough analysis, but really increased usage: I was surprised that these chunks plus a directive to think hard meant 4–5 chunks could blow through the Claude hourly limit. I ended up needing to scrap the first analysis because it was inconsistent and incomplete, with the usage limits kicking in at inappropriate times. Dialing down “ultra think” helped a bit on the next run.
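The chunking step itself is simple. A minimal sketch, assuming the document has already been split into an array of per-page strings (the 10-page size matches what I used; everything else is illustrative):

```typescript
// Split an array of page texts into fixed-size chunks so each
// analysis pass sees only a small, focused slice of the document.
function chunkPages(pages: string[], size = 10): string[][] {
  const chunks: string[][] = [];
  for (let i = 0; i < pages.length; i += size) {
    chunks.push(pages.slice(i, i + size));
  }
  return chunks;
}
```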
A lot of cleanup work needed
It was pretty impressive to see what came out of the first analysis: lots of interesting recommendations and topics. There were various issues, though. We never quite figured out page mapping, so there are lots of off-by-one errors. They can probably be fixed, but I’m a little worried the offsets aren’t consistent.
The topics had a problem with an invented category (meta) that captured things like titles and footnotes. That was easy to remove, but the recommendations ended up worse: the page splitting caused the model to form multiple recommendations per paragraph. Fortunately, this was resolved by pointing out the problem and asking the LLM to compare recommendations that shared the same page and paragraph number. This is where the agent style shines: it writes some Python code to programmatically extract content so it can present it to the LLM as a single body, which significantly improves the success rate on tasks like this.
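The grouping half of that dedup pass can be sketched roughly like this. The agent’s real script was Python against its own data format; this TypeScript version, including the `Rec` shape, is purely illustrative:

```typescript
// A recommendation as extracted from the analysis, keyed by where
// it appeared in the document. (Hypothetical shape for illustration.)
interface Rec { page: number; paragraph: number; text: string; }

// Bucket recommendations by page + paragraph so entries that chunk
// boundaries split into duplicates end up in the same group.
function groupByParagraph(recs: Rec[]): Map<string, Rec[]> {
  const groups = new Map<string, Rec[]>();
  for (const r of recs) {
    const key = `${r.page}:${r.paragraph}`;
    const bucket = groups.get(key) ?? [];
    bucket.push(r);
    groups.set(key, bucket);
  }
  return groups;
}

// Groups with more than one entry are the suspected duplicates that
// get handed back to the LLM to compare and merge.
const suspects = (recs: Rec[]): Rec[][] =>
  [...groupByParagraph(recs).values()].filter((g) => g.length > 1);
```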
Final attempt
I think the second attempt was still a failure. There were a lot of problems mapping pages and paragraphs correctly. To resolve this, paragraph markers should be added directly via <a> tags before analyzing. I made one more go at this.
| Metric | Before | After |
|---|---|---|
| Total Recommendations | 3002 | 2134 |
| Total Topics | 2115 | 2103 |
| Total Topic Tags | 95 | 44 |
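The paragraph-marker idea can be sketched like this, assuming per-page HTML where paragraphs are `<p>` tags; the `p{page}-{n}` id scheme is made up for illustration:

```typescript
// Prefix every paragraph on a page with an <a> anchor carrying a
// stable id, so the model's page/paragraph references can't drift.
function markParagraphs(pageHtml: string, page: number): string {
  let n = 0;
  return pageHtml.replace(/<p>/g, () => `<a id="p${page}-${++n}"></a><p>`);
}
```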
Struggles to perform deep analysis
With a document like this, the goal was to really do a deep inspection of everything. That was challenging: I found the LLM was always trying to summarize and generalize, and was prone to skipping work. One of the first things I did was chunk the main document into 10-page sections to force it to analyze one specific section at a time.
There were many times when it wanted to reach for programmatic analysis techniques, and I had to force it to do a raw reading of the content so the analysis actually came from the LLM. I found this was a much better way to detect topics and tags. I ended up with lots of duplication, but since the LLM loves to generalize, I let it do so as a final cleanup step.

Pillar II is a personnel database
LLM-generated posts
I find the writing of LLMs is still awful. It picks up every worst habit of tech writing, which has been abysmal for years. The space is saturated with resume-padding fluff, and I think that’s seriously impaired the writing quality of these LLMs. I let it draft an initial pass at this post, but I threw it all away in the end.
It loves to namedrop words like “performance” without explaining the actual constraints or what was done to address them.
> Technically, this was an interesting challenge. The full document is quite large, so I had to think carefully about how to structure the data and components for good performance. I ended up using:

> The search implementation was probably the trickiest part.

Search was, in fact, broken and had to be removed.

> The split-pane resizing was also fun to implement — had to handle mouse events carefully to make the drag interaction feel smooth while also updating the layout state reactively.
This is another good example of useless fluff. \<feature\> was fun. “Had to handle mouse events carefully”? Browsers took over most of the drag-related performance work after years of developers getting it wrong. “Feel smooth” is empty prose, too. Is it higher fps, steadier fps, less layout thrash? And why does “careful” handling make it smoother, exactly? It feels like your typical junior/intern writing a Medium article.
What did go well
It’s been a while since I worked on major features for this site, and it’s been pretty hard to adapt back to its mental model when it’s so different from the day job. opencode was a helpful way to get back into the site after so long. I’d forgotten many of the exact places to look for things, so watching the LLM search was helpful. And after it was done, I could use the session to kick-start a skill to improve quality on future requests.
It feels like there’s a lot less friction around adding and improving features. I want to remove the scss dependency and the reliance on live rendering via Chrome, and I’m well on the way to that goal now. I was able to add an sx prop to dynamically generate CSS on the statically typed HTML pages. A couple more tweaks and that dynamic CSS will be inlined into the generated HTML pages.
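The site’s actual sx implementation isn’t shown here, so this is just a guess at the shape of the idea: turn a style object into a generated class name and collect the CSS rules so the build can inline them into the static HTML. Every name below is hypothetical:

```typescript
// Hypothetical sketch of an sx-style prop for a static-site build:
// convert a camelCase style object into a CSS rule, derive a stable
// class name from it, and stash the rule for later inlining.
type Sx = Record<string, string>;

const collected = new Map<string, string>();

function sxToClass(sx: Sx): string {
  const body = Object.entries(sx)
    .map(([prop, value]) =>
      `${prop.replace(/[A-Z]/g, (c) => "-" + c.toLowerCase())}:${value}`)
    .join(";");
  // Toy hash of the rule body, just to get a deterministic class name.
  let h = 0;
  for (const ch of body) h = (h * 31 + ch.charCodeAt(0)) >>> 0;
  const cls = `sx-${h.toString(36)}`;
  if (!collected.has(cls)) collected.set(cls, `.${cls}{${body}}`);
  return cls;
}

// At build time, everything collected would be emitted into a <style>
// tag inlined into the generated page.
const inlineCss = (): string => [...collected.values()].join("\n");
```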
I was also able to add new statically built pages, like these gallery pages for static content folders that have a lot of images (example: roguelike refs gallery).