Notes on vibe coding 5

How I fixed a problem with my AI-generated code

Sand Castle (Daniel Ciucci / Unsplash)

I'm still refining my blog migration code, which is written entirely by AI. In this vibe coding experiment, I refrain from meddling with the AI-generated code.

In the previous update (link), the initial script hit several snags, and I covered the URL matching mystery.

The other big problem I had to contend with concerned the file type of scraped images. Recall from Notes 1 that the Typepad export process does not save images. So I asked GPT-5 to produce a script that reads the Typepad export file, flags every image tag, catalogs and labels each image, and then "clicks" on every link to download the images.
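For readers curious about what such a script might look like, here is a minimal sketch of the catalog-and-download idea. It is not the AI-generated script. It assumes the posts have already been split out of the export file into hypothetical (title, html_body) pairs, because parsing the export format itself is skipped. The column names and the local naming scheme (post number, image number, original file name) are also illustrative assumptions.

```python
import csv
import io
import os
from html.parser import HTMLParser
from urllib.parse import urlparse
from urllib.request import urlretrieve

# Hypothetical spreadsheet columns.
FIELDS = ["post_index", "post_title", "image_index", "image_url", "local_name"]


class ImgCollector(HTMLParser):
    """Collects the src attribute of every <img> tag in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:  # skip <img> tags without a src
                self.srcs.append(src)


def catalog_images(posts):
    """posts: list of (title, html_body) pairs. Returns one labeled row per image."""
    rows = []
    for post_idx, (title, body) in enumerate(posts, start=1):
        collector = ImgCollector()
        collector.feed(body)
        for img_idx, src in enumerate(collector.srcs, start=1):
            # Last path segment of the URL, ignoring any query string.
            base = os.path.basename(urlparse(src).path) or "image"
            rows.append({
                "post_index": post_idx,
                "post_title": title,
                "image_index": img_idx,
                "image_url": src,
                "local_name": f"{post_idx:04d}-{img_idx:02d}-{base}",
            })
    return rows


def catalog_to_csv(rows):
    """Render the catalog as CSV text, i.e. the 'spreadsheet'."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def download_all(rows, dest_dir):
    """'Click' every cataloged link. Saves whatever bytes the server returns."""
    os.makedirs(dest_dir, exist_ok=True)
    for row in rows:
        urlretrieve(row["image_url"], os.path.join(dest_dir, row["local_name"]))
```

Note that `download_all` trusts the server. It saves whatever comes back under an image-looking name, which is exactly how a non-image response can slip through unnoticed.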

I was highly satisfied with the result. I had avoided a lot of tedious work, such as figuring out all the different ways Typepad may store images.

I was not that surprised to discover edge cases that neither I nor the AI coder had anticipated. In this case, Typepad sometimes serves up HTML pages with embedded image links instead of the image files themselves. I'm not sure why Typepad sometimes uses this method. The difference would have been invisible to readers because the images load automatically.

However, in such cases, the image-downloading script saved an HTML file instead of an image file. I discovered this when some of the migrated posts displayed broken image links. Ghost does not expect to find HTML files in an image block and cannot display them.
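One way to catch these impostors is to ignore the file name and look at the first few bytes of each downloaded file. Real images begin with well-known signatures, while an HTML page begins with markup. The sketch below assumes the downloads sit in a single local folder and checks only JPEG, PNG, GIF, and WebP. It is an illustration, not the fix the AI coder actually implemented.

```python
from pathlib import Path

# Leading bytes ("magic numbers") of common image formats.
SIGNATURES = {
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
}


def sniff(data: bytes) -> str:
    """Guess a file's real type from its leading bytes."""
    for magic, kind in SIGNATURES.items():
        if data.startswith(magic):
            return kind
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    # Skip a UTF-8 byte-order mark and whitespace before checking for markup.
    head = data.lstrip(b"\xef\xbb\xbf \t\r\n").lower()
    if head.startswith((b"<!doctype html", b"<html", b"<head", b"<body")):
        return "html"
    return "unknown"


def find_html_impostors(folder):
    """List files in a (hypothetical) download folder that are really HTML pages."""
    impostors = []
    for path in sorted(Path(folder).iterdir()):
        if path.is_file():
            with path.open("rb") as fh:
                if sniff(fh.read(512)) == "html":
                    impostors.append(path.name)
    return impostors
```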


In modern coding, we expect to be able to "roll back" changes. In this case, I was hoping to undo the upload of images that included those useless HTML files. That would clear the way to redo the upload after cleaning up the HTML files.

That would be too easy! Ghost does not provide a method to delete files. According to GPT, if a file is uploaded twice, the first file is still present while the second file is renamed "file-2".

Because the corrected images have different file extensions from the HTML files, I could work around this restriction: I left the HTML files stranded on the server, never to be used.


I had yet to find the real image files associated with those HTML files.

A little foresight proved crucial at this stage of vibe coding. In my first request, I asked for (and received) a spreadsheet that documents key information such as post index and title, image index and title, and so on. From this spreadsheet, it was quick to find all the posts that contained HTML files. The AI coder then implemented a number of ways to grab the underlying images, stripping away the HTML wrapper.
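As a sketch of what "grabbing the underlying image" could involve, the code below shows two of many possible strategies. It first looks for an `og:image` meta tag and otherwise falls back to the first `<img src>` on the page. Relative links are resolved against the wrapper page's URL. It also filters the catalog for affected posts, assuming a hypothetical `file_type` column in the spreadsheet. Whether Typepad's wrapper pages actually carry an `og:image` tag is an assumption, not something the source confirms.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class WrapperParser(HTMLParser):
    """Records an og:image meta tag and every <img src> in an HTML wrapper page."""

    def __init__(self):
        super().__init__()
        self.og_image = None
        self.img_srcs = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and a.get("property") == "og:image" and a.get("content"):
            self.og_image = self.og_image or a["content"]  # keep the first one
        elif tag == "img" and a.get("src"):
            self.img_srcs.append(a["src"])


def underlying_image_url(html_text, page_url):
    """Best guess at the real image behind a wrapper page, or None."""
    parser = WrapperParser()
    parser.feed(html_text)
    candidate = parser.og_image or (parser.img_srcs[0] if parser.img_srcs else None)
    return urljoin(page_url, candidate) if candidate else None


def posts_needing_repair(rows):
    """rows: catalog dicts with a hypothetical 'file_type' column."""
    return sorted({(r["post_index"], r["post_title"]) for r in rows if r["file_type"] == "html"})
```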

After retrieving these images, I packaged them up and uploaded them to Ghost.

As mentioned above, I'd much prefer to undo and redo the upload. That would ensure that the number of image files on the Ghost server exactly matches the total number of images in my blog posts. With my workaround, the server holds more files than expected, because a subset of images exists twice: once as an HTML file and once as an image file.
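The bookkeeping here is simple set arithmetic: files on the server minus files the posts reference equals dead files. The sketch below shows that reconciliation, plus a way to spot the HTML/image pairs that share a base name. It assumes you already have both lists of file names. How to obtain a listing of what is on the Ghost server is left open, and the file names in the usage example are made up.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def reconcile(catalog_names, server_names):
    """Compare the files the posts should use with the files actually on the server."""
    expected = set(catalog_names)
    on_server = set(server_names)
    return {
        "expected": len(expected),
        "on_server": len(on_server),
        "dead": sorted(on_server - expected),     # stranded, never referenced
        "missing": sorted(expected - on_server),  # referenced but not uploaded
    }


def duplicate_pairs(server_names):
    """Group server files by base name to expose pairs like photo.html / photo.jpg."""
    by_stem = defaultdict(list)
    for name in server_names:
        by_stem[PurePosixPath(name).stem].append(name)
    return {stem: sorted(names) for stem, names in by_stem.items() if len(names) > 1}
```

With a catalog of `a.jpg`, `b.png`, `c.gif` and a server holding those three plus `b.html`, `reconcile` reports 3 expected files, 4 on the server, and `b.html` as dead. `duplicate_pairs` groups `b.html` with `b.png`.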

If this were the only patch, the impact would be light. The risk lies in patches accumulating as more issues are discovered. The server becomes more and more bloated with "dead" files, which I'm not allowed to remove.


Meanwhile, code and scripts are also piling up. All of the above steps were accomplished using AI-generated code.

The same principle of hygiene applies. A cleaner process would involve just one master script to which I add handlers for edge cases. My hands are tied because Ghost does not let me overwrite anything. It treats a post the same way as an image: if I upload another post with the same name, the first one stays put while the second one is given a new name. Instead of regenerating everything, I end up repairing bits and pieces.
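Here is a minimal sketch of the "one master script with edge-case handlers" idea. Each handler is a small function registered with a decorator, and every downloaded asset passes through all handlers in order. The `Asset` class, the `edge_case` decorator, and both example handlers are hypothetical illustrations. The HTML check only flags the problem; a real handler would fetch the underlying image.

```python
import os
from dataclasses import dataclass, field
from typing import Callable, List
from urllib.parse import urlparse


@dataclass
class Asset:
    """A downloaded file moving through the pipeline (hypothetical structure)."""
    url: str
    data: bytes
    name: str = ""
    notes: List[str] = field(default_factory=list)


Handler = Callable[[Asset], Asset]
HANDLERS: List[Handler] = []


def edge_case(fn: Handler) -> Handler:
    """Register a handler. Handlers run in the order they were registered."""
    HANDLERS.append(fn)
    return fn


@edge_case
def derive_name(asset: Asset) -> Asset:
    # Use the last URL path segment, ignoring any query string.
    if not asset.name:
        asset.name = os.path.basename(urlparse(asset.url).path) or "unnamed"
    return asset


@edge_case
def flag_html_wrapper(asset: Asset) -> Asset:
    # Hypothetical check: the server returned an HTML page instead of an image.
    if asset.data.lstrip().lower().startswith((b"<!doctype html", b"<html")):
        asset.notes.append("html-wrapper: needs unwrapping")
    return asset


def run_pipeline(asset: Asset) -> Asset:
    """Pass one asset through every registered handler."""
    for fn in HANDLERS:
        asset = fn(asset)
    return asset
```

The appeal of this design is that a newly discovered edge case becomes one more decorated function, not one more standalone script. It does not remove the Ghost constraint, though. Rerunning the whole pipeline only helps if the results can replace what is already on the server.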

My spreadsheet summarizing all the posts and images has been a lifesaver throughout this process. At any time, it gives me a snapshot of everything.

But a serious flaw would soon emerge. Stay tuned.