Day 4–5 — From Scripts to a Working System

By the end of Day 3, I had something that technically worked. I could pull a work item from Azure DevOps, construct a prompt, send it to OpenClaw, and get a response back. On paper, that sounds like a complete loop. In reality, it was fragile and inconsistent.

Day 4 and Day 5 were focused on one thing: making the system actually usable.

The biggest challenge at this stage was communication with OpenClaw. There was no clean API to integrate with, so I was relying on automating interactions through the control UI. This introduced a new class of problems that had nothing to do with logic and everything to do with timing and reliability.

Simple actions like clicking buttons or capturing responses were not guaranteed to work every time. There were delays, elements that were not ready when the script reached them, and occasional timeouts. It became clear very quickly that even if the overall architecture was sound, the system would fail without proper handling of these edge cases.

This led to the introduction of basic stabilization strategies. I started thinking in terms of retries, wait conditions, and ensuring that each step completed successfully before moving to the next. Instead of assuming success, the system now had to verify it.
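To make the "verify, don't assume" idea concrete, here is a minimal sketch of a wait condition and a retry wrapper. It is not the code from this project: the names `wait_until`, `run_step`, and `StepFailed`, along with the default timeouts and attempt counts, are all hypothetical and chosen only for illustration.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class StepFailed(Exception):
    """Raised when a step never reaches its verified state (hypothetical name)."""


def wait_until(condition: Callable[[], bool], timeout: float = 10.0,
               interval: float = 0.25) -> bool:
    """Poll `condition` until it returns True or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    return condition()  # one final check at the deadline


def run_step(action: Callable[[], T], verify: Callable[[T], bool],
             attempts: int = 3, backoff: float = 0.5) -> T:
    """Run `action`, confirm the result with `verify`, and retry on failure."""
    last_error: Optional[Exception] = None
    for attempt in range(1, attempts + 1):
        try:
            result = action()
            if verify(result):
                return result
            last_error = StepFailed(f"verification failed on attempt {attempt}")
        except Exception as exc:  # UI automation can fail in many different ways
            last_error = exc
        if attempt < attempts:
            time.sleep(backoff * attempt)  # linear backoff between attempts
    raise StepFailed(f"gave up after {attempts} attempts") from last_error
```

In this shape, a UI step such as "click send, then read the reply" would pass the click-and-read logic as `action` and a check like "the reply is non-empty" as `verify`, so the next step only runs once the previous one has demonstrably succeeded.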

At the same time, I focused on closing the loop with Azure DevOps. It was not enough to just generate a response from OpenClaw. That response needed to be persisted back into the work item in a meaningful way. This meant posting comments that captured the AI’s reasoning, approach, and proposed changes.
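As a rough illustration of what persisting a response can look like, the sketch below builds a request for the Azure DevOps work item comments REST endpoint using only the standard library. Several things here are assumptions rather than details from this project: authentication with a personal access token, the `7.0-preview.3` API version (check which version your organization supports), HTML rendering of the comment body, and the helper names `format_comment` and `build_comment_request`.

```python
import base64
import html
import json
import urllib.parse
import urllib.request


def format_comment(reasoning: str, approach: str, proposed_changes: str) -> str:
    """Render the three parts of the AI response as simple HTML sections."""
    sections = [
        ("Reasoning", reasoning),
        ("Approach", approach),
        ("Proposed changes", proposed_changes),
    ]
    return "".join(
        f"<h3>{title}</h3><p>{html.escape(body)}</p>" for title, body in sections
    )


def build_comment_request(organization: str, project: str, work_item_id: int,
                          text: str, pat: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that adds a work item comment."""
    url = (
        "https://dev.azure.com/"
        f"{urllib.parse.quote(organization, safe='')}/"
        f"{urllib.parse.quote(project, safe='')}"
        f"/_apis/wit/workItems/{work_item_id}/comments"
        "?api-version=7.0-preview.3"  # assumed version
    )
    # A PAT is sent as HTTP Basic auth with an empty user name.
    token = base64.b64encode(f":{pat}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=json.dumps({"text": text}).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
    )
```

Sending the request is then a matter of passing it to `urllib.request.urlopen` and checking the status code. Keeping construction separate from sending makes the comment format easy to inspect without touching a real work item.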

This was a turning point. For the first time, the system was not just running in isolation. It was interacting with a real workflow and leaving behind artifacts that could be reviewed and iterated on.

One of the more interesting realizations during this phase was how different this felt from traditional scripting. Writing the scripts themselves was not the difficult part. The challenge was designing a sequence of steps that could reliably execute in an unpredictable environment.

It also became clear that I was working with a system that had multiple points of failure. The Azure DevOps query could fail. The prompt construction could be incomplete. The UI automation could break. The response capture could fail. Each of these needed to be accounted for.
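One way to account for each failure point is to treat the flow as a list of named stages and record which one failed. The sketch below is illustrative only: `StageResult`, `run_pipeline`, and the stub stages are hypothetical stand-ins, not the actual scripts.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Tuple

Stage = Tuple[str, Callable[[Any], Any]]


@dataclass
class StageResult:
    stage: str
    ok: bool
    value: Any = None
    error: Optional[str] = None


def run_pipeline(stages: List[Stage], initial: Any = None) -> List[StageResult]:
    """Run stages in order, feeding each output to the next; stop at the first failure."""
    results: List[StageResult] = []
    value = initial
    for name, func in stages:
        try:
            value = func(value)
        except Exception as exc:
            results.append(
                StageResult(name, False, error=f"{type(exc).__name__}: {exc}")
            )
            break  # later stages depend on this one, so do not continue
        results.append(StageResult(name, True, value=value))
    return results


# Stub stages standing in for the real steps (all hypothetical).
def fetch_work_item(_: Any) -> dict:
    return {"id": 101, "title": "Fix login timeout", "description": ""}


def build_prompt(item: dict) -> str:
    if not item.get("description"):
        raise ValueError("work item has no description")
    return f"{item['title']}\n\n{item['description']}"


if __name__ == "__main__":
    for result in run_pipeline([("fetch", fetch_work_item), ("prompt", build_prompt)]):
        print(result.stage, "ok" if result.ok else f"failed: {result.error}")
```

Because every result carries the stage name, a failed run says where it broke, for example at prompt construction rather than UI automation, instead of surfacing as a generic error at the end.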

Despite all of this, by the end of Day 5, I had something that felt significantly more real. I could trigger the process, watch it pick up a work item, send it through OpenClaw, and see the output appear back in Azure DevOps. It was not perfect, but it was consistent enough to start building on.

This phase also reinforced an important shift in mindset. I was no longer thinking in terms of individual scripts. I was thinking in terms of systems, dependencies, and failure handling. The focus had moved from “can this work” to “how do I make this reliable.”

The next step is to move beyond basic functionality and start evaluating the quality of the output. Now that the system can execute, the question becomes: how well does it actually perform as a developer?