Executive Summary
A high-traffic learning platform running Symfony 6.0 / PHP 8.1 faced a convergence of end-of-support deadlines, pinned dependencies, and tight coupling, with no room for downtime or disruption. The team rejected a full rewrite and instead pursued incremental modernization: PHP first, Symfony second, with AI applied only to template conversion. An early attempt at broad AI-assisted migration failed, revealing that unconstrained generation produces structurally valid but semantically broken output. The solution was a structured, rules-first prompting approach, versioned like production code, paired with a three-stage validation pipeline. Business logic was migrated manually throughout.
The result: 60% reduction in template migration effort, zero downtime, and delivery in 3–4 weeks against an 8–10 week manual estimate. The sections below walk through the full process.
The Problem: Stable in Production. Calcifying Underneath.
The most dangerous legacy systems aren’t the ones that break. They’re the ones that work, reliably, quietly, every day, while slowly becoming impossible to change. This platform was exactly that.
A high-traffic learning platform. Thousands of active students. A Symfony 6.0 / PHP 8.1 stack that had never had a serious incident and was, by every ops metric, healthy. But the engineering team told a different story in sprint retros: releases were getting longer, not from feature complexity, but from navigation overhead. The architecture had stopped accommodating change.
The symptoms were specific and compounding. When a PDF generation library issued a security patch requiring PHP 8.2 minimum, we spent three days evaluating workarounds just to stay running on 8.1. When a storage library needed upgrading to support a new S3 endpoint format, compatibility ceilings forced us to pin the old version. When a billing integration changed its webhook payload structure, tight coupling between the session handler and form processing meant a change that should have touched two files touched eleven.
These weren’t catastrophic failures. They were friction taxes – small costs paid repeatedly, on every release cycle, that were eroding both team confidence and velocity. You can’t measure that in uptime. You feel it in how long engineers pause before touching something that shouldn’t be complicated.
The Trigger: End-of-Support Turned a Manageable Problem Into a Deadline
Symfony 6.0 reached end-of-support. On its own, that’s an engineering concern. Combined with the specific state of this system, it became a business risk with a deadline attached.
Three things converged simultaneously. First, moving to Symfony 7.3 required PHP 8.2 or later (we targeted 8.4), but the platform was on PHP 8.1. This wasn’t two independent upgrades – it was a coupled migration with a fixed sequencing constraint: PHP first, Symfony second, with integration points tested at each step. Second, two critical dependencies – the PDF certificate generator and the S3 file handler – had already issued updates requiring PHP 8.2+. We were pinning both to older versions just to stay running. Third, the billing integration was planning an API version change that required Symfony 7.x-compatible event handling.
The window for a controlled upgrade, on our own terms, was closing.
For a platform where a student might be mid-exam, mid-certificate-generation, or mid-payment at any given moment, disruption isn’t just a technical failure. It’s a business failure, with support tickets, refund requests, and reputation damage attached. The pressure to move fast had to be held in direct tension with the requirement not to break anything a user would ever notice.
The Decision: Why We Rejected the Rewrite and the Alternative We Seriously Considered
The instinct to rewrite is always tempting. We considered it. We also seriously considered a strangler fig approach, wrapping the legacy system incrementally with new services, before rejecting that too.
Why the full rewrite was rejected
The platform’s billing, session, PDF, and S3 integrations had no spec documentation beyond the code itself and the engineers’ institutional memory. Rebuilding them accurately would have taken at least 3–4 months, with a high probability of regression. Critically, the end-of-support security risk would have persisted throughout the rewrite period, the exact problem we were trying to eliminate.
Why the strangler fig/microservices extraction was deferred
The session handler was the clearest candidate for extraction. It was a structural bottleneck and touched too many core flows. But extraction would have required redesigning authentication flows, rebuilding inter-service communication, and refactoring file handling contracts, an estimated 3–4 months of additional scope on top of the framework upgrade. We deferred this to a post-upgrade architectural phase, where it now has a clean seam to operate from. The sequencing principle: remove the urgent risk first, improve the architecture second. Attempting both simultaneously is how modernization projects lose scope and confidence.
What we chose instead
- Incremental modernization.
- Upgrade within the existing architecture, preserving all business logic.
- PHP and Symfony upgraded in sequenced layers, each independently testable.
- AI applied only to template conversion – the highest-volume, lowest-risk surface.
- Business logic migrated manually with test-driven validation.
The Failure: Our First AI Attempt Made More Work, Not Less
We didn’t start with a working AI pipeline. We started with an obvious approach that failed, and that failure was instructive enough to reframe the entire project.
The initial approach: provide large sections of the legacy codebase to the model with a broad instruction – “convert these PHP templates to Twig, following Symfony conventions.” The outputs were plausible-looking yet consistently wrong, requiring significant time to diagnose.
Four distinct failures appeared in a single template, each one silent, each one breaking something different:
- Variables were dropped without warning.
- Security filters were removed without replacement.
- Method calls were misread.
- Required parameters simply disappeared.
None of these raised an error during conversion or linting. All of them caused runtime failures, some immediately visible, others surfacing only with real user data in production.
The linter caught zero of these. They were structurally valid Twig that was semantically broken. Manually correcting a template like this took longer than rewriting it from scratch, because you first had to identify everything missing, then verify nothing else had been silently changed.
The failure wasn’t due to model capability. It was the absence of constraints. The model had no knowledge of our variable aliasing conventions, filter requirements, component parameter contracts, or include-path structure. Without that context, it produced confident, structurally valid, functionally broken output. That’s the specific failure mode of unconstrained AI generation on a real codebase.
What Actually Worked: Structure Was the Unlock
After the failed attempt, we ran a controlled proof-of-concept on a single module – twelve templates, a similar structure, with clear input/output mappings we could validate manually. The goal wasn’t to convert templates. It was to find the minimum constraint set that made outputs reliable enough to trust.
The difference between the first attempt and the second was entirely in how the prompt was structured.
Two things made this work where the first attempt had failed:
- First, every legacy construct had an explicit mapping rule – the model had no ambiguity to fill with a hallucination.
- Second, the few-shot example established the exact output format for the most common pattern, anchoring all subsequent transformations.
Templates that previously required 20+ minutes of manual correction now needed only a 90-second review.
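The rules-first structure described above can be sketched in a few lines. This is a minimal illustration, not the team's actual prompt: the `MappingRule` class, the `build_prompt` function, the rule wording, and the few-shot pair are all hypothetical, and the Twig target shown for `escape_for_js()` is an assumption. Only the legacy construct names come from the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MappingRule:
    legacy: str  # legacy PHP-template construct
    twig: str    # required Twig replacement
    note: str    # constraint the model must respect


# Hypothetical rule set: construct names are from the article, the wording is not.
RULES = [
    MappingRule("includePartial(path, vars)",
                "{% include %} with explicit context",
                "pass every variable; never drop one silently"),
    MappingRule("escape_for_js($x)",
                "{{ x|e('js') }}",  # assumed Twig target
                "never remove an escaping filter"),
]

# One worked example that fixes the output format for the most common pattern.
FEW_SHOT = (
    "LEGACY:\n<?php echo $course->getTitle() ?>\n"
    "TWIG:\n{{ course.title }}\n"
)


def build_prompt(rules, few_shot, template_source):
    """Assemble a prompt in a fixed order: rules, one example, then the input."""
    lines = [
        "Convert the legacy PHP template to Twig.",
        "Apply ONLY the mapping rules below. If a construct has no rule,",
        "stop and report it instead of guessing.",
        "",
        "RULES:",
    ]
    for i, rule in enumerate(rules, start=1):
        lines.append(f"{i}. {rule.legacy} -> {rule.twig} ({rule.note})")
    lines += ["", "EXAMPLE:", few_shot, "INPUT:", template_source]
    return "\n".join(lines)
```

Keeping the rules as data rather than free text means a new edge case becomes a one-line addition that shows up clearly in a diff.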
Note: Treat Prompts Like Production Code
We version-controlled the prompts in the same repository as the code, with commit messages explaining each rule addition. When a new edge case appeared, a template using a filter we hadn’t mapped, we updated the prompt, added a test fixture for that pattern, and re-ran the affected templates. By the end of the project, the prompt had gone through eleven revisions. Each revision was traceable to a specific class of output failure it was designed to prevent.
If you treat prompts as throwaway inputs, you get throwaway outputs. Versioned, tested, and iterated against real fixtures, they become reliable engineering assets.
The Validation Pipeline: Three Stages Before a Human Touched It
The prompt gave us consistent output. The validation pipeline gave us confidence in that consistency. These are not the same thing. Consistent doesn’t mean correct; it means predictably wrong in diagnosable ways. The pipeline was designed to detect specific failure modes in AI output at each stage.
Stage 1: Syntax Validation and Linting
Automated Twig linting caught unclosed blocks, undefined variables (checked against a known variable registry), and malformed filter syntax. This ran immediately after generation, before any human review, and caught approximately 90% of errors automatically.
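As an illustration of the variable-registry idea, the sketch below flags root variable names in `{{ ... }}` expressions that are missing from a known registry. It is deliberately simplified and is not the team's linter: the function name and regex are hypothetical, and it only inspects output expressions. It would wrongly flag Twig functions and loop-local variables, which a real check built on Twig's own parser would handle.

```python
import re

# Matches the leading identifier of a Twig output expression,
# e.g. "user" in {{ user.name|e }}.
OUTPUT_VAR = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)")


def undefined_variables(twig_source, registry):
    """Return root names used in {{ ... }} that are not in the registry."""
    used = set(OUTPUT_VAR.findall(twig_source))
    return sorted(used - set(registry))
```

A non-empty result is a signal for the refinement stage, not proof of a bug; the registry itself has to be maintained per template or per module.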
Stage 2: Automated Refinement Loop
Detected errors were fed back into refinement prompts with specific error context and line numbers. Most templates were resolved in one refinement cycle. The loop was not open-ended: we set a hard ceiling of two cycles, after which a template was escalated to a human, specifically to avoid burning time on templates the model was genuinely struggling with.
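The bounded loop can be expressed compactly. In this sketch `lint` and `regenerate` are placeholder callables standing in for the Stage 1 linter and the model call; their names and signatures are assumptions made for illustration. Only the two-cycle ceiling comes from the text.

```python
MAX_REFINEMENT_CYCLES = 2  # hard ceiling described in the article


def refine(template, lint, regenerate, max_cycles=MAX_REFINEMENT_CYCLES):
    """Alternate lint and regenerate until clean or the ceiling is hit.

    lint(template) returns a list of error strings (empty means clean).
    regenerate(template, errors) returns a new candidate template.
    Returns (template, status, cycles_used), where status is "clean"
    or "escalate".
    """
    errors = lint(template)
    cycles = 0
    while errors and cycles < max_cycles:
        template = regenerate(template, errors)
        cycles += 1
        errors = lint(template)
    status = "escalate" if errors else "clean"
    return template, status, cycles
```

Returning the cycle count alongside the status makes it straightforward to report how many templates needed zero, one, or two cycles.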
Stage 3: Integration Testing
Corrected output was validated against production-like data in a parallel staging environment. This is not the same as synthetic test data. We used real user data snapshots because enrollment-state, permission-state, and progress-state edge cases are almost invisible in synthetic fixtures and appear immediately with real data. 95% of templates matched original system behavior exactly.
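One way to compare legacy and migrated rendering across data snapshots is sketched below. It assumes both stacks can render the same snapshot to HTML and that whitespace-only differences are acceptable; both are assumptions, as are the function names. Collapsing whitespace between tags can hide real differences in inline layouts, so treat this as a first-pass filter only.

```python
import re


def normalize(html):
    """Collapse whitespace so formatting-only differences are ignored."""
    collapsed = re.sub(r"\s+", " ", html)
    return re.sub(r">\s+<", "><", collapsed).strip()


def mismatches(legacy_pages, migrated_pages):
    """Return snapshot keys whose rendered output differs after normalization.

    Both arguments map a snapshot key (for example a user state) to the
    HTML rendered for it. A key missing from migrated_pages is a mismatch.
    """
    return sorted(
        key
        for key, legacy_html in legacy_pages.items()
        if normalize(legacy_html) != normalize(migrated_pages.get(key, ""))
    )
```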
The distribution of what reached manual review: roughly 70% of templates cleared the pipeline without escalation, 25% required one refinement cycle, and about 5%, primarily templates with inline component logic or non-standard filter chains, were routed directly to a developer.
Manual review time per template dropped from approximately 8 minutes to under 2 minutes because developers no longer triaged structural errors. They were reviewing logic, the only work that actually requires a human.
What the pipeline didn’t catch: The 5% behavioral difference cases.
These were almost entirely in templates with conditional rendering based on user enrollment state or certificate generation status, paths where real user data produced rendering flows that synthetic fixtures didn’t cover. This is why production-like data snapshots in staging aren’t optional.
The Specific Conversions That Drove the 60% Effort Reduction
The bulk of the conversion effort was concentrated in a small number of recurring patterns. Once these were mapped, encoded as rules, and illustrated with few-shot examples in the prompt, the AI handled them reliably at scale.
Six pattern classes covered the majority of the workload.
- Partial includes: `includePartial(path, vars)` converted to Twig’s `{% include %}` with explicit context passing.
- Inline output: `echo $var` rendering flattened into `{{ variable }}`. Together with partial includes, this required a path mapping table and getter chain rules to prevent silent variable drops.
- Component includes: `include_component()` became `{{ render(controller(…)) }}`, where the critical constraint was zero silent parameter drops.
- Control flow: `if` and `foreach` blocks mapped cleanly to their Twig equivalents, but method-call conditions inside them required explicit pre-mapping before the prompt could handle them reliably.
- Authentication context: `$sf_user` required a dedicated rule ensuring `app.user` flowed into every downstream include, not just the top-level template.
The security-relevant filter mappings, particularly `escape_for_js()`, were the highest-risk conversion class. An AI that silently drops an output filter produces a template that renders correctly in staging with controlled data and creates an XSS vector in production with user-controlled input. These were given explicit rules and a dedicated linting check for filter preservation. We did not rely on the model remembering this from the system prompt alone. For anything with a security implication, verify mechanically; don’t trust the prompt.
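A filter-preservation check can be as simple as counting. The sketch below compares how often a legacy escaping helper appears in the PHP source with how often its Twig equivalent appears in the converted output. The mapping of `escape_for_js()` to Twig's `e('js')` / `escape('js')` filter is an assumption made for illustration, and `ESCAPE_MAP` and `dropped_filters` are hypothetical names; the article does not describe how the team's check worked.

```python
import re

# Assumed mapping from legacy helper name to a pattern for its Twig equivalent.
# Only escape_for_js() is named in the article; the Twig target is an assumption.
ESCAPE_MAP = {
    "escape_for_js": re.compile(r"\|\s*(?:e|escape)\(\s*['\"]js['\"]\s*\)"),
}


def dropped_filters(legacy_php, twig_out):
    """Report helpers used more often in the legacy source than in the output.

    Returns {helper: (count_before, count_after)} for each shortfall.
    """
    problems = {}
    for helper, twig_pattern in ESCAPE_MAP.items():
        before = len(re.findall(rf"\b{re.escape(helper)}\s*\(", legacy_php))
        after = len(twig_pattern.findall(twig_out))
        if after < before:
            problems[helper] = (before, after)
    return problems
```

A count comparison cannot tell which call lost its filter, only that one did, but that is enough to block a template from proceeding without review.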
Where We Drew the Line: What AI Didn’t Touch
The clearest engineering decision in this project was defining what AI should not touch. Templates were safe because correctness can be verified mechanically; the output either renders correctly against known data, or it doesn’t. Business logic is different.
The platform’s assessment engine, billing event handlers, and certificate generation logic contained rules encoded over years of edge cases, support tickets, and compliance requirements. Many had no documentation beyond the code itself and the engineers’ institutional memory. Allowing AI to transform these would have produced structurally modern code that was semantically wrong in ways that might not surface until a student’s grade was miscalculated or a certificate was issued for an incomplete course.
We applied test-driven manual migration to all business logic: write integration tests against the legacy behavior first, migrate manually, and verify that tests pass. This is slower than AI-assisted conversion. It is the correct tradeoff when the failure mode of being wrong is a compliance or revenue event.
Where we were probably too conservative: Test case scaffolding.
AI is reasonably capable of generating plausible test fixtures from method signatures and class contracts — not as a substitute for understanding business rules, but as a starting point that a developer then reviews and amends. We wrote all test fixtures manually. In retrospect, using AI to scaffold fixture structures, with engineers writing the behavioral assertions, would have been a reasonable scope expansion. We’d apply that in the next engagement.
The PHP → Symfony Sequencing Wasn’t Arbitrary
Symfony 7.3 requires PHP 8.2 minimum, and we were targeting 8.4. This created a forced constraint: PHP upgrade first, Symfony upgrade second. Running both simultaneously would have made it impossible to isolate whether a failure was a PHP compatibility issue or a Symfony breaking change.
In practice: PHP 8.1 → 8.4, with deprecation warnings surfaced and resolved; full regression testing on staging; then Symfony 6.0 → 7.3 with a targeted breaking-change audit.
The Symfony 7 breaking changes that required the most targeted work: the removal of the `AbstractController::getDoctrine()` shortcut (replaced with direct injection); changes to nullable request payloads in form handling that affected several enrollment forms; and the session persistence integration with HttpFoundation. The last one was the most impactful because it was entangled with the legacy session handler we’d deferred for so long – upgrading Symfony forced the refactoring we’d been avoiding, which was ultimately the right outcome.
Outcomes
| Component | Before | After |
|---|---|---|
| Symfony | 6.0 | 7.3 |
| PHP | 8.1 | 8.4 |
| Templating | PHP Templates | Twig |
| Template Helpers | Ad-hoc Functions | Reusable Twig Layer |
- 60% reduction in template migration effort vs. manual conversion baseline
- 75% fewer manual edits post-automation, with validation gates in place
- Zero downtime, zero data loss, zero user-facing disruption
- 3–4 weeks delivery, against an 8–10 week manual estimate
- End-of-support risk eliminated, security posture restored, S3/PDF/billing integrations realigned
Engineering Takeaways
1. Start with a failure budget, not a success target.
Define your acceptable failure rate before deploying AI on any conversion task. Ours was: if more than 20% of templates in a batch require significant manual correction, stop and tighten the constraints before continuing.
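A failure budget is easy to enforce as a gate between batches. This sketch assumes each template in a batch has already been marked as needing significant manual correction or not; how that judgment is made is outside the sketch, and the function name is hypothetical. The 20% threshold is the one stated above.

```python
FAILURE_BUDGET = 0.20  # stop if more than 20% of a batch needs significant correction


def batch_within_budget(needs_correction, budget=FAILURE_BUDGET):
    """Check one batch against the failure budget.

    needs_correction is a list of booleans, one per template.
    Returns (within_budget, observed_rate).
    """
    if not needs_correction:
        raise ValueError("empty batch")
    rate = sum(needs_correction) / len(needs_correction)
    return rate <= budget, rate
```

When the first value is `False`, the next step is to tighten the prompt constraints rather than start the next batch.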
2. Build the example library before writing the prompt.
Few-shot examples were the single most impactful prompt component. If we ran this again, we’d spend the first day building a curated library of ten to fifteen conversion examples – the tricky ones, not the easy ones – before writing a single transformation prompt.
3. Treat prompts like production code.
Version-controlled, commit-messaged, and iterated against test fixtures. Our prompt went through eleven revisions. Each was traceable to a specific class of output failure it was designed to prevent.
4. Security-critical transformations need independent mechanical verification.
Output-escaping filters cannot rely on the model remembering to preserve them. They need a dedicated linting check. Don’t trust system prompt instructions for anything with a security implication; verify it separately.
5. Measure AI value in developer-hours freed, not tasks completed.
The right metric is how much human attention was reallocated from mechanical work to judgment work. Our engineers spent the migration reviewing business logic and edge-case templates, work that actually required them. That’s the correct outcome.
6. Use production-like data in staging, not synthetic fixtures.
5% of our templates produced subtle behavioral differences that only appeared with real user data. Synthetic test data systematically misses enrollment-state and permission-state edge cases. If your staging environment doesn’t have production-like data, you’ll discover these in production instead.
Engineering Q&A
Why didn’t you use AI to migrate the business logic?
Because the correctness of business logic can’t be verified mechanically the way template rendering can. A template either renders correctly or it doesn’t, and you can tell by rendering it with known data. Business logic can pass every test you wrote before you understood the edge cases, and still be wrong in production for a specific user state. The failure mode of AI getting this wrong is a grade error or a billing error, and both are compliance events. We used test-driven manual migration and accepted the slower timeline in exchange for verifiable confidence.
Why stay on Symfony instead of switching frameworks?
The team had years of experience with Symfony. The existing S3, billing, and PDF integrations were written against Symfony’s service container and event dispatcher. Switching frameworks would have required rebuilding all of these from scratch, retraining the team, and extending the timeline by months, all while the end-of-support risk persisted. Symfony 7.3 was the fastest path to eliminating the security risk while preserving engineering continuity.
What was the rollback strategy?
We maintained parallel environments throughout: the modernized stack and legacy stack were connected to the same database in read-only mode during staging validation. Rollback was a routing change, not a deployment. All database migrations were written as reversible from the start. In production, we ran the modernized stack with feature-flag gating on the new session handling and template engine until we had 24 hours of clean production data. The full rollback window was under 5 minutes.
Which Symfony 7 breaking changes required the most work?
Three areas required targeted work: removal of the `AbstractController::getDoctrine()` shortcut; changes to nullable request payloads in form handling; and the session persistence changes in HttpFoundation. The security voter API changes were minimal in our case. The HttpFoundation session changes were the most impactful because they were entangled with the legacy session-coupled flows we had long deferred, and in hindsight, the upgrade forcing that refactoring was the right outcome.
Could AI have helped with test fixtures?
Yes, and we were probably too conservative here. AI can generate reasonable test fixture structures from method signatures and class contracts. We’d use it in the next engagement to scaffold fixture libraries, with engineers writing the behavioral assertions. The combination of AI for structure and humans for edge-case logic is the right division of labor.
Did AI replace developers on this project?
No. AI handled repetitive, mechanically verifiable work. Developers handled business logic, integration validation, edge-case review, and every decision with compliance or security implications. The goal was not to replace engineering judgment but to stop wasting it on work that doesn’t require it.