Claude Code and other AI tools are changing how organizations approach mainframe modernization. They promise faster analysis, automatic code generation, and less manual work.
Two main approaches are emerging: AI-based code translation and deterministic, rule-based transformation.
Modernization can seem simple: analyze, translate, and deploy. In large enterprise systems, it is more complex. Small differences in logic, data handling, or transactions can cause errors. Because of this, testing and validation often take most of the time and cost.
We conducted a technical evaluation of Claude Code for COBOL modernization. The purpose was not to judge surface-level Java quality, but to test whether the generated code was complete, runnable, and suitable for enterprise use.
Our testing was done in two stages. First, we asked Claude Code to translate the published NIST COBOL test suite; on these well-known programs, the translation succeeded.
We then repeated the NIST test after removing comments, renaming variables and paragraph names, and removing obvious identifiers. After these changes, the results were very different: important sections of code were missing, and none of the programs ran successfully.
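For illustration, the obfuscation step can be approximated with a small script like the following. This is a simplified sketch, not our actual tooling: real COBOL renaming must also handle copybooks, data division levels, and the full reserved-word list.

```python
import re

def obfuscate_cobol(source: str) -> str:
    """Strip comment lines and rename identifiers in a COBOL listing.

    Illustrative only: the reserved-word set below is a tiny subset of
    real COBOL, and fixed-format column rules are handled naively.
    """
    kept = []
    for line in source.splitlines():
        # Fixed-format COBOL marks comment lines with '*' in column 7.
        if len(line) >= 7 and line[6] == "*":
            continue
        kept.append(line)
    text = "\n".join(kept)

    # Rename user-defined names to opaque tokens (VAR-0000, VAR-0001, ...).
    reserved = {"IDENTIFICATION", "DIVISION", "PROGRAM-ID", "PROCEDURE",
                "MOVE", "TO", "DISPLAY", "STOP", "RUN", "PERFORM"}
    mapping: dict[str, str] = {}

    def rename(match: re.Match) -> str:
        word = match.group(0)
        if word.upper() in reserved:
            return word
        if word not in mapping:
            mapping[word] = f"VAR-{len(mapping):04d}"
        return mapping[word]

    # COBOL words: letters, digits, and hyphens, starting with a letter.
    return re.sub(r"\b[A-Za-z][A-Za-z0-9-]*\b", rename, text)
```

The point of the transformation is that program behavior is unchanged while every textual clue a model might have memorized is removed.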
This is a key finding. It shows that benchmark results do not reflect real-world COBOL systems. Success on known test sets does not guarantee correct or complete translation of real applications.
For large organizations, this changes the economics of the project. AI-based translation may appear low-cost at the start. However, if the output is incomplete, the real cost moves into manual review, debugging, code completion, and testing.
In large banking, insurance, or government systems, this can become a major cost and delivery risk. The key question is not whether code can be generated, but whether it is complete and stable enough for controlled testing and migration.
Any client considering AI-led COBOL modernization should run its own structured test. This should not rely only on benchmark programs. It should include real application code and representative system patterns.
At minimum, a client evaluation should include:

- real application code, not only benchmark programs;
- representative system patterns;
- measurement of completeness and of the ability to compile and run;
- measurement of the manual correction required before the system can be tested in parallel with the legacy system.
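The measures above could be collected with a small harness along these lines. The field names and structure are illustrative assumptions, not a prescribed schema:

```python
from dataclasses import dataclass

@dataclass
class ProgramResult:
    """Outcome of translating one COBOL program (illustrative fields)."""
    name: str
    statements_in: int     # statements in the COBOL source
    statements_out: int    # statements recognisably carried into the output
    compiles: bool
    runs: bool
    manual_fix_lines: int  # lines hand-edited before parallel testing

def summarize(results: list[ProgramResult]) -> dict[str, float]:
    """Aggregate completeness, compile/run rates, and manual effort."""
    n = len(results)
    return {
        "completeness": sum(r.statements_out / r.statements_in
                            for r in results) / n,
        "compile_rate": sum(r.compiles for r in results) / n,
        "run_rate": sum(r.runs for r in results) / n,
        "avg_manual_fix_lines": sum(r.manual_fix_lines for r in results) / n,
    }
```

Tracking manual-fix effort per program is what makes the true cost of an AI-led approach visible before a full commitment.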
AI tools such as Claude Code generate code based on patterns, not fixed rules. This works well for analysis and small changes where developers can review the results.
In large COBOL systems used by banks, insurance companies, and government, the requirements are stricter. Systems must behave exactly the same as before. This is usually proven by running the new system in parallel with the old one and comparing results.
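A parallel run of this kind reduces, in essence, to record-by-record comparison of the two systems' outputs. A minimal sketch, with illustrative field names:

```python
def compare_parallel_run(legacy: list[dict], modern: list[dict]) -> list[str]:
    """Compare legacy and modernized outputs record by record.

    Returns human-readable mismatch descriptions; an empty list means
    the two systems behaved identically on this workload.
    """
    mismatches = []
    if len(legacy) != len(modern):
        mismatches.append(
            f"record count differs: {len(legacy)} vs {len(modern)}")
    for i, (old, new) in enumerate(zip(legacy, modern)):
        for field in old:
            if old[field] != new.get(field):
                mismatches.append(
                    f"record {i}, field {field!r}: "
                    f"{old[field]!r} vs {new.get(field)!r}")
    return mismatches
```

Even a one-character difference in a formatted amount (for example `100.0` versus `100.00`) surfaces as a mismatch, which is exactly the level of strictness parallel running demands.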
If the generated code changes based on prompts or model updates, testing becomes harder. More time is needed to find and fix issues. This increases cost and project risk.
Rule-based approaches avoid this problem by producing consistent results. This makes testing, audit, and approval much simpler.
For large enterprise systems, this difference can have a major impact on cost and delivery.
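The determinism argument can be made concrete with a toy rule table: each COBOL statement pattern maps to one fixed Java template, so the translation is a pure function of the input. The rules below are an illustrative sketch, not SoftwareMining's actual engine:

```python
import re

# A deliberately tiny rule table: each COBOL statement pattern maps to a
# fixed Java template, so the same input always yields the same output.
RULES = [
    (re.compile(r"^MOVE (\S+) TO (\S+)\.$"), r"\2 = \1;"),
    (re.compile(r"^ADD (\S+) TO (\S+)\.$"), r"\2 = \2 + \1;"),
    (re.compile(r"^DISPLAY (\S+)\.$"), r"System.out.println(\1);"),
]

def translate(statement: str) -> str:
    """Translate one COBOL statement to Java via the first matching rule."""
    for pattern, template in RULES:
        if pattern.match(statement):
            return pattern.sub(template, statement)
    # Unmatched input fails loudly instead of guessing.
    raise ValueError(f"no rule for: {statement}")
```

Because there is no sampling and no model state, rerunning the translation tomorrow, or after a tool upgrade with unchanged rules, produces byte-identical output, which is what makes audit and regression comparison straightforward.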
The following comparison focuses on the criteria that typically determine enterprise mainframe modernization programs: repeatability, auditability, scale, and governance alignment.
| Enterprise criterion | AI-based approach (Claude Code) | Deterministic approach (SoftwareMining) |
|---|---|---|
| Cost and testing effort | Fast initial generation, but high effort for testing, debugging, and completing missing logic. | Predictable output reduces rework and shortens testing and validation cycles. |
| Governance | Additional controls required to manage variability and ensure correctness. | Stable output supports audit, change control, and approval processes. |
| Repeatability | Results may change based on prompts, naming, or model updates. | Same input produces the same output every time. |
| Completeness | May miss statements or generate incomplete logic, requiring manual fixes. | Full program structure is translated using defined rules. |
| Business logic accuracy | Depends on test coverage and manual validation to confirm correctness. | Preserves control flow, numeric precision, and transaction behavior. |
| Enterprise scale | Suitable for small or modular changes where manual review is manageable. | Designed for large, multi-million line COBOL systems. |
| Project risk | Risk increases due to hidden defects and incomplete translations. | Controlled process with predictable outcomes. |
| Transformation method | Probabilistic code generation from learned patterns. | Rule-based translation with defined behavior. |
Agentic AI tools such as Claude Code represent a meaningful advance in engineering productivity. They can accelerate analysis, refactoring, and documentation across large codebases.
In regulated COBOL environments, modernization is ultimately judged on repeatability, auditability, and provable equivalence under real workloads. Deterministic transformation supports controlled validation, structured parallel run, and governed change management.
The question for enterprise leaders is not whether AI can generate modern code, but whether the modernization approach delivers predictable outcomes at scale.