This paper provides a brief overview of the issues involved in transitioning away from legacy and under-performing software applications, and of why a new automated approach can revolutionise what has otherwise been an expensive, difficult and risky process.
I’ve seen businesses big and small struggle, sometimes for years, to transition to their next technology of choice. From small businesses whose very survival is at risk, to the public failures of major Government organisations, software transition can be one of the most complex, trying and painful experiences – but it doesn’t have to be that way.
My aim here is to help you avoid painting yourself into a corner with your next software acquisition, and to help you escape if you’ve already found yourself trapped.
The problem
We’ve probably all experienced, to some degree, the frustration of limiting software, like:
- an old system which is difficult to use,
- an off-the-shelf system which doesn’t offer modern features (mobility, engagement),
- systems which don’t properly integrate – leaving an “air gap” of work-arounds and manual/semi manual handling processes,
- an existing system or technology which is no longer supported,
- extortionate pricing for ongoing support.
Or maybe the business has simply moved on and the old system doesn’t do what you need anymore, so you begin to accumulate a bunch of spreadsheets, Access databases and manual workarounds.
While you can go and buy a better off-the-shelf package tomorrow, you have a few immediate problems:
- migrating valuable data (the value store you’ve built over years),
- reintegrating the new system with your other existing systems, and
- retraining your staff and modifying your business processes to take advantage of new features and capabilities.
The other option is to try to convert the legacy system and data to new technology – or to draw a line under it, write off the value of your data and move on.
The challenge
Code conversion
There have been a few attempts over the years at automated conversion from legacy systems. But there are conceptual problems with automated conversion which haven’t really been properly dealt with until now.
For a start, while it’s possible (though difficult) to convert code from one technology to another, it’s almost impossible to make it good maintainable code. Conceptual differences and differences in approach from one technology to the next are too complex to resolve, let alone addressing readability and maintainability.
Secondly, this approach, though it might improve consistency and syntax, is unlikely to be able to improve conceptual errors of the original design, or to take advantage of “new concepts” in a deep way (like mobility, location, privacy, engagement).
Thirdly, the variance in older technology is vast – even if you could write a perfect Visual Basic 6->.NET conversion, VB6 is just one of a host of technologies out there.
Finally, any automated conversion no matter how smart, can’t be readily verified unless the original source code is available, visible and human-readable. If the original developers have created their own controls or been “creative”, this approach is in real trouble. For many systems we’ve transitioned, the source code was not accessible and there was little if any technical documentation.
The result is that there’s little point attempting to convert the existing code automatically – especially because modern low-code approaches mean that most of that code is not even required. It’s wasted effort.
In our experience, most application code is plumbing to make things work – only about 20%-25% of the code represents the core intellectual property, and even then, most of it probably should be re-written by someone who knows what they’re doing. We’ve seen a lot of off-the-shelf solutions with code and design issues – after all, only a small proportion of software developers are star performers.
Data conversion
Data however, is a much more tractable problem.
The industry worked out early in the software era that data had to be consistently formed and comply (generally at least) with some broad “open” standards.
In fact, most systems built in the last 20 years have data stores that comply with generally accepted standards like ODBC, and if not, they generally allow export to CSV or some other accessible format.
Unfortunately, while there are standards, data conversion/migration isn’t that simple.
Poorly designed legacy software will often have resulted in data inconsistencies, like orphaned records, which represent either a loss of business value, or an additional technical challenge going forward.
Poor data modelling, and in some cases ill-informed design trends of past eras, need to be resolved to allow you to take best advantage of new technology and approaches and to improve performance.
Without getting too technical, here are a couple of examples (skip over these if you’re not interested).
1. Composite and business keys
In the 80s, 90s and even noughties it was common for developers to uniquely define each record in terms of business data – like names or codes – and when these weren’t sufficiently unique, to define composite keys – by combining several database table columns.
For example, it’s obvious that a person’s name is not sufficiently unique as the key to their records. Making a composite key out of their name and their phone number could work, until they either change their phone number or name. You can try using other supposedly unique identifiers, like email or social security number, but in the end, you’re making an assumption that these won’t change.
Your choice depends on your knowledge of the data – and that knowledge might be incorrect or incomplete. It is simply better to steer clear of using real or business data as the key to a record, and to avoid making assumptions: use an approach which always works, irrespective of the type of data.
Composite keys are particularly problematic for modern systems which need to integrate or generate data offline or on mobile devices. They create a heavy dependence on the central data store, along with performance issues and headaches. Because of the complexity of managing the combined uniqueness of the values, performance degrades with table size as the system must confirm the combination is unique across all rows – and the more columns included, the worse the degradation.
The correct approach is to issue each record a surrogate key – a completely separate unique identifier which can never change, is always identified with that record, and is a single field. If you use a UUID, then new data can be created in disconnected systems and later incorporated into the main data store without additional effort.
Some people still argue composite keys are “correct” for rows that represent relationships. In a school database there may be a table for Class and a table for Student, each with a unique identifier. To represent a student being part of a class, they argue, the combination of the unique Class identifier and the unique Student identifier is a valid way to uniquely identify the relationship.
But it’s simply easier and faster to assign a new surrogate key which uniquely identifies the relationship – and this approach is then entirely consistent with the approach for every other table.
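To make the surrogate key idea concrete, here’s a minimal sketch in Java (illustrative only – the class and field names are hypothetical, not taken from any particular system) of an enrolment record keyed by its own UUID rather than by the combination of class and student:

```java
import java.util.UUID;

// A relationship ("join") record keyed by its own surrogate key,
// rather than by the composite of classId + studentId.
public class Enrolment {

    private final UUID id;        // surrogate key - never changes, never derived from business data
    private final UUID classId;   // reference to the Class record's surrogate key
    private final UUID studentId; // reference to the Student record's surrogate key

    public Enrolment(UUID classId, UUID studentId) {
        this.id = UUID.randomUUID(); // issued once, unique without consulting any other row
        this.classId = classId;
        this.studentId = studentId;
    }

    public UUID getId() { return id; }
    public UUID getClassId() { return classId; }
    public UUID getStudentId() { return studentId; }
}
```

Because the enrolment carries its own key, renaming a student or moving them to a different phone number, email or class code never touches the identifier – the relationship table is treated exactly like every other table.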
2. Autonumber integer identifiers
This practice is still common today. As data is added to the system, the first record is assigned a key with the number 1, the second 2 and so on.
While this approach seems obvious, it’s usually a poor choice because:
- it relies on one central broker (read: bottleneck) to assign these numbers so they’re guaranteed to be unique – imagine several thousand concurrent users and you start to see the problem,
- if you want to create data offline or in another system you have to establish some kind of connection (perhaps through a web service) to get an ID – slowing down the process of creating new data (for example on a mobile device) and making app design more complex,
- it opens up potential security holes: any user of the system can see record identifiers in the browser address bar (like /user/12) and trivially start guessing at other record numbers, and poorly designed systems don’t have the controls in place to properly restrict access to records.
A better approach is to use UUID identifiers as keys.
UUIDs can be created anywhere and don’t need a central broker to ensure uniqueness. This means relational data can be created on a mobile device (or third-party system) and then transmitted to the server in one go. It also means data is guaranteed to remain uniquely identifiable even when “business keys” (like names or codes) change – i.e. the UUIDs act as surrogate keys. From a security point of view, intruders are also unable to “guess” data by just incrementing numbers for each record type.
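As a simple sketch of why this matters for offline or mobile creation (the class and record names here are hypothetical, for illustration only): new records can mint their own identifiers locally, with no round trip to a central server, and still be merged safely later.

```java
import java.util.List;
import java.util.UUID;

public class OfflineCapture {

    // A record created on a disconnected device - it mints its own key locally.
    record Inspection(UUID id, String site, String notes) {
        static Inspection create(String site, String notes) {
            return new Inspection(UUID.randomUUID(), site, notes); // no central broker needed
        }
    }

    public static void main(String[] args) {
        // Two records captured offline; both can later be uploaded and inserted
        // into the central store as-is, because their keys are already unique.
        List<Inspection> captured = List.of(
                Inspection.create("Depot 7", "Gate latch worn"),
                Inspection.create("Depot 9", "All clear"));

        captured.forEach(i -> System.out.println(i.id() + " -> " + i.site()));
    }
}
```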
Data cleansing
Cleansing is the term used to describe the process of cleaning up data which your software and systems should never have allowed to be created in the first place.
Duplicate data (usually a sign of poor software design), orphaned data (data which has become disconnected but is still lying around in the data store – usually a sign of software bugs) and poor quality data (badly and inconsistently entered – usually a sign of poor attention to detail by the developers) are all issues which should have been prevented by your software. In some cases they represent lost value – items that should have been invoiced, clients that should have been contacted, or just wasted effort.
For anyone who’s ever been involved, data cleansing is always complex and often means cost blowouts and delays.
In my experience, one of the reasons cleansing is so difficult is that it’s usually done on the raw data, using SQL or related tools. But humans really only understand their data in the context of the forms and reports they normally view it in. Working on the data directly means you risk not enforcing the rules and assumptions built into the software application, and this can lead to additional problems.
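To give a flavour of the kind of check cleansing involves, here’s a minimal sketch (assuming the legacy data has already been read into memory as simple id collections – the names are illustrative, not from any particular tool) that flags orphaned child records whose parent no longer exists:

```java
import java.util.List;
import java.util.Set;

public class OrphanCheck {

    // A child record referencing a parent (customer) by id.
    record Invoice(String id, String customerId) {}

    public static void main(String[] args) {
        // Parent records (customers) - a tiny stand-in for the legacy store.
        Set<String> customerIds = Set.of("C1", "C2");

        // Child records (invoices) referencing a customer.
        List<Invoice> invoices = List.of(
                new Invoice("I1", "C1"),
                new Invoice("I2", "C9"),   // orphan: customer C9 no longer exists
                new Invoice("I3", "C2"));

        // Flag any invoice whose customer reference can't be resolved.
        invoices.stream()
                .filter(inv -> !customerIds.contains(inv.customerId()))
                .forEach(inv -> System.out.println("Orphaned invoice " + inv.id()
                        + " references missing customer " + inv.customerId()));
    }
}
```

The check itself is trivial; the hard part is deciding, with the people who understand the business, what each orphan means and what to do with it – which is much easier inside forms and reports than in raw query output.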
Traditional “Extract Transform Load” (ETL) approach
The challenge for the traditional ETL approach is how to manage a transition and re-design of the data store, to meet a new system which works a different way – or is still being designed.
The traditional ETL approach assumes a transition to a known final state, where all issues will be resolved. It’s tractable, but expensive, time-consuming and high risk – often data (value) losses aren’t noticed until down the track when it’s too late, or compromises are knowingly made to reduce complexity and cost which you later regret.
During my career, I’ve been directly and indirectly involved with some sizeable ETL projects. There are great tools out there and some very sophisticated technology, but even so the projects have never been straightforward or low stress.
The solution
In attempting to solve this problem we’ve taken a different approach from what we’ve seen elsewhere, and by contrast it’s been very successful.
Using our Wildcat automated conversion process, we’ve transitioned all kinds of systems, from simple Microsoft Access databases to enterprise systems with hundreds of millions of rows and hundreds of database tables, using the exact same automated process.
We’ve been able to quickly and confidently transition organisations from VB6, .NET, Oracle Forms, Paradox, proprietary off-the-shelf and in-house developed systems … in fact, so far we haven’t come across a technology set that we couldn’t convert.
There are a few reasons why Wildcat has been successful where traditional approaches have failed:
- We transition to a new low-code technology, and ignore attempting to automatically convert the existing software code,
- We re-key, but also transition legacy keys,
- We enforce and use naming and design standards,
- We iterate repeatedly with both the application and data migration to ensure there are no surprises – and continually perform side-by-side comparisons with the existing system to validate the correctness of our transition.
Let me break each of these down and explain why they make a difference.
1. Low-code
New low-code approaches, like the Skyve methodology (www.skyve.org) we’ve developed, automatically provide centralised security, navigation, data management, reporting, mobility, location and maps and a whole lot more. In our experience, that’s about 80% of the job done before we even start.
Another way of looking at it is – that’s 80% of the application you don’t have to pay to develop or maintain.
You can build apps in Skyve using a point-and-click builder, but behind the scenes, Skyve works from a declared “domain model” – basically a technical description of what data you need to manage and what you should be able to do with it. Wildcat creates this “domain model” by analysing the underlying data store of your existing system.
With this domain model declaration, Skyve creates a powerful web-based application to manage complex data without writing any code – and will do it more consistently and better than hand-coded solutions.
So we bypass your legacy code and go straight to the data as the source of truth. Where we need to, we use our expertise to add back in your unique business logic and processes by hand – often finding and fixing conceptual errors which have been hiding out in your old system for years. But even where this has to happen, we can take advantage of high-level APIs provided by Skyve, which means less code and less effort.
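As a rough illustration of what a declared domain model captures – this is a simplified, hypothetical representation for the purposes of this paper, not actual Skyve metadata syntax – each entity is described by its name, its attributes and their types, and the platform generates the screens, lists, validation and persistence from that description.

```java
import java.util.List;

// A hypothetical, simplified picture of a declared domain model:
// a description of the data only - no UI or persistence code.
public class DomainModelSketch {

    enum Type { TEXT, DATE, DECIMAL, BOOLEAN }

    record Attribute(String name, Type type, boolean required) {}

    record Entity(String name, List<Attribute> attributes) {}

    public static void main(String[] args) {
        // One declared entity; from a declaration like this, a low-code platform
        // can generate forms, lists, validation and persistence without hand-written code.
        Entity contact = new Entity("Contact", List.of(
                new Attribute("name", Type.TEXT, true),
                new Attribute("email", Type.TEXT, false),
                new Attribute("dateOfBirth", Type.DATE, false)));

        System.out.println("Entity " + contact.name()
                + " with " + contact.attributes().size() + " attributes");
    }
}
```

Wildcat’s job is to derive a declaration like this automatically from the legacy data store, so the bulk of the new application exists before any hand-written code is needed.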
2. Re-keying
Our tool re-keys the data using UUIDs, avoiding the problems outlined above and allowing us to modify the design of the data appropriately, freed from the constraints of the legacy keys, depending on how the data will be used.
But crucially, we also transition and keep the legacy key data so that we can infer what would have happened in the legacy system even after we go live with the new system. This is extremely helpful in supporting a staged transition – where other systems might still be relying on the legacy key values – and it supports using features in the target system to allow ongoing data cleansing, rather than trying to get all data problems resolved up front.
These legacy fields generally get dropped at a later date – but Skyve allows us to mark them and dispose of them when we’re confident they’re no longer needed.
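To show the shape of the re-keying step, here’s a minimal sketch under the assumption that legacy rows carry integer keys (the record and field names are illustrative only): each row is issued a new UUID, the old key is kept alongside it as a legacy field, and references are remapped through the same lookup.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

public class ReKeySketch {

    // Legacy row: integer key, plus an integer reference to a parent row.
    record LegacyOrder(int id, int customerId) {}

    // Transitioned row: new surrogate UUID key, remapped reference,
    // and the legacy key retained to support a staged transition.
    record NewOrder(UUID id, UUID customerId, int legacyId) {}

    public static void main(String[] args) {
        List<Integer> legacyCustomerIds = List.of(10, 11);
        List<LegacyOrder> legacyOrders = List.of(new LegacyOrder(1, 10), new LegacyOrder(2, 11));

        // 1. Issue each legacy customer a new UUID and remember the mapping.
        Map<Integer, UUID> customerKeyMap = new HashMap<>();
        legacyCustomerIds.forEach(oldId -> customerKeyMap.put(oldId, UUID.randomUUID()));

        // 2. Re-key the orders, remapping the customer reference and keeping the legacy id.
        List<NewOrder> migrated = legacyOrders.stream()
                .map(o -> new NewOrder(UUID.randomUUID(), customerKeyMap.get(o.customerId()), o.id()))
                .toList();

        migrated.forEach(o -> System.out.println(
                "order " + o.id() + " (legacy " + o.legacyId() + ") -> customer " + o.customerId()));
    }
}
```

Keeping the legacy id means other systems that still expect the old numbers can be satisfied during a staged cut-over, and the field can be dropped later once it’s no longer needed.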
3. Naming and Design Standards
The Skyve methodology doesn’t just enforce consistent naming and design standards, it uses them to infer capability. It’s one of those aspects of the Skyve approach which just keeps on giving.
Because Skyve enforces these standards, Skyve applications can be continually validated to ensure the design is internally consistent – especially useful to help manage change. Skyve applications can take advantage of generated regression testing suites, including generating a suite of automated User Interface tests – typically an expensive and brittle process most organisations can’t effectively deliver. And developers can move between very different projects and confidently begin work, because they know where things are and what they’ll be called.
Who would have thought that being absolutely committed to designing systems of the highest possible quality would yield so many benefits? I guess it should have been obvious.
4. Truly Iterative Migration
Our Wildcat tool allows us to complete the first iteration of application and data sometimes in as little as a few hours from a standing start. Yes, I said hours.
And our very first iteration is a functional application with migrated data.
From there we refine, enhance, test, review, improve and redesign problem areas.
Wildcat does some pretty sophisticated automatic analysis of the existing system – but it can’t always definitively detect what the original developer intended, or should have done – especially where there are inconsistencies or design flaws. This is where side-by-side validation of your existing system comes in, enriched by your subject matter experts, our data expertise and metrics.
We iterate the process where we identify issues and potential improvements.
For each iteration, Wildcat converts both the data and the application concepts at the same time, which means the data and application are in sync. This reduces a whole lot of complexity making it simpler to do more iterations, and with each iteration the risk of problems in the end system is reduced.
It’s a genuinely agile approach to transition.
As a bonus, the Skyve platform manages changes to the database automatically, which means we can round-trip quickly and painlessly. After the initial iteration, it’s not unusual to iterate several times a day – with each iteration we review our assumptions, add data cleansing, add design modifications and retest. Successive iterations then also add polish – tightening up specific form layouts, adding bespoke behaviours and wherever possible, improvements – taking advantage of the rich features available in the new technology.
This approach means that by the time we go live, we’ve transitioned the system and data dozens of times, validated the results side-by-side with the existing system, and run suites of unit, UI and user tests with up-to-date data at each stage.
This makes the whole transition straightforward and, more importantly, low risk.
So far, we haven’t come across a system we couldn’t migrate with this approach.
The payoff
And the pay-offs are genuinely impressive, by anyone’s standards:
- A state services levy application with >200m rows transitioned, with data, live through UAT and into production in 17 person days,
- An insurance stamp duty management system transitioned with data into production in 16 person days – even though the original software wasn’t even in a working state,
- An Excel driven manual project management process transitioned, live and available on mobile devices in 6.25 hours, including data,
- A grants funding system comprised of 18 separate .NET/VB6/MS Access applications managing funds of $3.2b, transitioned with data into a single consolidated system and live in production in 50 person days – making its first $100m payment correctly and on time.
The systems we’ve transitioned are clearly better – they perform faster, are more accessible, more usable, more flexible, more scalable, better integrated, more secure and importantly more maintainable than what they replace.
And here’s where the low-code advantage pays off again – typically the transitioned system has only 20%-30% of the amount of code of the source system. This means less “stuff” to maintain, making maintenance and ongoing changes simpler and lower risk.
There’ll be some out there who think – “but that’s not the way it should be done”. In my career I’ve regularly come up against technicians who feel they must stick to the way it’s always been done. That’s a lack of imagination.
If “the definition of insanity is doing the same thing over and over again and expecting a different result” then it’s time to try something different.
Which approach would you choose?
If you’d like to find out more about Wildcat conversion, the Skyve platform or any of the points I’ve raised above, I’d be happy to talk. Contact me and let’s meet online or for a coffee.