The Challenges of Data


Every project varies, but three activities commonly place the largest demands on a client team’s time during implementation: decision making, data migration, and testing. Most clients foresee the first and plan for the last – but data migration tends to exist only as a line item on the Statement of Work (SOW) and is not thought about until relatively late in the project.

A Data Assessment Phase providing analysis of current state data and systems can go a long way in avoiding surprises. The process for data migration is usually fairly straightforward – it’s the details, timing, and availability of the data that tend to cause challenges.

A Data Migration Strategy must be documented with clear assumptions. During early discussions, there are often significant open questions about how the new software will be configured, which may affect the exact data requirements. The scope is generally specified and discussed in terms of the data the new system will need and the effort to load it, rather than the effort required to produce it. This makes perfect sense given that at this stage the usual objective is to define the SOW and get authorization to begin work – but it also makes it easier to discount the amount of effort required to collect, clean, and organize this data.

Each software product on the market has its own data loading process, but many of them use one of a couple of common models. The most common is some manner of data template, often provided to the end user in an Excel format and then translated by the implementation partner into the system-required format (a delimited or structured file, or data loaded into a database staging table – it varies by product).

Typically, there is one template for each type of data to be migrated into the system, often corresponding to the screens or forms the system uses. There may also be templates for the reference data that underlies validation – such as lists of options or geographic structures. These templates must be populated according to specific rules so that the data behaves in the way the system expects, and they generally must reflect any configuration or customization of the system the client has requested. Because of this, the templates can usually only be provided after the configuration of the field layout has been finalized or nearly finalized. This presents different challenges depending on the implementation approach.

The templates provided need to be completed in a specific manner. Each usually has a primary key and lists of valid values for certain fields, and those values may be case sensitive. Records often need to be linked or associated with each other by the key or unique identifier. Formatting and field character lengths must also be strictly adhered to.
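The rules above can be checked before a template is ever handed back for loading. The following is a minimal sketch, assuming a hypothetical vendor template; the field names, lengths, and valid values are invented for illustration, and a real template's rules would come from the product and its configuration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FieldRule:
    max_length: int
    required: bool = True
    allowed: Optional[frozenset] = None  # valid values, compared case-sensitively


# Hypothetical template layout, for illustration only.
RULES = {
    "vendor_id": FieldRule(max_length=10),
    "vendor_name": FieldRule(max_length=40),
    "status": FieldRule(max_length=8, allowed=frozenset({"Active", "Inactive"})),
}
KEY_FIELD = "vendor_id"


def validate_rows(rows):
    """Return a list of (row_number, message) problems found in template rows."""
    problems = []
    seen_keys = set()
    for number, row in enumerate(rows, start=1):
        for name, rule in RULES.items():
            value = row.get(name, "")
            if not value:
                if rule.required:
                    problems.append((number, f"{name} is required"))
                continue
            if len(value) > rule.max_length:
                problems.append((number, f"{name} exceeds {rule.max_length} characters"))
            if rule.allowed is not None and value not in rule.allowed:
                problems.append((number, f"{name} has invalid value {value!r}"))
        # The primary key must be unique across the whole template.
        key = row.get(KEY_FIELD, "")
        if key:
            if key in seen_keys:
                problems.append((number, f"duplicate {KEY_FIELD} {key!r}"))
            seen_keys.add(key)
    return problems


def find_orphans(child_rows, parent_keys, link_field):
    """Return child rows whose link field matches no parent key exactly."""
    return [row for row in child_rows if row.get(link_field, "") not in parent_keys]
```

Because the comparisons are exact, a status of "active" or a link to "v001" is reported rather than silently accepted, which mirrors how a case-sensitive loader would likely treat it.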

Having a technical resource on the data collection team who is familiar with the source database systems is extremely helpful in ensuring that data correctness and quality remain high. If the data is currently stored in a database, a resource who can query that database to generate the information for the templates can dramatically reduce the number of errors and make generating and regenerating the data efficient.
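As an illustration of why a queryable source helps, the sketch below generates a template file straight from a query, so the export can be rerun whenever the data changes. It uses an in-memory SQLite database as a stand-in for the incumbent system; the `vendors` table, its columns, and the template layout are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical template columns, in the order the target system expects.
TEMPLATE_COLUMNS = ["vendor_id", "vendor_name", "status"]

# The query does the reshaping: renaming, trimming, and mapping flags to
# the template's valid values.
EXPORT_QUERY = """
    SELECT id AS vendor_id,
           TRIM(name) AS vendor_name,
           CASE WHEN is_active = 1 THEN 'Active' ELSE 'Inactive' END AS status
    FROM vendors
    ORDER BY id
"""


def build_demo_source():
    """Create a small in-memory database standing in for the source system."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE vendors (id TEXT PRIMARY KEY, name TEXT, is_active INTEGER)")
    conn.executemany(
        "INSERT INTO vendors VALUES (?, ?, ?)",
        [("V001", " Acme ", 1), ("V002", "Globex", 0)],
    )
    return conn


def export_template(conn, out):
    """Write the query result as a delimited template; return the record count."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(TEMPLATE_COLUMNS)
    count = 0
    for row in conn.execute(EXPORT_QUERY):
        writer.writerow(row)
        count += 1
    return count


if __name__ == "__main__":
    buffer = io.StringIO()
    loaded = export_template(build_demo_source(), buffer)
    print(f"{loaded} records exported")
    print(buffer.getvalue())
```

Keeping the transformation in the query rather than in hand edits means a regenerated file is consistent with the previous one, and the returned count can be reconciled against the source.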

Incumbent systems hosted with a SaaS provider may have restrictions on the way the data can be accessed and exported. It is important to understand the methods available to access the data and whether the provider will need to be involved in the migration process. This can be a politically fraught situation, particularly if the provider is unaware that a system replacement may be imminent, but it is best to understand the options before committing to any data migration timelines.

Many times, the data required does not exist in a single consolidated system, and the implementation of a new system is often seen as an opportunity for data cleanup and validation. Small deviations from the original data set – a few fields that are missing, two similar systems which need to be consolidated, or a bit of restructuring which needs to be done – can generally be handled by exporting the data and then doing a bit of spreadsheet manipulation. However, when the data in the previous system is of genuinely poor quality, when a multitude of different systems are in use, or when there is no system at all, this can suddenly become a much more involved process.
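For the simpler case of consolidating two similar systems, the manipulation usually amounts to matching records on a key, filling gaps from the second source, and flagging disagreements for a person to resolve. The sketch below shows one way to do that, assuming both exports share a key field and that keys can safely be compared after trimming and upper-casing; that normalization is an assumption and would be wrong for a source where case is significant.

```python
def consolidate(primary, secondary, key):
    """Merge two exports on a shared key.

    Values from `primary` win. Blank fields are filled from `secondary`,
    and disagreements are returned as conflicts instead of being overwritten.
    """
    merged = {}
    conflicts = []

    def normalize(value):
        # Assumption: keys differ only by stray whitespace and letter case.
        return value.strip().upper()

    for row in primary:
        record = dict(row)
        record[key] = normalize(row[key])
        merged[record[key]] = record

    for row in secondary:
        k = normalize(row[key])
        if k not in merged:
            record = dict(row)
            record[key] = k
            merged[k] = record
            continue
        target = merged[k]
        for field, value in row.items():
            if field == key:
                continue
            current = target.get(field, "")
            if not current:
                target[field] = value  # fill a gap from the second system
            elif value and value != current:
                conflicts.append((k, field, current, value))

    return list(merged.values()), conflicts
```

The conflict list is the useful output here: it turns a vague sense that the two systems disagree into a concrete set of decisions for the client team.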

One final item to be aware of is the time data loads can take. Depending on how much post-processing the system requires, the actual loading of the data often takes much longer than end users expect. While systems with limited validation and calculations may load relatively quickly, systems with mature financial validations will take more time. Along the same lines, repeating a load may be either very simple or very complex, and the turnaround time correspondingly short or long, depending on the specific situation, the system, and the data to be reloaded.

The duration of a given migration is too situationally variable to generalize – but the difficulty of providing the data, combined with load times that are often long and inflexible, makes this part of the project prone to schedule slips. To understand these risks, many vendors will request record counts for the areas to be loaded. This schedule sensitivity is why it is so important for the initial record counts to be approximately accurate – erring on the side of slightly high.
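To show how record counts feed a schedule, here is a rough sketch that pads the counts upward and divides by a throughput figure. The throughput is an assumption that would have to be measured from a trial load on the actual system; the counts, the 10 percent padding, and the function name are illustrative only.

```python
def estimate_load(record_counts, records_per_hour, padding_percent=10):
    """Return (padded_record_total, estimated_hours) for a planned load.

    record_counts: mapping of data area name to expected record count.
    records_per_hour: throughput observed in a trial load (an assumption).
    padding_percent: how far to skew the counts high, as a whole percent.
    """
    if records_per_hour <= 0:
        raise ValueError("records_per_hour must be positive")
    total = sum(record_counts.values())
    # Round the padding up using integer arithmetic to avoid float drift.
    padding = (total * padding_percent + 99) // 100
    padded_total = total + padding
    return padded_total, padded_total / records_per_hour


if __name__ == "__main__":
    counts = {"vendors": 12_000, "invoices": 250_000, "payments": 180_000}
    padded, hours = estimate_load(counts, records_per_hour=50_000)
    print(f"{padded} records, about {hours:.1f} hours per full load")
```

The estimate covers a single run; if several reload cycles are expected, the same figure applies to each one.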

The good news is that the majority of pitfalls which happen in data loading can be avoided if the starting condition of the data is well understood and the process used is thought through before being agreed upon. It is especially important to understand the current state of the data and how it will be accessed, and be sure you appropriately choose to either migrate or load via manual entry. The EBUSINESS STRATEGIES Data Assessment is designed to help define your Data Strategy up front to ensure implementation and processing success in the future.