How to migrate from spreadsheets to a DBMS

by Mitchell Tweedie, Jun 15, 2018

The problem with using spreadsheets for your database

Spreadsheets are databases, and they work well. Why would you ever want to leave the comfort of your spreadsheet? Well, here’s a few reasons why:
  • Database Management System (DBMS) can scale - spreadsheets don’t scale;
  • spreadsheets enable human-error on a greater scale;
  • spreadsheets require more manual maintenance; and,
  • spreadsheets are poor integrators with 3rd party tools.
We don’t want to see you left in an awkward spreadsheet situation. Your friends don’t either. So here are some steps you can follow to free yourself.
Click here for more on the differences between database management systems and spreadsheets.

Data migration in 5 steps: from spreadsheets to a RDBMS

The schema separates any database from a spreadsheet. Despite both storing and managing data, spreadsheets don’t have a schema. This means they have loose rules regulating how and where users input data. Fewer rules also means fewer programmable capabilities. (It's worth noting that with enough time and resources, a spreadsheet can become a power tool despite its limitations.)

A loose data structure may work well when doing something small and basic in Microsoft Excel. However, spreadsheets trend to become more complex over time. This is especially true of those embedded within core business processes, such as CRM. 

Depending on what you and your business needs, a spreadsheet may not be serving you needs in the long term. Because of this, you may be tempted to migrate to more powerful type of database.

The obvious option is a DBMS (of which there are many types of database). Here are 5 steps to help guide your migration to a Relational DBMS, such as Microsoft Access or MySQL.

Step 1: Review your existing database

Start by reviewing your existing spreadsheet. How many workbooks (documents) are you migrating? How many worksheets (grids of columns and rows) do each of your workbooks contain?

You can tidy-up your data pre-migration or-post-migration. Which option is best for you will depend on a number of variables. A good rule of thumb - remove columns and rows reserved for totals and subtotals. You can recreate these in your DBMS using queries.

Ideally, each of your worksheets will be organised into a single block of rows and columns, with headings in the first row of each column.

Step 2: Define your RDBMS schema

Once you have a better understanding of our data, you need to define your RDBMS schema.

A database’s schema describes its structure in a formal and logical language. It describes the relationship between entities. For example, entity X connects to Y. Then, entity Z further interacts with Y and X. 

You can look at the relationship between X and Y, and then seem how that changes once you apply Z’s data.

If it sounds confusing, here’s an example. A librarian wants to see which user has borrowed out what book, and for how long. X can be users, Y can be books, and Z can be books currently out on loan. Relational Databases draw conclusions between blocks of data (entities).

Eventually, once the database schema is ready, you can install it into your RDBMS, and then look for relationships between entities.

In an RDBMS, entities are another name for tables. 

And tables are similar to tables in a spreadsheet.

A relational database organises data into tables, and the schema describes relationships between the records stored in those tables.
Still confused? Remember that a Relational Database has at least three tables - these are called entities - and that when you search in a RDBMS you’re drawing relationships between the data contained in the tables.

Here’s the library example in more detail: if you’re a library manager, you can separate book data into Users, Books, Books out on loans, authors, and categories. You can list each of these topics as a table (see below).
You might want to compare user_id to book borrowed_id to date_of_publication to see which users borrowed older editions of a particular book. 

This is a relational database in action.

For non-library manager’s out there, click here for hundreds of database schema examples.

You might ask why some of the table entries are underlined. These are Primary Keys. These are specific and unique data entries. RDBMS need a unique primary key to organise search queries around.

Step 3: Create a staging table for each of your worksheets

Build staging tables to help smooth out the migration. Rebuild and label the data tables for each of your spreadsheets. Define your spreadsheet columns as loosely as possible. 

Run through this checklist on the data once you’re read:
  • How many unique primary keys does your database contain?
  • How is your data formatted (strings, integers, etc.)?
  • Can and should you create look-up tables?
  • Do your other worksheets have primary keys represented as columns in a master worksheet?
  • How consistent are your cell values? (For example, figures such as 0, Zero, None, and Nil should be consolidated)
Answering these questions gives you a good basis for writing your migration scripts and importing your data.

Step 4: Run your migration scripts

A migration script provides a systematic and consistent import process. A data migration service provider can write these for you - outsourcing can help. Or, you can write these by hand - some training may be required.

Step 5: Normalise and clean up

Once you’ve migrated your data, normalise any remaining inconsistencies and clean up any carry over issues.

Focus on getting your data into its smallest parts (atomic data) and breaking data into topic clusters.

With this final check complete, you’ve successfully migrated your data from spreadsheets to RDBMS.
See how WorkingMouse migrated and consolidated 17 separate technologies for The Australian Department of Defence.

What's next? Turn your database into an app

Once you've migrated from a spreadsheet to a DBMS (preferably an SQL database) you're within easy striking distance of turning your data and business processes into an app.

We asked Leo, Codebots' Lead Developer for Lampbot, some questions on cloud migration and how organisations can turn more primitive artefacts, such as spreadsheets into fully-fledged web and mobile apps.
An app is just state and behaviour, with state being data and what is and behaviour being what can be. It's the combination of these that distinguishes basic databases (such as spreadsheets) and apps.

For example, you've probably got a list of contact details on a spreadsheet somewhere. This data is undoubtedly useful, even when managed as a spreadsheet, however, an app with the same data could also exhibit some advanced behaviour:
  • validating email addresses and phone numbers;
  • rejecting duplicate email addresses and phone numbers; and
  • linking disparate email addresses and phone numbers belonging to a single individual or organisation.
The core advantage of a DBMS over more primitive forms of data management is the ready potential to add behaviour (such as the above) to your state. One of Codebots' core features is the ability to import and interpret SQL databases such as Microsoft Access and transform these into fully functional web and mobile apps by enabling users to switch on ready-to-deploy behaviours.