How to migrate from spreadsheets to a DBMS
by Mitchell Tweedie, Jun 15, 2018
The problem with using spreadsheets for your database
Spreadsheets are databases, and they work well. Why would you ever want to leave a spreadsheet? Here’s a few reasons why; Database Management System (DBMS) can scale - spreadsheets don’t scale. Spreadsheets enable human-error. They also require manual maintenance, and they are poor integrators with 3rd party tools. We don’t want to see you left in an awkward spreadsheet situation. Your friends don’t either.
Data migration in 5 steps: from spreadsheets to a RDBMS
The schema separates any database from a spreadsheet. Despite both storing and managing data, a spreadsheets don’t have a schema. This means they have loose rules regulating how and where users input data. A loose data structure may work in Microsoft Excel. However, depending on what you and your business needs, a spreadsheet may not be serving you needs in the long term.
You might decide to migrate, but there’s one problem.
Your current spreadsheet pattern might be incompatible with your new DBMS. There are several types of database. Even if your new DBMS happens to be a relational database, like Microsoft Access, it does not automatically have protection from data input and relationship error.
How do you make sure you have a smooth move? Here are 6 steps to help guide your migration.
Step 1: review your existing database
Start by reviewing your existing spreadsheet. How many workbooks (documents) are you migrating? How many worksheets (grids of columns and rows) do each of your workbooks contain?
You can tidy-up your data pre-migration or-post-migration. Which option works for you depends on a lot of variables. A good rule of thumb - remove columns and rows reserved for totals and subtotals. You can recreate these in your DBMS using queries.
Step 2: define your RDBMS schema
Once you have a better understanding of our data, you need to define your RDBMS schema.
A database’s schema describes its structure in a formal and logical language. It describes the relationship between entities. For example, entity X connects to Y. Then, entity Z further interacts with Y and X.
You can look at the relationship between X and Y, and then seem how that changes once you apply Z’s data.
If it sounds confusing, here’s an example. A librarian wants to see which user has borrowed out what book, and for how long. X can be users, Y can be books, and Z can be books currently out on loan. Relational Databases draw conclusions between blocks of data (entities).
Eventually, once the database schema is ready, you can install it into your RDBMS, and then look for relationships between entities.
In an RDBMS, entities are another name for tables.
And tables are similar to tables in a spreadsheet.
A relational database organises data into tables, and the schema describes relationships between the records stored in those tables.
Still confused? Remember that a Relational Database has at least three tables - these are called entities - and that when you search in a RDBMS you’re drawing relationships between the data contained in the tables.
Here’s the library example in more detail: if you’re a library manager, you can separate book data into Users, Books, Books out on loans, authors, and categories. You can list each of these topics as a table (see below).
You might want to compare user id to book borrowed id to date_of_publication to see which users borrowed older editions of a particular book.
This is a relational database in action.
For non-library manager’s out there, click here for hundreds of database schema examples.
You might ask why some of the table entries are underlined. These are Primary Keys. These are specific and unique data entries. RDBMS need a unique primary key to organise search queries around.
Step 3: create a staging table for each of your worksheets
Build staging tables to help smooth out the migration. Rebuild and label the data tables for each of your spreadsheets. Define your spreadsheet columns as loosely as possible.
Run through this checklist on the data once you’re read:
- How many unique primary keys does your database contain?
- How is your data formatted (strings, integers, etc.)?
- Can and should you create look-up tables?
- Do your other worksheets have primary keys represented as columns in a master worksheet?
- How consistent are your cell values? (For example, figures such as 0, Zero, None, and Nil should be consolidated)
Answering these questions gives you a good basis for writing your migration scripts and importing your data.
Step 4 - run your migration scripts
A migration script provides a systematic and consistent import process. A data migration service provider can write these for you - outsourcing can help. Or, you can write these by hand - some training may be required.
Step 5 - normalise and clean up
Once you’ve migrated your data, normalise any remaining inconsistencies and clean up any carry over issues.
Focus on getting your data into its smallest parts (atomic data) and breaking data into topic clusters.
With this final check complete, you’ve successfully migrated your data from spreadsheets to RDBMS.
*Bonus tips, need help migrating from one RDBMS to another?