Migrating from MySQL to PostgreSQL
I recently migrated an old node webapp from MySQL to PostgreSQL. The application is a little over 10k lines of code and does not use any ORM / SQL builder tool. To convert the schema and move the data I used a little Common Lisp tool called pgloader, which can convert from various other databases like MySQL, SQLite or MSSQL.
The reasons¹ why I did this migration were:
- PostgreSQL feels a bit more stable / long term (I don’t like that Oracle bought
MySQL). This was the last side-project I had that was still using MySQL, so
now I don’t have to deal with MySQL anymore.
- Porting this project to MongoDB would have been a lot more work, the license situation there is not ideal, and I don’t like the “schemaless” aspect of MongoDB.
- MySQL’s timestamp datatype has the 2038 problem and a long list of similar little technical limitations.
- PostgreSQL Ubuntu packages are quite well maintained. There are separate packages for 9.4, 9.5, 9.6, 10, 11 and 12 available in parallel on 18.04.
Just installing pgloader using “sudo apt install pgloader” didn’t work because it crashed, and I found a GitHub issue that had been fixed, but the apt-packaged version did not have the fix yet. Instead I tried to run the docker version, but it was quite painful to get that working. I had to reconfigure both MySQL and Postgres to accept connections from any interface rather than just localhost, then set a password for the postgres user (it doesn’t allow password logins by default) and add a user in MySQL that was allowed to connect from a remote IP. Finally, I had to use the docker host IP (visible on the docker0 interface if you run ip addr) both as the source and the destination. I ended up running something like:
```sh
docker run --rm --name pgloader dimitri/pgloader:latest pgloader \
  mysql://myuser:mysqlpasswo@172.17.0.1/sourceDbName \
  pgsql://postgres:passwordForPostgresUser@172.17.0.1/targetDbName
```
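For reference, the reconfiguration boiled down to something like this (a sketch; the config paths assume Ubuntu’s default PostgreSQL and MySQL packages, and 172.17.0.0/16 is docker’s default bridge subnet):

```
# /etc/postgresql/<version>/main/postgresql.conf: listen on all interfaces
listen_addresses = '*'

# /etc/postgresql/<version>/main/pg_hba.conf: allow password logins from docker
host  all  all  172.17.0.0/16  md5

# /etc/mysql/mysql.conf.d/mysqld.cnf: same idea on the MySQL side
bind-address = 0.0.0.0
```

```sql
-- in psql, as a superuser: give the postgres user a password
ALTER USER postgres PASSWORD 'passwordForPostgresUser';

-- in mysql, as root: add a user that may connect from the docker subnet
CREATE USER 'myuser'@'172.17.0.%' IDENTIFIED BY 'mysqlpasswo';
GRANT ALL PRIVILEGES ON sourceDbName.* TO 'myuser'@'172.17.0.%';
```

Both servers need a restart after the config changes.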
Later, when I went through a similar process for a staging environment, I installed the latest version of pgloader from apt.postgresql.org instead, and that was a lot easier.
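If you want to go that route, the setup is roughly this (a sketch for Ubuntu 18.04 “bionic”, following the instructions on apt.postgresql.org):

```sh
# add the PostgreSQL apt repository and its signing key
sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt bionic-pgdg main" \
  > /etc/apt/sources.list.d/pgdg.list'
wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -

sudo apt update
sudo apt install pgloader
```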
Migrating the source code
The first step was to convert the MySQL SQL placeholders from `?` to PostgreSQL `$N` placeholders:
```diff
- const rows = await db.query('SELECT * FROM users WHERE user_id = ?', [userId])
+ const rows = await db.query('SELECT * FROM users WHERE user_id = $1', [userId])
```
Doing this manually would be very tedious/boring, so I wrote a program that edited my source code instead. babel is a great tool for JavaScript source-to-source transformation, but it doesn’t preserve source formatting when it emits code. However, there is a great utility on npm called @codemod/cli that parses and emits code using recast instead of babel’s default parser. I implemented my SQL migration utility as a babel plugin, and I used the excellent JavaScript AST Explorer to prototype the first version. It was possible to implement the entire plugin in just 45 lines of code:
```js
/*
  Tiny babel plugin that helps migrate node MySQL SQL to Postgres SQL.
  Run via:
  npx -p @codemod/cli codemod --plugin ./scripts/mysql-to-postgres.js src/server/foo.js
*/
const needsMigration = sql => {
  const ustr = sql.toUpperCase()
  return ustr.includes('?') && (
    ustr.includes('SELECT') ||
    ustr.includes('INSERT') ||
    ustr.includes('UPDATE') ||
    ustr.includes('WHERE') ||
    ustr.includes('DELETE')
  )
}

const migrateSql = sql => {
  let i = 0
  return sql.replace(/\?/g, _ => `$${++i}`)
}

module.exports = function (babel) {
  const { types: t } = babel
  return {
    name: 'mysql-to-postgres-sql-converter',
    visitor: {
      StringLiteral(path) {
        if (needsMigration(path.node.value)) {
          path.replaceWith(
            t.stringLiteral(
              migrateSql(path.node.value)
            )
          )
        }
      },
      TemplateElement(path) {
        if (needsMigration(path.node.value.cooked)) {
          path.replaceWith(
            t.templateElement(
              {
                raw: migrateSql(path.node.value.raw),
                cooked: migrateSql(path.node.value.cooked),
              },
              path.node.tail
            )
          )
        }
      },
    },
  }
}
```
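As a made-up example of the transform in action, the TemplateElement visitor rewrites queries written as template literals the same way as plain strings:

```diff
- const rows = await db.query(`SELECT * FROM users WHERE user_id = ?`, [userId])
+ const rows = await db.query(`SELECT * FROM users WHERE user_id = $1`, [userId])
```

One caveat of going element by element: the `$N` counter in `migrateSql` restarts for each literal segment, so a query with interpolations between its placeholders deserves a manual check.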
This fixed all my SQL and preserved all source formatting, but unfortunately it also added a few semicolons. I was able to remove these automatically using eslint though:
```sh
eslint --parser babel-eslint --parser-options=ecmaVersion:6 \
  --parser-options=sourceType:module --no-eslintrc --no-ignore \
  --fix --rule 'semi: [2, never]' --rule 'quotes: [2, single]' src/server/foo.js
```
After this I did a few manual fixups like:
- Changed all my timestamp datatypes to `timestamp(3) with time zone`. By default Postgres has microsecond precision, but that causes problems if you use node because the JavaScript `Date` can only represent milliseconds. If you really need microsecond precision you can use a custom type parser (see note at the bottom of this page, and the sketch after this list).
- Added `RETURNING ...` on `INSERT` statements with auto-incrementing ids (also sketched after this list).
- Changed a few `INSERT ... ON DUPLICATE KEY UPDATE` into `INSERT ... ON CONFLICT ...`.
- Fixed all references to the MySQL-specific field `.affectedRows`.
- Fixed references to camel case column names. pgloader renames camel case columns to all lowercase, since this is the convention on PostgreSQL (you can enable case sensitive column names in Postgres, but I thought it was better to stay with PostgreSQL’s default settings).
- Fixed some `rand(someSeed)` calls to transactions with `SET LOCAL SEED = someSeed`.
- Ported my web session handling from `express-mysql-session` to `connect-pg-simple`.
- Rewrote the setup / deploy scripts to install Postgres and create a new database + schema.
- Created a new blank db-migrate seed.
- Made sure that all E2E tests were passing again. I have a very good test suite for this project, and it was quite amusing to see all the, sometimes quite unexpected, places that broke when I changed the database.
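For the microsecond case, here is a minimal sketch of such a type parser, assuming the node-postgres (pg) driver; 1184 is the OID for `timestamp with time zone`:

```js
const { types } = require('pg')

// By default pg parses timestamptz into a JavaScript Date, which
// truncates to milliseconds. Returning the raw string instead keeps
// the full microsecond precision for you to parse however you like.
types.setTypeParser(1184, str => str)
```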
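And a hedged sketch of the `RETURNING` / `ON CONFLICT` / `.affectedRows` fixups, using a plain pg pool and made-up table and column names rather than my actual code:

```js
const { Pool } = require('pg')
const pool = new Pool()

async function examples (name, email) {
  // mysql driver: the generated id came back as result.insertId.
  // pg: ask for it explicitly with RETURNING.
  const inserted = await pool.query(
    'INSERT INTO users (name) VALUES ($1) RETURNING user_id',
    [name]
  )
  const newId = inserted.rows[0].user_id

  // MySQL:    INSERT ... ON DUPLICATE KEY UPDATE name = VALUES(name)
  // Postgres: INSERT ... ON CONFLICT (...) DO UPDATE SET ...
  await pool.query(
    `INSERT INTO users (email, name) VALUES ($1, $2)
     ON CONFLICT (email) DO UPDATE SET name = EXCLUDED.name`,
    [email, name]
  )

  // mysql driver: result.affectedRows
  // pg: result.rowCount
  const updated = await pool.query(
    'UPDATE users SET name = $1 WHERE user_id = $2',
    [name, newId]
  )
  return updated.rowCount
}
```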
After the migration was done, I noted that:
- PostgreSQL ended up taking just 94MB of RAM (RSS) versus the 194MB that MySQL had required for the exact same data / workload (which is great because this project is deployed on a tiny Digital Ocean droplet with 1GB RAM).
- I learned that PostgreSQL doesn’t support logical column ordering yet, which kind of sucks.
- PostgreSQL has a TODO page that lists things that are not implemented yet.
- pg_dump emits SQL to lock down the search path to protect against CVE-2018-1058, and this ended up causing lots of problems when I was setting up the db-migrate package.
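For context, this is the line pg_dump puts near the top of its dumps since that fix:

```sql
-- Empties search_path for the restore session, so any unqualified
-- names in SQL run afterwards in the same session stop resolving.
SELECT pg_catalog.set_config('search_path', '', false);
```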
1: Most people seem to prefer PostgreSQL, even though there are certainly examples of the opposite for certain very specific situations.