Following the long-term trend of migration from proprietary systems to open-source software, database migration from SQL Server to PostgreSQL has become a popular way to optimize IT infrastructure expenses. This whitepaper presents tips and techniques for migrating SQL Server to PostgreSQL, based on use cases from Intelligent Converters projects.
Database migration between two advanced DBMS includes the following steps:
- Assess SQL Server-specific elements such as data types, column properties, syntax patterns and built-in functions.
- Find the most suitable way to migrate SQL Server schemas to PostgreSQL with respect to the differences between the two DBMS.
- Choose the method and tool for data migration that minimize downtime and overhead on the source system.
- Migrate SQL Server database logic such as stored procedures, functions, triggers and views into the PostgreSQL format.
- Validate the resulting database by running performance and functional tests. Verify the stability and efficiency of the system and tune performance where necessary.
The most important of these steps are explained in detail below.
Table Definitions
The main challenges of schema migration from SQL Server to Postgres are type mapping and conversion of default values for each table column. The two DBMS share many similar data types, such as BIGINT, DATE, DECIMAL, INT, MONEY, NUMERIC, REAL, SMALLINT and TEXT. However, some SQL Server types must be mapped to Postgres explicitly:
SQL Server | Postgres |
BINARY(n) | BYTEA |
BIT | BOOLEAN, BOOL |
CHAR(n) where n>8000 | TEXT |
DATETIME | TIMESTAMP(3) |
DATETIME2(n) | TIMESTAMP(n) |
DATETIMEOFFSET(n) | TIMESTAMP(n) WITH TIME ZONE, TIMESTAMPTZ |
FLOAT(n) | DOUBLE PRECISION, FLOAT8 |
IMAGE | BYTEA |
NCHAR(n) where n>8000 | TEXT |
NTEXT | TEXT |
NVARCHAR(max) | TEXT |
ROWVERSION | BYTEA |
SMALLMONEY | MONEY |
UNIQUEIDENTIFIER | CHAR(36), UUID |
VARBINARY(max) | BYTEA |
VARCHAR(max) | TEXT |
SQL Server spatial types such as GEOGRAPHY and GEOMETRY require the dedicated PostGIS extension to be installed in PostgreSQL.
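For example, the mappings above may be applied to a hypothetical table (the table and column names are illustrative only):

```sql
-- SQL Server (T-SQL) definition
CREATE TABLE Attachments(
    Id UNIQUEIDENTIFIER NOT NULL PRIMARY KEY,
    FileName NVARCHAR(max),
    Content VARBINARY(max),
    UploadedOn DATETIME
);

-- PostgreSQL equivalent after applying the mapping table
CREATE TABLE Attachments(
    Id UUID NOT NULL PRIMARY KEY,
    FileName TEXT,
    Content BYTEA,
    UploadedOn TIMESTAMP(3)
);
```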
Conversion of the IDENTITY attribute of integer columns in SQL Server also requires special attention. If both the seed and the increment are equal to 1, the PostgreSQL equivalent is SERIAL for INT or BIGSERIAL for BIGINT. In all other cases the IDENTITY attribute must be converted into the PostgreSQL GENERATED ... AS IDENTITY syntax (supported since version 10). For example, the following SQL Server table:
CREATE TABLE Logs(
Id INT NOT NULL IDENTITY(2,4) PRIMARY KEY,
Msg VARCHAR(200)
);
must be migrated to Postgres as follows:
CREATE TABLE Logs(
Id INT NOT NULL GENERATED BY DEFAULT AS IDENTITY
(START WITH 2 INCREMENT BY 4) PRIMARY KEY,
Msg VARCHAR(200)
);
Migration of Data
Migrating data from SQL Server to PostgreSQL is mostly straightforward, with a few caveats. First, SQL Server binary data of small and medium size is migrated into BYTEA. However, this approach is limited to approximately 10MB because BYTEA data can only be extracted as a single fragment. PostgreSQL does not support piecewise reading for this data type, so reading huge values may cause excessive RAM consumption.
For those cases, PostgreSQL offers an alternative way of storing binary data known as LARGE OBJECT, which provides stream-style access to the data and so can serve as a workaround for the BYTEA limitations. LARGE OBJECT data is stored in the dedicated system table 'pg_largeobject', which can hold up to 4 billion rows with a maximum size of 4TB per object.
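A minimal sketch of the LARGE OBJECT approach, assuming a hypothetical 'documents' table and a file readable by the PostgreSQL server process:

```sql
-- the OID column references an entry in pg_largeobject
CREATE TABLE documents(
    id INT PRIMARY KEY,
    content OID
);

-- server-side import of a file into a large object
INSERT INTO documents VALUES (1, lo_import('/tmp/report.pdf'));

-- piecewise reading: fetch 1MB of data starting at offset 0
SELECT lo_get(content, 0, 1048576) FROM documents WHERE id = 1;
```

Client libraries also expose stream-style functions (lo_open, loread, lo_lseek) for reading a large object in chunks instead of loading it whole.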
Spatial data is another challenge when migrating SQL Server to PostgreSQL due to the different internal representation of the spatial types in the two DBMS. The workaround is to use the text representation of spatial data called well-known text (WKT). For example, the spatial columns 'data1' and 'data2' may be extracted in WKT form from a SQL Server table using the built-in function STAsText as follows:
SELECT data1.STAsText(), data2.STAsText() FROM spatial_table;
The resulting data may then be inserted into the PostgreSQL table using the WKT representation like this:
INSERT INTO spatial_table VALUES (1, 'POLYGON ((5 5, 10 5, 10 10, 5 5))', 'LINESTRING (-122.36 47.656, -122.343 47.656)');
Methods of Data Migration from SQL Server to PostgreSQL
Probably the most important issue when migrating large volumes of data is preventing unacceptable downtime or overhead on the source database. There are three common techniques to solve this problem:
- The simplest and most straightforward option, called snapshot migration, migrates all the data in a single transaction. This method may cause unacceptable downtime of the SQL Server database, so it is mostly suitable for small and medium-size tables.
- Piecewise snapshot migration is a more sophisticated variation of the previous approach: it splits the data into chunks and migrates them simultaneously via parallel threads or processes. This method significantly reduces the downtime of SQL Server. Specialists of Intelligent Converters use piecewise snapshot migration for most migration projects as a good combination of performance and reliability.
- The Changed Data Replication (CDR) method implements continuous data migration by tracking changes and replicating only the updated data. This approach essentially decreases the downtime of SQL Server and provides near-real-time migration due to the minimal volume of data processed.
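The piecewise snapshot technique can be sketched on the SQL Server side as keyset-based chunked extraction, where each chunk is handled by its own thread or connection (the chunk size and table are illustrative):

```sql
-- T-SQL: read the next chunk of 10000 rows after the last migrated key
SELECT TOP (10000) Id, Msg
FROM Logs
WHERE Id > @last_id      -- highest Id migrated by the previous chunk
ORDER BY Id;
```

Keyset pagination (WHERE Id > @last_id) is preferable to OFFSET-based paging because it avoids rescanning the rows that have already been migrated.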
Migration of SQL Code
The first concern when migrating source code from SQL Server to PostgreSQL is type casting. PostgreSQL applies stricter requirements than SQL Server in such cases: it requires explicit type casts in function calls, operators and update statements with expression results. A workaround is to use the PostgreSQL 'anyelement' pseudo-type, which allows a function to accept arguments of any data type:
create or replace function my_concat(str1 anyelement, str2 anyelement)
returns varchar
language plpgsql
as $$
begin
return str1::varchar || str2::varchar;
end;
$$;
The limitation of this workaround is that all 'anyelement' parameters of a stored procedure or function must resolve to the same actual data type in each call.
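A usage sketch of the my_concat function defined above; note that both arguments in a single call must resolve to the same type:

```sql
SELECT my_concat(10, 20);               -- both INT: returns '1020'
SELECT my_concat('a'::text, 'b'::text); -- both TEXT: returns 'ab'
-- SELECT my_concat(1, 'b'::text);      -- error: conflicting anyelement types
```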
The next issue to resolve when migrating SQL Server to Postgres is the conversion of triggers. SQL Server syntax requires the trigger's source code to be included inside the CREATE TRIGGER statement, while a PostgreSQL trigger must call a trigger function containing all the logic. For example:
CREATE OR REPLACE FUNCTION employees_on_update_func()
RETURNS TRIGGER
LANGUAGE PLPGSQL
AS
$$
BEGIN
IF (NEW.first_name <> OLD.first_name) OR (NEW.last_name <> OLD.last_name) OR (NEW.email <> OLD.email)
THEN
INSERT INTO changes_log(id,changed_on)
VALUES(OLD.id,now());
END IF;
RETURN NEW;
END;
$$;
CREATE TRIGGER employees_on_update
BEFORE UPDATE ON employees FOR EACH ROW
EXECUTE PROCEDURE employees_on_update_func();
Finally, SQL Server built-in functions that are missing in PostgreSQL must be converted correctly when migrating the source code of stored procedures, functions and triggers:
SQL Server | Postgres |
CHARINDEX($substr, $str, $start_pos) | STRPOS combined with SUBSTRING |
CHAR | CHR |
DATEADD($interval, $n_units, $date) | $date + $n_units * INTERVAL '1 second/minute/hour/day/month/year' |
DATEDIFF($interval, $date1, $date2) | DATE_PART(<interval name>, $date2 - $date1) |
DATEPART | DATE_PART |
GETDATE | NOW() |
IIF($condition,$expr1,$expr2) | CASE WHEN $condition THEN $expr1 ELSE $expr2 END |
ISNULL | COALESCE |
LEN | LENGTH |
REPLICATE | REPEAT |
SPACE($n) | REPEAT(' ', $n) |
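To illustrate the date-related mappings from the table, the SQL Server expressions DATEADD(day, 5, GETDATE()) and DATEDIFF(day, ...) may be rewritten for PostgreSQL as follows (the dates are illustrative):

```sql
-- DATEADD(day, 5, GETDATE())
SELECT NOW() + 5 * INTERVAL '1 day';

-- DATEDIFF(day, '2024-03-01', '2024-03-10')
SELECT DATE_PART('day', TIMESTAMP '2024-03-10' - TIMESTAMP '2024-03-01');  -- 9
```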
Conclusion
Migrating SQL Server to PostgreSQL is not easy; the procedure may require considerable time and effort, especially for large databases. This whitepaper covers only a few issues of database migration, while almost every migration project has its own challenges. Experts of Intelligent Converters, a company focused on database migration and synchronization for years, are always ready to assist with any database migration project.