Converting mysql ddl for cockroach import

Cloud Platform (extra geek level) not yet posted on  Jan 2018


Cockroach is an open source implementation along the lines of Google Spanner. I wanted to try it out as described in Getting cockroachdb running on google cloud platform , so I wanted to see if it was possible to automate the import of a  sizeable MySql database from a DDL file. 

Differences

There are a whole lot of syntactical differences, so the input data needed lots of massaging. Cockroach is close to PostgreSQL so I started from this .


I also spent a bit of time on the issues section of the cockroach repo and picked up many edits that others had suggested there too. 

Tools

I've been using Unix off and on for almost 40 years. In fact, I used to work at the company that invented it, and the philosophy in the wikipedia entry for Unix, is one of the main reasons for the success of *x in all its flavours, including Linux.
"the use of a large number of software tools, small programs that can be strung together through a command-line interpreter using pipes, as opposed to using a single monolithic program that includes all of the same functionality. These concepts are collectively known as the "Unix philosophy". Brian Kernighan and Rob Pike summarize this in The Unix Programming Environment as "the idea that the power of a system comes more from the relationships among programs than from the programs themselves". "

Having said that, my relationship with Unix/Linux has been very much more off than on, so embarking on this automation task has been both a "remembering experience" and a "learning experience", but in any case i wanted to stick to the basic tools available in Linux, namely bash , grep, awk and sed. Let's hope that some of this might be useful to others undertaking the same task.

CockroachDB

I covered this in Getting cockroachdb running on google cloud platform , but initially I'm using Docker running on Compute Engine - just a single VM, but multiple nodes in separate Docker containers. 

Converting and loading the ddl

There are 3 parts to this
  • split the ddl into table creation , constraints and inserts.  The import will fail if you reference constraints in tables that aren't yet created so all those kind of definitions need to be removed from the create table part, and turned into alter statements.
  • patching the syntax. Conversion of definitions (and in some case import data formats), from MySql to CockroachDB
  • Inserting the data using the built in cockroach DB SQL client. Sometimes the inserted data needed to be split into batches as there were too many to fit comfortable in a single transaction
load.sh runs the entire process
sh splitpart.sh
sh patpart.sh
sh dockpart.sh

Splitting the ddl

splitpart.sh just passes the input and output filenames to alters.sh. The original mySql export is in e.sql-orig
sh alters.sh e.sql-orig \
 e-table-creates.sql-unpatched \ 
 e-table-alters.sql-unpatched \ 
 e-inserts.sql-unpatched

alters.sh creates the separate parts as follows.
  • find all the CREATE TABLE sections in the MySql export, and write the names of the tables to a file. Note that MySql has exported all table names and fields with `backtick` round their names
grep -Ei "CREATE TABLE[^\`]+" $1 | grep -oEi "\`[^\`]+\`" | grep -iEo '[^\`]*' > e-t.u
  • get the name of the database from the export
DB=$(grep -iE "CREATE DATABASE" $1 | grep -oiE "\`.*+\`" | grep -oiE "[^\`]+")
  • create an alters.sql that will add the contraints to each of the defined tables
# create an alters sql
echo "creating $3"
echo "" > $3;
while read LINE
 do
   awk "BEGIN{IGNORECASE=1} /CREATE TABLE \`$LINE\`/,/;/" $1 | grep -iE \
    "constraint" | sed \
    -Ee "s/CONSTRAINT/ALTER TABLE \`$LINE\` ADD CONSTRAINT/Ig" | sed -Ee "N;/\n\)/s/,\)*\n/\n/;P;D" | sed -Ee "s/$/;/g" >> $3 
done < "e-t.u"
  • create a tables.sql containing the DDL for creating each of the table
# redefine the tables without the constraints
echo "creating $2
"echo "" > $2;
while read LINE
 do 
  awk "BEGIN{IGNORECASE=1} /CREATE TABLE \`$LINE\`/,/;/" $1 | grep -iEv "constraint" | sed -Ee 'N;/\n\)/s/,\)*\n/\n/;P;D' >> $2 
done < "e-t.u"
  •  create a single file for each table containing its inserts. These may need to be split up in a later stage if they are too big for a regular transaction.
# inserts
rm "inserts/*-$4"
echo "creating $4"
echo "" > $3;
while read LINE 
 do N="inserts/$LINE-$4"
  echo "creating $N" 
  awk "BEGIN{IGNORECASE=1} /INSERT INTO \`$LINE\`/,/;/" $1 | sed -E -e "s/INSERT INTO \`$LINE\` VALUES//" > $N 
 done < "e-t.u"

Patching the syntax

now we have a table-creates.sql, a table-alters.sql and a set of inserts, but they are still in MySql syntax so the next step is to patch them to valid cockroachsyntax

patpart.sh does this
  • patches the creates and alters sql. They both need the same treatment. I'll cover patcdb.sh later
sh patcdb.sh e-table-creates.sql-unpatched e-table-creates.sql
sh patcdb.sh e-table-alters.sql-unpatched e-table-alters.sql
  • clean up anything left from a previous attempt, and get a list of all the unpatched inserts
rm inserts/*e-inserts.sql
F=inserts/*e-inserts.sql-unpatched
  • patch the syntax for inserts, I'll cover patinserts.sh later
for fx in $F
 do 
  P=$(echo $fx | grep -Eo ".*\.sql")
  sh patinserts.sh $fx $P
done
  • Booleans need special treatment, so find any tables that contain BOOL type columns and apply additional patching. I'll cover patbools.sh later
echo "patching bool inserts"
for b in $B
  do 
    #need to copy the patched one, 
    cp inserts/$b-e-inserts.sql inserts/$b-e-inserts.sql-bytesin
   sh patbools.sh inserts/$b-e-inserts.sql-bytesin inserts/$b-e-inserts.sql
   rm inserts/$b-e-inserts.sql-bytesin
  done

patcdb.sh converts from mysql to cockroach - most of these edits came from the issues section of cockroach git.
sed -E -e 's/`/"/g' \ -e "s/float\([0-9,]+\)/FLOAT/g" \ -e "s/binary\([0-9]+\)/BYTES/Ig" \ -e 's/int\(.*\)/int/' \ -e 's/DEFAULT CHARSET=ascii//' \ -e 's/ENGINE=InnoDB //' \ -e 's/DEFAULT CHARSET=utf8//' \ -e 's/DEFAULT NULL//' \ -e 's/^LOCK/--LOCK/' \ -e 's/^UNLOCK/--UNLOCK/' \ -e 's/COMMENT=.*$/;/' \ -e 's/COMMENT\ .*$/,/' \ -e 's/int NOT NULL AUTO_INCREMENT/SERIAL NOT NULL/' \ -e 's/AUTO_INCREMENT=.* //' \ -e 's/COLLATE utf8_unicode_ci//' \ -e 's/COLLATE=utf8_unicode_ci//' \ -e '/PRIMARY/!s/KEY/index/' \ -e 's/tinyint/SMALLINT/' \ -e 's/int NOT NULL DEFAULT \x270\x27/int NOT NULL DEFAULT 0/' \ -e 's/USING BTREE//' \ -e 's/\x27\\\x27\x27/e\x27\\x27\x27/g' \ -e 's/mediumtext/text/g' \ -e 's/FOREIGN index/FOREIGN KEY/g' \ -e 's/bit\(1\)/BOOL/g' \ -e "s/b'0'/false/g" \ -e "s/b'1'/true/g" \ -e 's/datetime/TIMESTAMP/g' \ -e 's/tinyblob/BYTES/g' \ -e 's/ double / DOUBLE PRECISION /g' \ -e "s/timestamp NOT NULL DEFAULT '0000-00-00 00:00:00'/TIMESTAMP/g" \ -e 's/ ON UPDATE CURRENT_TIMESTAMP//g' \ -e 's/ CHARACTER SET ascii //g' \ -e 's/ int unsigned NOT NULL AUTO_INCREMENT/ BIGSERIAL NOT NULL/g' \ $1 > $2

patinserts.sh - MySql surrounds  names with backticks ( I realized later there was a mysql export option to prevent this, but quoting names in cockroach preserves the case, so I decided to leave this in). Cockroach doesn't like dates expressed as 0000-00-00, so best to change these to NULL.
echo "patching inserts $1 > $2"sed -E -e 's/`/"/g' -e "s/\\\'/''/g" -e "s/\)\s?,/\),\n/g" -e "s/'0000-00-00'/NULL/g"  $1 > $2

patbools.sh - I'm using the option in MySql that sends across binary values as hex encoded. In the case of BOOL, they would have been defined as BIT(1) in MySQL, so 0x00 and 0x01 need to be converted to true or false.
sed $1 -E -e "s/\(0x00/\(false/g" -e "s/\(0x01/\(true/g" -e "s/,0x00/,false/g" -e "s/,0x01/,true/g" > $2

Loading the data
Now there's a table and alters sql file to build the database and add the contraints, along with a collection of files,  one for each table, containing the values to be inserted.

dockpart.sh is in 2 parts.
  • dockcreatepart.sh - to make the tables







Splitting the ddl

sh patcdb.sh e-table-creates.sql-unpatched e-table-creates.sqlsh patcdb.sh e-table-alters.sql-unpatched e-table-alters.sqlrm inserts/*e-inserts.sqlF=inserts/*e-inserts.sql-unpatched
sed -E -e 's/`/"/g' -e "s/\\\'/''/g" -e "s/\),/\),\n/g"  $1 > $2
echo "patching inserts $1 > $2"#cockroach doesnt like backticks or \' for a quotesed -E -e 's/`/"/g' -e "s/\\\'/''/g" -e "s/\)\s?,/\),\n/g" -e "s/'0000-00-00'/NULL/g"  $1 > $2

sed -E -e 's/`/"/g' \ -e "s/float\([0-9,]+\)/FLOAT/g" \ -e "s/binary\([0-9]+\)/BYTES/Ig" \ -e 's/int\(.*\)/int/' \ -e 's/DEFAULT CHARSET=ascii//' \ -e 's/ENGINE=InnoDB //' \ -e 's/DEFAULT CHARSET=utf8//' \ -e 's/DEFAULT NULL//' \ -e 's/^LOCK/--LOCK/' \ -e 's/^UNLOCK/--UNLOCK/' \ -e 's/COMMENT=.*$/;/' \ -e 's/COMMENT\ .*$/,/' \ -e 's/int NOT NULL AUTO_INCREMENT/SERIAL NOT NULL/' \ -e 's/AUTO_INCREMENT=.* //' \ -e 's/COLLATE utf8_unicode_ci//' \ -e 's/COLLATE=utf8_unicode_ci//' \ -e '/PRIMARY/!s/KEY/index/' \ -e 's/tinyint/SMALLINT/' \ -e 's/int NOT NULL DEFAULT \x270\x27/int NOT NULL DEFAULT 0/' \ -e 's/USING BTREE//' \ -e 's/\x27\\\x27\x27/e\x27\\x27\x27/g' \ -e 's/mediumtext/text/g' \ -e 's/FOREIGN index/FOREIGN KEY/g' \ -e 's/bit\(1\)/BOOL/g' \ -e "s/b'0'/false/g" \ -e "s/b'1'/true/g" \ -e 's/datetime/TIMESTAMP/g' \ -e 's/tinyblob/BYTES/g' \ -e 's/ double / DOUBLE PRECISION /g' \ -e "s/timestamp NOT NULL DEFAULT '0000-00-00 00:00:00'/TIMESTAMP/g" \ -e 's/ ON UPDATE CURRENT_TIMESTAMP//g' \ -e 's/ CHARACTER SET ascii //g' \ -e 's/ int unsigned NOT NULL AUTO_INCREMENT/ BIGSERIAL NOT NULL/g' \ $1 > $2


B=$(awk 'BEGIN {RS=";"}/BOOL/{print}' e-table-creates.sql | grep -i "CREATE TABLE" | grep -iEo '".*"' | grep -iEo '[^"]*')echo "patching bool inserts"for b in $Bdo #need to copy the patched one, cp inserts/$b-e-inserts.sql inserts/$b-e-inserts.sql-bytesin sh patbools.sh 
sed -E -e 's/`/"/g' \
 -e "s/float\([0-9,]+\)/FLOAT/g" \
 -e "s/binary\([0-9]+\)/BYTES/Ig" \
 -e 's/int\(.*\)/int/' \
 -e 's/DEFAULT CHARSET=ascii//' \
 -e 's/ENGINE=InnoDB //' \
 -e 's/DEFAULT CHARSET=utf8//' \
 -e 's/DEFAULT NULL//' \
 -e 's/^LOCK/--LOCK/' \
 -e 's/^UNLOCK/--UNLOCK/' \
 -e 's/COMMENT=.*$/;/' \
 -e 's/COMMENT\ .*$/,/' \
 -e 's/int NOT NULL AUTO_INCREMENT/SERIAL NOT NULL/' \
 -e 's/AUTO_INCREMENT=.* //' \
 -e 's/COLLATE utf8_unicode_ci//' \
 -e 's/COLLATE=utf8_unicode_ci//' \
 -e '/PRIMARY/!s/KEY/index/' \
 -e 's/tinyint/SMALLINT/' \
 -e 's/int NOT NULL DEFAULT \x270\x27/int NOT NULL DEFAULT 0/' \
 -e 's/USING BTREE//' \
 -e 's/\x27\\\x27\x27/e\x27\\x27\x27/g' \
 -e 's/mediumtext/text/g' \
 -e 's/FOREIGN index/FOREIGN KEY/g' \
 -e 's/bit\(1\)/BOOL/g' \ 
 -e 's/datetime/TIMESTAMP/g' \
 -e 's/tinyblob/BYTES/g' \
 -e 's/ double / DOUBLE PRECISION /g' \
 -e "s/timestamp NOT NULL DEFAULT '0000-00-00 00:00:00'/TIMESTAMP/g" \
 -e 's/ ON UPDATE CURRENT_TIMESTAMP//g' \
 -e 's/ CHARACTER SET ascii //g' \
 -e 's/ int unsigned NOT NULL AUTO_INCREMENT/ BIGSERIAL NOT NULL/g' \
 $1 > $2
inserts/$b-e-inserts.sql-bytesin inserts/$b-e-inserts.sql rm inserts/$b-e-inserts.sql-bytesindone#BYTES need very special treatment#B=$(awk 'BEGIN {RS=";"}/BYTES/{print}' e-table-creates.sql | grep -i "CREATE TABLE" | grep -iEo '".*"' | grep -iEo '[^"]*')#echo "patching byte inserts"#for b in $B#do #need to copy the patched one, as some may be both bool and bytes #cp inserts/$b-e-inserts.sql inserts/$b-e-inserts.sql-bytesin #sh patbytes.sh inserts/$b-e-inserts.sql-bytesin inserts/$b-e-inserts.sql #rm inserts/$b-e-inserts.sql-bytesin#done
rm inserts/*e-inserts.sqlF=inserts/*e-inserts.sql-unpatched
sh patcdb.sh e-table-creates.sql-unpatched e-table-creates.sqlsh patcdb.sh e-table-alters.sql-unpatched e-table-alters.sqlrm inserts/*e-inserts.sqlF=inserts/*e-inserts.sql-unpatchedfor fx in $Fdo P=$(echo $fx | grep -Eo ".*\.sql") sh patinserts.sh $fx $PdoneB=$(awk 'BEGIN {RS=";"}/BOOL/{print}' e-table-creates.sql | grep -i "CREATE TABLE" | grep -iEo '".*"' | grep -iEo '[^"]*')echo "patching bool inserts"for b in $Bdo #need to copy the patched one, cp inserts/$b-e-inserts.sql inserts/$b-e-inserts.sql-bytesin sh patbools.sh inserts/$b-e-inserts.sql-bytesin inserts/$b-e-inserts.sql rm inserts/$b-e-inserts.sql-bytesindone#BYTES need very special treatment#B=$(awk 'BEGIN {RS=";"}/BYTES/{print}' e-table-creates.sql | grep -i "CREATE TABLE" | grep -iEo '".*"' | grep -iEo '[^"]*')#echo "patching byte inserts"#for b in $B#do #need to copy the patched one, as some may be both bool and bytes #cp inserts/$b-e-inserts.sql inserts/$b-e-inserts.sql-bytesin #sh patbytes.sh inserts/$b-e-inserts.sql-bytesin inserts/$b-e-inserts.sql #rm inserts/$b-e-inserts.sql-bytesin#done
# this splits it up into Create, constrainsts and inserts# 1 .. intput filr# 2 .. table definition# 3 .. constraints# 5 .. insert fileegrep -Ei "CREATE TABLE[^\`]+" $1 | grep -oEi "\`[^\`]+\`" | grep -iEo '[^\`]*' > e-t.u

DB=$(grep -iE "CREATE DATABASE" $1 | grep -oiE "\`.*+\`" | grep -oiE "[^\`]+")

sh patcdb.sh e-table-creates.sql-unpatched e-table-creates.sqlsh patcdb.sh e-table-alters.sql-unpatched e-table-alters.sql
# redefine the tables without the constraints
echo "creating $2
"echo "" > $2;
while read LINE
 do 
  awk "BEGIN{IGNORECASE=1} /CREATE TABLE \`$LINE\`/,/;/" $1 | grep -iEv "constraint" | sed -Ee 'N;/\n\)/s/,\)*\n/\n/;P;D' >> $2 
done < "e-t.u"
# inserts
rm "inserts/*-$4"
echo "creating $4"
echo "" > $3;
while read LINE 
 do N="inserts/$LINE-$4"
  echo "creating $N" 
  awk "BEGIN{IGNORECASE=1} /INSERT INTO \`$LINE\`/,/;/" $1 | sed -E -e "s/INSERT INTO \`$LINE\` VALUES//" > $N 
 done < "e-t.u"

# create an alters sql
echo "creating $3"
echo "" > $3;
while read LINE
 do
   awk "BEGIN{IGNORECASE=1} /CREATE TABLE \`$LINE\`/,/;/" $1 | grep -iE \
    "constraint" | sed \
    -Ee "s/CONSTRAINT/ALTER TABLE \`$LINE\` ADD CONSTRAINT/Ig" | sed -Ee "N;/\n\)/s/,\)*\n/\n/;P;D" | sed -Ee "s/$/;/g" >> $3 
done < "e-t.u"


# because a ddl file could be in the wrong order for dependenices# this splits it up into Create, constrainsts and inserts# 1 .. intput filr# 2 .. table definition# 3 .. constraints# 5 .. insert fileegrep -Ei "CREATE TABLE[^\`]+" $1 | grep -oEi "\`[^\`]+\`" | grep -iEo '[^\`]*' > e-t.uDB=$(grep -iE "CREATE DATABASE" $1 | grep -oiE "\`.*+\`" | grep -oiE "[^\`]+")## create an alters sqlecho "creating $3"echo "" > $3;while read LINE do awk "BEGIN{IGNORECASE=1} /CREATE TABLE \`$LINE\`/,/;/" $1 | grep -iE \ "constraint" | sed \ -Ee "s/CONSTRAINT/ALTER TABLE \`$LINE\` ADD CONSTRAINT/Ig" | sed -Ee "N;/\n\)/s/,\)*\n/\n/;P;D" | sed -Ee "s/$/;/g" >> $3 done < "e-t.u"# redefine the tables without the constrainstecho "creating $2"echo "" > $2;while read LINE do awk "BEGIN{IGNORECASE=1} /CREATE TABLE \`$LINE\`/,/;/" $1 | grep -iEv "constraint" | sed -Ee 'N;/\n\)/s/,\)*\n/\n/;P;D' >> $2 done < "e-t.u"# insertsrm "inserts/*-$4"echo "creating $4"echo "" > $3;while read LINE do N="inserts/$LINE-$4" echo "creating $N" #echo "DELETE FROM \`$LINE\` WHERE true;" > $N awk "BEGIN{IGNORECASE=1} /INSERT INTO \`$LINE\`/,/;/" $1 | sed -E -e "s/INSERT INTO \`$LINE\` VALUES//" > $N done < "e-t.u"



Converting and loading the ddl

There are 3 parts to this

# create separate alter and table filesh alters.sh e.sql-orig \ e-table-creates.sql-unpatched \ e-table-alters.sql-unpatched \ e-inserts.sql-unpatched#split the file into creation.constraint.insertssh splitpart.sh#patch the syntaxsh patpart.sh#load to dbsh dockpart.sh


sudo docker pull cockroachdb/cockroach:v1.1.4






Why not join our forum, follow the blog or follow me on twitter to ensure you get updates when they are available.
Comments