Skip to content

Latest commit

 

History

History
127 lines (79 loc) · 3.19 KB

README.md

File metadata and controls

127 lines (79 loc) · 3.19 KB

utf8convert

This application reads every text and character field in a MySQL database and analyzes each DataPoint (the intersection of a row and column like a cell in a spreadsheet) looking for invalid utf8 sequences. This is a command line only application.

utf8convert database

The utf8convert database stores a deconstructed target database and provides a working environment for storing conversions and converted data.

erd

Invalid utf8 sequence

8°6 crew

This should be rendered as 8°6 crew but the extended ° character has been decoded from utf8 into multibyte component parts. Where there is one there are probably many.

Duvalier’s Dream

This will be corrected to Duvalier’s Dream.

I created this tool to correct every invalid utf8 sequence in my database in a single conversion. My example database finds around 89,000 invalid sequences. Only DataPoints which have been converted will be exported. Valid utf8 characters will be evaluated too and ignored if they are correct.

About

This application was inspired by https://www.bluebox.net/insight/blog-article/getting-out-of-mysql-character-set-hell

Install

composer install
cp config/autoload/local.php.dist config/autoload/local.php
; edit local for specific environment
php public/index orm:schema-tool:create

Use

Validation

Validation occurs before a conversion may be ran.

Step 1: Validate the database. This command will verify all database settings, table data types, and column data types are utf8.

php public/index.php database:validate

If the validate command failed you must correct the problem(s) before continuing.

Create a Conversion

You need to create a conversion. Each conversion requires a name.

whitelist and blacklist are comma delimited lists of table names. If not specified then all tables will be evaluated.

php public/index.php conversion:create [--name=conversionName] [--whitelist=] [--blacklist=]

Run a Conversion

After your conversion has been created you must convert it.

php public/index.php conversion:convert --name=conversionName

Export a Conversion

To copy corrected utf8 data back into your database you must export it:

php public/index.php conversion:export --name=conversionName

Clone a Conversion

To clone a conversion to a new name:

php public/index.php conversion:clone --from=conversionName --to=conversionName

Watch Out For These

Any text field containing data which is not just text may cause a problem.
Examples are a text field to store serialized data or a text field which stores otherwise binary data. If you have any fields like this be sure to set their approved flag in the DataPoint table to 0 (false) before exporting a conversion.

Troubleshooting

If you ever get stuck you can always delete the utf8convert database and start over.