Importing Tricks - Importing CSV Files With Special Characters

Link: https://support.brilliantdirectories.com/support/solutions/articles/12000001776

The database of all Brilliant Directories sites is collated in UTF-8 format (utf8-general-ci). This is the most universal format available today and allows for nearly all special characters in nearly all languages to be stored in the database correctly. 


However, most desktop spreadsheet programs do a very poor job of handling files collated in UTF-8 format (for example, legacy versions of desktop office applications on macOS and Windows).


Thankfully, Google Sheets, the free spreadsheet application offered by Google, natively handles data in UTF-8 format, making it a good tool to use for creating import files that contain special characters.


How To Format The Import File In UTF-8 Using Google Sheets

  1. Create a new spreadsheet in Google Sheets
  2. Click on File >> Import
  3. Find the existing CSV file already uploaded to Google Drive, or upload it through this dialog by selecting the "Upload" option
  4. Click on "Import"
  5. Click on "Open now" once complete


Next, the data must be downloaded as a CSV file.. This CSV file will be natively encoded in UTF-8 format.  


To download the CSV file, click on File >> Download as >> Comma-separated values.



 

The file that has been downloaded is a CSV file with the correct UTF-8 format.  If this file is opened in Excel or a similar spreadsheet program, the values may look different - this is because desktop spreadsheet programs do not handle UTF-8 files well as mentioned above.  

 

CRUCIAL WARNING:  Do not save the file if it is opened in a desktop spreadsheet program after download. Saving the file locally will likely strip the UTF-8 encoding and corrupt the special characters again.


Next, import the file to the site as normal, and all of the special characters should remain intact.


Google Sheets Limitations

The only downside to using Google Sheets is that it has a limitation on the amount of data a single sheet can handle at a time (currently 2 million cells per spreadsheet).


For large files that contain more than 2 million cells of data, they just have to be broken down into smaller files to be processed correctly.


For example, if the import file contains 8 columns, Google Sheets can process up to 250,000 rows per individual spreadsheet file: 8 columns x 250,000 rows = 2,000,000 cells.