File: pyedit-products/unzipped/docetc/examples/Non-BMP-Emojis/try-surrogates.py

"""
=======================================================================================
Demo: attempts to use the surrogateescape error handler to preserve emojis in Tk's
Text widget.
See this folder's README.txt for more context.

Summary: surrogates can preserve content on file saves, but they break both display
and editing, for non-BMP emojis as well as for BMP symbols that otherwise work correctly.

Details:

1) Encoding per ASCII doesn't work at all: the GUI hangs in an odd state, and does
   not show lines with surrogates when it shows anything at all.

2) Encoding per LATIN-1 (or charmap) _almost_ works: file saves do retain emojis in 
   the original text, but emoji surrogates display as odd/random garbage glyphs. 
   Worse, non-emoji Unicode symbols in the supported BMP range that normally work 
   fine display as garbage glyphs in this scheme as well.  

   See Non-BMP-viewed-{surrogates, pyedit}.png for the resulting display of file
   Non-BMP-Emoji-text-only.txt in both this demo and PyEdit.  See the corresponding
   "-saved" files for the results of a file save in each.

3) Editing lines with surrogates doesn't work properly either; pastes duplicate
   text, deletes and moves require multiple strokes (2+), end-lines are broken, etc.  
   This isn't the case for emojis when they are changed to Unicode replacement 
   characters, and is never the case for BMP Unicode symbols when left intact.
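
The byte-level behavior behind the ASCII and Latin-1 cases above can be checked without
a GUI.  The following is a minimal sketch, not part of the demo itself; the sample text
and all variable names are illustrative assumptions.  It suggests one plausible reason
for the garbage glyphs: Latin-1 assigns a character to every byte value, so each byte
of a UTF-8 sequence becomes its own character, while ASCII turns each byte of 0x80 or
above into a lone surrogate.  Both decodings encode back to the original bytes.

```python
# Hypothetical sample: one BMP symbol (U+2660) and one non-BMP emoji (U+1F600).
sample = 'spade ' + chr(0x2660) + ' grin ' + chr(0x1F600)
original = sample.encode('utf-8')

# ASCII + surrogateescape: each byte >= 0x80 becomes a lone surrogate U+DC80..U+DCFF.
as_ascii = original.decode('ascii', errors='surrogateescape')

# Latin-1 maps every byte to U+0000..U+00FF, so the error handler never fires here.
as_latin1 = original.decode('latin-1', errors='surrogateescape')

# Encoding with the same codec and handler restores the original bytes.
ascii_roundtrip = as_ascii.encode('ascii', errors='surrogateescape')
latin1_roundtrip = as_latin1.encode('latin-1', errors='surrogateescape')

# Count the escaped bytes: 3 for the BMP symbol plus 4 for the emoji.
surrogate_count = sum(0xDC80 <= ord(ch) <= 0xDCFF for ch in as_ascii)

print(ascii_roundtrip == original, latin1_roundtrip == original, surrogate_count)
```

Note that the BMP symbol is split into separate characters just as the emoji is, which
is consistent with BMP symbols displaying as garbage in this scheme.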

In short, punt: Tk (up to 8.6) just doesn't support characters outside the Unicode
BMP (i.e., UCS-2) for display or edit.  Surrogates can preserve emojis over file
loads+saves, but at the cost of correctly displaying all other BMP Unicode text and
symbols, and this is too high a price to pay.

Note that this extends to all widgets, not just Text; all non-BMP text displayed 
in a Tk GUI must be sanitized with Unicode replacement characters for reliable 
display and edit, at the expense of possibly losing original non-BMP content.
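
As a rough illustration of such sanitizing, the sketch below swaps every code point
above U+FFFF for U+FFFD before text would be handed to a widget.  The helper name
sanitize_for_tk is hypothetical, and this is not necessarily how PyEdit does it.

```python
REPLACEMENT = chr(0xFFFD)    # U+FFFD, the Unicode replacement character

def sanitize_for_tk(text):
    # Hypothetical helper: keep BMP characters, replace anything above U+FFFF.
    return ''.join(ch if ord(ch) <= 0xFFFF else REPLACEMENT for ch in text)

# Illustrative input: a BMP symbol (kept) and a non-BMP emoji (replaced).
sample = 'spade ' + chr(0x2660) + ' grin ' + chr(0x1F600)
cleaned = sanitize_for_tk(sample)
print(ascii(cleaned))
```

The replacement is one-way: once the emoji is gone from the widget's text, a later
save cannot restore it, which is the content loss noted above.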

Programs can also insert raw bytes into Text, but the widget still cannot display or
edit symbols properly, and its content always comes back as str with mangled characters.
See the "-binary" versions here for tests.
=======================================================================================
"""

from tkinter import *
from tkinter.filedialog import askopenfilename, asksaveasfilename
START  = '1.0'
TRYENC = 'charmap'

def load():
    # what PyEdit does on Open
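    fn = askopenfilename()
    if not fn: return                    # dialog cancelled: nothing to load
    with open(fn, mode='r', encoding=TRYENC, errors='surrogateescape') as fi:
        ft = fi.read()                   # close the file promptly after reading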
    text.delete(START, END)              # store text string in widget
    text.insert(END, ft)                 # or START; text=bytes or str
    text.mark_set(INSERT, START)         # move insert point to top
    text.see(INSERT)                     # scroll to top, insert set
    text.see(INSERT)                     # no, really: see note above

def save():
    # what PyEdit does on Save
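    fn = asksaveasfilename()
    if not fn: return                    # dialog cancelled: nothing to save
    ft = text.get(START, END+'-1c')      # extract text as str, minus Tk's final newline
    with open(fn, 'w', encoding=TRYENC, errors='surrogateescape') as fo:
        fo.write(ft)                     # file is closed on exiting the with block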

root = Tk()
text = Text(root, relief=RIDGE, borderwidth=2)
text.pack()
Button(root, text='load', command=load).pack()
Button(root, text='save', command=save).pack()
root.mainloop()


