File: pyedit-products/unzipped/docetc/examples/Non-BMP-Emojis/try-surrogates.py
"""
=======================================================================================
Demo: attempts to use the surrogateescape error handler to preserve emojis in
Tk's Text widget.  See this folder's README.txt for more context.

Summary: surrogates can preserve content on file saves, but they break both
display and editing, for non-BMP emojis as well as for BMP symbols that
otherwise work correctly.

Details:

1) Encoding per ASCII doesn't work at all: the GUI hangs in an odd state, and
   does not show lines with surrogates when it shows anything at all.

2) Encoding per LATIN-1 (or charmap) _almost_ works: file saves do retain
   emojis in the original text, but emoji surrogates display as odd/random
   garbage glyphs.  Worse, non-emoji Unicode symbols in the supported BMP
   range that normally work fine display as garbage glyphs in this scheme as
   well.  See Non-BMP-viewed-{surrogates, pyedit}.png for the resulting
   display of file Non-BMP-Emoji-text-only.txt in both this demo and PyEdit.
   See the corresponding "-saved" files for the results of a file save in each.

3) Editing lines with surrogates doesn't work properly either: pastes
   duplicate text, deletes and moves require multiple strokes (2+), end-lines
   are broken, etc.  This isn't the case for emojis when they are changed to
   Unicode replacement characters, and is never the case for BMP Unicode
   symbols when left intact.

In short, punt: Tk (up to 8.6) just doesn't support characters outside the
Unicode BMP (i.e., UCS-2) for display or edit.  Surrogates can preserve emojis
over file loads+saves, but at the cost of correct display of all other BMP
Unicode text and symbols, and this is too high a price to pay.

Note that this extends to all widgets, not just Text; all non-BMP text
displayed in a Tk GUI must be sanitized with Unicode replacement characters
for reliable display and edit, at the expense of possibly losing original
non-BMP content.

Programs can also insert raw bytes into Text, but that mode cannot display or
edit symbols properly either, and its content always comes back as str with
mangled characters.  See the "-binary" versions here for tests.
=======================================================================================
"""

from tkinter import *
from tkinter.filedialog import askopenfilename, asksaveasfilename

START  = '1.0'
TRYENC = 'charmap'

def load():                                    # what PyEdit does on Open
    fn = askopenfilename()
    if not fn:                                 # dialog cancelled: nothing to load
        return
    with open(fn, mode='r', encoding=TRYENC, errors='surrogateescape') as fi:
        ft = fi.read()
    text.delete(START, END)                    # clear the widget's current content
    text.insert(END, ft)                       # store text string in widget
                                               # or START; text=bytes or str
    text.mark_set(INSERT, START)               # move insert point to top
    text.see(INSERT)                           # scroll to top, insert set
    text.see(INSERT)                           # no, really: see note above

def save():                                    # what PyEdit does on Save
    fn = asksaveasfilename()
    if not fn:                                 # dialog cancelled: nothing to save
        return
    ft = text.get(START, END+'-1c')            # extract text as str string
    with open(fn, 'w', encoding=TRYENC, errors='surrogateescape') as fo:
        fo.write(ft)

root = Tk()
text = Text(root, relief=RIDGE, borderwidth=2)
text.pack()
Button(root, text='load', command=load).pack()
Button(root, text='save', command=save).pack()
root.mainloop()
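

# ---------------------------------------------------------------------------
# Standalone sketch; the GUI demo above does not use it.  It illustrates two
# points from the docstring in plain Python, with no Tk involved:
#
# 1) sanitize_for_tk(): replacing non-BMP characters with the Unicode
#    replacement character before text reaches a Tk widget.  The name and
#    approach are hypothetical; this is not necessarily how PyEdit does it.
#
# 2) roundtrip_bytes(): why surrogateescape can preserve file content across
#    a load+save, regardless of how the intermediate str would display.
#    This name is hypothetical as well.
# ---------------------------------------------------------------------------

REPLACEMENT = '\uFFFD'                         # Unicode replacement character

def sanitize_for_tk(content, replacement=REPLACEMENT):
    """Return content with every code point above U+FFFF replaced."""
    return ''.join(ch if ord(ch) <= 0xFFFF else replacement for ch in content)

def roundtrip_bytes(data, encoding='ascii'):
    """Decode then re-encode data, both with errors='surrogateescape'."""
    decoded = data.decode(encoding, errors='surrogateescape')
    return decoded.encode(encoding, errors='surrogateescape')

# Possible usage (assumed, not part of the demo):
#     text.insert(END, sanitize_for_tk(ft))    # lossy, but safe to show/edit
#     assert roundtrip_bytes(raw) == raw       # lossless at the bytes level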