Such an "inputenc" line tells TeX which encoding was used when the input source file was written. From 2018 on, the line is not required for UTF-8 input because LaTeX expects UTF-8 Unicode source files by default. Notice that a straight ASCII file is legal UTF-8, so the line above is also not required for ASCII input files.

For many years the default TeXShop encoding was IsoLatin9, which contained ASCII code but also non-ASCII code for accents, umlauts, and other characters required in Western Europe. This was the default encoding in LaTeX, so no inputenc line was required. But from now on, if a source file is encoded in IsoLatin9 and contains non-ASCII characters, an inputenc line naming that encoding (the latin9 option) is required in the header. By the way, XeTeX, XeLaTeX, LuaTeX, and LuaLaTeX require Unicode source files.

The "inputenc" line tells LaTeX how to interpret source code, but it does nothing to guarantee that the fonts in use understand Unicode characters. Users in the United States with European collaborators and users in Western Europe need only deal with accents, umlauts, and the like, and this font problem is handled with one extra line selecting a suitable font encoding (typically T1, via the fontenc package), which usually comes before the inputenc line. Appropriate LaTeX commands for users in other parts of the world are beyond my expertise. These users are likely to find XeLaTeX or LuaLaTeX particularly attractive.

To match this LaTeX change, the default file encoding for TeXShop files has been changed from IsoLatin9 to UTF-8 Unicode. This change will not affect current TeXShop users because TeXShop doesn't change Preference settings that users may have already set. It affects new users and it also affects old users who install a copy of TeXShop on a new machine for the first time. TeXShop has special default values for users in Japan, and their defaults have not changed.

If you have already switched to UTF-8 or you use an unusual encoding, then you know all about encodings and can stop reading. The rest of this section is for users who ignored encoding issues until now. These users may want to take this opportunity to switch to UTF-8 Unicode. To understand the issues, it is best to tell the story from the beginning.

If you examine a CD or DVD with a microscope, you will discover that the disk contains a long stream of whole numbers, each between 0 and 255. These are called bytes, and the ability to encode virtually any kind of information into a stream of bytes defines the current digital age. Thus a byte stream might represent music on a CD, or a movie on a DVD, or a jpg picture, or a computer program, or an encyclopedia.

A large fraction of computer files are just text files. From the beginning of the personal computer era, a standard encoding of text known as ASCII has been used. This method encodes all the characters on a standard American typewriter: small letters, capital letters, punctuation marks, numbers, tabs, carriage return. There are fewer than 128 such characters, so ASCII only uses bytes between 0 and 127. The original version of TeX expected ASCII input.

ASCII had difficulties in Europe and in regions of the world that used completely different scripts. For instance, scripts in Western Europe use accented vowels, umlauts, upside down question marks, and unusual Scandinavian letters. To solve such problems, the 128 byte values that ASCII leaves unused (128 through 255) were used to represent new characters. Different encodings were invented for each country, each with different characters in those 128 spots. One of these encodings, IsoLatin1, contained all of the characters routinely used in Western Europe. When the European currency was introduced, IsoLatin1 was extended by adding the Euro symbol, to become IsoLatin9, and TeXShop adopted that as default. But many more encodings are available in TeXShop Preferences. Sadly, files encoded in this way do not contain a "magic byte" defining the encoding. So the user has to know which encoding was used to write the file, because the computer has no way to know.

But meanwhile, computer manufacturers realized that the increasing globalization of the world required a new approach to text.
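
As a recap of the header lines this section talks about, here is a minimal sketch of a pdfLaTeX preamble for a source file saved in IsoLatin9. The option names latin9 (for inputenc) and T1 (for fontenc) are my assumption about which lines are meant, not lines quoted from the text, and the sample body is purely illustrative.

```latex
% Minimal sketch, assuming the source file is saved as IsoLatin9 (ISO 8859-15)
% and is typeset with pdfLaTeX rather than XeLaTeX or LuaLaTeX.
\documentclass{article}
\usepackage[T1]{fontenc}       % assumed font encoding line; placed before inputenc
\usepackage[latin9]{inputenc}  % declares that the source bytes are IsoLatin9
\begin{document}
% Accents are written with ASCII commands here so that this sample
% survives being copied into a file saved in any encoding.
Caf\'e, na\"ive, Universit\"at
\end{document}
```

For a source file saved in UTF-8 or plain ASCII, the inputenc line can simply be dropped with a LaTeX release from 2018 or later, as described in this section.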