エンコーディングに関係する関数など

  • ファイルの書き出しでエンコーディングの指定はfileEncoding=""というようなデフォルト指定になっている
  • デフォルトはgetOption("encoding")のようにしてとってくる
  • コンピュータが使っているエンコーディングをRの中でも使うのがデフォルトで、Rの中で使うエンコーディングをローカルに指定しなおすこともできて(connection とかそんな概念らしい…)
  • Rの中でエンコーディングの確認や、エンコーディング変換もできるらしい(Encoding(),iconv() : iconv()のiはinternationalization. 「国際文字への変換」)
  • 具体的な文字化け問題例と対処はこちら
help(write.table)
write.table(x, file = "", append = FALSE, quote = TRUE, sep = " ",
            eol = "\n", na = "NA", dec = ".", row.names = TRUE,
            col.names = TRUE, qmethod = c("escape", "double"),
            fileEncoding = "")
fileEncoding	
character string: if non-empty declares the encoding to be used on a file (not a connection) so the character data can be re-encoded as they are written. See file.
help(file)
file(description = "", open = "", blocking = TRUE,
     encoding = getOption("encoding"), raw = FALSE)
Encoding

The encoding of the input/output stream of a connection can be specified by name in the same way as it would be given to iconv: see that help page for how to find out what encoding names are recognized on your platform. Additionally, "" and "native.enc" both mean the ‘native’ encoding, that is the internal encoding of the current locale and hence no translation is done.

Re-encoding only works for connections in text mode: reading from a connection with re-encoding specified in binary mode will read the stream of bytes, but mixing text and binary mode reads (e.g. mixing calls to readLines and readChar) is likely to lead to incorrect results.

The encodings "UCS-2LE" and "UTF-16LE" are treated specially, as they are appropriate values for Windows ‘Unicode’ text files. If the first two bytes are the Byte Order Mark 0xFFFE then these are removed as some implementations of iconv do not accept BOMs. Note that whereas most implementations will handle BOMs using encoding "UCS-2" and choose the appropriate byte order, some (including earlier versions of glibc) will not. There is a subtle distinction between "UTF-16" and "UCS-2" (see http://en.wikipedia.org/wiki/UTF-16/UCS-2: the use of surrogate pairs is very rare so "UCS-2LE" is an appropriate first choice.

Requesting a conversion that is not supported is an error, reported when the connection is opened. Exactly what happens when the requested translation cannot be done for invalid input is in general undocumented. On output the result is likely to be that up to the error, with a warning. On input, it will most likely be all or some of the input up to the error.
help(Encoding)
## x is intended to be in latin1
x <- "fa\xE7ile"
Encoding(x)
Encoding(x) <- "latin1"
x
xx <- iconv(x, "latin1", "UTF-8")
Encoding(c(x, xx))
c(x, xx)
Encoding(xx) <- "bytes"
xx # will be encoded in hex
cat("xx = ", xx, "\n", sep = "")