Tuesday, June 12, 2012

Compiling Enca for Windows

Enca is a command-line utility that detects the encoding of a text file. This is useful when a system receives text files in various encodings: you need to know the encoding of each file and, if necessary, re-encode it to a common encoding such as UTF-8. Enca can both identify and convert encodings (the enconv utility is also included). However, all of this works under Unix. In this article I'll show you how to compile enca under Windows.

You can find precompiled distributions of enca for Windows on the Internet, but they may not be the latest version and, more importantly, may contain malicious code. It is therefore safer to compile everything yourself.

To compile enca, download the source code from the developer's site. At the time of writing, the latest version was 1.13. The source code is distributed as an archive, so unpack it to a convenient location, such as E:\enca\enca-1.13.

Now you need a compiler. You can use the one from MinGW, which is easy to download and install. After installation, start MSYS by running its batch file, usually D:\MinGW\msys\1.0\msys.bat.

When the console opens, go to the enca source code directory with the command:

cd E:/enca/enca-1.13

Next, run the commands to configure and compile:

./configure
make
make install

The compiled program is placed in the folder D:\MinGW\msys\1.0\local\bin.

However, the program cannot be used as is, because its source code is not prepared for the Windows environment. To prepare it, you need to make a few minor changes to the source code and compile it again.

Let's begin with the file E:\enca\enca-1.13\src\filebuf.c

Near the beginning of the file, the program chooses a source of random numbers:

/* Good random seed source, prefer urandom, this is not a crypto app. */
#if !(defined RANDOM_FILE) && (defined HAVE__DEV_URANDOM)
# define RANDOM_FILE "/dev/urandom"
#endif /* HAVE__DEV_URANDOM */
#if !(defined RANDOM_FILE) && (defined HAVE__DEV_ARANDOM)
# define RANDOM_FILE "/dev/arandom"
#endif /* HAVE__DEV_ARANDOM */
#if !(defined RANDOM_FILE) && (defined HAVE__DEV_RANDOM)
# define RANDOM_FILE "/dev/random"
#endif /* HAVE__DEV_RANDOM */
#if !(defined RANDOM_FILE) && (defined HAVE__DEV_SRANDOM)
# define RANDOM_FILE "/dev/srandom"
#endif /* HAVE__DEV_SRANDOM */

We undefine the RANDOM_FILE macro so that the program does not try to open a nonexistent file:

/* reset random file in Windows */
#undef RANDOM_FILE

As a result, the program generates random numbers based on the timestamp. The drawback is that two runs with the same timestamp will produce the same temporary file name. If desired, you can improve this.
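One possible improvement, shown here only as a sketch, is to mix the process ID into the seed so that two enca processes started within the same second still get different seeds. The names mixed_seed and seed_random are hypothetical and are not part of the enca source, and the sketch assumes the seed ends up in srand(); how enca actually consumes its seed is not shown in this article, so adapt accordingly.

```c
#include <stdlib.h>
#include <time.h>
#ifdef _WIN32
# include <process.h>   /* _getpid */
# define GET_PID() ((unsigned int)_getpid())
#else
# include <unistd.h>    /* getpid */
# define GET_PID() ((unsigned int)getpid())
#endif

/* Hypothetical helper, not part of enca: combine the timestamp with the
 * process ID so that two processes started in the same second get
 * different seeds. */
static unsigned int
mixed_seed(unsigned int timestamp, unsigned int pid)
{
  /* Multiplying by an odd constant spreads the PID over all bits. */
  return timestamp ^ (pid * 2654435761u);
}

/* Assumption: the generator is seeded through srand(). */
static void
seed_random(void)
{
  srand(mixed_seed((unsigned int)time(NULL), GET_PID()));
}
```

This only makes name collisions less likely; it does not make the names unpredictable, which matches the "this is not a crypto app" comment in the source.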

Next, for the function file_temporary, set the path to the Windows directory for temporary files:

/* 'template' for temporary file creation */
static const char *TMP_PREFIX = "C:\\WINDOWS\\Temp\\" PACKAGE_TARNAME;
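The hard-coded path above assumes Windows is installed in C:\WINDOWS. As a rough alternative, the prefix could be built at run time from the TEMP or TMP environment variable. The helper build_tmp_prefix below is hypothetical and not part of enca, and using it would mean turning TMP_PREFIX from a compile-time constant into a buffer filled at startup. It also assumes a C99 snprintf is available in your toolchain.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper, not part of enca: write "<temp dir>\<package>"
 * into buf. Returns 0 on success, -1 if the result does not fit. */
static int
build_tmp_prefix(char *buf, size_t size, const char *package)
{
  const char *dir = getenv("TEMP");
  int n;

  if (dir == NULL || *dir == '\0')
    dir = getenv("TMP");
  if (dir == NULL || *dir == '\0')
    dir = "C:\\WINDOWS\\Temp";   /* fallback: the path used in the article */

  n = snprintf(buf, size, "%s\\%s", dir, package);
  return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

In the enca source the package argument would be PACKAGE_TARNAME, as in the original definition of TMP_PREFIX.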

In this function, the program creates the temporary file that is used for the conversion. As soon as the file is created, the program issues a command to remove it. This is acceptable on POSIX systems, where an unlinked file remains usable until it is closed, but Windows does not support that mechanism: it tries to delete the file while it is still in use, which naturally causes an error. Therefore, comment out the deletion here and delete the file only after the conversion is completed:

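/* here, we have a unique temporary file opened readwrite */
/* Windows: do not unlink the file while it is still open; the "if" is
   commented out too, so that it does not capture the next statement */
//if (ulink)
//  file_unlink(file->name);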

The conversion finishes in the function convert_iconv in the file E:\enca\enca-1.13\src\convert_iconv.c. Modify the end of this function as follows:

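  /* not const: the copy is freed below */
  char *tmp_name;
  /* copy the name before file_free() releases tempfile */
  tmp_name = _strdup(tempfile->name);
  file_free(tempfile);
  /* the file is closed now, so Windows allows deleting it */
  file_unlink(tmp_name);
  enca_free(tmp_name);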
  do_iconv_close(icd);
  
  return err;

That is, a temporary variable holds a copy of the temporary file's name; after the file is closed, this copy is used to delete the file.
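The same close-then-delete order can be tried outside of enca with a small standalone sketch that uses only the standard C library. The helper close_and_remove is hypothetical. It copies the name first to mirror the situation in convert_iconv, where the name belongs to the object that is about to be freed.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of enca: close a stream, then delete the
 * file it was opened from. Returns 0 on success, nonzero on failure. */
static int
close_and_remove(FILE *fp, const char *name)
{
  size_t len = strlen(name) + 1;
  char *copy = malloc(len);
  int rc;

  if (copy == NULL) {
    fclose(fp);
    return -1;
  }
  memcpy(copy, name, len);  /* keep the name alive past the close */

  fclose(fp);               /* release the handle first */
  rc = remove(copy);        /* the file is no longer open, so it can be deleted */
  free(copy);
  return rc;
}
```

Calling remove() before fclose() is the order that works on POSIX systems but fails on Windows, as described above.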

After all of these changes to the source code, enca can be compiled, run, and used in a Windows environment.

Example of use:

enca -L ru -x UTF-8 < %_in% > %_out%

This converts the encoding of a Russian-language text file: the input is a file in an unknown encoding, and the output is in UTF-8. By default, enca overwrites the file with the converted text. If you want to keep the old file and write a new one, use input and output redirection, as in the example.
