
Format conversions

Conversion between (La)TeX and others

troff
troff-to-latex (available as support/troff-to-latex), written by Kamal Al-Yahya at Stanford University (California, USA), assists in the translation of a troff document into LaTeX format. It recognises most -ms and -man macros, plus most eqn and some tbl preprocessor commands. Anything fancier needs to be done by hand. Two style files are provided. There is also a man page (which converts very well to LaTeX...). The program is copyrighted but free. An enhanced version of this program, tr2latex, is available in support/tr2latex

The DECUS TeX distribution (see sources of software) also contains a program which converts troff to TeX.

WordPerfect
wp2latex (available as support/wp2latex) has recently been much improved, and is now available either for MS-DOS or for Unix systems, thanks to its current maintainer Jaroslav Fojtik.

PC-Write
pcwritex.arc, from support/pcwritex, is a print driver for PC-Write that ``prints'' a PC-Write V2.71 document to a TeX-compatible disk file. It was written by Peter Flynn at University College, Cork, Republic of Ireland.

runoff
Peter Vanroose's ([email protected]) conversion program is written in VMS Pascal. The sources and a VAX executable are available from support/rnototex

refer/tib
There are a few programs for converting bibliographic data between BibTeX and refer/tib formats. They are in biblio/bibtex/utils/refer-tools

In spite of the directory name, the collection also contains a shell script to convert BibTeX to refer. The collection is not maintained.

RTF
A program for converting Microsoft's Rich Text Format to TeX is available in support/rtf2tex; it was written and is maintained by Robert Lupton ([email protected]). There is also a converter to LaTeX by Erwin Wechtl, in support/rtf2latex

Translation to RTF may be done (for a somewhat constrained set of LaTeX documents) by TeX2RTF, which can produce ordinary RTF and Windows Help RTF (as well as HTML; see (La)TeX conversion to HTML). TeX2RTF is supported on various Unix platforms and under Windows 3.1; it is available from support/tex2rtf

Microsoft Word
A rudimentary program for converting MS-Word to LaTeX is wd2latex, for MS-DOS (dviware/wd2latex); a better idea, however, is to convert the document to RTF format and use one of the RTF converters mentioned above.

A FAQ that deals specifically with conversions between TeX-based formats and word processor formats is regularly posted to comp.text.tex, is available via http://www.kfa-juelich.de/isr/1/texconv/texcnv.html and is archived as help/wp-conv/wp-conv.zip

A group at Ohio State University (USA) is working on a common document format based on SGML, with the ambition that any format could be translated to or from this one. FrameMaker provides ``import filters'' to aid translation from alien formats (presumably including TeX) to FrameMaker's own.

Conversion from (La)TeX to plain ASCII

The aim here is to emulate the Unix nroff, which formats text as best it can for the screen, from the same input as the Unix typesetting program troff.

Ralph Droms ([email protected]) has a style file and a program that provide the LaTeX equivalent of nroff, though it doesn't do a good job with tables and mathematics. The software is available in support/txt; the original dvi2tty often does an acceptable job and is available in dviware/dvi2tty

Another possibility is to format the document with screen.sty (available as macros/latex209/contrib/misc/screen.sty) and then run the result through a dvi2tty program of some kind; you might try dviware/crudetype as well. A third option is the LaTeX-to-ASCII conversion program, l2a (support/l2a), although this is really more of a de-TeXing program.

The canonical de-TeXing program is detex (support/detex), which removes all comments and control sequences from its input before writing it to its output. Its original purpose was to prepare input for a dumb spelling checker.
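The idea behind de-TeXing can be illustrated with a few lines of Python. This is only a rough sketch of the technique, not the algorithm detex itself uses: the function name is made up for the example, and it makes no attempt to handle mathematics, verbatim text or macro arguments that should be discarded.

```python
import re


def detex_sketch(source: str) -> str:
    """Crudely strip comments and control sequences from (La)TeX input."""
    # Drop comments: an unescaped % and everything after it on the line.
    text = re.sub(r"(?<!\\)%.*", "", source)
    # Drop control words (\section, \emph, ...) and control symbols (\&, \%).
    text = re.sub(r"\\([A-Za-z]+\*?|.)", "", text)
    # Drop grouping braces, keeping what they enclose.
    text = re.sub(r"[{}]", "", text)
    # Collapse the runs of blanks left behind.
    text = re.sub(r"[ \t]+", " ", text)
    return text.strip()


if __name__ == "__main__":
    print(detex_sketch(r"\section{Intro} Hello \emph{world}! % a comment"))
```

The output of such a filter is only good enough for tasks like spelling checks, which is the purpose the text above describes.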

Conversion from SGML or HTML to TeX

SGML is a very important system for document storage and interchange, but it has no formatting features; its companion ISO standard DSSSL (http://www.jclark.com/dsssl/) is designed for writing transformations and formatting, but this has not yet been widely implemented. Some SGML authoring systems (e.g., SoftQuad Author/Editor) have formatting abilities, and there are high-end specialist SGML typesetting systems (e.g., Miles33's Genera). However, the majority of SGML users probably transform the source to an existing typesetting system when they want to print. TeX is a good candidate for this. There are three approaches to writing a translator:

  1. Write a free-standing translator in the traditional way, with tools like yacc and lex; this is hard, in practice, because of the complexity of SGML.
  2. Use a specialist language designed for SGML transformations; the best known are probably Omnimark and Balise. They are expensive, but powerful, incorporating SGML query and transformation abilities as well as simple translation.
  3. Build a translator on top of an existing SGML parser. By far the best-known (and free!) parser is James Clark's nsgmls, and this produces a much simpler output format, called ESIS, which can be parsed quite straightforwardly (one also has the benefit of an SGML parse against the DTD). Two good public domain packages use this method; both allow the user to write `handlers' for every SGML element, with plenty of access to attributes, entities, and information about the context within the document tree.

    If these packages don't meet your needs for an average SGML typesetting job, you need the big commercial stuff.
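The third approach can be sketched in a few lines of Python. The sketch assumes the line-oriented output of nsgmls, in which `(NAME` opens an element, `)NAME` closes it, `-text` carries character data and `Aname TYPE value` lines give the attributes of the start tag that follows. The element names (TITLE, EM, P, REF), the TARGET attribute and their LaTeX mappings are hypothetical, chosen only for illustration; they are not taken from any real DTD or from the packages mentioned above. TeX special characters in the data are not escaped.

```python
import re
from typing import Callable, Dict, Iterable

Attrs = Dict[str, str]

# Hypothetical handlers: one per element, called with the element's attributes.
START_HANDLERS: Dict[str, Callable[[Attrs], str]] = {
    "TITLE": lambda attrs: "\\section{",
    "EM": lambda attrs: "\\emph{",
    "P": lambda attrs: "",
    "REF": lambda attrs: "\\ref{%s}" % attrs.get("TARGET", "?"),
}
END_TEXT: Dict[str, str] = {"TITLE": "}\n", "EM": "}", "P": "\n\n", "REF": ""}


def unescape(data: str) -> str:
    """Decode the two commonest ESIS escapes; others are left untouched."""
    return re.sub(r"\\(\\|n)",
                  lambda m: "\n" if m.group(1) == "n" else "\\", data)


def esis_to_latex(lines: Iterable[str]) -> str:
    out = []
    pending: Attrs = {}
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        code, rest = line[0], line[1:]
        if code == "A":
            # Attribute lines come before the start tag they belong to.
            name, _, remainder = rest.partition(" ")
            kind, _, value = remainder.partition(" ")
            if kind != "IMPLIED":
                pending[name] = value
        elif code == "(":
            handler = START_HANDLERS.get(rest)
            if handler is not None:
                out.append(handler(pending))
            pending = {}
        elif code == ")":
            out.append(END_TEXT.get(rest, ""))
        elif code == "-":
            out.append(unescape(rest))
        # Anything else (e.g. the final "C" line) is ignored here.
    return "".join(out)
```

In practice the input would come from running nsgmls on the document and reading its standard output line by line; elements without a handler simply pass their content through.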

Since HTML is simply an example of SGML, we do not need a specific system for HTML. However, Nathan Torkington ([email protected]) developed html2latex from the HTML parser in NCSA's Xmosaic package. The program takes an HTML file and generates a LaTeX file from it. The conversion code is subject to NCSA restrictions, but the whole source is available as support/html2latex

Michel Goossens and Janne Saarela published a very useful summary of SGML, and of public domain tools for writing and manipulating it, in TUGboat 16(2).

(La)TeX conversion to HTML

TeX is a typesetting language, not a markup system, so arbitrary TeX cannot be converted to HTML reliably. With properly-used LaTeX (that is, consistent structural markup), you may be luckier, but don't expect a free lunch. Remember that a) if you want a really good Web document, you had better redesign it from scratch, and b) HTML (even HTML3) has pretty poor `typesetting' facilities, and anything beyond the trivial will probably need to end up as a graphic.

LaTeX2HTML (support/latex2html) is a package (mostly of Perl scripts) by Nikos Drakos that breaks up a LaTeX document into one or more components, and links them together so that they can be read over the World-Wide Web as a hypertext document. It defines a mapping between LaTeX intra-document references and hyperlinks, and extends the mechanisms to permit reference to other (possibly remote) documents and other Internet resources. It translates LaTeX accented and other characters (as best it can) to things that World-Wide Web browsers can display, and translates mathematics (and other things that browsers can't deal with) to images that can be loaded in-line into the hypertext document.
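To give a flavour of the character translation involved, here is a small Python sketch that maps a handful of LaTeX accent commands to HTML character entities. It is an illustration only and does not reproduce LaTeX2HTML's own code; the function and table names are invented for the example, and only the five accents listed are handled.

```python
import html
import re

# LaTeX accent command -> suffix of the corresponding HTML entity name.
ACCENT_NAMES = {"'": "acute", "`": "grave", "^": "circ", '"': "uml", "~": "tilde"}

# Letters for which an entity with that suffix exists.
VALID_LETTERS = {
    "acute": "aeiouyAEIOUY",
    "grave": "aeiouAEIOU",
    "circ": "aeiouAEIOU",
    "uml": "aeiouyAEIOU",
    "tilde": "anoANO",
}

# Matches both \'e and \'{e}.
ACCENT_RE = re.compile(r"""\\(['`^"~])(?:\{([A-Za-z])\}|([A-Za-z]))""")


def accents_to_entities(text: str) -> str:
    def replace(match: "re.Match[str]") -> str:
        name = ACCENT_NAMES[match.group(1)]
        letter = match.group(2) or match.group(3)
        if letter in VALID_LETTERS[name]:
            return "&%s%s;" % (letter, name)
        # No suitable entity: leave the LaTeX command as it was.
        return match.group(0)

    # Escape &, < and > first so that they survive in the HTML output.
    return ACCENT_RE.sub(replace, html.escape(text, quote=False))
```

Anything the table cannot express is left unchanged, which mirrors the `as best it can' caveat above.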

LaTeX2HTML needs Perl, the PBM utilities, dvips, Ghostscript, and other sundries; it assumes it is running on a Unix system. Michel Goossens and Janne Saarela published a detailed discussion of LaTeX2HTML, and how to tailor it, in TUGboat 16(2).

There are two alternative strategies:

  1. Free-standing LaTeX to HTML translation. Hard, but not impossible. Julian Smart's TeX2RTF (available from support/tex2rtf) does a plausible job on a subset of LaTeX;
  2. Writing an HTML-output backend in LaTeX itself. See Sebastian Rahtz' paper in TUGboat 16(3) for a discussion of how to go about this for the general case of SGML.

Making hypertext documents from TeX

If you want on-line hypertext with a (La)TeX source, probably on the World Wide Web, consider four technologies (which overlap):

  1. Try direct LaTeX conversion to HTML; see (La)TeX conversion to HTML;
  2. Rewrite your document using Texinfo (see Texinfo macro package), and convert that to HTML;
  3. Look at Adobe Acrobat, an electronic delivery system guaranteed to preserve your typesetting perfectly. See Making Acrobat documents from LaTeX;
  4. Invest in the HyperTeX conventions (standardised \special commands); there are supporting macro packages for plain TeX and LaTeX.

The HyperTeX project aims to extend the functionality of all the LaTeX cross-referencing commands (including the table of contents) to produce \special commands which are parsed by DVI processors conforming to the HyperTeX guidelines; it provides general hypertext links, including those to external documents.

The HyperTeX specification says that conformant viewers/translators must recognize the following set of \special commands:

href:
html:<a href = "href_string">
name:
html:<a name = "name_string">
end:
html:</a>
image:
html:<img src = "href_string">
base_name:
html:<base href = "href_string">

The href, name and end commands are used to do the basic hypertext operations of establishing links between sections of documents.
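As an illustration of how these commands bracket a piece of text, the following Python sketch builds the TeX source for a link and for a named anchor. The helper names are invented for the example, and the sketch assumes that the target strings contain no characters that TeX treats specially (such as %, # or ~), which would need extra care in a real macro package.

```python
def hypertex_link(href: str, text: str) -> str:
    """Wrap text in the HyperTeX href and end \\special commands."""
    return '\\special{html:<a href="%s">}%s\\special{html:</a>}' % (href, text)


def hypertex_anchor(name: str, text: str) -> str:
    """Wrap text in the HyperTeX name and end \\special commands."""
    return '\\special{html:<a name="%s">}%s\\special{html:</a>}' % (name, text)


if __name__ == "__main__":
    # A named target, and a link pointing at it from elsewhere in the document.
    print(hypertex_anchor("intro", "Introduction"))
    print(hypertex_link("#intro", "see the introduction"))
```

A conformant DVI processor turns the text between the opening \special and the closing one into the anchor or link; other processors simply ignore the \special commands.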

Further details are available on http://xxx.lanl.gov/hypertex/; there are two commonly-used implementations of the specification, a modified xdvi and (recent releases of) dvips. Output from the latter may be used in recent releases of Ghostscript or Acrobat Distiller.

Making Acrobat documents from LaTeX

There are three general routes to Acrobat output: Adobe's original `distillation' route (via PostScript output), conversion of a DVI file, and the use of a direct PDF generator such as PDFTeX (see the PDFTeX project) or MicroPress's VTeX (see commercial TeX implementations).

For simple documents (with no hyper-references), any of these routes can be used as it stands.

To translate all the LaTeX cross-referencing into Acrobat links, you need a LaTeX package to suitably redefine the internal commands. There are two of these for LaTeX, both capable of conforming to the HyperTeX specification (see Making hypertext documents from TeX): Sebastian Rahtz's hyperref (available from macros/latex/contrib/supported/hyperref), and Michael Mehlich's hyper (available from macros/latex/contrib/supported/hyper). Hyperref uses a configuration file to determine how it will generate hypertext; it can operate using PDFTeX primitives, the HyperTeX \specials, or DVI driver-specific \special commands. Either dvips or Y&Y's DVIPSONE can then be used to translate the DVI into PostScript acceptable to Distiller.

There is no free implementation of all of Adobe Distiller's functionality, but Ghostscript (version 4.00 onwards) provides some restricted distilling capability (note the restrictions on the fonts it can use). However, Distiller itself is now remarkably cheap (for academics at least).

For viewing (and printing) the resulting files, Adobe's Acrobat Reader is available for a wide range of platforms (see ftp://ftp.adobe.com/pub/adobe/acrobatreader). For those platforms for which Adobe's reader is unavailable, Ghostscript (versions 3.51 onwards) can display and print PDF files.

