Chapter 4. Editing documents

Table of Contents

Working with Unicode
Opening and saving Unicode documents
Opening and closing documents
Creating new documents
Saving documents
Closing documents
Creating and sharing new document templates
Viewing file properties
Editing XML documents
Associate a schema to a document
Streamline with Content Completion
Debugging XML documents
Document navigation
Grouping documents in XML projects
Including document parts with XInclude
Working with XML Catalogs
Converting between schema languages
Formatting and indenting documents (pretty print)
Viewing status information
XML editor specific actions
Editing XML Schema schemas
Special content completion features
XML Schema diagram
Create an XML Schema from a relational database table
XML Schema Instance Generator
Flatten an XML Schema
XML Schema regular expressions builder
Generating HTML documentation for an XML Schema
XML Schema editor specific actions
Search References and Declarations
Editing Relax NG schemas
Relax NG schema diagram
Relax NG editor specific actions
Search References and Declarations
Editing XSLT stylesheets
Validating XSLT stylesheets
Content Completion in XSLT stylesheets
The XSLT Input View
The Stylesheet Templates view
Finding XSLT references and declarations
XSLT refactoring actions
Editing XQuery documents
Generating HTML Documentation for an XQuery Document
Editing CSS stylesheets
Validating CSS stylesheets
Content Completion in CSS stylesheets
Folding in CSS stylesheets
Formatting and indenting CSS stylesheets (pretty print)
Other CSS editing actions
Scratch Buffer

Working with Unicode

Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language. Unicode is an internationally recognized standard, widely adopted across the software industry. Unicode is required by modern standards such as XML, Java, ECMAScript (JavaScript), LDAP, CORBA 3.0 and WML, and is the official way to implement ISO/IEC 10646.

It is supported by many operating systems, all modern browsers, and many other products. The emergence of the Unicode Standard, and the availability of tools supporting it, are among the most significant global software technology trends of recent years. Incorporating Unicode into client-server or multi-tiered applications and websites can offer significant cost savings over legacy character sets, because one version of the application can handle text in any language instead of one version per character set.

As a modern XML editor, <oXygen/> supports the Unicode standard, so your XML applications can be targeted across multiple platforms, languages and countries without re-engineering. Internally, the <oXygen/> XML Editor uses 16-bit characters to represent text in the Unicode character set.
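Java's `char` type is 16 bits wide: one `char` holds any character from the Basic Multilingual Plane, while a character beyond U+FFFF is stored as a surrogate pair of two `char` values. The sketch below illustrates this with the standard `String` and `Character` APIs. It assumes only that the editor builds on Java's ordinary string type and does not describe any <oXygen/> internals; the class and method names are hypothetical.

```java
// Sketch only (assumption: text is held in ordinary Java Strings, i.e. UTF-16 code units).
// Class and method names are hypothetical.
class Utf16Chars {
    // Number of 16-bit char units Java uses to store the text.
    static int charUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points, i.e. characters as Unicode counts them.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String bmp = "\u00E9";                                          // é, inside the BMP
        String supplementary = new String(Character.toChars(0x1F600)); // U+1F600, outside the BMP
        System.out.println(charUnits(bmp) + " " + codePoints(bmp));                     // 1 1
        System.out.println(charUnits(supplementary) + " " + codePoints(supplementary)); // 2 1
        // The first char of a supplementary character is the high half of a surrogate pair.
        System.out.println(Character.isHighSurrogate(supplementary.charAt(0)));         // true
    }
}
```

Counting with `codePointCount` rather than `length` is what keeps a surrogate pair from being treated as two characters.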

Opening and saving Unicode documents

When loading XML, XSL, XSD and DTD documents, <oXygen/> obtains the document's encoding from the Eclipse platform. That encoding is then passed to the Java encoder, which uses the corresponding character set both to load the document and to save it.
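As a rough illustration of what loading and saving with a specified encoding means at the Java level, the hypothetical helpers below decode a file with a named charset and write text back with another. This is a minimal sketch using the standard `java.nio.file.Files` API (Java 11 or later); it is not how <oXygen/> or Eclipse actually implement it, and the encoding names are simply passed in as strings.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch only: load text with one encoding and save it with a (possibly different) one.
// Hypothetical class; how the editor obtains the encoding names is not modeled.
class EncodingRoundTrip {
    // Decode the file's bytes using the named encoding, e.g. "UTF-8".
    static String load(Path file, String encoding) throws IOException {
        return Files.readString(file, Charset.forName(encoding));
    }

    // Encode the text with the named encoding and write it to the file.
    // Throws an IOException if the text has characters the encoding cannot represent.
    static void save(Path file, String text, String encoding) throws IOException {
        Files.writeString(file, text, Charset.forName(encoding));
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("sample", ".txt");
        save(file, "caf\u00E9", "UTF-8");
        System.out.println("UTF-8: " + Files.size(file) + " bytes");      // é takes 2 bytes
        String text = load(file, "UTF-8");
        save(file, text, "ISO-8859-1");
        System.out.println("ISO-8859-1: " + Files.size(file) + " bytes"); // é takes 1 byte
        Files.delete(file);
    }
}
```

The same text produces 5 bytes on disk in UTF-8 and 4 bytes in ISO-8859-1, which is why the encoding used for saving must match the one a reader will use for loading.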

In most cases you will use UTF-8, but changing the encoding name is enough to have the file saved in the new encoding. The appendix Unicode Character Encoding provides a table that maps common encoding names to their Java names. It also explains what to type in the XML prolog so that the document is saved in the required encoding.
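The encoding name in the XML prolog is what ties a document to a Java charset. A minimal sketch, assuming the start of the document is already available as a string and ignoring byte order mark detection, could extract the `encoding` pseudo-attribute with a regular expression and resolve it with `Charset.forName`. The `PrologEncoding` class and its `declaredEncoding` method are hypothetical; a real XML parser handles many more cases.

```java
import java.nio.charset.Charset;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only (assumptions: prolog already decoded to a String, no BOM handling).
// Hypothetical class, not part of any editor API.
class PrologEncoding {
    // Matches encoding="..." or encoding='...' inside <?xml ... ?>.
    // The name follows the XML 1.0 EncName production: [A-Za-z] ([A-Za-z0-9._] | '-')*
    private static final Pattern ENCODING = Pattern.compile(
            "^<\\?xml\\s[^>]*?encoding\\s*=\\s*([\"'])([A-Za-z][A-Za-z0-9._-]*)\\1");

    // Returns the declared encoding name, or "UTF-8" when no declaration is found.
    static String declaredEncoding(String prolog) {
        Matcher m = ENCODING.matcher(prolog);
        return m.find() ? m.group(2) : "UTF-8";
    }

    public static void main(String[] args) {
        String doc = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><root/>";
        Charset cs = Charset.forName(declaredEncoding(doc));
        System.out.println(cs.name()); // canonical java.nio name: ISO-8859-1
    }
}
```

Falling back to UTF-8 mirrors the XML rule for documents that carry neither a byte order mark nor an encoding declaration.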

To edit documents written in Japanese or Chinese, you need to change the font to one that supports the specific characters (a Unicode font). On Windows, Arial Unicode MS or MS Gothic is recommended. Do not expect WordPad or Notepad to handle these encodings. If you need to view such XML documents outside <oXygen/>, use Internet Explorer or Word.

If an XML document that declares the UTF-16 encoding in its prolog with the attribute encoding="UTF-16" is edited in <oXygen/> and saved to disk, the save operation writes the byte order mark (BOM), which always begins such a document, according to the byte order used by the CPU of that computer. For example, a UTF-16 document created on a Windows/Intel computer, where the byte order is UnicodeLittle, and then loaded and saved in <oXygen/> running on a Mac OS computer, where the byte order is UnicodeBig, is saved with the UnicodeBig encoding.
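The two byte order marks differ only in the order of their two bytes: FE FF announces big-endian (UnicodeBig) data and FF FE announces little-endian (UnicodeLittle) data. The sketch below, with hypothetical helper names, builds both forms from the standard `StandardCharsets` constants and shows that Java's `UTF-16` decoder accepts either. Note that Java's own `UTF-16` encoder always writes big-endian data preceded by FE FF; the sketch does not model how <oXygen/> chooses a byte order.

```java
import java.nio.charset.StandardCharsets;

// Sketch only: hypothetical helpers illustrating the two UTF-16 byte order marks.
class Utf16Bom {
    // Java's "UTF-16" encoder writes big-endian data preceded by the BOM FE FF.
    static byte[] encodeBig(String text) {
        return text.getBytes(StandardCharsets.UTF_16);
    }

    // UTF_16LE writes no BOM, so prepend FF FE by hand to get little-endian with a BOM.
    static byte[] encodeLittle(String text) {
        byte[] body = text.getBytes(StandardCharsets.UTF_16LE);
        byte[] out = new byte[body.length + 2];
        out[0] = (byte) 0xFF;
        out[1] = (byte) 0xFE;
        System.arraycopy(body, 0, out, 2, body.length);
        return out;
    }

    // Report the byte order announced by the BOM, using the names from the text above.
    static String byteOrder(byte[] data) {
        if (data.length >= 2 && data[0] == (byte) 0xFE && data[1] == (byte) 0xFF) return "UnicodeBig";
        if (data.length >= 2 && data[0] == (byte) 0xFF && data[1] == (byte) 0xFE) return "UnicodeLittle";
        return "no BOM";
    }

    public static void main(String[] args) {
        byte[] big = encodeBig("<a/>");
        byte[] little = encodeLittle("<a/>");
        System.out.println(byteOrder(big) + " / " + byteOrder(little)); // UnicodeBig / UnicodeLittle
        // The UTF-16 decoder reads the BOM, picks the byte order and drops the mark.
        System.out.println(new String(big, StandardCharsets.UTF_16));    // <a/>
        System.out.println(new String(little, StandardCharsets.UTF_16)); // <a/>
    }
}
```

Because the decoder honors whichever mark it finds, a document saved with either byte order reads back as the same text.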

Note

The naming convention used in Java does not always match the common names used by the Unicode standard. For instance, where XML uses encoding="UTF-8", Java's java.io and java.lang APIs know the same encoding by its historical name "UTF8" (the java.nio API reports it as "UTF-8" and accepts both names).
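To see the naming difference in practice, the sketch below resolves an encoding name through `Charset.forName`, which accepts aliases such as "UTF8", and asks an `InputStreamReader` for its encoding, which the java.io API reports using the historical name when one exists. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Sketch only: one encoding, several names. Hypothetical class.
class JavaEncodingNames {
    // Canonical java.nio name for any accepted alias, e.g. "UTF8" -> "UTF-8".
    static String nioName(String anyName) {
        return Charset.forName(anyName).name();
    }

    // Name reported by the java.io API, which prefers the historical name.
    // The reader is left open on purpose: getEncoding() returns null once it is closed.
    static String ioName(String anyName) {
        InputStreamReader reader =
                new InputStreamReader(new ByteArrayInputStream(new byte[0]), Charset.forName(anyName));
        return reader.getEncoding();
    }

    public static void main(String[] args) {
        System.out.println(nioName("UTF8"));  // UTF-8
        System.out.println(ioName("UTF-8"));  // UTF8
        System.out.println(Charset.forName("UTF8").equals(StandardCharsets.UTF_8)); // true
    }
}
```

Both spellings select the same charset; only the name each API reports differs.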