Chapter 4. Working with Documents

Table of Contents

Importing documents
Importing Plain Text
Importing Acrobat PDFs
More about importing documents
Reading a document
Changing the size and font used to display text
Using document text in other applications
Changing a document's title and adding a memo
Deleting a document

Documents are the raw material of your analysis. They may be interview transcripts, field observation notes, texts produced in the field such as posters or leaflets, newspaper articles and so on. At present, Weft only supports the analysis of textual material, though it is hoped to add support for the analysis of images in future versions of Weft QDA.

Importing documents

Documents are added to a project by importing them from files saved on your computer. Weft can import documents in the plain text and PDF formats. Importing a file copies the text in the file into a new document. It also indexes the text so that it can be searched quickly later.

Importing Plain Text

Plain text is a simple file format that contains just letters and punctuation, without special formatting such as bold, colour, or different-sized type. Plain text is the preferred file format for importing into Weft QDA.

Preparing documents for use in Weft QDA

Plain text can be produced in many common software applications, including word-processing software. To save a file in plain text format from within a word-processing package such as Word or OpenOffice, there is usually a Save As menu option: choose it, then select Plain text (*.txt) in the File Type setting of the Save dialog. You may be offered additional options when saving a text file, such as line endings and encoding; the default options are normally suitable. See the section called “More about importing documents” below.

How to import plain text documents

  • Make sure you have a project open into which to import documents.

  • Click the Import button within the Documents & Categories window. Alternatively, choose Project then Import Document from the menu.

  • Find the file or files on your computer that you want to import. To import multiple files in a batch, select each one by clicking its name with the CTRL key held down.

  • When ready, click the "Open" button, and the file or files you selected will be imported as documents. Importing many documents, or long documents, may take a little time, as the words in each file are scanned and indexed. A small window appears to show progress.

  • Imported documents will appear in the document list in the Documents and Categories window. Each imported document will be given a document title based on the name of the file from which it was imported. So if you imported a text file called interview.txt, a document called 'interview' would be created. You can change a document's title once it's imported: see the section called “Changing a document's title and adding a memo”.

  • If you import the same file twice, a new, separate copy of the document is created; it will have a number appended to its title to distinguish it from the first copy. So if you imported a file called interview.txt, and there was already a document with the title 'interview' in your project, a new document called 'interview (1)' would be created.
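
The titling rule described in the last two steps can be sketched in a few lines of Python. This is only an illustration of the behaviour described above, not Weft's own code: the function name title_for_import is hypothetical, and the assumption that a third import of the same file would be numbered '(2)' goes beyond what this chapter states.

```python
from pathlib import Path


def title_for_import(filename, existing_titles):
    """Return a title for an imported file that is unique among existing_titles.

    Hypothetical helper: it mirrors the rule described in the manual,
    but it is not taken from Weft QDA itself.
    """
    base = Path(filename).stem  # 'interview.txt' -> 'interview'
    if base not in existing_titles:
        return base
    # Assumption: the appended number keeps counting up until a free title is found.
    n = 1
    while f"{base} ({n})" in existing_titles:
        n += 1
    return f"{base} ({n})"


if __name__ == "__main__":
    titles = set()
    for name in ["interview.txt", "interview.txt", "notes.txt"]:
        title = title_for_import(name, titles)
        titles.add(title)
        print(title)
```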

Tip

Plain text files give the most consistent results when imported into Weft, as they can be imported directly without any prior conversion. If possible, acquire or convert your documents into plain text format before importing them.

Importing Acrobat PDFs

Acrobat PDF is a file format that is commonly used on the internet. It is often used, for example, for distributing articles from journals. Although PDFs can contain images and diagrams, Weft is only able to import the textual content of the files.

The process for importing PDFs is broadly the same as for importing plain text documents. When finding files on your computer to import, select "PDF documents" from the Files of Type... option list at the bottom of the file chooser window.

Note

Some PDF files are "locked". This is especially common - and especially irritating - among files from some journal article providers. It means that the text in the PDF cannot be copied or extracted, and so cannot be imported into Weft. When trying to import locked PDF files, no text will be imported and a warning will be shown instead.

Note

The ability to use PDF documents depends on the pdftotext utility. Windows users who use the installer will have this automatically as part of the bundle. Other users may have to install the software themselves; for more information see the installation chapter.
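
If you are unsure whether pdftotext is available on your system, a short Python script can check for it. The sketch below is independent of Weft and assumes only that the utility, when installed, is on your PATH; the function names are illustrative. Passing "-" as the output file asks pdftotext to write the extracted text to standard output.

```python
import shutil
import subprocess


def find_pdftotext():
    """Return the full path to the pdftotext executable, or None if it is not on PATH."""
    return shutil.which("pdftotext")


def extract_pdf_text(pdf_path):
    """Run pdftotext on pdf_path and return the extracted text as raw bytes.

    Illustrative only; this is not how Weft itself invokes the utility.
    """
    exe = find_pdftotext()
    if exe is None:
        raise RuntimeError("pdftotext was not found on PATH")
    # check=True raises CalledProcessError if pdftotext exits with an error status.
    result = subprocess.run([exe, pdf_path, "-"], capture_output=True, check=True)
    return result.stdout


if __name__ == "__main__":
    path = find_pdftotext()
    if path:
        print(f"pdftotext found at {path}")
    else:
        print("pdftotext is not installed or not on PATH")
```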

More about importing documents

Line endings

When saving plain text files, you may be asked whether you want to 'insert line endings'. Either option is fine, though you may find that not inserting 'hard line breaks' in the plain text gives better results, as Weft will automatically wrap and resize the text in documents when you are reading it.
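
If a file has already been saved with hard line breaks, they can be removed before importing. The following Python sketch is one possible approach, not a Weft feature: it assumes that paragraphs are separated by blank lines, and the function name unwrap_hard_breaks is made up for this example.

```python
import re


def unwrap_hard_breaks(text):
    """Join the lines inside each paragraph, keeping blank lines as paragraph breaks."""
    # Normalise Windows and old Mac line endings first.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Assumption: one or more blank lines separate paragraphs.
    paragraphs = re.split(r"\n\s*\n", text)
    joined = [
        " ".join(line.strip() for line in para.split("\n") if line.strip())
        for para in paragraphs
    ]
    return "\n\n".join(para for para in joined if para)


if __name__ == "__main__":
    sample = "This is a line\nbroken by hand.\n\nSecond paragraph\nhere.\n"
    print(unwrap_hard_breaks(sample))
```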

Text encodings

You may also be asked what 'encoding' you want to use to save a plain text file. On Windows, for European languages including English, the 'ASCII', 'iso-8859-XX' or 'windows latin' encodings are most suitable. Do not choose 'utf-8' or 'utf-16' encodings, especially if your source document file contains accented characters or special symbols. If you are working with non-Western scripts on Windows, choose your platform's native encoding for that script in preference to a universal encoding like 'utf-8' or 'utf-16' - for example, GB2312 for Chinese.

On Linux, choose a text file encoding that matches your system locale - on most systems, for European languages, this will be an 'iso-8859-XX' encoding. However, some recent versions of Ubuntu, for example, use UTF-8 as the platform default encoding, in which case you should save your text files in this encoding.
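
If a file has already been saved in an unsuitable encoding, it can be converted before importing. The Python sketch below shows one way to do this outside Weft. The function name reencode and its default encodings are assumptions chosen for illustration; substitute the encodings that match your file and platform. The conversion stops with an error if the text contains a character that the target encoding cannot represent.

```python
import tempfile
from pathlib import Path


def reencode(src, dst, from_encoding="utf-8-sig", to_encoding="iso-8859-1"):
    """Copy src to dst, converting the text from one encoding to another.

    'utf-8-sig' reads UTF-8 with or without a leading byte order mark.
    Bytes are read and written directly so that line endings are left unchanged.
    """
    text = Path(src).read_bytes().decode(from_encoding)
    # Raises UnicodeEncodeError if a character does not exist in to_encoding.
    Path(dst).write_bytes(text.encode(to_encoding))


if __name__ == "__main__":
    folder = Path(tempfile.mkdtemp())
    source = folder / "interview-utf8.txt"
    target = folder / "interview-latin1.txt"
    source.write_bytes("café".encode("utf-8"))
    reencode(source, target)
    print(target.read_bytes())
```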

Maximum document size

Weft is capable of handling large documents. It has been tested with documents as long as three hundred A4 pages. Problems may arise with very large source documents, and if this occurs, it may be possible to split a source document into several parts.

Note that larger documents will take longer to import, and very large documents may cause Weft to work more slowly. Obviously, this will also depend on how fast a computer you are using.
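
Splitting a very large source document can be done by hand in a text editor, or with a short script. The Python sketch below is one possible approach and not part of Weft: it assumes paragraphs are separated by blank lines and breaks only at those boundaries, so a single paragraph longer than the limit is kept whole. The function name and the size limit are illustrative.

```python
def split_text(text, max_chars):
    """Split text into parts of roughly max_chars or fewer, breaking between paragraphs."""
    parts, current, size = [], [], 0
    for para in text.split("\n\n"):
        # Start a new part if adding this paragraph would exceed the limit.
        if current and size + len(para) > max_chars:
            parts.append("\n\n".join(current))
            current, size = [], 0
        current.append(para)
        size += len(para) + 2  # allow for the blank line that follows
    if current:
        parts.append("\n\n".join(current))
    return parts


if __name__ == "__main__":
    sample = "aaaa\n\nbbbb\n\ncccc"
    for number, part in enumerate(split_text(sample, 10), start=1):
        print(f"part {number}: {part!r}")
```

Each part can then be saved as its own plain text file and imported as a separate document.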

Search indexes

When a document is imported, its textual content is scanned and indexed so that word searches can later be performed quickly. See Chapter 6, Searches, Queries and Code Reviews for more information.
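
To give a feel for why indexing makes later searches fast, here is a minimal Python sketch of a word index. It illustrates the general idea only and makes no claim about how Weft stores its indexes; the function name build_index is hypothetical. Each word is recorded once, together with the positions at which it occurs, so a search becomes a lookup rather than a scan of the whole text.

```python
import re
from collections import defaultdict


def build_index(text):
    """Map each lower-cased word to the character offsets at which it starts."""
    index = defaultdict(list)
    for match in re.finditer(r"\w+", text):
        index[match.group().lower()].append(match.start())
    return index


if __name__ == "__main__":
    doc = "The interview began. The interviewer asked about work."
    index = build_index(doc)
    print(index["the"])  # prints [0, 21]
```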