Gutenberg-tar

The Project Gutenberg Library on Your Computer

gutenberg_txt.7z lets you conveniently download and create a local copy of the Project Gutenberg txt files. It does not include any images, audio files or other non-txt data.

You can acquire these files from the original website, but our format is simple and convenient to download without special software. Only an ordinary browser is required.

The download file (gutenberg_txt.7z) is 6GB. Expanding the archive requires 70GB of disk space: 35GB for the intermediate tar file and another 35GB for the 74,489 files. The tar file can be deleted afterward unless you plan to use it to create additional copies of the expanded files.

After downloading, first use the 7-Zip File Manager or other appropriate software to expand the .7z file into the intermediate tar file. Then extract the Gutenberg files from the tar archive using 7-Zip a second time.
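The second step (tar to individual files) can also be scripted. The following is a minimal sketch using Python's standard tarfile module, not part of the original instructions. The function name extract_tar and any file names you pass to it are assumptions for illustration. The first step (7z to tar) still needs 7-Zip or similar software, because the Python standard library does not read the 7z format.

```python
import tarfile
from pathlib import Path


def extract_tar(tar_path, dest_dir):
    """Extract tar_path into dest_dir and return the number of regular files written."""
    dest_root = Path(dest_dir).resolve()
    dest_root.mkdir(parents=True, exist_ok=True)
    count = 0
    with tarfile.open(tar_path, "r") as archive:
        for member in archive:
            # Skip any member whose path would land outside dest_root.
            target = (dest_root / member.name).resolve()
            if target != dest_root and dest_root not in target.parents:
                continue
            archive.extract(member, dest_root)
            if member.isfile():
                count += 1
    return count
```

For example, extract_tar("gutenberg_txt.tar", "gutenberg_txt") would expand a tar file of that (assumed) name into a new directory. Make sure the destination disk has room for the expanded files before starting.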

Here are the detailed installation instructions for the thumb drive version listed below. They apply to all the other compressed versions with the appropriate name replacement.

With a one MByte/second network connection, downloading the entire archive from Gutenberg can take a month, depending on the frequency of interruptions and throttling by your ISP and other systems in the path to the server. Our txt download would take about 2 hours.
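The 2-hour figure follows from simple arithmetic on the 6GB download size. A small sketch of that calculation, assuming an uninterrupted transfer and 1 GB = 1000 MB (the function name download_hours is only illustrative):

```python
def download_hours(size_gb, rate_mb_per_s):
    """Return the hours needed to transfer size_gb gigabytes at rate_mb_per_s MB/s."""
    seconds = size_gb * 1000 / rate_mb_per_s  # 1 GB taken as 1000 MB
    return seconds / 3600


# 6 GB at 1 MB/s is 6000 seconds, a little under 1.7 hours.
txt_hours = download_hours(6, 1)
```

Interruptions and throttling only add to this, so "about 2 hours" is a reasonable round figure.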


Gutenberg Library on Disk

To see the current offerings on eBay, enter the search value gutenberg-tar.

Complete Library -- Uncompressed

All works in the Gutenberg archive posted up to March 1, 2013, plus the last html version of Wikipedia (2008), can be acquired on a 1TB disk drive and kept current by downloading only the material posted after that date. The disk includes the uncompressed files for immediate direct access. We strongly suggest that you copy everything to another disk and keep the original disk as a backup. Wikipedia 2008 in html format is included in compressed form--tar and 7zip.

Here are the lists of files in each collection:
Gutenberg base collection
Gutenberg generated collection

Text Files Only -- Compressed

This collection has all the txt files but excludes all other types--images, audio, pdf and all other formats. This is the same material as gutenberg_txt.7z (above). It is available as an 8GB USB thumb drive or two DVDs.

Here are the lists of files:
Gutenberg txt collection - Thumb Drive
Gutenberg txt collection - DVD12
Gutenberg txt collection - DVD39

The thumb drive combines these two DVD archives.

Complete Gutenberg Library and Wikipedia 2008 -- Uncompressed

This 2TB disk includes the uncompressed files for the Gutenberg library and Wikipedia 2008.

Here are the lists of files in each collection:
Gutenberg base collection
Gutenberg generated collection

FAQ

Disk Format

The Gutenberg and Wikipedia 2008 archives use huge files and contain a huge number of files--hundreds of thousands. When copying or expanding an archive, do not use FAT formatting. Use NTFS or a native Mac or UNIX file system. USB thumb drives are usually formatted as FAT and must be reformatted to NTFS.

Searching and Browsing

Search for ebooks by exploring the GUTINDEX.ALL file. Use an editor or browser search function (Ctrl-F) to find the work by title or author. Use the listed ebook number (on the right side of the record) to identify and find the work.
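The same lookup can be done with a short script. This is a rough sketch, not a tool shipped on the disks. It assumes the ebook number is the last whitespace-separated number on the matching line, as described above, and that the file can be read as text with undecodable bytes replaced. The name search_index is hypothetical.

```python
import re

# Assumption: the ebook number is the final run of digits on the line.
TRAILING_NUMBER = re.compile(r"\s(\d+)\s*$")


def search_index(index_path, query):
    """Return (ebook_number, line) pairs for index lines containing query.

    ebook_number is None when the matching line does not end in a number,
    for example a continuation line of a longer record.
    """
    needle = query.lower()
    hits = []
    with open(index_path, encoding="utf-8", errors="replace") as index:
        for line in index:
            line = line.rstrip("\n")
            if needle not in line.lower():
                continue
            match = TRAILING_NUMBER.search(line)
            number = int(match.group(1)) if match else None
            hits.append((number, line))
    return hits
```

Calling search_index("GUTINDEX.ALL", "some author") returns every matching line. Where the number comes back as None, look at the neighboring lines of the index to find the record's ebook number.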

Use Agent Ransack to search for file content. It is included in the programs directory of each disk. A sample search of the entire txt archive took 40 minutes.
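If Agent Ransack is not available on your system, a content search can be approximated with a few lines of Python. This is only a sketch under stated assumptions: it checks one line at a time, so a phrase that is split across a line break is missed, and it reads every file as text with undecodable bytes replaced. The name find_in_txt is hypothetical. Expect a search of the whole archive to take a long time this way too.

```python
import os


def find_in_txt(root_dir, phrase):
    """Yield paths of .txt files under root_dir that contain phrase (case-insensitive)."""
    needle = phrase.lower()
    for folder, _subdirs, files in os.walk(root_dir):
        for name in files:
            if not name.lower().endswith(".txt"):
                continue
            path = os.path.join(folder, name)
            with open(path, encoding="utf-8", errors="replace") as handle:
                # Stop reading a file at its first matching line.
                if any(needle in line.lower() for line in handle):
                    yield path
```

Because the function is a generator, results can be printed as they are found, for example with a loop such as "for path in find_in_txt(root, phrase): print(path)".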

Gutenberg file structure

The Gutenberg library consists of the files uploaded by the proofreaders. Most works are in txt and html format, but may also include jpg, pdf, doc, mp3 and other formats. The contents are also packaged in zip archives. The Gutenberg library is organized using the digits of the ebook number. The top level directory "1" (and its subdirectories) contains all ebooks whose numbers begin with "1". Each succeeding digit is another subdirectory, except for the last digit. The ebook number itself is the name of the leaf directory. There are often additional subdirectories for different representations of the work and associated images or audio files. Thus the following ebooks are located as follows:

12345 /1/2/3/4/12345
1234 /1/2/3/1234
123 /1/2/123
12 /1/12
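The rule illustrated by these examples can be written as a small function. This is a sketch of the rule as described here; the name ebook_dir is hypothetical, and single-digit ebook numbers are rejected because the description above does not say where they are stored.

```python
def ebook_dir(number):
    """Return the directory for an ebook number, e.g. 12345 -> /1/2/3/4/12345."""
    digits = str(int(number))
    if not digits.isdigit() or len(digits) < 2:
        # Negative and single-digit numbers are not covered by the rule above.
        raise ValueError("expected an ebook number with at least two digits")
    # One directory per digit except the last, then the full number as the leaf.
    return "/" + "/".join(list(digits[:-1]) + [digits])
```

The returned path is relative to the root of the collection, so prefix it with the drive or mount point where your copy lives.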

Gutenberg Generated File Structure

These files are produced automatically from the submitted files. They include mobi, plucker, qioo and other formats. The material for each ebook is contained in a directory with the same name/number as the book, just as with the original structure. Unlike the original structure above, the ebook directories are in a "flat" structure: every ebook directory sits directly in the root directory.

Preservation and Distribution

Keep the original disk in a safe place. All working disks eventually stop working. Better yet, make a copy of the original disk--always have two copies of anything you value. Even better, make four copies of the original disk, keep two, and sell or give the other two to friends. If everyone did this within a week of receiving their original, in less than a year everyone on Earth would have a copy.