Perhaps the most exciting and ambitious project of 2018 is the end-of-year release of Part 1 of the Patrologiae Cursus Completus, Series Graeca on December 26, 2018. We talked to both Rick Brannan, the profoundly-expertised-Greek-guru-resident-scholar-guy at Faithlife/Logos (his actual title is Information Architect and Team Manager, Content Innovation), and Kyle Anderson, Content Production Manager at Faithlife, about the process of getting these volumes into a searchable format that had not been realized before. There is a fascinating story here concerning the intersection of ancient texts with the Internet, and the importance of making significant works both utilitarian and delightful to end users.
The Patrologia Graeca is “the largest collection ever published of the extant writings of the ante-Nicene Greek Fathers of the Early Church,” the same series upon which Philip Schaff based his translation work for Early Church Fathers. To fully grasp the significance of this monumental work, and see the ancient authors included, you need to read this.
These first 18 volumes (out of a massive 167!) will be made available in the Logos digital library on the day after Christmas, but you can still get in on the pre-order price of $399.99 before December 26, 2018.
TB: Hi Rick, thanks for talking with us about PG. Has a project of this magnitude ever been attempted by anyone before Logos?
RB: The Thesaurus Linguae Graecae has pretty much every extant Greek author we know of, so it has PG plus scads more of classical stuff (well, minus all the Latin descriptive text found in PG). Five or so years ago, someone essentially bootlegged the PG Greek portions from TLG, reformatted them as PDF, and put them online. They’ve been pretty much excised from the web by the TLG folks. Also, there are “attempts.” One vendor has indexed page scans for years, allowing navigation of the scans by author, page, and column, though with Google Books and archive.org, I’m unsure if his product is still viable. The site Documenta Catholica Omnia has done something similar. Also, there is an effort to OCR the entire corpus with accuracy at an acceptable level and release the result as open source/open data (see here: https://github.com/OGL-PatrologiaGraecaDev ).
Patrologia Graeca (PG) has always been a unique project because it is huge. It is also important because, for several of the works included in the series, it was the last time a Greek text of the work was published. Now, there have been attempts to publish digital editions of PG, but apart from Thesaurus Linguae Graecae (a product targeted at universities and libraries, which encodes only the Greek and skips all of the introductory text, which is in Latin), all of the attempts have been facsimile-based. That is, they are still just pictures of the page (PDF). One vendor, years ago, had page scans indexed to volume, page, and column, which made looking up citations easy. But it still wasn’t searchable. And the advent of archive sites like archive.org and Google Books made simple facsimile versions of the pages available, but did not make them searchable. The Logos edition aims to change this.
TB: If the PG is so important, why did it take Logos so long to attempt it themselves?
KA: The long and short answer is that it’s really expensive. At an average of 1,000 pages per volume, it was simply a massive amount of material to recreate, especially when one considers having to key each of them twice and then do quality checks on all the pages.
TB: Could you describe the process of creating the volumes?
KA: Each PDF was assigned to two different teams, and each team digitized it by the same process: hand keying the resource. No OCR (software that scans PDFs and images for individual characters and converts them into searchable, editable text) was used. Each team created its own file, so there were two files per PDF. The files were then compared by software to find any discrepancies between the two, which were fixed before the files were merged. The benefit of the double-key strategy is that it minimizes errors, since a mistake slips through only if both teams make the same one in the same place.
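For readers curious about what the comparison step might look like, here is a minimal sketch in Python. The interview does not say which software Logos used, so this is only an illustration built on the standard library's `difflib`; the function names and the sample line of Greek are hypothetical. It also assumes that both keyings should be put into the same Unicode form first, so that accents typed in different ways are not flagged as disagreements.

```python
import difflib
import unicodedata


def normalize(text: str) -> str:
    """Put text into Unicode NFC form so that precomposed and combining
    accents compare as equal (an assumption, not a documented Logos step)."""
    return unicodedata.normalize("NFC", text)


def find_discrepancies(keying_a: str, keying_b: str):
    """Compare two independent keyings word by word.

    Returns a list of (word_position_in_a, words_from_a, words_from_b)
    for every span where the two keyings disagree.
    """
    words_a = normalize(keying_a).split()
    words_b = normalize(keying_b).split()
    matcher = difflib.SequenceMatcher(a=words_a, b=words_b, autojunk=False)
    discrepancies = []
    for tag, a_start, a_end, b_start, b_end in matcher.get_opcodes():
        if tag != "equal":
            discrepancies.append(
                (a_start, words_a[a_start:a_end], words_b[b_start:b_end])
            )
    return discrepancies


# Hypothetical sample: both teams key the same line, and team B mistypes
# the last word.
team_a = "Ἐν ἀρχῇ ἦν ὁ λόγος"
team_b = "Ἐν ἀρχῇ ἦν ὁ λόγας"

for position, a_words, b_words in find_discrepancies(team_a, team_b):
    print(f"word {position}: team A keyed {a_words}, team B keyed {b_words}")
```

A person would still have to check each flagged span against the page scan to decide which keying is right. The comparison only shows where the two teams disagree, not who made the mistake.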
TB: What challenges or limitations did the Logos teams face in producing the PG?
KA: The greatest challenge was in the source files themselves; they are extremely rough. There is a reason why this has never been attempted. The PDFs were created from old books, which can result in letters that are unclear or illegible. Unless one is fluent in Latin or Greek, it can be difficult to get the correct letters.
It is very important to mention that the PG is very much a work in progress, and reports of words we got wrong in keying the resources are actively encouraged, so that together we can make these volumes amazing. We sincerely welcome typo reports and suggestions for improvement. The benefit of a digital medium is that mistakes can be fixed easily.
Get the 18-volume PG: Part 1 before the price increases after the pre-order.
Also note that the entire 167-volume Patrologiae Cursus Completus: Series Graeca is currently being bid on. This potentially means that you can get the entire collection at an incredibly low price, as a way of saying “thank you” for helping to fund the publication of this incredible resource.
Get yourself a real digital library.