Computerization
At the end of 2005, Prof. Yaacov Choueka was invited to join the FGP Directors as Chief Computerization Scientist, to design, implement and oversee the computerization aspects of the project. A Computerization Unit, called Genazim, was subsequently created.
From its offices in Jerusalem, the Genazim team, consisting of 15 programmers and consultants, created the algorithms, software and databases necessary to support the project for many years to come. Development of the website and the input procedure is expected to continue until 2010.
Defining the Genizah World & its Computerization
Shelfmarks
The shelfmark of a fragment is the number or formal code-name assigned to it by the library where it is located. This is the unique and fundamental way by which the fragment is recognized, cited and mentioned. Hence, it was necessary to first build a computerized list of all shelfmarks encompassing all Genizah manuscripts found in different locations worldwide. Each and every unit of information relevant to any particular manuscript can then be attached to this shelfmark.
The shelfmark is supposed to be a permanent name for the fragment, somewhat like a person's Social Security number, which does not change when that person changes address, employment or marital status.
The world of shelfmarks today, in contrast, can be described in many cases as a “loosely controlled chaos.” Libraries very often change shelfmarks when they are reorganizing their inventory. Fragments often change locations and therefore also shelfmarks. Different encoding systems are used by different libraries. Sometimes even within the library itself no uniform policy is maintained. And finally, researchers, when mentioning a fragment, often refer to it not through its shelfmark, but by mentioning the number of a microfilm roll in which it appears or by mentioning the corresponding entry number in some published catalog, or by citing a paper or a book in which it is discussed.
The computer team had to ensure that a formal unique shelfmark was determined for every physical manuscript, and that all other variants or “alternate” names are recognized by the computer and attached to the formal one.
This complex task of recognizing a shelfmark through any of its numerous variants and attaching it to the intended formal shelfmark, even before processing the data associated with it, added a great amount of work to almost every computerization task in which Genazim was involved. The task was complex and time-consuming, but without it the crucial database at the heart of FGP would itself have been chaotic, lacking integrity and reliability.
In order to solve this problem in Genizah processing for the future, the team introduced the concept of a fixed FGP number.
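The variant-resolution task described above can be pictured with a small lookup table. The following is only a minimal sketch, not Genazim's actual software: the class name ShelfmarkRegistry, the normalize helper and the sample shelfmarks are all hypothetical, and real shelfmark conventions are far more varied than this simple normalization assumes.

```python
import re
from typing import Dict, Iterable, Optional


def normalize(name: str) -> str:
    """Lower-case the name and collapse punctuation/whitespace runs to one space."""
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()


class ShelfmarkRegistry:
    """Hypothetical table mapping every known variant to one formal shelfmark."""

    def __init__(self) -> None:
        self._formal_by_key: Dict[str, str] = {}

    def register(self, formal: str, variants: Iterable[str] = ()) -> None:
        # The formal shelfmark is registered as a variant of itself.
        for name in (formal, *variants):
            key = normalize(name)
            existing = self._formal_by_key.get(key)
            if existing is not None and existing != formal:
                # The same variant must never point at two different fragments.
                raise ValueError(f"{name!r} already points to {existing!r}")
            self._formal_by_key[key] = formal

    def resolve(self, name: str) -> Optional[str]:
        """Return the formal shelfmark for any registered variant, else None."""
        return self._formal_by_key.get(normalize(name))


# Illustrative data only: these shelfmarks are invented for the example.
registry = ShelfmarkRegistry()
registry.register("LIB-A 12.3", ["lib a 12/3", "Microfilm 4471, item 9"])
```

With such a table in place, a citation by microfilm roll or by an older spelling resolves to the same formal name, and an unknown name returns None so that it can be flagged for manual review.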
Libraries and Collections
In order to begin the computerization agenda, the Genazim team needed a comprehensive list of libraries, institutions and private collections in possession of Genizah manuscripts. Amazing as it may seem, during more than a hundred years of prolific Genizah research, no such list, or anything close to one, was ever compiled. The Genazim team created one.
This list is the blueprint by which FGP's Computerization Unit planned its activities in inventory compilation and digital imaging tasks.
Inventories
One aim of the Computerization Unit was to obtain an accurate, up-to-date and comprehensive computerized inventory of all shelfmarks in a given collection.
The Genazim team made the necessary contacts with the library authorities to convince them to create accurate inventories, while covering their expenses. For libraries that did not have the human resources needed for this task, a competent person was sent there to do the job. In all these cases, the received inventories went through a series of sophisticated computerized as well as manual analyses to check the consistency, integrity, non-ambiguity and completeness of the inventory and the data therein. Hundreds of errors and inconsistencies were found in such lists.
In other cases, when an agreement for digitizing the collection was quickly reached, the team skipped the inventory step, building the inventory from the actual images, these being, of course, the ultimate level of authenticity.
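The source does not describe the computerized analyses applied to the received inventories. As a rough illustration only, two of the simplest checks one might run on a list of shelfmarks are sketched below: blank rows (a completeness problem) and repeated shelfmarks (an ambiguity problem). The function name check_inventory and the report keys are assumptions made for this example.

```python
from collections import Counter
from typing import Dict, List


def check_inventory(entries: List[str]) -> Dict[str, List[str]]:
    """Report blank rows and duplicated shelfmarks in an inventory list."""
    cleaned = [entry.strip() for entry in entries]
    counts = Counter(entry for entry in cleaned if entry)
    return {
        # 1-based row numbers of empty entries, as strings for uniform output.
        "blank_rows": [str(i) for i, e in enumerate(cleaned, start=1) if not e],
        # Shelfmarks that occur more than once after trimming whitespace.
        "duplicates": sorted(e for e, n in counts.items() if n > 1),
    }
```

Checks of this kind only catch mechanical problems; the manual analysis mentioned above would still be needed for anything that requires knowledge of the collection.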
Images
From the discovery of the Cairo Genizah at the beginning of the 20th century until about the mid-sixties of that century, the only way to research a Genizah fragment was to travel to the library where it was located – be it Cambridge, Paris, Kiev or London – in order to look at it there (subject, of course, to its availability at the manuscripts department).
Around the mid-sixties, the Institute of Microfilmed Hebrew Manuscripts (IMHM) essentially completed its gigantic task of microfilming all Hebrew manuscripts all over the world, including those of the Cairo Genizah. From then on, life was somewhat easier for scholars because they could study the microfilmed fragments in Jerusalem. Helpful as these improvements were, the drawbacks of such a framework soon became apparent: poor images in many cases, no easy way to magnify the image or otherwise adjust its parameters to see it more clearly, missing (or duplicate) images, no clear shelfmarks on the microfilm, strict library opening hours and the limited availability of microfilm readers.
In response to these obstacles, one of the major missions of Genazim is to make available on the Internet high-quality (600 DPI, as required by the Research Libraries International Organization), full-color digital images of every side of every Genizah fragment, large or small, written or blank. This way anyone interested can study any Genizah fragment at the office or at home, at any time and without any limitations. Using a powerful viewer, a user can magnify the image to the maximum extent and change its contrast, brightness, colors, etc., thus making fragments that were thought to be illegible readable once again.
Catalogs
The most obvious source of data to be attached to shelfmarks is the published catalogs of Genizah collections or of Hebrew manuscripts in general that also mention Genizah-related material. Though hard to believe, no comprehensive list of such catalogs was available anywhere. The Genazim team created one.
The team made every effort to cover all types of catalogs besides the published ones, and so it also included typewritten or handwritten (and never published) catalogs in any language (English, French, German, Arabic), as well as computerized (and never printed) ones, whether online or in database formats.
Types of Data
Nine types of data are attached to a given fragment’s shelfmark:
- Digital Image - A full-color high-quality image of both sides of any leaf in the fragment.
- Domain - (e.g., Bible, Talmud, Responsum, personal letter) according to a carefully compiled list of some 30 domains, some of which are further divided into sub-domains.
- Identification - a short description of the fragment's content.
- Coded Catalog Record - comprising about 50 fields that can describe virtually every aspect of the fragment.
- Free text description - notes and comments.
- Transcription - including citations from the fragment.
- Translation to Hebrew.
- Bibliographical References - to any publication that mentions the fragment.
- A Join - the reconstruction of a fragmented manuscript through the "joining" of its component parts.
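One way to picture how these nine types of data hang off a single shelfmark is as one record per fragment. The sketch below is an assumption-laden illustration, not the project's actual schema: the class name FragmentRecord and every field name are hypothetical, and the roughly 50-field coded catalog record is reduced to a plain dictionary.

```python
from dataclasses import dataclass, field, fields
from typing import Dict, List, Optional


@dataclass
class FragmentRecord:
    """Hypothetical record: one shelfmark plus the nine data types listed above."""

    shelfmark: str
    image_files: List[str] = field(default_factory=list)   # digital images
    domain: Optional[str] = None                            # e.g. "Bible"
    identification: Optional[str] = None                    # short description
    catalog_record: Dict[str, str] = field(default_factory=dict)  # coded fields
    free_text: List[str] = field(default_factory=list)     # notes and comments
    transcription: Optional[str] = None
    hebrew_translation: Optional[str] = None
    bibliography: List[str] = field(default_factory=list)  # references
    joins: List[str] = field(default_factory=list)         # joined shelfmarks

    def available_types(self) -> List[str]:
        """Names of the data types that currently hold a non-empty value."""
        return [
            f.name
            for f in fields(self)
            if f.name != "shelfmark" and getattr(self, f.name)
        ]
```

Because every field other than the shelfmark is optional, a record can be created as soon as the shelfmark is known and filled in gradually as images, catalog data and transcriptions arrive.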
The Friedberg Input Module
One of the first tasks that the Genazim team tackled was the development of the Friedberg Input Module (FIM). This is a specially designed and constantly updated input module that contains uniform facilities for inputting domains, identifications, data, transcriptions, citations, translations, running titles, free descriptions and notes. FIM is the point of contact between the academic activity in Genizah studies and the platform offered by FGP.
Friedberg Information Storehouse (FIST)
The critical and main database of the system is known as the Friedberg Information Storehouse or FIST. All nine types of Genizah-related data collected by Genazim are integrated into FIST, and all data displayed on the FGP website are extracted from FIST.
Each unit of information in FIST is stamped electronically with a unique and fixed "signature" that contains, among other things, the following information: a unique identifying number, the date it was received, the source (and sub-source) of that unit (e.g., the research team that produced it or the catalog from which it was extracted) and the version number of that unit.
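A signature of this kind could be modeled as a small immutable record. The sketch below is illustrative only: the class name UnitSignature and its field names are assumptions, and the source does not say how FIST actually stores or generates these values.

```python
from dataclasses import FrozenInstanceError, dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class UnitSignature:
    """Hypothetical fixed signature stamped on one unit of information."""

    unit_id: int               # unique identifying number
    received: date             # date the unit was received
    source: str                # e.g. a research team or a catalog
    sub_source: Optional[str]  # finer-grained origin, if any
    version: int               # version number of the unit


# Illustrative values only.
signature = UnitSignature(
    unit_id=1,
    received=date(2008, 1, 15),
    source="catalog",
    sub_source=None,
    version=1,
)
```

Declaring the dataclass as frozen means that an attempt to alter a field after creation raises FrozenInstanceError, which mirrors the idea of a signature that is fixed once it has been stamped.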