Digitization - Planning, Selection of Materials, Hardware, Software, Process, Issues
Digitization
Digitization is the process of converting non-digital material (printed documents, manuscripts, etc.) into digital form using different methods and techniques. With the increased availability of economical digital storage media, high-speed scanners and high-bandwidth networks, digital libraries have received a boost in the last few years. The new technologies allow the digitisation of all types of documents, including paper documents, photographs, sound recordings and motion pictures.
Digitisation refers to the process of translating a piece of information, such as a book, journal article, sound recording, picture, audio tape or video recording, into bits. Bits are the fundamental units of information in a computer system. Converting information into these binary digits is called digitisation.
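As a small illustration of what "translating information into bits" means, the sketch below turns a short piece of text into binary digits and back. It assumes UTF-8 as the character encoding; the function names to_bits and from_bits are made up for this example.

```python
def to_bits(text: str) -> str:
    """Return the UTF-8 bytes of `text` as groups of 8 binary digits."""
    return " ".join(format(byte, "08b") for byte in text.encode("utf-8"))


def from_bits(bits: str) -> str:
    """Reverse to_bits: turn space-separated 8-bit groups back into text."""
    return bytes(int(group, 2) for group in bits.split()).decode("utf-8")


print(to_bits("Hi"))             # 01001000 01101001
print(from_bits(to_bits("Hi")))  # Hi
```

Images, sound and video are digitised on the same principle, only with far more bits per item.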
Why digitize?/ Need for Digitisation
• Better and wider access to information.
• Preservation of information.
• Space management.
• Easy accessibility.
• Cost benefit.
Planning for Digitization
• User community.
• Identification of information material to be digitized.
• Consider copyright issues.
• Work flow involved in conversion of analog material into digital format.
• In-house / outsourced.
• Deciding upon the use of information technology (IT) infrastructure, personnel and financial requirements.
• Digital Preservation.
Selection of Materials for digitisation
• Audio
The sound quality has to be checked and the required corrections made jointly by the subject expert and the computer sound editor.
• Video
The video clippings are normally edited on Betamax tapes, which can be used for transferring to digital format. While editing, colour tone and resolution are checked and corrected.
• Photographs
The selection of photographs is a very crucial process. High resolution is required for photographic images and slides. Also, the quality and future needs are to be checked, and the copyright aspects are to be taken care of.
• Documents
Documents that are much in demand, too fragile to handle, or rare are reviewed and selected for the process. If the correction of library value demands much input, then the documents are considered for publication rather than digitisation.
Hardware for Digitization
A powerful computer is the basis for any digitization. A dedicated computer should be used specifically for the imaging. The basic equipment for digitization is given below:
• PC/Macintosh
• Large screen monitor
• Planetary scanner
• Digital camera
• CD recorder
• Backup drive
• Colour and black-and-white printer
Software for Digitization
For digitization, at least three types of software are needed. The first is the scanning software, i.e. the software that operates the scanner. The second is the image editing software, normally applied to the image after it has been scanned. The third is the batch-processing software, which is useful for generating thumbnails and access images, converting from one file format to another, or compressing files.
Several digital library software packages are currently available, such as:
1. Greenstone Digital Library (GSDL)
2. DSpace
3. EPrints
4. Fedora, etc.
Steps of Digitization / Process
The following four steps are involved in the process of digitisation. Software variably called Document Image Processing (DIP), Electronic Filing System (EFS) or Document Management System (DMS) provides all or most of these functions.
1. Scanning
Electronic scanners are used to acquire an electronic image into a computer from an original, which may be a photograph, text, manuscript, etc. An image is "read" or "scanned" at a predefined resolution and dynamic range. The resulting file, called a "bit-map page image", is formatted (image formats are described elsewhere) and tagged for storage and subsequent retrieval by the software package used for scanning. Acquisition of an image through a fax card, electronic camera or other imaging device is also feasible. However, image scanners are the most important and most commonly used component of an imaging system for the transfer of normal paper-based documents.
Steps in the process of scanning using a flatbed scanner.
Step 1 - Place picture on the scanner's glass.
Step 2 - Start scanner software.
Step 3 - Select the area to be scanned.
Step 4 - Choose the image type.
Step 5 - Sharpen the image.
Step 6 - Set the image size.
Step 7 - Save the scanned image using a desirable format (GIF or JPEG)
2. Indexing
If converting a document into an image or text file is considered the first step in the process of imaging, indexing these files is the second step. The process of indexing scanned images involves linking the database of scanned images to a text database. Scanned images are just like a set of pictures that need to be related to a text database describing them and their contents. An imaging system typically stores a large amount of unstructured data in a two-file system for storing and retrieving scanned images. The first is a traditional file that has a text description of the image (keywords or descriptors) along with a key to a second file. The second file contains the document location. The user selects a record from the first file using a search algorithm. Once the user selects a record, the application program keys into the location index, finds the document and displays it.
First file (text description): Author / Title / Keywords / image 45 (key to image)
Second file (location index): image 45 / image/new/smith.pdf (image location)
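The two-file arrangement described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the key "image 45" and the location "image/new/smith.pdf" follow the example above, while the author, title and keyword values, the second record and the names find_records and locate are hypothetical.

```python
# First file: one text description per image, plus a key into the second file.
# Author, title and keyword values are made-up sample data.
description_file = [
    {"author": "Smith", "title": "Sample report",
     "keywords": ["survey", "report"], "image_key": "image 45"},
    {"author": "Jones", "title": "Sample map",
     "keywords": ["map"], "image_key": "image 46"},
]

# Second file: maps each key to the location of the stored document.
location_file = {
    "image 45": "image/new/smith.pdf",
    "image 46": "image/new/jones.pdf",  # hypothetical second entry
}


def find_records(keyword):
    """Search the first file for records carrying the given keyword."""
    wanted = keyword.lower()
    return [rec for rec in description_file
            if wanted in (k.lower() for k in rec["keywords"])]


def locate(record):
    """Use the record's key to look up the document location in the second file."""
    return location_file[record["image_key"]]


for rec in find_records("report"):
    print(rec["title"], "->", locate(rec))
```

Keeping descriptions and locations apart means an image can be moved to another storage device by updating only the second file.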
Most document imaging software packages, through their menu-driven or command-driven interfaces, facilitate elaborate indexing of documents. While some document management systems facilitate selection of indexing terms from the image file, others allow only manual keying in of indexing terms. Further, many DMS packages provide OCR capabilities for transforming the images into standard ASCII files. The OCRed text then serves as a database for full-text search of the stored images.
3. Storing
The most persistent problem of a document image relates to its file size and, therefore, to its storage. Every part of an electronic page image is saved regardless of the presence or absence of ink. The file size varies directly with the scanning resolution, the size of the area being digitised and the graphic file format used to save the image. The scanned images, therefore, need to be transferred from the hard disc to storage such as CD-ROM/DVD-ROM discs, snap servers, etc. While smaller document imaging systems may use offline media, which need to be reloaded when required, or fixed hard disc drives allocated for image storage, larger document management systems use auto-changers such as optical jukeboxes and tape library systems. The storage required by the scanned images varies and depends upon factors such as scanning resolution, page size, compression ratio and page content. Further, the image storage device may be either remote or local to the retrieval workstation, depending upon the imaging system and document management system used.
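To make the dependence of file size on resolution, page size and compression concrete, here is a rough estimate in Python. It is a sketch only: it assumes an uncompressed bitmap with no header or row padding, and the page size, resolution and compression ratio used are illustrative values, not figures from any particular scanner or file format.

```python
import math


def uncompressed_size_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Estimate the raw bitmap size of a scanned page.

    Every pixel is stored whether or not it carries ink, so the size
    depends only on the scanned area, the resolution and the bit depth.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return math.ceil(width_px * height_px * bits_per_pixel / 8)


def compressed_size_bytes(raw_bytes, compression_ratio):
    """Apply an assumed compression ratio (e.g. 10 means 10:1)."""
    return raw_bytes / compression_ratio


# An 8.5 x 11 inch page scanned at 300 dpi.
colour = uncompressed_size_bytes(8.5, 11, 300, 24)   # 24-bit colour
bitonal = uncompressed_size_bytes(8.5, 11, 300, 1)   # 1-bit black and white

print(colour)                             # 25245000
print(bitonal)                            # 1051875
print(compressed_size_bytes(colour, 10))  # 2524500.0
```

Doubling the resolution quadruples the pixel count, which is why resolution is usually the first setting to review when storage is tight.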
4. Retrieving
Once scanned images and OCRed text documents have been saved as files, a database is needed for selective retrieval of data contained in one or more fields within each record in the database. Typically, a document imaging system uses at least two files to store and retrieve documents. The first is a traditional file that has a text description of the image along with a key to the second file. The second file contains the document location. The user selects a record from the first file using a search algorithm. Once the user selects a record, the application program keys into the location index, finds the document and displays it. Most document management systems provide elaborate search possibilities, including the use of Boolean operators (AND, OR, NOT), proximity operators and wild cards. Users are also allowed to refine their search strategy. Once the required records have been identified, their associated document images can quickly be retrieved from the image storage device for display or for printed output.
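A Boolean search with wild cards over the descriptive records can be sketched as follows. This is a minimal illustration, not how any particular document management system works: the records, keywords and function names are hypothetical, wild cards are handled with Python's standard fnmatch module, and proximity operators are not covered.

```python
from fnmatch import fnmatchcase

# Hypothetical descriptive records: a key to the image plus its keywords.
records = [
    {"key": "image 45", "keywords": {"manuscript", "history"}},
    {"key": "image 46", "keywords": {"map", "history"}},
    {"key": "image 47", "keywords": {"manuscript", "botany"}},
]


def matches(record, all_of=(), any_of=(), none_of=()):
    """Return True if the record satisfies the AND / OR / NOT term lists."""
    def hit(pattern):
        # A term may contain wild cards such as "manu*".
        return any(fnmatchcase(word, pattern) for word in record["keywords"])

    return (all(hit(p) for p in all_of)                        # AND
            and (not any_of or any(hit(p) for p in any_of))    # OR
            and not any(hit(p) for p in none_of))              # NOT


def search(all_of=(), any_of=(), none_of=()):
    """Return the keys of all matching records, in file order."""
    return [r["key"] for r in records if matches(r, all_of, any_of, none_of)]


print(search(all_of=["history"], none_of=["map"]))  # ['image 45']
print(search(any_of=["manu*"]))                     # ['image 45', 'image 47']
```

Refining a search strategy then amounts to calling search again with more terms added to one of the three lists.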
Digitization Issues / Disadvantages of digitization
Digitization faces many problems beyond the purely technical ones. Required staff expertise and additional resources are often the greatest costs in digitization. Not only are large budget allocations needed to fund research and intellectual selection, but time must also be spent on feasibility assessments, training and methodical prioritization of the items or collections to be digitized. These requirements pull staff away from their regular workloads. Apart from this, digitization faces challenges in several areas, such as:
• Storage
• Compression techniques to save storage
• User Interface
• Classification and Indexing.
• Information retrieval
• Content delivery.
• Presentation
• Administrative.
• Ease of access to a digital collection leads to high expectations of end users.