How much will the all-in-one service cost?
Every project is different, but the All-in-One Newspaper Digitization service includes:
- vendor pricing for digitization and full text OCR processing;
- any image splitting, rotating, quality control, etc.;
- hard drives for storage and delivery;
- shipping and handling;
- ingestion of all files (page images, issue and publication records) into VITA;
- and mapping of any electronic index records to digital pages.
The pricing for ODW’s All-in-One service is very competitive, at no more than 75 cents per page. The actual price will vary depending on your archival needs, the material you are digitizing from, the number of pages, and how you want your content to display and be searched. Contact us to talk through your needs and get a free estimate.
How can I digitize from microfilm or fiche?
If you have a collection of newspapers on microfilm or fiche, first ensure that you have copyright permission to reproduce the content for online display. Many third-party vendors will scan hundreds or thousands of images directly from microformats. Scanning microformats in-house is also an option if you have a microfilm reader/scanner that can render high quality digital files, a desktop license for OCR software, and sufficient staff time to dedicate to this task. See What is ‘the best clean copy’?
How can I digitize from hard copy (print) newspapers?
If you have a print collection of newspapers, first ensure that you have copyright permission to reproduce the content for online display. Many third-party vendors are able to scan the paper copy, often providing a better digital copy than would originate from a microfilm or microfiche copy. Scanning in-house is also an option if you have a large bed scanner or a camera with tripod and sufficient staff time to dedicate to this task.
What if I have a card catalogue of BMDs or other index records?
Typed index cards can be digitized using a scanner pen or transcribed and entered into the VITA data management screens. Handwritten index records can be transcribed and entered manually into VITA.
What if I have an existing database of index records?
We can import your existing digital index records into VITA so you can edit, update, and add to that collection. If you are also working with full-run newspaper page images and have included page references in the index records, the index can be linked to the page images as well.
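As a rough illustration of how page references in index records can be matched to page images, here is a minimal Python sketch. It is not ODW's import process. It assumes each index record carries a title code, an issue date and a page number, and that page images follow the file naming pattern described later in this FAQ (for example EFP18870923_001.jpg). The field names, the three-digit page padding and the helper names are hypothetical.

```python
from datetime import date


def page_file_for(record, pad=3, ext="jpg"):
    """Build the page image file name that an index record points to.

    Assumed record fields (hypothetical): title_code, issue_date, page.
    """
    d = record["issue_date"]
    stamp = f"{d.year:04d}{d.month:02d}{d.day:02d}"  # YYYYMMDD
    return f"{record['title_code']}{stamp}_{record['page']:0{pad}d}.{ext}"


def link_index(records, page_files):
    """Split records into those with a matching page image and those without."""
    available = set(page_files)
    linked, unlinked = {}, []
    for record in records:
        name = page_file_for(record)
        if name in available:
            linked[record["id"]] = name
        else:
            unlinked.append(record["id"])
    return linked, unlinked


records = [
    {"id": 1, "title_code": "EFP", "issue_date": date(1887, 9, 23), "page": 1},
    {"id": 2, "title_code": "EFP", "issue_date": date(1887, 9, 23), "page": 9},
]
linked, unlinked = link_index(records, ["EFP18870923_001.jpg"])
print(linked)    # records that found their page image
print(unlinked)  # records whose page reference has no matching image
```

Records that end up in the unlinked list are the ones worth checking by hand, since the page reference or the file name is probably wrong.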
What is ‘the best clean copy’?
The best clean copy is the one that is the most legible after digitization. Microfilm or fiche positive exposures that have been used at public workstations are often scratched, torn or mended in ways that affect the image of the newspaper. Whenever possible, obtain the unused negative reels of the film version and either have a vendor digitize directly from the negatives or produce a first copy positive to digitize (this gives you a clean backup reel for your archives as well!). If the material is not filmed or negatives are unobtainable, then scanning the print copies will likely produce the best results. See How can I digitize from hard copy (print) newspapers?
What is the difference between Full page display and Full page display with hit highlighting?
A full page display provides the end user with full text results, usually shown with a snippet of the OCR results for context and a link to a page view. Unless the search terms have been indexed separately, the end user then needs to browse the full page scan to find the terms they searched for.
A full page display with hit highlighting is processed so that the search terms are highlighted within the full page scan to guide the end user directly to their search result within a page. See Hit Highlighting.
What is “full page display”?
The full page display provides the end user with full text results, usually shown with a snippet of the OCR results for context and a link to a page view. Unless the search terms have been indexed separately, the end user then needs to browse the full page scan to find the terms they searched for. Click here for an example of full page news results.
Why should I process for hit highlighting?
Hit highlighting makes it much easier to navigate through dense text, especially with heritage material where low lighting, heavy serif fonts, blurred print and other distractions make skimming difficult. The success of hit highlighting depends on the quality of the images being processed (see What is ‘the best clean copy’?). Click here for an example of hit highlighting results.
What is positional OCR?
Traditional OCR output compiles all the text on a newspaper page into one big unordered TXT file. With positional OCR, we process the JPG files produced by your third-party vendor with powerful OCR software and add a little bit of XML so that the OCR text results are “positioned” according to where each word appears on the page image. Then, when your end user searches for a term, the results are returned with visual “highlighting” of the specific words on the page image, drawing the eye immediately to the article or headline where the term occurs. For researchers working through masses of dense text, hit highlighting is a huge advantage. See more about our hit highlighting process here.
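To make the idea of “positioned” text concrete, here is a minimal Python sketch. The XML layout below (a page element containing word elements with x, y, w and h attributes in pixels) is an assumption made up for illustration, not the format ODW actually produces, and find_hits is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical positional OCR output: each word records where it sits on the page image.
SAMPLE = """<page image="EFP18870923_001.jpg">
  <word x="120" y="340" w="210" h="48">Essex</word>
  <word x="340" y="340" w="180" h="48">Free</word>
  <word x="530" y="340" w="230" h="48">Press</word>
  <word x="120" y="910" w="200" h="40">press</word>
</page>"""


def find_hits(xml_text, term):
    """Return (x, y, w, h) boxes for every word that matches term, ignoring case."""
    root = ET.fromstring(xml_text)
    wanted = term.casefold()
    hits = []
    for word in root.iter("word"):
        # Strip common punctuation so "Press," still matches "press".
        text = (word.text or "").strip(".,;:!?\"'").casefold()
        if text == wanted:
            hits.append(tuple(int(word.get(key)) for key in ("x", "y", "w", "h")))
    return hits


print(find_hits(SAMPLE, "press"))
# [(530, 340, 230, 48), (120, 910, 200, 40)]
```

A viewer could then draw a translucent rectangle over each returned box on the page image, which is the visual highlighting the end user sees.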
What file formats am I supposed to get from the vendor?
For all digitization projects, we recommend getting a high resolution (300-400dpi) TIFF image to keep as your archival master. The TIFF format is “lossless” and is the best file type for storing your master copy, deriving other versions, or creating high resolution print material for related projects.
For full page display with full text discovery, we need a medium resolution image and text file. We can work with your TIFFs, but recommend sending either a 300-400dpi greyscale JPG that has been OCR’d and output as a TXT file or a readable PDF. Colour files make sense if the original newspaper content has been printed in colour. VITA can extract text from most PDFs and will render a JPG from the PDF file for web display. For content that will be processed for hit highlighting, we’ll need your 300-400dpi greyscale TIFFs or derivative 300-400dpi greyscale JPGs. Talk to us about this service.
If you are sending us digital files, proper file naming is a critical part of a smooth process. Please read more about File Naming.
What are best practices for File Naming?
Good file names have meaning for humans and will not confuse computer systems. To accomplish this, follow the recommended file name standards for handing over files to ODW for processing:
The file name describes the relationship between publication title>issue>page. A good example is EFP18870923_001.jpg
- Publication title code: EFP
- Issue date: YYYYMMDD
- Separator: use underscore or hyphen
- Page number: enough zeros to accommodate the maximum number of pages possible per issue
- File format extension
- Do not include: spaces, ampersands, periods, colons, commas, slashes, question marks, etc.
- Do include: hyphens, underscores and alphanumeric content
- Keep file names under 32 characters
If working with multiple publication titles, please also provide an authority file, i.e. a list that indicates which Publication Title uses which acronym (EFP = Essex Free Press). Download our File Naming Best Practices document.
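The rules above can be checked automatically before files are handed over. The following Python sketch is one possible check, not an ODW tool. It assumes title codes are 2 to 6 uppercase letters, page numbers are 2 to 4 digits, and only the listed lowercase extensions are used; none of those limits come from this FAQ, so adjust them to your project. The authority file is represented as a plain dictionary.

```python
import re
from datetime import date

# Assumed shape: TITLECODE + YYYYMMDD + separator + zero-padded page + extension.
FILE_NAME = re.compile(
    r"(?P<title>[A-Z]{2,6})"             # publication title code (length is an assumption)
    r"(?P<date>[0-9]{8})"                # issue date, YYYYMMDD
    r"[_-]"                              # separator: underscore or hyphen
    r"(?P<page>[0-9]{2,4})"              # zero-padded page number (width is an assumption)
    r"\.(?P<ext>jpg|tif|tiff|pdf|txt)"   # accepted extensions (an assumption)
)
MAX_LENGTH = 32  # file names must stay under this many characters


def parse_file_name(name, authority):
    """Return (publication title, issue date, page number) or raise ValueError."""
    if len(name) >= MAX_LENGTH:
        raise ValueError(f"{name!r} is not under {MAX_LENGTH} characters")
    match = FILE_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"{name!r} does not follow the naming pattern")
    code = match["title"]
    if code not in authority:
        raise ValueError(f"{code!r} is not listed in the authority file")
    stamp = match["date"]
    # date() raises ValueError for impossible dates such as month 13.
    issue_date = date(int(stamp[:4]), int(stamp[4:6]), int(stamp[6:]))
    return authority[code], issue_date, int(match["page"])


authority = {"EFP": "Essex Free Press"}
print(parse_file_name("EFP18870923_001.jpg", authority))
```

Running every file name in a batch through a check like this catches stray spaces, impossible dates and unlisted title codes before they cause problems during processing.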
How do I get the material into VITA?
When you have a mass of newspaper content (page images, hit highlighted content, or a digital index), we can systematically “ingest” all the parts into VITA so that each is individually identifiable but mapped to link properly in relationship to other objects (i.e. Publication record>Issue records>Pages>Index records, with Page images having their associated TXT or OCR files layered for full text searchability) and so that everything displays properly. This is a process that we conduct as part of our all-in-one service workflow, or it can be provided as a separate service. Contact us about your project ideas.
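The Publication>Issue>Pages relationship can be pictured as a nested structure built from the file names. The Python sketch below is only an illustration of that hierarchy and does not describe how VITA stores or ingests content. It assumes files named like EFP18870923_001.jpg with a matching EFP18870923_001.txt for the OCR text; build_tree and pages_missing_text are hypothetical helpers.

```python
import re
from collections import defaultdict

# Assumed naming: title code, YYYYMMDD issue date, separator, page number, extension.
NAME = re.compile(r"([A-Z]+)([0-9]{8})[_-]([0-9]+)\.(jpg|txt)")


def build_tree(file_names):
    """Return {title_code: {issue_date: {page_number: {extension: file_name}}}}."""
    tree = defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))
    for name in file_names:
        match = NAME.fullmatch(name)
        if match is None:
            continue  # ignore files that do not follow the assumed pattern
        code, issue, page, ext = match.groups()
        tree[code][issue][int(page)][ext] = name
    return tree


def pages_missing_text(tree):
    """List page images that have no TXT file layered with them."""
    missing = []
    for issues in tree.values():
        for pages in issues.values():
            for files in pages.values():
                if "jpg" in files and "txt" not in files:
                    missing.append(files["jpg"])
    return sorted(missing)


files = [
    "EFP18870923_001.jpg", "EFP18870923_001.txt",
    "EFP18870923_002.jpg",
    "EFP18870930_001.jpg", "EFP18870930_001.txt",
]
tree = build_tree(files)
print(sorted(tree["EFP"]))       # issue dates found for the EFP publication
print(pages_missing_text(tree))  # page images that would not be full text searchable
```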
How do I use VITA for my newspaper collection?
Once your material has been ingested and mapped (see How do I get the material into VITA?), we activate your VITA account and you can log in to manage or add to the newspaper content. Single organizations with just index records can go with the News Basic plan; any full-run newspaper collection needs the News Plus plan; and if you have a multimedia collection as well, the VITA plan will accommodate all your needs. We will work with you throughout the process to determine the best plan for you. See our price plan features to compare.
I don’t have the resources to manage all this myself! Can ODW help?
Absolutely. OurDigitalWorld is now offering an All-in-One Newspaper Digitization Service to help take the pressure off your organization. Once we talk with you to determine your needs, we will take it from there. ODW handles all the negotiations with vendors to achieve the best quality digitization for your archive and online display. We ingest the files from the vendor and upload all your content into the VITA News search and discovery interface, and we can “map” any of your existing index records to the full page scans. Once the newspaper digitization project is complete, you receive the digital files back and can manage and continue to add to your online collection using the VITA software. Pricing is on a per project basis, so talk to us about your project ideas: email@example.com