Content Digitization QA – Do it the Smart Way

I had blogged some time back on whether or not Content QA is required. While the general consensus is that it is, it often proves challenging to tackle for a few core reasons:

  1. Voluminous content to be handled during digitization
  2. Lack of time to focus on testing toward the end of a release, when product functionality testing takes the front seat
  3. Complex specifications to test
  4. Compatibility to be verified across a wide range of platforms and devices

In this blog I am going to talk specifically about doing Content Digitization QA the smart way: handling as many issues as possible upfront, before content hits the application's UI. Explained in very simple terms, the content digitization process can be represented as below:


To tackle the challenges outlined above, the "Content extracted into XML" stage is a very important point at which to get a lot of testing done. Issues caught and resolved here, before content hits the application UI, save a lot of time and money in the overall testing process. Some tests can be done only after product ingestion happens (such as the overall fit and finish of the content on a specific device), but several tests, such as content format and metadata checks and their associated parameters, can be done at the pre-ingestion stage. At both the pre-ingestion and post-ingestion stages, smart approaches are needed to identify potential areas of automation, to save time and to handle voluminous XML files without missing any checks.
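To make the idea of pre-ingestion checks concrete, here is a minimal sketch in Python of the kind of automated XML validation described above: it verifies well-formedness and the presence of required metadata elements, and reports each issue with the line number at which it occurs. The element names (`title`, `author`, `isbn`) are purely hypothetical placeholders, not the actual checks used by any real tool; a production checker would be driven by domain-specific keywords and rules.

```python
import io
import xml.sax

# Hypothetical set of metadata elements every content file must carry.
REQUIRED_METADATA = {"title", "author", "isbn"}

class MetadataChecker(xml.sax.ContentHandler):
    """SAX handler that records required metadata elements and the
    line number where each appears, flagging duplicates."""

    def __init__(self):
        super().__init__()
        self.seen = {}     # element name -> first line seen
        self.issues = []   # human-readable issue strings

    def startElement(self, name, attrs):
        # self._locator is set by the SAX parser and exposes line numbers.
        line = self._locator.getLineNumber()
        if name in REQUIRED_METADATA:
            if name in self.seen:
                self.issues.append(f"line {line}: duplicate <{name}>")
            else:
                self.seen[name] = line

    def endDocument(self):
        for name in sorted(REQUIRED_METADATA - self.seen.keys()):
            self.issues.append(f"missing required element <{name}>")

def check_xml(xml_text):
    """Run pre-ingestion checks on an XML string; return a list of issues."""
    handler = MetadataChecker()
    try:
        xml.sax.parse(io.StringIO(xml_text), handler)
    except xml.sax.SAXParseException as e:
        handler.issues.append(
            f"line {e.getLineNumber()}: not well-formed ({e.getMessage()})")
    return handler.issues
```

Because a SAX parser streams the document rather than loading it whole, this style of check scales to very large XML files, which matters when the content volume is high. For example, `check_xml("<book><title>T</title></book>")` would report the missing `author` and `isbn` elements.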

We at QA InfoTech have built our own scripts, along with an application that consumes those scripts, to offer a neat UI for pre-ingestion testing. This has been customized with keywords and checks specific to the publishing and educational domains, where a lot of digitization happens, and both the application and the scripts can easily be customized for pre-ingestion testing in any other domain with minimal effort. We've built this on PHP, making it simple to use, and it reports issues down to the line number at which they occur to ease debugging. We've used these scripts for several of our publishing clients, and the expertise we've built helps us guarantee a turnaround time of 24 hours for such pre-ingestion testing from the time we have a new file to test (regardless of how voluminous the XML is).
To address test automation at the post-ingestion / application UI level, we have a test automation framework built on open source technologies, with built-in virtualization and pluggable utilities for continuous integration and reporting. For more details on this framework, listen to our webinar hosted here. Seen below is a pictorial representation of our overall Content Digitization QA process. We soon plan to do a webinar explaining our content digitization automation solution in detail. Stay tuned for more updates. In the meantime, if you have any questions or need more information, please reach out to me.


About the Author

QA InfoTech
Established in 2003 with fewer than five testing experts, QA InfoTech has grown by leaps and bounds, with three QA Centers of Excellence globally: two located in Noida, a hub of IT activity in India, and the other at our affiliate, QA InfoTech Inc., in Michigan, USA. In 2010 and 2011, QA InfoTech was ranked among the top 100 places to work for in India.