The New York Times has partnered with Google Cloud to digitize its massive photo collection. The project uses a variety of Google Cloud Platform technologies to store the images safely, give users a better search experience, and surface information that was previously locked away on the backs of pictures.
For more than a century, The Times has kept five to seven million of its historic photographs in hundreds of file cabinets three levels below street level, near its headquarters in Times Square, in a collection known as the “morgue.” Many of the pictures have sat in folders, unused, for years. A card catalog gives a general idea of what the archive contains, but the photos themselves hold a great deal of information that has never been captured in indexed form.
Consuming visual history
The morgue contains photos from as far back as the late 19th century, and many of its contents have tremendous historical value—some of which are not stored anywhere else in the world.
The imagery itself is not the only important information. The date and location of a photo’s capture are frequently written on its reverse. Staff members in both the photo department and on the business side have explored options for digitizing the morgue’s images for years, but even as recently as last year, a digital archive looked unattainable.
To preserve this priceless history, and to give The Times the ability to enhance its reporting with even more visual storytelling and historical context, The Times is digitizing its archive, using Cloud Storage to store high-resolution scans of all of the images in the morgue.
Cloud Storage is our durable system for storing objects, and it provides customers like The Times with automatic life-cycle management, storage in geographically distinct regions, and an easy-to-use management interface and API.
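The automatic lifecycle management mentioned above is driven by a small JSON policy attached to a bucket. As a sketch, here is a policy in the JSON shape accepted by `gsutil lifecycle set`; the 365-day threshold and the COLDLINE target class are illustrative choices, not values from this project.

```python
import json

# A lifecycle policy in the JSON shape accepted by `gsutil lifecycle set`.
# After 365 days, objects are moved to the cheaper COLDLINE storage class.
# These specific values are illustrative, not taken from The Times's setup.
lifecycle_policy = {
    "rule": [
        {
            "action": {"type": "SetStorageClass", "storageClass": "COLDLINE"},
            "condition": {"age": 365},
        }
    ]
}

print(json.dumps(lifecycle_policy, indent=2))
```

Saved to a file, such a policy would be applied with `gsutil lifecycle set policy.json gs://your-bucket`.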
Creating an asset management system
Building a system that photo editors can use with ease takes more than storing high-resolution photographs. For an asset management system to be effective, users must be able to browse and search the photos easily. The Times created a processing pipeline that stores and organizes the images, and uses cloud technology to identify text, handwriting, and other details found in them.
Once an image is ingested into Cloud Storage, The Times uses Cloud Pub/Sub to kick off the processing pipeline to accomplish several tasks. Images are resized through services running on Google Kubernetes Engine (GKE) and the image’s metadata is stored in a PostgreSQL database running on Cloud SQL, Google’s fully-managed database offering.
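The fan-out pattern described above can be sketched with a minimal in-process message bus. This is a local stand-in for Cloud Pub/Sub, not the real client library; the topic name, message fields, and handler names are hypothetical.

```python
from typing import Callable, Dict, List

# Minimal in-process stand-in for the Pub/Sub fan-out described above.
# The real pipeline publishes to a Cloud Pub/Sub topic, with resizing
# services on GKE and metadata storage in Cloud SQL as subscribers.

class LocalBus:
    def __init__(self) -> None:
        self.subscribers: Dict[str, List[Callable]] = {}

    def subscribe(self, topic: str, handler: Callable) -> None:
        self.subscribers.setdefault(topic, []).append(handler)

    def publish(self, topic: str, message: dict) -> None:
        # Deliver the message to every subscriber of the topic.
        for handler in self.subscribers.get(topic, []):
            handler(message)

processed = []

def resize_image(msg: dict) -> None:
    processed.append(("resized", msg["gcs_path"]))

def store_metadata(msg: dict) -> None:
    processed.append(("metadata_stored", msg["gcs_path"]))

bus = LocalBus()
bus.subscribe("image-ingested", resize_image)
bus.subscribe("image-ingested", store_metadata)

# Ingest event: a new scan has landed in Cloud Storage.
bus.publish("image-ingested", {"gcs_path": "gs://archive/morgue/scan-0001.tif"})
```

One publish triggers both downstream tasks, which is why no bespoke API between the services is needed.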
Cloud Pub/Sub allowed The New York Times to build its processing pipeline without creating intricate APIs or business process systems. Because it is a fully managed service, no time is spent on infrastructure upkeep.
To resize the images and modify image metadata, The Times uses ImageMagick and ExifTool, two open-source command-line programs. To run them on GKE in a horizontally scalable way with minimal administrative effort, the tools were wrapped in Go services and packaged into Docker images. Adding capacity to process more images is trivial, and The Times can stop or start its Kubernetes cluster when the service is not needed. The images themselves are stored in Cloud Storage multi-region buckets for availability in multiple locations.
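As a sketch of what those wrappers do, the snippet below builds (but does not execute) command lines for the two tools. The file names and the 2048-pixel bound are illustrative; the actual services are written in Go, as noted above.

```python
from typing import List

# Build (but do not run) command lines for the two open-source tools
# named above. Paths and sizes here are hypothetical examples.

def resize_command(src: str, dest: str, max_px: int = 2048) -> List[str]:
    # ImageMagick's `convert`: the trailing ">" means "only shrink,
    # never enlarge", preserving the aspect ratio.
    return ["convert", src, "-resize", f"{max_px}x{max_px}>", dest]

def exif_command(src: str) -> List[str]:
    # ExifTool with -json dumps all metadata tags as a JSON document.
    return ["exiftool", "-json", src]

cmd = resize_command("scan-0001.tif", "scan-0001-web.jpg")
```

A wrapper service would hand such a list to the operating system (e.g. via `subprocess.run`) and publish the result downstream.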
The last component of the archive is tracking photos and their metadata as they flow through The Times’s systems, and Cloud SQL is an excellent fit. As a fully managed service, Cloud SQL gives developers a standard PostgreSQL instance without the need to install updates, apply security patches, or configure intricate replication setups, so engineers can use a typical SQL solution with ease.
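To make the idea concrete, here is a local sketch of the kind of metadata table such a pipeline might keep. It uses SQLite as an in-memory stand-in for PostgreSQL, and the column names are assumptions for illustration; The Times’s actual schema is not described here.

```python
import sqlite3

# Local stand-in for the Cloud SQL (PostgreSQL) metadata store.
# Column names are illustrative assumptions, not the real schema.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE photos (
        id        INTEGER PRIMARY KEY,
        gcs_path  TEXT NOT NULL,   -- where the scan lives in Cloud Storage
        caption   TEXT,
        back_text TEXT             -- OCR of the photo's reverse side
    )
""")
conn.execute(
    "INSERT INTO photos (gcs_path, caption) VALUES (?, ?)",
    ("gs://archive/morgue/scan-0001.tif", "Times Square, undated"),
)
row = conn.execute("SELECT caption FROM photos WHERE id = 1").fetchone()
```

On Cloud SQL the same DDL and queries would run against PostgreSQL with no change to the application logic.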
Machine learning for additional insights
The story is not complete once the photographs are stored. An archive like The Times’s morgue becomes far more usable and accessible when additional GCP capabilities are brought to bear. One of The Times’s biggest hurdles in scanning its photo library has been capturing information about the content of the photographs themselves. The Cloud Vision API can help close that gap.
Often it is not clear from the front of a photo what it contains, while the back holds a wealth of useful information, and the Cloud Vision API can help process, store, and read it. The digital text transcription isn’t perfect, but it is faster and more cost-effective than the alternatives for processing millions of images.
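Reading the text on the back of a photo corresponds to a Cloud Vision `images:annotate` request with the `DOCUMENT_TEXT_DETECTION` feature, which handles dense printed text and handwriting. The sketch below only builds the REST request body; the image bytes are a placeholder, and a real call would POST this JSON to `https://vision.googleapis.com/v1/images:annotate` with valid credentials.

```python
import base64

# Build the request body for a Cloud Vision `images:annotate` call asking
# for DOCUMENT_TEXT_DETECTION (dense text / handwriting OCR).
# The image bytes below are a placeholder, not a real scan.
fake_image_bytes = b"\x89PNG..."

request_body = {
    "requests": [
        {
            # Image content is sent base64-encoded in the JSON body.
            "image": {"content": base64.b64encode(fake_image_bytes).decode("ascii")},
            "features": [{"type": "DOCUMENT_TEXT_DETECTION"}],
        }
    ]
}
```

The response includes a `fullTextAnnotation` whose `text` field holds the transcription, which can then be stored alongside the photo’s other metadata.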
Box: Bringing image recognition and OCR to cloud content management
Images such as marketing assets, product shots, and completed forms photographed on a mobile device are pertinent to company processes and hold a wealth of important data. Despite the richness of value in these files, the methods businesses use to identify, classify, and tag images remain mostly manual.
Personal services like Google Photos, on the other hand, now offer far more than image storage. They intelligently group pictures, making them simpler to find, and when users search for particular keywords, they automatically recognize relevant images and return a list of matching photos. As we looked at this technology, we thought, “Why can’t we bring it to the enterprise?”
The Google Cloud Vision API was simple to use. It takes an image file as input, examines its content, extracts any written text, and returns labels and recognized characters as part of a JSON response. Cloud Vision categorizes the image based on visually similar images, then performs the content analysis specified in the developer’s request and returns the results along with a confidence score for each analysis.
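The snippet below shows how an application might read those labels and confidence scores out of the JSON response. The sample response is hand-written to match the documented structure, not output from a real call, and the 0.9 confidence cutoff is an arbitrary illustrative threshold.

```python
# A hand-written sample matching the documented shape of a Cloud Vision
# label-detection response (labelAnnotations with description and score).
sample_response = {
    "responses": [
        {
            "labelAnnotations": [
                {"description": "Photograph", "score": 0.97},
                {"description": "Snapshot", "score": 0.88},
            ]
        }
    ]
}

# Keep only labels above an (illustrative) confidence threshold of 0.9.
labels = [
    (ann["description"], ann["score"])
    for ann in sample_response["responses"][0]["labelAnnotations"]
    if ann["score"] >= 0.9
]
```

An asset management system would store the surviving labels as searchable metadata tags on the image.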
To access Google Cloud Vision you need credentials, either an API key or a service account. Whichever you use, keep them secret so that nobody else can access your account and run up charges: control access to your credentials, encrypt them at rest, and rotate them regularly. Never put your credentials directly in your code or source tree.
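One common way to keep the key out of the source tree is to pass its file path through the environment. `GOOGLE_APPLICATION_CREDENTIALS` is the variable the Google client libraries look for by default; the helper below is a minimal sketch of that pattern.

```python
import os

# Read the service-account key path from the environment rather than
# hard-coding it in the source tree. The Google client libraries read
# GOOGLE_APPLICATION_CREDENTIALS automatically when it is set.
def credentials_path() -> str:
    path = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS", "")
    if not path:
        raise RuntimeError(
            "Set GOOGLE_APPLICATION_CREDENTIALS to the path of your key file"
        )
    return path
```

With the variable set, most Google client libraries pick up the key without any explicit call to this helper; it simply makes the failure mode explicit.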
The ability to automatically classify and label images provides dozens of powerful use cases for Box customers. In beta version, it’s working with companies across a number of industries:
- A retail customer is using image recognition in Box to streamline digital asset management for product photographs. Automatic object detection and metadata labeling let them eliminate the manual tagging and organizing of crucial photos that their multi-channel workflows depend on.
- A large media organization is using image recognition in Box to automatically tag images from freelance photographers around the world. Previously they could not preview or tag every image; now they can automatically evaluate more photographs than ever before and find fresh ways to extract value from that content.
- A global real estate company is using optical character recognition in Box to digitize workflows for paper-based leases and agreements, letting its staff identify sensitive assets more rapidly and skip a laborious tagging process.
The morgue is a treasure trove of perishable documents, a priceless chronicle not just of The Times’s history but of more than a century of global events that have shaped our modern world. Digitizing it preserves that history while opening new opportunities for visual storytelling and historical context in The Times’s reporting.