In a blog post published last week, Google AI introduced TensorStore, an open-source C++ and Python library designed for storing and manipulating n-dimensional data. The library aims to solve key engineering problems in scientific computing by improving the management and processing of large datasets.
Many modern computer science and machine learning (ML) applications manipulate multidimensional datasets that span a single large coordinate system. One example is weather modeling from atmospheric measurements taken over a geographic grid. Another is making medical imaging predictions from multichannel image intensity values in 2D or 3D scans.
A single dataset in these settings may require petabytes of storage, and working with such datasets is challenging because users may read and write data at varying scales and at unpredictable intervals. Google AI researchers say TensorStore is already being used to manage and process large datasets in neuroscience, including petabyte-scale 3D electron microscopy data and “4d” videos of neural activity.
In addition, the library has been used in the creation of PaLM, a large-scale machine learning model, where it addresses the problem of managing model parameters (checkpoints) during distributed training.
TensorStore also offers an asynchronous API that enables high-throughput access even to remote storage with high latency, along with a simple Python API for loading and manipulating large array data. According to the blog, no actual data is accessed or stored in memory until a specific slice (100x100 in the blog's example) is requested; arbitrarily large underlying datasets can therefore be loaded and manipulated without holding the entire dataset in memory, using indexing and manipulation syntax largely identical to standard NumPy operations.
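The idea of lazy indexing combined with asynchronous reads can be illustrated with a small, self-contained Python sketch. This is not TensorStore's API or implementation: the classes `CountingStore` and `LazyView` are hypothetical names invented here, the "remote storage" is simulated by computing values on demand, and the standard library's thread pool stands in for an asynchronous I/O layer. It only shows the principle that slicing builds a description of the request and that data is touched when a read is explicitly issued.

```python
from concurrent.futures import ThreadPoolExecutor


class CountingStore:
    """Hypothetical stand-in for slow remote storage holding a 2D array."""

    def __init__(self, rows, cols):
        self.shape = (rows, cols)
        self.cells_read = 0  # number of elements actually fetched so far

    def fetch(self, row_range, col_range):
        # Values are computed on demand, so nothing is held in memory up front.
        block = [[r * self.shape[1] + c for c in col_range] for r in row_range]
        self.cells_read += len(row_range) * len(col_range)
        return block


class LazyView:
    """Records an indexing request; touches the store only in read()."""

    def __init__(self, store, row_range=None, col_range=None):
        self.store = store
        self.row_range = range(store.shape[0]) if row_range is None else row_range
        self.col_range = range(store.shape[1]) if col_range is None else col_range

    @property
    def shape(self):
        return (len(self.row_range), len(self.col_range))

    def __getitem__(self, key):
        rows, cols = key  # expects a pair of slices, e.g. view[80:180, 200:300]
        # Slicing a range yields another range, so views compose without I/O.
        return LazyView(self.store, self.row_range[rows], self.col_range[cols])

    def read(self, executor):
        # Returns a Future so the caller can overlap other work with the fetch.
        return executor.submit(self.store.fetch, self.row_range, self.col_range)


store = CountingStore(1_000_000, 1_000_000)  # far too large to materialize
view = LazyView(store)[80:180, 200:300]      # only the request is recorded
cells_before = store.cells_read              # still zero: no data was fetched

with ThreadPoolExecutor(max_workers=1) as pool:
    future = view.read(pool)   # starts the read in the background
    block = future.result()    # waits until the 100x100 block is available
```

In this sketch only the 10,000 requested elements are ever produced, even though the nominal array has a trillion entries. The real library's calls and behavior should be taken from its own documentation.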
To get started, the TensorStore package can be installed from PyPI with pip.