Medical big data is the talk of the medical industry. Medically advanced countries such as the U.S. and the U.K. have already deployed healthcare systems that use medical big data to tailor care to each individual patient. The Korean government has also established a ‘Healthcare Big Data Initiative’, made up of major public institutions and experts from various fields, to leverage medical big data as a key resource in what is called the Fourth Industrial Revolution. The initiative seeks to put existing medical big data to use by proposing improvements to the legal and institutional framework for using big data in the medical field and by devising service models that meet the needs of both the public and private sectors.
With that in mind, we will discuss the concept of medical big data in the following series of articles to help us understand and explore how we can leverage the big data at hand. In this article, we start by asking the three most essential questions regarding big data.
1. What is medical big data?
The concept of big data first appeared in 2001, when the IT consulting firm META Group (now part of Gartner) characterized it in terms of the 3Vs: volume, velocity, and variety.
Big data is characterized by large volume; high velocity in the generation, delivery, and analysis of data; and a variety of data forms such as numbers, text, and images. The same applies to medical big data. The only difference is that it deals with health-related data generated in medical environments. The term also refers to the technology for deriving value from collecting, storing, and analyzing that data.
2. Medical big data, how big and how fast-growing is it?
The size of medical data has grown dramatically with the spread of medical imaging and information systems such as EMR (Electronic Medical Record), PACS (Picture Archiving and Communication System), and mobile/wearable health devices.
According to IDC, an American IT research firm, the total volume of medical data worldwide is projected to grow from 153 exabytes in 2013 to 2,314 exabytes by 2020. Here, 1 exabyte is roughly equivalent to the capacity of 8 million 128-gigabyte iPads. What’s more, with the advent of the Internet of Things (IoT), which accelerates real-time collection of personal health information, medical big data is growing at an ever faster pace.
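To put these figures in perspective, the short Python sketch below redoes the arithmetic. It assumes decimal units (1 exabyte = 10^18 bytes, 1 gigabyte = 10^9 bytes) and treats the two IDC figures as endpoints of steady compound growth. That is a simplification made here for illustration, not something the IDC projection itself states, and the variable names are illustrative.

```python
# Decimal (SI) units are assumed here: 1 EB = 10**18 bytes, 1 GB = 10**9 bytes.
EXABYTE = 10**18
GIGABYTE = 10**9

# How many 128 GB iPads would it take to hold 1 exabyte?
ipads_per_exabyte = EXABYTE // (128 * GIGABYTE)

# IDC figures quoted in the article (in exabytes).
size_2013 = 153
size_2020 = 2314
years = 2020 - 2013

# Overall growth factor and the implied compound annual growth rate,
# assuming (as a simplification) that growth is steady year over year.
growth_factor = size_2020 / size_2013
annual_growth_rate = growth_factor ** (1 / years) - 1

print(f"iPads per exabyte: {ipads_per_exabyte:,}")
print(f"Growth 2013-2020: about {growth_factor:.1f}x")
print(f"Implied annual growth: about {annual_growth_rate:.0%}")
```

Under these assumptions, one exabyte works out to a little under 8 million iPads, and the projected data volume grows roughly fifteenfold over seven years, which corresponds to annual growth of a little under 50%.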
3. Medical big data, how diverse is it?
Data can be divided into three categories (structured, semi-structured, and unstructured) depending on how rigidly its values are organized.
Structured data refers to any data with fixed values such as gender type (male/female) or age (numbers).
Semi-structured data refers to data that follows at least some fixed pattern, even though its values are not completely fixed. A typical example is an ‘annotation’: a physician’s comment added to a medical image or document, usually consisting of a set of shapes or measurement units. Whether hand-written or added digitally, annotation data does not always have fixed values, but certain patterns still exist.
Unstructured data, on the other hand, refers to data that follows no predefined rules. Medical images, for example, are unstructured data: they have no fixed or repeating patterns of numbers, variables, or terms.
If structured data is like a multiple-choice question, unstructured data is closer to an essay question. A multiple-choice question has only one correct answer, but an essay question has no fixed answer, which makes it much harder to grade. In other words, unstructured data requires more sophisticated analytical techniques than structured or semi-structured data.
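The three categories can be made concrete with a small Python sketch. All of the sample values below (the patient record, the annotation text, and the pixel bytes) are made up for illustration, and the helper functions `validate_structured` and `extract_measurements` are hypothetical names, not part of any medical software.

```python
import re

# Structured: every field has a fixed type and a fixed set of allowed values.
structured = {"gender": "female", "age": 57}

# Semi-structured: a hypothetical annotation; the wording varies freely,
# but a measurement pattern (number + unit) tends to recur.
semi_structured = "Oval nodule, right upper lobe, approx. 12.5 mm, follow up in 6 months"

# Unstructured: raw pixel intensities of a tiny made-up grayscale image.
unstructured = bytes([12, 200, 34, 90, 255, 0])


def validate_structured(record):
    """Check a record against a fixed schema, like grading a multiple-choice answer."""
    return record.get("gender") in {"male", "female"} and isinstance(record.get("age"), int)


def extract_measurements(text):
    """Pull (value, unit) pairs out of free text using the recurring pattern."""
    matches = re.findall(r"(\d+(?:\.\d+)?)\s*(mm|cm)\b", text)
    return [(float(value), unit) for value, unit in matches]


print(validate_structured(structured))        # schema check is a simple yes/no
print(extract_measurements(semi_structured))  # only the patterned part is recovered
print(len(unstructured))                      # without further analysis, only the size is known
```

The structured record can be checked with a simple rule, the annotation yields its measurement only because part of it follows a pattern, and the image bytes reveal nothing beyond their length until some form of image analysis is applied.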
It’s interesting to note that in April 2017, the Korean National Information Society Agency (NIA) declared, “the healthcare industry is one of the key potential areas for big data applications” (Big Data Monthly Vol.29, National Information Society Agency).
In the next article, we will continue the discussion on medical big data by highlighting why it’s important to first integrate the big data and the benefits that come with the integration.