Navigating Data Challenges: Insights from a Healthcare Data Engineer

Table of Contents

  1. Introduction
  2. Simona's Role at Adoc
  3. The Challenges of Data Management at Adoc
    • Dealing with Existing Data
    • Managing New Data
  4. Storing and Accessing Data Correctly
    • Building a Data Lake
    • Providing Access to Different Users
  5. Running Algorithms at Scale
    • Optimizing Algorithm Performance
    • Balancing Accuracy and Speed
  6. The Type of Data Adoc Stores
    • DICOMs and Medical Images
    • Unstructured Data Challenges
  7. Applying Domain Knowledge to Interpret Data
  8. Technologies Used at Adoc
    • Zeppelin, Spark, and EMR
    • Future Technologies: DBT, Airflow, Iceberg
  9. Evaluating Data Lake Solutions: Delta Lake vs. Apache Iceberg
  10. The Importance of Data Auditing
    • Ensuring Data Integrity and Trust
    • Designing a Data Auditing System
  11. Conclusion

Introduction

In this article, we look at data management through the eyes of Simona Miriam, a senior data engineer at Adoc. Adoc is an AI-based healthcare startup in Israel that analyzes medical images to help physicians make better decisions. Drawing on her experience at Adoc, Simona describes the challenges she faces, the technologies she works with, and the value data brings to the company's products.

Simona's Role at Adoc

Simona introduces herself as a music and travel enthusiast with a passion for data. As a senior big data engineer at Adoc, her role revolves around building a robust data architecture that meets the company's needs. She highlights two major challenges: managing existing data and handling the continuous flow of new data. Adoc, like many startups, initially focused on building its product without considering scale or data architecture. Simona's task is to address this retroactively.

The Challenges of Data Management at Adoc

Simona explains that Adoc, like other startups, amassed a significant amount of data before establishing a solid data infrastructure. Dealing with the existing data and establishing an efficient system for new data each pose their own challenges. The data architecture needs to align with Adoc's evolving requirements while meeting the demands of different users such as data analysts, data scientists, and AI engineers.

Storing and Accessing Data Correctly

Creating a well-structured data lake and providing access to different consumers are vital aspects of data management at Adoc. Simona's role involves defining the best practices for storing and organizing data, selecting suitable tools, and ensuring different users have access to the right level of data granularity. Adoc's wide range of consumers, each with specific needs, further complicates the task.
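
One common way to organize files in a data lake is Hive-style partitioning, where directory names carry `key=value` pairs that query engines can use to skip irrelevant data. The sketch below is only an illustration of that convention: the bucket name, the choice of partition keys (modality, year, month), and the function `partition_path` are all assumptions, not a description of Adoc's actual layout.

```python
from datetime import date


def partition_path(root: str, modality: str, study_date: date, file_name: str) -> str:
    """Build a Hive-style partitioned path (key=value directories).

    The partition keys used here are hypothetical; a real layout should
    follow the filters that consumers apply most often.
    """
    parts = [
        root.rstrip("/"),
        f"modality={modality.lower()}",
        f"year={study_date.year:04d}",
        f"month={study_date.month:02d}",
        file_name,
    ]
    return "/".join(parts)


if __name__ == "__main__":
    # Hypothetical bucket and file name, for illustration only.
    print(partition_path("s3://example-bucket/lake/", "CT", date(2023, 5, 7), "part-0001.parquet"))
```

Partitioning on the columns that analysts and data scientists filter by most often lets each group read only the slice of data it needs.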

Running Algorithms at Scale

Simona elaborates on the significance of optimizing algorithm performance to achieve accurate and efficient results. Leveraging frameworks like Spark and EMR, she investigates the best ways to process and analyze data at scale. Balancing accuracy and speed is crucial for running algorithms effectively in Adoc's healthcare environment.

The Type of Data Adoc Stores

Adoc primarily deals with DICOM (Digital Imaging and Communications in Medicine) files, the standard format for medical images. Simona emphasizes what makes DICOMs unusual and difficult to work with: they are unstructured data. Unlike the structured media data common elsewhere in the industry, DICOMs require a deep understanding of image processing and metadata interpretation. Simona draws a parallel between the way doctors interpret diagnoses and the way data engineers interpret DICOMs.

Applying Domain Knowledge to Interpret Data

Simona sheds light on the necessity of incorporating domain knowledge when interpreting metadata and images. The main challenge lies in uncovering the meaning behind each attribute within the DICOMs and understanding how they need to be processed. This process involves extensive research and investigation to ensure accurate interpretation and analysis.
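
To make the idea of interpreting attributes concrete, here is a minimal sketch that turns a raw DICOM-style header into a typed record. The tag numbers are standard DICOM attributes (Modality, Study Date, Rows, Columns, Slice Thickness), but the input shape (a dictionary of tag to string), the output field names, and the function `interpret_header` are assumptions made for illustration. A real pipeline would normally read files with a DICOM library rather than hand-built dictionaries.

```python
from typing import Any, Callable, Dict, Tuple

Tag = Tuple[int, int]  # (group, element)

# Standard DICOM tags mapped to a hypothetical output field name and a converter.
TAG_FIELDS: Dict[Tag, Tuple[str, Callable[[str], Any]]] = {
    (0x0008, 0x0060): ("modality", str),              # Modality, e.g. "CT"
    (0x0008, 0x0020): ("study_date", str),            # Study Date, YYYYMMDD
    (0x0028, 0x0010): ("rows", int),                  # Rows (image height)
    (0x0028, 0x0011): ("columns", int),               # Columns (image width)
    (0x0018, 0x0050): ("slice_thickness_mm", float),  # Slice Thickness in mm
}


def interpret_header(raw: Dict[Tag, str]) -> Dict[str, Any]:
    """Convert raw tag/value pairs into a record with named, typed fields.

    Known tags are converted; values that fail conversion become None.
    Tags not in TAG_FIELDS are listed under "unknown_tags" so that they
    can be researched later instead of being silently dropped.
    """
    record: Dict[str, Any] = {name: None for name, _ in TAG_FIELDS.values()}
    unknown = []
    for tag, value in raw.items():
        if tag not in TAG_FIELDS:
            unknown.append("(%04X,%04X)" % tag)
            continue
        name, convert = TAG_FIELDS[tag]
        try:
            record[name] = convert(value.strip())
        except ValueError:
            record[name] = None
    record["unknown_tags"] = sorted(unknown)
    return record


if __name__ == "__main__":
    header = {
        (0x0008, 0x0060): "CT ",
        (0x0028, 0x0010): "512",
        (0x0018, 0x0050): "1.25",
        (0x0009, 0x0010): "vendor-specific",  # odd group number: a private tag
    }
    print(interpret_header(header))
```

Keeping a list of unrecognized tags reflects the research step described above: each attribute that is not yet understood is flagged for investigation.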

Technologies Used at Adoc

Simona shares her experience working with various technologies at Adoc. She mentions using Apache Zeppelin, a notebook tool, to explore and investigate data. The team also relies on Spark and EMR for data processing and analysis. In addition, she is curious about emerging technologies such as dbt for data transformation, Presto for querying, and Dagster for workflow management. Simona highlights the importance of benchmarking and evaluating these technologies to determine their suitability for Adoc's evolving needs.

Evaluating Data Lake Solutions: Delta Lake vs. Apache Iceberg

Simona expresses interest in comparing data lake solutions like Delta Lake and Apache Iceberg. She highlights the benefits of these technologies in terms of solving specific challenges like file compaction and managing log files. By comparing the two, Adoc aims to determine the best fit for its data architecture, considering scalability, ease of use, and overall performance.
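
File compaction means rewriting many small files as fewer large ones so that queries open fewer objects. The sketch below shows only the planning step, using a first-fit-decreasing heuristic to group small files into batches near a target size. It does not reproduce how Delta Lake or Apache Iceberg implement compaction; the function `plan_compaction` and its inputs are hypothetical.

```python
from typing import Dict, List


def plan_compaction(file_sizes: Dict[str, int], target_bytes: int) -> List[List[str]]:
    """Group small files into batches whose total size stays within target_bytes.

    Files already at or above the target are left alone. The remaining files
    are placed largest-first into the first batch that still has room
    (first-fit decreasing).
    """
    small = sorted(
        (item for item in file_sizes.items() if item[1] < target_bytes),
        key=lambda item: item[1],
        reverse=True,
    )
    batches: List[List[str]] = []
    remaining: List[int] = []  # free space left in each batch
    for path, size in small:
        for i, free in enumerate(remaining):
            if size <= free:
                batches[i].append(path)
                remaining[i] -= size
                break
        else:
            # No existing batch has room, so start a new one.
            batches.append([path])
            remaining.append(target_bytes - size)
    # A batch with a single file gains nothing from being rewritten.
    return [batch for batch in batches if len(batch) > 1]


if __name__ == "__main__":
    sizes = {"a": 60, "b": 50, "c": 40, "d": 30, "e": 200}
    print(plan_compaction(sizes, target_bytes=100))
```

Each returned batch would then be read and rewritten as a single file; table formats such as Delta Lake and Iceberg additionally record the swap in their metadata so that readers see a consistent view.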

The Importance of Data Auditing

Data auditing is a crucial aspect of data management. Simona shares her insights into the significance of data integrity and trust, especially in the context of a healthcare company like Adoc. She describes the challenges faced at Nielsen, a company she previously worked for, and how the team there implemented a robust data auditing system. Simona explains the process and design considerations, providing practical guidance to the audience.
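
The article does not detail the design of that auditing system, so the following is only a generic sketch of one common auditing check: reconciling the records that entered a pipeline stage against the records that came out. The record shape, the `id` key, and the functions `fingerprint` and `audit` are assumptions for illustration.

```python
import hashlib
from typing import Dict, Iterable, List


def fingerprint(records: Iterable[Dict[str, str]], key: str) -> Dict[str, str]:
    """Map each record's key to a SHA-256 hash of its full contents."""
    result: Dict[str, str] = {}
    for record in records:
        # Sort field names so the hash does not depend on field order.
        canonical = "|".join(f"{k}={record[k]}" for k in sorted(record))
        result[record[key]] = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return result


def audit(
    source: Iterable[Dict[str, str]],
    target: Iterable[Dict[str, str]],
    key: str,
) -> Dict[str, List[str]]:
    """Report keys that were lost, added, or altered between two stages."""
    src = fingerprint(source, key)
    dst = fingerprint(target, key)
    return {
        "missing": sorted(set(src) - set(dst)),
        "unexpected": sorted(set(dst) - set(src)),
        "changed": sorted(k for k in src.keys() & dst.keys() if src[k] != dst[k]),
    }


if __name__ == "__main__":
    stage_in = [{"id": "1", "v": "a"}, {"id": "2", "v": "b"}, {"id": "3", "v": "c"}]
    stage_out = [{"id": "1", "v": "a"}, {"id": "2", "v": "B"}, {"id": "4", "v": "d"}]
    print(audit(stage_in, stage_out, key="id"))
```

An empty report for every stage is what lets consumers trust the data; any non-empty list points to the exact records that need investigation.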

Conclusion

In conclusion, Simona Miriam's journey as a senior data engineer at Adoc offers valuable insights into the challenges of data management and the use of big data technologies in a healthcare startup. From establishing a reliable data architecture to dealing with unstructured medical imaging data, her experience shows how complex and how important data management is in a data-driven organization.
