A few years ago, one of our clients shipped us a hard drive — the old-fashioned way, in the post. It was full of X-ray inspection data they'd accumulated over a year. About 4 terabytes of it. Their request was simple enough: "Can you help us find which scans had defect pattern X?"
I plugged it in and opened the drive. Inside were thousands of folders. Some had TIFFs. Others had a mix of TIFFs and Excel sheets. A few used a custom binary format from an older inspection system. There were PDFs of inspection reports. The naming convention quietly changed somewhere around month seven, when a new operator joined. The technique parameters for each scan were spread across at least four different places, depending on who was running the shift.
We spent a week just figuring out what was on the drive.
This is the unglamorous reality of NDT data. And it's the reason I want to talk about a file format you've probably never heard much about — even though it might be the single most important piece of plumbing in modern radiographic and CT inspection.
That format is called HDF5.
What is HDF5? (No jargon, I promise)
HDF stands for "Hierarchical Data Format." HDF5 is the fifth version of it.
The simplest way I can describe it:
HDF5 is a file system inside a single file.
Think of a Windows folder structure — folders inside folders, with files at the bottom. Now imagine that the entire structure, with every file in it, is packed into one .h5 file you can email someone or stick on a server.
That's it. That's HDF5.
Inside that one file you can have:
- Folders (HDF5 calls them groups)
- Files (HDF5 calls them datasets)
- Tags on anything (HDF5 calls them attributes)
A dataset can be a 2D image, a 3D volume, a 1D signal, a giant table — anything numeric.
Here's what a typical NDT inspection file looks like inside:
inspection_2026-04-07.h5
├── /raw_images/
│   ├── frame_001      ← 2048 × 2048 pixel image
│   ├── frame_002      ← 2048 × 2048 pixel image
│   └── ...
├── /metadata/
│   ├── operator       ← "John Smith"
│   ├── technique      ← {kV: 160, mA: 5, exposure_ms: 2000}
│   └── timestamp      ← 2026-04-07T14:30:00Z
├── /calibration/
│   └── flat_field     ← Reference image
└── /report/
    └── findings       ← Defect coordinates and severity
Everything about that inspection — the images, the technique, the operator, the calibration used, the findings — is in one file. Open it in five years on a different operating system with a different software stack, and it's still there, still readable, still self-explanatory.
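To make that concrete, here is a minimal sketch of how such a file could be written with h5py, the standard Python library for HDF5. The group names mirror the tree above; the pixel data, dtypes, and values are placeholders, not a fixed schema.

```python
import h5py
import numpy as np

# A minimal sketch, not a fixed schema: group names mirror the tree above,
# pixel data and parameter values are placeholders.
with h5py.File("inspection_2026-04-07.h5", "w") as f:
    raw = f.create_group("raw_images")
    raw.create_dataset("frame_001", data=np.zeros((2048, 2048), dtype=np.uint16))
    raw.create_dataset("frame_002", data=np.zeros((2048, 2048), dtype=np.uint16))

    meta = f.create_group("metadata")
    meta.attrs["operator"] = "John Smith"
    meta.attrs["timestamp"] = "2026-04-07T14:30:00Z"
    tech = meta.create_group("technique")
    tech.attrs["kV"] = 160
    tech.attrs["mA"] = 5
    tech.attrs["exposure_ms"] = 2000

    # Intermediate groups are created automatically from the path.
    f.create_dataset("calibration/flat_field",
                     data=np.ones((2048, 2048), dtype=np.float32))

    # Each finding row: x, y, severity (illustrative layout).
    f.create_dataset("report/findings",
                     data=np.array([[512.0, 1024.0, 3.0]]))
```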
"Wait, doesn't TIFF already do this?"
I get this question a lot. TIFF is great. We use TIFFs all the time. But TIFFs have a problem in NDT: they're an image format, not a data format.
If you have a 3D CT scan of an aerospace bracket — say, 2,000 slices, each 4K × 4K — that's not really an "image." It's a volume. Storing it as 2,000 separate TIFFs means:
- Anyone reading them has to figure out the slice order on their own
- Metadata about the scan (kV, mA, geometry, source-detector distance) lives somewhere else, usually in an XML or text file alongside
- One reorganization, one filename change, one missing sidecar file — and your data is essentially garbage
HDF5 keeps the volume as a single 3D dataset. The metadata sits beside it as attributes on the same dataset. Lose your filename convention? Doesn't matter — the file describes itself.
This is the part most people miss. HDF5 is self-describing. The file knows what's in it, what units it's in, what the geometry was, and what the operator was doing. You don't need a manual to read it five years later.
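You can see that self-description directly. The short sketch below, again using h5py, opens a file it knows nothing about and simply asks it what's inside (the filename is the illustrative one from above).

```python
import h5py

# Sketch: open an arbitrary .h5 file and let it describe itself.
# No prior knowledge of the layout is needed.
def describe(name, obj):
    kind = "group" if isinstance(obj, h5py.Group) else f"dataset {obj.shape} {obj.dtype}"
    print(f"{name}: {kind}")
    for key, value in obj.attrs.items():
        print(f"    attr {key} = {value}")

with h5py.File("inspection_2026-04-07.h5", "r") as f:
    f.visititems(describe)
```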
Why this matters for NDT specifically
Let me give you four real reasons from our work at Mind Fox — building software around detectors, X-ray sources, and inspection systems for clients across the world.
1. Single-file truth
A typical industrial CT scan generates: raw projections, a reconstructed volume, calibration reference frames, geometry parameters, technique log, and a defect report. In a traditional setup, those live in separate folders or formats. We've watched clients lose hours trying to match a reconstruction to its raw projections because someone renamed a folder.
In HDF5, all of that is one file. Move it, copy it, archive it — it stays together.
2. Massive datasets without the pain
Modern flat-panel detectors push out 16-megapixel images at 30+ frames per second. A two-minute weld inspection produces 5–10 GB of raw data. HDF5 was designed by the scientific computing community precisely for this — it can handle terabyte-scale datasets, lets you read just the slice you need without loading the whole file into memory, and supports built-in compression (lossless gzip, or specialized codecs).
When we built the data layer for a cargo inspection system, switching from TIFF stacks to HDF5 cut storage by roughly 60% and reduced load times for individual frames from seconds to milliseconds.
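For a feel of how that works in code, here is a hedged sketch using h5py: one chunked, gzip-compressed 3D dataset, from which a single slice can be read without loading the rest. The volume shape, chunk shape, and compression level are illustrative starting points, not tuned recommendations.

```python
import h5py
import numpy as np

# Sketch: store a CT volume as a single chunked, compressed 3D dataset,
# then read one slice without touching the rest of the file.
with h5py.File("ct_scan.h5", "w") as f:
    vol = f.create_dataset(
        "reconstruction",
        shape=(2000, 4096, 4096),
        dtype=np.uint16,
        chunks=(1, 512, 512),       # each slice split into tiles
        compression="gzip",
        compression_opts=4,
    )
    # A real pipeline would stream slices in as they're reconstructed.
    vol[0] = np.zeros((4096, 4096), dtype=np.uint16)

with h5py.File("ct_scan.h5", "r") as f:
    slice_500 = f["reconstruction"][500]   # loads just this slice, not the volume
```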
3. AI/ML readiness
Every NDT vendor is now talking about AI-assisted defect recognition. Training those models requires huge labeled datasets. The data has to be in a format that ML pipelines can consume efficiently — and HDF5 is one of the few formats with first-class support in TensorFlow, PyTorch, NumPy, and basically every major scientific tool.
If your inspection data lives in TIFFs and spreadsheets, you're going to spend the first three months of any AI project just cleaning it up. If it lives in HDF5, you can start training in week one.
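As a rough illustration of "start training in week one", here is a minimal PyTorch Dataset that reads frames straight out of an HDF5 file. The /raw_images and /labels layout is an assumption made for this example, not a standard.

```python
import h5py
import torch
from torch.utils.data import Dataset, DataLoader

# Sketch: feed HDF5-stored frames into a PyTorch training loop.
# The /raw_images and /labels layout is assumed for illustration.
class InspectionDataset(Dataset):
    def __init__(self, path):
        self.path = path
        with h5py.File(path, "r") as f:
            self.keys = sorted(f["raw_images"].keys())

    def __len__(self):
        return len(self.keys)

    def __getitem__(self, idx):
        # Open per access so the dataset works with multi-worker DataLoaders.
        with h5py.File(self.path, "r") as f:
            frame = f["raw_images"][self.keys[idx]][()]
            label = f["labels"][self.keys[idx]][()]
        return torch.from_numpy(frame.astype("float32")), torch.tensor(label)

loader = DataLoader(InspectionDataset("inspection_2026-04-07.h5"), batch_size=8)
```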
4. Vendor lock-in escape
I've seen too many shops trapped in proprietary inspection software because their decade of data only opens in that one tool. License costs go up every year, but they can't switch.
HDF5 is an open format, maintained by The HDF Group, with libraries in every language you've heard of. Your data is portable. If you want to switch software, you switch software — your data comes with you.
Where HDF5 fits with DICONDE and other NDT formats
Quick clarification because I get this question constantly:
DICONDE (Digital Imaging and Communication in Non-Destructive Evaluation) is the NDT-specific extension of DICOM, the medical imaging standard. It's good for individual images and exam-level metadata, and many regulators expect it.
HDF5 is more general. It's not "instead of" DICONDE — it's often "alongside." DICONDE for the inspection record that needs to satisfy compliance. HDF5 for the rich underlying dataset that engineers and AI models actually work with.
In practice, modern systems we build store the master data in HDF5 and export DICONDE views on demand for archival or regulatory submission. Best of both worlds.
Where HDF5 falls short
It's not perfect. A few honest caveats from someone who's debugged it at 2 a.m.:
- It has a learning curve. Writing efficient HDF5 takes thought — chunk sizes, compression filters, attribute conventions. Sloppy writes lead to bloated files.
- No built-in indexing. It's a file format, not a database. If you need to query "find all scans where defect probability > 0.8" across 10,000 files, you'll need a separate index layer (a minimal sketch of one appears below).
- Concurrent writes are tricky. Multiple processes writing to the same HDF5 file simultaneously is possible but requires care (SWMR mode, file locking).
None of these are dealbreakers — but they're real, and you should plan for them.
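On the indexing point, the usual workaround is a thin index layer you build yourself. Here's a hedged sketch: walk the archive once, pull a few attributes out of each file into SQLite, and query the index instead of opening 10,000 files. The defect_probability attribute name is made up for the example.

```python
import glob
import sqlite3
import h5py

# Sketch of a separate index layer: scan every .h5 file once, copy a few
# attributes into SQLite, then query the index instead of the files.
con = sqlite3.connect("scan_index.db")
con.execute(
    "CREATE TABLE IF NOT EXISTS scans (path TEXT, operator TEXT, defect_probability REAL)"
)

for path in glob.glob("archive/**/*.h5", recursive=True):
    with h5py.File(path, "r") as f:
        meta = f["metadata"].attrs
        con.execute(
            "INSERT INTO scans VALUES (?, ?, ?)",
            (path, str(meta.get("operator", "")),
             float(meta.get("defect_probability", 0.0))),  # illustrative attribute
        )
con.commit()

hits = con.execute("SELECT path FROM scans WHERE defect_probability > 0.8").fetchall()
```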
One piece of advice
If you're starting a new NDT software project today, plan your HDF5 schema before you write your first line of UI code.
The data model is the part that's hard to change later. Everything else — the GUI, the algorithms, the reports — can be rewritten. The file format you commit to will outlast every UI redesign.
The 30-second takeaway
If you remember nothing else from this article:
- HDF5 is a file format that holds folders, files, and tags inside a single .h5 file.
- It's self-describing — your data won't go stale when people leave or formats change.
- It handles huge datasets efficiently and integrates with every modern AI/ML tool.
- For NDT, it solves the "where did the metadata go?" problem most labs live with daily.
It's not flashy. There's no marketing department for HDF5. But it's quietly the foundation under most serious scientific data infrastructure on the planet — and NDT is finally catching on.
If you're building or buying inspection software in 2026 and HDF5 isn't part of the conversation, it should be.