Why AI Training Needs to Rethink Storage: The Starting Point and Technical Path of AskAIs Mini SSD

When people think of AI infrastructure, the first things that come to mind are usually GPUs, compute chips and model parameters. Storage tends to become the focus only when capacity runs out, training is interrupted, or GPUs start waiting on data.

But the speed of an AI system has never depended on compute alone. Model training requires continuously reading the dataset, the data pipeline must feed samples to the accelerators, and the system must periodically write checkpoints to ensure recovery after a failure.

If storage cannot complete this work with sufficiently stable throughput and latency, even the most expensive GPU may be forced to wait.

This is exactly the starting point for Stellar AGI Labs' development of AskAIs Mini SSD.

As the company's first public AI hardware product, AskAIs Mini SSD uses the M.2 NVMe 2280 form factor, with planned capacities of 128GB, 256GB, 512GB, 1TB, 2TB and 4TB. It is officially positioned as high-capacity, high-throughput storage for LLM training, and the company says development is complete and the product is expected to launch in Hong Kong at the end of the year.

That said, "AI SSD" is not a product category that a name alone can establish. M.2 describes the form factor, NVMe describes the protocol, and capacity describes how much data can be stored. Whether a drive actually suits AI training also depends on the controller, NAND, firmware, cache, sustained performance, latency, write endurance, thermals, power and compatibility.

Where Do AI Training's Storage Problems Occur?

Training a large model is not about loading the entire dataset into the GPU at once and waiting for compute to finish. Data flows continuously among storage, CPU memory, GPU memory and different nodes.

The first problem is how to keep feeding training data to the GPU.

Training usually splits the dataset into many samples or shards, which a data loader reads, decodes and preprocesses, then forms into batches sent to the GPU.

If storage cannot supply data fast enough, the data loader cannot prepare the next batch in time. After the GPU finishes the current computation, with no new data to process, it waits.

MLCommons' MLPerf Storage benchmark measures exactly this: how fast a storage system can supply training data while keeping accelerators above a required utilization threshold, typically 90%. It shows that the value of AI storage lies not in posting impressive MB/s figures, but in keeping expensive compute resources working.
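The waiting described above can be measured directly. The following is a minimal sketch, not the MLPerf Storage methodology: it times how long a training loop blocks on its data source versus how long it spends in the compute step, and reports the resulting utilization. The names `measure_utilization` and `meets_threshold` are hypothetical, and the loop assumes `step` returns only when the work is finished (with a real GPU that means synchronizing, for example with `torch.cuda.synchronize()`, before the step returns).

```python
import time
from typing import Callable, Iterable, Tuple


def measure_utilization(batches: Iterable, step: Callable[[object], None]) -> Tuple[float, float, float]:
    """Return (wait_seconds, compute_seconds, utilization) for one pass over `batches`."""
    wait = 0.0
    compute = 0.0
    it = iter(batches)
    while True:
        t0 = time.perf_counter()
        try:
            batch = next(it)  # blocks while storage and preprocessing catch up
        except StopIteration:
            break
        t1 = time.perf_counter()
        step(batch)           # stand-in for the forward/backward pass
        t2 = time.perf_counter()
        wait += t1 - t0
        compute += t2 - t1
    total = wait + compute
    utilization = compute / total if total > 0 else 0.0
    return wait, compute, utilization


def meets_threshold(utilization: float, threshold: float = 0.90) -> bool:
    """True if the accelerator stayed busy at least `threshold` of the time."""
    return utilization >= threshold


if __name__ == "__main__":
    def slow_loader():
        # Hypothetical loader that takes as long to produce a batch as the step takes to run.
        for i in range(5):
            time.sleep(0.02)
            yield i

    w, c, u = measure_utilization(slow_loader(), lambda b: time.sleep(0.02))
    print(f"wait={w:.3f}s compute={c:.3f}s utilization={u:.0%} ok={meets_threshold(u)}")
```

Any iterable can be passed as `batches`, so the same harness can wrap a framework data loader; only the wait time attributable to `next()` is counted against storage and preprocessing.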

Checkpoints Are Also a Massive Storage Job

Model training may last hours, days or longer. To avoid losing all progress to a hardware failure or software error, the system periodically saves checkpoints.

A checkpoint contains not only model weights but possibly optimizer state, learning rate, random state and distributed-training information. The larger the model, the larger the checkpoint.

MLPerf Storage's public data shows its Llama 3 checkpoint workload ranges from about 105GB for an 8B model to about 18TB for a 1T model.

This does not mean a single M.2 SSD must hold an 18TB checkpoint on its own. It does show that, as models grow, saving and restoring training progress becomes a significant storage-engineering problem in its own right.
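A rough way to see why checkpoints grow with the model is to multiply the parameter count by the bytes stored per parameter. The sketch below assumes a common mixed-precision Adam layout (2-byte weights, a 4-byte master copy and two 4-byte optimizer moments, 14 bytes per parameter in total). That layout is an assumption for illustration, not how MLPerf Storage defines its workloads, so the results only land in the same range as the figures quoted above.

```python
def checkpoint_bytes(num_params: float,
                     weight_bytes: int = 2,      # assumed: bf16/fp16 weights
                     master_bytes: int = 4,      # assumed: fp32 master copy
                     optimizer_bytes: int = 8    # assumed: two fp32 Adam moments
                     ) -> float:
    """Estimate checkpoint size in bytes for a model with `num_params` parameters."""
    return num_params * (weight_bytes + master_bytes + optimizer_bytes)


if __name__ == "__main__":
    for name, params in [("8B", 8e9), ("1T", 1e12)]:
        print(f"{name}: about {checkpoint_bytes(params) / 1e12:.2f} TB")
```

Under these assumptions an 8B model yields about 112GB and a 1T model about 14TB. Real checkpoints differ because frameworks store additional state and may use different precisions.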

When writing checkpoints, the system needs not only instantaneous peak speed but stable sustained writes. If an SSD's speed drops sharply after its high-speed cache is exhausted, the training node may wait a long time for the save to complete.
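The effect of cache exhaustion on checkpoint time can be estimated with simple arithmetic. The sketch below uses a two-speed model: data that fits in the cache is written at the burst rate and the remainder at the sustained rate. The function name and all figures are hypothetical and are not AskAIs Mini SSD specifications.

```python
def checkpoint_write_seconds(size_gb: float, cache_gb: float,
                             burst_gb_per_s: float, sustained_gb_per_s: float) -> float:
    """Estimate write time when the first `cache_gb` go fast and the rest go slow."""
    fast_part = min(size_gb, cache_gb)
    slow_part = max(0.0, size_gb - cache_gb)
    return fast_part / burst_gb_per_s + slow_part / sustained_gb_per_s


if __name__ == "__main__":
    # Hypothetical drive: 50GB cache, 5 GB/s burst, 1 GB/s after the cache is full.
    size_gb = 105.0
    realistic = checkpoint_write_seconds(size_gb, 50.0, 5.0, 1.0)
    naive = size_gb / 5.0  # what the peak figure alone would suggest
    print(f"peak-only estimate: {naive:.0f}s, two-speed estimate: {realistic:.0f}s")
```

With these illustrative numbers the peak figure suggests 21 seconds, while the two-speed model gives 65 seconds, roughly three times longer.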

The Data Path Between Storage and GPU

The traditional data path usually reads data from storage into CPU system memory first, then copies it to GPU memory. This consumes CPU, memory bandwidth and PCIe resources.

NVIDIA GPUDirect Storage provides a direct data path between storage and GPU memory that avoids a bounce buffer in CPU memory, which can lower CPU load and latency and relieve system-bandwidth bottlenecks.

But buying an NVMe SSD does not automatically mean GPUDirect Storage support. It also requires a compatible GPU, driver, CUDA, file system, system topology and application.

The SSD is part of the whole AI data pipeline, not the single component that alone determines end-to-end performance.

What Does M.2 NVMe 2280 Mean?

M.2 2280 means a module about 22mm wide and 80mm long, a compact SSD form factor common in desktops, workstations and some servers.

NVMe is a protocol designed for non-volatile storage. NVMe devices usually connect to the host over PCIe, and the protocol supports higher parallelism and lower software overhead than older interfaces such as SATA/AHCI.

But two SSDs that are both M.2 NVMe 2280 can perform entirely differently.

An SSD usually consists of a controller, a DRAM or host-memory buffer, NAND, a PCB and firmware. The controller handles the Flash Translation Layer, garbage collection, wear levelling and data scheduling; NAND, cache, back-end channels, power and thermals all jointly affect performance.

Therefore, when AskAIs Mini SSD officially launches, it needs to further disclose the PCIe generation, controller platform, NAND type, whether it has dedicated DRAM, and the full performance of each capacity.
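The PCIe generation matters because it sets a hard ceiling on what any M.2 drive can deliver. The sketch below computes the raw link bandwidth from the published per-lane transfer rates (8, 16 and 32 GT/s for PCIe 3.0, 4.0 and 5.0) and their 128b/130b encoding, assuming the four lanes that an M.2 NVMe slot typically provides. Packet and protocol overhead are ignored, so real drives land below these figures.

```python
# Per-lane transfer rate in GT/s for each PCIe generation.
GT_PER_S = {3: 8.0, 4: 16.0, 5: 32.0}
ENCODING_EFFICIENCY = 128 / 130  # 128b/130b line encoding used by PCIe 3.0 and later


def link_ceiling_gb_per_s(generation: int, lanes: int = 4) -> float:
    """Raw link bandwidth in GB/s, before packet and protocol overhead."""
    bits_per_second_per_lane = GT_PER_S[generation] * ENCODING_EFFICIENCY
    return bits_per_second_per_lane / 8 * lanes


if __name__ == "__main__":
    for gen in sorted(GT_PER_S):
        print(f"PCIe {gen}.0 x4: about {link_ceiling_gb_per_s(gen):.1f} GB/s")
```

The ceilings come out at roughly 3.9, 7.9 and 15.8 GB/s, which is why a sequential-read claim only means something once the PCIe generation is known.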

Peak Speed Does Not Equal AI Training Speed

Many SSDs temporarily operate part of their NAND as a high-speed write cache, commonly called an SLC cache. Short writes therefore benchmark very well, but once the cache is exhausted, speed may fall back to the native NAND level.

An ordinary file copy may finish before the cache runs out, but checkpoints, dataset preparation and long training logs may continuously write hundreds of GB.

So AI training needs to focus more on sustained write speed beyond the cache, and on performance after the drive is 50% or 80% full.

A formal review of AskAIs Mini SSD should disclose the empty-drive peak, cache size, post-cache speed, long-duration steady-state performance and the time write speed takes to recover, all together.
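The drop after cache exhaustion can be observed by writing continuously and recording throughput per window rather than one overall average. The following is a minimal probe under stated assumptions, not a replacement for a dedicated tool such as fio: it writes incompressible data, calls `os.fsync` after each window so the operating system's page cache does not hide the drive's behaviour, and returns one MB/s figure per window. The function name and default sizes are hypothetical, and a meaningful run must write far more than the drive's cache.

```python
import os
import tempfile
import time
from typing import List


def sustained_write_profile(path: str, total_bytes: int,
                            chunk_bytes: int = 64 * 1024 * 1024,
                            window_chunks: int = 16) -> List[float]:
    """Write `total_bytes` to `path`; return throughput in MB/s for each window."""
    buf = os.urandom(chunk_bytes)  # random data, so controller compression cannot flatter results
    rates: List[float] = []
    written = 0
    with open(path, "wb") as f:
        while written < total_bytes:
            window_bytes = 0
            t0 = time.perf_counter()
            for _ in range(window_chunks):
                if written >= total_bytes:
                    break
                n = min(chunk_bytes, total_bytes - written)
                f.write(buf[:n])
                written += n
                window_bytes += n
            f.flush()
            os.fsync(f.fileno())  # force the window onto the device before stopping the clock
            elapsed = max(time.perf_counter() - t0, 1e-9)
            rates.append(window_bytes / elapsed / 1e6)
    return rates


if __name__ == "__main__":
    # Tiny demonstration run; it is far too small to exhaust any real cache.
    with tempfile.TemporaryDirectory() as d:
        profile = sustained_write_profile(os.path.join(d, "probe.bin"),
                                          total_bytes=32 * 1024 * 1024,
                                          chunk_bytes=4 * 1024 * 1024,
                                          window_chunks=2)
        print([round(r) for r in profile])
```

Comparing the first few windows with the last few gives the peak and post-cache figures; repeating the run on a drive that is already 50% or 80% full shows how fill level changes both.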

Endurance Matters More Than Short Benchmarks

NAND cells endure only a limited number of program/erase cycles. SSDs usually reserve some raw capacity (over-provisioning) to replace ageing cells and improve reliability.

SSD endurance is generally expressed as TBW (terabytes written) or DWPD (drive writes per day). If a product is positioned for long-duration AI training, its TBW, warranty period and workload conditions are more convincing than a phrase such as "sustained and stable".

Each capacity carries a different amount of NAND over which writes are spread, so the 128GB, 1TB and 4TB versions should each disclose their own TBW rather than sharing one figure for the whole range.
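TBW and DWPD are two views of the same rating, linked by capacity and warranty period: DWPD = TBW / (capacity in TB × 365 × warranty years). The sketch below applies that conversion and estimates how long a rating lasts under a checkpoint-heavy workload. The 600 TBW, 5-year and hourly-checkpoint figures are hypothetical examples, not AskAIs Mini SSD specifications.

```python
def dwpd(tbw: float, capacity_tb: float, warranty_years: float) -> float:
    """Drive writes per day implied by a TBW rating over the warranty period."""
    return tbw / (capacity_tb * 365 * warranty_years)


def days_to_reach_tbw(tbw: float, written_tb_per_day: float) -> float:
    """Days until the rated TBW is consumed at a constant daily write volume."""
    return tbw / written_tb_per_day


if __name__ == "__main__":
    # Hypothetical 1TB drive rated at 600 TBW with a 5-year warranty.
    print(f"DWPD: {dwpd(600, 1.0, 5):.2f}")
    # Hypothetical workload: one 105GB checkpoint written every hour.
    daily_tb = 24 * 0.105
    print(f"Rated endurance consumed in about {days_to_reach_tbw(600, daily_tb):.0f} days")
```

In this example the rating works out to about 0.33 DWPD, and hourly 105GB checkpoints would consume it in roughly 238 days, well inside the warranty period. This is why the stated workload conditions matter as much as the headline figure.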

M.2's Advantage Is Compactness; So Is Its Challenge

M.2 is small and easy to install, suitable for AI workstations, development PCs and edge devices. But the limited area also means constrained thermal and power headroom.

Long periods of continuous reading and writing heat the controller and NAND. Once a temperature threshold is reached, the SSD may throttle to protect the hardware.

Short benchmarks may not reveal thermal throttling, but hours of training and checkpoint cycles may expose the problem.

So AskAIs Mini SSD needs to disclose operating temperature, maximum power, recommended cooling, and whether throttling occurs under sustained load.

Six Capacities Should Match Different AI Scenarios

128GB and 256GB are better suited to the operating system, development environment, model cache and edge devices, and should not be loosely described as large-scale LLM training storage.

512GB and 1TB can serve AI learning, inference, fine-tuning experiments, local data preprocessing and small-to-medium checkpoints.

2TB and 4TB are better suited to AI workstations, local datasets, model files, cache and checkpoint staging.

Even 4TB cannot independently cover all the needs of a large training cluster. Large systems usually require multiple local SSDs, network storage, parallel file systems or object storage working together.

A more accurate positioning is to treat AskAIs Mini SSD as the local high-speed tier in an AI storage architecture.

How Should an AI SSD Be Tested?

Beyond peak sequential read/write, formal testing should include random IOPS, different queue depths, average latency, P95 and P99 tail latency, sustained writes after cache exhaustion, TBW, power, temperature and throttling.
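Tail latency is worth reporting separately because an average can hide the slow requests that actually stall a data loader. The sketch below computes percentiles with the nearest-rank method, one of several common definitions, over a hypothetical sample set in which 2% of requests are slow.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


if __name__ == "__main__":
    # Hypothetical latencies in milliseconds: 98 fast requests and 2 slow ones.
    latencies_ms = [0.1] * 98 + [5.0] * 2
    mean = sum(latencies_ms) / len(latencies_ms)
    print(f"mean={mean:.3f}ms "
          f"P95={percentile(latencies_ms, 95)}ms "
          f"P99={percentile(latencies_ms, 99)}ms")
```

Here the mean and P95 both look healthy, while P99 exposes the 5ms outliers.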

AI workload testing should include PyTorch data loading, different file sizes, multi-worker parallelism, GPU utilization, checkpoint writing, checkpoint recovery and concurrent multi-GPU access.
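Part of such a test can be approximated without a GPU by reading a set of files of different sizes with a configurable number of parallel workers. The sketch below is one possible harness using only the standard library; `make_files` and `read_throughput` are hypothetical helper names. Files that were just written are usually served from the operating system's page cache, so a real measurement needs a dataset larger than RAM or a cold cache.

```python
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple


def make_files(directory: str, sizes: Sequence[int]) -> List[str]:
    """Create one file of random bytes per requested size and return the paths."""
    paths = []
    for i, size in enumerate(sizes):
        path = os.path.join(directory, f"sample_{i}.bin")
        with open(path, "wb") as f:
            f.write(os.urandom(size))
        paths.append(path)
    return paths


def read_file(path: str, block: int = 1 << 20) -> int:
    """Read a file to the end in `block`-sized pieces; return the byte count."""
    total = 0
    with open(path, "rb") as f:
        while chunk := f.read(block):
            total += len(chunk)
    return total


def read_throughput(paths: Sequence[str], workers: int) -> Tuple[int, float]:
    """Read all files with `workers` threads; return (bytes_read, MB/s)."""
    t0 = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        total = sum(pool.map(read_file, paths))
    elapsed = max(time.perf_counter() - t0, 1e-9)
    return total, total / elapsed / 1e6


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        files = make_files(d, [4 * 1024, 256 * 1024, 8 * 1024 * 1024] * 4)
        for workers in (1, 4, 8):
            size, rate = read_throughput(files, workers)
            print(f"workers={workers}: {size} bytes at {rate:.0f} MB/s")
```

Threads are sufficient for this purpose because blocking file reads release Python's global interpreter lock; sweeping `workers` and the file-size mix shows how the drive responds to the parallelism a multi-worker data loader generates.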

Testing should also state the processor, GPU, motherboard, OS, driver, framework version, data scale, drive fill level and ambient temperature.

Only with fully disclosed test conditions can results be reproduced and compared by customers.
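One practical way to make results reproducible is to store the test conditions as a structured record next to every result file. The sketch below shows one possible shape for such a record; the class name and field names are hypothetical, and only the CPU and operating system are detected automatically, so everything else must be supplied by whoever runs the test.

```python
import json
import platform
from dataclasses import asdict, dataclass


@dataclass
class BenchmarkConditions:
    cpu: str
    gpu: str
    motherboard: str
    os_name: str
    driver: str
    framework: str
    dataset_gb: float
    drive_fill_pct: float
    ambient_c: float


def describe_host(**manual) -> BenchmarkConditions:
    """Combine auto-detected fields with manually supplied ones (manual values win)."""
    detected = {
        "cpu": platform.processor() or platform.machine() or "unknown",
        "os_name": platform.platform(),
    }
    return BenchmarkConditions(**{**detected, **manual})


def to_json(conditions: BenchmarkConditions) -> str:
    """Serialize the record so it can be published alongside the results."""
    return json.dumps(asdict(conditions), indent=2, sort_keys=True)


if __name__ == "__main__":
    record = describe_host(gpu="unknown", motherboard="unknown", driver="unknown",
                           framework="unknown", dataset_gb=200.0,
                           drive_fill_pct=50.0, ambient_c=25.0)
    print(to_json(record))
```

Because every field is required, a run cannot be recorded with drive fill level or ambient temperature silently left out.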

Hong Kong R&D, Shenzhen Manufacturing

AskAIs Mini SSD is designed by the Hong Kong R&D headquarters and manufactured by Shenzhen Xingwen Chip Technology Co., Ltd.

This model combines Hong Kong's product, R&D and international-market capabilities with Shenzhen's mature electronics supply chain and manufacturing resources.

But for "Hong Kong R&D, Shenzhen manufacturing" to become a genuine brand asset, it still needs to explain the specific division of labour: who is responsible for product definition, PCB, signal integrity, thermals, controller and NAND selection, firmware tuning, production testing, burn-in testing and quality sampling.

Customers do not require every component to be made by the same company. What truly matters is whether the brand can own product definition, quality standards, validation data and after-sales responsibility.

Why Does Stellar Move from AI Software into SSDs?

Stellar AGI Labs started with AskAIs AI applications, models and APIs. Developing these services brought the team into direct contact with datasets, model files, cache, checkpoints and training infrastructure.

So the Mini SSD is not a consumer-electronics attempt entirely detached from the existing business, but part of Stellar's move from the application layer toward models, APIs and underlying hardware.

If this path closes into a loop, the software team can provide real AI workloads, the hardware team can optimize the product accordingly, and the new SSD can return to AskAIs training and customer environments for validation.

This software-hardware co-design is more likely to create differentiation than generic benchmarks alone.

Questions to Answer Before Official Launch

AskAIs Mini SSD still needs to disclose the controller, NAND, PCIe generation, NVMe version, DRAM, SLC cache strategy, sequential and random performance, sustained writes, TBW, temperature, power and warranty.

It also needs to state whether it has power-loss protection, SMART monitoring, firmware updates and an RMA mechanism, and which motherboards, workstations and operating systems have completed compatibility testing.

Where the product is described as "self-developed", professional buyers will also want to know which parts Stellar is responsible for and which rely on partner solutions. Clearly explaining technical boundaries does not weaken a brand; it builds credibility.

Conclusion: AI Hardware Must Ultimately Be Proven by Workloads

AskAIs Mini SSD is Stellar AGI Labs' first step from AI applications, models and APIs toward underlying hardware.

The judgement behind it is correct: AI infrastructure is not only GPUs; storage equally affects data supply, checkpoints, recovery time and overall accelerator utilization.

M.2 NVMe 2280 provides a widely available, compact product base, and 128GB to 4TB cover different cost and capacity needs; Hong Kong R&D and Shenzhen manufacturing provide the organizational conditions for rapid iteration.

But "AI SSD" should not be just a label on the packaging. The product must be able to answer a concrete question: when the model reads data, saves progress or recovers from failure, can this SSD consistently reduce waiting and keep compute resources working?

When that answer can be proven through open, transparent, reproducible testing, AskAIs Mini SSD will truly become the first cornerstone of Stellar's self-reliant compute and storage ecosystem.