What's New in the Optimized Build
The optimized version of NDM-TCP (ndm_tcp_lkm_optimized.c) introduces several low-level improvements over the standard v1.0 implementation while maintaining the same algorithmic approach.
Key Optimizations
1. AVX/SIMD Acceleration
The optimized version includes x86_64 AVX intrinsics for the neural network's forward pass. When AVX is available, the first hidden layer computation uses the vectorized vpmaddwd and vphaddd instructions to parallelize multiply-accumulate operations across input features.
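The multiply-accumulate pattern described above can be sketched in user space as follows. This is an illustration, not the module's code: the function name, the Q-format int16 weights, and the padding of 6 inputs to 8 lanes are assumptions. The intrinsics _mm_madd_epi16 and _mm_hadd_epi32 are the ones compilers typically emit as vpmaddwd and vphaddd when AVX is enabled (for example with -mavx); a scalar fallback is used otherwise.

```c
#include <stdint.h>
#if defined(__SSSE3__)
#include <immintrin.h>
#endif

#define N_HIDDEN 4  /* hidden neurons in the optimized build */
#define N_LANES  8  /* 6 input features padded with 2 zeros (assumption) */

/* Hypothetical helper: out[n] = sum over i of in[i] * w[n][i]. */
static void hidden_layer_mac(const int16_t in[N_LANES],
                             const int16_t w[N_HIDDEN][N_LANES],
                             int32_t out[N_HIDDEN])
{
#if defined(__SSSE3__)
    __m128i x  = _mm_loadu_si128((const __m128i *)in);
    /* pmaddwd: 8 int16 products, adjacent pairs summed into 4 int32 lanes */
    __m128i p0 = _mm_madd_epi16(x, _mm_loadu_si128((const __m128i *)w[0]));
    __m128i p1 = _mm_madd_epi16(x, _mm_loadu_si128((const __m128i *)w[1]));
    __m128i p2 = _mm_madd_epi16(x, _mm_loadu_si128((const __m128i *)w[2]));
    __m128i p3 = _mm_madd_epi16(x, _mm_loadu_si128((const __m128i *)w[3]));
    /* phaddd: two rounds of horizontal adds leave one total per neuron */
    __m128i s01 = _mm_hadd_epi32(p0, p1);
    __m128i s23 = _mm_hadd_epi32(p2, p3);
    _mm_storeu_si128((__m128i *)out, _mm_hadd_epi32(s01, s23));
#else
    /* Scalar fallback with identical results */
    int n, i;
    for (n = 0; n < N_HIDDEN; n++) {
        int32_t acc = 0;
        for (i = 0; i < N_LANES; i++)
            acc += (int32_t)in[i] * (int32_t)w[n][i];
        out[n] = acc;
    }
#endif
}
```

Inside a kernel module, SIMD code like this would additionally have to run between kernel_fpu_begin() and kernel_fpu_end(); that part is omitted from the sketch.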
2. Reduced Memory Footprint
- RTT history window compressed from 16 to 8 slots
- Hidden layer reduced from 8 to 4 neurons
- Input features streamlined from 8 to 6
- Total struct size is exactly 64 bytes, which fits within the kernel's ICSK_CA_PRIV_SIZE limit
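To make the size budget concrete, here is one way a 64-byte private state could be laid out with 8 RTT slots and 4 hidden neurons. Every field name and width below is a guess for illustration; the real layout lives in ndm_tcp_lkm_optimized.c. The compile-time assertion mirrors the idiom kernel congestion control modules commonly use, BUILD_BUG_ON(sizeof(...) > ICSK_CA_PRIV_SIZE).

```c
#include <stdint.h>

#define RTT_HIST_SLOTS 8  /* RTT history window (from the list above) */
#define N_HIDDEN       4  /* hidden neurons (from the list above) */

/* Hypothetical layout; field names and widths are assumptions. */
struct ndm_priv_sketch {
    uint32_t rtt_hist_us[RTT_HIST_SLOTS]; /* 32 bytes: RTT ring buffer  */
    uint32_t min_rtt_us;                  /*  4 bytes                   */
    int32_t  cached_delta;                /*  4 bytes: last cwnd delta  */
    uint32_t prior_cwnd;                  /*  4 bytes                   */
    uint32_t ssthresh;                    /*  4 bytes                   */
    int16_t  hidden[N_HIDDEN];            /*  8 bytes: hidden state     */
    uint16_t entropy_q8;                  /*  2 bytes: fixed-point      */
    uint16_t plasticity_q8;               /*  2 bytes: fixed-point      */
    uint8_t  hist_idx;                    /*  1 byte: ring write index  */
    uint8_t  reuse_left;                  /*  1 byte: cache countdown   */
    uint8_t  flags;                       /*  1 byte                    */
    uint8_t  reserved;                    /*  1 byte: explicit padding  */
};

/* Fails the build if the layout drifts from 64 bytes. */
_Static_assert(sizeof(struct ndm_priv_sketch) == 64,
               "private state must stay at 64 bytes");

/* Store a sample in the ring; the mask works because 8 is a power of two. */
static void rtt_hist_push(struct ndm_priv_sketch *s, uint32_t rtt_us)
{
    s->rtt_hist_us[s->hist_idx & (RTT_HIST_SLOTS - 1)] = rtt_us;
    s->hist_idx++;
}
```

Ordering fields from widest to narrowest avoids compiler-inserted padding, which is what lets the byte counts in the comments add up exactly.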
3. LUT-Based Activation Functions
Pre-computed lookup tables replace runtime calculations for tanh and sigmoid, trading ~500 bytes of read-only memory for faster activation.
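A minimal sketch of the lookup idea for tanh, assuming Q8 fixed point (256 represents 1.0) and a coarse 17-entry table sampled every 0.25 on [0, 4]. The table resolution, the Q8 format, and the function name are assumptions made for this example; the module's actual tables are larger (about 500 bytes in total, per the text above).

```c
#include <stdint.h>

#define TANH_LUT_LAST 16  /* last index; covers |x| up to 4.0 */

/* round(256 * tanh(k * 0.25)) for k = 0..16 */
static const int16_t tanh_lut_q8[TANH_LUT_LAST + 1] = {
      0,  63, 118, 163, 195, 217, 232, 241, 247,
    250, 253, 254, 255, 255, 256, 256, 256
};

/* x and the result are Q8 fixed point. */
static int32_t lut_tanh_q8(int32_t x_q8)
{
    /* Work on |x| and restore the sign afterwards, since tanh is odd. */
    uint32_t ax  = x_q8 < 0 ? 0u - (uint32_t)x_q8 : (uint32_t)x_q8;
    uint32_t idx = ax >> 6;            /* one table step = 0.25 = 64 in Q8 */
    int32_t  y;

    if (idx > TANH_LUT_LAST)
        idx = TANH_LUT_LAST;           /* saturate: tanh(x) ~ 1 for large |x| */
    y = tanh_lut_q8[idx];
    return x_q8 < 0 ? -y : y;
}
```

The lookup truncates to the table entry at or below |x| and does not interpolate, so accuracy depends entirely on table size. That is the memory-for-speed trade described above.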
4. Fast Entropy Calculation
Histogram binning uses fixed-point arithmetic and bit shifts instead of floating-point division, with an 8-bin LUT for entropy values.
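The following sketch shows one way such an integer-only entropy estimate can work, assuming 8 RTT samples, 8 bins, and Q8 output (256 represents 1 bit). Because the sample count is fixed at 8, each bin's contribution -(c/8)*log2(c/8) depends only on its count c and can be read from a small table. The bin width (bin_shift) and all names are hypothetical, not taken from the module.

```c
#include <stdint.h>

#define N_SAMPLES 8
#define N_BINS    8

/* round(256 * (c/8) * log2(8/c)) for a bin holding c of the 8 samples */
static const uint16_t entropy_term_q8[N_SAMPLES + 1] = {
    0, 96, 128, 136, 128, 108, 80, 43, 0
};

/* Shannon entropy of the RTT histogram in Q8 bits (0 .. 768). */
static uint32_t rtt_entropy_q8(const uint32_t rtt_us[N_SAMPLES],
                               unsigned bin_shift)
{
    uint8_t  count[N_BINS] = {0};
    uint32_t min_rtt = rtt_us[0];
    uint32_t h = 0;
    int i;

    for (i = 1; i < N_SAMPLES; i++)
        if (rtt_us[i] < min_rtt)
            min_rtt = rtt_us[i];

    for (i = 0; i < N_SAMPLES; i++) {
        /* A shift replaces division by the bin width (2^bin_shift us). */
        uint32_t bin = (rtt_us[i] - min_rtt) >> bin_shift;
        if (bin >= N_BINS)
            bin = N_BINS - 1;          /* clamp outliers into the last bin */
        count[bin]++;
    }

    for (i = 0; i < N_BINS; i++)
        h += entropy_term_q8[count[i]];
    return h;
}
```

With all samples in one bin the result is 0; with one sample per bin it is 768, which is 3.0 bits in Q8 and the maximum for 8 bins.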
5. Computation Caching
When network conditions are stable (low entropy, high plasticity), the module reuses the previous cwnd delta for up to 8 consecutive ACKs, skipping neural network inference.
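The caching rule above can be sketched as a small countdown around the inference call. The thresholds, the Q8 scaling, and every name here are assumptions for illustration; run_inference_stub() merely stands in for the real forward pass and counts how often it is invoked.

```c
#include <stdint.h>

#define MAX_REUSE                8   /* ACKs served from cache (from the text) */
#define ENTROPY_STABLE_MAX_Q8    64  /* hypothetical "low entropy" bound     */
#define PLASTICITY_STABLE_MIN_Q8 192 /* hypothetical "high plasticity" bound */

struct delta_cache {
    int32_t delta;       /* last computed cwnd delta */
    uint8_t reuse_left;  /* cached answers remaining */
};

static int infer_calls;  /* instrumentation for the example only */

/* Stand-in for the neural network forward pass. */
static int32_t run_inference_stub(void)
{
    infer_calls++;
    return 2;
}

static int32_t cwnd_delta_on_ack(struct delta_cache *c,
                                 uint16_t entropy_q8,
                                 uint16_t plasticity_q8)
{
    int stable = entropy_q8 < ENTROPY_STABLE_MAX_Q8 &&
                 plasticity_q8 > PLASTICITY_STABLE_MIN_Q8;

    if (stable && c->reuse_left > 0) {
        c->reuse_left--;             /* skip inference, reuse cached delta */
        return c->delta;
    }

    c->delta = run_inference_stub(); /* recompute and refill the budget */
    c->reuse_left = MAX_REUSE;
    return c->delta;
}
```

Under stable conditions this runs inference once per 9 ACKs (one computation followed by 8 reuses). Any unstable ACK bypasses the cache and forces a fresh computation.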
Important Disclaimers
This optimized version does NOT deliver network-level performance gains over the NDM-TCP v1.0 algorithm. The optimizations are purely implementation-level and target compilation and runtime efficiency. The core congestion control logic and performance characteristics remain conceptually similar to the standard version.
These changes primarily affect:
- CPU cycles per packet processed (an expected total reduction of 56% to 62%)
- Memory cache efficiency
- Compilation time and binary size
They do not fundamentally alter the network throughput, latency, or congestion response that users would observe in real-world testing.
Compilation Instructions
To compile the optimized version:
# Option 1: Rename and compile
cp ndm_tcp_lkm_optimized.c ndm_tcp_lkm.c
make
# Option 2: Modify Makefile to target optimized source directly
The module requires kernel headers and FPU support configuration. AVX optimizations activate automatically on compatible x86_64 systems.
Repository
Source code and build instructions: https://github.com/hejhdiss/lkm-ndm-tcp
Note: Both versions implement the same entropy-aware, neural network-based TCP congestion control algorithm. Choose the optimized build for production deployments where CPU efficiency matters, or stick with the standard version for easier debugging and code readability.