English | 简体中文
A high-performance CUDA-based image processing library — a mini OpenCV with GPU-accelerated operators covering pixel operations, convolution, morphology, geometric transforms, filtering, color space conversion, and async pipeline processing.
┌──────────────────────────────────────────────────────┐
│ Application Layer │
│ ImageProcessor · PipelineProcessor │
├──────────────────────────────────────────────────────┤
│ Operator Layer (CUDA Kernels) │
│ PixelOperator │ ConvolutionEngine │ HistogramCalc │
│ ImageResizer │ Morphology │ Threshold │
│ ColorSpace │ Geometric │ Filters │
│ ImageArithmetic │
├──────────────────────────────────────────────────────┤
│ Infrastructure Layer │
│ DeviceBuffer · MemoryManager · StreamManager │
│ GpuImage · HostImage · ImageIO (stb) · CudaError │
└──────────────────────────────────────────────────────┘
| Category | Operators | Highlights |
|---|---|---|
| Pixel Ops | Invert, grayscale, brightness | Per-pixel parallel |
| Convolution | Gaussian blur, Sobel edge detection, custom kernels | Shared memory tiling |
| Histogram | Calculation, equalization | Atomic ops + parallel reduction |
| Scaling | Bilinear, nearest-neighbor | Arbitrary size |
| Morphology | Erosion, dilation, open/close, gradient, top/black-hat | Custom structuring elements |
| Threshold | Global, adaptive, Otsu auto | Histogram-driven |
| Color Space | RGB/HSV/YUV conversion, channel split/merge | Batch conversion |
| Geometric | Rotate, flip, affine, perspective, crop, pad | Bilinear interpolation |
| Filters | Median, bilateral, box, sharpen, Laplacian | Edge-preserving |
| Arithmetic | Add, subtract, multiply, blend, weighted sum, abs diff | Scalar & image |
| Pipeline | Multi-step chaining, batch async processing | Multi-stream concurrency |
- CUDA Toolkit 11.0+
- CMake 3.18+
- C++17 compatible compiler
- NVIDIA GPU (Compute Capability 7.5+)
mkdir build && cd build
cmake -DBUILD_TESTS=ON -DBUILD_EXAMPLES=ON ..
make -j$(nproc)
# Run tests
ctest --output-on-failure| Option | Default | Description |
|---|---|---|
BUILD_TESTS |
ON | Build unit tests (GTest v1.14.0) |
BUILD_EXAMPLES |
ON | Build example programs |
BUILD_BENCHMARKS |
OFF | Build benchmarks (Google Benchmark v1.8.3) |
GPU_IMAGE_ENABLE_IO |
ON | Enable image file I/O via stb |
#include "gpu_image/gpu_image_processing.hpp"
using namespace gpu_image;
ImageProcessor processor;
GpuImage gpuImage = processor.loadFromHost(hostImage);
GpuImage blurred = processor.gaussianBlur(gpuImage, 5, 1.5f);
GpuImage edges = processor.sobelEdgeDetection(gpuImage);
GpuImage gray = processor.toGrayscale(gpuImage);
HostImage result = processor.downloadImage(blurred);PipelineProcessor pipeline(4); // 4 CUDA streams
pipeline.addStep([](GpuImage& img, cudaStream_t s) {
GpuImage temp;
ConvolutionEngine::gaussianBlur(img, temp, 3, 1.0f, s);
img = std::move(temp);
});
std::vector<HostImage> outputs = pipeline.processBatchHost(inputs);| Architecture | Compute Capability | Examples |
|---|---|---|
| Turing | SM 75 | RTX 20xx / T4 |
| Ampere | SM 80 / 86 | A100 / RTX 30xx |
| Ada Lovelace | SM 89 | RTX 40xx / L4 |
| Hopper | SM 90 | H100 |
mini-opencv/
├── include/gpu_image/ # Public headers (19 modules)
│ ├── gpu_image_processing.hpp # Unified entry header
│ ├── image_processor.hpp # High-level sync API
│ ├── pipeline_processor.hpp # Pipeline async API
│ ├── convolution_engine.hpp # Convolution operators
│ ├── morphology.hpp # Morphological operators
│ ├── geometric.hpp # Geometric transforms
│ ├── filters.hpp # Filters + image arithmetic
│ ├── color_space.hpp # Color space conversion
│ ├── threshold.hpp # Thresholding
│ ├── device_buffer.hpp # RAII GPU memory
│ └── ... # cuda_error, gpu_image, stream_manager, etc.
├── src/ # CUDA/C++ source files (16)
├── tests/ # Unit tests (12 test files)
├── examples/ # Example programs
│ ├── basic_example.cpp # Basic usage
│ └── pipeline_example.cpp # Pipeline usage
├── benchmarks/ # Performance benchmarks
└── CMakeLists.txt # Build system
- Modern CMake — Target-based compile options with generator expressions,
BUILD_INTERFACE/INSTALL_INTERFACE - FetchContent dependencies — GTest v1.14.0, Google Benchmark v1.8.3, stb (no manual third-party installs)
- Auto GPU arch detection — CMake 3.24+ uses
native, older versions fall back to common arch list - Install support —
gpu_image::gpu_image_processingCMake export target - Version injection — Compile-time
GPU_IMAGE_VERSION_MAJOR/MINOR/PATCHmacros - CI pipeline — GitHub Actions: CUDA build + clang-format check
- Full test coverage — 12 test files covering all operator modules
- Cross-platform flags — GCC/Clang (
-Wall -Wextra -Wpedantic) + MSVC (/W4)
MIT License