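I briefly covered RC2 of NVIDIA's CUDA 4.1 in an earlier post. Now the final releases of both CUDA 4.1 and Parallel Nsight 2.1 are officially out! NVIDIA published an official release announcement for each.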
For the CUDA Toolkit, the main changes include:
- ~10% performance improvement using the new LLVM-based CUDA compiler.
- Over 1000 new imaging and signal processing functions added to the NVIDIA Performance Primitives library. If you do image processing, NPP has a GPU accelerated function for you.
- Completely re-designed Visual Profiler with a new automated expert system to give you step-by-step performance optimizations.
The main updates in Parallel Nsight 2.1 are:
- New Frame Timings page allows DirectX developers to get to the exact measured draw call timings in isolation or in concurrent execution of the GPU.
- Traced workloads can now navigate the dependencies and call stack to allow the developer to follow through GPU workloads, corresponding API calls and host code that was the cause of the activity.
- The new CUDA information tool window gives detailed information about the state of CUDA launches in the user’s application. Users can filter and find detailed information about exceptions, asserts, breakpoints, MMU faults, and easily switch to a specific warp of interest to debug problems.
- CUDA warp watch visualizes variables and expressions across an entire CUDA warp.
- CUDA profiler now analyzes kernel memory activities, execution stalls and instruction throughput.
- Multiple bug fixes and stability improvements.