[Openbook question]

#define N 10000

__global__ void vectorAdd(float *a, float *b, float *c) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < N/10)
        c[idx*10] = a[idx*10] + b[idx*10];
}

How can we improve the floating-point operations per byte for the above code? There are 4 CUDA blocks, and each CUDA block has 10 threads. Choose all that apply.


[Openbook question]

#define N 10000

__global__ void vectorAdd(float *a, float *b, float *c) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < N/10)
        c[idx*10] = a[idx*10] + b[idx*10];
}

In the above code, what will be the floating-point operations per byte? Assume that the memory transaction size is 128B and there is no cache. Choose the closest value.


[Open book]

#define N 10000

__global__ void vectorAdd(float *a, float *b, float *c) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < N/10)
        c[idx*10] = a[idx*10] + b[idx*10];
}

Assuming 100 CUDA blocks, each consisting of 100 threads, with a warp width of 16 and a page size of 4KB, what optimizations would be most helpful in reducing address translation overhead in this code?


For the following passage by Burns, (1) name the poem and (2) briefly explain its significance.

O wad some Pow’r the giftie gie us
To see oursels as others see us!
It wad frae monie a blunder free us
An’ foolish notion:
What airs in dress an’ gait wad lea’e us,
And ev’n Devotion!