[closed book question] If a program consists of 10 dynamic instructions, each CUDA block has 128 threads, and the width of a warp is 32 threads, how many times does the SM fetch an instruction? Assume there is only one CUDA block for this question.
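One way to reason about the fetch count is sketched below. It assumes the SM fetches each dynamic instruction once per warp, with no divergence, no replays, and no fetch sharing across warps; the function name is hypothetical and not part of the question.

```python
import math

def instruction_fetches(dynamic_instructions, threads_per_block, warp_width, num_blocks=1):
    """Count instruction fetches, assuming one fetch per warp per dynamic instruction."""
    # Threads are grouped into warps; a partially filled warp still counts as a warp.
    warps_per_block = math.ceil(threads_per_block / warp_width)
    return dynamic_instructions * warps_per_block * num_blocks

# 128 threads / 32 threads per warp = 4 warps; 4 warps * 10 instructions = 40 fetches.
print(instruction_fetches(10, 128, 32))  # prints 40
```

Under a different fetch model (for example, divergent branches that serialize paths), the count would differ.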
[closed book question] Using Amdahl’s Law, if an application has a 10% serial portion, how many processors are needed to achieve a 10x speedup? Choose the closest answer.
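As a hedged illustration of the formula behind this question, the sketch below evaluates Amdahl’s Law, speedup = 1 / (s + (1 - s) / n), for serial fraction s and n processors, and inverts it for a target speedup. The function names are hypothetical. Note that with s = 0.1 the speedup is bounded by 1 / s = 10, so a 10x speedup is only approached as n grows without bound.

```python
import math

def amdahl_speedup(serial_fraction, processors):
    """Speedup predicted by Amdahl's Law for a given serial fraction."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)

def processors_needed(serial_fraction, target_speedup):
    """Processors required for the target speedup, or infinity if it is unreachable."""
    # Rearranging the formula: n = (1 - s) / (1 / target - s).
    denominator = 1.0 / target_speedup - serial_fraction
    if denominator <= 0:
        return math.inf  # target is at or beyond the 1 / s limit
    return (1.0 - serial_fraction) / denominator

for n in (10, 100, 1000, 10**6):
    # The speedup climbs toward 10 but stays below it for every finite n.
    print(n, round(amdahl_speedup(0.1, n), 3))

print(processors_needed(0.1, 10))  # prints inf
```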
[Open book question] Which basic blocks would have divergent branches? Choose all that apply.
[Open book question]
#define N 10000
__global__ void vectorAdd(float *a, float *b, float *c) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < N/10) c[idx*10] = a[idx*10] + b[idx*10];
}
How can we improve the floating-point operations per byte for the above code? There are 4 CUDA blocks, and each CUDA block has 10 threads. Choose all that apply.
[Open book question] Discuss the pros and cons of GPU tensor cores that support one large matrix size versus tensor cores that support many small matrix sizes.
[Open book question]
#define N 10000
__global__ void vectorAdd(float *a, float *b, float *c) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < N/10) c[idx*10] = a[idx*10] + b[idx*10];
}
In the above code, what is the number of floating-point operations per byte? Assume that the memory transaction size is 128B and there is no cache. Choose the closest value.
[Open book question]
#define N 10000
__global__ void vectorAdd(float *a, float *b, float *c) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < N/10) c[idx*10] = a[idx*10] + b[idx*10];
}
Assuming 100 CUDA blocks, each consisting of 100 threads, with a warp width of 16, and a page size of 4KB, what optimizations would be most helpful in reducing address translation overhead in this code?
[closed book question] Between OpenMP and MPI, which programming model requires a directive for critical sections?
Wordsworth believed that, “All good poetry is ________.”
One of the Innocence poems states “thousands of [children], Dick, Joe, Ned, and Jack/were all of them lock’d up in coffins of black.” “Coffins of black” is a metaphor for