Hello World to OpenCL

This is a very basic post showing how to set up an OpenCL development environment. The assumptions I'm going to work with are:

  • You have Windows installed
  • You have an Intel CPU

I personally do not have access to AMD or other environments, so I have not tried the setup on those, but it should not be too different from what we do here. In the next post, I will show a step-by-step process for a Linux environment as well, with the same hardware.

On your Windows system, go to the msys2 website, download the installer and set it up. We need gcc and make from it. If your internet provider or proxy blocks pacman from updating, msys2 won't be able to install the required packages; in that case, use TDM-GCC as an alternative.

Next, the Intel drivers. The runtime should already be available, so we only need the development tools. As of Aug 14, 2023, Intel no longer supports the system_studio way of development; instead, we have to install oneAPI.

After installation, you can find the OpenCL libraries at the location: C:\Program Files (x86)\Intel\oneAPI\compiler\2023.2.0\windows\lib

And the header files at: C:\Program Files (x86)\Intel\oneAPI\compiler\2023.2.0\windows\include\sycl\CL

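Once you have all this, create a folder for our basic test code. Inside this folder, hold down the Shift key and right-click. There should be an option “Open PowerShell Window Here”. Click on it, then type the command below. Note that touch is not a built-in PowerShell command; it comes with msys2, so its usr\bin folder needs to be on your PATH. Otherwise, just create the two empty files by hand.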

touch main.c Makefile

Your folder should now contain these two new files. To test whether the OpenCL runtime drivers are working on our system, put the following contents into the Makefile (the nvidia target only applies if you also have the CUDA toolkit installed):

OUTPUT = hello.exe

INCLUDE_NVIDIA = -I./src/ \
	-I"C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.6/include"

INCLUDE_INTEL = -I./src/ \
	-I"C:/Program Files (x86)/Intel/oneAPI/compiler/2023.2.0/windows/include/sycl"

# main.c sits next to the Makefile, as created by the touch command
SRCS := main.c

CFLAGS = -Wall -DCL_TARGET_OPENCL_VERSION=120

LIBS_DIR_NVIDIA = -L"C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.6/lib/x64"

LIBS_DIR_INTEL = -L"C:/Program Files (x86)/Intel/oneAPI/compiler/2023.2.0/windows/lib"

LDFLAGS = -lOpenCL

.PHONY: all clean nvidia intel

# Default target: build against the Intel headers and libraries
all: intel

clean:
	rm -f $(OUTPUT)

nvidia:
	gcc $(INCLUDE_NVIDIA) $(CFLAGS) $(LIBS_DIR_NVIDIA) $(SRCS) -o $(OUTPUT) $(LDFLAGS)

intel:
	gcc $(INCLUDE_INTEL) $(CFLAGS) $(LIBS_DIR_INTEL) $(SRCS) -o $(OUTPUT) $(LDFLAGS)

And use the following as the contents of your main.c file:

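// Hello World for OpenCL
#include <stdio.h>
#include <stdlib.h>

#include <CL/cl.h>

int main(void)
{
    cl_uint nPlatforms = 0;

    // With a NULL platform list, this call only reports the platform count
    cl_int err = clGetPlatformIDs(0, NULL, &nPlatforms);
    if (err != CL_SUCCESS)
        fprintf(stderr, "clGetPlatformIDs returned error %d\n", err);

    printf("System has %u platforms\n", nPlatforms);
    return err == CL_SUCCESS ? 0 : 1;
}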

From the PowerShell session, enter:

make intel

If msys2 (or TDM-GCC) and the Intel tools were set up correctly, it should compile without problems. If not, check those installations.

When you run the program, it should show something like:

PS E:\shared\opencl\hello_world> .\hello
System has 5 platforms
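
clGetPlatformIDs also returns a status code, which is the first thing to look at when the platform count is not what you expect. A raw number like -1001 is not very readable, so here is a rough sketch of a lookup helper that turns a few common codes into names. The helper name cl_error_name is hypothetical (it is not part of the OpenCL API), and the numeric values are copied by hand from the Khronos headers (CL/cl.h and CL/cl_ext.h) so that the snippet compiles even without OpenCL installed. In real code, prefer the macros from the headers.

```c
#include <stddef.h>

/* Hypothetical helper: maps a few OpenCL status codes to their names.
 * The numeric values are hand-copied from the Khronos headers. */
const char *cl_error_name(int code)
{
    switch (code) {
    case 0:     return "CL_SUCCESS";
    case -1:    return "CL_DEVICE_NOT_FOUND";
    case -5:    return "CL_OUT_OF_RESOURCES";
    case -6:    return "CL_OUT_OF_HOST_MEMORY";
    case -30:   return "CL_INVALID_VALUE";
    case -32:   return "CL_INVALID_PLATFORM";
    case -1001: return "CL_PLATFORM_NOT_FOUND_KHR"; /* from the cl_khr_icd extension */
    default:    return "unknown OpenCL status code";
    }
}
```

To use it, store the result of clGetPlatformIDs in a cl_int and print cl_error_name of that value next to the platform count. With an ICD loader and no vendor driver installed, the code you would typically see is CL_PLATFORM_NOT_FOUND_KHR.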

If your output shows 0 platforms, the OpenCL runtime isn’t properly installed. This should be rare, because the Intel graphics drivers support it by default, and the oneAPI installation would have made sure to install the required runtimes.

Now we’re all set up to try some more interesting stuff. But first, time to study a bit more about OpenCL. Just browse quickly over the next couple of sources to get a feel for it.

The OpenCL Specification

I recommend going through the intro sections of the specs, because they explain, in a very simple but beautiful manner, the basic terms and ideas behind how heterogeneous computing is possible using OpenCL.

Get info about the platform we’re working on

There is a utility, clinfo, that generates a very extensive list of information about the platform we’re trying to use OpenCL on. For a beginner, though, that output is daunting. Anyway, reinventing the wheel a bit at the beginning is good for learning the basics. Let’s just list the OpenCL platforms on our system and print some basic details about each one:

#include <stdio.h>
#include <stdlib.h>
#include <CL/cl.h>

#define numPlatInfoItems 5

struct platInfo {
    cl_platform_info ID;
    char name[50];
};

void printPlatInfo(cl_uint index, cl_platform_id id)
{
    // Every parameter in this table is returned as a NUL-terminated string
    static const struct platInfo infTable[numPlatInfoItems] = {
        {CL_PLATFORM_PROFILE, "Profile"},
        {CL_PLATFORM_VERSION, "Version"},
        {CL_PLATFORM_NAME, "Name"},
        {CL_PLATFORM_VENDOR, "Vendor"},
        {CL_PLATFORM_EXTENSIONS, "Extensions"},
    };

    printf("\n--------------------------------\nPlatform details: Index %u\n", index);
    for (int i = 0; i < numPlatInfoItems; i++) {
        size_t plat_param_size = 0;

        // First call: ask only for the size of the value
        if (clGetPlatformInfo(id, infTable[i].ID, 0, NULL, &plat_param_size) != CL_SUCCESS
            || plat_param_size == 0) {
            printf("\t%s -- (unavailable)\n", infTable[i].name);
            continue;
        }

        char *param = calloc(plat_param_size, sizeof(char));
        if (param == NULL)
            continue;

        // Second call: fetch the value into the buffer we just allocated
        if (clGetPlatformInfo(id, infTable[i].ID, plat_param_size, param, NULL) == CL_SUCCESS)
            printf("\t%s -- %s\n", infTable[i].name, param);
        free(param);
    }
}

void InitAndPrintCLInfo(cl_device_id *devices)
{
    (void)devices; // not used yet

    // Initialize everything
    cl_uint num_platforms = 0;
    cl_platform_id *platforms;

    if (clGetPlatformIDs(0, NULL, &num_platforms) != CL_SUCCESS || num_platforms == 0) {
        printf("No OpenCL platforms found on this system\n");
        return;
    }
    printf("There are %u platforms on this system\n", num_platforms);

    platforms = malloc(num_platforms * sizeof(cl_platform_id));
    if (platforms == NULL)
        return;
    clGetPlatformIDs(num_platforms, platforms, NULL);

    for (cl_uint i = 0; i < num_platforms; i++) {
        printPlatInfo(i, platforms[i]);
    }
    free(platforms);
}

int main(int argc, char *argv[])
{
    // Local vars
    cl_device_id *devices = NULL;

    (void)argc;
    (void)argv;

    InitAndPrintCLInfo(devices);

    return 0;
}

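This prints something similar to the following (your output will vary). The last three rows of each platform appear to come from a run that also queried CL_PLATFORM_HOST_TIMER_RESOLUTION, CL_PLATFORM_NUMERIC_VERSION and CL_PLATFORM_EXTENSIONS_WITH_VERSION. Those parameters are not strings, so printing them with %s shows nothing useful, and the five-entry table above will not print them at all: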

PS E:\shared\opencl\hello_world> .\hellogpu.exe
There are 2 platforms on this system

--------------------------------
Platform details: Index 0
        Profile -- FULL_PROFILE
        Version -- OpenCL 3.0 CUDA 11.6.99
        Name -- NVIDIA CUDA
        Vendor -- NVIDIA Corporation
        Extensions -- cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_copy_opts cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_device_uuid cl_khr_pci_bus_info cl_khr_external_semaphore cl_khr_external_memory cl_khr_external_semaphore_win32 cl_khr_external_memory_win32
        Host Timer Resolution --
        Numeric Version --
        Extensions with Version --

--------------------------------
Platform details: Index 1
        Profile -- FULL_PROFILE
        Version -- OpenCL 3.0
        Name -- Intel(R) OpenCL HD Graphics
        Vendor -- Intel(R) Corporation
        Extensions -- cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_command_queue_families cl_intel_subgroups cl_intel_required_subgroup_size cl_intel_subgroups_short cl_khr_spir cl_intel_accelerator cl_intel_driver_diagnostics cl_khr_priority_hints cl_khr_throttle_hints cl_khr_create_command_queue cl_intel_subgroups_char cl_intel_subgroups_long cl_khr_il_program cl_intel_mem_force_host_memory cl_khr_subgroup_extended_types cl_khr_subgroup_non_uniform_vote cl_khr_subgroup_ballot cl_khr_subgroup_non_uniform_arithmetic cl_khr_subgroup_shuffle cl_khr_subgroup_shuffle_relative cl_khr_subgroup_clustered_reduce cl_intel_device_attribute_query cl_khr_suggested_local_work_size cl_intel_split_work_group_barrier cl_khr_fp64 cl_khr_subgroups cl_intel_spirv_device_side_avc_motion_estimation cl_intel_spirv_media_block_io cl_intel_spirv_subgroups cl_khr_spirv_no_integer_wrap_decoration cl_intel_unified_shared_memory cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_intel_planar_yuv cl_intel_packed_yuv cl_intel_motion_estimation cl_intel_device_side_avc_motion_estimation cl_intel_advanced_motion_estimation cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_3d_image_writes cl_intel_media_block_io cl_khr_gl_sharing cl_khr_gl_depth_images cl_khr_gl_event cl_khr_gl_msaa_sharing cl_intel_dx9_media_sharing cl_khr_dx9_media_sharing cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_intel_d3d11_nv12_media_sharing cl_intel_sharing_format_query cl_khr_pci_bus_info cl_intel_simultaneous_sharing
        Host Timer Resolution -- d
        Numeric Version --
        Extensions with Version --
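
The Version and Extensions rows above are plain strings, but both follow a format that the OpenCL specification defines: the version reads "OpenCL <major>.<minor> <platform-specific information>", and the extensions are a space-separated list of names. As a minimal sketch, assuming you have already fetched those strings with clGetPlatformInfo, two small helpers can turn them into something a program can test. The names parse_cl_version and has_extension are hypothetical (they are not part of the OpenCL API), and the code uses only the C standard library.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: parse "OpenCL <major>.<minor> <vendor info>".
 * Returns 1 on success, 0 if the string does not have that shape. */
int parse_cl_version(const char *version, int *major, int *minor)
{
    return sscanf(version, "OpenCL %d.%d", major, minor) == 2;
}

/* Hypothetical helper: whole-token search in a space-separated list,
 * so "cl_khr_int64" does not match inside "cl_khr_int64_base_atomics". */
int has_extension(const char *extensions, const char *name)
{
    size_t len = strlen(name);
    const char *p = extensions;

    if (len == 0)
        return 0;

    while ((p = strstr(p, name)) != NULL) {
        /* A real match starts at the list start or after a space... */
        int starts_token = (p == extensions) || (p[-1] == ' ');
        /* ...and ends at a space or at the end of the list. */
        int ends_token = (p[len] == ' ') || (p[len] == '\0');

        if (starts_token && ends_token)
            return 1;
        p += len; /* keep searching after this partial match */
    }
    return 0;
}
```

For the sample output above, parse_cl_version on the NVIDIA version string gives major 3 and minor 0, and has_extension finds cl_khr_fp64 in both extension lists but cl_khr_fp16 only in the Intel one. A plain strstr would be shorter, but it would also report a hit when the name is merely a prefix of a longer extension name, which is why the helper checks the token boundaries.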

We will go into more depth in later articles. For now, with this we’re set up for some exciting stuff.

This post is licensed under CC BY 4.0 by the author.