Class Kernel
- All Implemented Interfaces:
OpenCLObject
- Direct Known Subclasses:
LwjglKernel
Terminology:
A Kernel is executed in parallel. The total number of parallel threads,
called work items, are specified by the global work size (of type
Kernel.WorkSize). These threads are organized in a 1-D, 2-D or 3-D grid
(of course, this is only a logical view). Inside each kernel,
the id of each thread (i.e. the index inside this grid) can be requested
by get_global_id(dimension) with dimension=0,1,2.
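The indexing described above can be mimicked in plain Java. The sketch below is neither jME3 nor OpenCL code: GlobalIdSketch and its methods are hypothetical names, and the sequential loops merely stand in for work items that a device would run in parallel.

```java
/** Plain-Java illustration of a 2D global work size; all names are hypothetical. */
final class GlobalIdSketch {

    /** Flattens a 2D global id into a linear array index (x varies fastest). */
    static int linearIndex(int gidX, int gidY, int width) {
        return gidY * width + gidX;
    }

    /** Visits every work item of a width x height grid exactly once, sequentially. */
    static int[] runGrid(int width, int height) {
        int[] out = new int[width * height];
        for (int gidY = 0; gidY < height; gidY++) {     // plays the role of get_global_id(1)
            for (int gidX = 0; gidX < width; gidX++) {  // plays the role of get_global_id(0)
                out[linearIndex(gidX, gidY, width)] = gidX + gidY;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] result = runGrid(4, 3);
        System.out.println("work items: " + result.length);
    }
}
```

Each loop iteration corresponds to one work item; the grid shape only decides how the ids are numbered, which matches the "logical view" remark above.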
Not all threads can always be executed in parallel because there simply might
not be enough processor cores.
Therefore, the concept of a work group is introduced. The work group
specifies the actual number of threads that are executed in parallel.
The maximal size of it can be queried by Device.getMaxiumWorkItemsPerGroup().
Again, the threads inside the work group can be organized in a 1D, 2D or 3D
grid, but this is also just a logical view (specifying how the threads are
indexed).
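The relation between global ids and work groups is plain integer arithmetic. The following sketch assumes a 1D grid without a global offset; WorkGroupSketch and its method names are hypothetical and not part of this class.

```java
/** Plain-Java illustration of work group arithmetic; all names are hypothetical. */
final class WorkGroupSketch {

    /** Number of work groups needed to cover globalSize work items (ceiling division). */
    static int groupCount(int globalSize, int workGroupSize) {
        return (globalSize + workGroupSize - 1) / workGroupSize;
    }

    /** Smallest multiple of workGroupSize that is not below itemCount. */
    static int roundUpToGroupMultiple(int itemCount, int workGroupSize) {
        return groupCount(itemCount, workGroupSize) * workGroupSize;
    }

    /** Index of the work group that a global id falls into (get_group_id in a kernel). */
    static int groupId(int globalId, int workGroupSize) {
        return globalId / workGroupSize;
    }

    /** Position of a work item inside its group (get_local_id in a kernel). */
    static int localId(int globalId, int workGroupSize) {
        return globalId % workGroupSize;
    }
}
```

Rounding the global size up to a multiple of the work group size is a common habit when the work group size is set explicitly; the kernel then has to skip the surplus ids itself.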
The work group is important for another concept: shared memory.
Unlike the normal global or constant memory (passing a Buffer object
as argument), shared memory can't be set from outside. Shared memory is
allocated by the kernel and is only valid within the kernel. It is used
to quickly share data between threads within a work group.
The size of the shared memory is specified by setting an instance of
Kernel.LocalMem or Kernel.LocalMemPerElement as argument.
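To get a feeling for the sizes involved, the sketch below computes how many bytes of shared memory a work group would need. It assumes that a per-element size is multiplied by the number of work items in the group; that reading of "per element" is an assumption of this example, and LocalMemSketch is a hypothetical helper, not part of the library.

```java
/** Plain-Java size arithmetic for shared memory; all names are hypothetical. */
final class LocalMemSketch {

    /** Number of work items in a 1D, 2D or 3D work group. */
    static long itemsPerGroup(int... workGroupDims) {
        long items = 1;
        for (int dim : workGroupDims) {
            items *= dim;
        }
        return items;
    }

    /** Total bytes if every work item in the group gets bytesPerItem bytes (assumed semantics). */
    static long totalLocalBytes(long bytesPerItem, int... workGroupDims) {
        return bytesPerItem * itemsPerGroup(workGroupDims);
    }

    public static void main(String[] args) {
        // One float per work item in a 16 x 16 work group.
        System.out.println(totalLocalBytes(Float.BYTES, 16, 16) + " bytes");
    }
}
```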
Due to heavy register usage or other reasons, a kernel might not be able
to utilize a whole work group. Therefore, the actual number of threads
that can be executed in a work group can be queried by
getMaxWorkGroupSize(com.jme3.opencl.Device), which might differ from the
value returned from the Device.
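Because two limits apply, a requested work group size is best clamped against both. The sketch below takes the limits as plain numbers; in real code they would come from Device.getMaxiumWorkItemsPerGroup() and getMaxWorkGroupSize(com.jme3.opencl.Device). WorkGroupLimitSketch is a hypothetical helper.

```java
/** Plain-Java clamp of a work group size against two limits; names are hypothetical. */
final class WorkGroupLimitSketch {

    /**
     * Returns the largest size that does not exceed the request, the device limit,
     * or the kernel-specific limit.
     */
    static long usableWorkGroupSize(long requested, long deviceMax, long kernelMax) {
        if (requested <= 0) {
            throw new IllegalArgumentException("requested size must be positive");
        }
        return Math.min(requested, Math.min(deviceMax, kernelMax));
    }
}
```

For example, a request of 512 on a device that allows 1024 items per group, with a kernel that only supports 256, yields 256.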
There are two ways to launch a kernel:
First, arguments and the work group sizes can be set in advance
(setArg(index, ...), setGlobalWorkSize(...) and setWorkGroupSize(...)).
Then a kernel is launched by Run(com.jme3.opencl.CommandQueue).
Second, two convenient functions are provided that set the arguments
and work sizes in one call:
Run1(com.jme3.opencl.CommandQueue, com.jme3.opencl.Kernel.WorkSize, java.lang.Object...)
and Run2(com.jme3.opencl.CommandQueue, com.jme3.opencl.Kernel.WorkSize, com.jme3.opencl.Kernel.WorkSize, java.lang.Object...).
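The two launch styles lead to the same configuration. The sketch below shows this with a stand-in class that only records what it is given; LaunchStyleSketch is hypothetical, is not com.jme3.opencl.Kernel, and does not talk to any OpenCL device.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Stand-in that records its configuration; it is NOT the real Kernel class. */
final class LaunchStyleSketch {
    private final List<Object> args = new ArrayList<>();
    private int[] globalWorkSize = {};
    private int[] workGroupSize = null; // null: leave the choice to the driver

    void setArg(int index, Object value) {
        while (args.size() <= index) {
            args.add(null);
        }
        args.set(index, value);
    }

    void setGlobalWorkSize(int... size) { globalWorkSize = size.clone(); }

    void setWorkGroupSize(int... size) { workGroupSize = size.clone(); }

    /** Style 1: "launch" with whatever was configured beforehand. */
    String run() {
        return "global=" + Arrays.toString(globalWorkSize)
                + " group=" + Arrays.toString(workGroupSize)
                + " args=" + args;
    }

    /** Style 2: configure everything and "launch" in one call. */
    String runWith(int[] global, int[] group, Object... values) {
        setGlobalWorkSize(global);
        workGroupSize = (group == null) ? null : group.clone();
        args.clear();
        for (int i = 0; i < values.length; i++) {
            setArg(i, values[i]); // argument i of the call becomes kernel argument i
        }
        return run();
    }
}
```

Setting the arguments in advance pays off when most of them stay the same between launches; the one-call style keeps short, one-off launches compact.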
-
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
static final class | Kernel.LocalMem | A placeholder for kernel arguments representing local kernel memory.
static final class | Kernel.LocalMemPerElement | A placeholder for a kernel argument representing local kernel memory per thread.
static final class | Kernel.WorkSize | The work size (global and local) for executing a kernel
Nested classes/interfaces inherited from interface com.jme3.opencl.OpenCLObject
OpenCLObject.ObjectReleaser -
Field Summary
Fields
Modifier and Type | Field | Description
protected final Kernel.WorkSize | globalWorkSize | The current global work size
protected final Kernel.WorkSize | workGroupSize | The current local work size
Fields inherited from class com.jme3.opencl.AbstractOpenCLObject
releaser -
Constructor Summary
Constructors -
Method Summary
Modifier and Type | Method | Description
abstract int | getArgCount() |
abstract long | getMaxWorkGroupSize(Device device) | Returns the maximal work group size when this kernel is executed on the specified device
abstract String | getName() |
 | register() | Registers this object for automatic releasing on garbage collection.
abstract Event | Run(CommandQueue queue) | Launches the kernel with the current global work size, work group size and arguments.
Event | Run1(CommandQueue queue, Kernel.WorkSize globalWorkSize, Object... args) | Sets the work sizes and arguments in one call and launches the kernel.
void | Run1NoEvent(CommandQueue queue, Kernel.WorkSize globalWorkSize, Object... args) | Sets the work sizes and arguments in one call and launches the kernel.
Event | Run2(CommandQueue queue, Kernel.WorkSize globalWorkSize, Kernel.WorkSize workGroupSize, Object... args) | Sets the work sizes and arguments in one call and launches the kernel.
void | Run2NoEvent(CommandQueue queue, Kernel.WorkSize globalWorkSize, Kernel.WorkSize workGroupSize, Object... args) | Sets the work sizes and arguments in one call and launches the kernel.
void | RunNoEvent(CommandQueue queue) | Launches the kernel with the current global work size, work group size and arguments without returning an event object.
abstract void | setArg(int index, byte b) |
abstract void | setArg(int index, double d) |
abstract void | setArg(int index, float f) |
abstract void | setArg(int index, int i) |
abstract void | setArg(int index, long l) |
abstract void | setArg(int index, short s) |
void | |
abstract void | |
abstract void | setArg(int index, Quaternion q) |
abstract void | |
abstract void | |
abstract void | |
abstract void | |
abstract void | setArg(int index, Kernel.LocalMem t) |
abstract void | setArg(int index, Kernel.LocalMemPerElement t) |
void | setArg(int index, Object arg) | Sets the kernel argument at the specified index. The argument must be a known type: LocalMemPerElement, LocalMem, Image, Buffer, byte, short, int, long, float, double, Vector2f, Vector4f, Quaternion, Matrix3f, Matrix4f.
abstract void | setArg(int index, ByteBuffer buffer, long size) | Raw version to set an argument.
void | setGlobalWorkSize(int size) | Sets the global work size to a 1D grid
void | setGlobalWorkSize(int width, int height) | Sets the global work size to be a 2D grid
void | setGlobalWorkSize(int width, int height, int depth) | Sets the global work size to be a 3D grid
void | setGlobalWorkSize(Kernel.WorkSize ws) | Sets the global work size.
void | setWorkGroupSdize(int width, int height, int depth) | Sets the work group size to be a 3D grid
void | setWorkGroupSize(int size) | Sets the work group size to be a 1D grid
void | setWorkGroupSize(int width, int height) | Sets the work group size to be a 2D grid
void | setWorkGroupSize(Kernel.WorkSize ws) | Sets the work group size
void | setWorkGroupSizeToNull() | Tells the driver to figure out the work group size on its own.
String | toString() |
Methods inherited from class com.jme3.opencl.AbstractOpenCLObject
finalize, getReleaser, release
-
Field Details
-
globalWorkSize
The current global work size -
workGroupSize
The current local work size
-
-
Constructor Details
-
Kernel
-
-
Method Details
-
register
Description copied from interface: OpenCLObject
Registers this object for automatic releasing on garbage collection. By default, OpenCLObjects are not registered in the OpenCLObjectManager; you have to release them manually by calling OpenCLObject.release(). Without registering or releasing, a memory leak might occur.
Returns this to allow calls like Buffer buffer = clContext.createBuffer(1024).register();.
- Specified by:
register in interface OpenCLObject
- Overrides:
register in class AbstractOpenCLObject
- Returns:
- this
-
getName
- Returns:
- the name of the kernel as defined in the program source code
-
getArgCount
public abstract int getArgCount()
- Returns:
- the number of arguments
-
getGlobalWorkSize
- Returns:
- the current global work size
-
setGlobalWorkSize
Sets the global work size.
- Parameters:
ws - the work size to set
-
setGlobalWorkSize
public void setGlobalWorkSize(int size)
Sets the global work size to a 1D grid
- Parameters:
size - the size in 1D
-
setGlobalWorkSize
public void setGlobalWorkSize(int width, int height)
Sets the global work size to be a 2D grid
- Parameters:
width - the width
height - the height
-
setGlobalWorkSize
public void setGlobalWorkSize(int width, int height, int depth)
Sets the global work size to be a 3D grid
- Parameters:
width - the width
height - the height
depth - the depth
-
getWorkGroupSize
- Returns:
- the current work group size
-
setWorkGroupSize
Sets the work group size
- Parameters:
ws - the work group size to set
-
setWorkGroupSize
public void setWorkGroupSize(int size)
Sets the work group size to be a 1D grid
- Parameters:
size - the size to set
-
setWorkGroupSize
public void setWorkGroupSize(int width, int height)
Sets the work group size to be a 2D grid
- Parameters:
width - the width
height - the height
-
setWorkGroupSdize
public void setWorkGroupSdize(int width, int height, int depth)
Sets the work group size to be a 3D grid
- Parameters:
width - the width
height - the height
depth - the depth
-
setWorkGroupSizeToNull
public void setWorkGroupSizeToNull()
Tells the driver to figure out the work group size on its own. Use this if you do not rely on a specific work group layout, e.g. because shared memory is not used. Run1(com.jme3.opencl.CommandQueue, com.jme3.opencl.Kernel.WorkSize, java.lang.Object...) implicitly calls this method.
-
getMaxWorkGroupSize
Returns the maximal work group size when this kernel is executed on the specified device
- Parameters:
device - the device
- Returns:
- the maximal work group size
-
setArg
-
setArg
-
setArg
-
setArg
-
setArg
public abstract void setArg(int index, byte b) -
setArg
public abstract void setArg(int index, short s) -
setArg
public abstract void setArg(int index, int i) -
setArg
public abstract void setArg(int index, long l) -
setArg
public abstract void setArg(int index, float f) -
setArg
public abstract void setArg(int index, double d) -
setArg
-
setArg
-
setArg
-
setArg
-
setArg
-
setArg
Raw version to set an argument: size bytes of the provided byte buffer are copied to the kernel argument. The size in bytes must exactly match the argument size as defined in the kernel code. Use this method to send custom structures to the kernel.
- Parameters:
index - the index of the argument
buffer - the raw buffer
size - the size in bytes
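Packing a custom structure for the raw variant comes down to writing its fields into a ByteBuffer in the layout the kernel expects. The sketch below uses only java.nio; the struct { float radius; int count; } is a made-up example, RawArgSketch is a hypothetical helper, and the use of a direct buffer in native byte order is an assumption about what an OpenCL binding typically wants, not something stated above.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Packs a made-up struct { float radius; int count; } into raw bytes. */
final class RawArgSketch {

    /** Two 4-byte fields and no padding: 8 bytes in total. */
    static final int PARAMS_BYTES = Float.BYTES + Integer.BYTES;

    static ByteBuffer pack(float radius, int count) {
        ByteBuffer buf = ByteBuffer.allocateDirect(PARAMS_BYTES)
                .order(ByteOrder.nativeOrder()); // match the host's native endianness
        buf.putFloat(radius).putInt(count);
        buf.flip(); // position 0, limit = number of bytes written
        return buf;
    }
}
```

The resulting buffer and PARAMS_BYTES would then be the buffer and size arguments. Structures that mix field sizes may need explicit padding so that the host layout matches the alignment the OpenCL compiler applies.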
-
setArg
Sets the kernel argument at the specified index.
The argument must be a known type: LocalMemPerElement, LocalMem, Image, Buffer, byte, short, int, long, float, double, Vector2f, Vector4f, Quaternion, Matrix3f, Matrix4f.
Note: Matrix3f and Matrix4f will be mapped to a float16 (row major).
- Parameters:
index - the index of the argument, from 0 to getArgCount()-1
arg - the argument
- Throws:
IllegalArgumentException - if the argument type is not one of the listed ones
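An Object-typed setter has to inspect the runtime type of its argument. The sketch below shows that kind of dispatch for the primitive wrapper types only; ArgDispatchSketch is hypothetical and is not how this class is known to be implemented.

```java
/** Runtime type dispatch over boxed primitives; names are hypothetical. */
final class ArgDispatchSketch {

    /** Returns the size in bytes of a supported value and rejects everything else. */
    static int sizeInBytes(Object arg) {
        if (arg instanceof Byte) return Byte.BYTES;
        if (arg instanceof Short) return Short.BYTES;
        if (arg instanceof Integer) return Integer.BYTES;
        if (arg instanceof Long) return Long.BYTES;
        if (arg instanceof Float) return Float.BYTES;
        if (arg instanceof Double) return Double.BYTES;
        String type = (arg == null) ? "null" : arg.getClass().getName();
        throw new IllegalArgumentException("unsupported argument type: " + type);
    }

    /** Sums the sizes of all values passed as varargs. */
    static int totalBytes(Object... args) {
        int total = 0;
        for (Object arg : args) {
            total += sizeInBytes(arg);
        }
        return total;
    }
}
```

Note that Java boxes literals by their static type: 1 becomes an Integer, 1L a Long, 1f a Float and 1.0 a Double. With an Object or Object... parameter, the literal suffix therefore decides which kernel argument type is sent.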
-
Run
Launches the kernel with the current global work size, work group size and arguments. If the returned event object is not needed and would otherwise be released immediately, RunNoEvent(com.jme3.opencl.CommandQueue) might perform better.
- Parameters:
queue - the command queue
- Returns:
- an event object indicating when the kernel is finished
-
RunNoEvent
Launches the kernel with the current global work size, work group size and arguments without returning an event object. The generated event is released immediately. This improves performance, but there is no way to detect when the kernel execution has finished. If you need to detect that, use Run(com.jme3.opencl.CommandQueue).
- Parameters:
queue - the command queue
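The trade-off between the event and no-event variants resembles keeping or discarding a future. The sketch below uses java.util.concurrent.CompletableFuture purely as an analogy for an event object; EventSketch is hypothetical and involves no OpenCL at all.

```java
import java.util.concurrent.CompletableFuture;

/** Analogy only: a future plays the role of the event object. */
final class EventSketch {

    /** Starts the work asynchronously and returns a handle the caller can wait on. */
    static CompletableFuture<Void> launch(Runnable work) {
        return CompletableFuture.runAsync(work);
    }

    /** Starts the work and discards the handle: the caller cannot tell when it ends. */
    static void launchNoEvent(Runnable work) {
        CompletableFuture.runAsync(work); // handle dropped on purpose
    }

    public static void main(String[] args) {
        int[] result = {0};
        launch(() -> result[0] = 42).join(); // join() blocks until the work is done
        System.out.println("finished with " + result[0]);
    }
}
```

With the handle, the caller can wait before reading results; without it, some other synchronization point is needed before the results can be trusted.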
-
Run1
Sets the work sizes and arguments in one call and launches the kernel. The global work size is set to the specified size. The work group size is automatically determined by the driver. Each object in the argument array is sent to the kernel by setArg(int, java.lang.Object).
- Parameters:
queue - the command queue
globalWorkSize - the global work size
args - the kernel arguments
- Returns:
- an event object indicating when the kernel is finished
-
Run1NoEvent
Sets the work sizes and arguments in one call and launches the kernel. The global work size is set to the specified size. The work group size is automatically determined by the driver. Each object in the argument array is sent to the kernel by setArg(int, java.lang.Object). The generated event is released immediately. This improves performance, but there is no way to detect when the kernel execution has finished. If you need to detect that, use Run1(com.jme3.opencl.CommandQueue, com.jme3.opencl.Kernel.WorkSize, java.lang.Object...).
- Parameters:
queue - the command queue
globalWorkSize - the global work size
args - the kernel arguments
-
Run2
public Event Run2(CommandQueue queue, Kernel.WorkSize globalWorkSize, Kernel.WorkSize workGroupSize, Object... args)
Sets the work sizes and arguments in one call and launches the kernel.
- Parameters:
queue - the command queue
globalWorkSize - the global work size
workGroupSize - the work group size
args - the kernel arguments
- Returns:
- an event object indicating when the kernel is finished
-
Run2NoEvent
public void Run2NoEvent(CommandQueue queue, Kernel.WorkSize globalWorkSize, Kernel.WorkSize workGroupSize, Object... args)
Sets the work sizes and arguments in one call and launches the kernel. The generated event is released immediately. This improves performance, but there is no way to detect when the kernel execution has finished. If you need to detect that, use Run2(com.jme3.opencl.CommandQueue, com.jme3.opencl.Kernel.WorkSize, com.jme3.opencl.Kernel.WorkSize, java.lang.Object...).
- Parameters:
queue - the command queue
globalWorkSize - the global work size
workGroupSize - the work group size
args - the kernel arguments
-
toString
-