Class Kernel
- All Implemented Interfaces:
OpenCLObject
- Direct Known Subclasses:
LwjglKernel
Terminology:
A Kernel is executed in parallel. The total number of parallel threads,
called work items, is specified by the global work size (of type
Kernel.WorkSize
). These threads are organized in a 1-D, 2-D or 3-D grid
(of course, this is only a logical view). Inside each kernel,
the id of each thread (i.e. the index inside this grid) can be requested
by get_global_id(dimension)
with dimension=0,1,2
.
Not all threads can always be executed in parallel because there simply might
not be enough processor cores.
Therefore, the concept of a work group is introduced. The work group
specifies the actual number of threads that are executed in parallel.
The maximal size of it can be queried by Device.getMaxiumWorkItemsPerGroup()
.
Again, the threads inside the work group can be organized in a 1D, 2D or 3D
grid, but this is also just a logical view (specifying how the threads are
indexed).
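To make the relationship between the global work size and the work group size concrete, here is a minimal sketch in plain Java with no OpenCL dependency. The class and method names (WorkSizeMath, groupCount, paddedGlobalSize) are hypothetical and not part of this API. It assumes the global size has to be a multiple of the work group size in each dimension, which OpenCL versions before 2.0 require, so the item count is rounded up and the kernel itself would have to ignore the surplus ids.

```java
/** Hypothetical helper, not part of jME3: arithmetic relating global size and work group size. */
final class WorkSizeMath {

    /** Number of work groups needed to cover the given number of work items (ceiling division). */
    static long groupCount(long items, long groupSize) {
        if (items < 0 || groupSize <= 0) {
            throw new IllegalArgumentException("items must be >= 0 and groupSize > 0");
        }
        return (items + groupSize - 1) / groupSize;
    }

    /** Smallest multiple of groupSize that is >= items; a candidate global work size. */
    static long paddedGlobalSize(long items, long groupSize) {
        return groupCount(items, groupSize) * groupSize;
    }

    public static void main(String[] argv) {
        long items = 1000;     // example problem size
        long groupSize = 64;   // example work group size
        System.out.println(groupCount(items, groupSize));       // prints 16
        System.out.println(paddedGlobalSize(items, groupSize)); // prints 1024
    }
}
```

With 1000 items and groups of 64, the last group is only partly used, so the kernel code would compare get_global_id(0) against the real item count before touching memory.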
The work group is important for another concept: shared memory.
Unlike the normal global or constant memory (passing a Buffer
object
as argument), shared memory can't be set from outside. Shared memory is
allocated by the kernel and is only valid within the kernel. It is used
to quickly share data between threads within a work group.
The size of the shared memory is specified by setting an instance of
Kernel.LocalMem
or Kernel.LocalMemPerElement
as argument.
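The amount of shared memory a launch needs can be estimated on the host before choosing an argument. The sketch below is plain Java; LocalMemMath and its methods are hypothetical names. It assumes that a per-thread request is multiplied by the work group size to obtain the total for the group, and the 32 KiB limit in main is only an example value, not something queried from a device.

```java
/** Hypothetical helper, not part of the jME3 API. */
final class LocalMemMath {

    /** Total bytes for one work group, assuming per-thread size times group size. */
    static long bytesPerGroup(long bytesPerThread, long groupSize) {
        if (bytesPerThread < 0 || groupSize <= 0) {
            throw new IllegalArgumentException("bytesPerThread must be >= 0 and groupSize > 0");
        }
        return Math.multiplyExact(bytesPerThread, groupSize);
    }

    /** True if the per-group requirement does not exceed the available local memory. */
    static boolean fits(long bytesPerThread, long groupSize, long availableBytes) {
        return bytesPerGroup(bytesPerThread, groupSize) <= availableBytes;
    }

    public static void main(String[] argv) {
        long perThread = 4L * Float.BYTES; // four floats per thread = 16 bytes
        long available = 32L * 1024;       // example limit only (assumption)
        System.out.println(bytesPerGroup(perThread, 256));   // prints 4096
        System.out.println(fits(perThread, 256, available)); // prints true
    }
}
```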
Due to heavy register usage or other reasons, a kernel might not be able
to utilize a whole work group. Therefore, the actual number of threads
that can be executed in a work group can be queried by
getMaxWorkGroupSize(com.jme3.opencl.Device)
, which might differ from the
value returned from the Device.
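One way to combine the two limits when picking a 1D work group size is sketched below in plain Java. WorkGroupChooser is a hypothetical name, the two parameters stand for the values that Device.getMaxiumWorkItemsPerGroup() and getMaxWorkGroupSize(device) would return, and rounding down to a power of two is a common heuristic assumed here, not something this class prescribes.

```java
/** Hypothetical helper, not part of the jME3 API. */
final class WorkGroupChooser {

    /**
     * Picks a 1D work group size that respects both limits.
     * deviceMax: limit reported by the device; kernelMax: limit reported for this kernel.
     */
    static long choose(long deviceMax, long kernelMax) {
        if (deviceMax <= 0 || kernelMax <= 0) {
            throw new IllegalArgumentException("limits must be positive");
        }
        long limit = Math.min(deviceMax, kernelMax);
        // Assumed heuristic: round down to the nearest power of two.
        return Long.highestOneBit(limit);
    }

    public static void main(String[] argv) {
        // Example values only; real values would be queried at run time.
        System.out.println(choose(1024, 768)); // prints 512
    }
}
```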
There are two ways to launch a kernel:
First, the arguments and the work sizes can be set in advance
(setArg(index, ...)
, setGlobalWorkSize(...)
and setWorkGroupSize(...)
).
Then a kernel is launched by Run(com.jme3.opencl.CommandQueue)
.
Second, two convenience methods are provided that set the arguments
and work sizes in one call:
Run1(com.jme3.opencl.CommandQueue, com.jme3.opencl.Kernel.WorkSize, java.lang.Object...)
and Run2(com.jme3.opencl.CommandQueue, com.jme3.opencl.Kernel.WorkSize, com.jme3.opencl.Kernel.WorkSize, java.lang.Object...)
.
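The two launch styles can be compared without an OpenCL context by recording what a launch would use. RecordingKernel below is a hypothetical stand-in written for this illustration; it is not com.jme3.opencl.Kernel, and its lower-case methods only mimic the documented behaviour, namely that the one-call variant sets the global size, leaves the work group size to the driver and forwards each argument to setArg by position.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in that only records launch state; it is NOT com.jme3.opencl.Kernel. */
final class RecordingKernel {
    private final List<Object> args = new ArrayList<>();
    private int globalSize;
    private Integer groupSize; // null means: let the driver choose

    void setArg(int index, Object value) {
        while (args.size() <= index) {
            args.add(null); // grow the list so that the index exists
        }
        args.set(index, value);
    }

    void setGlobalWorkSize(int size) { globalSize = size; }
    void setWorkGroupSize(int size) { groupSize = size; }
    void setWorkGroupSizeToNull() { groupSize = null; }

    /** Style 1: launch with whatever was set in advance. Returns a description of the launch. */
    String run() {
        String group = (groupSize == null) ? "driver" : groupSize.toString();
        return "global=" + globalSize + " group=" + group + " args=" + args;
    }

    /** Style 2: set the global size and the arguments in one call, then launch. */
    String run1(int global, Object... values) {
        setGlobalWorkSize(global);
        setWorkGroupSizeToNull();
        for (int i = 0; i < values.length; i++) {
            setArg(i, values[i]);
        }
        return run();
    }

    public static void main(String[] argv) {
        RecordingKernel first = new RecordingKernel();
        first.setArg(0, 2.5f);
        first.setArg(1, 1024);
        first.setGlobalWorkSize(1024);
        first.setWorkGroupSizeToNull();
        System.out.println(first.run());

        RecordingKernel second = new RecordingKernel();
        System.out.println(second.run1(1024, 2.5f, 1024)); // same launch description as above
    }
}
```

The step-by-step style is useful when arguments stay the same across many launches; the one-call style keeps a single launch readable.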
-
Nested Class Summary
Modifier and Type | Class | Description
static final class | Kernel.LocalMem | A placeholder for kernel arguments representing local kernel memory.
static final class | Kernel.LocalMemPerElement | A placeholder for a kernel argument representing local kernel memory per thread.
static final class | Kernel.WorkSize | The work size (global and local) for executing a kernel
Nested classes/interfaces inherited from interface com.jme3.opencl.OpenCLObject
OpenCLObject.ObjectReleaser
-
Field Summary
Modifier and Type | Field | Description
protected final Kernel.WorkSize | globalWorkSize | The current global work size
protected final Kernel.WorkSize | workGroupSize | The current local work size
Fields inherited from class com.jme3.opencl.AbstractOpenCLObject
releaser
-
Constructor Summary
-
Method Summary
Modifier and Type | Method | Description
abstract int | getArgCount() |
Kernel.WorkSize | getGlobalWorkSize() |
abstract long | getMaxWorkGroupSize(Device device) | Returns the maximal work group size when this kernel is executed on the specified device
abstract String | getName() |
Kernel.WorkSize | getWorkGroupSize() |
Kernel | register() | Registers this object for automatic releasing on garbage collection.
abstract Event | Run(CommandQueue queue) | Launches the kernel with the current global work size, work group size and arguments.
Event | Run1(CommandQueue queue, Kernel.WorkSize globalWorkSize, Object... args) | Sets the work sizes and arguments in one call and launches the kernel.
void | Run1NoEvent(CommandQueue queue, Kernel.WorkSize globalWorkSize, Object... args) | Sets the work sizes and arguments in one call and launches the kernel.
Event | Run2(CommandQueue queue, Kernel.WorkSize globalWorkSize, Kernel.WorkSize workGroupSize, Object... args) | Sets the work sizes and arguments in one call and launches the kernel.
void | Run2NoEvent(CommandQueue queue, Kernel.WorkSize globalWorkSize, Kernel.WorkSize workGroupSize, Object... args) | Sets the work sizes and arguments in one call and launches the kernel.
void | RunNoEvent(CommandQueue queue) | Launches the kernel with the current global work size, work group size and arguments without returning an event object.
abstract void | setArg(int index, byte b) |
abstract void | setArg(int index, double d) |
abstract void | setArg(int index, float f) |
abstract void | setArg(int index, int i) |
abstract void | setArg(int index, long l) |
abstract void | setArg(int index, short s) |
void | setArg |
abstract void | setArg |
abstract void | setArg(int index, Quaternion q) |
abstract void | setArg |
abstract void | setArg |
abstract void | setArg |
abstract void | setArg |
abstract void | setArg(int index, Kernel.LocalMem t) |
abstract void | setArg(int index, Kernel.LocalMemPerElement t) |
void | setArg(int index, Object arg) | Sets the kernel argument at the specified index. The argument must be a known type: LocalMemPerElement, LocalMem, Image, Buffer, byte, short, int, long, float, double, Vector2f, Vector4f, Quaternion, Matrix3f, Matrix4f.
abstract void | setArg(int index, ByteBuffer buffer, long size) | Raw version to set an argument.
void | setGlobalWorkSize(int size) | Sets the global work size to a 1D grid
void | setGlobalWorkSize(int width, int height) | Sets the global work size to be a 2D grid
void | setGlobalWorkSize(int width, int height, int depth) | Sets the global work size to be a 3D grid
void | setGlobalWorkSize(Kernel.WorkSize ws) | Sets the global work size.
void | setWorkGroupSdize(int width, int height, int depth) | Sets the work group size to be a 3D grid
void | setWorkGroupSize(int size) | Sets the work group size to be a 1D grid
void | setWorkGroupSize(int width, int height) | Sets the work group size to be a 2D grid
void | setWorkGroupSize(Kernel.WorkSize ws) | Sets the work group size
void | setWorkGroupSizeToNull() | Tells the driver to figure out the work group size on their own.
String | toString() |
Methods inherited from class com.jme3.opencl.AbstractOpenCLObject
finalize, getReleaser, release
-
Field Details
-
globalWorkSize
The current global work size
-
workGroupSize
The current local work size
-
-
Constructor Details
-
Kernel
-
-
Method Details
-
register
Description copied from interface: OpenCLObject
Registers this object for automatic releasing on garbage collection. By default, OpenCLObjects are not registered in the OpenCLObjectManager; you have to release them manually by calling OpenCLObject.release(). Without registering or releasing, a memory leak might occur.
Returns this to allow calls like Buffer buffer = clContext.createBuffer(1024).register();
- Specified by:
register in interface OpenCLObject
- Overrides:
register in class AbstractOpenCLObject
- Returns:
this
-
getName
- Returns:
- the name of the kernel as defined in the program source code
-
getArgCount
public abstract int getArgCount()
- Returns:
- the number of arguments
-
getGlobalWorkSize
- Returns:
- the current global work size
-
setGlobalWorkSize
Sets the global work size.
- Parameters:
ws
- the work size to set
-
setGlobalWorkSize
public void setGlobalWorkSize(int size)
Sets the global work size to a 1D grid
- Parameters:
size
- the size in 1D
-
setGlobalWorkSize
public void setGlobalWorkSize(int width, int height)
Sets the global work size to be a 2D grid
- Parameters:
width
- the width
height
- the height
-
setGlobalWorkSize
public void setGlobalWorkSize(int width, int height, int depth)
Sets the global work size to be a 3D grid
- Parameters:
width
- the width
height
- the height
depth
- the depth
-
getWorkGroupSize
- Returns:
- the current work group size
-
setWorkGroupSize
Sets the work group size
- Parameters:
ws
- the work group size to set
-
setWorkGroupSize
public void setWorkGroupSize(int size)
Sets the work group size to be a 1D grid
- Parameters:
size
- the size to set
-
setWorkGroupSize
public void setWorkGroupSize(int width, int height)
Sets the work group size to be a 2D grid
- Parameters:
width
- the width
height
- the height
-
setWorkGroupSdize
public void setWorkGroupSdize(int width, int height, int depth)
Sets the work group size to be a 3D grid
- Parameters:
width
- the width
height
- the height
depth
- the depth
-
setWorkGroupSizeToNull
public void setWorkGroupSizeToNull()
Tells the driver to figure out the work group size on its own. Use this if you do not rely on a specific work group layout, e.g. because shared memory is not used. Run1(com.jme3.opencl.CommandQueue, com.jme3.opencl.Kernel.WorkSize, java.lang.Object...) implicitly calls this method.
-
getMaxWorkGroupSize
Returns the maximal work group size when this kernel is executed on the specified device
- Parameters:
device
- the device
- Returns:
- the maximal work group size
-
setArg
-
setArg
-
setArg
-
setArg
-
setArg
public abstract void setArg(int index, byte b) -
setArg
public abstract void setArg(int index, short s) -
setArg
public abstract void setArg(int index, int i) -
setArg
public abstract void setArg(int index, long l) -
setArg
public abstract void setArg(int index, float f) -
setArg
public abstract void setArg(int index, double d) -
setArg
-
setArg
-
setArg
-
setArg
-
setArg
-
setArg
Raw version to set an argument. size bytes of the provided byte buffer are copied to the kernel argument. The size in bytes must match exactly the argument size as defined in the kernel code. Use this method to send custom structures to the kernel.
- Parameters:
index
- the index of the argument
buffer
- the raw buffer
size
- the size in bytes
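As an illustration of preparing a custom structure for the raw variant, the sketch below packs a two-field struct into a java.nio.ByteBuffer. The struct layout (a float followed by an int, 8 bytes, no padding) and the names StructPacking, PARAMS_SIZE and packParams are hypothetical. It also assumes that host and device share the same byte order and that a direct buffer is the right kind of buffer for the backend in use.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical example: packs the fields of "struct { float scale; int count; }" for a kernel. */
final class StructPacking {

    /** Size of the assumed struct in bytes: one float plus one int, no padding. */
    static final int PARAMS_SIZE = Float.BYTES + Integer.BYTES;

    static ByteBuffer packParams(float scale, int count) {
        ByteBuffer buf = ByteBuffer.allocateDirect(PARAMS_SIZE)
                .order(ByteOrder.nativeOrder()); // assumption: device uses the host byte order
        buf.putFloat(scale);
        buf.putInt(count);
        buf.flip(); // position = 0, limit = PARAMS_SIZE, ready to be read
        return buf;
    }

    public static void main(String[] argv) {
        ByteBuffer params = packParams(0.5f, 42);
        System.out.println(params.remaining()); // prints 8
    }
}
```

Such a buffer would then be handed to setArg(index, buffer, size) with PARAMS_SIZE as the size. Structs that contain vector types usually carry alignment padding, which the packing code would have to reproduce.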
-
setArg
Sets the kernel argument at the specified index.
The argument must be a known type: LocalMemPerElement, LocalMem, Image, Buffer, byte, short, int, long, float, double, Vector2f, Vector4f, Quaternion, Matrix3f, Matrix4f.
Note: Matrix3f and Matrix4f will be mapped to a float16 (row major).
- Parameters:
index
- the index of the argument, from 0 to getArgCount()-1
arg
- the argument
- Throws:
IllegalArgumentException
- if the argument type is not one of the listed ones
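What "row major" means for the float16 mapping can be shown with plain arrays. The sketch below is not the library's conversion code; RowMajor and its methods are hypothetical names. Placing a 3x3 matrix into the upper-left corner of a zero-filled 4x4 matrix is an assumption made for this illustration, since the note above only states that both matrix types end up as a float16.

```java
import java.util.Arrays;

/** Hypothetical illustration of row-major flattening into 16 floats. */
final class RowMajor {

    /** Flattens a 4x4 matrix so that element (row, col) lands at index row * 4 + col. */
    static float[] flatten4x4(float[][] m) {
        float[] out = new float[16];
        for (int row = 0; row < 4; row++) {
            for (int col = 0; col < 4; col++) {
                out[row * 4 + col] = m[row][col];
            }
        }
        return out;
    }

    /** Assumed layout: the 3x3 values fill the upper-left block, all other entries stay 0. */
    static float[] flatten3x3Padded(float[][] m) {
        float[] out = new float[16]; // Java zero-initialises the array
        for (int row = 0; row < 3; row++) {
            for (int col = 0; col < 3; col++) {
                out[row * 4 + col] = m[row][col];
            }
        }
        return out;
    }

    public static void main(String[] argv) {
        float[][] identity3 = {{1, 0, 0}, {0, 1, 0}, {0, 0, 1}};
        System.out.println(Arrays.toString(flatten3x3Padded(identity3)));
    }
}
```

In kernel code, element (row, col) of such an argument would then be found at component row * 4 + col of the float16.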
-
Run
Launches the kernel with the current global work size, work group size and arguments. If the returned event object is not needed and would otherwise be released immediately, RunNoEvent(com.jme3.opencl.CommandQueue) might give better performance.
- Parameters:
queue
- the command queue
- Returns:
- an event object indicating when the kernel is finished
-
RunNoEvent
Launches the kernel with the current global work size, work group size and arguments without returning an event object. The generated event is directly released. Therefore, the performance is better, but there is no way to detect when the kernel execution has finished. For this purpose, use Run(com.jme3.opencl.CommandQueue).
- Parameters:
queue
- the command queue
-
Run1
Sets the work sizes and arguments in one call and launches the kernel. The global work size is set to the specified size. The work group size is automatically determined by the driver. Each object in the argument array is sent to the kernel by setArg(int, java.lang.Object).
- Parameters:
queue
- the command queue
globalWorkSize
- the global work size
args
- the kernel arguments
- Returns:
- an event object indicating when the kernel is finished
-
Run1NoEvent
Sets the work sizes and arguments in one call and launches the kernel. The global work size is set to the specified size. The work group size is automatically determined by the driver. Each object in the argument array is sent to the kernel by setArg(int, java.lang.Object). The generated event is directly released. Therefore, the performance is better, but there is no way to detect when the kernel execution has finished. For this purpose, use Run1(com.jme3.opencl.CommandQueue, com.jme3.opencl.Kernel.WorkSize, java.lang.Object...).
- Parameters:
queue
- the command queue
globalWorkSize
- the global work size
args
- the kernel arguments
-
Run2
public Event Run2(CommandQueue queue, Kernel.WorkSize globalWorkSize, Kernel.WorkSize workGroupSize, Object... args)
Sets the work sizes and arguments in one call and launches the kernel.
- Parameters:
queue
- the command queue
globalWorkSize
- the global work size
workGroupSize
- the work group size
args
- the kernel arguments
- Returns:
- an event object indicating when the kernel is finished
-
Run2NoEvent
public void Run2NoEvent(CommandQueue queue, Kernel.WorkSize globalWorkSize, Kernel.WorkSize workGroupSize, Object... args)
Sets the work sizes and arguments in one call and launches the kernel. The generated event is directly released. Therefore, the performance is better, but there is no way to detect when the kernel execution has finished. For this purpose, use Run2(com.jme3.opencl.CommandQueue, com.jme3.opencl.Kernel.WorkSize, com.jme3.opencl.Kernel.WorkSize, java.lang.Object...).
- Parameters:
queue
- the command queue
globalWorkSize
- the global work size
workGroupSize
- the work group size
args
- the kernel arguments
-
toString
-