public abstract class Kernel extends AbstractOpenCLObject
Terminology:
A Kernel is executed in parallel. The total number of parallel threads,
called work items, is specified by the global work size (of type
Kernel.WorkSize). These threads are organized in a 1D, 2D or 3D grid
(this is only a logical view). Inside each kernel, the id of each thread
(i.e. its index inside this grid) can be requested with
get_global_id(dimension), where dimension is 0, 1 or 2.
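To make the grid view concrete, here is a minimal sketch in plain Java (no OpenCL calls) of how a 2D global id is commonly mapped to a position in a flat buffer. The class and method names are hypothetical and not part of the jME3 API, and the row-by-row layout is an assumption: a kernel is free to index its data differently.

```java
/** Hypothetical helper (not part of jME3): index arithmetic for a logical work-item grid. */
final class GlobalIdSketch {
    private GlobalIdSketch() {}

    /** Total number of work items for a 1D, 2D or 3D global work size. */
    static long totalWorkItems(long... globalWorkSize) {
        long total = 1;
        for (long size : globalWorkSize) {
            total *= size;
        }
        return total;
    }

    /**
     * Flat buffer index for the work item at (x, y), where x plays the role of
     * get_global_id(0) and y the role of get_global_id(1).
     * Assumes row-by-row storage (an illustration, not a rule).
     */
    static int linearIndex(int x, int y, int width) {
        return y * width + x;
    }

    /** Inverse of linearIndex: returns {x, y} for a flat index. */
    static int[] gridPosition(int index, int width) {
        return new int[] {index % width, index / width};
    }
}
```

For an 8 x 4 grid, totalWorkItems(8, 4) gives 32 work items, and the work item at (3, 2) maps to flat index 19.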
Not all threads can always be executed in parallel because there simply might
not be enough processor cores.
Therefore, the concept of a work group is introduced. The work group
specifies the actual number of threads that are executed in parallel.
Its maximum size can be queried with Device.getMaxiumWorkItemsPerGroup().
Again, the threads inside the work group can be organized in a 1D, 2D or 3D
grid, but this is also just a logical view (specifying how the threads are
indexed).
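The relation between global ids and work groups can be sketched with a little arithmetic. The helper below is plain Java with hypothetical names (it is not part of jME3) and covers one dimension only. It assumes no global offset, and the rounding step assumes that the global work size has to be a multiple of the work group size, which many OpenCL implementations require.

```java
/** Hypothetical helper (not part of jME3): relates global ids to work groups in one dimension. */
final class WorkGroupMath {
    private WorkGroupMath() {}

    /** Smallest multiple of groupSize that is greater than or equal to globalSize. */
    static long roundUpToMultiple(long globalSize, long groupSize) {
        return ((globalSize + groupSize - 1) / groupSize) * groupSize;
    }

    /** Number of work groups needed to cover globalSize work items. */
    static long groupCount(long globalSize, long groupSize) {
        return (globalSize + groupSize - 1) / groupSize;
    }

    /** Index of the work group a work item belongs to (what get_group_id would report). */
    static long groupId(long globalId, long groupSize) {
        return globalId / groupSize;
    }

    /** Index of a work item inside its work group (what get_local_id would report). */
    static long localId(long globalId, long groupSize) {
        return globalId % groupSize;
    }
}
```

With 1000 elements and a work group size of 64, the padded global size is 1024, split into 16 work groups; the work item with global id 130 is thread 2 of work group 2. A kernel launched with a padded size has to skip the ids beyond the real element count itself.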
The work group is important for another concept: shared memory.
Unlike the normal global or constant memory (passed as a Buffer object
argument), shared memory can't be set from outside. Shared memory is
allocated by the kernel and is only valid within the kernel. It is used
to quickly share data between threads within a work group.
The size of the shared memory is specified by setting an instance of
Kernel.LocalMem or Kernel.LocalMemPerElement as argument.
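As a rough illustration of the difference between a fixed-size and a per-thread request, the sketch below computes how many bytes of shared memory a work group would need when the size is given per thread. It is plain Java with hypothetical names. That a per-thread request scales with the number of threads in the work group is an assumption drawn from the class description ("local kernel memory per thread"), not from the implementation.

```java
/** Hypothetical helper (not part of jME3): sizes a per-thread shared memory request. */
final class LocalMemSizing {
    private LocalMemSizing() {}

    /** Number of threads in a 1D, 2D or 3D work group. */
    static long workItemsPerGroup(long... workGroupSize) {
        long count = 1;
        for (long size : workGroupSize) {
            count *= size;
        }
        return count;
    }

    /** Total bytes for one work group when each thread asks for bytesPerWorkItem (assumed scaling). */
    static long perElementBytes(long bytesPerWorkItem, long... workGroupSize) {
        return bytesPerWorkItem * workItemsPerGroup(workGroupSize);
    }
}
```

One float (4 bytes) per thread in a 16 x 16 work group comes to 1024 bytes under this assumption.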
Due to heavy register usage or other reasons, a kernel might not be able
to utilize a whole work group. Therefore, the actual number of threads
that can be executed in a work group can be queried by
getMaxWorkGroupSize(com.jme3.opencl.Device), which might differ from the
value returned from the Device.
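Because the two limits can differ, a caller that picks the work group size by hand has to respect the smaller one. The following is a minimal sketch in plain Java with hypothetical names (not part of jME3). Rounding down to a power of two is a design choice of this sketch, not something the API demands.

```java
/** Hypothetical helper (not part of jME3): picks a work group size within both limits. */
final class WorkGroupChoice {
    private WorkGroupChoice() {}

    /** Largest power of two that does not exceed limit. */
    static long largestPowerOfTwoAtMost(long limit) {
        if (limit < 1) {
            throw new IllegalArgumentException("limit must be at least 1, got " + limit);
        }
        return Long.highestOneBit(limit);
    }

    /**
     * Picks a size that fits both the device-wide limit and the kernel-specific limit.
     * Rounding down to a power of two is this sketch's own choice.
     */
    static long choose(long deviceMax, long kernelMax) {
        return largestPowerOfTwoAtMost(Math.min(deviceMax, kernelMax));
    }
}
```

Here deviceMax stands for the value reported by the Device and kernelMax for the value returned by getMaxWorkGroupSize for that device. With limits of 1024 and 192, the sketch settles on 128.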
There are two ways to launch a kernel:
First, the arguments and the work sizes can be set in advance
(setArg(index, ...), setGlobalWorkSize(...) and setWorkGroupSize(...)).
The kernel is then launched with Run(com.jme3.opencl.CommandQueue).
Second, two convenience functions set the arguments and work sizes in
one call:
Run1(com.jme3.opencl.CommandQueue, com.jme3.opencl.Kernel.WorkSize, java.lang.Object...)
and Run2(com.jme3.opencl.CommandQueue, com.jme3.opencl.Kernel.WorkSize, com.jme3.opencl.Kernel.WorkSize, java.lang.Object...).
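The relationship between the two launch styles can be sketched with a stand-in class that only records calls. This is not the jME3 Kernel: all names are hypothetical, work sizes are plain long arrays instead of Kernel.WorkSize, and no command queue is involved. The order in which the one-call variants invoke the setters is an assumption. This page states only that Run1 implicitly calls setWorkGroupSizeToNull() and that the arguments go through setArg(int, java.lang.Object).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical stand-in for a kernel (not the jME3 class): it only records
 * which setters a launch goes through, so both launch styles can be compared.
 */
final class LaunchSketch {
    /** Calls in the order they were made. */
    final List<String> calls = new ArrayList<>();

    void setArg(int index, Object arg) {
        calls.add("setArg(" + index + ", " + arg + ")");
    }

    void setGlobalWorkSize(long... sizes) {
        calls.add("setGlobalWorkSize(" + Arrays.toString(sizes) + ")");
    }

    void setWorkGroupSize(long... sizes) {
        calls.add("setWorkGroupSize(" + Arrays.toString(sizes) + ")");
    }

    void setWorkGroupSizeToNull() {
        calls.add("setWorkGroupSizeToNull()");
    }

    /** Stands in for Run: launches with whatever was set before. */
    void run() {
        calls.add("Run");
    }

    /** Stands in for Run1: global size and arguments in one call, group size left to the driver. */
    void run1(long[] globalWorkSize, Object... args) {
        setGlobalWorkSize(globalWorkSize);
        setWorkGroupSizeToNull();
        setAllArgs(args);
        run();
    }

    /** Stands in for Run2: like run1, but with an explicit work group size. */
    void run2(long[] globalWorkSize, long[] workGroupSize, Object... args) {
        setGlobalWorkSize(globalWorkSize);
        setWorkGroupSize(workGroupSize);
        setAllArgs(args);
        run();
    }

    /** Passes each argument at its position in the array (assumed call order). */
    private void setAllArgs(Object... args) {
        for (int i = 0; i < args.length; i++) {
            setArg(i, args[i]);
        }
    }
}
```

Calling run1(new long[] {1024}, "buf", 2.0f) records the same kind of calls a caller would otherwise make by hand before Run, which is the convenience the one-call variants offer.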
See Also: Program.createKernel(java.lang.String)
Modifier and Type | Class and Description |
---|---|
static class | Kernel.LocalMem: A placeholder for kernel arguments representing local kernel memory. |
static class | Kernel.LocalMemPerElement: A placeholder for a kernel argument representing local kernel memory per thread. |
static class | Kernel.WorkSize: The work size (global and local) for executing a kernel |
OpenCLObject.ObjectReleaser
Modifier and Type | Field and Description |
---|---|
protected Kernel.WorkSize | globalWorkSize: The current global work size |
protected Kernel.WorkSize | workGroupSize: The current local work size |
releaser
Modifier | Constructor and Description |
---|---|
protected | Kernel(OpenCLObject.ObjectReleaser releaser) |
Modifier and Type | Method and Description |
---|---|
abstract int | getArgCount() |
Kernel.WorkSize | getGlobalWorkSize() |
abstract long | getMaxWorkGroupSize(Device device): Returns the maximal work group size when this kernel is executed on the specified device |
abstract java.lang.String | getName() |
Kernel.WorkSize | getWorkGroupSize() |
Kernel | register(): Registers this object for automatic releasing on garbage collection. |
abstract Event | Run(CommandQueue queue): Launches the kernel with the current global work size, work group size and arguments. |
Event | Run1(CommandQueue queue, Kernel.WorkSize globalWorkSize, java.lang.Object... args): Sets the work sizes and arguments in one call and launches the kernel. |
void | Run1NoEvent(CommandQueue queue, Kernel.WorkSize globalWorkSize, java.lang.Object... args): Sets the work sizes and arguments in one call and launches the kernel. |
Event | Run2(CommandQueue queue, Kernel.WorkSize globalWorkSize, Kernel.WorkSize workGroupSize, java.lang.Object... args): Sets the work sizes and arguments in one call and launches the kernel. |
void | Run2NoEvent(CommandQueue queue, Kernel.WorkSize globalWorkSize, Kernel.WorkSize workGroupSize, java.lang.Object... args): Sets the work sizes and arguments in one call and launches the kernel. |
void | RunNoEvent(CommandQueue queue): Launches the kernel with the current global work size, work group size and arguments without returning an event object. |
abstract void | setArg(int index, Buffer t) |
abstract void | setArg(int index, byte b) |
abstract void | setArg(int index, java.nio.ByteBuffer buffer, long size): Raw version to set an argument. |
abstract void | setArg(int index, double d) |
abstract void | setArg(int index, float f) |
abstract void | setArg(int index, Image i) |
abstract void | setArg(int index, int i) |
abstract void | setArg(int index, Kernel.LocalMem t) |
abstract void | setArg(int index, Kernel.LocalMemPerElement t) |
abstract void | setArg(int index, long l) |
void | setArg(int index, Matrix3f mat) |
abstract void | setArg(int index, Matrix4f mat) |
void | setArg(int index, java.lang.Object arg): Sets the kernel argument at the specified index. The argument must be a known type: LocalMemPerElement, LocalMem, Image, Buffer, byte, short, int, long, float, double, Vector2f, Vector4f, Quaternion, Matrix3f, Matrix4f. |
abstract void | setArg(int index, Quaternion q) |
abstract void | setArg(int index, short s) |
abstract void | setArg(int index, Vector2f v) |
abstract void | setArg(int index, Vector4f v) |
void | setGlobalWorkSize(int size): Sets the global work size to be a 1D grid |
void | setGlobalWorkSize(int width, int height): Sets the global work size to be a 2D grid |
void | setGlobalWorkSize(int width, int height, int depth): Sets the global work size to be a 3D grid |
void | setGlobalWorkSize(Kernel.WorkSize ws): Sets the global work size. |
void | setWorkGroupSdize(int width, int height, int depth): Sets the work group size to be a 3D grid |
void | setWorkGroupSize(int size): Sets the work group size to be a 1D grid |
void | setWorkGroupSize(int width, int height): Sets the work group size to be a 2D grid |
void | setWorkGroupSize(Kernel.WorkSize ws): Sets the work group size |
void | setWorkGroupSizeToNull(): Tells the driver to figure out the work group size on its own. |
java.lang.String | toString() |
finalize, getReleaser, release
protected final Kernel.WorkSize globalWorkSize
protected final Kernel.WorkSize workGroupSize
protected Kernel(OpenCLObject.ObjectReleaser releaser)
public Kernel register()
Registers this object for automatic releasing on garbage collection.
Unless an OpenCLObject is registered with the OpenCLObjectManager, you have
to release it manually by calling OpenCLObject.release().
Without registering or releasing, a memory leak might occur.
Returns this to allow calls like
Buffer buffer = clContext.createBuffer(1024).register();
Specified by: register in interface OpenCLObject
Overrides: register in class AbstractOpenCLObject
Returns: this
public abstract java.lang.String getName()
public abstract int getArgCount()
public Kernel.WorkSize getGlobalWorkSize()
public void setGlobalWorkSize(Kernel.WorkSize ws)
Parameters:
ws - the work size to set
public void setGlobalWorkSize(int size)
Parameters:
size - the size in 1D
public void setGlobalWorkSize(int width, int height)
Parameters:
width - the width
height - the height
public void setGlobalWorkSize(int width, int height, int depth)
Parameters:
width - the width
height - the height
depth - the depth
public Kernel.WorkSize getWorkGroupSize()
public void setWorkGroupSize(Kernel.WorkSize ws)
Parameters:
ws - the work group size to set
public void setWorkGroupSize(int size)
Parameters:
size - the size to set
public void setWorkGroupSize(int width, int height)
Parameters:
width - the width
height - the height
public void setWorkGroupSdize(int width, int height, int depth)
Parameters:
width - the width
height - the height
depth - the depth
public void setWorkGroupSizeToNull()
Run1(com.jme3.opencl.CommandQueue, com.jme3.opencl.Kernel.WorkSize, java.lang.Object...) implicitly calls this method.
public abstract long getMaxWorkGroupSize(Device device)
Parameters:
device - the device
public abstract void setArg(int index, Kernel.LocalMemPerElement t)
public abstract void setArg(int index, Kernel.LocalMem t)
public abstract void setArg(int index, Buffer t)
public abstract void setArg(int index, Image i)
public abstract void setArg(int index, byte b)
public abstract void setArg(int index, short s)
public abstract void setArg(int index, int i)
public abstract void setArg(int index, long l)
public abstract void setArg(int index, float f)
public abstract void setArg(int index, double d)
public abstract void setArg(int index, Vector2f v)
public abstract void setArg(int index, Vector4f v)
public abstract void setArg(int index, Quaternion q)
public abstract void setArg(int index, Matrix4f mat)
public void setArg(int index, Matrix3f mat)
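The raw setArg(int, java.nio.ByteBuffer, long) overload takes the argument bytes exactly as the kernel declares them. As a minimal sketch, assume a hypothetical kernel-side struct { float x; float y; int id; } of 12 bytes; the struct, the class and the method names below are illustrations, not part of jME3. Using a direct buffer in native byte order is also an assumption about what the underlying OpenCL binding expects.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper (not part of jME3): packs a small struct for the raw setArg overload. */
final class StructPacking {
    /** Byte size of the assumed kernel-side struct { float x; float y; int id; }. */
    static final int PARTICLE_BYTES = 3 * 4;

    private StructPacking() {}

    /** Writes the three fields in declaration order and rewinds the buffer for reading. */
    static ByteBuffer packParticle(float x, float y, int id) {
        ByteBuffer buffer = ByteBuffer.allocateDirect(PARTICLE_BYTES)
                .order(ByteOrder.nativeOrder()); // assumed: host and device agree on byte order
        buffer.putFloat(x).putFloat(y).putInt(id);
        buffer.flip(); // position 0, limit 12
        return buffer;
    }
}
```

The result would then be handed over along the lines of kernel.setArg(index, buffer, StructPacking.PARTICLE_BYTES). Structs with mixed field sizes may need padding to match the kernel's layout, which this sketch does not cover.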
public abstract void setArg(int index, java.nio.ByteBuffer buffer, long size)
Raw version to set an argument.
size bytes of the provided byte buffer are copied to the kernel
argument. The size in bytes must match exactly the argument size
as defined in the kernel code.
Use this method to send custom structures to the kernel.
Parameters:
index - the index of the argument
buffer - the raw buffer
size - the size in bytes
public void setArg(int index, java.lang.Object arg)
Sets the kernel argument at the specified index.
The argument must be a known type:
LocalMemPerElement, LocalMem, Image, Buffer, byte, short, int,
long, float, double, Vector2f, Vector4f, Quaternion, Matrix3f, Matrix4f.
Matrix arguments are mapped to a float16 (row major).
Parameters:
index - the index of the argument, from 0 to getArgCount()-1
arg - the argument
Throws:
java.lang.IllegalArgumentException - if the argument type is not one of the listed ones
public abstract Event Run(CommandQueue queue)
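Launches the kernel with the current global work size, work group size
and arguments. If the returned event is not needed,
RunNoEvent(com.jme3.opencl.CommandQueue) might bring a better performance.
Parameters:
queue - the command queue
See Also:
setGlobalWorkSize(com.jme3.opencl.Kernel.WorkSize),
setWorkGroupSize(com.jme3.opencl.Kernel.WorkSize),
setArg(int, java.lang.Object)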
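public void RunNoEvent(CommandQueue queue)
Launches the kernel with the current global work size, work group size
and arguments without returning an event object. To detect when the
kernel execution has finished, use Run(com.jme3.opencl.CommandQueue).
Parameters:
queue - the command queue
See Also:
setGlobalWorkSize(com.jme3.opencl.Kernel.WorkSize),
setWorkGroupSize(com.jme3.opencl.Kernel.WorkSize),
setArg(int, java.lang.Object)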
public Event Run1(CommandQueue queue, Kernel.WorkSize globalWorkSize, java.lang.Object... args)
Sets the work sizes and arguments in one call and launches the kernel.
Each argument is passed to the kernel by setArg(int, java.lang.Object).
Parameters:
queue - the command queue
globalWorkSize - the global work size
args - the kernel arguments
See Also:
Run2(com.jme3.opencl.CommandQueue, com.jme3.opencl.Kernel.WorkSize, com.jme3.opencl.Kernel.WorkSize, java.lang.Object...)
public void Run1NoEvent(CommandQueue queue, Kernel.WorkSize globalWorkSize, java.lang.Object... args)
Sets the work sizes and arguments in one call and launches the kernel.
Each argument is passed to the kernel by setArg(int, java.lang.Object).
The generated event is directly released. Therefore, the performance
is better, but there is no way to detect when the kernel execution
has finished. For this purpose, use
Run1(com.jme3.opencl.CommandQueue, com.jme3.opencl.Kernel.WorkSize, java.lang.Object...).
Parameters:
queue - the command queue
globalWorkSize - the global work size
args - the kernel arguments
See Also:
Run2(com.jme3.opencl.CommandQueue, com.jme3.opencl.Kernel.WorkSize, com.jme3.opencl.Kernel.WorkSize, java.lang.Object...)
public Event Run2(CommandQueue queue, Kernel.WorkSize globalWorkSize, Kernel.WorkSize workGroupSize, java.lang.Object... args)
Sets the work sizes and arguments in one call and launches the kernel.
Parameters:
queue - the command queue
globalWorkSize - the global work size
workGroupSize - the work group size
args - the kernel arguments
public void Run2NoEvent(CommandQueue queue, Kernel.WorkSize globalWorkSize, Kernel.WorkSize workGroupSize, java.lang.Object... args)
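Sets the work sizes and arguments in one call and launches the kernel
without returning an event object. To detect when the kernel execution
has finished, use
Run2(com.jme3.opencl.CommandQueue, com.jme3.opencl.Kernel.WorkSize, com.jme3.opencl.Kernel.WorkSize, java.lang.Object...).
Parameters:
queue - the command queue
globalWorkSize - the global work size
workGroupSize - the work group size
args - the kernel arguments
public java.lang.String toString()
Overrides: toString in class java.lang.Object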