Class Kernel

All Implemented Interfaces:
OpenCLObject
Direct Known Subclasses:
LwjglKernel

public abstract class Kernel extends AbstractOpenCLObject
Wrapper for an OpenCL kernel, a piece of executable code on the GPU.

Terminology:
A kernel is executed in parallel. The total number of parallel threads, called work items, is specified by the global work size (of type Kernel.WorkSize). These threads are organized in a 1D, 2D or 3D grid (this is only a logical view). Inside each kernel, the id of each thread (i.e. its index inside this grid) can be requested by get_global_id(dimension) with dimension=0,1,2.
Not all threads can always be executed in parallel because there simply might not be enough processor cores. Therefore, the concept of a work group is introduced. The work group specifies the actual number of threads that are executed in parallel. Its maximal size can be queried by Device.getMaxiumWorkItemsPerGroup(). Again, the threads inside the work group can be organized in a 1D, 2D or 3D grid, but this is also just a logical view (specifying how the threads are indexed). The work group is important for another concept: shared memory. Unlike the normal global or constant memory (set by passing a Buffer object as argument), shared memory can't be set from outside. Shared memory is allocated by the kernel and is only valid within the kernel. It is used to quickly share data between threads within a work group. The size of the shared memory is specified by setting an instance of Kernel.LocalMem or Kernel.LocalMemPerElement as argument.
Due to heavy register usage or other reasons, a kernel might not be able to utilize a whole work group. Therefore, the actual number of threads that can be executed in a work group can be queried by getMaxWorkGroupSize(com.jme3.opencl.Device), which might differ from the value returned from the Device.
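
The relationship between the global grid and the work groups can be illustrated on the host side. The following plain-Java sketch is not part of the jME3 API; the class WorkItemIndex and its methods are hypothetical names chosen for illustration. It models one dimension of the index space and mirrors the identity that holds in OpenCL when no global offset is used: get_global_id(d) == get_group_id(d) * get_local_size(d) + get_local_id(d).

```java
/** Hypothetical helper, not part of com.jme3.opencl: models one dimension of the index space. */
final class WorkItemIndex {

    /** Index of the work group that contains the given global id. */
    static int groupId(int globalId, int localSize) {
        return globalId / localSize;
    }

    /** Index of the work item inside its work group. */
    static int localId(int globalId, int localSize) {
        return globalId % localSize;
    }

    /** Recombines group id and local id into the global id (zero global offset assumed). */
    static int globalId(int groupId, int localId, int localSize) {
        return groupId * localSize + localId;
    }

    public static void main(String[] args) {
        int globalSize = 16; // total number of work items in this dimension
        int localSize = 4;   // work items per work group in this dimension
        for (int gid = 0; gid < globalSize; gid++) {
            System.out.println("global=" + gid
                    + " group=" + groupId(gid, localSize)
                    + " local=" + localId(gid, localSize));
        }
    }
}
```

For a 2D or 3D grid the same arithmetic applies independently to each dimension.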

There are two ways to launch a kernel:
First, the arguments and the work sizes can be set in advance (setArg(index, ...), setGlobalWorkSize(...) and setWorkGroupSize(...)). Then the kernel is launched by Run(com.jme3.opencl.CommandQueue).
Second, two convenience methods are provided that set the arguments and work sizes in one call: Run1(com.jme3.opencl.CommandQueue, com.jme3.opencl.Kernel.WorkSize, java.lang.Object...) and Run2(com.jme3.opencl.CommandQueue, com.jme3.opencl.Kernel.WorkSize, com.jme3.opencl.Kernel.WorkSize, java.lang.Object...).

  • Field Details

    • globalWorkSize

      protected final Kernel.WorkSize globalWorkSize
      The current global work size
    • workGroupSize

      protected final Kernel.WorkSize workGroupSize
      The current local work size
  • Constructor Details

  • Method Details

    • register

      public Kernel register()
      Description copied from interface: OpenCLObject
      Registers this object for automatic releasing on garbage collection. By default, OpenCLObjects are not registered in the OpenCLObjectManager; you have to release them manually by calling OpenCLObject.release(). Without registering or releasing, a memory leak might occur.
      Returns this to allow calls like Buffer buffer = clContext.createBuffer(1024).register();.
      Specified by:
      register in interface OpenCLObject
      Overrides:
      register in class AbstractOpenCLObject
      Returns:
      this
    • getName

      public abstract String getName()
      Returns:
      the name of the kernel as defined in the program source code
    • getArgCount

      public abstract int getArgCount()
      Returns:
      the number of arguments
    • getGlobalWorkSize

      public Kernel.WorkSize getGlobalWorkSize()
      Returns:
      the current global work size
    • setGlobalWorkSize

      public void setGlobalWorkSize(Kernel.WorkSize ws)
      Sets the global work size.
      Parameters:
      ws - the work size to set
    • setGlobalWorkSize

      public void setGlobalWorkSize(int size)
      Sets the global work size to a 1D grid
      Parameters:
      size - the size in 1D
    • setGlobalWorkSize

      public void setGlobalWorkSize(int width, int height)
      Sets the global work size to be a 2D grid
      Parameters:
      width - the width
      height - the height
    • setGlobalWorkSize

      public void setGlobalWorkSize(int width, int height, int depth)
      Sets the global work size to be a 3D grid
      Parameters:
      width - the width
      height - the height
      depth - the depth
    • getWorkGroupSize

      public Kernel.WorkSize getWorkGroupSize()
      Returns:
      the current work group size
    • setWorkGroupSize

      public void setWorkGroupSize(Kernel.WorkSize ws)
      Sets the work group size
      Parameters:
      ws - the work group size to set
    • setWorkGroupSize

      public void setWorkGroupSize(int size)
      Sets the work group size to be a 1D grid
      Parameters:
      size - the size to set
    • setWorkGroupSize

      public void setWorkGroupSize(int width, int height)
      Sets the work group size to be a 2D grid
      Parameters:
      width - the width
      height - the height
    • setWorkGroupSdize

      public void setWorkGroupSdize(int width, int height, int depth)
      Sets the work group size to be a 3D grid
      Parameters:
      width - the width
      height - the height
      depth - the depth
    • setWorkGroupSizeToNull

      public void setWorkGroupSizeToNull()
      Tells the driver to figure out the work group size on its own. Use this if you do not rely on a specific work group layout, e.g. because shared memory is not used. Run1(com.jme3.opencl.CommandQueue, com.jme3.opencl.Kernel.WorkSize, java.lang.Object...) implicitly calls this method.
    • getMaxWorkGroupSize

      public abstract long getMaxWorkGroupSize(Device device)
      Returns the maximal work group size when this kernel is executed on the specified device
      Parameters:
      device - the device
      Returns:
      the maximal work group size
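
      A common host-side pattern is to clamp a desired work group size to the limit reported for this kernel. The sketch below is plain Java and not part of the jME3 API; the class WorkGroupSizeChooser and its method are hypothetical. Rounding down to a power of two is a widespread heuristic assumed here, not something the library requires. The parameter kernelMax stands for the value returned by getMaxWorkGroupSize(device).

```java
/** Hypothetical helper, not part of com.jme3.opencl. */
final class WorkGroupSizeChooser {

    /**
     * Clamps the desired size to the kernel limit and rounds the result
     * down to a power of two (an assumed heuristic, not a library rule).
     */
    static long choose(long desired, long kernelMax) {
        if (desired < 1 || kernelMax < 1) {
            throw new IllegalArgumentException("sizes must be >= 1");
        }
        long clamped = Math.min(desired, kernelMax);
        return Long.highestOneBit(clamped);
    }
}
```

      The result can then be passed to setWorkGroupSize(int) after a narrowing cast.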
    • setArg

      public abstract void setArg(int index, Kernel.LocalMemPerElement t)
    • setArg

      public abstract void setArg(int index, Kernel.LocalMem t)
    • setArg

      public abstract void setArg(int index, Buffer t)
    • setArg

      public abstract void setArg(int index, Image i)
    • setArg

      public abstract void setArg(int index, byte b)
    • setArg

      public abstract void setArg(int index, short s)
    • setArg

      public abstract void setArg(int index, int i)
    • setArg

      public abstract void setArg(int index, long l)
    • setArg

      public abstract void setArg(int index, float f)
    • setArg

      public abstract void setArg(int index, double d)
    • setArg

      public abstract void setArg(int index, Vector2f v)
    • setArg

      public abstract void setArg(int index, Vector4f v)
    • setArg

      public abstract void setArg(int index, Quaternion q)
    • setArg

      public abstract void setArg(int index, Matrix4f mat)
    • setArg

      public void setArg(int index, Matrix3f mat)
    • setArg

      public abstract void setArg(int index, ByteBuffer buffer, long size)
      Raw version to set an argument. size bytes of the provided byte buffer are copied to the kernel argument. The size in bytes must exactly match the argument size as defined in the kernel code. Use this method to send custom structures to the kernel.
      Parameters:
      index - the index of the argument
      buffer - the raw buffer
      size - the size in bytes
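
      As an illustration of sending a custom structure, the following sketch packs a kernel-side struct { float x; float y; int count; } into a byte buffer. The struct layout, the class RawArgPacker and its members are hypothetical. Using a direct buffer in native byte order is an assumption that is typical for native bindings; the text above does not state it. Structs with mixed field sizes may need explicit padding to match the alignment used by the OpenCL compiler.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper, not part of com.jme3.opencl. */
final class RawArgPacker {

    /** Size in bytes of the assumed kernel-side struct: two floats and one int. */
    static final int STRUCT_SIZE = 2 * Float.BYTES + Integer.BYTES;

    /** Writes the three fields in declaration order and prepares the buffer for reading. */
    static ByteBuffer pack(float x, float y, int count) {
        ByteBuffer buffer = ByteBuffer.allocateDirect(STRUCT_SIZE)
                .order(ByteOrder.nativeOrder());
        buffer.putFloat(x).putFloat(y).putInt(count);
        buffer.flip(); // position = 0, limit = STRUCT_SIZE
        return buffer;
    }
}
```

      The packed buffer would then be passed as kernel.setArg(index, buffer, RawArgPacker.STRUCT_SIZE).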
    • setArg

      public void setArg(int index, Object arg)
      Sets the kernel argument at the specified index.
      The argument must be a known type: LocalMemPerElement, LocalMem, Image, Buffer, byte, short, int, long, float, double, Vector2f, Vector4f, Quaternion, Matrix3f, Matrix4f.
      Note: Matrix3f and Matrix4f will be mapped to a float16 (row major).
      Parameters:
      index - the index of the argument, from 0 to getArgCount()-1
      arg - the argument
      Throws:
      IllegalArgumentException - if the argument type is not one of the listed ones
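
      To make the row-major note concrete: in a row-major float16, the element at (row, col) of a 4x4 matrix is stored at index row * 4 + col. The sketch below uses a plain float[][] as a stand-in for Matrix4f; the class RowMajor is hypothetical and this is not the library's implementation. How a Matrix3f is padded to 16 floats is not stated above, so the sketch covers only the 4x4 case.

```java
/** Hypothetical helper, not part of com.jme3.opencl. */
final class RowMajor {

    /** Flattens a 4x4 matrix given as m[row][col] into 16 floats in row-major order. */
    static float[] flatten4x4(float[][] m) {
        if (m.length != 4) {
            throw new IllegalArgumentException("expected 4 rows");
        }
        float[] out = new float[16];
        for (int row = 0; row < 4; row++) {
            if (m[row].length != 4) {
                throw new IllegalArgumentException("expected 4 columns in row " + row);
            }
            for (int col = 0; col < 4; col++) {
                out[row * 4 + col] = m[row][col];
            }
        }
        return out;
    }
}
```

      On the kernel side, element (row, col) would accordingly be read from component row * 4 + col of the float16 argument.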
    • Run

      public abstract Event Run(CommandQueue queue)
      Launches the kernel with the current global work size, work group size and arguments. If the returned event object is not needed and would otherwise be released immediately, RunNoEvent(com.jme3.opencl.CommandQueue) may perform better.
      Parameters:
      queue - the command queue
      Returns:
      an event object indicating when the kernel is finished
    • RunNoEvent

      public void RunNoEvent(CommandQueue queue)
      Launches the kernel with the current global work size, work group size and arguments without returning an event object. The generated event is released immediately. This improves performance, but there is no way to detect when the kernel execution has finished. If you need that, use Run(com.jme3.opencl.CommandQueue).
      Parameters:
      queue - the command queue
    • Run1

      public Event Run1(CommandQueue queue, Kernel.WorkSize globalWorkSize, Object... args)
      Sets the work sizes and arguments in one call and launches the kernel. The global work size is set to the specified size. The work group size is automatically determined by the driver. Each object in the argument array is sent to the kernel by setArg(int, java.lang.Object).
      Parameters:
      queue - the command queue
      globalWorkSize - the global work size
      args - the kernel arguments
      Returns:
      an event object indicating when the kernel is finished
    • Run1NoEvent

      public void Run1NoEvent(CommandQueue queue, Kernel.WorkSize globalWorkSize, Object... args)
      Sets the work sizes and arguments in one call and launches the kernel. The global work size is set to the specified size. The work group size is automatically determined by the driver. Each object in the argument array is sent to the kernel by setArg(int, java.lang.Object). The generated event is released immediately. This improves performance, but there is no way to detect when the kernel execution has finished. If you need that, use Run1(com.jme3.opencl.CommandQueue, com.jme3.opencl.Kernel.WorkSize, java.lang.Object...).
      Parameters:
      queue - the command queue
      globalWorkSize - the global work size
      args - the kernel arguments
    • Run2

      public Event Run2(CommandQueue queue, Kernel.WorkSize globalWorkSize, Kernel.WorkSize workGroupSize, Object... args)
      Sets the work sizes and arguments in one call and launches the kernel.
      Parameters:
      queue - the command queue
      globalWorkSize - the global work size
      workGroupSize - the work group size
      args - the kernel arguments
      Returns:
      an event object indicating when the kernel is finished
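
      When an explicit work group size is given, OpenCL 1.x requires the global work size to be a multiple of it in every dimension. A usual approach is to round the number of elements up to the next multiple and let surplus work items return early in the kernel (for example by comparing get_global_id(0) against the element count). The sketch below shows the rounding in plain Java; the class GlobalSizeRounding is hypothetical and not part of the jME3 API.

```java
/** Hypothetical helper, not part of com.jme3.opencl. */
final class GlobalSizeRounding {

    /** Rounds workItems up to the next multiple of groupSize. */
    static long roundUp(long workItems, long groupSize) {
        if (workItems < 0 || groupSize <= 0) {
            throw new IllegalArgumentException("workItems must be >= 0 and groupSize > 0");
        }
        long groups = (workItems + groupSize - 1) / groupSize; // ceiling division
        return groups * groupSize;
    }
}
```

      The rounded value would be used as the global work size and groupSize as the work group size of the Run2 call.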
    • Run2NoEvent

      public void Run2NoEvent(CommandQueue queue, Kernel.WorkSize globalWorkSize, Kernel.WorkSize workGroupSize, Object... args)
      Sets the work sizes and arguments in one call and launches the kernel. The generated event is released immediately. This improves performance, but there is no way to detect when the kernel execution has finished. If you need that, use Run2(com.jme3.opencl.CommandQueue, com.jme3.opencl.Kernel.WorkSize, com.jme3.opencl.Kernel.WorkSize, java.lang.Object...).
      Parameters:
      queue - the command queue
      globalWorkSize - the global work size
      workGroupSize - the work group size
      args - the kernel arguments
    • toString

      public String toString()
      Overrides:
      toString in class Object