Discussion:
[PATCH] D36327: [OpenCL] Allow targets emit optimized pipe functions for power of 2 type sizes
(too old to reply)
Yaxun Liu via Phabricator via cfe-commits
2017-08-04 16:51:49 UTC
Permalink
yaxunl created this revision.
Herald added a subscriber: tpr.

Currently Clang emits call of __read_pipe_2 or __read_pipe_4 for OpenCL read_pipe builtin,
with appended type size and alignment arguments, where 2 or 4 indicates the original
number of arguments.

For certain targets (e.g. amdgpu), there are optimized version of __read_pipe_2/__read_pipe_4
when the type size and alignment has the same power of 2 value. It is desired that Clang
emits a different function for these cases.

This patch let Clang emits __read_pipe_2_N for such cases where N is the size in bytes of
the type. (N = 1,2,4,8,...,128), so that the target runtime can use an optimized version of
read_pipe.

The same with __read_pipe_4, __write_pipe_2 and __wirte_pipe_4.

This optimization is controlled by TargetCodeGenInfo::hasOptimizedPipeBuiltin, which returns
false by default. Each target can override this function to turn on this optimization.


https://reviews.llvm.org/D36327

Files:
lib/CodeGen/CGBuiltin.cpp
lib/CodeGen/TargetInfo.cpp
lib/CodeGen/TargetInfo.h
test/CodeGenOpenCL/pipe_builtin.cl
Alexey Bader via Phabricator via cfe-commits
2017-08-07 10:27:19 UTC
Permalink
bader added a comment.

Hi Sam,

What do you think about implementing this optimization in target specific optimization pass? Since size/alignment is saved as function parameter in LLVM IR, the optimization can be done in target specific components w/o adding additional conditions to generic library.

Thanks,
Alexey


https://reviews.llvm.org/D36327
Yaxun Liu via Phabricator via cfe-commits
2017-08-07 15:36:52 UTC
Permalink
yaxunl added a comment.
Post by Alexey Bader via Phabricator via cfe-commits
Hi Sam,
What do you think about implementing this optimization in target specific optimization pass? Since size/alignment is saved as function parameter in LLVM IR, the optimization can be done in target specific components w/o adding additional conditions to generic library.
Thanks,
Alexey
Hi Alexey,

The optimization of the power-of-2 type size is implemented as a library function. Our backend lacks the capability to link in library code at ISA level, so linking of the optimized library function has to be done before any target-specific passes. It seems the only place to do this is Clang codegen since Clang/llvm does not support target-specific pre-linking passes.


https://reviews.llvm.org/D36327
Anastasia Stulova via Phabricator via cfe-commits
2017-08-07 17:00:43 UTC
Permalink
Anastasia added a comment.
Post by Yaxun Liu via Phabricator via cfe-commits
Post by Alexey Bader via Phabricator via cfe-commits
Hi Sam,
What do you think about implementing this optimization in target specific optimization pass? Since size/alignment is saved as function parameter in LLVM IR, the optimization can be done in target specific components w/o adding additional conditions to generic library.
Thanks,
Alexey
Hi Alexey,
The optimization of the power-of-2 type size is implemented as a library function. Our backend lacks the capability to link in library code at ISA level, so linking of the optimized library function has to be done before any target-specific passes. It seems the only place to do this is Clang codegen since Clang/llvm does not support target-specific pre-linking passes.
My general feeling is that it doesn't look like a generic enough change for the frontend. Even though it is implemented in a generic way, not every target might have a special support for the power of 2 size and also if there is such a support not every implementation would handle it as a library function. But I can see that perhaps LLVM is missing flexibility in the flow to accommodate these needs. Any change we could try to extend the compilation flow such that this target specific optimization could happen before the IR linking?


https://reviews.llvm.org/D36327
Brian Sumner via Phabricator via cfe-commits
2017-08-08 13:14:46 UTC
Permalink
b-sumner added a comment.
Post by Anastasia Stulova via Phabricator via cfe-commits
Post by Yaxun Liu via Phabricator via cfe-commits
Post by Alexey Bader via Phabricator via cfe-commits
Hi Sam,
What do you think about implementing this optimization in target specific optimization pass? Since size/alignment is saved as function parameter in LLVM IR, the optimization can be done in target specific components w/o adding additional conditions to generic library.
Thanks,
Alexey
Hi Alexey,
The optimization of the power-of-2 type size is implemented as a library function. Our backend lacks the capability to link in library code at ISA level, so linking of the optimized library function has to be done before any target-specific passes. It seems the only place to do this is Clang codegen since Clang/llvm does not support target-specific pre-linking passes.
My general feeling is that it doesn't look like a generic enough change for the frontend. Even though it is implemented in a generic way, not every target might have a special support for the power of 2 size and also if there is such a support not every implementation would handle it as a library function. But I can see that perhaps LLVM is missing flexibility in the flow to accommodate these needs. Any change we could try to extend the compilation flow such that this target specific optimization could happen before the IR linking?
It is trivial to implement the small number of specialized functions this patch adds in terms of the general one if desired, and the general one can continue to be handled as it had been.

We had actually proposed a patch (sorry I don't have the reference handy) to add general mechanism for targets to introduce pre-link passes, but it was not accepted. We can try again, but I don't really expect more progress.


https://reviews.llvm.org/D36327
Anastasia Stulova via Phabricator via cfe-commits
2017-08-08 18:03:49 UTC
Permalink
Anastasia added a comment.
Post by Brian Sumner via Phabricator via cfe-commits
Post by Anastasia Stulova via Phabricator via cfe-commits
Post by Yaxun Liu via Phabricator via cfe-commits
Post by Alexey Bader via Phabricator via cfe-commits
Hi Sam,
What do you think about implementing this optimization in target specific optimization pass? Since size/alignment is saved as function parameter in LLVM IR, the optimization can be done in target specific components w/o adding additional conditions to generic library.
Thanks,
Alexey
Hi Alexey,
The optimization of the power-of-2 type size is implemented as a library function. Our backend lacks the capability to link in library code at ISA level, so linking of the optimized library function has to be done before any target-specific passes. It seems the only place to do this is Clang codegen since Clang/llvm does not support target-specific pre-linking passes.
My general feeling is that it doesn't look like a generic enough change for the frontend. Even though it is implemented in a generic way, not every target might have a special support for the power of 2 size and also if there is such a support not every implementation would handle it as a library function. But I can see that perhaps LLVM is missing flexibility in the flow to accommodate these needs. Any change we could try to extend the compilation flow such that this target specific optimization could happen before the IR linking?
It is trivial to implement the small number of specialized functions this patch adds in terms of the general one if desired, and the general one can continue to be handled as it had been.
We had actually proposed a patch (sorry I don't have the reference handy) to add general mechanism for targets to introduce pre-link passes, but it was not accepted. We can try again, but I don't really expect more progress.
It would be nice to understand why it has not been accepted and whether we could try to argument using this case as an example. It seems like a useful feature for toolchains with the IR linking.


https://reviews.llvm.org/D36327
Yaxun Liu via Phabricator via cfe-commits
2017-08-08 18:28:27 UTC
Permalink
yaxunl added a comment.
Post by Anastasia Stulova via Phabricator via cfe-commits
Post by Brian Sumner via Phabricator via cfe-commits
Post by Anastasia Stulova via Phabricator via cfe-commits
Post by Yaxun Liu via Phabricator via cfe-commits
Post by Alexey Bader via Phabricator via cfe-commits
Hi Sam,
What do you think about implementing this optimization in target specific optimization pass? Since size/alignment is saved as function parameter in LLVM IR, the optimization can be done in target specific components w/o adding additional conditions to generic library.
Thanks,
Alexey
Hi Alexey,
The optimization of the power-of-2 type size is implemented as a library function. Our backend lacks the capability to link in library code at ISA level, so linking of the optimized library function has to be done before any target-specific passes. It seems the only place to do this is Clang codegen since Clang/llvm does not support target-specific pre-linking passes.
My general feeling is that it doesn't look like a generic enough change for the frontend. Even though it is implemented in a generic way, not every target might have a special support for the power of 2 size and also if there is such a support not every implementation would handle it as a library function. But I can see that perhaps LLVM is missing flexibility in the flow to accommodate these needs. Any change we could try to extend the compilation flow such that this target specific optimization could happen before the IR linking?
It is trivial to implement the small number of specialized functions this patch adds in terms of the general one if desired, and the general one can continue to be handled as it had been.
We had actually proposed a patch (sorry I don't have the reference handy) to add general mechanism for targets to introduce pre-link passes, but it was not accepted. We can try again, but I don't really expect more progress.
It would be nice to understand why it has not been accepted and whether we could try to argument using this case as an example. It seems like a useful feature for toolchains with the IR linking.
The original review is here:

https://reviews.llvm.org/D20682

To cite the reason why it was rejected:

"I fundamentally do not believe that the TargetMachine should be involved in fixing language semantics issues with a "pre linking" pass at the LLVM level. Why? Because there is nothing "target" about this.

There needs to be a fundamentally more principled way of handling this at the language and frontend level IMO."

However, until now, we could not find "a fundamentally more principled way of handling this at the language and frontend level".


https://reviews.llvm.org/D36327
Alexey Bader via Phabricator via cfe-commits
2017-08-08 18:46:20 UTC
Permalink
bader added a comment.

@rsmith do you have an opinion on what would be the right place for the kind of proposed optimization?
It looks like it can be implemented as target independent optimization, acting only for target with specified properties - in this case target must provide required built-in functions.


https://reviews.llvm.org/D36327
Yaxun Liu via Phabricator via cfe-commits
2017-08-11 14:11:19 UTC
Permalink
yaxunl added a comment.

John, do you have any comments? Thanks.


https://reviews.llvm.org/D36327
John McCall via Phabricator via cfe-commits
2017-08-12 02:56:23 UTC
Permalink
rjmccall added a comment.

Could you just implement this in SimplifyLibCalls? I assume there's some way to fill in TargetLibraryInfo appropriately for a platform. Is that too late for your linking requirements?


https://reviews.llvm.org/D36327
Yaxun Liu via Phabricator via cfe-commits
2017-08-14 14:02:46 UTC
Permalink
yaxunl added a comment.
Post by John McCall via Phabricator via cfe-commits
Could you just implement this in SimplifyLibCalls? I assume there's some way to fill in TargetLibraryInfo appropriately for a platform. Is that too late for your linking requirements?
Both the optimized and generic versions of __read_pipe function contains call of other library functions and are complicate enough not to be generated programmatically. amdgpu target does not have the capability to link in library code after LLVM codegen. The linking has to be done before SimplifyLibCalls.


https://reviews.llvm.org/D36327
Alexey Bader via Phabricator via cfe-commits
2017-08-14 14:43:30 UTC
Permalink
bader added a comment.
Post by Yaxun Liu via Phabricator via cfe-commits
Post by John McCall via Phabricator via cfe-commits
Could you just implement this in SimplifyLibCalls? I assume there's some way to fill in TargetLibraryInfo appropriately for a platform. Is that too late for your linking requirements?
Both the optimized and generic versions of __read_pipe function contains call of other library functions and are complicate enough not to be generated programmatically. amdgpu target does not have the capability to link in library code after LLVM codegen. The linking has to be done before SimplifyLibCalls.
If I understand correctly, SimplifyLibCalls is LLVM IR transformation, so it works before linking and LLVM codegen (e.g. InstCombine passes run this transformation). This pass is doing something similar to what you are trying to achieve for __read_pipe builti-ins: pow(2.0, x) -> llvm.exp2(x).


https://reviews.llvm.org/D36327
Yaxun Liu via Phabricator via cfe-commits
2017-08-14 15:45:25 UTC
Permalink
yaxunl added a comment.
Post by Alexey Bader via Phabricator via cfe-commits
Post by Yaxun Liu via Phabricator via cfe-commits
Post by John McCall via Phabricator via cfe-commits
Could you just implement this in SimplifyLibCalls? I assume there's some way to fill in TargetLibraryInfo appropriately for a platform. Is that too late for your linking requirements?
Both the optimized and generic versions of __read_pipe function contains call of other library functions and are complicate enough not to be generated programmatically. amdgpu target does not have the capability to link in library code after LLVM codegen. The linking has to be done before SimplifyLibCalls.
If I understand correctly, SimplifyLibCalls is LLVM IR transformation, so it works before linking and LLVM codegen (e.g. InstCombine passes run this transformation). This pass is doing something similar to what you are trying to achieve for __read_pipe builti-ins: pow(2.0, x) -> llvm.exp2(x).
Thanks. I will take a look.


https://reviews.llvm.org/D36327
Yaxun Liu via Phabricator via cfe-commits
2017-09-08 15:33:00 UTC
Permalink
yaxunl abandoned this revision.
yaxunl added a comment.

We implemented this optimization through some target specific llvm pass.


https://reviews.llvm.org/D36327

Loading...