Optimizing on BlackFin (Code Optimization on BlackFin)
Author: yygoing    Source: http://yygoing.spaces.live.com/blog/    Updated: 2008-9-5
1 Function Argument Passing
Three arguments or fewer:
        Passed in R2:0 (R0, R1, R2). The return value is in R0.
More than three arguments:
        first three:    R2:0
        fourth:         [FP+20]        note: a. 0x14 = 20; b. requires LINK/UNLINK to set up FP
        return value:   R0
Function Prototype:
int test(int a, int b, int c)
Parameters Passed as: 
a in R0,
b in R1,
c in R2 
Return Location in R0
int test(char a, char b, char c, char d, char e) 
a in R0,
b in R1,
c in R2,
d in [FP+20],
e in [FP+24] 
Return Location in R0
Details: see the VisualDSP++ Help, under "function arguments, transferring".
2 Optimizing Step by Step
Step 1: Design the Structure of your Function
a. the algorithm of your function
            e.g.: MPEG de-quantization
         data[i] = (coeff[i] * default_intra_matrix[i] * quant2) >> 4;
 dequant_mpeg_intra_c(int16_t * data,
 const int16_t * coeff,
 const uint32_t quant,
 const uint32_t dcscalar,
 const uint16_t * mpeg_quant_matrices)
b. the use of vector operations: two calculations per instruction
 Two (int16_t * int16_t) multiplies
=> R7.L = R7.L * R6.L, R7.H = R7.H * R6.H (IS);
c. the use of other efficient operations, such as the pixel instructions.
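Before writing the assembly, it can help to have a small C model of what the packed instructions are expected to compute, so the optimized routine can be checked against it. The sketch below models one pair of coefficients going through the two vector multiplies and the vector shift. The names `pair16`, `mul_pair_is`, `asr4_pair` and `dequant_pair` are made up for this illustration, and the model assumes that the (IS) half-register multiply saturates its result to 16 bits; check the Programming Reference before relying on that.

```c
#include <stdint.h>

/* Saturate a 32-bit value to the signed 16-bit range. */
static int16_t sat16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Arithmetic shift right by 4 (floor), written without relying on
 * implementation-defined shifts of negative values. */
static int16_t asr4(int16_t v)
{
    return (int16_t)(v >= 0 ? v >> 4 : -((-(int32_t)v + 15) >> 4));
}

/* Hypothetical model of a data register holding two 16-bit halves. */
typedef struct { int16_t l, h; } pair16;

/* Models: Rd.L = Ra.L * Rb.L, Rd.H = Ra.H * Rb.H (IS);
 * Assumption: each 16-bit result saturates. */
static pair16 mul_pair_is(pair16 a, pair16 b)
{
    pair16 r = { sat16((int32_t)a.l * b.l), sat16((int32_t)a.h * b.h) };
    return r;
}

/* Models: Rd = Ra >>> 4 (V); */
static pair16 asr4_pair(pair16 a)
{
    pair16 r = { asr4(a.l), asr4(a.h) };
    return r;
}

/* Two coefficients de-quantized together:
 * (coeff * quant * matrix) >> 4 in each half. */
static pair16 dequant_pair(pair16 coeff, pair16 matrix, int16_t quant)
{
    pair16 q = { quant, quant };
    return asr4_pair(mul_pair_is(mul_pair_is(coeff, q), matrix));
}
```

Each call to `dequant_pair` stands for three vector instructions that handle two coefficients at once, which is where the "two calculations per instruction" saving comes from.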
  Step 2:   Implement the function
      <Note: at this stage you do not have to write parallel instructions yet, but keep them in mind so that the code is easy to parallelize in Step 3.>
  Step 3:   Use the parallel instructions.
         A multi-issue instruction is 64 bits long
   <that is why the code segment uses .ALIGN 8>
  A 64-bit multi-issue instruction =
         one 32-bit ALU/MAC instruction + two 16-bit instructions
  The 16-bit instructions include:
  a. Ireg add, modify, subtract
  b. Load
  c. Store
       Details: ADSP-BF53x BF56x Blackfin Processor Programming Reference.pdf
       提高Blackfin系列DSP中代码的并行性.kdh (Improving Code Parallelism on Blackfin-Series DSPs)

Step 4:   Reorder the instructions to suit the pipeline.
 How to tell whether the pipeline has a conflict:
  a. the Pipeline Viewer
  b. the build messages,
     e.g.:  xx requires one extra cycle.
 LSETUP(DEQNT_INTRA_START, DEQNT_INTRA_END) LC0 = P0;
DEQNT_INTRA_START:
 R7.L = R7.L * R2.H,R7.H = R7.H * R2.H(IS) || [I0++] = R4 || R5 = [I1++];
----------------------------------------------------------------------------------
 R7.L = R7.L * R6.L,R7.H = R7.H * R6.H(IS);
 R6 = R7 >>>4(V) || R4 = [I2++] || NOP;
 R5.L = R5.L * R2.H,R5.H = R5.H * R2.H(IS) || [I0++] = R6 || R7 = [I1++];
----------------------------------------------------------------------------------
 R5.L = R5.L * R4.L,R5.H = R5.H * R4.H(IS) || R6 = [I2++] || NOP;
DEQNT_INTRA_END:R4 = R5 >>>4(V);        
=> after reordering:
 LSETUP(DEQNT_INTRA_START, DEQNT_INTRA_END) LC0 = P0;
DEQNT_INTRA_START:
 R4 = R5 >>>4(V);
 R7.L = R7.L * R6.L,R7.H = R7.H * R6.H(IS) || [I0++] = R4 || R5 = [I1++];
----------------------------------------------------------------------------------
 R6 = R7 >>>4(V)|| R4 = [I2++] || NOP;
 R5.L = R5.L * R2.H,R5.H = R5.H * R2.H(IS) || [I0++] = R6 || R7 = [I1++];
 R7.L = R7.L * R2.H,R7.H = R7.H * R2.H(IS) || R6 = [I2++]; 
 R5.L = R5.L * R4.L,R5.H = R5.H * R4.H(IS) || NOP;
DEQNT_INTRA_END:R4 = R5 >>>4(V);
3 Effective Optimizing Tricks
  Trick 1: Use LINK and UNLINK in pairs.
      A function that takes three or fewer arguments can omit LINK and UNLINK, since all of its arguments arrive in registers.
  Trick 2: Don't let a store/load/Ireg-modify instruction stand alone.
      If one does, you are not using parallel instructions as much as you could.
e.g.:
[I0] = R6;
R0 = 0;
=>
R0 = R0 -|- R0 || [I0] = R6;
  Trick 3: Pay close attention to the instructions inside loops.
  Handling a loop:
     plan 1:    unroll it if the loop count is not very large.
     plan 2:    use the hardware loop.  <How? See 4.0 PROGRAM SEQUENCER.>
  Keep the number of instructions inside loops AS SMALL AS POSSIBLE.
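As a rough illustration of plan 1, here is the same loop written rolled and unrolled by four in C. The function names and the unroll factor are arbitrary choices for this sketch. How much unrolling pays off depends on the loop body and on how well the unrolled instructions can then be paired up.

```c
#include <stdint.h>

/* Straightforward loop: one element per iteration. */
static void add_offset_rolled(int16_t *dst, const int16_t *src, int n, int16_t k)
{
    for (int i = 0; i < n; i++)
        dst[i] = (int16_t)(src[i] + k);
}

/* Unrolled by four: the loop control runs once per four elements,
 * and the four independent statements give more room for pairing. */
static void add_offset_unrolled(int16_t *dst, const int16_t *src, int n, int16_t k)
{
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        dst[i]     = (int16_t)(src[i]     + k);
        dst[i + 1] = (int16_t)(src[i + 1] + k);
        dst[i + 2] = (int16_t)(src[i + 2] + k);
        dst[i + 3] = (int16_t)(src[i + 3] + k);
    }
    /* Remainder when n is not a multiple of four. */
    for (; i < n; i++)
        dst[i] = (int16_t)(src[i] + k);
}
```

When the element count is fixed and a multiple of the unroll factor, as with an 8x8 block of 64 coefficients, the remainder loop can be dropped.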

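  Trick 4: Combine the branches of an "if ... else" as much as possible
      e.g.:
          if (coeff[i] < 0) {
              int32_t level = -coeff[i];
              level = ((2 * level + 1) * inter_matrix[i] * quant) >> 4;
              data[i] = (level <= 2048 ? -level : -2048);
          } else {
              uint32_t level = coeff[i];
              level = ((2 * level + 1) * inter_matrix[i] * quant) >> 4;
              data[i] = (level <= 2047 ? level : 2047);
          }
   The two branches differ only in the sign and in the clamp limit, so compute a sign factor instead of branching:
   if negative    =>   -1
   if positive    =>   +1
   =>
   // R5 = level (two levels, one per half-register)
    R5 = R5 << 2 (V,S);
   // after the shift right by 15, each half is:
   // if negative  1111 1111 1111 1111
   // if positive  0000 0000 0000 0000
    R1 = R5 >>> 15 (V,S);
   // sets bit 0, so the low half becomes -1 or +1; the high half would need bit 16 set as well
    BITSET(R1, 0);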
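The same idea can be tried out in C first. The sketch below puts the branchy fragment next to a combined version that derives a sign factor of -1 or +1 from the coefficient and then runs a single code path. The function names are invented for this example, and the remaining comparison against `limit` is meant to map onto a MIN instruction rather than a jump.

```c
#include <stdint.h>

/* Branchy version, following the C fragment in the text. */
static int16_t dequant_inter_branchy(int16_t coeff, uint16_t matrix, uint32_t quant)
{
    if (coeff < 0) {
        uint32_t level = (uint32_t)(-(int32_t)coeff);
        level = ((2 * level + 1) * matrix * quant) >> 4;
        return (int16_t)(level <= 2048 ? -(int32_t)level : -2048);
    } else {
        uint32_t level = (uint32_t)coeff;
        level = ((2 * level + 1) * matrix * quant) >> 4;
        return (int16_t)(level <= 2047 ? (int32_t)level : 2047);
    }
}

/* Combined version: one code path, with the sign applied at the end. */
static int16_t dequant_inter_combined(int16_t coeff, uint16_t matrix, uint32_t quant)
{
    int32_t  mask  = -(int32_t)(coeff < 0);              /* all ones if negative, else 0 */
    int32_t  sign  = mask | 1;                           /* -1 or +1 */
    uint32_t level = (uint32_t)((int32_t)coeff * sign);  /* |coeff| */
    uint32_t limit = 2047u + (uint32_t)(mask & 1);       /* 2048 if negative, else 2047 */

    level = ((2 * level + 1) * matrix * quant) >> 4;
    if (level > limit)                                   /* a MIN, not a real branch */
        level = limit;
    return (int16_t)(sign * (int32_t)level);
}
```

For example, `dequant_inter_combined(-3, 16, 2)` computes `(7 * 16 * 2) >> 4 = 14` on the magnitude and returns -14, the same as the branchy version.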
  Trick 5:  How to saturate an integer to a width other than 16 or 32 bits
       plan A:    shift left, saturate, then shift right back.
            e.g.:   saturate to <-2048,2047>  2047 = 0b0000 0111 1111 1111 (a 12-bit signed range),
 so shift left by 4 bits with saturation, then arithmetic-shift right by 4 bits.

       plan B:    use MAX( ), MIN( ), MAX( )(V), MIN( )(V).
            e.g.:   saturate to <-2048,2047>
                          MIN(2047,XX);  MAX(-2048,XX);
       plan C:    use saturating adds and subtracts
            e.g.:   saturate to <-2048,2047> on 16-bit values
                          32767_minus_2047 = 32767 - 2047;
                          XX = XX + 32767_minus_2047 (S);
                          XX = XX - 32767_minus_2047 (S);
                          32768_minus_2048 = 32768 - 2048;     (the negative limit is -32768, not -32767)
                          XX = XX - 32768_minus_2048 (S);
                          XX = XX + 32768_minus_2048 (S);
       Which plan should you choose? That depends.
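To see that the three plans agree, they can be modelled in C on 16-bit values. This is only a sketch: `sat16`, `add_s` and `sub_s` are stand-ins for the saturating (S) operations, not real intrinsics, and the function names are invented. Note that the lower-bound offset in plan C is 32768 - 2048 = 30720, which happens to equal 32767 - 2047. An offset of 32767 - 2048 would clamp at -2049 instead of -2048.

```c
#include <stdint.h>

/* Saturate a 32-bit value to the signed 16-bit range. */
static int16_t sat16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Stand-ins for 16-bit add and subtract with the (S) option. */
static int16_t add_s(int16_t a, int16_t b) { return sat16((int32_t)a + b); }
static int16_t sub_s(int16_t a, int16_t b) { return sat16((int32_t)a - b); }

/* Plan A: shift left 4 with saturation, then shift right 4. */
static int16_t clamp12_shift(int16_t x)
{
    int16_t t = sat16((int32_t)x * 16);
    /* A negative t is always a multiple of 16 here, so negating, shifting
     * and negating again matches an arithmetic shift right. */
    return (int16_t)(t >= 0 ? t >> 4 : -((-(int32_t)t) >> 4));
}

/* Plan B: MIN then MAX. */
static int16_t clamp12_minmax(int16_t x)
{
    int16_t t = (int16_t)(x < 2047 ? x : 2047);   /* MIN(2047, x)  */
    return (int16_t)(t > -2048 ? t : -2048);      /* MAX(-2048, t) */
}

/* Plan C: saturating add/sub pairs. */
static int16_t clamp12_addsub(int16_t x)
{
    const int16_t up = 32767 - 2047;   /* 30720: pushes x > 2047 into saturation  */
    const int16_t dn = 32768 - 2048;   /* 30720: pushes x < -2048 into saturation */
    x = add_s(x, up);
    x = sub_s(x, up);                  /* now x <= 2047  */
    x = sub_s(x, dn);
    x = add_s(x, dn);                  /* now x >= -2048 */
    return x;
}
```

Plan B is the easiest to read. Plans A and C use only shifts or adds, which may be easier to fit into free slots next to the surrounding instructions.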
  Trick 6:  The more parallel, vector, and pixel instructions, the better.
  Trick 7:  About alignment.
      Word align:       4 bytes, 32 bits
      Half-word align:  2 bytes, 16 bits
      Byte align:       1 byte, 8 bits

      [I/Preg++] = Rx;        // word store
      W[I/Preg++] = Rx.L;     // half-word store
      B[Preg++] = Rx;         // byte store
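On the C side, a quick check of a buffer's alignment before handing it to an assembly routine can catch misuse early. The helper below is a minimal sketch with an invented name. It assumes that converting a pointer to `uintptr_t` preserves the low address bits, which is the usual case.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if p is aligned to n bytes (n = 4 for word access,
 * 2 for half-word access, 1 for byte access), else 0. */
static int is_aligned(const void *p, size_t n)
{
    return ((uintptr_t)p % n) == 0;
}
```

A routine that uses `[Preg++] = Rx;` would want `is_aligned(buf, 4)` to hold for its buffer, while one that uses `W[Preg++] = Rx.L;` only needs `is_aligned(buf, 2)`.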