Porting uclinux-2008R1-RC8 (bf561) to VDSP5 (38): cache and spinlock
Author: 快乐虾  Source: http://blog.csdn.net/lights_joy  Updated: 2008-5-22
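In the earlier implementation, spinlock was built directly on adi_acquire_lock. With cache disabled this worked without any problem, but once icache was enabled the code fell into an infinite loop.
In VDSP5, adi_acquire_lock is implemented in ccblkfn.h as follows: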
#pragma inline
#pragma always_inline
static void adi_acquire_lock(testset_t *_t) {
    int tVal;
    csync();
#ifdef __WORKAROUND_L2_TESTSET_STALL
    tVal = __builtin_testset_05000248((char *) _t);
#else
    tVal = __builtin_testset((char *) _t);
#endif
    while (tVal == 0) {
        csync();
#ifdef __WORKAROUND_L2_TESTSET_STALL
        tVal = __builtin_testset_05000248((char *) _t);
#else
        tVal = __builtin_testset((char *) _t);
#endif
    }
}
The VDSP5 documentation describes __WORKAROUND_L2_TESTSET_STALL like this:
Enables workaround for the anomaly 05-00-0248: “TestSet operation causes stall of the other core.” The avoidance is enforced by the compiler automatically issuing a write to an L2-defined variable immediately following a TESTSET instruction. This is done as part of the code generated for a __builtin_testset() call.
The compiler defines the macro __WORKAROUND_L2_TESTSET_STALL at the compile, assembly, and link build stages when this workaround is enabled.
The VisualDSP++ Silicon Anomaly Support documentation describes anomaly 05-00-0248 in more detail.
Next, look at the assembly code generated for __builtin_testset():
P1 = [FP + -40];
P0.L = 0x2C;
P0.H = 0xfeb0;
TESTSET (P1);
[P0] = P0;
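With cache disabled, this code works without any problem. With cache enabled, however, when the PC reaches TESTSET (P1) and before the instruction executes, the byte at [P1] has already become 0x80 (an effect of the pipeline). Because the byte is no longer zero, TESTSET then returns 0, and the loop in adi_acquire_lock never terminates.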
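The failure mode described above can be reproduced with a small host-side model. This is only a sketch: `model_testset` and `model_acquire` are hypothetical names, the model is single-threaded and not atomic, and it assumes the TESTSET semantics described in the Blackfin instruction reference (report whether the byte was zero, then set its most significant bit).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of TESTSET: returns 1 if the byte was zero
 * (CC set), 0 otherwise, and always sets bit 7 of the byte.
 * Plain C, not atomic; for illustration only. */
static int model_testset(uint8_t *lock_byte)
{
    assert(lock_byte != NULL);
    int was_zero = (*lock_byte == 0);
    *lock_byte |= 0x80;
    return was_zero;
}

/* Same loop shape as adi_acquire_lock, but bounded by max_tries so the
 * model terminates. Returns the number of attempts needed, or -1 if the
 * lock was never obtained within max_tries attempts. */
static int model_acquire(uint8_t *lock_byte, int max_tries)
{
    int tries = 1;
    int tVal = model_testset(lock_byte);
    while (tVal == 0 && tries < max_tries) {
        tVal = model_testset(lock_byte);
        tries++;
    }
    return tVal ? tries : -1;
}
```

Starting from a byte of 0, the first attempt succeeds and leaves 0x80 behind. Starting from a byte that already reads 0x80, as in the cache-enabled case, no attempt ever succeeds, which matches the infinite loop observed in adi_acquire_lock.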
Reading the TESTSET documentation again more carefully, it contains this passage:
The software designer is responsible for executing atomic operations in the proper cacheable / non-cacheable memory space. Typically, these operations should execute in non-cacheable, off-core memory. In a chip implementation that requires tight temporal coupling between processors or processes, the design should implement a dedicated, non-cacheable block of memory that meets the data latency requirements of the system.
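I had somehow never noticed this before.
Once the cause is known, the first idea is to turn the cache off, but that would severely reduce system performance. Another approach is to stall the pipeline just before the TESTSET instruction so that it cannot run ahead. I tried the following code: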
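/* Spin until TESTSET on *_t succeeds. A csync is issued before every
 * testset so that the pipeline is drained before the lock byte is tested. */
static inline void uclinux_acquire_lock(testset_t *_t) {
    csync();
    asm("\
    p0 = %0; \
    csync; \
    testset (p0); \
    if cc jump .out; \
.loop: \
    csync; \
    testset (p0); \
    if !cc jump .loop; \
.out: \
    nop;\
    " : : "d"(_t)
      : "P0", "memory"  /* p0 is overwritten and the lock byte is written */
    );
    /* The asm also changes CC; declare that clobber too if the toolchain
     * requires it. */
}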
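For comparison, the same acquire pattern can be sketched on a host machine with C11 atomics. This is an analogue, not the Blackfin code: `sketch_lock_t`, `SKETCH_LOCK_INIT`, `sketch_try_acquire`, `sketch_acquire` and `sketch_release` are hypothetical names, and the ordering that csync enforces by hand in the code above is expressed here through the memory-order arguments instead.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical host-side lock type standing in for testset_t. */
typedef struct { atomic_flag taken; } sketch_lock_t;
#define SKETCH_LOCK_INIT { ATOMIC_FLAG_INIT }

/* One test-and-set attempt. Returns 1 if the lock was free and is now
 * held by the caller, 0 if it was already taken. */
static int sketch_try_acquire(sketch_lock_t *l)
{
    assert(l != NULL);
    /* atomic_flag_test_and_set returns the previous value of the flag. */
    return !atomic_flag_test_and_set_explicit(&l->taken, memory_order_acquire);
}

/* Spin until an attempt succeeds, like the retry loop around testset. */
static void sketch_acquire(sketch_lock_t *l)
{
    while (!sketch_try_acquire(l)) {
        /* busy-wait */
    }
}

/* Clear the flag so that the next test-and-set succeeds. */
static void sketch_release(sketch_lock_t *l)
{
    assert(l != NULL);
    atomic_flag_clear_explicit(&l->taken, memory_order_release);
}
```

A caller would declare `sketch_lock_t l = SKETCH_LOCK_INIT;`, call `sketch_acquire(&l)` before the critical section and `sketch_release(&l)` after it.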
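Because the uclinux kernel does not actually use L2 memory here, this code does not insert the L2 read/write that the documentation describes. It only inserts a csync instruction before each testset instruction. Here is what csync does: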
Use CSYNC to enforce a strict execution sequence on loads and stores or to conclude all transitional core states before reconfiguring the core modes. For example, issue CSYNC before configuring memory-mapped registers (MMRs). CSYNC should also be issued after stores to MMRs to make sure the data reaches the MMR before the next instruction is fetched.
Typically, the Blackfin processor executes all load instructions strictly in the order that they are issued and all store instructions in the order that they are issued. However, for performance reasons, the architecture relaxes ordering between load and store operations. It usually allows load operations to access memory out of order with respect to store operations. Further, it usually allows loads to access memory speculatively. The core may later cancel or restart speculative loads. By using the Core Synchronize or System Synchronize instructions and managing interrupts appropriately, you can restrict out-of-order and speculative behavior.
After testing, this fixed the problem!