Eclipse's Partitioned String-Sharing Optimization Mechanism

Author: starshus  Editor: 阿雪  2007-05-21 11:17

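For a type that needs to be optimized, such as SaveManager, it is enough to implement the IStringPoolParticipant interface and, when called, to submit the strings that it and its child elements want optimized. The child elements do not even need to implement IStringPoolParticipant; they only have to pass the submission down level by level, for example:

Code: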

//
// org.eclipse.core.internal.resources.SaveManager
//

public class SaveManager implements ..., IStringPoolParticipant
{
    protected ElementTree lastSnap;
    public void shareStrings(StringPool pool)
    {
        lastSnap.shareStrings(pool);
    }
}
//
// org.eclipse.core.internal.watson.ElementTree
//
public class ElementTree
{
    protected DeltaDataTree tree;
    public void shareStrings(StringPool set) {
        tree.storeStrings(set);
    }
}
//
// org.eclipse.core.internal.dtree.DeltaDataTree
//
public class DeltaDataTree extends AbstractDataTree
{
    private AbstractDataTreeNode rootNode;
    private DeltaDataTree parent;
    public void storeStrings(StringPool set) {
        //copy field to protect against concurrent changes
        AbstractDataTreeNode root = rootNode;
        DeltaDataTree dad = parent;
        if (root != null)
            root.storeStrings(set);
        if (dad != null)
            dad.storeStrings(set);
    }
}
//
// org.eclipse.core.internal.dtree.AbstractDataTreeNode
//
public abstract class AbstractDataTreeNode
{
    protected AbstractDataTreeNode children[];
    protected String name;
    public void storeStrings(StringPool set) {
        name = set.add(name);
        //copy children pointer in case of concurrent modification
        AbstractDataTreeNode[] nodes = children;
        if (nodes != null)
            for (int i = nodes.length; --i >= 0;)
                nodes[i].storeStrings(set);
    }
}
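
The delegation chain above can be reduced to a minimal, self-contained sketch. SketchPool, SketchParticipant, SketchNode and SketchManager are hypothetical stand-ins invented for illustration, not the Eclipse classes; the point is only that a single registered participant is enough, while the child nodes simply forward the pool.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for org.eclipse.core.runtime.StringPool.
class SketchPool {
    private final Map<String, String> map = new HashMap<>();

    // Returns the canonical instance for the given content.
    String add(String s) {
        if (s == null)
            return null;
        String existing = map.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }
}

// Hypothetical stand-in for IStringPoolParticipant.
interface SketchParticipant {
    void shareStrings(SketchPool pool);
}

// A child element: it does not implement the participant interface,
// it only offers a method that the parent forwards the pool to.
class SketchNode {
    String name;
    final List<SketchNode> children = new ArrayList<>();

    SketchNode(String name) {
        this.name = name;
    }

    void storeStrings(SketchPool pool) {
        name = pool.add(name); // swap in the canonical instance
        for (SketchNode child : children)
            child.storeStrings(pool);
    }
}

// The only registered participant; it forwards the pool to its tree.
class SketchManager implements SketchParticipant {
    final SketchNode root;

    SketchManager(SketchNode root) {
        this.root = root;
    }

    @Override
    public void shareStrings(SketchPool pool) {
        root.storeStrings(pool);
    }
}

class DelegationDemo {
    public static void main(String[] args) {
        // new String(...) forces two distinct objects with equal content.
        SketchNode root = new SketchNode(new String("src"));
        SketchNode child = new SketchNode(new String("src"));
        root.children.add(child);

        System.out.println(root.name == child.name); // false: two objects
        new SketchManager(root).shareStrings(new SketchPool());
        System.out.println(root.name == child.name); // true: one shared object
    }
}
```

After the pass both nodes reference the same String object, so the duplicate becomes eligible for garbage collection even though the pool itself is thrown away.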

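Every string to be optimized is submitted through the StringPool.add method to one unified string pool. The role of this pool differs somewhat from that of the JVM-level string table: it only serves as a temporary tidying-up stage while a string pool partition is being optimized, and is not itself an entry point for obtaining string references. Its implementation is therefore just a thin wrapper around HashMap that also keeps a rough tally of the space the optimization saves, as a measure of its effectiveness.

Code: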

//
// org.eclipse.core.runtime.StringPool
//

public final class StringPool {
    private int savings;
    private final HashMap map = new HashMap();
    public StringPool() {
        super();
    }
    public String add(String string) {
        if (string == null)
            return string;
        Object result = map.get(string);
        if (result != null) {
            // equal content but a different object: a duplicate can be dropped
            if (result != string)
                savings += 44 + 2 * string.length();
            return (String) result;
        }
        map.put(string, string);
        return string;
    }
    // Returns a rough estimate of how much space the optimization saves
    public int getSavedStringCount() {
        return savings;
    }
}

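This estimate can be inaccurate in some cases, however. Suppose the pool already contains a string S1, and a string S2 with the same content but a different identity (a distinct object) is submitted. If S2 is submitted more than once, its savings are counted each time and the effect of the optimization is overestimated. If an exact value is needed, the class can be refactored to track each optimized string in a Set, at some cost in efficiency.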
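
One possible shape for such a refactoring is sketched below. PreciseStringPool is a hypothetical class written for this illustration, not part of Eclipse; it assumes that an identity-based set of already counted duplicates is an acceptable price for an exact count.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical variant of the pool above; not part of Eclipse.
class PreciseStringPool {
    private final Map<String, String> map = new HashMap<>();
    // Identity-based set: remembers which duplicate objects were already counted.
    private final Set<String> counted =
            Collections.newSetFromMap(new IdentityHashMap<String, Boolean>());
    private int savings;

    String add(String string) {
        if (string == null)
            return null;
        String result = map.get(string);
        if (result != null) {
            // Count a duplicate instance only the first time it is seen.
            if (result != string && counted.add(string))
                savings += 44 + 2 * string.length();
            return result;
        }
        map.put(string, string);
        return string;
    }

    int getSavings() {
        return savings;
    }
}

class PreciseStringPoolDemo {
    public static void main(String[] args) {
        PreciseStringPool pool = new PreciseStringPool();
        String s1 = new String("abc");
        String s2 = new String("abc"); // same content, different object
        pool.add(s1);
        pool.add(s2);
        pool.add(s2); // submitted again: not counted a second time
        System.out.println(pool.getSavings()); // 50, i.e. 44 + 2 * 3
    }
}
```

The extra set keeps every duplicate reachable until the pool is discarded and adds one more lookup per duplicate, which is the efficiency cost mentioned above.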

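Now that we have seen how strings are submitted for optimization and how they are processed once submitted, let us look at how the Eclipse core ties the two together.

As mentioned earlier, the Workspace.open method calls InternalPlatform.addStringPoolParticipant, which adds the root node of a string pool partition to a global queue of optimization tasks.

Code: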

//
// org.eclipse.core.internal.runtime.InternalPlatform
//

public final class InternalPlatform {
    private StringPoolJob stringPoolJob;
    public void addStringPoolParticipant(IStringPoolParticipant participant, ISchedulingRule rule) {
        if (stringPoolJob == null)
            stringPoolJob = new StringPoolJob(); // Singleton pattern: created lazily on first use
        stringPoolJob.addStringPoolParticipant(participant, rule);
    }
}
//
// org.eclipse.core.internal.runtime.StringPoolJob
//

public class StringPoolJob extends Job
{
    private static final long INITIAL_DELAY = 10000;//ten seconds
    private Map participants = Collections.synchronizedMap(new HashMap(10));
    public void addStringPoolParticipant(IStringPoolParticipant participant, ISchedulingRule rule) {
        participants.put(participant, rule);
        // if the job was waiting, wake it up after the initial delay
        if (sleep())
            wakeUp(INITIAL_DELAY);
    }
    public void removeStringPoolParticipant(IStringPoolParticipant participant) {
        participants.remove(participant);
    }
}

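At an appropriate time, this job performs the sharing optimization for every registered partition.

The StringPoolJob type is where the partition job's code lives; underneath, it is built on Eclipse's job scheduling mechanism. Readers interested in Eclipse job scheduling can refer to the article On the Job: The Eclipse Jobs API by Michael Valenta (IBM).

What matters here is that a Job in Eclipse is scheduled as an asynchronous background task and is executed, once its time has come and its resources are available, through a call to its Job.run method. A Job is much like a thread, except that it is scheduled based on conditions and can be run on a background thread pool. Those conditions are, on the one hand, the job's own scheduling time and, on the other, the resource dependencies expressed through the ISchedulingRule interface. If a job conflicts with a job that is currently running, it is suspended until the conflict is resolved. ISchedulingRule instances can themselves be combined using the composite pattern to describe complex dependencies between jobs.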
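
The conflict test and its composite form can be illustrated without the Eclipse runtime. The sketch below is a simplified model: SketchRule, PathRule and CompositeRule are hypothetical names, and only an isConflicting check is modelled, whereas the real ISchedulingRule also defines a contains method.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified stand-in for ISchedulingRule.
interface SketchRule {
    boolean isConflicting(SketchRule other);
}

// Leaf rule: two path rules conflict when one path contains the other.
final class PathRule implements SketchRule {
    private final String path;

    PathRule(String path) {
        this.path = path;
    }

    @Override
    public boolean isConflicting(SketchRule other) {
        if (other instanceof PathRule) {
            String o = ((PathRule) other).path;
            return isPrefix(path, o) || isPrefix(o, path);
        }
        // Let the composite decide, so the relation stays symmetric.
        return other.isConflicting(this);
    }

    private static boolean isPrefix(String parent, String child) {
        return child.equals(parent) || child.startsWith(parent + "/");
    }
}

// Composite rule: conflicts as soon as any member conflicts.
final class CompositeRule implements SketchRule {
    private final List<SketchRule> members;

    CompositeRule(SketchRule... members) {
        this.members = Arrays.asList(members);
    }

    @Override
    public boolean isConflicting(SketchRule other) {
        for (SketchRule member : members)
            if (member.isConflicting(other))
                return true;
        return false;
    }
}

class RuleDemo {
    public static void main(String[] args) {
        SketchRule projA = new PathRule("/projA");
        SketchRule srcA = new PathRule("/projA/src");
        SketchRule projB = new PathRule("/projB");
        SketchRule both = new CompositeRule(projA, projB);

        System.out.println(projA.isConflicting(srcA));  // true: nested paths
        System.out.println(projA.isConflicting(projB)); // false: unrelated paths
        System.out.println(srcA.isConflicting(both));   // true: via the /projA member
    }
}
```

A job holding the combined rule is therefore held back by any running job that touches one of its member resources, which is the behaviour the string pool job relies on when it merges the rules of all its participants.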

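The StringPoolJob.run method, which does the actual work, merges the scheduling rules of all string pool partitions so that, once those rules allow it, it can call StringPoolJob.shareStrings to carry out the optimization.

Code: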

//
// org.eclipse.core.internal.runtime.StringPoolJob
//

public class StringPoolJob extends Job
{
    private static final long RESCHEDULE_DELAY = 300000;//five minutes
    // note: the lastDuration field used below (duration of the previous pass)
    // is neither declared nor updated in this excerpt
    protected IStatus run(IProgressMonitor monitor)
    {
        //copy current participants to handle concurrent additions and removals to map
        Map.Entry[] entries = (Map.Entry[]) participants.entrySet().toArray(new Map.Entry[0]);
        ISchedulingRule[] rules = new ISchedulingRule[entries.length];
        IStringPoolParticipant[] toRun = new IStringPoolParticipant[entries.length];
        for (int i = 0; i < toRun.length; i++) {
            toRun[i] = (IStringPoolParticipant) entries[i].getKey();
            rules[i] = (ISchedulingRule) entries[i].getValue();
        }
        // merge the scheduling rules of all string pool partitions
        final ISchedulingRule rule = MultiRule.combine(rules);
        // call shareStrings to run the optimization once the rule allows it
        try {
            Platform.getJobManager().beginRule(rule, monitor); // blocks until the rule is available
            shareStrings(toRun, monitor);
        } finally {
            Platform.getJobManager().endRule(rule);
        }
        // reschedule this job so that the next optimization pass takes place
        long scheduleDelay = Math.max(RESCHEDULE_DELAY, lastDuration*100);
        schedule(scheduleDelay);
        return Status.OK_STATUS;
    }
}
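
The rescheduling formula at the end of run is easier to judge with concrete numbers. The sketch below copies only that formula into a hypothetical RescheduleSketch class; reading it as a cap of roughly one percent on the time the job may consume is an interpretation of the factor 100, not something stated by the Eclipse source shown here.

```java
class RescheduleSketch {
    static final long RESCHEDULE_DELAY = 300000; // five minutes, as in the excerpt

    // Same formula as in run(): wait at least five minutes, or 100 times
    // the duration of the previous pass if that is longer.
    static long nextDelay(long lastDurationMillis) {
        return Math.max(RESCHEDULE_DELAY, lastDurationMillis * 100);
    }

    public static void main(String[] args) {
        System.out.println(nextDelay(200));  // 300000: short pass, the floor applies
        System.out.println(nextDelay(5000)); // 500000: a 5 s pass waits 500 s
    }
}
```

With this rule a pass that took d milliseconds is followed by at least 100 * d milliseconds of idle time, so a slow pass on a large workspace automatically makes the job run less often.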

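The StringPoolJob.shareStrings method simply iterates over all partitions, calls IStringPoolParticipant.shareStrings on the root node of each to perform the optimization described above, and finally returns the estimated savings. The pool itself is only a tool for the optimization and is discarded as soon as the pass is finished.

Code: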

private int shareStrings(IStringPoolParticipant[] toRun, IProgressMonitor monitor) {
    final StringPool pool = new StringPool();
    for (int i = 0; i < toRun.length; i++) {
        if (monitor.isCanceled()) // stop if the operation was cancelled
            break;
        final IStringPoolParticipant current = toRun[i];
        Platform.run(new ISafeRunnable() { // run safely: a failing participant does not abort the loop
            public void handleException(Throwable exception) {
                //exceptions are already logged, so nothing to do
            }
            public void run() {
                current.shareStrings(pool); // share duplicate strings through the pool
            }
        });
    }
    return pool.getSavedStringCount(); // return the estimated savings
}
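
The reason for wrapping each participant in a safe runnable can be shown with a plain-Java sketch. SafeRunSketch and its safeRun method are hypothetical stand-ins for the Platform.run call above; the sketch only assumes that a failure in one participant should be recorded and must not stop the remaining ones.

```java
import java.util.ArrayList;
import java.util.List;

class SafeRunSketch {
    // Hypothetical stand-in for Platform.run(ISafeRunnable): a failure
    // in one task is contained and recorded instead of propagating.
    static void safeRun(Runnable task, List<Throwable> log) {
        try {
            task.run();
        } catch (RuntimeException e) {
            log.add(e);
        }
    }

    // Runs every task in order and returns how many completed normally.
    static int runAll(List<Runnable> tasks, List<Throwable> log) {
        int completed = 0;
        for (Runnable task : tasks) {
            int before = log.size();
            safeRun(task, log);
            if (log.size() == before)
                completed++;
        }
        return completed;
    }

    public static void main(String[] args) {
        List<Runnable> tasks = new ArrayList<>();
        tasks.add(() -> System.out.println("participant 1 shared its strings"));
        tasks.add(() -> { throw new IllegalStateException("participant 2 failed"); });
        tasks.add(() -> System.out.println("participant 3 shared its strings"));

        List<Throwable> log = new ArrayList<>();
        int completed = runAll(tasks, log);
        System.out.println(completed + " completed, " + log.size() + " failed");
    }
}
```

Participants 1 and 3 still run even though participant 2 throws, which matches the intent of the loop: one misbehaving partition should not prevent the others from being optimized.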

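As the analysis above shows, compared with the JVM's String.intern(), the optimization mechanism that Eclipse implements on top of string pool partitions has the following properties:

1. Finer-grained control: you can specify which objects are to be optimized;

2. Measurable effect: the space saved by the optimization can be roughly estimated;

3. No performance bottleneck: there is no centralized string pool, so a large number of strings does not cause performance fluctuations;

4. No long-term memory use: the pool exists only while the optimization runs, and the intermediate results are discarded afterwards;

5. Selectable strategy: by defining scheduling rules, different optimization strategies can be applied selectively.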
