xuela_net

Reading the source behind Oracle's latch mechanism: guessing at Oracle's latch by way of postgresql

 
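Since I don't have Oracle's source code at hand (and my buddies are unlikely to risk going to jail by showing me theirs), I can only borrow postgresql to analyze Oracle's latch mechanism. (Why a latch and not a mutex?)

The postgresql 7.4.30 source code can be downloaded here: http://www.postgresql.org/ftp/source/v7.4.30/

Why use postgresql to analyze Oracle's latch mechanism?

Both Oracle and postgresql use shared memory and a multi-process model, so the problems they face in synchronizing and mutually excluding access to data structures in shared memory are extremely similar.

postgresql's spinlock resembles the latch mechanism and solves a similar problem; only the name and the implementation differ.

postgresql's spinlock is implemented in postgresql-7.4.30\src\backend\storage\lmgr\s_lock.c: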

/*
 * s_lock(lock) - platform-independent portion of waiting for a spinlock.
 */
void
s_lock(volatile slock_t *lock, const char *file, int line)
{
	/*
	 * We loop tightly for awhile, then delay using select() and try
	 * again. Preferably, "awhile" should be a small multiple of the
	 * maximum time we expect a spinlock to be held.  100 iterations seems
	 * about right.  In most multi-CPU scenarios, the spinlock is probably
	 * held by a process on another CPU and will be released before we
	 * finish 100 iterations.  However, on a uniprocessor, the tight loop
	 * is just a waste of cycles, so don't iterate thousands of times.
	 *
	 * Once we do decide to block, we use randomly increasing select()
	 * delays. The first delay is 10 msec, then the delay randomly
	 * increases to about one second, after which we reset to 10 msec and
	 * start again.  The idea here is that in the presence of heavy
	 * contention we need to increase the delay, else the spinlock holder
	 * may never get to run and release the lock.  (Consider situation
	 * where spinlock holder has been nice'd down in priority by the
	 * scheduler --- it will not get scheduled until all would-be
	 * acquirers are sleeping, so if we always use a 10-msec sleep, there
	 * is a real possibility of starvation.)  But we can't just clamp the
	 * delay to an upper bound, else it would take a long time to make a
	 * reasonable number of tries.
	 *
	 * We time out and declare error after NUM_DELAYS delays (thus, exactly
	 * that many tries).  With the given settings, this will usually take
	 * 3 or so minutes.  It seems better to fix the total number of tries
	 * (and thus the probability of unintended failure) than to fix the
	 * total time spent.
	 *
	 * The select() delays are measured in centiseconds (0.01 sec) because 10
	 * msec is a common resolution limit at the OS level.
	 */
#define SPINS_PER_DELAY		100
#define NUM_DELAYS			1000
#define MIN_DELAY_CSEC		1
#define MAX_DELAY_CSEC		100

	int			spins = 0;
	int			delays = 0;
	int			cur_delay = MIN_DELAY_CSEC;
	struct timeval delay;

	while (TAS(lock))
	{
		if (++spins > SPINS_PER_DELAY)
		{
			if (++delays > NUM_DELAYS)
				s_lock_stuck(lock, file, line);

			delay.tv_sec = cur_delay / 100;
			delay.tv_usec = (cur_delay % 100) * 10000;
			(void) select(0, NULL, NULL, NULL, &delay);

#if defined(S_LOCK_TEST)
			fprintf(stdout, "*");
			fflush(stdout);
#endif

			/* increase delay by a random fraction between 1X and 2X */
			cur_delay += (int) (cur_delay *
			  (((double) random()) / ((double) MAX_RANDOM_VALUE)) + 0.5);
			/* wrap back to minimum delay when max is exceeded */
			if (cur_delay > MAX_DELAY_CSEC)
				cur_delay = MIN_DELAY_CSEC;

			spins = 0;
		}
	}
}
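
The backoff arithmetic in the listing above is easier to follow in isolation. The following is a minimal sketch, not part of the postgresql source: `next_delay` is a hypothetical helper, and its `fraction` parameter stands in for `random() / MAX_RANDOM_VALUE` so that the step can be checked deterministically.

```c
#include <assert.h>

#define MIN_DELAY_CSEC 1    /* same values as in the listing above */
#define MAX_DELAY_CSEC 100

/*
 * Hypothetical helper: compute the next sleep length in centiseconds.
 * `fraction` replaces random() / MAX_RANDOM_VALUE and must lie in [0.0, 1.0].
 */
static int
next_delay(int cur_delay, double fraction)
{
    assert(cur_delay >= MIN_DELAY_CSEC && cur_delay <= MAX_DELAY_CSEC);
    assert(fraction >= 0.0 && fraction <= 1.0);

    /* grow the delay by a rounded fraction of itself (between 1X and 2X) */
    cur_delay += (int) (cur_delay * fraction + 0.5);

    /* wrap back to the minimum once the maximum is exceeded */
    if (cur_delay > MAX_DELAY_CSEC)
        cur_delay = MIN_DELAY_CSEC;

    return cur_delay;
}
```

For example, a delay of 60 with a fraction of 0.5 grows to 90, while a delay of 100 with a fraction of 1.0 exceeds the maximum and wraps back to 1.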

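The select function in this code is the same select used for I/O multiplexing. You read that right: it is the select from I/O multiplexing (newer versions have switched to pg_usleep()). Here select does nothing but sleep and has nothing to do with I/O multiplexing. Its only role is to put the current process on a wait queue in the operating system kernel until the time described by the delay struct has elapsed, and then to move the process from the wait queue back onto the run queue.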
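
To see select() used purely as a timer, here is a minimal sketch assuming a POSIX system. The names `sleep_centiseconds` and `now_usec` are hypothetical helpers introduced for illustration; only the select() call itself mirrors the listing above.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Hypothetical helper: block the calling process for about `csec` centiseconds. */
static void
sleep_centiseconds(int csec)
{
    struct timeval delay;

    assert(csec >= 0);
    delay.tv_sec = csec / 100;              /* whole seconds */
    delay.tv_usec = (csec % 100) * 10000;   /* remainder in microseconds */

    /*
     * No file descriptors are watched, so nothing but the timeout
     * (or an interrupting signal) can end the call.
     */
    (void) select(0, NULL, NULL, NULL, &delay);
}

/* Hypothetical helper: wall-clock time in microseconds. */
static long long
now_usec(void)
{
    struct timeval tv;

    gettimeofday(&tv, NULL);
    return (long long) tv.tv_sec * 1000000LL + tv.tv_usec;
}
```

Calling `now_usec()` before and after `sleep_centiseconds(2)` should show roughly 20 milliseconds or more of elapsed time, during which the process consumes no CPU.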

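However, walking the two lists (O(N) time) and, once the queue pointers have been updated, the context switch that the CPU then performs are very CPU-expensive, and that work plays out on a millisecond timescale. So functions like select, which trigger a context switch, should be called as rarely as possible. This is also the advantage of a spinlock over a traditional queue-based lock.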

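The test-and-set and spin operations of such a spinlock are simple in themselves: write a while loop that keeps testing the lock word. (A plain C "==" comparison is not enough by itself, though: the test and the set must happen as one atomic operation, which is what the TAS macro in the listing above provides.) The spinlock also has a sleep mechanism: when a process has spun for a long time and still cannot acquire the resource, it is put to sleep.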
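
The spin step can be sketched with C11 atomics. This is an illustrative assumption, not how postgresql does it: postgresql's TAS is a platform-specific macro, and `toy_spinlock` and the `toy_*` functions below are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lock type; postgresql's real slock_t is platform-specific. */
typedef struct
{
    atomic_flag held;
} toy_spinlock;

static void
toy_init(toy_spinlock *lock)
{
    assert(lock != NULL);
    atomic_flag_clear(&lock->held);
}

/* One atomic test-and-set. Returns true if the caller now owns the lock. */
static bool
toy_try_lock(toy_spinlock *lock)
{
    /*
     * atomic_flag_test_and_set returns the previous value:
     * false means the lock was free and has just been taken.
     */
    return !atomic_flag_test_and_set_explicit(&lock->held,
                                              memory_order_acquire);
}

/* Spin at most max_spins times; on false the caller should sleep and retry. */
static bool
toy_spin_lock(toy_spinlock *lock, int max_spins)
{
    for (int i = 0; i < max_spins; i++)
    {
        if (toy_try_lock(lock))
            return true;
    }
    return false;
}

static void
toy_unlock(toy_spinlock *lock)
{
    atomic_flag_clear_explicit(&lock->held, memory_order_release);
}
```

When `toy_spin_lock` returns false, the caller is in the same position as s_lock() after SPINS_PER_DELAY failed attempts: it has to ask the operating system for a sleep before trying again.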

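Here is the key point. Oracle or postgresql is just application software and has no ability of its own to put a process to sleep. Application software (written in C or something softer than C) can at most manipulate memory and express logic; it is either running or has finished and exited, and never has any other state. (In fact, the so-called process state is something the operating system kernel made up: it is simply the value of the state field in the task_struct structure. The CPU does not care about it.) To have any other state, the software must call a system API, enter kernel code through a trap (interrupt 0x80 on 32-bit x86 Linux), and have the kernel change the state value (0, 1, 2, 3, 4...) in the task_struct of that process (or of another process, given sufficient privileges).
So whether it is Oracle, postgresql, some other database, or any other server software, if it wants to implement a lock that can sleep, it has to borrow the operating system's API. In other words, such a lock is a derivative built on top of the operating system API, and in that sense even Oracle is a derivative built on top of the operating system.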

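Of course, in theory you could write a database that is not built on an operating system, one that manages tasks and the file system itself and is invincible. Yes, in theory that works. But anyone who actually did it must have had their head slammed in a door.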