URL, URN, URI, IRI

Posted on 2012-01-14 by baojie

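What exactly is a "web address"? The usual understanding is a URL (Uniform Resource Locator).

RDF, OWL 1 and OWL 2, however, use different notions:

RDF and OWL 1 use URIs (Uniform Resource Identifier, the first layer of the original Semantic Web layer cake).
OWL 2 uses IRIs (Internationalized Resource Identifier).

A related notion is the URN (Uniform Resource Name). How do they differ?

In brief:

A URL has this form: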

scheme://domain:port/path?query_string#fragment_id

For example, the edit page for this post is https://blog.baojie.org:80/wp-admin/post-new.php?post_type=post#
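As a quick check of the template, here is a minimal sketch that splits the edit-page URL into its parts with Python's standard `urllib.parse`. The variable names are mine; the comments map each attribute back to the template field it corresponds to.

```python
from urllib.parse import urlsplit

# The edit-page URL quoted above, split into the components of the template.
url = "https://blog.baojie.org:80/wp-admin/post-new.php?post_type=post#"
parts = urlsplit(url)

print(parts.scheme)    # scheme
print(parts.hostname)  # domain
print(parts.port)      # port
print(parts.path)      # path
print(parts.query)     # query_string
print(parts.fragment)  # fragment_id (empty here: nothing follows '#')
```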

A URI generalizes the URL. Its form is:

<scheme name> : <hierarchical part> [ ? <query> ] [ # <fragment> ]

Example:
foo://username:password@example.com:8042/over/there/index.dtb?type=animal&name=narwhal#nose
Wikipedia lists official and unofficial URI schemes; about, ed2k, doi and skype are all examples.
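The generic form can be tried out the same way. The sketch below parses the foo:// example with `urllib.parse.urlsplit`, which does not need to know the scheme. The second input, an ISBN-style URN, is my own illustration and does not come from the post; it shows what happens when there is no "//" authority part.

```python
from urllib.parse import urlsplit

# The generic example from the text: the scheme "foo" is not registered,
# yet the hierarchical part still splits into authority and path.
uri = urlsplit("foo://username:password@example.com:8042"
               "/over/there/index.dtb?type=animal&name=narwhal#nose")
print(uri.scheme, uri.username, uri.hostname, uri.port)
print(uri.path, uri.query, uri.fragment)

# Assumed illustration (not from the post): a URN has no "//" authority,
# so everything after the scheme ends up in the path.
urn = urlsplit("urn:isbn:0451450523")
print(urn.scheme, repr(urn.netloc), urn.path)
```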

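From Minesweeper to Distributed File Storage (Erasure Codes)

Posted on 2011-05-28 by baojie

"Erasure channel" is usually translated into Chinese as 删去信道 or 消去信道, and "erasure code" as 删去编码 or 存疑编码.

(1) Erasure channel

An erasure channel is a channel that loses bits or packets with some probability. Take the binary erasure channel (BEC): the input alphabet is {0,1} and the output alphabet is {0,1,e}, where e marks a lost symbol. For the BEC in the figure, with erasure probability p_e, the channel capacity is 1-p_e.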
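To make the definition concrete, here is a small simulation sketch of a binary erasure channel. The function name, the seed and the value p_e = 0.2 are arbitrary choices of mine. The simulation only shows that a fraction of roughly 1 - p_e of the symbols arrives unerased; it is not a proof of the capacity result.

```python
import random

def binary_erasure_channel(bits, p_e, rng):
    """Pass each bit through a BEC: erase it (output 'e') with probability p_e."""
    return ['e' if rng.random() < p_e else b for b in bits]

rng = random.Random(0)  # fixed seed, chosen arbitrarily
sent = [rng.randint(0, 1) for _ in range(100_000)]
received = binary_erasure_channel(sent, p_e=0.2, rng=rng)

# Fraction of symbols that were not erased; expected to be close to 1 - p_e.
survived = sum(1 for r in received if r != 'e') / len(sent)
print(f"fraction of bits that got through: {survived:.3f}")
```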

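Proof of the channel capacity: with probability 1-p_e, a single transmission succeeds. If it fails and the retransmission succeeds, which happens with probability p_e(1-p_e), two transmissions are used. So, to transmit with arbitrarily small error probability, one needs to transmit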

Notes: Probabilistic Spatio-Temporal Logic

Posted on 2011-05-18 by baojie

See the earlier post.

[Conference version] Austin Parker, Guillaume Infantes, V. S. Subrahmanian, John Grant: An AGM-Based Belief Revision Mechanism for Probabilistic Spatio-Temporal Logics. AAAI 2008: 511-516

[Journal version] John Grant, Francesco Parisi, Austin Parker, V. S. Subrahmanian: An AGM-style belief revision mechanism for probabilistic spatio-temporal logics. Artif. Intell. 174(1): 72-104 (2010)

This work can be viewed as a special case of probabilistic context logic (PCL).

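Semantic Web and Recommendation (2): Basic Ideas

Posted on 2011-05-17 by baojie

Continued from "Semantic Web and Recommendation (1): Wild Punches Beat the Old Master".

I read several papers on applying semantic technologies to recommender systems. My overall impression is that they are not convincing, and my sense is that industry currently has little interest in these ideas. If the premise is that linked data and an ontology are available, the cost is very high and data quality is a serious problem; DBpedia data cannot simply be taken and used as-is. Even if it could, speed is a major issue: SPARQL is not yet fast enough for real-time responses.

Recommenders broadly fall into: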

content-based
collaborative or social
knowledge-based
trust-based

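From the machine learning perspective, recommendation is a classification or clustering process; from the information retrieval perspective, it is a process of discovering the relatedness between objects. Logical reasoning plays essentially no role here; if it does, it is probably just a classification tree.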
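The information retrieval view can be illustrated with a minimal content-based sketch: items are bags of words and relatedness is cosine similarity. The item names and descriptions are invented for the example, and cosine similarity is just one common choice of relatedness measure, not something the papers above prescribe.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical item descriptions, invented for this sketch.
items = {
    "post_a": Counter("semantic web rdf owl reasoning".split()),
    "post_b": Counter("rdf owl ontology reasoning logic".split()),
    "post_c": Counter("erasure code distributed storage".split()),
}

def most_related(name):
    """Return the other item with the highest cosine similarity to `name`."""
    others = (k for k in items if k != name)
    return max(others, key=lambda k: cosine(items[name], items[k]))

print(most_related("post_a"))
```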

Notes: Description Logic in the Cloud (4): The Aslani Approach

Posted on 2011-05-15 by baojie

Mina Aslani, Volker Haarslev: Towards Parallel Classification of TBoxes. Description Logics 2008 [sound but not complete; skipped]

Mina Aslani, Volker Haarslev: TBox Classification in Parallel: Design and First Evaluation. Description Logics 2010 [workshop version of the ECAI paper; can also be skipped]

Mina Aslani, Volker Haarslev: Parallel TBox Classification in Description Logics – First Experimental Results. ECAI 2010: 485-490

This paper parallelizes at the reasoner level rather than at the proof level. Classifying a TBox with n concepts involves n(n+1)/2 subsumption tests. The paper analyzes how to distribute these tests among multiple independent threads. Memory use is organized around a global tree.
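The idea of handing out the pairwise tests to threads can be sketched as follows. This is only an illustration under my own assumptions: the toy TBox, the feature-set subsumption check (a stand-in for a real reasoner call) and the round-robin split are all hypothetical, and the shared global tree of the paper is not modeled.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations_with_replacement

# Hypothetical toy TBox: each concept is the set of features it requires.
concepts = {
    "Animal": frozenset({"alive"}),
    "Bird": frozenset({"alive", "wings"}),
    "Fish": frozenset({"alive", "swims"}),
    "Penguin": frozenset({"alive", "wings", "swims"}),
}

def subsumed_by(sub, sup):
    """Toy stand-in for a reasoner call: sub needs every feature sup needs."""
    return concepts[sup] <= concepts[sub]

def run_chunk(chunk):
    """Test each pair in both directions; return subsumptions between distinct concepts."""
    found = set()
    for a, b in chunk:
        if a == b:
            continue  # reflexive case: trivially subsumed
        if subsumed_by(a, b):
            found.add((a, b))
        if subsumed_by(b, a):
            found.add((b, a))
    return found

names = sorted(concepts)
pairs = list(combinations_with_replacement(names, 2))  # n(n+1)/2 pairs
workers = 3
chunks = [pairs[i::workers] for i in range(workers)]   # round-robin split

with ThreadPoolExecutor(max_workers=workers) as pool:
    subsumptions = set().union(*pool.map(run_chunk, chunks))

print(len(pairs), sorted(subsumptions))
```

Each pair (sub, sup) in the result means sub is subsumed by sup. Because the chunks are disjoint, the workers need no coordination until their results are merged.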

Notes: Description Logic in the Cloud (3): Liebig 2007

Posted on 2011-05-15 by baojie

Thorsten Liebig, Felix Müller: Parallelizing Tableaux-Based Description Logic Reasoning. OTM Workshops (2) 2007: 1135-1144

This paper describes our approach for concurrent computation of the nondeterministic choices inherent to the standard tableau procedure.

Thorsten Liebig, Andreas Steigmiller, Olaf Noppens: Scalability via Parallelization of OWL Reasoning. In Workshop on NeFoRS: New Forms of Reasoning for the Semantic Web: Scalable & Dynamic, 2010

So far the design is tailored to a SMP (symmetric multi processor) architecture, where all processing cores have access to one main memory.

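Notes: Description Logic in the Cloud (1): Background

Posted on 2011-05-14 by baojie

"Description Logic in the Cloud" is a rather nonsensical way to put it.

A better name is Parallel Computing with Description Logic, which mainly covers two tasks: querying and reasoning.

For RDFS or some subset of OWL-RL, there is a fair amount of work using MapReduce or other cluster-based computation. These are generally rule-based reasoners that do not guarantee completeness. Many support only very limited reasoning, for example BBN's SHARD.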
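To show what rule-based reasoning means here, the sketch below applies two standard RDFS entailment rules (rdfs9 and rdfs11) by naive forward chaining on a single machine. The ex: triples are invented, and this is not how SHARD or any MapReduce reasoner is implemented; a cluster version would typically express each rule as a join keyed on the shared term.

```python
SUBCLASS, TYPE = "rdfs:subClassOf", "rdf:type"

def rdfs_closure(triples):
    """Apply two RDFS rules until no new triple appears (naive forward chaining).

    rdfs11: (a subClassOf b), (b subClassOf c) => (a subClassOf c)
    rdfs9:  (x type a), (a subClassOf b)       => (x type b)
    """
    closed = set(triples)
    while True:
        sub = [(s, o) for s, p, o in closed if p == SUBCLASS]
        new = {(a, SUBCLASS, c) for a, b in sub for b2, c in sub if b == b2}
        new |= {(x, TYPE, b) for x, p, a in closed if p == TYPE
                for a2, b in sub if a == a2}
        if new <= closed:  # fixpoint reached
            return closed
        closed |= new

# Hypothetical data for illustration.
graph = {
    ("ex:Penguin", SUBCLASS, "ex:Bird"),
    ("ex:Bird", SUBCLASS, "ex:Animal"),
    ("ex:tux", TYPE, "ex:Penguin"),
}
inferred = rdfs_closure(graph) - graph
print(sorted(inferred))
```

A rule set like this derives only what its rules cover, which is why such reasoners are fast but incomplete for more expressive languages.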

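Modular ontology languages, such as Distributed Description Logics, E-Connections and Package-based Description Logics, are based on non-classical Local Model Semantics and can do distributed reasoning. But the complexity of local semantics makes them unsuitable for current engineering applications.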

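Notes: Semantics of Context Logic (3): QLC 1996

Posted on 2011-05-09 by baojie

Buvac, S. (1996). Quantificational logic of context. In Proceedings of the Thirteenth National Conference on Artificial Intelligence (AAAI).

Overview: this paper extends PLC (Propositional Logic of Context) to first-order logic. Contexts themselves can also be objects of the first-order language.

Semantically, a context is a set of truth assignments. Each QLC model associates every context with a set of first-order structures. In other words, an ordinary first-order model belongs to one or more contexts.

ist(c, f) is true iff f is true in all first-order structures of c. This is the same as in PLC.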
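The truth condition for ist can be sketched in a few lines. As a simplifying assumption, each "structure" below is just a propositional truth assignment (a dict) rather than a real first-order structure, and the context name and facts are invented.

```python
# Simplifying assumption: each "structure" is a propositional truth
# assignment (a dict), standing in for a first-order structure.
model = {
    "c_holmes": [  # hypothetical context name and facts
        {"detective(holmes)": True, "lives_in_london(holmes)": True},
        {"detective(holmes)": True, "lives_in_london(holmes)": False},
    ],
}

def ist(context, formula):
    """ist(c, f): f holds in every structure associated with context c."""
    return all(formula(structure) for structure in model[context])

# Holds in both structures of the context.
print(ist("c_holmes", lambda s: s["detective(holmes)"]))
# Fails in the second structure, so ist is false.
print(ist("c_holmes", lambda s: s["lives_in_london(holmes)"]))
```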

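Notes: Semantics of Context Logic (1): PLC 1993

Posted on 2011-05-08 by baojie

See the earlier posts: Model Theory of Context Logic; Why Use Model-Theoretic Semantics

McCarthy's original papers:

A LOGICAL AI APPROACH TO CONTEXT [the simplest introduction]
Notes on Formalizing Context [IJCAI 93 overview]
Formalizing Context (Expanded Notes) [1997, a fairly detailed development]

None of the papers above gives a formal semantics.

The formalization by Buvac and Mason:

Propositional Logic of Context [AAAI 1993]

Syntax: propositional logic plus ist(c, f), where c is a context and f is a formula.

ist([c1,c2], f) := ist(c1, ist(c2, f))

In addition, some propositions may only be meaningful in certain contexts. So each context has an associated vocabulary, and only that vocabulary is given a semantic interpretation.

Semantics: a model is a mapping from context sequences to sets of (partial) truth assignments. For propositional logic, a truth assignment is just a set of propositions. Example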

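Notes: Annotated RDF (7): Guha 2004

Posted on 2011-05-08 by baojie

Ramanathan V. Guha, Rob McCool, Richard Fikes: Contexts for the Semantic Web. International Semantic Web Conference 2004: 32-46

Guha is a pioneer of context modeling. This paper is simply a natural application of McCarthy's work and Guha's PhD thesis to the Semantic Web. What interests me is Section 6, Model Theory.

In standard RDF semantics, IS is a mapping from URIs to IR ∪ IP (I call it the "naming function"). In Guha's semantics, IS is a mapping from URI × URI to IR ∪ IP, where the first URI is a resource (class, property, etc.) and the second URI is a context. The key semantic condition is this one: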

Notes: Annotated RDF (6): Sahoo 2010

Posted on 2011-05-08 by baojie

S. S. Sahoo, O. Bodenreider, P. Hitzler, A. Sheth, K. Thirunarayan: "Provenance Context Entity (PaCE): Scalable provenance tracking for scientific RDF data." In the 22nd International Conference on Scientific and Statistical Database Management (SSDBM) 2010

There are many papers on modeling context (or annotation) in RDF, dozens at least, but few deal with semantics. This one defines the Provenance Context Entity (PaCE). [Note: in these posts provenance is rendered in Chinese as 源谱 and context as 域.]

Syntax: provenance is of course an important kind of context. In this paper, provenance is modeled with Provenir, which is itself an RDF/OWL document. [Provenir was proposed by Sahoo himself. For a comparison of different provenance models, see the paper by Li Ding and me (IPAW 2010).]

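Notes: Annotated RDF (5): APT Logic

Posted on 2011-05-08 by baojie

Paulo Shakarian, Austin Parker, Gerardo I. Simari, V. S. Subrahmanian: Annotated probabilistic temporal logic. ACM Trans. Comput. Log. 12(2): 14 (2011)

See "From Probabilistic Temporal Logic to Probabilistic Context Logic". I reread this paper today. See also the earlier notes, Annotated RDF (3): Dekhtyar 2001 and its continuation.

Unlike the papers discussed before, APT Logic does not treat probability simply as an annotation; it has a possible-worlds semantics. A thread is a stream of state changes (a mapping from time to models). All possible threads form the probability space, and the probability distribution is defined over threads. The core contribution of the paper is the frequency function, which allows probabilistic reasoning within a thread.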
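The possible-worlds reading can be sketched as follows: a probability is attached to whole threads, and the probability that an atom holds at a time point is the total weight of the threads in which it does. The threads, the atom and the numbers are invented, and the paper's frequency functions are not modeled here.

```python
# Hypothetical example: three threads over time points 1..3. Each thread maps
# a time point to the set of atoms true at that time (the "model").
threads = {
    "th1": {1: {"rain"}, 2: {"rain"}, 3: set()},
    "th2": {1: set(), 2: {"rain"}, 3: {"rain"}},
    "th3": {1: set(), 2: set(), 3: set()},
}
# Probability distribution over threads (values invented; they sum to 1).
prob = {"th1": 0.5, "th2": 0.3, "th3": 0.2}

def probability_at(atom, t):
    """Probability that `atom` holds at time t: total weight of threads where it does."""
    return sum(p for name, p in prob.items() if atom in threads[name][t])

for t in (1, 2, 3):
    print(t, probability_at("rain", t))
```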

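Notes: Annotated RDF (4): Dekhtyar 2001, Continued

Posted on 2011-05-07 by baojie

Alex Dekhtyar, Robert B. Ross, V. S. Subrahmanian: Probabilistic temporal databases, I: algebra. ACM Trans. Database Syst. 26(1): 41-95 (2001)

[Continued from Notes: Annotated RDF (3): Dekhtyar 2001]

Probability distribution function: the probability value at each time point of the time domain (a calendar in this paper). Note that the paper only discusses discrete distribution functions.

Common distribution functions:

Uniform distribution: for example, "rain", counted from Monday to Sunday, is roughly uniformly distributed.
Geometric distribution: p_i = p (1-p)^i
Binomial distribution: p_i = C(n,i) p^i (1-p)^(n-i)
Poisson distribution: p_i = e^(-λ) λ^i / i!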
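The formulas above can be evaluated directly. The sketch below is a plain transcription into Python; the function names and the parameter values in the final lines are my own choices. The last formula, e^(-λ) λ^i / i!, is the Poisson probability mass function.

```python
import math

def uniform_pmf(n):
    """Uniform distribution over n time points: each gets probability 1/n."""
    return [1 / n] * n

def geometric_pmf(p, i):
    """p_i = p (1-p)^i for i = 0, 1, 2, ..."""
    return p * (1 - p) ** i

def binomial_pmf(n, p, i):
    """p_i = C(n,i) p^i (1-p)^(n-i) for i = 0..n."""
    return math.comb(n, i) * p ** i * (1 - p) ** (n - i)

def poisson_pmf(lam, i):
    """p_i = e^(-lam) lam^i / i! for i = 0, 1, 2, ..."""
    return math.exp(-lam) * lam ** i / math.factorial(i)

# Sanity check: each distribution should sum to (approximately) 1.
print(sum(uniform_pmf(7)))
print(sum(geometric_pmf(0.3, i) for i in range(200)))
print(sum(binomial_pmf(10, 0.3, i) for i in range(11)))
print(sum(poisson_pmf(2.0, i) for i in range(100)))
```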

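If the probabilities of e1 and e2 are known, what is the probability of e1∧e2? There are two kinds of strategies, conjunctive and disjunctive. For details see Sections 2.4 and 2.5.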
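Why a strategy is needed at all can be seen from elementary probability: the two marginals do not determine the joint. The sketch below contrasts the independence assumption with the case where nothing is known about the dependence (the Fréchet bounds). These are textbook facts and my own function names; the strategies in Sections 2.4 and 2.5 of the paper may be defined differently, for example over probability intervals.

```python
def conj_independence(p1, p2):
    """P(e1 and e2) when e1 and e2 are assumed independent."""
    return p1 * p2

def conj_ignorance(p1, p2):
    """Bounds on P(e1 and e2) when nothing is known about their dependence."""
    return max(0.0, p1 + p2 - 1.0), min(p1, p2)

def disj_independence(p1, p2):
    """P(e1 or e2) when e1 and e2 are assumed independent."""
    return p1 + p2 - p1 * p2

def disj_ignorance(p1, p2):
    """Bounds on P(e1 or e2) when nothing is known about their dependence."""
    return max(p1, p2), min(1.0, p1 + p2)

p1, p2 = 0.7, 0.6  # arbitrary example values
print(conj_independence(p1, p2), conj_ignorance(p1, p2))
print(disj_independence(p1, p2), disj_ignorance(p1, p2))
```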

Notes: Annotated RDF (3): Dekhtyar 2001

Posted on 2011-05-07 by baojie

Alex Dekhtyar, Robert B. Ross, V. S. Subrahmanian: Probabilistic temporal databases, I: algebra. ACM Trans. Database Syst. 26(1): 41-95 (2001)

The main text is 44 pages, out of 67 pages in total.

Background: this seems unrelated to Annotated RDF. However, temporal information is one of the most common kinds of annotation, and work on tuples naturally carries over to triples. Alexander Dekhtyar received his PhD under Subrahmanian in 2000. The later APT (Annotated Probabilistic Temporal) Logic [Shakarian 2011] appears to be an extension of this work.

Basic modeling object: Data tuple d is in relation R at some point of time in the interval [ti, tj] with probability between p1 and p2. Example:

Notes: Annotated RDF (2): Udrea 2010

Posted on 2011-05-07 by baojie

Octavian Udrea, Diego Reforgiato Recupero, V. S. Subrahmanian: Annotated RDF. ACM Trans. Comput. Log. 11(2): (2010) [the conference version of this paper appeared at ESWC 2006]

In this paper, too, the annotations form a partial order. It also mentions several applications: uncertainty (fuzzy?), temporal, and provenance.

aRDF = annotated RDF

History: Annotated logic [Kifer and Subrahmanian 1992] (requires the annotations to form a semilattice), [Leach and Lu 1996], [Fitting 1991]

Syntax: each property is associated with a partial order. An aRDF triple is (s, p:a, o), for example (Jie, memberOf:[2001,2008], IowaStateUniversity). Drawn as a graph, every edge carries an extra annotation, for example (Fig. 1(a) in the paper)

Notes: Annotated RDF (1): Straccia 2010

Posted on 2011-05-07 by baojie

Umberto Straccia, Nuno Lopes, Gergely Lukacsy and Axel Polleres. A General Framework for Representing and Reasoning with Annotated Semantic Web Data. In Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence (AAAI-10).

I am reading this series of papers in order to extend them to Contextual RDF.

RDF annotation: (s, p, o) : t, where t can be time, trust, or provenance.
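As a data structure, an annotated triple is just a triple with one extra slot. The sketch below picks time as the annotation domain and represents t as a closed interval of years; the class and function names are mine, and the ex: data is hypothetical. It is not the representation used in the paper.

```python
from typing import NamedTuple, Tuple

class AnnotatedTriple(NamedTuple):
    s: str
    p: str
    o: str
    t: Tuple[int, int]  # temporal annotation as a closed interval of years

def holds_at(triple, year):
    """True if the triple's temporal annotation covers the given year."""
    start, end = triple.t
    return start <= year <= end

# Hypothetical data for illustration.
graph = [
    AnnotatedTriple("ex:alice", "ex:memberOf", "ex:UniversityA", (2001, 2008)),
    AnnotatedTriple("ex:alice", "ex:memberOf", "ex:LabB", (2009, 2011)),
]

def snapshot(graph, year):
    """Plain (s, p, o) triples whose temporal annotation covers `year`."""
    return [(x.s, x.p, x.o) for x in graph if holds_at(x, year)]

print(snapshot(graph, 2005))
```

Swapping the interval for a trust value or a provenance identifier changes only the type of `t` and the test in `holds_at`.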

The many flavors of annotated RDF, contextual RDF and temporal RDF all simply aim to describe

Notes: AURA

Posted on 2011-05-01 by baojie

David Gunning, Vinay K. Chaudhri et al. Project Halo Update – Progress Toward Digital Aristotle. 33-58. AI Magazine, Volume 31, Number 3, Fall 2010

[Download] http://www.cs.utexas.edu/users/pclark/papers/AURA-AIMag.pdf
[Project home page] http://www.ai.sri.com/project/aura
[See also] Semantic Web Companies (5): Vulcan: Project Halo; Notes: Inquire for iPad (a textbook on a tablet)

To compare with the Watson system, I also looked at SRI's AURA system. Their home page lists many papers, but reading the one above is enough. I am short on time, so these are very brief notes.

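Notes: DeepQA (IBM Watson) (4): Results and Takeaways

Posted on 2011-05-01 by baojie

David A. Ferrucci et al., Building Watson: An Overview of the DeepQA Project. AI Magazine 31(3): 59-79 (2010).

==Results==

A figure borrowed from the paper

==How they work==

Everyone works in one big war room to make communication easy. Progress is incremental. [Hmm, how about teams in China?]

==The authors' takeaways==

Three key elements of Q/A: precision, confidence, and speed

Systems-level approach: combine many methods. This is probably relevant to AI problems in general.

The importance of rapid experimentation and a high-performance computing platform.

==My other takeaways==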

Notes: DeepQA (IBM Watson) (3): Answer Generation

Posted on 2011-05-01 by baojie

David A. Ferrucci et al., Building Watson: An Overview of the DeepQA Project. AI Magazine 31(3): 59-79 (2010).

==Hypothesis generation==

Hypothesis generation takes the results of question analysis and produces candidate answers by searching the system’s sources and extracting answer-sized snippets from the search results

The goal of this step is to collect as many hypotheses as possible: the system generates the correct answer as a candidate answer for 85 percent of the questions somewhere within the top 250 ranked candidates
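The 85 percent figure is a candidate recall at k = 250: the share of questions whose correct answer appears anywhere in the top k candidates. A minimal sketch of that metric, on invented toy data with candidates assumed to be sorted best-first, looks like this (it says nothing about how DeepQA itself generates or ranks candidates):

```python
def candidate_recall(questions, k):
    """Fraction of questions whose correct answer is among the top-k candidates."""
    hits = sum(1 for q in questions if q["answer"] in q["candidates"][:k])
    return hits / len(questions)

# Hypothetical toy data; candidates are assumed to be sorted best-first.
questions = [
    {"answer": "Paris", "candidates": ["Lyon", "Paris", "Nice"]},
    {"answer": "Oslo", "candidates": ["Bergen", "Stockholm", "Oslo"]},
    {"answer": "Rome", "candidates": ["Milan", "Turin", "Naples"]},
    {"answer": "Bern", "candidates": ["Bern", "Zurich", "Basel"]},
]

for k in (1, 2, 3):
    print(k, candidate_recall(questions, k))
```

Recall can only grow with k, which is why this stage favors a long candidate list and leaves precision to the later scoring and ranking steps.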

Notes: DeepQA (IBM Watson) (2): Question Analysis

Posted on 2011-05-01 by baojie

David A. Ferrucci et al., Building Watson: An Overview of the DeepQA Project. AI Magazine 31(3): 59-79 (2010).

==The DeepQA approach==

One-sentence summary: a massively parallel probabilistic evidence-based architecture

Integration of many methods: we use more than 100 different techniques for analyzing natural language, identifying sources, finding and generating hypotheses, finding and scoring evidence, and merging and ranking hypotheses.

Four basic principles: massive parallelism, many experts, pervasive confidence estimation, and integration of shallow and deep knowledge.


Semantic Noise (语义噪声)

by Jie Bao, Big Knowledge Scientist

Category Archives: Notes
