Example: Building a Conference Dataset with the Neo4j Graph Database

Author: 小野 (compilation)  Editor: 王玉圆  2012-09-07 00:05  Original content from IT168

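[IT168 Technology] The defining feature of the big data era is the diversity of data types, with unstructured data of all kinds gradually becoming the bulk of enterprise data. Gartner predicts that enterprise data will grow by 800% within five years, that 80% of it will be unstructured, and that non-business data from groups, communities, and social networks will account for most of this growth. The explosive growth of unstructured data poses a serious challenge to traditional databases and makes new data management tools increasingly important.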

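Among these new tools, Hadoop and NoSQL are the two most important categories. This article focuses on graph databases. A graph database is a type of NoSQL (non-relational) database that applies graph theory to store information about the relationships between entities. The most common example is the relationships between people in a social network. Storing such a network in a traditional relational database works poorly: queries are complex and slow, often more so than expected. The design of graph databases addresses exactly this weakness.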
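To make the idea of storing relationships directly more concrete, here is a minimal sketch in plain Ruby. It is not how Neo4j stores data; the names `add_knows` and `friends_of_friends` and the sample people are hypothetical. It only illustrates that once relationships are first-class, a "friends of friends" question becomes a short traversal rather than a chain of table joins.

```ruby
require 'set'

# Record an undirected "knows" relationship in the adjacency map.
def add_knows(adjacency, a, b)
  (adjacency[a] ||= Set.new) << b
  (adjacency[b] ||= Set.new) << a
end

# People reachable in two hops, excluding the person and their direct friends.
def friends_of_friends(adjacency, person)
  direct = adjacency.fetch(person, Set.new)
  two_hops = direct.flat_map { |friend| adjacency.fetch(friend, Set.new).to_a }
  (two_hops.to_set - direct - [person]).to_a.sort
end

people = {} # hypothetical sample data, not part of the conference dataset
add_knows(people, "Alice", "Bob")
add_knows(people, "Bob", "Carol")
add_knows(people, "Carol", "Dave")
add_knows(people, "Alice", "Eve")
add_knows(people, "Eve", "Carol")

puts friends_of_friends(people, "Alice").inspect
```

Each hop is a lookup on the node's own neighbor set, so the cost of the traversal depends on how many relationships are followed rather than on the total size of the data.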

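Common graph databases include Neo4j, FlockDB, AllegroGraph, GraphDB, and InfiniteGraph. Neo4j is a fully ACID-compliant graph database implemented in Java. Its data is stored on disk in a format optimized for graph networks. The Neo4j kernel is an extremely fast graph engine with all the features expected of a database product, such as recovery, two-phase commit, and XA compliance.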

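The NoSQL Now 2012 conference was held on August 21-23 in San Jose, USA. There, Andreas Kollegger of Neo Technology used a lunch session to introduce the Neo4j database and to show how to build a graph database quickly with the available tools.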

This example uses the NoSQL Now 2012 conference program as its dataset, as shown in the figure:

[Figure: the conference dataset]

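First, create a new Heroku instance and connect it to a Neo4j database. The data is then entered into the database by a Ruby script that uses the neography gem, written by community star Max De Marzi. The graph.db directory can also be downloaded from the sample datasets site for use with a local Neo4j server. The code is as follows: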

require 'rubygems'
require 'neography'

# Lazily create and memoize the REST client for the local Neo4j server.
def neo
  @neo ||= Neography::Rest.new("http://localhost:7474")
end

# True if the node already has a relationship of the given type and direction.
def has_rel(node, dir, type)
  res = neo.get_node_relationships(node, dir, type)
  return res && res.size > 0
end

# Create one talk node and link it to its slot, speakers, companies, tags and audience.
def add_talk(slot, title, speakers,audience,tags)
  root = neo.get_root()
  talk = neo.create_node({:title => title})
  slot = neo.create_unique_node(:slots, :slot, slot, { :slot => slot})
  neo.create_relationship(:at, talk, slot)
  # speakers maps each speaker name to the company they come from.
  speakers.each do |name, from|
    speaker = neo.create_unique_node(:speakers, :name, name, { :name => name})
    neo.create_relationship(:presents, speaker, talk)
    company = neo.create_unique_node(:companies, :company, from, { :company => from})
    # Link a speaker to a company only once, even if they give several talks.
    neo.create_relationship(:works_at, speaker, company) unless has_rel(speaker, :out, :works_at)
  end
  tags.each do |name|
    tag = neo.create_unique_node(:tags, :tag, name, { :tag => name})
    neo.create_relationship(:tagged, talk, tag)
    # Attach each tag to the root node the first time it is seen.
    neo.create_relationship(:tag, root, tag) unless has_rel(tag,:in, :tag)
  end
  who = neo.create_unique_node(:audience, :audience, audience, { :audience => audience})
  neo.create_relationship(:for, talk, who)
end

# Wipe existing data: delete every node except the root (id 0), together with their relationships.
neo.execute_query("start n=node(*) match n-[r?]-m where ID(n)<>0 delete n,r")

# Create an exact-match Lucene index for each entity type.
[:slots, :speakers, :companies, :tags, :audience].each do |name|
  neo.create_node_index(name, :exact, :lucene)
end

add_talk("08:30 AM - 09:00 AM",'The Journey to Amazon DynamoDB: From Scaling by Architecture to Scaling by Commandment',
  {'Swami Sivasubramanian'=>'Amazon Web Services'}, 'Technical - Introductory', [ 'Cloud Computing',"NoSQL Architecture and Design"])
add_talk("09:00 AM - 09:45 AM", 'Then Our Buildings Shape Us: A new way to think about NoSQL technology selection',
  {'Tim Berglund'=>'GitHub'}, 'Business / Non-Technical', [ 'NoSQL Architecture and Design', "NoSQL Technology Evaluation"])
add_talk("09:45 AM - 10:00 AM",'Create Powerful New Applications with Graphs',
  {'Emil Eifrem'=>'Neo Technology'}, 'Business / Non-Technical', [ 'Graph Databases'])
add_talk("10:30 AM - 11:15 AM",'Why and When You Should Use Redis',
  {'Josiah Carlson'=>'ChowNow Inc.'}, 'Technical - Introductory', [ 'NoSQL Technology Evaluation'])
...
add_talk("10:30 AM - 11:15 AM",'Intro to Graph Databases 101',
  {'Andreas Kollegger'=>'Neo Technology'}, 'Technical - Introductory', [ 'Graph Databases'])
...
add_talk("01:15 PM - 02:00 PM",'Lunch N Learn with Neo Technology and Neo4j',
  {'Andreas Kollegger'=>'Neo Technology'}, 'Technical - Introductory', [ 'Graph Databases'])
add_talk("02:15 PM - 03:00 PM", 'Using Graph Databases to Analyze Relationships, Risks and Business Opportunities - A Case Study',
  {'Jans Aasman'=>'Franz Inc'}, 'Technical - Introductory', [ 'Graph Databases'])
add_talk("04:15 PM - 04:45 PM", 'High performance graph database using cache, cloud, and standards',
  {'Bryan Thompson'=>'SYSTAP, LLC'}, 'Technical - Advanced', [ 'Graph Databases'])
....
add_talk("04:15 PM - 04:45 PM", 'Introducing Hadoop and Big Data into a Healthcare Organization: A True Story and Learned Lessons',
  {'Vladimir Bacvanski'=>'SciSpike'}, 'Technical - Intermediate', [ 'Big Data'])
add_talk("04:15 PM - 04:45 PM", 'NoSQL Data Modelling for Scalable eCommerce',
  {'Dipali Trivedi'=>'Staples.com'}, 'Technical - Intermediate', [ 'NoSQL Architecture and Design'])

add_talk("05:30 PM - 06:30 PM",'The NoSQL "C Panel"', {"Robert Scoble"=>"RackSpace",
                                                      "Bob Wiederhold"=>"Couchbase",
                                                      "Dwight Merriman"=>"10gen",
                                                      "Emil Eifrem"=>"Neo Technology",
                                                      "Jay Jarrell"=>"Objectivity, Inc.",
                                                      "Kirk Dunn"=>"Cloudera, Inc."},
                                                      "Business / Non-Technical",
                                                      ["Graph Databases", "Hadoop", "MongoDB"])
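The script relies on two idempotency patterns: `create_unique_node` is used so that a slot, speaker, company, or tag appearing in several talks maps to a single node, and the `has_rel` guard keeps a relationship from being created twice. The following is a minimal in-memory sketch of those two patterns in plain Ruby. It does not use or reproduce the neography API; `get_or_create` and `relate_once` are hypothetical helper names introduced only for illustration.

```ruby
# Hypothetical in-memory stand-in for a unique index (not the neography API).
# `store` maps [index, key, value] to the node (a plain Hash) created for it.
def get_or_create(store, index, key, value, props)
  store[[index, key, value]] ||= props.dup
end

# Add a relationship only if `from` has no outgoing relationship of that type yet,
# in the spirit of the `unless has_rel(speaker, :out, :works_at)` guard.
def relate_once(rels, type, from, to)
  return false if rels.any? { |rel| rel[:type] == type && rel[:from].equal?(from) }
  rels << { type: type, from: from, to: to }
  true
end

store = {}
rels = []

# The same speaker and company show up in two talks; both lookups return the same node.
2.times do
  speaker = get_or_create(store, :speakers, :name, "Andreas Kollegger", { name: "Andreas Kollegger" })
  company = get_or_create(store, :companies, :company, "Neo Technology", { company: "Neo Technology" })
  relate_once(rels, :works_at, speaker, company)
end

puts store.size # number of distinct nodes
puts rels.size  # number of works_at relationships
```

Because both helpers are idempotent, calling the import once per talk is safe no matter how often a speaker or company repeats.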

Andreas's slide deck, titled "NoSQL Now Zero To Hero", introduces the Neo4j graph database and Cypher.

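To spark readers' creativity, Neo has also prepared more advanced queries based on this dataset. The data can be queried and browsed through Neo4j's web cloud platform. The following is a screenshot of that platform: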

[Figure: screenshot of the Neo4j web cloud platform]

Index lookup:

    start abk=node:speakers(name="Andreas Kollegger")
    return abk;

return properties & id:

    start abk=node:speakers(name="Andreas Kollegger")
    return abk.name, id(abk);

follow relationships:

    start abk=node:speakers(name="Andreas Kollegger")
    match abk-[:presents]->talk
    return talk.title;

    start abk=node:speakers(name="Andreas Kollegger")
    match abk-[:presents]->talk-[:at]->slot
    return talk.title,slot.slot;

which other talks are during those slots:

    start abk=node:speakers(name="Andreas Kollegger")
    match abk-[:presents]->talk-[:at]->slot<-[:at]-other
    return talk.title,slot.slot, other.title;

group them into a collection, and count them

    start abk=node:speakers(name="Andreas Kollegger")
    match abk-[:presents]->talk-[:at]->slot<-[:at]-other
    return talk.title,slot.slot, collect(other.title) as others, count(*) as cnt;

only see those where there is more than one competing slot

    start abk=node:speakers(name="Andreas Kollegger")
    match abk-[:presents]->talk-[:at]->slot<-[:at]-other
    with talk, count(*) as cnt
    where cnt>1
    return talk.title,cnt;
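For readers more familiar with Ruby than Cypher, the last three queries can be approximated on plain in-memory data. This is only a sketch of the same logic, not how Neo4j evaluates them: `competing_talks` and `crowded_talks` are hypothetical names, and the sample list contains only talks visible in the excerpt of the import script, so the counts differ from the full dataset.

```ruby
# Other talks sharing a slot with each talk given by `speaker`.
# Mirrors collect(other.title) and count(*): a talk with no competitor yields no row,
# just as the Cypher pattern only matches when `other` exists.
def competing_talks(talks, speaker)
  own = talks.select { |talk| talk[:speakers].include?(speaker) }
  rows = own.map do |talk|
    others = talks.select { |other| other[:slot] == talk[:slot] && !other.equal?(talk) }
    { title: talk[:title], slot: talk[:slot],
      others: others.map { |other| other[:title] }, cnt: others.size }
  end
  rows.reject { |row| row[:cnt].zero? }
end

# Counterpart of "with talk, count(*) as cnt where cnt>1".
def crowded_talks(talks, speaker)
  competing_talks(talks, speaker).select { |row| row[:cnt] > 1 }
end

talks = [
  { title: "Why and When You Should Use Redis", slot: "10:30 AM - 11:15 AM", speakers: ["Josiah Carlson"] },
  { title: "Intro to Graph Databases 101", slot: "10:30 AM - 11:15 AM", speakers: ["Andreas Kollegger"] },
  { title: "Lunch N Learn with Neo Technology and Neo4j", slot: "01:15 PM - 02:00 PM", speakers: ["Andreas Kollegger"] },
  { title: "High performance graph database using cache, cloud, and standards", slot: "04:15 PM - 04:45 PM", speakers: ["Bryan Thompson"] },
  { title: "Introducing Hadoop and Big Data into a Healthcare Organization: A True Story and Learned Lessons", slot: "04:15 PM - 04:45 PM", speakers: ["Vladimir Bacvanski"] },
  { title: "NoSQL Data Modelling for Scalable eCommerce", slot: "04:15 PM - 04:45 PM", speakers: ["Dipali Trivedi"] }
]

competing_talks(talks, "Andreas Kollegger").each do |row|
  puts "#{row[:title]} (#{row[:slot]}): #{row[:cnt]} other talk(s)"
end
crowded_talks(talks, "Bryan Thompson").each do |row|
  puts "#{row[:title]}: #{row[:cnt]}"
end
```

The `!other.equal?(talk)` check plays the role of Cypher's rule that a relationship is not traversed twice within one pattern, which is what keeps a talk from being reported as its own competitor.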

slots are connected with a next relationship, show all slots

    start n=node(2)
    match p=n-[:next*0..]->current
    return current.slot;

show the talks at the slot

    start n=node(2)
    match p=n-[:next*0..]->current<-[:at]-talk
    return current.slot, talk.title;
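The `[:next*0..]` part of these two queries is a variable-length path: starting at one slot, it follows zero or more `next` relationships, so the start slot itself is included. A rough plain-Ruby equivalent is to walk a linked chain of slots. In this sketch the helper name `slots_from` is hypothetical, and the `next` links between the slot labels are assumed for illustration, since the import script excerpt does not show how they are created.

```ruby
# Follow zero or more "next" links from `start`, returning every slot on the way.
# The start slot is included, matching the lower bound of 0 hops in [:next*0..].
def slots_from(start, next_slot)
  path = []
  current = start
  # Stop at the end of the chain, or if a cycle would revisit a slot.
  while current && !path.include?(current)
    path << current
    current = next_slot[current]
  end
  path
end

# Assumed ordering of slots; the labels are taken from the import script excerpt.
next_slot = {
  "08:30 AM - 09:00 AM" => "09:00 AM - 09:45 AM",
  "09:00 AM - 09:45 AM" => "09:45 AM - 10:00 AM",
  "09:45 AM - 10:00 AM" => "10:30 AM - 11:15 AM"
}

slots_from("08:30 AM - 09:00 AM", next_slot).each { |slot| puts slot }
```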

all talks with the tag Graph Databases

    start tag=node:tags(tag="Graph Databases")
    match tag<-[:tagged]-talk
    return talk;

which companies talk about graph databases

    start tag=node:tags(tag="Graph Databases")
    match tag<-[:tagged]-talk<-[:presents]-speaker-[:works_at]->company
    return talk,speaker,company;

which companies speak about graph databases (with a surprise)

    start tag=node:tags(tag="Graph Databases")
    match tag<-[:tagged]-talk<-[:presents]-speaker-[:works_at]->company
    return distinct company.company;
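The last query walks from a tag to its talks, then to their speakers, then to the speakers' companies, and finally de-duplicates the company names. The same chain can be sketched over plain Ruby data. `companies_for_tag` is a hypothetical helper, and the sample list holds only three talks copied from the import script excerpt, so the result is not the full answer for the real dataset.

```ruby
# Distinct companies whose speakers present a talk carrying the given tag.
# Steps: filter talks by tag, gather each speaker's company, drop duplicates.
def companies_for_tag(talks, tag)
  talks.select { |talk| talk[:tags].include?(tag) }
       .flat_map { |talk| talk[:speakers].values }
       .uniq
end

# speakers maps speaker name => company, as in the add_talk calls.
talks = [
  { title: "Create Powerful New Applications with Graphs",
    speakers: { "Emil Eifrem" => "Neo Technology" },
    tags: ["Graph Databases"] },
  { title: "Why and When You Should Use Redis",
    speakers: { "Josiah Carlson" => "ChowNow Inc." },
    tags: ["NoSQL Technology Evaluation"] },
  { title: 'The NoSQL "C Panel"',
    speakers: { "Robert Scoble" => "RackSpace", "Bob Wiederhold" => "Couchbase",
                "Dwight Merriman" => "10gen", "Emil Eifrem" => "Neo Technology",
                "Jay Jarrell" => "Objectivity, Inc.", "Kirk Dunn" => "Cloudera, Inc." },
    tags: ["Graph Databases", "Hadoop", "MongoDB"] }
]

puts companies_for_tag(talks, "Graph Databases")
```

Since the panel talk is tagged "Graph Databases" in the import script, every panelist's company ends up in the result, including companies that are not graph database vendors.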
