
Example: Building a Conference Dataset with the Neo4j Graph Database

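  [IT168 Tech] The defining feature of the big data era is the variety of data types, and unstructured data of all kinds is becoming the bulk of enterprise data. Gartner predicts that enterprise data will grow by 800% within five years, that 80% of it will be unstructured, and that most of this growth will be non-business data from groups, communities, and social networks. This explosive growth in unstructured data poses a major challenge to traditional databases and makes new data management tools increasingly important.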

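  Among these new tools, Hadoop and NoSQL are the two most important categories. This article focuses on graph databases. A graph database is a kind of NoSQL (non-relational) database that applies graph theory to store the relationships between entities. The most familiar example is the relationships between people in a social network. Traditional relational databases store such networks poorly: the queries are complex, slow, and more costly than expected. The design of a graph database addresses exactly this weakness.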
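As a rough illustration of why relationship-heavy data suits a graph model, the plain-Ruby sketch below stores people as nodes and friendships as adjacency sets, then finds friends of friends by following edges directly instead of joining tables. The names and the `friends_of_friends` helper are hypothetical and do not come from the article. A real graph database adds persistence, indexing, and transactions on top of this idea.

```ruby
require 'set'

# Hypothetical social graph: each person maps to the set of people they know.
friends = {
  'alice' => Set['bob', 'carol'],
  'bob'   => Set['alice', 'dave'],
  'carol' => Set['alice', 'dave'],
  'dave'  => Set['bob', 'carol', 'erin'],
  'erin'  => Set['dave']
}

# Follow two "knows" hops from person, excluding the person and direct friends.
def friends_of_friends(friends, person)
  direct = friends.fetch(person, Set.new)
  two_hops = direct.flat_map { |f| friends.fetch(f, Set.new).to_a }.to_set
  (two_hops - direct - Set[person]).to_a.sort
end

puts friends_of_friends(friends, 'alice').inspect
```

Each hop is a direct lookup from a node to its neighbours, so the cost depends on how many edges are followed rather than on the total size of the dataset.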

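  Common graph databases include Neo4j, FlockDB, AllegroGraph, GraphDB, and InfiniteGraph. Neo4j is a fully ACID-compliant graph database implemented in Java. It stores data on disk in a format optimized for graph networks. At its core is a very fast graph engine with the features expected of a database product, such as recovery, two-phase commit, and XA compliance.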

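  The NoSQL Now 2012 conference was held on August 21-23 in San Jose, USA. During a lunch session, Andreas Kollegger of Neo Technology introduced the Neo4j database and showed how to use its tools to build a graph database quickly.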


  The example uses the NoSQL Now 2012 conference program as its dataset, as shown in the figure:

[Figure: the NoSQL Now 2012 conference dataset]

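First, create a new Heroku instance and connect it to a Neo4j database. A Ruby script using the neography gem, a Neo4j REST client written by community star Max De Marzi, loads the data into the database. The graph.db directory can also be downloaded from the sample data site for use with a local Neo4j server. The code is as follows: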

 

require 'rubygems'
require 'neography'

# Lazily create and memoize the REST client for the local Neo4j server.
def neo
  @neo ||= Neography::Rest.new("http://localhost:7474")
end

# True if the node already has a relationship of the given direction and type.
def has_rel(node, dir, type)
  res = neo.get_node_relationships(node, dir, type)
  return res && res.size > 0
end

# Create a talk node and link it to its slot, speakers, companies, tags and audience.
def add_talk(slot, title, speakers, audience, tags)
  root = neo.get_root()
  talk = neo.create_node({:title => title})
  # Slots, speakers, companies, tags and audiences are unique nodes looked up by index.
  slot = neo.create_unique_node(:slots, :slot, slot, { :slot => slot})
  neo.create_relationship(:at, talk, slot)
  speakers.each do |name, from|
    speaker = neo.create_unique_node(:speakers, :name, name, { :name => name})
    neo.create_relationship(:presents, speaker, talk)
    company = neo.create_unique_node(:companies, :company, from, { :company => from})
    # Link a speaker to a company only once, even if they give several talks.
    neo.create_relationship(:works_at, speaker, company) unless has_rel(speaker, :out, :works_at)
  end
  tags.each do |name|
    tag = neo.create_unique_node(:tags, :tag, name, { :tag => name})
    neo.create_relationship(:tagged, talk, tag)
    neo.create_relationship(:tag, root, tag) unless has_rel(tag, :in, :tag)
  end
  who = neo.create_unique_node(:audience, :audience, audience, { :audience => audience})
  neo.create_relationship(:for, talk, who)
end

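# Clear the database: delete every node except the root (ID 0), plus all relationships.
neo.execute_query("start n=node(*) match n-[r?]-m where ID(n)<>0 delete n,r")

# Create an exact-match Lucene index for each kind of unique node.
[:slots, :speakers, :companies, :tags, :audience].each do |name|
  neo.create_node_index(name, :exact, :lucene)
end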


# Load the talks: slot, title, {speaker => company}, audience, tags.
add_talk("08:30 AM - 09:00 AM", 'The Journey to Amazon DynamoDB: From Scaling by Architecture to Scaling by Commandment',
  {'Swami Sivasubramanian'=>'Amazon Web Services'}, 'Technical - Introductory', ['Cloud Computing', "NoSQL Architecture and Design"])
add_talk("09:00 AM - 09:45 AM", 'Then Our Buildings Shape Us: A new way to think about NoSQL technology selection',
  {'Tim Berglund'=>'GitHub'}, 'Business / Non-Technical', ['NoSQL Architecture and Design', "NoSQL Technology Evaluation"])
add_talk("09:45 AM - 10:00 AM", 'Create Powerful New Applications with Graphs',
  {'Emil Eifrem'=>'Neo Technology'}, 'Business / Non-Technical', ['Graph Databases'])
add_talk("10:30 AM - 11:15 AM", 'Why and When You Should Use Redis',
  {'Josiah Carlson'=>'ChowNow Inc.'}, 'Technical - Introductory', ['NoSQL Technology Evaluation'])
# ...
add_talk("10:30 AM - 11:15 AM", 'Intro to Graph Databases 101',
  {'Andreas Kollegger'=>'Neo Technology'}, 'Technical - Introductory', ['Graph Databases'])
# ...
add_talk("01:15 PM - 02:00 PM", 'Lunch N Learn with Neo Technology and Neo4j',
  {'Andreas Kollegger'=>'Neo Technology'}, 'Technical - Introductory', ['Graph Databases'])
add_talk("02:15 PM - 03:00 PM", 'Using Graph Databases to Analyze Relationships, Risks and Business Opportunities - A Case Study',
  {'Jans Aasman'=>'Franz Inc'}, 'Technical - Introductory', ['Graph Databases'])
add_talk("04:15 PM - 04:45 PM", 'High performance graph database using cache, cloud, and standards',
  {'Bryan Thompson'=>'SYSTAP, LLC'}, 'Technical - Advanced', ['Graph Databases'])
# ...
add_talk("04:15 PM - 04:45 PM", 'Introducing Hadoop and Big Data into a Healthcare Organization: A True Story and Learned Lessons',
  {'Vladimir Bacvanski'=>'SciSpike'}, 'Technical - Intermediate', ['Big Data'])
add_talk("04:15 PM - 04:45 PM", 'NoSQL Data Modelling for Scalable eCommerce',
  {'Dipali Trivedi'=>'Staples.com'}, 'Technical - Intermediate', ['NoSQL Architecture and Design'])


add_talk("05:30 PM - 06:30 PM", 'The NoSQL "C Panel"',
  {"Robert Scoble"=>"RackSpace",
   "Bob Wiederhold"=>"Couchbase",
   "Dwight Merriman"=>"10gen",
   "Emil Eifrem"=>"Neo Technology",
   "Jay Jarrell"=>"Objectivity, Inc.",
   "Kirk Dunn"=>"Cloudera, Inc."},
  "Business / Non-Technical",
  ["Graph Databases", "Hadoop", "MongoDB"])
 

  Andreas's slides, titled "NoSQL Now Zero To Hero", introduce the Neo4j graph database and Cypher.
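To make the shape of the graph built by the import script above easier to see, here is an in-memory stand-in that uses plain Ruby hashes instead of a Neo4j server. The helpers `new_graph`, `create_node`, `unique_node`, `relate`, and `add_talk_sketch` are hypothetical. They only imitate the get-or-create behaviour the script appears to rely on from `create_unique_node`, and they are not part of neography.

```ruby
# Hypothetical in-memory stand-in for the graph built by add_talk (not the neography API).
def new_graph
  { nodes: [], rels: [], index: Hash.new { |h, k| h[k] = {} } }
end

# Plain node creation: always adds a new node and returns it.
def create_node(graph, props)
  node = { id: graph[:nodes].size, props: props }
  graph[:nodes] << node
  node
end

# Get-or-create: return the node indexed under (index, value), creating it only once.
def unique_node(graph, index, key, value)
  graph[:index][index][value] ||= create_node(graph, { key => value })
end

# Record a directed, typed relationship from one node to another.
def relate(graph, type, from, to)
  graph[:rels] << { type: type, from: from[:id], to: to[:id] }
end

# Simplified add_talk: talk -> slot, speaker -> talk, speaker -> company.
def add_talk_sketch(graph, slot, title, speakers)
  talk = create_node(graph, { title: title })
  relate(graph, :at, talk, unique_node(graph, :slots, :slot, slot))
  speakers.each do |name, from|
    speaker = unique_node(graph, :speakers, :name, name)
    relate(graph, :presents, speaker, talk)
    company = unique_node(graph, :companies, :company, from)
    already = graph[:rels].any? { |r| r[:type] == :works_at && r[:from] == speaker[:id] }
    relate(graph, :works_at, speaker, company) unless already
  end
  talk
end

graph = new_graph
add_talk_sketch(graph, '10:30 AM - 11:15 AM', 'Intro to Graph Databases 101',
                { 'Andreas Kollegger' => 'Neo Technology' })
add_talk_sketch(graph, '01:15 PM - 02:00 PM', 'Lunch N Learn with Neo Technology and Neo4j',
                { 'Andreas Kollegger' => 'Neo Technology' })

puts graph[:nodes].size                                # 2 talks + 2 slots + 1 speaker + 1 company
puts graph[:rels].count { |r| r[:type] == :works_at }  # the speaker is linked to the company once
```

Because speakers, companies, and slots are looked up before they are created, two talks by the same person share one speaker node. That sharing is what lets the later queries start from a single indexed node and fan out to every talk.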


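  To spark readers' creativity, Neo Technology has also prepared more advanced queries based on this dataset. You can run the queries and browse the data through Neo4j's web-based cloud platform. The figure below is a screenshot of that platform: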

[Figure: screenshot of the Neo4j web cloud platform]

Index lookup:

    start abk=node:speakers(name="Andreas Kollegger")
    return abk;

return properties & id:

    start abk=node:speakers(name="Andreas Kollegger")
    return abk.name, id(abk);

follow relationships:

    start abk=node:speakers(name="Andreas Kollegger")
    match abk-[:presents]->talk
    return talk.title;

    start abk=node:speakers(name="Andreas Kollegger")
    match abk-[:presents]->talk-[:at]->slot
    return talk.title,slot.slot;

which other talks are during those slots:

    start abk=node:speakers(name="Andreas Kollegger")
    match abk-[:presents]->talk-[:at]->slot<-[:at]-other
    return talk.title,slot.slot, other.title;

group them into a collection, and count them

    start abk=node:speakers(name="Andreas Kollegger")
    match abk-[:presents]->talk-[:at]->slot<-[:at]-other
    return talk.title,slot.slot, collect(other.title) as others, count(*) as cnt;

only see those where there is more than one competing slot

    start abk=node:speakers(name="Andreas Kollegger")
    match abk-[:presents]->talk-[:at]->slot<-[:at]-other
    with talk, count(*) as cnt
    where cnt>1
    return talk.title,cnt;
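For readers new to Cypher, the sketch below reproduces the "competing talks" pattern in plain Ruby over a few rows copied from the import script. It is only an approximation of what the queries compute. The `competing_talks` helper and the flat `[slot, title, speaker]` row layout are assumptions made for this illustration, and the elided talks are not included, so the real counts would be higher.

```ruby
# A few talks copied from the import script, as [slot, title, speaker] rows.
talks = [
  ['10:30 AM - 11:15 AM', 'Why and When You Should Use Redis', 'Josiah Carlson'],
  ['10:30 AM - 11:15 AM', 'Intro to Graph Databases 101', 'Andreas Kollegger'],
  ['01:15 PM - 02:00 PM', 'Lunch N Learn with Neo Technology and Neo4j', 'Andreas Kollegger']
]

# Mirror talk-[:at]->slot<-[:at]-other: pair each talk by the speaker with the
# other talks in its slot, then collect and count them per talk.
def competing_talks(talks, speaker)
  by_slot = talks.group_by { |slot, _title, _who| slot }
  rows = talks.select { |_slot, _title, who| who == speaker }.map do |slot, title, _who|
    others = by_slot[slot].map { |_slot, other_title, _who| other_title } - [title]
    { title: title, slot: slot, others: others, cnt: others.size }
  end
  # A talk with no competitor matches no pattern, so it yields no row.
  rows.reject { |row| row[:others].empty? }
end

rows = competing_talks(talks, 'Andreas Kollegger')
rows.each { |row| puts "#{row[:title]} (#{row[:slot]}): #{row[:cnt]} other talk(s)" }

# Equivalent of "with talk, count(*) as cnt where cnt>1".
busy = rows.select { |row| row[:cnt] > 1 }
puts busy.size
```

With this reduced sample, only the 10:30 AM talk has a competitor, so the final filter returns nothing. The Ruby version needs explicit grouping and filtering, whereas the Cypher pattern states the same shape declaratively.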

slots are connected with a next relationship, show all slots

    start n=node(2)
    match p=n-[:next*0..]->current
    return current.slot;
                                                                        
show the talks at the slot

    start n=node(2)
    match p=n-[:next*0..]->current<-[:at]-talk
    return current.slot, talk.title;
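The pattern `n-[:next*0..]->current` is a variable-length path: it matches the start node itself (zero hops) and every node reachable by following `next` relationships. The plain-Ruby sketch below walks such a chain. The `next_slot` hash and the `slots_from` helper are assumptions made for this illustration, since the import script shown earlier does not create `next` relationships itself.

```ruby
# Assumed chain of slots: each slot points to the one that follows it.
next_slot = {
  '08:30 AM - 09:00 AM' => '09:00 AM - 09:45 AM',
  '09:00 AM - 09:45 AM' => '09:45 AM - 10:00 AM',
  '09:45 AM - 10:00 AM' => '10:30 AM - 11:15 AM'
}

# Walk the chain from start, emitting start itself first (the zero-hop match).
def slots_from(next_slot, start)
  result = []
  seen = {}
  current = start
  while current && !seen[current]
    seen[current] = true   # guard against accidental cycles
    result << current
    current = next_slot[current]
  end
  result
end

puts slots_from(next_slot, '09:00 AM - 09:45 AM')
```

Starting from the second slot returns that slot and the two after it. With a lower bound of 1 instead of 0, the starting slot would be left out.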

all talks with the tag Graph Databases

    start tag=node:tags(tag="Graph Databases")
    match tag<-[:tagged]-talk
    return talk;

which companies talk about graph databases

    start tag=node:tags(tag="Graph Databases")
    match tag<-[:tagged]-talk<-[:presents]-speaker-[:works_at]->company
    return talk,speaker,company;

which companies speak about graph databases (with a surprise)

    start tag=node:tags(tag="Graph Databases")
    match tag<-[:tagged]-talk<-[:presents]-speaker-[:works_at]->company
    return distinct company.company;
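The last query can be approximated in plain Ruby over rows copied from the import script. The `companies_for_tag` helper and the row layout are assumptions made for this illustration. The panel talk carries the "Graph Databases" tag, so it pulls in companies outside the graph database space, which is possibly the "surprise" the query label hints at.

```ruby
# Rows copied from the import script: [title, { speaker => company }, tags].
talks = [
  ['Create Powerful New Applications with Graphs',
   { 'Emil Eifrem' => 'Neo Technology' }, ['Graph Databases']],
  ['Why and When You Should Use Redis',
   { 'Josiah Carlson' => 'ChowNow Inc.' }, ['NoSQL Technology Evaluation']],
  ['Intro to Graph Databases 101',
   { 'Andreas Kollegger' => 'Neo Technology' }, ['Graph Databases']],
  ['The NoSQL "C Panel"',
   { 'Robert Scoble' => 'RackSpace', 'Bob Wiederhold' => 'Couchbase',
     'Dwight Merriman' => '10gen', 'Emil Eifrem' => 'Neo Technology',
     'Jay Jarrell' => 'Objectivity, Inc.', 'Kirk Dunn' => 'Cloudera, Inc.' },
   ['Graph Databases', 'Hadoop', 'MongoDB']]
]

# tag <- talk <- speaker -> company, keeping each company once (like "distinct").
def companies_for_tag(talks, tag)
  talks.select { |_title, _speakers, tags| tags.include?(tag) }
       .flat_map { |_title, speakers, _tags| speakers.values }
       .uniq
end

puts companies_for_tag(talks, 'Graph Databases')
```

`uniq` plays the role of `distinct` here: Neo Technology appears on three of the tagged talks but is listed once.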