
Google has revealed its Big Data secrets

Google’s search engine makes it wonderfully easy to locate stuff on the web, whether it’s in a news article, a corporate website, or a video on YouTube. But that only begins to describe Google’s ability to find information. Inside the company, engineers use several uniquely powerful tools for searching and analyzing its own massive trove of data.
One of those is Dremel, a tool that helps Google’s employees analyze data stored across thousands of machines at unusually fast speeds. What’s more, Dremel lets the Google team manipulate all of this data using a language very similar to SQL, short for Structured Query Language, the standard way of grabbing information from databases.
Like most of its custom-built tools, Dremel is only available inside Google. But now, the rest of the world can hack data a little more like Google does, thanks to Quest, a Dremel-like query engine created by Theo Vassilakis, one of the lead developers of Dremel at Google, and Toli Lerios, a former engineer at Facebook. The tool is one of a growing number that seek to mimic the way web giants like Google and Facebook rapidly analyze enormous amounts of online information stored across hundreds or even thousands of machines. This includes everything from a project called Drill, from a company called MapR, to a sweeping open source platform called Spark.
Vassilakis and Lerios cooked up the idea for Quest in 2012. “We were looking inside of Google and Facebook at how hard it is to get data and combine data and produce useful results,” Vassilakis says. “And we thought about what’s going on at all these companies without 15,000 engineers.” So they quit their jobs and started their own company, Metanautix, and set about building Quest. Today, after two years of development, the product is now available to any company that would like to use it.
The idea behind Quest is to make it simple for analysts to query data from anywhere in a company with a single tool, regardless of where that data is stored, without the need to learn new programming languages. Using Quest, analysts can query traditional sources such as Oracle’s flagship database, “big data” storage systems like Hadoop, log files, Word documents, images and media files, and more. But it isn’t just a search engine.
Just like Dremel, Quest lets you query data using a SQL-like language. “Our view is that if you can show people the traditional metaphors that they’re used to, such as tables and SQL queries, that’s the easiest way for them to get started,” Vassilakis says. “We’re trying to support all the traditional metaphors without teaching people new things.”
Quest isn’t a database. It doesn’t store data. And although Quest can be used to move data around from system to system, it can also analyze data without moving it, making copies of the data and shuttling these copies through its own memory system. To accomplish all of this, Metanautix built connectors for several major storage systems, including Oracle, Hadoop and Amazon S3. And thanks to its use of the Java Virtual Machine, it can interface with just about any data source you can think of.
You could use it to correlate data from purchase orders stored in a data warehousing system in your own data center with product photos stored in the cloud, for example, or analyze web analytics data stored in Hadoop alongside customer profiles stored in an Oracle database, and throw in some information lying around in Word documents on the company shared drive for good measure.
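To give a rough feel for what a single SQL query spanning two separate stores looks like, here is a minimal sketch in Python. It is not Quest's syntax or API, which the article does not show. It uses SQLite's `ATTACH DATABASE` as a stand-in, with two in-memory databases playing the roles of the on-premises warehouse and the cloud store. Every table, column, and function name is a hypothetical chosen for illustration.

```python
import sqlite3


def build_sources():
    """Create two separate in-memory databases reachable from one connection.

    'main' stands in for an on-premises data warehouse and 'cloud' for a
    cloud store. Both are assumptions made for illustration only.
    """
    conn = sqlite3.connect(":memory:")
    # ATTACH opens a second, independent database under the schema name 'cloud'.
    conn.execute("ATTACH DATABASE ':memory:' AS cloud")
    conn.execute(
        "CREATE TABLE main.purchase_orders "
        "(order_id INTEGER, product_id INTEGER, quantity INTEGER)"
    )
    conn.execute(
        "CREATE TABLE cloud.product_photos (product_id INTEGER, photo_url TEXT)"
    )
    conn.executemany(
        "INSERT INTO main.purchase_orders VALUES (?, ?, ?)",
        [(1, 10, 3), (2, 11, 1), (3, 10, 2)],
    )
    conn.executemany(
        "INSERT INTO cloud.product_photos VALUES (?, ?)",
        [(10, "photos/widget.jpg"), (11, "photos/gadget.jpg")],
    )
    return conn


def units_per_photo(conn):
    """Join purchase orders in one database with photos in the other."""
    query = """
        SELECT p.photo_url, SUM(o.quantity) AS units
        FROM main.purchase_orders AS o
        JOIN cloud.product_photos AS p ON p.product_id = o.product_id
        GROUP BY p.photo_url
        ORDER BY p.photo_url
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for photo_url, units in units_per_photo(build_sources()):
        print(photo_url, units)
```

The analyst writes one ordinary join and the engine resolves where each table lives. A real federated engine would also have to handle connectors, data formats, and distribution across machines, none of which this sketch attempts.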
It can also keep track of the changes you make to your data. That’s a big part of what sets Quest apart from many other big data tools, says Mark Madsen, founder of the analyst firm Third Nature. Companies in regulated industries—from health care to finance to pharmaceuticals—need to be able to provide an audit trail to prove their compliance with the law. That’s not something that many new age data analytics tools account for, Madsen says.
There are a few other Dremel clones out there already, such as Cloudera’s Impala and MapR’s Drill. But these other projects are more concerned with collecting data, says Madsen, while Quest is focused on manipulating data. “Data in its raw form isn’t that useful,” he says. “You have to do things to it. You have to shape, and discard the stuff you don’t need.”
