Recent Articles

23
Nov

New Features in Hypertable 0.9.5.3

0.9.5.3 is a patch release, but it also includes a few new features that are worth mentioning.

Pagination with OFFSET and CELL_OFFSET

With this release you can specify an OFFSET or CELL_OFFSET option to a SELECT clause to skip a number of rows or cells in the query. This is often used in combination with LIMIT and CELL_LIMIT to implement pagination, i.e. when you only display the first 20 results of a query in a web page and then let users navigate to the next or previous pages.

    SELECT * FROM table OFFSET 40 LIMIT 20;

The OFFSET and CELL_OFFSET options are also available for the C++ API (ScanSpecBuilder::set_row_offset and ScanSpecBuilder::set_cell_offset) and the Thrift APIs.
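
As a rough illustration of the pagination arithmetic, the sketch below turns a 1-based page number into the OFFSET and LIMIT values of a query like the one above. The helper name `page_query` is hypothetical and not part of any Hypertable API; it only builds the HQL string and does not escape the table name.

```python
def page_query(table: str, page: int, page_size: int = 20) -> str:
    """Build an HQL SELECT for a 1-based page number using OFFSET and LIMIT."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    # Rows to skip: every full page that comes before the requested one.
    offset = (page - 1) * page_size
    return f"SELECT * FROM {table} OFFSET {offset} LIMIT {page_size};"


print(page_query("table", 3))  # SELECT * FROM table OFFSET 40 LIMIT 20;
```

With 20 rows per page, page 3 skips the first 40 rows, which reproduces the statement shown earlier.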

Chronological Timestamps

By default, Hypertable sorts timestamps in reverse chronological order (newest on top). Many users have asked for a way to reverse this order, so we added a new column family option, TIME_ORDER DESC. A SELECT of a column with this option will always return the oldest values of each cell. See below how you can use this behavior to create unique user IDs or to simulate "AUTO_INCREMENT" fields.

    CREATE TABLE test (cf1 TIME_ORDER DESC);

For completeness' sake we also added TIME_ORDER ASC which specifies the default behavior.

Unique Values (for a scalable "AUTO_INCREMENT")

A common usage scenario is to create unique IDs, e.g. for users signing up to a web page or for items being stored in your catalogue. Traditional SQL databases have options like "AUTO_INCREMENT" which automatically assign a new ID to a row whenever you insert one. Usually they use a counter which is then incremented. Such a counter would be inefficient in a distributed environment since it always has to be synchronized with the other nodes. Therefore we came up with a new design, one that scales and does not require synchronization.

The first step is to create a column for the unique values. A unique value can never be overwritten once it has been created, therefore we use TIME_ORDER DESC in combination with MAX_VERSIONS 1 (because we're not interested in any value other than the oldest one).

    CREATE TABLE user_profiles (user_ids TIME_ORDER DESC MAX_VERSIONS 1);

The actual insert operation consists of two parts: first insert a unique key; second verify that it was really inserted (to make sure that no other node in the cluster has inserted an identical key in the meantime).

The following HQL tries to create a unique User ID for user "alice".

    INSERT INTO user_profiles VALUES ("alice", "user_ids", "random_unique_id");
    SELECT user_ids FROM user_profiles WHERE ROW = "alice";
    # now verify that the SELECT returned cell value "random_unique_id"

For convenience we added a new HQL function GUID() which creates globally unique IDs and which can be used to create row keys or cell values:

    INSERT INTO user_profiles VALUES ("alice", "user_ids", GUID());

For even more convenience we created a new helper library (HyperAppHelper) which can create GUIDs and insert unique values. The functions are declared in HyperAppHelper/Unique.h. These functions are also exported to the Thrift interface. Our PHP microblogging sample, which implements much of Twitter's functionality, uses this function when new users sign up.
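
The insert-then-verify pattern described above can be sketched in a few lines of Python. This is a toy in-memory model, not Hypertable code: the class `FirstWriteWinsColumn` and the function `claim_unique` are hypothetical names, the dictionary only imitates the "oldest value wins" behavior that the post attributes to TIME_ORDER DESC with MAX_VERSIONS 1, and `uuid.uuid4()` stands in for the GUID() function.

```python
import uuid


class FirstWriteWinsColumn:
    """Toy stand-in for a column whose SELECT always returns the oldest value."""

    def __init__(self):
        self._cells = {}

    def insert(self, row, value):
        # A later write to the same row never replaces the oldest value.
        self._cells.setdefault(row, value)

    def select(self, row):
        return self._cells.get(row)


def claim_unique(column, row, candidate=None):
    """Insert a candidate value, then read it back to check whether we won."""
    candidate = candidate or uuid.uuid4().hex
    column.insert(row, candidate)
    return column.select(row) == candidate, candidate


user_ids = FirstWriteWinsColumn()
won_first, first_id = claim_unique(user_ids, "alice")
won_second, _ = claim_unique(user_ids, "alice")
print(won_first, won_second)  # True False
```

The second caller sees a value other than its own candidate, so it knows that another writer already claimed the row.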

Posted By: Christoph Rupp, Senior Software Developer, Hypertable Inc.

30
Aug

Hypertable Regression Testing

Now that Hypertable version 0.9.5.0 is out the door, I thought I would take a moment to describe some of the testing that goes into the Hypertable release process to ensure the highest product quality.  The number one priority of the Hypertable development team is to make Hypertable 100% stable and correct.  As part of that effort, an extensive suite of tests has been built to verify that code changes do not destabilize the product or introduce correctness problems.

Hypertable undergoes many levels of testing during the course of its development.  For example, prior to each release, long-running system verification tests are run to verify that the system performs correctly under various realistic workloads.  In addition to these long-running tests, an exhaustive set of regression tests is run prior to every code check-in.  These regression tests are a mix of unit, functional, performance, and system tests and provide good test coverage.  By running these tests frequently (i.e. prior to every check-in), bugs are caught sooner in the development cycle, which minimizes their impact and speeds overall development progress.

Induced Failures

To verify that the various server processes maintain correctness under failure scenarios, we've instrumented the code with failure induction points.  These are places in the code that can be forced to fail by supplying an --induce-failure switch to the server startup command, for example:

Hypertable.RangeServer --induce-failure=metadata-split-4:exit:0

The format of the value for this switch is <tag>:<errortype>:<iteration>.  <tag> identifies the failure point, <errortype> indicates the type of failure (exit or throw), and <iteration> indicates after how many iterations of hitting the failure point the error should occur.  We use this mechanism to rigorously test for correctness of the range split process as can be seen in the test output listing below.
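
To make the switch format concrete, here is a minimal Python sketch that splits a value of the form <tag>:<errortype>:<iteration> into its three fields. The names `InducedFailure` and `parse_induce_failure` are hypothetical; this is not how the Hypertable servers parse the option, only an illustration of the format described above.

```python
from typing import NamedTuple


class InducedFailure(NamedTuple):
    tag: str          # identifies the failure point
    error_type: str   # "exit" or "throw"
    iteration: int    # iteration count given in the switch value


def parse_induce_failure(spec: str) -> InducedFailure:
    """Split '<tag>:<errortype>:<iteration>' into its three fields."""
    # Split from the right so this sketch tolerates ':' inside the tag.
    tag, error_type, iteration = spec.rsplit(":", 2)
    if error_type not in ("exit", "throw"):
        raise ValueError(f"unknown error type: {error_type!r}")
    return InducedFailure(tag, error_type, int(iteration))


print(parse_induce_failure("metadata-split-4:exit:0"))
# InducedFailure(tag='metadata-split-4', error_type='exit', iteration=0)
```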

Test Output Listing

Running tests...
Start processing tests
Test project /opt/hypertable/build/release
  1/122 Testing Common-Exception                 Passed
  2/122 Testing Common-Logging                   Passed
  3/122 Testing Common-Serialization             Passed
  4/122 Testing Common-ScopeGuard                Passed
  5/122 Testing Common-InetAddr                  Passed
  6/122 Testing Common-PageArena                 Passed
  7/122 Testing Common-Properties                Passed
  8/122 Testing Common-init                      Passed
  9/122 Testing MD5-Base64                       Passed
 10/122 Testing Common-StatsSystem-serialize     Passed
 11/122 Testing Common-StringCompressor          Passed
 12/122 Testing Common-TimeInline                Passed
 13/122 Testing Common-BloomFilter               Passed
 14/122 Testing Common-Hash                      Passed
 15/122 Testing HyperComm                        Passed
 16/122 Testing HyperComm-datagram               Passed
 17/122 Testing HyperComm-timeout                Passed
 18/122 Testing HyperComm-timer                  Passed
 19/122 Testing HyperComm-reverse-request        Passed
 20/122 Testing BerkeleyDbFilesystem             Passed
 21/122 Testing MasterOperation-TestSetup        Passed
 22/122 Testing MasterOperation-Proccessor       Passed
 23/122 Testing MasterOperation-Initialize       Passed
 24/122 Testing MasterOperation-SystemUpgrade    Passed
 25/122 Testing MasterOperation-CreateNamespac   Passed
 26/122 Testing MasterOperation-DropNamespace    Passed
 27/122 Testing MasterOperation-CreateTable      Passed
 28/122 Testing MasterOperation-RenameTable      Passed
 29/122 Testing MasterOperation-MoveRange        Passed
 30/122 Testing FileBlockCache                   Passed
 31/122 Testing QueryCache                       Passed
 32/122 Testing TableIdCache                     Passed
 33/122 Testing CellStoreScanner                 Passed
 34/122 Testing CellStoreScanner-delete          Passed
 35/122 Testing AG-garbage-tracker               Passed
 36/122 Testing Hypertable-Lib-TestSetup         Passed
 37/122 Testing Schema                           Passed
 38/122 Testing LocationCache                    Passed
 39/122 Testing LoadDataSource                   Passed
 40/122 Testing LoadDataEscape                   Passed
 41/122 Testing BlockCompressor-BMZ              Passed
 42/122 Testing BlockCompressor-LZO              Passed
 43/122 Testing BlockCompressor-NONE             Passed
 44/122 Testing BlockCompressor-QUICKLZ          Passed
 45/122 Testing BlockCompressor-ZLIB             Passed
 46/122 Testing CommitLog                        Passed
 47/122 Testing MetaLog                          Passed
 48/122 Testing Client-large-block               Passed
 49/122 Testing Client-async-api                 Passed
 50/122 Testing Client-future                    Passed
 51/122 Testing Client-row-delete                Passed
 52/122 Testing Client-periodic-flush            Passed
 53/122 Testing NameIdMapper                     Passed
 54/122 Testing StatsRangeServer-serialize       Passed
 55/122 Testing HyperDfsBroker                   Passed
 56/122 Testing Hyperspace                       Passed
 57/122 Testing Hypertable-shell                 Passed
 58/122 Testing Hypertable-shell-ldi-select      Passed
 59/122 Testing prune-tsv                        Passed
 60/122 Testing RangeServer                      Passed
 61/122 Testing ThriftClient-cpp                 Passed
 62/122 Testing ThriftClient-ruby                Passed
 63/122 Testing ThriftClient-perl                Passed
 64/122 Testing ThriftClient-python              Passed
 65/122 Testing ThriftClient-php                 Passed
 66/122 Testing ThriftClient-java                Passed
 67/122 Testing Balancing-mechanics              Passed
 68/122 Testing Balancing-load                   Passed
 69/122 Testing Client-scanner-abrupt-end        Passed
 70/122 Testing Client-future-abrupt-end         Passed
 71/122 Testing Client-future-mutator-cancel     Passed
 72/122 Testing Client-random-write-read         Passed
 73/122 Testing Client-no-log-sync               Passed
 74/122 Testing CellStore-garbage-collection     Passed
 75/122 Testing RangeServer-commit-log-gc        Passed
 76/122 Testing AG-garbage-collection-max-vers   Passed
 77/122 Testing AG-garbage-collection-deletes    Passed
 78/122 Testing AG-garbage-collection-ttl        Passed
 79/122 Testing Dual-instances                   Passed
 80/122 Testing RangeServer-load-exception       Passed
 81/122 Testing MasterClient-TransparentFailov   Passed
 82/122 Testing METADATA-split-recovery-0        Passed
 83/122 Testing METADATA-split-recovery-1        Passed
 84/122 Testing METADATA-split-recovery-2        Passed
 85/122 Testing METADATA-split-recovery-3        Passed
 86/122 Testing METADATA-split-recovery-4        Passed
 87/122 Testing METADATA-split-recovery-5        Passed
 88/122 Testing METADATA-split-recovery-6        Passed
 89/122 Testing METADATA-split-recovery-7        Passed
 90/122 Testing METADATA-split-recovery-8        Passed
 91/122 Testing METADATA-split-recovery-9        Passed
 92/122 Testing METADATA-split-recovery-10       Passed
 93/122 Testing METADATA-split-recovery-11       Passed
 94/122 Testing METADATA-split-recovery-12       Passed
 95/122 Testing METADATA-split-recovery-13       Passed
 96/122 Testing RangeServer-maintenance-thread   Passed
 97/122 Testing RangeServer-row-overflow         Passed
 98/122 Testing RangeServer-rowkey-ag-imbalanc   Passed
 99/122 Testing RangeServer-sequential-load      Passed
100/122 Testing RangeServer-split-recovery-1     Passed
101/122 Testing RangeServer-split-recovery-2     Passed
102/122 Testing RangeServer-split-recovery-3     Passed
103/122 Testing RangeServer-split-recovery-4     Passed
104/122 Testing RangeServer-split-recovery-5     Passed
105/122 Testing RangeServer-split-recovery-6     Passed
106/122 Testing RangeServer-split-recovery-7     Passed
107/122 Testing RangeServer-split-recovery-8     Passed
108/122 Testing RangeServer-split-merge-loop10   Passed
109/122 Testing RS-group-commit-split-recovery   Passed
110/122 Testing RS-group-commit-split-recovery   Passed
111/122 Testing RS-group-commit-split-recovery   Passed
112/122 Testing RS-group-commit-split-recovery   Passed
113/122 Testing RS-group-commit-split-recovery   Passed
114/122 Testing RS-group-commit-split-recovery   Passed
115/122 Testing RS-group-commit-split-recovery   Passed
116/122 Testing RS-group-commit-split-recovery   Passed
117/122 Testing RS-group-commit-split-recovery   Passed
118/122 Testing RangeServer-bloomfilter-rows     Passed
119/122 Testing RangeServer-bloomfilter-rows-c   Passed
120/122 Testing RangeServer-ScanLimit            Passed
121/122 Testing ThriftClient-reconnect-hypersp   Passed
122/122 Testing Thrift-table-refresh             Passed

100% tests passed, 0 tests failed out of 122
Built target alltests

real  47m25.639s
user  1m29.231s
sys   1m56.113s

By requiring that these regression tests pass prior to each code check-in, we catch most bugs very early in the development process when they are the easiest to track down.  This leads to a much more stable code base and allows us to make rapid forward progress.

The Hypertable project was started in early 2007 and has had the benefit of many real-world deployments, including Baidu (Nasdaq: BIDU), China's leading search engine, and Rediff.com (Nasdaq: REDF), the largest India-owned and operated web portal and one of the top ten e-mail providers worldwide.  Our dedication to quality, along with years of real-world production deployment experience, has made Hypertable the rock solid, production-quality database that it is today.  If you need a high-performance, scalable database, now would be a great time to give Hypertable a try.

Posted By:  Doug Judd, CEO, Hypertable Inc.

4
Aug

Opportunities for AI in Hypertable to be presented at AAAI 2011 Workshop

This Sunday, August 7th, the Hypertable development team will be presenting at the AAAI 2011 Workshop – AI For Data Center Management and Cloud Computing, in San Francisco, CA.  The objective of this workshop is to bring together researchers and technologists from academia and industry to explore the applications of artificial intelligence to the most pertinent technical challenges in data center management and cloud computing.  The Hypertable team will be giving the following talks.

Opportunities for AI in a Cloud Database:  Hypertable
Doug Judd, Hypertable, Inc (invited speaker)
A significant percentage of cloud data center hardware is dedicated to running cloud storage services.  As the Big Data revolution continues and cloud databases become more sophisticated, the percentage of data center equipment dedicated to storage services is likely to grow.  In this talk, Doug Judd, the original creator of Hypertable, will present opportunities for AI in cloud databases, using Hypertable as an example.

Load Balancing in Hypertable
Gordon Rios, University College Cork, Constraint Computation Centre
In Hypertable, ranges of table data are stored and accessed on different nodes, which allows for flexible management of the underlying hardware. Overall performance is sensitive to the balance of range load across the cluster. The project developers aim to create a simple interface to allow researchers to design experimental load balancing strategies that incorporate machine learning and optimization. This talk introduces the load balancing problem and presents it as a challenge problem for AI and machine learning.

If you have an interest in constraint optimization, machine learning, or AI in general, and would like to apply these techniques in Hypertable, we encourage you to attend the workshop and/or get involved in the project by joining the Hypertable Developers mailing list.

Posted By:  Rebecca Ritter, VP Marketing and Corporate Communications, Hypertable Inc.

10
May

Stability Improvements in the Hypertable 0.9.5.0 pre-release

We recently announced the Hypertable 0.9.5.0 pre-release.  Even though we've labelled it as a "pre" release, it is one of the biggest and most important Hypertable releases to date.  Among other things, it includes a complete re-write of the Master, to fix some known stability problems.  It represents a significant amount of work as can be seen by the following code change statistics:

  • 512 files changed
  • 30,633 line insertions
  • 14,354 line deletions

The following describes problems that existed in prior releases and how they were solved, and highlights other stability improvements included in the 0.9.5.0 pre-release.

Duplicate range load.  In prior releases, when a Range Server decided to give up a range (e.g. after a split), it would inform the master by calling the Master::move_range() method and then record the move in its meta log (RSML).  Unfortunately, this logic contained a race condition.  If the range server called Master::move_range(), but died before it got a chance to record the move in the RSML, and then the Master was stopped (e.g. sysadmin restart of the system), all record of the move was lost.  When the RangeServer came back up, it would re-attempt to move the range, causing it to get loaded by two different range servers.  With the introduction of the Master MetaLog (MML) and a two-phase Master::move_range() operation, this problem has been resolved.

Overlapping ranges.  In prior releases, the Master would ask a range server to load a range by calling the RangeServer::load_range() method and would rely on the ALREADY_LOADED response code to handle situations where the acknowledgement was lost (e.g. range server or master died at an inopportune moment) and the RangeServer::load_range() call was re-issued.  This logic also contained a race condition.  When a range was loaded and the acknowledgement was lost, the loaded range could split before the Master re-attempted to load the range.  When the RangeServer::load_range() call was re-issued, the RangeServer happily loaded the range because it no longer contained the range in its live set (due to the split).  With the introduction of a two-phase load range operation, this problem has been resolved.

Lost updates (multiple access groups).  This was a bug in how the system decided to remove commit log fragments for tables with multiple access groups.  The system computes an "earliest cached revision" value for each access group and only removes commit log fragments that contain cells whose revision number is less than that value.  In prior releases, there was a bug where the earliest cached revision of the last access group was taken for all the access groups.  In certain situations, this caused commit log fragments to be removed prematurely which resulted in data loss on system restart.  This bug has been fixed in the current release.
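
The effect of this bug can be illustrated with a small Python sketch. It rests on two assumptions that the description above does not spell out: each commit log fragment is summarized by the newest revision it contains, and the safe cutoff is the minimum "earliest cached revision" across all access groups. The function names, fragment names, and revision numbers are made up for the example.

```python
def removable_fragments(fragment_newest_revision, earliest_cached_revision):
    """Fragments whose newest cell is older than EVERY access group's cutoff."""
    cutoff = min(earliest_cached_revision.values())
    return [f for f, rev in fragment_newest_revision.items() if rev < cutoff]


def buggy_removable_fragments(fragment_newest_revision, earliest_cached_revision):
    """Imitates the bug: the last access group's cutoff is used for all groups."""
    cutoff = list(earliest_cached_revision.values())[-1]
    return [f for f, rev in fragment_newest_revision.items() if rev < cutoff]


# Hypothetical state: "default" still caches cells from revision 100 onward.
fragments = {"frag-1": 90, "frag-2": 150, "frag-3": 250}
earliest = {"default": 100, "topic": 200}

print(removable_fragments(fragments, earliest))        # ['frag-1']
print(buggy_removable_fragments(fragments, earliest))  # ['frag-1', 'frag-2']
```

In this made-up state the buggy version also drops frag-2, even though the "default" access group still depends on it, which is the kind of premature removal that loses data on restart.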
 
Transparent Master Failover.  In prior releases, if a client issued a request to the Master and the Master failed before delivering the results, the request would fail.  With the introduction of the Master MetaLog and a two-phase request sequence, a Master failover can occur mid-request and the request will complete successfully on the new Master, completely transparent to the requesting client.
 
Cloudera's CDH3 Hadoop Release.  A big source of stability issues in the past has been problems with HDFS.  The well-known "sync" issue has been the biggest trouble for Hypertable, causing critical log files to effectively disappear, resulting in data loss or, worse, leaving the system in an inconsistent and inoperable state.  The CDH3 Hadoop release from Cloudera includes a number of patches to the 0.20.2 Apache release that appear to have solved the sync problem.  We tested it with Hypertable through the beta period and have found it to be stable.  The current Hypertable release is built against CDH3 and we recommend it for all Hypertable deployments.
 
Stability is our #1 priority followed closely by performance and scalability.  Hypertable has been in development since early 2007 and the feedback we've gotten over the years from our production deployments and open source community has helped Hypertable to stabilize and become a much more mature product.  We're aggressively working towards the 1.0 release and we look forward to seeing Hypertable become the infrastructure of choice for solving big data problems.  Follow us on Twitter @hypertable to keep up-to-date with the latest Hypertable developments.  Thank you.
 
Posted By:  Doug Judd, CTO, Hypertable, Inc.

22
Nov

Hypertable Leverages Bigtable’s High Performance Pattern Matching Code


We're thrilled to announce that as of the 0.9.4.3 release we've knocked out a popular feature request: regular expression based filtering.  Queries can now filter cells by regular expression matches on the row key, column qualifier, and value.

To implement this feature, we decided to use Google's RE2 regular expression engine. Although there are a number of excellent regular expression engines to choose from, RE2 fits perfectly with Hypertable's high performance philosophy and the fact that it powers Bigtable, Sawzall and a host of other Google projects made it a great candidate.  Our reasons for picking RE2:

  • RE2 supports most Perl/PCRE syntax and allows us to provide feature rich regular expression matching.
  • It is blazingly fast (guaranteed linear run time), so we don't have to worry about ad-hoc queries hogging too much CPU.  On a 110MB dataset consisting of about 4.5M unique URLs, our tests showed RE2 to be 3X-50X faster than java.util.regex.Pattern.
  • It uses a small, fixed amount of memory so no query can bring down the server with unbounded memory usage.
  • RE2 allows Hypertable to deliver powerful pattern matching capability at the lowest hardware cost.
  • The fact that RE2 powers Bigtable, Sawzall and other critical Google code speaks for itself.

Example

Let's walk through an example using the DMOZ dataset.  This data contains a title, description, and category tags for a set of URLs.  The domain components of the URL have been reversed so that URLs from the same domain sort together.  In the schema, the row key is a URL and Title, Description and Topic are column families.  The DMOZ dataset is about 2GB uncompressed. Here's a small sample from the dataset:

com.awn.www     Title   Animation World Network
com.awn.www     Description     Provides information resources to the …
com.awn.www     Topic:Arts      
com.awn.www     Topic:Animation
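
The reversed-domain row keys in the sample above can be produced with a few lines of Python. The function name `reversed_domain_key` is hypothetical, and the sketch assumes the input URL has no scheme prefix such as http://.

```python
def reversed_domain_key(url: str) -> str:
    """Reverse the dot-separated host part of a scheme-less URL."""
    # Everything up to the first "/" is the host; the rest is kept unchanged.
    host, slash, path = url.partition("/")
    return ".".join(reversed(host.split("."))) + slash + path


print(reversed_domain_key("www.awn.com"))  # com.awn.www
```

Because the most significant domain component now comes first, keys for the same domain sort next to each other.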

Download the dataset, which is in the .tsv.gz format and can be loaded directly into Hypertable without unzipping:

wget https://s3.amazonaws.com/hypertable-data/dmoz.tsv.gz

Create the dmoz table and populate it with the downloaded data file using the hypertable shell:

/opt/hypertable/current/bin/ht shell

USE "/";
CREATE TABLE dmoz(Description, Title, Topic, ACCESS GROUP topic(Topic));
LOAD DATA INFILE "dmoz.tsv.gz" INTO TABLE dmoz;

Row Filtering

To select the subset of URLs where the string linux appears in the domain name, you could run:

SELECT Title FROM dmoz WHERE ROW REGEXP "^[^/]*linux" KEYS_ONLY;

ar.com.carreralinux.www

ar.com.linux-way.www
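
To see why the row pattern only matches "linux" in the domain part, the sketch below applies the same expression with Python's `re` module. Python's engine stands in for RE2 here, which is an assumption, although this pattern uses only syntax the two have in common. The third row key is hypothetical.

```python
import re

# "^[^/]*linux": from the start of the key, any run of non-slash characters,
# then "linux". The match can never cross the first "/", i.e. leave the host.
ROW_PATTERN = re.compile(r"^[^/]*linux")

row_keys = [
    "ar.com.carreralinux.www",
    "ar.com.linux-way.www",
    "com.example.www/linux-howto",  # hypothetical key: "linux" only in the path
]

matches = [key for key in row_keys if ROW_PATTERN.search(key)]
print(matches)  # ['ar.com.carreralinux.www', 'ar.com.linux-way.www']
```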

 

Column Filtering

To match all topics that start with write (case insensitive):

SELECT Topic:/(?i)^write/ FROM dmoz;

13.141.244.204/writ_den Topic:Writers_Resources

ac.sms.www/clubpage.asp?club=CL001003004000301311       Topic:Writers_Resources

 

Value Filtering

The next example shows how to query for data where the description contains the word game followed by either foosball or halo:

SELECT CELLS Description FROM dmoz WHERE
    VALUE REGEXP "(?i:game.*(foosball|halo)\s)";

com.armchairempire.www/Previews/PCGames/planetside.htm  Description     Preview by Mr. Nash. "So, on the one side the game is sounding pretty snazzy, on the other it sounds sort of like Halo at its core."

com.digitaldestroyers.www       Description     Video game fans in Spartanburg, South Carolina who like to get together and compete for bragging rights. Also compete with other Halo / Xbox fan clubs
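
The value pattern can be tried out in Python as well, again using the `re` module as a stand-in for RE2. The first two descriptions are shortened from the results above; the last two are hypothetical counterexamples.

```python
import re

# Case-insensitive group: "game", anything, then "foosball" or "halo",
# followed by one whitespace character.
VALUE_PATTERN = re.compile(r"(?i:game.*(foosball|halo)\s)")

descriptions = [
    "the game is sounding pretty snazzy, on the other it sounds sort of like Halo at its core.",
    "Video game fans in Spartanburg. Also compete with other Halo / Xbox fan clubs",
    "A game of foosball.",               # hypothetical: no whitespace after "foosball"
    "Halo strategy guide for the game",  # hypothetical: "halo" comes before "game"
]

results = [bool(VALUE_PATTERN.search(text)) for text in descriptions]
print(results)  # [True, True, False, False]
```

The trailing \s means a description that ends with the keyword, or follows it directly with punctuation, is not returned.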

 

Conclusion

The mission of the Hypertable project is to couple a rich feature set with optimum performance. The incorporation of Google's RE2 engine is yet another example of our continuing efforts towards that end.  We'd like to thank Google and the RE2 team for contributing this excellent code to the open source community.

 

Posted by: Sanjit Jhala, Lead Developer, Hypertable Inc.

11
Nov

Hypertable Use Case: Tribalytic

Tribalytic is a market research tool that delivers instant insight into brand and marketing campaigns by analyzing Twitter data.  Given any set of keywords, Tribalytic will provide analysis results on key market research metrics.

Let’s take the phrase "coffee machine" as an example. With Tribalytic, you can quickly see that people usually tweet about coffee machines around 9:00, 14:00, and 21:00.  This information is crucial for a marketer, who can now identify the top keyword “broken” and the time frames in which this word should be linked to their coffee machine marketing campaign. Tribalytic also lets companies see the percentage of all Twitter users in Australia who have mentioned “coffee machine” within any given time period, and target their marketing campaigns based not only on keywords but also on the exact times their audience is engaged.

Information like that obtained in the example above provides crucial insight for the marketing decision making process —  i.e. how long a campaign should run, how to measure the campaign effectiveness, and whether the campaign has reached the majority of the target audiences.

Tribalytic is revolutionary because it requires no setup. Our competitors require a sales call and manual setup for each monitored keyword. With Tribalytic, all you need to do is type a keyword and check a few boxes, and you can start getting benefits right away. This instantaneous feedback experience has been a 'killer' feature for our customers.

Hypertable is one of the key technologies that make "the slick Tribalytic experience" possible.  This post will talk about our system architecture, how we integrate Hypertable into our system, and why we chose Hypertable over HBase and MongoDB.

System Architecture

The latest tweets for the people in the sample pool are downloaded and stored into a MySQL database.  An indexer process reads the new tweets, parses them and stores the results into Hypertable.  When a user submits a query, data is read from Hypertable, an in-memory analysis is performed, and the results are presented back to the user.

Currently, we have only one table called "hits", which is essentially an inverted index for quick keyword lookups. Let's say the user wants to research “coffee” for the last two weeks.  All of the users who mentioned coffee during that period and their related tweets are loaded into memory so that our analytic engine can do its job.  The speed of reading data from disk into memory is the key to the overall performance.  After we switched the data store to Hypertable and rewrote the analytic engine in C++, we reduced the query time for coffee from 200 seconds to 0.5 seconds. That's 400 times faster!

Tribalytic tracks 200,000 Australian Twitter users. On a typical day, we process up to 3,500,000 tweets, and about 50,000,000 new records get pushed into the "hits" table. If you consider that 9.837% of people in Australia talk about coffee at least once a week, there is a lot of data loaded into memory for analysis.  Because Hypertable stores data for the same keyword consecutively on disk, reading it in one shot does not incur excessive disk seeks, which are usually the root cause of slow queries. In our coffee example, we spend less than 250 ms loading all the data into memory, leaving another 350 to 550 ms for analysis, and still have plenty of time left for the normal 'web' stuff.
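
The benefit of storing one keyword's data consecutively can be sketched with a sorted list of row keys. The key layout "keyword|user|tweet id" and the function `scan_keyword` are purely hypothetical, since the real schema of the table is not described here; the point is only that all rows for one keyword form a single contiguous run in sorted order.

```python
from bisect import bisect_left

# Hypothetical row keys, kept in sorted order the way a sorted table stores them.
hits = sorted([
    "coffee|alice|1001",
    "coffee|bob|1002",
    "cola|carol|1003",
    "tea|alice|1004",
    "coffee|carol|1005",
])


def scan_keyword(sorted_keys, keyword):
    """Read the contiguous run of keys that start with '<keyword>|'."""
    prefix = keyword + "|"
    start = bisect_left(sorted_keys, prefix)  # one "seek" to the first candidate
    rows = []
    for key in sorted_keys[start:]:           # then a purely sequential read
        if not key.startswith(prefix):
            break
        rows.append(key)
    return rows


print(scan_keyword(hits, "coffee"))
# ['coffee|alice|1001', 'coffee|bob|1002', 'coffee|carol|1005']
```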

How we integrated Hypertable

The indexer is written in Python. It saves data to Hypertable via the Python Thrift interface. Initially we found that Hypertable’s array version of the interface worked best for us, since it avoids unnecessary Python object construction/destruction overhead.

The query and analytic parts are both written in C++ using the Hypertable C++ client library. The analyzer exposes the query functionality via a Thrift socket interface to front-end Python/Django processes.

Why we chose Hypertable

One of the primary reasons we chose a scalable NoSQL database was so that the system would be architected to scale up as we grow.  We can add more countries, increase sample sizes, and sustain more traffic without “hitting the wall”.

Our initial prototype was built using MongoDB.  Because MongoDB uses a B-tree as its underlying data store, it's just not the right technology for sequential data reading. In our coffee benchmark, it would take 50 – 75 seconds to read all data into memory.  We still use MongoDB to serve more traditional profile queries that require more random reads over small data sets.

Before we adopted Hypertable, we built a prototype using HBase. Given the Apache "brand", and the Hadoop, Mahout, and Hive projects out there, how could we not?  The main reason we chose Hypertable instead of HBase was the increased performance Hypertable was able to deliver, which reduced our hardware footprint and thereby lowered cost.

Doug and his team chose C++ over Java for good reason: to garner greater performance. We observed a similarly significant performance boost when switching from HBase to Hypertable.  C++ offers much finer control over how memory is used.  We use two Google open source libraries, sparsehash and perftools, extensively in our analytical engine to achieve our performance goals.

One misconception about C++ is the perceived inevitability of the segfault. We built an extensive test suite using boost.unit and gcover to make sure all the corners are covered.  Since we deployed the new service 3 months ago, we haven't had any hiccups.

About the code and the team

The Hypertable team has been very supportive. Most questions were answered within 24 hours. The code is exceptionally clean and well documented. Back in February, one of the Hypertable community members ran a C++ static analyzer on the code and found only two issues.  For more than half of my questions, I could find the answers just by reading through the code.

Posted By: Alex Dong, CTO, BinaryPlex Pty. Ltd.

25
Aug

Hypertable T-Shirt Preview

In a week, we'll be printing high quality Hypertable t-shirts and will be passing them out to all of the attendees at the upcoming Hypertable meetup at the Facebook Headquarters in Palo Alto, CA.  As computer geeks, we obtain a substantial portion of our wardrobe at conferences and trade shows, so we understand the value of a good FREE t-shirt.  So, for this one, we spared no expense.  Check out the following mock-up and details below.

The front …

Hypertable T-Shirt Front

The back …
Hypertable T-Shirt Back

The Design

We hired the design firm Blue Coast Web to put together the design, the same firm that we used to put together the hypertable.org site.  They came up with a number of creative t-shirt design ideas which we spent a fair amount of time deliberating over.  We ended up going with a design that we felt would appeal to the broadest cross-section of Hypertable supporters.

The Shirt

We chose high quality American Apparel shirts (Combed fine jersey 4.3 oz, 100% cotton) at an additional cost of $2/shirt.  We went with American Apparel because we like the cut (more stylish) and the fabric felt better than the other options.

The Printing Process

The shirts will be printed by Graphic Sportswear in San Francisco.  It will (likely) be a seven color screenprint (to achieve a four color process gradient look).  Colors will likely be:

  • black for tesseract outline
  • light green/grey of tesseract "panes"
  • light green
  • medium green
  • dark green
  • light grey in hypertable text, to achieve gradient to the…
  • dark grey in hypertable text

If you're interested in Hypertable and want to get your hands on one of these t-shirts, come join us at the upcoming Hypertable meetup at the Facebook office in Palo Alto on September 16th.  We'll be discussing the recently developed Hypertable+Hive extension and will present the results of the Hypertable vs. HBase Performance Evaluation.  Hope to see you there!

Posted By:  Rebecca Ritter, VP, Corporate Communications, Hypertable, Inc.

 

8
Aug

ArchCamp: Scalable Databases (NoSQL)

On Friday night, Sanjit Jhala (Lead Hypertable Engineer) and I attended the ArchCamp unconference at HackerDojo in Mountain View, CA, put on by Adam Hitchcock from Skype and Zach Steindler from Olark.  I've been to a number of unconferences and this one was definitely one of the better ones.  There was plenty of pizza, beer, and great discussion.  I enjoyed the conversations I had with the other attendees, including Paco Nathan from KwizineLoc and Todd Hoff of High Scalability fame.  These are my notes from a session we led entitled "Scalable Databases (NoSQL)".

The session started out free-form, but shaped up pretty quickly into a discussion of the popular open source scalable NoSQL databases and the architectural categories in which they belong.  The following is a description of each category that was considered.

Key/Value store – This category applies to Distributed Hash Table (DHT) technology, a scalable database architecture that primarily supports GET, SET, and DELETE operations on key/value pairs.  The name is a bit of a misnomer because a system like Bigtable also supports these operations, but the term is most commonly associated with DHT technology.

Column-oriented – This category refers to database systems that store data physically grouped on disk by column (see Column-oriented DBMS).  They can achieve very high throughput column selects because only the data for the selected columns is transferred from the disk subsystem.  Most people categorize Bigtable as a column oriented data store.  This is somewhat of a misnomer because Bigtable supports what's known as Locality Groups, giving the schema designer control of how column data is physically stored on disk.  Bigtable can be configured to be column-oriented by putting each column in its own locality group, or row-oriented by putting all columns in a single locality group, or some combination of the two (e.g. two columns in a locality group and the rest in another).
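To make the locality-group idea concrete, here is a minimal Python sketch.  It is not Bigtable's or Hypertable's API; the function names and the three example schemas are hypothetical.  It models a schema as a mapping from a locality-group name to the columns stored together, and reports which columns come off disk for a given column select.

```python
def groups_to_read(schema, selected_columns):
    """Return the locality groups that must be read to serve a column select.

    schema maps a (hypothetical) locality-group name to the set of columns
    that are physically stored together.
    """
    wanted = set(selected_columns)
    return sorted(name for name, columns in schema.items() if columns & wanted)


def columns_transferred(schema, selected_columns):
    """Return every column that is read from disk, wanted or not."""
    transferred = set()
    for name in groups_to_read(schema, selected_columns):
        transferred |= schema[name]
    return sorted(transferred)


# Hypothetical layouts for a table with columns a, b, c and d.
column_oriented = {"g_a": {"a"}, "g_b": {"b"}, "g_c": {"c"}, "g_d": {"d"}}
row_oriented = {"all": {"a", "b", "c", "d"}}
mixed = {"hot": {"a", "b"}, "rest": {"c", "d"}}
```

Selecting only column a transfers just a in the column-oriented layout, all four columns in the row-oriented layout, and a and b in the mixed layout.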

Document-oriented – This category refers to systems that facilitate the storage and retrieval of semi-structured documents.  Systems in this category provide APIs that allow applications to easily store documents in a format such as JSON and can auto-index fields.

In-memory – This category describes databases that store their entire data set in memory.  They can deliver excellent performance, but their capacity is limited by the aggregate amount of physical RAM available system-wide.  Many of these systems back up data to disk via a journal and/or snapshotting.

Auto-sharding – Sharding is what most people do when they have no scalable database (e.g. Facebook).  Table data is partitioned into horizontal "shards" and each shard is stored on a separate machine in a traditional database (e.g. MySQL).  To achieve high availability, each shard server can be replaced with a traditional master/slave replica group. Typical homegrown sharding systems require a fair amount of "glue code" to route requests to shard servers.  Auto-sharding systems automate this process. 
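The "glue code" that routes requests to shard servers can be sketched roughly as follows.  This is an illustrative Python sketch only: the `shard_for` function and `ShardRouter` class are hypothetical, the shards are plain in-memory dictionaries standing in for separate database servers, and hash-modulo routing is just one of several possible partitioning schemes.

```python
import hashlib


def shard_for(key, num_shards):
    """Map a row key to a shard index using a stable hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardRouter:
    """Routes reads and writes to one of several stand-in shard servers."""

    def __init__(self, num_shards):
        # Each dict plays the role of one traditional database server.
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, key, value):
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key):
        return self.shards[shard_for(key, len(self.shards))].get(key)
```

With modulo routing like this, changing the number of shards remaps most keys, which is part of the operational burden that auto-sharding systems take on.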

Dynamo – Dynamo is a distributed hash table (DHT) architecture developed by Amazon.com specifically for their shopping cart system.  One of the key requirements behind the design was highly available writes, which was born out of the desire to avoid system failure when a customer adds an item to their cart.  To that end, it employs a technique called Eventual Consistency whereby a write succeeds when a subset of replicas acknowledges it.  This can lead to inconsistencies in the face of machine failures, which are reconciled during reads.
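The write-to-a-subset and reconcile-on-read behavior can be illustrated with a small Python sketch.  It is a simplification under stated assumptions, not Dynamo's implementation: the `Replica` class and both functions are hypothetical, and a plain integer version number stands in for the richer versioning a real system would use to reconcile conflicting writes.

```python
class Replica:
    """A stand-in replica that can be marked as failed."""

    def __init__(self):
        self.up = True
        self.data = {}  # key -> (version, value)

    def write(self, key, version, value):
        if not self.up:
            return False  # a failed machine sends no acknowledgement
        self.data[key] = (version, value)
        return True

    def read(self, key):
        return self.data.get(key) if self.up else None


def quorum_write(replicas, key, version, value, w):
    """The write succeeds if at least w replicas acknowledge it."""
    acks = sum(1 for r in replicas if r.write(key, version, value))
    return acks >= w


def reconciling_read(replicas, key):
    """Return the newest value seen and repair stale replicas that are up."""
    answers = [a for a in (r.read(key) for r in replicas) if a is not None]
    if not answers:
        return None
    version, value = max(answers, key=lambda a: a[0])
    for r in replicas:
        r.write(key, version, value)  # read repair; ignored by down replicas
    return value
```

With three replicas and w=2, a write still succeeds while one replica is down.  When that replica comes back it is stale until a later read reconciles it.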

Bigtable – Bigtable is a scalable database system developed by Google.  It is designed to operate within the context of their broader scalable computing stack which includes the Google File System (GFS) and Map/Reduce.  All state in Bigtable is persisted in the GFS.  The system provides "immediate" consistency at the expense of latency spikes during machine failures.

Database               Architecture                          Implementation Language
Hypertable             Bigtable, Column-oriented             C++
HBase                  Bigtable, Column-oriented             Java
Cassandra              Dynamo (with Bigtable data model)     Java
MongoDB                Auto-sharding                         C++
Riak                   Dynamo, Key/Value store               Erlang
CouchDB                Document-oriented (non-scalable)      Erlang
Redis                  In-memory, Key/Value store            C
Project Voldemort      Dynamo                                Java
Tokyo Cabinet/Tyrant   Key/Value store                       C
Dynomite               Dynamo                                Erlang
Amazon S3              Hosted Key/Value store                Java
Amazon RDS             Hosted MySQL                          C

We scribbled notes up on a whiteboard that was like a conveyor belt turned on its end.  There was a button on a control panel below the board which, when pressed, caused the conveyor belt to rotate, scan the board, and dispense a printout of the board contents from a print roller attached to the control panel.  Here's the printout of the board for our session (sorry for the chicken scratch, I should've been a doctor!):

Many thanks to Adam and Zach for organizing the conference.  We look forward to the next one!

Hypertable will be holding its next meetup at Facebook in Palo Alto, CA.  Please come join us for pizza, beer, and a free Hypertable t-shirt (be sure to RSVP with your size).

Posted By:  Doug Judd, CEO, Hypertable, Inc.

24
Jun

Hypertable vs. HBase Performance Evaluation

In our initial blog post, Why We Started Hypertable, Inc., we mentioned that achieving optimum performance has been one of the project’s most dominant guiding principles.  As a result, we made the decision to do the implementation of Hypertable in C++.  Today, we are pleased to publish our first set of benchmark results, comparing the performance of Hypertable with that of HBase.  These results demonstrate that implementation language matters, and that C++ delivers a significant performance advantage.  A detailed test report can be found through the following link.

Hypertable vs. HBase Performance Evaluation Test Report

A sample of the results is presented in the table below.

Test                                  Hypertable Performance Improvement Relative to HBase (%)
Random Read Uniform 80 GB             396
Random Read Uniform 20 GB             424
Random Read Uniform 2.5 GB            97
Random Read Zipfian 80 GB             925
Random Read Zipfian 20 GB             777
Random Read Zipfian 2.5 GB            100
Random Write 10000 byte values        51
Random Write 1000 byte values         102
Random Write 100 byte values          427
Random Write 10 byte values           931
Sequential Read 10000 byte values     1060
Sequential Read 1000 byte values      68
Sequential Read 100 byte values       129
Scan 10000 byte values                2
Scan 1000 byte values                 58
Scan 100 byte values                  75
Scan 10 byte values                   220

The first observation that I'd like to make has to do with the performance difference of the two systems as the size of the value decreases.  One key target application for a Bigtable-like scalable database is Web-scale analytics.  This type of application typically operates on hundreds of millions or billions of very small values.  Think session click counts.  We feel that over time, this will be a principal driver of demand for the technology.  As can be seen from the results, Hypertable's performance relative to HBase grows considerably as the size of the value decreases.  This puts Hypertable at a distinct advantage over HBase when it comes to supporting this important class of application.

The second observation I would like to make has to do with the suitability of these systems in cloud environments.  Traditional datacenters see utilization rates that are often below 20%, whereas cloud computing providers routinely report utilization rates of nearly 80%.  A scalable, multi-tenant database service offering in a cloud environment can expect to experience similarly high utilization rates.  At these levels, performance wins translate directly into cost savings.  With these kinds of performance multiples, Hypertable can deliver multi-tenant database capacity at a fraction of the cost of a system like HBase.

And finally I would like to comment on the sustainability of these results. Some of the performance difference between the two systems may have to do with better design choices on the part of Hypertable, but the bulk of the performance difference can be attributed to the implementation language. Bigtable-like database systems are very memory intensive and CPU intensive. Java, in comparison to C++, is notoriously poor when it comes to leveraging these two resources.  As both of these systems evolve, they will provide more CPU intensive functionality (data types, aggregates, etc.).  This means that over time, the disparity in performance will continue to grow.

We're excited by these results and look forward to working with you to help build your next-generation, big data applications.

Posted By:  Doug Judd, CEO, Hypertable, Inc.

1
Jun

Why We Started Hypertable, Inc.

Welcome to the Hypertable blog.  We will be using this blog to communicate what’s happening with Hypertable – both the company and the project – and to express our opinions on issues being discussed in the world of “big data” applications.

It has been about six months since we formed Hypertable, Inc. so we thought, for our first blog post, it would be appropriate to discuss why we decided to start the company.  Back in 2006, I was brought on as an Architect at Zvents.  Our goal was to do for “local” search what Google had done for global search.  This meant collecting growing amounts of query log, click log, and crawl data, doing analytics over that data, and using the results to fuel our ranking, recommendation, and ad targeting systems.  One of the key requirements was to build a system that could keep pace with growth in traffic, and to that end, we decided to forgo traditional RDBMS technology and build the back-end system entirely on top of scalable computing technology.

At the time, Hadoop existed, which provided the scalable file system and a MapReduce framework, but there was no scalable indexing implementation for serving large data sets to live applications.  Google’s BigTable was the obvious choice, and the design jibed with my experience building large-scale crawling and indexing systems at Inktomi.  Not long after we decided to go down the BigTable path, the Dynamo paper was published.  We seriously considered how they handled inter-data center replication through the use of eventual consistency, but felt that it made more sense to employ these techniques at a higher level.  We felt that the BigTable design was much more versatile.  Not only could it handle simple key/value look-ups, but being a sorted data structure, it could handle range queries very efficiently, critical for analytics applications.  And being fully consistent, it was much simpler to reason about.  Fortunately for the project, the BigTable design has proven to work well in practice.  BigTable powers nearly every major service at Google (over 100 of them) and is one of their most important pieces of scalable computing infrastructure.
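The point about range queries on a sorted structure can be illustrated with a short Python sketch.  It is only an analogy: a sorted in-memory list stands in for a sorted table, and the `range_scan` helper and the row keys are hypothetical.  Because the keys are kept in order, the start and end of a range are each found with a binary search and everything in between is contiguous.

```python
import bisect


def range_scan(sorted_keys, start, end):
    """Return the keys in [start, end) using two binary searches."""
    lo = bisect.bisect_left(sorted_keys, start)
    hi = bisect.bisect_left(sorted_keys, end)
    return sorted_keys[lo:hi]


# Hypothetical row keys of the form "<site>/<date>".
keys = sorted([
    "example.com/2010-06-01",
    "example.com/2010-06-02",
    "example.org/2010-06-01",
    "sample.net/2010-06-03",
])

# All rows for example.com: "0" is the character that follows "/" in ASCII,
# so "example.com0" is an exclusive upper bound for the "example.com/" prefix.
example_com_rows = range_scan(keys, "example.com/", "example.com0")
```

A hash-partitioned key/value store offers no such ordering, so the same query would have to visit every key.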

Achieving optimum performance has been one of the project’s most dominant guiding principles, and the reason is simple.  Efficiency gains scale linearly with the system, which dramatically reduces hardware requirements and lowers overall operational costs.  It was within this context that we decided to do the implementation in C++.  (We will be announcing some benchmark results soon that make us think we made the right decision.)  In addition, since we were building general purpose technology that applied to much more than just local search, we decided to open source the project and reap all of the benefits that the open source development model has to offer.  You can see more details on the Hypertable project here.

As we saw Hypertable being downloaded and used by more and more companies – and big data applications becoming an integral and rapidly growing part of our economy – we realized it would be important to have an organization to provide the capabilities and support enterprises need for production implementations of Hypertable.  We also felt that, as the primary committers to the Hypertable project, we would be best equipped to address that need.  As a result, after much contemplation, we decided to leave Zvents to form Hypertable, Inc.

The mission of Hypertable, Inc. is to provide support for organizations that run Hypertable as part of their business critical applications.  The company provides a host of professional services including architecture consulting, training, and 24/7 support so that enterprises can have a service level guarantee that their scalable database infrastructure will be up and operational.

We look forward to using this blog to keep you updated.

Regards,

Doug Judd

CEO and Co-Founder

Hypertable, Inc.
