Ben Lorica is the Senior Analyst in the Market Research Group at O'Reilly Media, Inc. He has applied Business Intelligence, Data Mining and Statistical Analysis in a variety of settings including Direct Marketing, Consumer and Market Research, Targeted Advertising, and Financial Engineering. At O'Reilly, Ben works in the open source data warehouse and analytics area.
An ex-academic, he was an Assistant Professor at U.C. Davis and was the founding Department Chair for Statistics and Mathematics at C.S.U. Monterey Bay.
Recent Posts | All O'Reilly Posts
Ben blogs at:
http://www.oreillynet.com/conferences/blog/web_20_summit/
http://radar.oreilly.com
http://strata.oreilly.com
HBase looks more appealing to data scientists
June 16 2013
When Hadoop users need to develop apps that are “latency sensitive”, many of them turn to HBase. Its tight integration with Hadoop makes it a popular data store for real-time applications. When I attended the first HBase conference last year, …
It’s getting easier to build Big Data applications
June 09 2013
Hadoop’s low-cost, scale-out architecture has made it a new platform for data storage. With a storage system in place, the Hadoop community is slowly building a collection of open source analytic engines. Beginning with batch processing (MapReduce, Pig, Hive), Cloudera …
Tracking the progress of large-scale Query Engines
June 04 2013
As organizations continue to accumulate data, there has been renewed interest in interactive query engines that scale to terabytes (even petabytes) of data. Traditional MPP databases remain in the mix, but other options are attracting interest. For example, companies willing …
How signals, geometry, and topology are influencing data science
May 24 2013
I’ve been noticing unlikely areas of mathematics pop up in data analysis. While signal processing is a natural fit, topology, differential and algebraic geometry aren’t exactly areas you associate with data science. But upon further reflection perhaps it shouldn’t be so …
Improving options for unlocking your graph data
May 19 2013
The popular open source project GraphLab received a major boost early this week when a new company composed of its founding developers raised funding to develop analytic tools for graph data sets. GraphLab Inc. will continue to use the open …
11 Essential Features that Visual Analysis Tools Should Have
May 11 2013
After recently playing with SAS Visual Analytics, I’ve been thinking about tools for visual analysis. By visual analysis I mean the type of analysis most recently popularized by Tableau, QlikView, and Spotfire: you encounter a data set for the first …
Scalable streaming analytics using a single-server
May 05 2013
For many organizations, real-time analytics entails complex event processing systems (CEP) or newer distributed stream processing frameworks like Storm, S4, or Spark Streaming. The latter have become more popular because they are able to scale (ingest) massive amounts of data, …
Tachyon: An open source, distributed, fault-tolerant, in-memory file system
April 28 2013
In earlier posts I’ve written about how Spark and Shark run much faster than Hadoop and Hive by caching data sets in-memory. But suppose one wants to share datasets across jobs/frameworks, while retaining speed gains garnered by being in-memory? An …
Simpler workflow tools enable the rapid deployment of models
April 21 2013
Data science often depends on data pipelines that involve acquiring, transforming, and loading data. (If you’re fortunate, most of the data you need is already in usable form.) Data needs to be assembled and wrangled before it can be visualized …
Single server systems can tackle big data
April 13 2013
About a year ago a blog post from SAP posited that when it comes to analytics, most companies are in the multi-terabyte range: data sizes that are well within the scope of distributed in-memory solutions like Spark, SAP HANA, ScaleOut Software, …
The re-emergence of time-series
April 09 2013
My first job after leaving academia was as a quant for a hedge fund, where I performed (what are now referred to as) data science tasks on financial time-series. I primarily used techniques from probability & statistics, econometrics, and …
The re-emergence of Time-series
April 05 2013
My first job after leaving academia was as a quant for a hedge fund, where I performed (what are now referred to as) data science tasks on financial time-series. I primarily used techniques from probability & statistics, econometrics, and optimization, …
Data Science tools: Are you “all in” or do you “mix and match”?
March 31 2013
An integrated data stack boosts productivity: As I noted in my previous post, Python programmers willing to go “all in” have Python tools to cover most of data science. Lest I be accused of oversimplification, a Python programmer still needs …
Python data tools just keep getting better
March 24 2013
Here are a few observations inspired by conversations I had during the just-concluded PyData conference. The Python data community is well-organized: Besides conferences (PyData, SciPy, EuroSciPy), there is a new non-profit (NumFOCUS) dedicated to supporting scientific computing and data …
Data Science Tools: Fast, easy to use, and scalable
March 03 2013
Here are a few observations based on conversations I had during the just-concluded Strata Santa Clara conference. Spark is attracting attention: I’ve written numerous times about components of the Berkeley Data Analytics Stack (Spark, Shark, MLbase). Two Spark-related sessions …
MLbase: Scalable Machine-learning made accessible
February 22 2013
In the course of applying machine learning to large data sets, data scientists face a few pain points. They need to tune and compare several suitable algorithms – a process that may involve having to configure a hodgepodge of tools, requiring …
An update on in-memory data management
February 21 2013
By Ben Lorica and Roger Magoulas. We wanted to give you a brief update on what we’ve learned so far from our series of interviews with players and practitioners in the in-memory data management space. A few preliminary themes have …
Need speed for big data? Think in-memory data management
January 18 2013
By Ben Lorica and Roger Magoulas. In a forthcoming report we will highlight technologies and solutions that take advantage of the decline in prices of RAM, the popularity of distributed and cloud computing systems, and the need for faster queries …
GraphChi: Graph analytics over billions of edges using your laptop
December 12 2012
GraphChi is a spinoff project of GraphLab, an open source, distributed, in-memory software system for analytics and machine-learning. Designed specifically to run on a single computer with limited memory (DRAM), since its release a few months ago GraphChi has been …
Shark: Real-time queries and analytics for big data
November 27 2012
Hadoop’s strength is in batch processing; MapReduce isn’t particularly suited for interactive/ad hoc queries. Real-time SQL queries (on Hadoop data) are usually performed using custom connectors to MPP databases. In practice this means having connectors between separate Hadoop and database clusters. …
Spark 0.6 improves performance and accessibility
October 16 2012
In an earlier post I listed a few reasons why I’ve come to embrace and use Spark. In particular I described why Spark is well-suited for many distributed Big Data Analytics tasks such as iterative computations and interactive queries, where …
Seven reasons why I like Spark
August 21 2012
A large portion of this week’s Amp Camp at UC Berkeley is devoted to an introduction to Spark – an open source, in-memory, cluster computing framework. After playing with Spark over the last month, I’ve come to consider it a …
Active Facebook users by region: November, 2010
November 16 2010
With Facebook unveiling an integrated messaging system for its more than 500 million users, I decided to update a few charts that break down its users by region.
Hiring trends among the major platform players
November 15 2010
After recently re-reading Tim's post on the major internet platform players, I looked at recent hiring trends among the companies he highlighted. First I examined year-over-year changes in number of job postings (from Aug to Oct 2009 vs. Aug to Oct 2010). Consistent with the recent flurry of articles about…
Windows Phone apps are more expensive than iPhone apps
November 05 2010
The Windows Marketplace for Mobile now has about 1,400 apps spread across 16 categories. In this short post I'll provide some basic statistics and compare it with the granddaddy of app stores: the U.S. iTunes store.
Crowdsourcing Specific Microtasks
October 25 2010
Since the first-ever Mechanical Turk meetup a year ago, there has been an explosion in crowdsourcing services and a well-attended conference in San Francisco. I remain enthusiastic about crowdsourcing, but the number of companies has me worried about quality of work. Fortunately, specialization is already occurring, so for particular tasks…
Amazon's cloud platform still the largest, but others are closing the gap
August 31 2010
Tim's recent tweet on the growing demand for Google App Engine skills inspired me to measure the popularity of the major cloud computing platforms. Elance is one of many job boards in our data warehouse of U.S. job postings, and I wanted to measure demand across many more job…
The number of Hadoop jobs continue to rise
August 08 2010
While still a small fraction of data management job postings, the number of job posts that mention "hadoop" continues to grow steadily. Year-over-year, there were 300% more such job posts in the first seven months of 2010 compared to the same period in 2009: The fraction of "hadoop" jobs posted…
Which Social Gaming companies are Hiring
July 29 2010
Disney's announced purchase of Mountain View gaming startup Playdom follows on the heels of EA's purchase of London-based Playfish last November. Based on active users, Zynga remains by far the biggest online social gaming company, but what other independent companies are growing? To see which companies are expanding, I used…
Where Facebook's half a billion users reside
July 21 2010
Facebook announced that they now reach 500 million active users (just five and a half years after launching). But where do these half a billion users reside? Refreshing my post from February, the share of users from Asia continues to rise and now stands at 17% of all Facebook users. Over…
Popular iPhone games stay highly-ranked only for a few weeks
June 30 2010
With 40,000+ Games to choose from, the lists of Top 100 free and paid games are frequently scanned by iPhone gamers. In this short post, I'll share some basic statistics on popular games sold through the U.S. iTunes app store.
Actually, half of all iPad Books are Fiction
May 05 2010
Suggestions to my previous post inspired me to normalize our metadata for titles available through the U.S. iBooks app. A comment prompted me to roll up iBooks publishers into publishing conglomerates: Comments from other readers gave me the idea to map the 100+ iBooks categories to the more familiar BISAC categories.…
A few weeks in, a third of iPad Books are Fiction
April 29 2010
Measured in terms of number of titles, half of the over 46,000 (paid and free) books available through the iBooks app are from 6 categories. Fiction & Literature alone accounts for close to a third of all available iBooks titles: The current set of titles is indicative of the publishers…
Big Data shakes up the Speech Industry
April 23 2010
I spent a few hours at the Mobile Voice conference and left with an appreciation of Google's impact on the speech industry. Google's speech offerings loomed over the few sessions I attended. Some of that was probably due to Michael Cohen's keynote describing Google's philosophy and approach, but clearly Google…
Cookbooks: The highest priced iPad book category
April 21 2010
Just like the iTunes app store, the iBooks app on the iPad spotlights the Top Paid (and Top Free) books within each category. Here are some charts that compare the average price (by rank) across the major categories. The average price of the Top 50 titles across the major categories…
Big Data Analytics: From Data Scientists to Business Analysts
April 19 2010
The growing popularity of Big Data management tools (Hadoop; MPP, real-time SQL, NoSQL databases; and others) means many more companies can handle large amounts of data. But how do companies analyze and mine their vast amounts of data? The cutting-edge (social) web companies employ teams of data scientists who comb…
April 14 2010
I collected some interesting stats from today's presentations at Chirp. Over a thousand people attended the conference and the numbers below attest to how vibrant the Twitter platform is. Today's announced API enhancements will make the Twitter ecosystem even more interesting: 1. # of registered users: 105,779,710 (1,500% growth over…
Games & Entertainment account for Half of all iPad apps
April 09 2010
98% of apps in the U.S. iTunes app store label themselves as "iPad compatible", but most were written for iPhones or iPods. One week into its launch there are about 2,300 apps that run only on iPads. Measured in terms of number of unique apps, Games and Entertainment account for…
Google's New Marketplace Has over a Thousand Apps
March 17 2010
One week into its public launch, the Google Apps Marketplace has just under 1,500 (enterprise) apps. Combined with Salesforce.com's app exchange (also with over a thousand apps), enterprises interested in moving to cloud apps have an increasing number of software tools to choose from. Popular apps (measured in terms of…
Twitter Users Most Followed by the Web 2.0 Summit Crowd - O'Reilly ...
October 28 2009
I took the set of users who posted tweets containing the hashtag #w2s and determined who those users followed. Unlike the list of the most followed users in all of Twitter, the list isn't dominated by celebrities...