The Austin Hadoop User Group

Tuesday, April 16, 2013

Notice: This site and community have moved to The Austin Hadoop User Group at meetup.com

Wednesday, February 29, 2012

2012 - March Meeting

Our next meeting will be on Thursday the 8th of March at Bazaarvoice from 6:30 - 9pm.

As usual, we'll have plenty of Pizza, Beer and Tacos. This event is free and open to everyone, and many of the people who come are new to Hadoop and Big Data.

Agenda

6:30 - 7:00 : Meet and Greet (Austin's Pizza, Quality Beer and Tacos)

7:00 - 7:30 : "IronFan" - Flip Kromer, CTO, InfoChimps - @mrflip

Flip will be presenting on IronFan, which was recently covered on GigaOm and Wired Enterprise. IronFan is a systems provisioning and deployment tool that automates not only machine configuration but also the configuration of entire systems, enabling the full Big Data stack, including tools for data ingestion, scraping, storage, computation and monitoring.

7:30 - 8:15 : "Building the Social Business Index" - Jeremy Hanna, Jacob Perkins and John De Oliveira from The Dachis Group - @jeromatron @thedatachef @johndeo

Jeremy, Jacob and John will be presenting on building data products and how they designed and built The Dachis Group's Social Business Index. The Social Business Index analyzes signals from over one hundred million social sources globally to assess the performance of the largest global companies and thousands of those companies' brands. Using natural language processing, semantic analysis, and machine learning algorithms, Dachis Group has built a machine learning engine based on its pacesetting Social Business Design framework, drawing on its experience in hundreds of social engagements and executions as the world's largest social business strategy organization.

8:15 - 9:00 : "Scalable Data Pipelines" - Josh Wills, Director of Data Science, Cloudera - @josh_wills

Most of the interesting applications of Hadoop, from building machine learning models to populating business intelligence dashboards, involve running a series of dependent MapReduce jobs. Over the past year, a number of libraries for JVM languages have emerged that make it easy to create pipelines that are testable, maintainable, and scalable. In this talk, we'll walk through the process of building a data pipeline in Crunch, a Java/Scala library for building pipelines that operate on complex data types, covering everything from the initial choice of data format through testing, debugging, and scaling.
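To make the idea of "a series of dependent MapReduce jobs" concrete, here is a minimal in-memory sketch in plain Python. It is not Crunch and does not use Crunch's API; the helper names (map_reduce, run_pipeline and the stage functions) are hypothetical and chosen only for illustration. The second stage consumes the output of the first, which is the kind of dependency a pipeline library manages for you.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Run one in-memory MapReduce stage: map, group by key, then reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    # Sort by key so the output order is deterministic.
    return [reducer(key, values) for key, values in sorted(groups.items())]


# Stage 1: count how often each word occurs.
def tokenize(line):
    for word in line.lower().split():
        yield word, 1


def sum_counts(word, ones):
    return word, sum(ones)


# Stage 2: group words by their count (depends on the output of stage 1).
def invert(pair):
    word, count = pair
    yield count, word


def collect_words(count, words):
    return count, sorted(words)


def run_pipeline(lines):
    """Chain the two stages: the second job reads what the first job wrote."""
    word_counts = map_reduce(lines, tokenize, sum_counts)
    return map_reduce(word_counts, invert, collect_words)


if __name__ == "__main__":
    print(run_pipeline(["big data big pipelines", "data pipelines scale"]))
```

Because each stage is an ordinary function over plain records, it can be unit tested on a handful of lines before being run at scale, which is the testability benefit the talk description mentions.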

Location

Bazaarvoice is located here (Map)

When you arrive at Bazaarvoice, you can park in any of the open spots outside the building or in the garage next to the building. The meetup will be on the second floor. Take the elevator up to the 2nd floor and then follow the signs to the meeting room.

Sponsored By

Monday, January 16, 2012

2012 - January Meeting

Our next meeting will be on Thursday the 26th of January at Bazaarvoice from 6:30 - 9pm.

As usual, we'll have plenty of Pizza, Beer and Tacos. This event is free and open to everyone, and many of the people who come are new to Hadoop and Big Data.

Agenda

6:30 - 7:00 : Meet and Greet (Austin's Pizza, Quality Beer and Tacos)

7:00 - 7:25 : "Small and Big Data at Bazaarvoice" - Alex Pinkin, Lead Developer Data Infrastructure, Bazaarvoice - @apinkin

Alex will be presenting on the data infrastructure powering over 5 Billion page views per month. The presentation will cover the use of RDBMS, NoSQL and Hadoop at Bazaarvoice.

7:25 - 7:45 : "SOLR Power FTW" - Robby Morgan, Lead Developer, Bazaarvoice

Robby will present a case study from a large-scale deployment of Apache SOLR, an open source, Lucene-based search platform.

7:45 - 8:30 : "Gathering, Quantifying & Visualizing Private Equity Data" - Steve Watt, Hewlett-Packard - @wattsteve

Steve will present on a "Data Science" project where he mined the web, quantified the amount of venture capital invested in the USA from 2005 to 2010 and analyzed its distribution across sectors and by location. It will be presented in a tutorial format covering the use of Apache Nutch for web crawling, Apache Pig and Hadoop for analyzing the data and Protovis/D3 for creating the choropleth visualizations.

For more detail please see my blogposts here and here

Location

Bazaarvoice is located here (Map)

When you arrive at Bazaarvoice, you can park in any of the open spots outside the building or in the garage next to the building. The meetup will be on the second floor. Take the elevator up to the 2nd floor and then follow the signs to the meeting room.

Sponsored By

Sunday, June 12, 2011

2011 - July Meeting

Our next meeting will be on Wednesday the 13th of July at CoSpace Austin from 6:30 - 9pm.

As usual, we'll have plenty of Pizza, Beer and Tacos. This event is free and open to everyone, and many of the people who come are new to Hadoop and Big Data.

Agenda

6:30 - 7:00 : Meet and Greet (Austin's Pizza, Quality Beer and Tacos)

7:00 - 7:25 : "Dexy Demo" - Ana Nelson, OpenGamma - @ananelson

Ana is the author of Dexy, an open source tool that allows one to create elegant, reproducible documents which include graphs and analysis from raw data and code.

7:30 - 8:10 : "Data in the Digital Age" - Kaitlin Thaney, Digital Science - @kaythaney

Kaitlin will be presenting on the data ecosystem, discussing how the way we interact with data is changing in the digital age and the challenges and opportunities this presents. The presentation will focus on scientific research.

8:10 - 9:00 : "How Bitly scales data storage and processing" - Hilary Mason, Bitly - @hmason

Hilary will be presenting on some of the unique data storage and processing challenges encountered at Bitly and how the team addresses them. She will also cover some of the algorithms used at Bitly.

Location

Directions to CoSpace are available here

Sponsored By

2011 - June Meeting

Our June Meeting will be another joint meeting with GeekAustin and the Cassandra Austin Meetup. Tyler Hobbs from DataStax will be speaking on Brisk at Pervasive on June 20th from 6-9 PM.

Please sign up on EventBrite so we can get an idea of how many folks are attending.

DataStax Brisk integrates Apache Hadoop with Apache Cassandra, which gives you the ability to run Map/Reduce analytics workloads on top of the Apache Cassandra real-time columnar data store.

Other Big Data Related Events in June

June 27th : The Hive Happy Hour in Santa Clara, CA

June 28th : Big Data Camp in Santa Clara, CA

June 29th : The Yahoo Hadoop Summit in Santa Clara, CA

Location

Pervasive Software - Map

12365-B Riata Trace Parkway Austin, TX 78727

Thursday, April 7, 2011

2011 - April Meeting

Our April Meeting will be a joint meeting with GeekAustin, which is hosting a "Machine Learning with Mahout" meetup at J.J. Pickle Research Campus on April 21st from 7-9 PM. Please sign up on EventBrite so we can get an idea of how many folks are attending.

Apache Mahout provides a set of machine learning libraries that run on top of Apache Hadoop. This is interesting because techniques such as clustering and classification let you gain deeper insights into your data than the typical text extraction analytics that are run with Pig or Hive.
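For readers new to clustering, here is a minimal sketch of the k-means idea in plain Python. It is not Mahout's API and it runs on a single machine rather than on Hadoop; the function name kmeans, the sample points and the starting centroids are all assumptions made up for illustration.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm: assign each point to its nearest centroid,
    then move each centroid to the mean of its assigned points."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(point, centroids[i]),
            )
            clusters[nearest].append(point)
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                # Mean of each coordinate across the cluster's members.
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(centroids[i])
        if new_centroids == centroids:
            break  # Assignments have stabilized.
        centroids = new_centroids
    return centroids, clusters


if __name__ == "__main__":
    # Hypothetical sample data: two visibly separate groups of 2-D points.
    points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
              (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
    centroids, clusters = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
    print(centroids)
    print(clusters)
```

The assignment step is independent per point and the update step is a per-cluster average, which is why this kind of algorithm maps naturally onto Map/Reduce.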

Location

Big Tex Auditorium
J.J. Pickle Research Campus
10100 Burnet Road
Austin, TX 78758

Friday, March 4, 2011

2011 - March Meetups

March is here and that means one thing... SXSW! We won't be having an official meeting this month because of SXSW Interactive and The Data Cluster Party. The sessions/events below are all organized by Austinites, so please come out and support them. There are other big data sessions, but I wanted to highlight the ones that are local. Our regular meetings will pick up again in April.

Note: The Data Cluster Party does not require a badge, but the rest of the events do.

March 12 - Big Data, Hadoop and You: How to rock out with Big Data - http://www.facebook.com/event.php?eid=195893600434618

March 13 - Data Cluster Party - http://dataclusterparty2011.eventbrite.com/

March 14 - Big Data For Everyone - http://schedule.sxsw.com/events/event_IAP7475

March 14 - A Billion Columns? No problem: an Introduction to the Cassandra Database - http://schedule.sxsw.com/events/event_IAP7965