منتديات نورة الحربي



العودة   منتديات نورة الحربي > مواضيع منقولة من مواقع اخرى2

إضافة رد
أدوات الموضوع انواع عرض الموضوع
قديم 11-09-2012, 08:19 PM
المشاركات: n/a
افتراضي How I came to love big data (or at least ack**wledge its existence)

“Big data” is all the rage these days – there are conferences, journals, and a million consultants. Until a few weeks ago, I was k**wn to mock the term mercilessly. I don’t mock it anymore.

**t a “big” data problem

Facebook has a big data problem. Google has a big data problem. Even MySpace probably has a big data problem. Most businesses, including 37signals, don’t.

I would guess that among our “peer group” (SaaS businesses), we probably handle more data than most, but our volume of data is still relatively small: we generate around a terabyte of assorted log data (Rails, Nginx, HAproxy, etc.) every day, and a few gigabytes of higher “density” usage and performance data. I’m strictly talking about **n-application data here – **t the core data that our apps use, but all of the tangential data that’s helpful to improve the performance and functionality of our products. We’ve only even attempted to use this data in the last couple of years, but it’s invaluable for informing design decisions, finding performance hot spots, and otherwise improving our applications.

The typical analytical workload with this data is a few gigabytes or tens of gigabytes – sometimes big e**ugh to fit in RAM, sometimes **t, but generally within the realm of possibility with tools like MySQL and R. There are some predictable workloads to optimize for (add indexes for data stored in MySQL, instrument in order to work with more condensed data, etc.), but the majority aren’t things that you ordinarily plan for particularly well. Querying this data can be slow, but it’s all offline, **n-customer facing applications, so latency isn’t hugely important.

**ne of this is an insurmountable problem, and it’s all pretty typical of “medium” data – e**ugh data you have to think about the best way to manage and analyze it, but **t “big” data like Facebook or Google.

Tech**logy changes everything

The challenges of this medium amount of data are, however, e**ugh that I occasionally wish for a better solution for extracting information from logs or a way to enable more interaction with large amounts of data in our internal analytics application.

A few months ago, Taylor spun up a few machines on AWS to try out Hive with some log data. While it was kind of exciting to see queries running split across machines in a cluster, the performance of simple queries on a moderately sized dataset (a few gigabytes) on these virtualized instances wasn’t particularly impressive – in fact, it was much slower than using MySQL with the same dataset.

A couple weeks ago Cloudera released Impala, with the promise of Hive like functionality with much lower latency. I decided to give Hadoop-based SQL-like tech**logies a**ther shot.

We set up a couple of machines in a cluster, pulled together a few sample datasets, and ran a few benchmarks comparing Impala, Hive, and MySQL, and the results were encouraging for Impala.

Workload Impala Query Time Hive Query Time MySQL Query Time 5.2 Gb HAproxy log – top IPs by request count 3.1s 65.4s 146s 5.2 Gb HAproxy log – top IPs by total request time 3.3s 65.2s 164s 800 Mb parsed rails log – slowest accounts 1.0s 33.2s 48.1s 800 Mb parsed rails log – highest database time paths 1.1s 33.7s 49.6s 8 Gb pageview table – daily pageviews and unique visitors 22.4s 92.2s 180s These aren’t scientific benchmarks by any means (**thing’s been especially tuned or optimized), but I think they’re indicative e**ugh: on real hardware, it’s certainly possible to dramatically improve on tools like MySQL for these sorts of analytical workloads. With larger datasets, you’d expect the performance differential to grow (**ne of these datasets exceeded the buffer pool size in MySQL).

We’re still in the early stages of actually putting this to use – setting up the data flows, monitoring, reporting, etc., and there are sure to be many ups-and-downs as we continue to dig deeper. So far though, I’m thrilled to have been proven wrong about the utility of these tech**logies for businesses at sub-Facebook scale, and I’m more than happy to eat crow in this case.

came love data least ack**wledge came love data least ack**wledge
came love data least ack**wledge


How I came to love big data (or at least ack**wledge its existence)

رد مع اقتباس
إضافة رد

مواقع النشر (المفضلة)

أدوات الموضوع
انواع عرض الموضوع

تعليمات المشاركة
لا تستطيع إضافة مواضيع جديدة
لا تستطيع الرد على المواضيع
لا تستطيع إرفاق ملفات
لا تستطيع تعديل مشاركاتك

BB code is متاحة
كود [IMG] متاحة
كود HTML معطلة

الانتقال السريع

Bookmark and Share


الساعة الآن 07:28 PM

أقسام المنتدى

الشريعة و الحياة | المنتدى العام | مـنـتـدى الـعـرب الـمـسافـرون | أزياء - فساتين - عبايات - ملابس نسائية - موضة | مكياج - ميك اب - عطورات - تسريحات شعر - العناية بالبشرة | װ.. أطآيبُ ״ شّهية ]●ะ | اعشاب طبية - الطب البديل - تغذية - رجيم - العناية بالجسم | ديكور - اثاث - غرف نوم - اكسسوارات منزلية | صور - غرائب - كاريكاتير | {.. YouTube..} | منتدى الاسئله والاستفسارات والطلبات | مواضيع منقولة من مواقع اخرى | اخبار التقنية | مواضيع منقولة من مواقع اخرى2 | منتدى تغذيات | منتدى تغذيات الكلي | قِصصْ الأنْبِيَاء | قِسمْ الرَّسُول الكَرِيمْ والصَّحَابة الكِرامْ | القِسمْ الإِسْلاَمِي العَامْ | قِسمْ المَواضِيْع العَامَّة | قِسمْ الشِعرْ والشُعَرَاء | قِسمْ الخَوَاطِر المَنقُولَة | قِسمْ الصُوَرْ | قِسمْ القِصصْ والرِوَايَات | قِسمْ حَوَّاءْ | قِسمْ الطِب والصِحَّة | قِسمْ الصَوتِيَّات والمَرئيَّات الإسْلاميَّة | قِسمْ مَطبَخ صَمْت الوَدَاعْ | قِسمْ الدِيكورْ | قِسمْ كٌرَة القََدَمْ العَرَبِيَّة والعَالمِيَّة | قِسمْ المكْياجْ والإكْسسْوارَات | قِسمْ المَاسِنجرْ | قِسمْ الِسِياحَة والسَفرْ | قِسمْ أَفْلاَمْ الكَرتُونْ | قِسمْ الفِيدْيو المُتَنَوعْ | منتديات الرياضة | اخبار | مواضيع منقولة من مواقع اخرى3 | موقع اجنبي | كوكو فرنسا | كوكو هندية | كوكو روسي | كوكو دنمارك | كوكو ياباني | اخر اخبار العولمة | عالم الحياة الزوجية | عرض أحدث عمليات البحث الشائعة |

Powered by vBulletin® Version 3.8.8 Beta 1
Copyright ©2000 - 2018, KingFromEgypt Ltd Elassal
Adsense Management by Losha
This Forum used Arshfny Mod by islam servant