boo-box blog

The new web server infrastructure of the boo-box system

Posted by thyagoliberalli

When you work at a company like boo-box, you must have a team capable of developing extraordinary technological solutions in a short amount of time.

In the past few months, boo-box has undergone phenomenal growth:

[Chart: growth in ad impressions]

As you can see, ad impressions rose by 85% in the last month alone! To handle this growth, we had to adapt and upgrade our systems.
Before going into further details, let me sum up our strategy: we constantly perform check-ups, use cache as much as possible, and take really good care of our database :)

Going into details

To stand out in online advertising, you have to follow a few basic precepts:

  1. Speed: users are not going to sit around waiting for an ad to load.
  2. Scalability: content going viral, accelerated growth, and so on. If your application is not highly scalable, you should either find another job or be ready to give up on sleep :)
  3. Monitoring: monitor everything! Automate your monitoring checks whenever possible.
  4. Procedures: create, and follow, procedures for various situations that may arise.

With this in mind, our Ninjas refactored the heart of the system: the Application Layer.

The goal was to push our language and platform stack (nginx + Ruby + Merb + MySQL) to its limits, using the most advanced algorithms and topology.

To this end, we changed the logic, synchronization, and caching of the key parts. Part of the topology has also changed since our last post, “boo-box system’s web server infrastructure”:

infra

We also implemented monitoring methods that gave us greater control over the application.

Server Names and Release Versions
As we explained in our last post about our infrastructure, our servers are named after characters from the anime Dragon Ball Z, such as Korin, Kami, Cell, Raditz, Trunks, Goku, and Gohan, among others.

Lately, we’ve also started to name our releases — this time, after classic films. The first two releases were:

  1. Metropolis, named after the 1927 Fritz Lang sci-fi classic;
  2. Hannibal, named after the 2001 box-office hit.

Cluster Application
This layer is made up of all the servers that process our business rules. These are the servers that decide which ads will appear in the windows, what happens when a user clicks on an ad, and what to do with the data registered by a new publisher in the system.

This layer underwent the most changes. The refactoring focused on these essential points:

  1. Logic: we optimized the processing algorithm, making it more efficient for this new phase of the system.
  2. Memcached: we increased our use of memcached by 80%. While still respecting our business rules, we were able to cache large amounts of information, which improved response time. This change directly relieved the application database layer, because the application now queries the database considerably less.
  3. Logging: we made log writes asynchronous. The application layer no longer communicates directly with our MySQL log database. Instead, it puts each log entry on a Beanstalkd queue, which is later consumed by a Ruby program that stores the entries in the database.
  4. Number of workers: we ran a few benchmarks to find the ideal number of workers. Previously, we used complex logic that measured the total processing capacity of each server to find the ideal number of workers, which, at the time, was very high. However, our benchmarks indicate this number shouldn’t be very high. The more workers there are, the busier the server’s cores become, and the slower each individual worker runs. On the other hand, if the number of workers is too low, the load balancer might not find an available worker, even if each one processes requests quickly. In our case, the ideal number is between 20 and 30 workers.
  5. We also tried Merb’s run_later method, but we concluded that Beanstalkd gives us more control. Besides, Beanstalkd also offers persistence.
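
The asynchronous logging described in item 3 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual boo-box code: Ruby's in-process Queue stands in for the Beanstalkd queue, an array stands in for the MySQL log database, and the names log_request and run_consumer are hypothetical.

```ruby
# Sketch only: Queue stands in for a Beanstalkd queue and an Array stands in
# for the MySQL log database. All names here are hypothetical.
require "json"

LOG_QUEUE = Queue.new   # stand-in for the Beanstalkd queue
LOG_TABLE = []          # stand-in for the MySQL log database

# Called by the application layer: enqueue the entry and return immediately,
# without touching the database.
def log_request(event, payload)
  LOG_QUEUE << JSON.generate("event" => event, "payload" => payload)
end

# Separate consumer: pops jobs off the queue and stores them.
def run_consumer
  Thread.new do
    # pop returns nil once the queue has been closed and drained
    while (job = LOG_QUEUE.pop)
      LOG_TABLE << JSON.parse(job)
    end
  end
end

consumer = run_consumer
log_request("window_view", "publisher" => 42)
log_request("ad_click", "publisher" => 42)
LOG_QUEUE.close   # no more jobs; lets the consumer drain the queue and exit
consumer.join
puts LOG_TABLE.length
```

The point of the design is that log_request only serializes and enqueues, so a slow database delays the consumer rather than the request being served.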

Cluster Log and BI
All requests made on boo-box are registered in our Cluster Log. Windows viewed, ads clicked, activity on partner sites: everything is registered.

So as I said at the beginning of the post: take good care of your database! It could deprive you of sleep :)

In this layer, it was important to combine tuning with asynchronous processing:

  1. Tuning: pay close attention to these settings: key_buffer, max_connections, table_cache, thread_concurrency, and innodb_buffer_pool_size, and, of course, to the index structures of your tables. We also adopted MySQL’s partitioning feature, which optimizes how data is stored on disk (MySQL Reference Manual – Chapter 18. Partitioning).
  2. Asynchronous processing: as we explained under the Cluster Application, the application no longer communicates directly with the logs database, but instead with the beanstalk queue. Therefore, one works independently of the other, which greatly improved the response time in our shop windows.
    This optimization also affected our BI, which is responsible for processing all the data received that will be used at a later date (reports, measurements, projections, etc.)

Cluster data source
This layer is responsible for storing specific system information.

With all the upgrades we performed on the other layers (especially the increased use of memcached), there was no need to make great changes here. Quite the contrary: we reduced the number of servers on this layer.

Cluster products cache
A large part of the ads displayed on boo-box windows are products offered by partner e-commerce sites. Since product information does not have to be stored for a long time, we keep the data in a temporary cache.
With the changes to the Cluster Application, we were able to considerably reduce the number of servers in this layer.
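
The idea of a temporary cache can be sketched as a store whose entries expire after a fixed time. This is a minimal in-memory illustration, not the CouchDB-based cache itself: the TemporaryCache class, the 60-second lifetime, and the injectable clock are all assumptions made for the example.

```ruby
# Sketch only: an in-memory cache whose entries expire after a fixed time.
# TemporaryCache is a hypothetical class; the clock is injectable so the
# expiry can be demonstrated without sleeping.
class TemporaryCache
  def initialize(ttl_seconds, clock: -> { Time.now.to_f })
    @ttl = ttl_seconds
    @clock = clock
    @entries = {}
  end

  def write(key, value)
    @entries[key] = [value, @clock.call + @ttl]
  end

  # Returns the cached value, or nil if it is missing or has expired.
  def read(key)
    value, expires_at = @entries[key]
    return nil if expires_at.nil?

    if @clock.call >= expires_at
      @entries.delete(key)
      return nil
    end
    value
  end
end

now = 0.0
cache = TemporaryCache.new(60, clock: -> { now })
cache.write("camera", ["camera product A"])

fresh = cache.read("camera")
now = 61.0   # pretend a little over a minute has passed
stale = cache.read("camera")

puts fresh.inspect
puts stale.inspect
```

Once an entry has expired, read returns nil and the caller is expected to fetch fresh product data.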

Remember: whenever you use CouchDB or MySQL, it is important to have machines with fast storage to get better I/O performance. Therefore, Cloud Computing may not be a good option for this layer.

Load balancer and Static Files
These servers receive user requests and direct each one to one of the application servers.

In nginx, we used to use the round-robin algorithm. As we scaled up, it turned out this wasn’t the best strategy for us, because when one of our workers was slow to respond, it affected all of the requests coming in. So we switched to the Fair module, which lets nginx send requests to the application servers that are least busy at the moment.
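
The difference between the two strategies can be illustrated with a small simulation. This is a simplified sketch, not how nginx or the Fair module is implemented: the Backend struct, the two picker classes, and the server names app1 to app3 are hypothetical.

```ruby
# Sketch only: compares two ways of picking a backend. Backend and both
# picker classes are hypothetical; the real Fair module is more elaborate.
Backend = Struct.new(:name, :active_requests)

# Round-robin: cycle through the backends regardless of how busy they are.
class RoundRobinPicker
  def initialize(backends)
    @backends = backends
    @next = 0
  end

  def pick
    backend = @backends[@next % @backends.size]
    @next += 1
    backend
  end
end

# Least-busy: choose the backend with the fewest requests in flight.
class LeastBusyPicker
  def initialize(backends)
    @backends = backends
  end

  def pick
    @backends.min_by(&:active_requests)
  end
end

backends = [Backend.new("app1", 0), Backend.new("app2", 0), Backend.new("app3", 0)]
backends[0].active_requests = 5   # app1 is stuck on slow requests

round_robin = RoundRobinPicker.new(backends)
least_busy  = LeastBusyPicker.new(backends)

rr_choices = Array.new(3) { round_robin.pick.name }
lb_choices = Array.new(3) do
  chosen = least_busy.pick
  chosen.active_requests += 1     # the request stays in flight
  chosen.name
end

puts rr_choices.inspect
puts lb_choices.inspect
```

Round-robin still hands a request to the overloaded app1, while the least-busy picker avoids it until the other backends catch up.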

Monitor
To keep the systems healthy, it’s important to constantly monitor every layer and all the applications involved. However, this requires more than tools. It is essential to establish manual and automatic procedures.

Right now, we use tools such as: Munin, Monit, Webalizer, Pingdom and our own tools.

We have hourly, daily, weekly and monthly checks, which generate the data we use to make important decisions about changes to, or maintenance of, our structure.

It is important to note that automating these procedures and check-ups is ideal, but it shouldn’t be a prerequisite. Many technicians and infrastructure managers neglect regular check-ups because they haven’t automated them, and end up causing more problems than they solve. So if necessary, introduce regular manual check-ups. Discipline is your biggest ally.

Conclusion
To build a highly scalable system you need more than technological solutions.

Effective management will make all the difference. Without getting lost in bureaucracy, you must create well-defined work procedures and practices that will allow your team to combine effective long-term solutions (slow implementation) with short-term solutions (rapid implementation), and deal with any problems that might arise from such a complex system.

Who makes it happen

boo-box system’s web server infrastructure

Posted by boo-box team

Over the last couple of years, a series of architectural patterns for web software have become established and popular through frameworks, which make these systems easier to develop and maintain. At the same time, servers have become more accessible and cheaper. Creating a web project has become far easier, faster, and cheaper. However, for a project to succeed, there is still one obstacle that is not easy to overcome: scalability.

boo-box has successfully overcome this obstacle and currently has its infrastructure set up in layers that can be scaled horizontally and that are robust enough to serve thousands of requests per minute. Throughout this post we will present some of the practices currently in use to guarantee good performance from our Communications System for Social Media, using tools such as Ruby, Merb, CouchDB, Thin, nginx, and Beanstalkd.

Screenshot of glTail running on the Kami server

The boo-box infrastructure

Our infrastructure combines different pieces of software: Open Source projects that have been established for years, such as MySQL, along with more recent ones, which generally have fewer features, are simpler, or are simply better suited to the specific case.

It is important to note that this post reflects the current infrastructure (May 2009). The rate of new Publishers joining the System, and the growth in visits to the Publishers already in the system, lead to weekly changes in the server structure, such as adding new machines or modifying application components.

boo-box web server infrastructure

Server Identification

Naming servers is always a difficult decision for the development team. Some like to use planets, elements of the periodic table, or letters of the Greek alphabet. We like to use characters from the anime Dragon Ball Z.

Static Files

Static files are those that do not depend on server processing, such as images, CSS, and JavaScript. At boo-box, they are located on a sub-domain that leads directly to a dedicated server, thus relieving our load balancers.

Our static files are preloaded into RAM, which improves the system’s response time. This system runs the nginx HTTP server.

Load Balancers

Load balancers are the servers that receive user requests and distribute the load among the application servers. At boo-box there are two load balancers, both registered in the DNS for boo-box.com. A load balancer must respond very quickly, and therefore it does not process business rules. Each of them runs the nginx HTTP server.

Application clusters

The application cluster is the group of servers that process our business rules. It is this cluster that decides which ad will be shown in the window, what happens when a user clicks, and what to do with the data of a new Publisher registering in the system.

Each server runs roughly 100 instances of the Ruby framework Merb using the HTTP server Thin (no, we do not use Ruby on Rails :).

Database Cluster

When information needs to be registered in our system, such as registering a new Publisher, changes in preference settings, blocking an advertiser, and so on, this information is stored in our database.

The boo-box database cluster contains the Vegeta (Master) server, which receives the information to be written to the database, and the secondary Bulma and Ubb (Slave) servers, from which the application servers read the stored information.

As a reader pointed out, the character’s name is actually Uub, but the ninja who named the server was typing with his toes because his arms were raised gathering energy for a massive Genki Dama. Typing with your toes is hard to do, he hit the wrong key, and the server name has stuck as Ubb :)

Splitting data writes and reads across different servers was one of the most effective measures we took in the last few months to improve the uptime and performance of the boo-box system.

The database cluster runs on MySQL, split between the servers that write data and those that read it.
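
The read/write split can be sketched as a small router that sends writes to the master and spreads reads over the slaves. This is only an illustration and not Sequel's API: FakeConnection, ReadWriteRouter, and the publishers table are hypothetical, and a real setup also has to account for replication lag.

```ruby
# Sketch only: routes writes to the master and spreads reads over the slaves.
# FakeConnection and ReadWriteRouter are hypothetical; this is not Sequel's API.
class FakeConnection
  attr_reader :name, :queries

  def initialize(name)
    @name = name
    @queries = []
  end

  # Records the statement and reports which server handled it.
  def run(sql)
    @queries << sql
    name
  end
end

class ReadWriteRouter
  def initialize(master, slaves)
    @master = master
    @slaves = slaves
    @turn = 0
  end

  # Statements that only read go to a slave; everything else goes to the master.
  def execute(sql)
    connection = read_only?(sql) ? next_slave : @master
    connection.run(sql)
  end

  private

  def read_only?(sql)
    sql.strip.match?(/\Aselect\b/i)
  end

  # Alternate between the slaves so reads are shared evenly.
  def next_slave
    slave = @slaves[@turn % @slaves.size]
    @turn += 1
    slave
  end
end

vegeta = FakeConnection.new("vegeta")
bulma  = FakeConnection.new("bulma")
ubb    = FakeConnection.new("ubb")
router = ReadWriteRouter.new(vegeta, [bulma, ubb])

puts router.execute("INSERT INTO publishers (name) VALUES ('example')")
puts router.execute("SELECT * FROM publishers")
puts router.execute("SELECT * FROM publishers WHERE id = 1")
```

In this sketch the INSERT is handled by vegeta and the two SELECT statements by bulma and ubb in turn.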

The true story of a company contributing to the Free Software community :)

We use Sequel as the ORM between the application and the database. When we needed to replicate the database, recreating the Master and Slave structure, Sequel was unable to read from the Slave, no matter how carefully we followed the procedure.

We got in touch with the ORM developers on the IRC channel, and after running tests for a few hours, we were able to solve the problem by working together.

This is only the most recent example of boo-box contributing to the Free Software community. We are active in this area because we truly believe that our technology here at boo-box is a fruit of the work of the Open Source community.

System log cluster

All operations occurring on our servers are recorded in the system log. Which windows were seen, ads clicked, actions taken on partner websites: every action is recorded on our log.

From time to time we process the raw log, generate statistics, and create a backup. This frees up the raw log to receive more data without losing past data, while keeping system performance good.

We use Analogger as a log component. However, problems with performance and scalability led us to look for another solution. Currently the system log is being transferred to a MySQL structure, and being split between Master (data writing) and Slave (data reading) servers.

Cache Products

The majority of ads displayed on boo-box windows are for products sold by e-commerce sites. As product information does not need to be kept for a long time, we create a temporary cache of product information.

The cache makes the system more robust, and allows it to continue functioning even if the e-commerce site is slow to respond or goes offline.

Our product cache structure is composed of two main components:

Queue

We use Beanstalkd as a queue service for product requests. Each boo-box window has associated tags, and each new tag not yet cached is inserted into this queue. The queue is consumed within the next few seconds, so it does not interfere with the application’s functioning.

There is an independent service that consumes the queue, going to eCommerce websites to search for products related to each tag and placing that data in the cache servers.
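
The interplay between the window, the queue, and the consumer service can be sketched as follows. This is an in-process illustration under stated assumptions: a Ruby Queue stands in for Beanstalkd, a Hash for the cache servers, and fetch_products is a hypothetical placeholder for the lookup on an e-commerce site.

```ruby
# Sketch only: a Queue stands in for Beanstalkd, a Hash for the cache servers,
# and fetch_products for the lookup on an e-commerce site. All names are
# hypothetical.
require "set"

TAG_QUEUE     = Queue.new
PRODUCT_CACHE = {}
QUEUED_TAGS   = Set.new

# Called while rendering a window: never waits for the e-commerce site.
def products_for(tag)
  return PRODUCT_CACHE[tag] if PRODUCT_CACHE.key?(tag)

  TAG_QUEUE << tag if QUEUED_TAGS.add?(tag)   # enqueue each uncached tag once
  []                                          # nothing cached yet
end

# Placeholder for the slow remote lookup.
def fetch_products(tag)
  ["#{tag} product A", "#{tag} product B"]
end

# Independent service: drains the queue and fills the cache.
def consume_pending_tags
  until TAG_QUEUE.empty?
    tag = TAG_QUEUE.pop
    PRODUCT_CACHE[tag] = fetch_products(tag)
  end
end

first = products_for("camera")    # cache miss: the tag is queued
products_for("camera")            # still a miss, but not queued twice
consume_pending_tags
second = products_for("camera")   # cache hit

puts first.inspect
puts second.inspect
```

The window gets an empty result on the first request instead of waiting, and later requests for the same tag are served from the cache.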

Product Cache cluster

Each server that stores product data runs CouchDB, a database of JSON documents.

The main resource consumed by these servers is disk space: they fill hundreds of gigabytes in just a few days, especially because of the diverse nature of the offers displayed on the boo-box system; there are literally millions of different products.

The result

Screenshot of glTail running on the Korin server

During the last few weeks, boo-box’s response time and uptime have visibly improved thanks to the measures described above. These measures are the result of the hard work and experience of our ninjas.

If you have any suggestions or questions, please feel free to comment; the suggestion box is there for your use :)

Posted by Marco Gomes and Mauricio Maia.

Marco Gomes at the Prêmio Jovem Brasileiro – Young Brazilian Award – InterCon and Peixe Grande

Posted by boo-box team

Marco Gomes receives the Prêmio Jovem Brasileiro – Young Brazilian Award

Marco Gomes, co-founder of boo-box, received the Prêmio Jovem Brasileiro – Young Brazilian Award, this Monday, September 22nd 2008, for his work in the corporate arena.

The Prêmio Jovem Brasileiro, presented by Serginho Groisman, is awarded based on merit and on young people’s active participation in society. Winners are chosen through an in-depth selection by a committee of journalists, columnists, and critics, combined with a survey of young users on the event’s official website.

Marco Gomes, the Director of Technology at boo-box, receives this award for his performance as an entrepreneur, helping transform what was a late-night project into an innovative and successful company, well placed in the Internet Marketing sector.

Musicians CéU, Ivete Sangalo, Teatro Mágico and Negra Li, the films Cidade Dos Homens and Cidade de Deus, comedians Rafinha Bastos and Rodrigo Scarpa (o Repórter Vesgo) and athletes Sandro Dias and Karen Jones are some of the people who have been previously honored with this Award.

Marco Gomes at FF’08 (part of the InterCon)

Marco Gomes will be one of the lecturers at FF’08, an event happening within the Intercon 2008.

IMAGINE A CYCLE OF LECTURES IN RAVE FORMAT. HAVE YOU IMAGINED IT? WELL, FORGET IT. IT’S SO MUCH MORE. FF’08. THE INNOVATIVE IMASTERS INTERCON 2008 EVENT.

FF’08 is a series of fast, provocative, and intelligent lectures about the digital revolution in Brazil, with no beating around the bush. Each lecture lasts 35 minutes and is moderated by Luli Radfahrer, PhD in Digital Communication.

The event will also feature other Brazilian Internet greats, such as Cris Dias (our partner with Vilago), Fábio Seixas, Ariel Alexandre, Manoel Lemos and Frederick van Amstel.

Jury Member at the Peixe Grande Award

Marco Gomes was also invited to be a member of the jury at the Peixe Grande competition, run by the WebDesign Magazine, to elect the best Brazilian digital cases.

How to use the booboxfy plugin to earn more with boo-box

Posted by Marco Gomes

How to Install

Download the booboxfy plugin .zip file.

Unzip the file.

file.png

Using an FTP client, place the folder in the WordPress plugins directory: wp-content/plugins

window.png

Head to the plugins administration page and activate booboxfy.

picture-45.png

Open the boo-box plugin settings page.

picture-46.png

First fill out your boo-profile email. Then edit the options by filling in your chosen partner, your affiliate code (in the case of BuscaPé it’s the site_origem code), and maximum limit of products for each widget. The width field is optional.

picture-47.png

Congratulations! Your booboxfy plugin is configured and working!

How it works

To insert offers for contextualized products into your content, open a Post edit page.

There are two ways to add offers to your content once you are on this page:

picture-48.png

Adding offers within the content

Click on the “Monetize” button to see examples of images and videos that can be monetized through boo-box.

wp5.png

Fill out the required fields with commercial tags, which associate products with the related images or videos. Separate multiple tags with commas to display more than one type of product. Click on the “Apply” button at the bottom of the screen.

picture-49.png

From now on, whenever an image or video is clicked, it will display products related to your content.

boo-boxcom.jpeg

Adding a contextualized widget at the end of a post

Fill out the required “tags to boo-box widget” field with the keywords that will link to the best products related to your content. Separate multiple tags with commas to display more than one type of product.

picture-50.png

Save the post

wp5.png

A boo-widget with products related to your content will be inserted at the end of the post.

widget.png

See an example of a post using booboxfy: I’m part of the Revolution.

Softpedia 100% clean

Posted by Kamila Brenha

boo-box has received the 100% Clean seal from the website Softpedia.

Softpedia is an online library with over 70,000 free programs in its archives. All of the software in the library is periodically tested and analyzed to guarantee the quality of its code.

Softpedia’s 100% Clean seal is one more guarantee that the boo-box plugin for Firefox does not contain viruses, malware or adware, and can be installed on any computer, without restrictions.

To earn this seal we performed important updates to our tagging-tool, such as:

a. improving security
b. correcting bugs
c. checking compatibility with Firefox 3

softpedia_clean.gif

boo-box always aims to improve the quality of the service it provides, and this seal is a just reward for the excellent work done by the ninjas!

boo-box, an example of successful business entrepreneurship!

Posted by Kamila Brenha

boo-box has been cited in several publications as an example of successful entrepreneurship that is winning more and more fans around the globe every day.

One of the first magazines to mention boo-box was Webdesign. In an interview with them, Marco Gomes outlined his reasons for leaving behind a successful career at AgênciaClick to invest in boo-box, launching a new phase in his own career. As a good entrepreneur, he states:

(…) I was seeking new challenges. When the boo-box project began to gain momentum, I began to understand the scope of what we could really do based on the concepts we had created, and I was really, really excited by the prospect. I wanted to dedicate myself full time to the project.

The rest of the interview goes on to detail the company’s first steps, its search for the investment partner Monashees, and the goals and dreams boo-box foresees for its future.

Você S/A magazine also cited boo-box in its April 2008 edition, in an interview with Marcos Tanaka, the company’s CEO, who admits to having a “taste for risk-taking”: just like Marco Gomes, he left behind a successful career as a consultant to join boo-box, even knowing that this would mean radically altering his lifestyle. He concludes with:

(…) I wanted to be a pioneer, an entrepreneur, I wanted to create more.

Marcos Tanaka – CEO of boo-box

boo-box’s most recent venture into printed press was in Exame magazine. In a recent article about the internet, the publication cites boo-box as one of the many examples of successful businesses. In the article, Marco Gomes presents relevant data on boo-box’s performance and asserts that it was the interest and coverage of specialized media that drew the attention of investor partners Monashees.

The Ikwa Team — a company in which Monashees have chosen to invest as well — was also cited in the interview as an example of successful business entrepreneurship.

boo-box’s first television appearance was on the news segment Mundo S/A, on the GloboNews channel, in a very interesting report featuring Marco Gomes (founder) and Marcos Tanaka (CEO), who explained the emergence of boo-box in the market as well as the company’s expectations for the future. Eric Acher, managing partner of Monashees, also spoke about their investment in the company, citing the reasons why boo-box drew their attention and which of its characteristics were vital to the final decision to invest.

(…) we saw young entrepreneurs with enormous potential, who are entrepreneurs by choice. They could be following careers in more established corporations but instead chose to go out on a limb and face the challenge of following their own vision.

boo-box has been earning a name for itself, slowly working its way towards its main goal: Enriching your content!
