11/06/2011

Wine Party with Opus One

I'm not into Opus One, a premium wine priced above its value. One of my friends nevertheless asked me to buy it last month. I remember he asked me for a Kistler Chardonnay before. Buying premium wine bottles is a bit embarrassing; it's like buying Louis Vuitton bags during a business trip to Paris.

Anyway, I got one bottle and donated it to a wine party yesterday.
Five wine addicts started with a Pertois Moriset Grand Cru Brut, then tried blind wine tasting as usual.
My score was two points. Here is a memo for my memory.

  1. Opus One 2008: Cabernet Sauvignon 86%, Petit Verdot 8%, Merlot 4%, Cabernet Franc 1%, and Malbec 1%, from Napa
  2. Chambolle-Musigny Premier Cru Les Noirots 2003: Pinot Noir, from Chambolle-Musigny
  3. Chateau Calon-Segur 2003: Cabernet Sauvignon 60%, Merlot 30%, Cabernet Franc 10%, from Saint-Estephe
  4. Lisini Brunello di Montalcino 2005, from Montalcino, Toscana

Guessing the Pinot Noir was so easy, so I got one point. The Brunello confused me so much; it tasted like a thin but elegant Bordeaux, so I took it for the Chateau Calon-Segur. The Opus One showed traces of clove and dark chocolate, and that gave me my second correct guess.

11/05/2011

Emergence of Big Data found in Web 2.0

October 13, 2011
Translated from my Japanese version in Wireless Wire News

From the author’s understanding, during the past decade, "social media was born through information shared instantly via human networks, and likewise, the era of 'information socialization' has arrived, in which scattered information is extensively collected, assigned value, and provided." When Tim O'Reilly proposed the concept of "What is Web 2.0" in September 2005, his insight was fresh.
One of the concepts is a database that grows in conjunction with users. As the amount of user data increases, services are enhanced, pulling in more user data. When data exceeds critical mass, a service with great value is created, against which other companies cannot compete. Typical examples are various Google services. Data is an asset; the principal asset of competitive power. O'Reilly said that "Data is the Next Intel Inside" and that how to design the places where data is generated is important. He showed the direction of Internet services in the Web era.

About a year later, on August 9, 2006, former Google CEO Eric Schmidt used the word "cloud" to describe a large, global-scale server group. Then, two weeks later, the Amazon EC2 service was introduced on August 24. This was not simple happenstance. The iPhone was launched in the US the following year, on June 29, 2007, and smartphones emerged that provide services in collaboration with the cloud. The introduction of Android, which followed the iPhone, clarified the function of this kind of device: it generates real global data and is a cloud device.
In short, after Web 2.0 showed that data is an important asset for corporate activities, SNS built on accumulated data and Internet services such as media accumulation, distribution, and search have advanced dramatically due to the emergence of the cloud, cloud devices, and large-scale database processing technology. "Information socialization", in which public services are provided on a global scale by linking as many as 100 million computers, created the value-added resource called the "Global Brain," to borrow Tim O'Reilly's term. Examples include Google voice recognition, machine translation, and Facebook and Twitter data analysis and recommendations. The situation in 2011 can be expressed using the following formula.

Professor Maruyama, chairman of the Japan Android Group, described the size of data accumulation and processing happening on a global scale as "Web-Scale" (2009).[1]
What is Web-Scale data? At this time, Web-Scale data includes server logs, sensor information, images/video, SNS data, blogs, and social graphs from Twitter and Facebook, etc. I call those items "Big Data." In general, the characteristics of such data are that they are large-scale, their structure is not constant,[2] and a quick response is required. Furthermore, much of the data has a historical meaning and thus, in many cases, it cannot be thinned out. The challenge is how to process Big Data. There are two aspects to this: algorithms and systems.
[1] If it is not Web-Scale, it is not a cloud. A cloud is a system technology or platform that supports the Web-Scale explosion of information and expansion of users, and simply calling the data warehouse of a company a "private cloud" cannot be considered to be grasping its true nature.
[2] Because the structure is not constant, one idea is to use NoSQL. However, since data modeling allows data to be handled as structured data, I think it is proper to handle data in SQL. Also, NoSQL+Hadoop is necessary when you want statistical results from the data (a minimal example is sketched below), while SQL, with its emphasis on consistency, is appropriate when you want to reproduce the data itself. I think the spread of Hadoop will depend on how popular the statistical use of Big Data becomes.
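To make footnote [2] concrete, here is a minimal sketch of the "statistical" side: a Hadoop Streaming style mapper and reducer, written in Python, that count server-log records per URL. The log format, field positions, and the way the script is invoked are my own illustrative assumptions, not part of any particular system.

```python
#!/usr/bin/env python
# Minimal sketch of the "statistical use" of Big Data from footnote [2]:
# a Hadoop Streaming style mapper/reducer pair that counts log records per URL.
# The log format (timestamp, user_id, url, tab-separated) is an assumption.
import sys

def mapper():
    for line in sys.stdin:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip malformed records; the structure is "not constant"
        url = fields[2]
        print("%s\t1" % url)

def reducer():
    # Hadoop Streaming delivers mapper output sorted by key,
    # so counts can be accumulated one URL at a time.
    current_url, count = None, 0
    for line in sys.stdin:
        url, value = line.rstrip("\n").split("\t")
        if url != current_url:
            if current_url is not None:
                print("%s\t%d" % (current_url, count))
            current_url, count = url, 0
        count += int(value)
    if current_url is not None:
        print("%s\t%d" % (current_url, count))

if __name__ == "__main__":
    # Run as "reducer" for the reduce phase, otherwise act as the mapper,
    # e.g. via hadoop-streaming's -mapper and -reducer options.
    if sys.argv[1:] == ["reducer"]:
        reducer()
    else:
        mapper()
```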


Designing the "place" for Big Data collection as part of a service
Machine learning technology that automatically learns useful rules and statistical models from data, as well as pattern recognition technology for identifying data from acquired rules or statistical models, have been researched to date. Pattern recognition researchers were interested in the methodology and algorithms themselves, such as how to convert voice data to text, how to automatically enter handwritten text into a computer, and how to automatically track images of the human face. They could easily write research papers if they conceived a good algorithm and simply conducted experiments with real data.
Until 2005, there was no Big Data. However, after 2006, the time came when people began trying to create and improve services by applying machine learning and pattern recognition to Big Data. You can look at the success of Google. There have been many case examples of "More Data beats Better Algorithms (MDbBA)". Google's autonomous driving demonstration is one good example. Without relying on combinations of complicated algorithms, the demonstration showed that a car was able to drive automatically from San Francisco to Los Angeles using collected map data and combinations of distance measurements and image sensors.
Automatic driving is an example of a service that exceeds a critical point involving machine learning when there is sufficient data. That being said, there are also many successful case studies of introducing machine learning frameworks. Machine learning is a framework in which the system, when given correct data, automatically makes adjustments in order to obtain the appropriate answers. Therefore, if good learning algorithms are designed for a given problem class, in other words a service, the data collection and performance improvement loop works correctly.
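As an aside, the learning loop described above can be sketched in a few lines of Python. The perceptron below is only a toy: the data is synthetic, the "true rule" is invented for illustration, and the point is simply that the same algorithm, fed more correct data, adjusts its parameters automatically and tends to answer better.

```python
import random

# Toy sketch of the machine-learning loop: given examples with correct answers,
# the system adjusts its parameters automatically. Everything here is synthetic.

def make_examples(n, seed=0):
    # True rule (unknown to the learner): label is 1 if 2*x1 + x2 > 1, else 0.
    rng = random.Random(seed)
    return [((x1, x2), 1 if 2 * x1 + x2 > 1 else 0)
            for x1, x2 in ((rng.uniform(-1, 1), rng.uniform(-1, 1))
                           for _ in range(n))]

def train_perceptron(examples, epochs=20, lr=0.1):
    w1 = w2 = b = 0.0
    for _ in range(epochs):
        for (x1, x2), y in examples:
            pred = 1 if w1 * x1 + w2 * x2 + b > 0 else 0
            err = y - pred            # error against the correct answer
            w1 += lr * err * x1       # automatic adjustment of parameters
            w2 += lr * err * x2
            b += lr * err
    return w1, w2, b

def accuracy(params, examples):
    w1, w2, b = params
    hits = sum(1 for (x1, x2), y in examples
               if (1 if w1 * x1 + w2 * x2 + b > 0 else 0) == y)
    return hits / float(len(examples))

if __name__ == "__main__":
    test = make_examples(1000, seed=1)
    for n in (10, 100, 1000):         # more correct data, same algorithm
        params = train_perceptron(make_examples(n))
        print(n, "examples -> accuracy", round(accuracy(params, test), 3))
```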
Such machine learning frameworks are applied in character recognition, voice recognition, machine translation, landmark recognition, and facial recognition, and are provided as Internet services. For example, the level of machine translation has almost reached practical usage between related language families, such as English, French, and Spanish. There is still room for improvement in translation algorithms, and this is a good opportunity for publishing research papers. But the important thing is to actually design a place for Big Data collection in which the machine learning framework is incorporated as part of the service. We must be aware that the pattern recognition research environment has changed significantly in the past decade.


Finding what comes next in facial recognition
In 2001, at a facial recognition conference, Paul Viola and Michael Jones gave a presentation about an object detector based on boosting. The announcement of this algorithm was the moment when the service called facial identification entered the area of "More Data beats Better Algorithms (MDbBA)". After that, there were many algorithm improvements that greatly improved performance, but the facial region tracking used in digital cameras is based on the method presented in that announcement.
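For readers who want to try the boosted-cascade approach themselves, OpenCV ships a pretrained Viola-Jones style face detector. The sketch below assumes the opencv-python package and a local photo named face.jpg; the parameter values are just reasonable defaults, not the settings used in any particular camera.

```python
import cv2

# Minimal sketch of Viola-Jones style face detection as shipped in OpenCV,
# the boosted-cascade approach referred to above. Image path and parameters
# are illustrative assumptions.
cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
detector = cv2.CascadeClassifier(cascade_path)

image = cv2.imread("face.jpg")                 # any local photo with faces
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# The cascade of boosted classifiers scans the image at multiple scales.
faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

for (x, y, w, h) in faces:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("faces_detected.jpg", image)
print("detected %d face(s)" % len(faces))
```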
Researchers are requested to do two things.
1. Develop the next MDbBA area by inventing new algorithms and methods. Become the second Viola and Jones.
2. In the MDbBA area, study platforms for Big Data processing, in addition to algorithms. Engineering is designing both the algorithm and the platform.
IT engineers and entrepreneurs are requested to do the following:
Be the first to find out what is ready for commercialization in the MDbBA area and use it as a device for Internet service.
Silicon Valley is not the only outlet for Internet service innovation. The advent of the cloud, database processing technology, and cloud devices gave Internet design opportunities to all people. In general, it is important to think about devices that collect Big Data, but this is not limited to pattern recognition applications, since there are pattern recognition technologies scattered throughout Japan. By all means, I want them to adjust their focus.
The following five topics can be highlighted:
1. What is the next area for More Data beats Better Algorithms? After achieving character, voice, and facial recognition, is the next thing food? How about predicting the weather or consumer behavior?
2. To what degree can the latest algorithms, such as Bayesian modeling, scale for Big Data?
3. What are the facts and fictions about Big Data? How effective is it for social networking analysis?
4. How popular can Hadoop and NoSQL become?
5. What will the Global Brain look like in 10 years?
Big Data is used in a wide range of areas, including marketing, financial security, social infrastructure optimization, and medical care. This conference cannot possibly cover them all. However, above all, we would like you to focus on this conference as a place for exchange between pattern recognition researchers and the IT industry.

Minoru Etoh, Ph.D.
Director at NTT DOCOMO, Service & Solution Development Department, and President & CEO of DOCOMO Innovations, Inc. (Palo Alto, CA). Visiting Professor at the Cybermedia Center, Osaka University. He has been engaged in research and development in the multimedia communication and mobile network field for 25 years. He entered Matsushita Electric Industrial Co., Ltd. (currently Panasonic Corporation) in 1985 and researched moving image coding at the Central Research Laboratory and pattern recognition for ATR. He entered NTT DOCOMO in 2000 and was CEO of DOCOMO USA Labs (Silicon Valley) from 2002 to 2005. Currently, he is in charge of development related to data mining, media understanding, smartphones, home ICT, machine communication, and information search.


1/10/2011

An Italian Bar near Shimbashi Station

Yesterday, I joined Android Bazaar and Conference 2011 Winter, which was held at the University of Tokyo. After the conference, I stopped by an Italian bar called "Italian Bar UOKIN," operated by the UOKIN restaurant group.


Someone might say this is not Italian but Spanish; another might say it is neither Italian nor Spanish but Japanese. In other words, the restaurant serves a fusion of Italian, Spanish, and Japanese cuisine.
The food is delicious: aqua pazza, terrine, carpaccio, bagna cauda, smoked oysters, steamed mushrooms, and whelk, at very affordable prices, say JPY 500-800 (USD 6-10) per plate. They also have a wide selection of wines at reasonable prices.
This restaurant is very small and busy. After two hours, you are asked to leave to make room for other guests. In that sense, the atmosphere is casual enough to rush through four or five small plates with a bottle of red wine.




Tokyo has many Italian and Spanish bars; I appreciate living here.

1/04/2011

New Year's Thoughts

The next decade will show a landscape completely different from what the last decade has shown us. During the last decade, we experienced the successful innovation of i-mode, born in 1999 from the combination of the ACCESS micro-browser and DOCOMO’s always-on mobile packet network, which had already existed in 1998, years ahead of GSM GPRS packet network development.
After that success, the rules of the game are changing from local to global, from walled gardens to open markets, and from pay services to fee-on-free services. Telecommunication operators need to act; otherwise they are doomed. With no status quo to rely on, a rolling stone gathers no moss (in the American interpretation).
The i-mode business is now hitting a plateau on the first ‘S’ curve; thus DOCOMO R&D needs to jump onto a second ‘S’ curve.


Where will the next ‘S’ curve be? Here are some hints.

The essence of Communication is secure and reliable "Redirection", which appears at several levels: packet routing, directory services for session creation, search engine applications, and SNS relations such as Facebook. As we know, those essential redirection functionalities are now moving away from the telecommunication operators’ monopoly and being integrated into web services. Any Internet company that owns enough data to provide redirection functionality may replace the telecommunication operators with its own services. Google Voice, Skype, and Twitter are good examples. Only data with a large customer base for redirection has the power to win the game. Thus, operators’ leverage is twofold: (1) scale of data, customer base, delivery systems, and sufficient free cash flow; and (2) trustability, in other words, the reliability of redirection.
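As a toy illustration of redirection at the directory-service level, the Python sketch below maps a user identifier to wherever that user can currently be reached; whoever owns this mapping data effectively controls the redirection. All names and endpoints are hypothetical.

```python
# Toy sketch of "redirection" at the directory-service level: a lookup that
# maps an identifier to the endpoint where the session should be routed.
# Whoever holds this mapping data provides the redirection.
DIRECTORY = {
    "alice@example.com": "sip:198.51.100.7:5060",   # hypothetical SIP endpoint
    "bob@example.com":   "app:push-token-42",       # hypothetical app endpoint
}

def redirect(identifier):
    """Return the current endpoint for an identifier, or None if unknown."""
    return DIRECTORY.get(identifier)

if __name__ == "__main__":
    for who in ("alice@example.com", "carol@example.com"):
        print(who, "->", redirect(who))
```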
Redirection may have effects in the following two areas.

  1. Machine Communication. That means “non-cellular-phone communication” in a broader sense. The market expansion of wireless Internet cards shows very positive figures nowadays. Dr. Keiji Tachikawa, the CEO of NTT DOCOMO at that time in 2000, predicted that around 20 million cats and dogs in Japan would wear tracking devices linked to cellular networks by 2010. Although his prediction has not come true yet, the direction he envisaged is valid enough to explain current non-voice cellular application trends. Seeking non-voice communication services is the key to expanding revenue. Here the crucial point is that we design our business model carefully so as to promote our business as a machine communication platform business. At this moment, most “Machine Communication” businesses have remained at just selling data communication cards. My colleagues at the DOCOMO Service & Solution Development department invented a digital photo frame service called Otayori Photo Service™ in 2009. (This system was adopted by Korean Telecom and exported to Korea. See http://www.nttdocomo.com/pr/2010/001492.html.) That’s a good example to think about for the next step. Any machine communication platform should have a redirection function empowered by data or reliability.
  2. Federation of Data Monetization. There are things multiple companies can accomplish working together that they couldn’t do alone. That’s O’Reilly’s remark in “What lies ahead: Data” (see http://radar.oreilly.com/2010/12/2011-data.html). Data mining, to which I devote everything from machine learning algorithm development to the deployment of high-performance massively parallel servers at DOCOMO, is a tough process, since without mining the data we cannot find its value. Quite ironically, data mining involves self-definition. If we are allowed to federate data among data-driven companies, we can reach critical points where innovations emerge. Let us see what will happen. http://strataconf.com/strata2011 may give us some clues to figure out the future. Data is the power. Federation is the key to reaching the critical points.


That’s it about S-curve identification. In identifying the next ‘S’ and increasing its success rate, we need collaboration with external companies. The open innovation concept described by Chesbrough is essentially imperative for generating the next innovation. In general, that includes licensing out patents, collecting ideas, collaborating with other businesses, using external R&D or consultants for development, and so on.

The term “open innovation” is easy to understand; it is not easy to implement, though.
We need a culture transformation. Our way of seeking the next S-curves with an open innovation scheme should move toward a High Performance Culture. That consists of
        Empowered people and cross-functional communication,
        Creating focused, collaborative, results-driven teams; energizes others,
        Integrating existing solutions without not-invented-here syndrome and adding new values,
        Facilitating the creation and communication of a compelling and strategically sound vision,
        Changing our mind from “technology push” to “collaborative innovation” with business departments (ultimately, our customers), and
        Transforming our value from technology consultation to a commitment to any necessary technical support until service launch; that means we need to engage in “concurrent engineering.”
Implementing an open innovation scheme requires culture transformation.

With the above considerations, I hope the year 2011 will give us a good developer experience.

1/01/2011

What Innovation entails, i.e., Neuer Kombinationen

Innovation is the source of new business. We need to understand what R&D efforts are needed to generate innovations in this era where the ecosystem is currently undergoing great changes driven by globalization.

Joseph Schumpeter, one of the leading economists of the 20th century, defined innovation as “new combinations” (Neuer Kombinationen). He asserted that innovation refers to the new goods, new production methods, new markets, and new organizations that are born out of these “combinations.”
This points to a new way to change society, particularly the process of setting new values and bringing about change by coming up with new combinations from existing elements. I learned from my mentors that development is based on existing technology, i.e., it is made up of validated technologies, and that including unvalidated technologies, i.e., those still in the research stage, is not allowed. An iconic example of using only validated, i.e., dependable, technologies is the Apollo Project, which aimed to land a man on the moon and bring him back to Earth. In that massive system development, only dependable technologies were used. The plan was implemented by combining existing technologies. But research is different. In research, the goal is to come up with innovations based on technological discoveries or inventions. If inventive technologies have foundational versatility, then combinations geared towards practical applications can be made from them later. This is the reason that, in the development of scientific technologies, the wave of inventions and discoveries of technical elements and basic theories and the wave of systematization came about through mutual interference, even though the latter did not follow immediately after the former.

NOTE: The field of communications is still experiencing the systematization wave.

Going back to innovation, the main thing is how to come up with new combinations. As an example, in 1999, i-mode was born out of the combination of DOCOMO’s always-on mobile packet network and microbrowsers.
 You can either seek out combinations around the world, or, more importantly, come up with attractive platforms that the world will seek out. To do this, open innovation and concurrent engineering must be practiced. In open innovation, a new system is designed from a combination of your own company’s and other companies’ technologies. In such cases, even the operation of the system can be delegated to other companies. In concurrent engineering, development of technology is carried out in coordination with the operations department and with a constant evaluation of its relationship with the market and with other companies. In other words, market search and technological development, which includes research, should be done in parallel. These two are indivisible activities and must be linked with investment activities.

When I was fresh out of university and entered the industry, I learned this saying, “The more half-hearted  you are as an engineer, the more conservative you become.” The average successful engineer would stick to his technology and work style of ten years ago and would not change and challenge himself to learn about new technical fields. He was content with the status quo. Since innovation is a process that brings about new values and changes to society, it revolutionizes the ecosystem. Around the world, convergence of terminal platforms and consolidation of network services on the Internet are advancing. There is no stopping the wave of innovation; there is the emergence of cloud computing, which enables a service delivery platform that can serve from millions to several hundred millions of clients, as there was the emergence of search services, internet shopping, electronic publications, social network services, and smartphone application stores. Ecosystems for these innovations did not exist ten years ago. Thus, innovation necessitates creation of new ecosystems rather than just adapting to them. In this era, there is no room for conservative and half-hearted engineers. It would be easy to find comfort in existing ecosystems, but this will not encourage innovation. I hope that we can have the readiness to challenge ourselves to create new environments and take up new technical fields.

To conclude, let me reiterate that what innovation entails is facing the challenge to pursue new combinations and to reconstruct ecosystems.