
A mashup is an integrated application created by combining data and services from multiple applications. On the web, "mashup" typically refers to the combining of geographical location information with a service such as Google Maps or Microsoft Virtual Earth. The term has achieved widespread usage in describing this kind of web application since Google introduced its public Google Maps API in 2005. Though not restricted to the web, mashups have become an increasingly popular internet paradigm, leading to the creation of a variety of web-based mashups. Tim O'Reilly lists mashups as one of the Web 2.0 technologies.

Before the availability of the Google Maps API, mashup-like applications were being developed mainly with proprietary, complex geographic information systems (GIS) software packages. Such GIS applications have been available commercially since the 1980s, but it is only since the early 2000s that non-experts have had tools that allowed such combinations of maps and user-specific data to proliferate on the web. Mashups that do not use spatial or mapping data are also possible, but the mapping application is likely the first kind that comes to mind when one says "mashup" in the context of the World Wide Web.

Overview
Mashups are a convergent technology of sorts. Convergence of communications is a recognition that a variety of communications can run over the same Internet Protocol-based infrastructure, without building a separate infrastructure for each service. From the standpoint of communications engineers, convergence is not necessarily about the user interface or the merging of technologies. That may be a beneficial side effect, but it is not the focus of the groups concerned with convergence, such as the Multimedia Forum. To a communications engineer, mashups are not clearly distinguished from a multi-windowed interface, or even a structured dashboard, presenting multiple services to the end user.

Thanks to Google Maps, Internet mashups have become popular in recent years; however, the concept of mashups has been around for a long time in a context completely unfamiliar to typical Internet engineers. Before Internet mashups became popular, the term referred to music. Music mashups are the fusion of two or more songs by overlaying their tunes and lyrics to form a new song. They have been around since the beginning of recorded music. Before "mashup" became a popular buzzword, the practice was called multitrack recording and rerecording, in which the Beatles made notable advances. Today, music mashups have been extended to incorporate videos and are still prevalent in the entertainment industry. Websites like http://www.mashup-charts.com/ are used to rate amateur music mashups.

The general purpose of mashups can therefore be stated as merging or overlaying entities in the hope of obtaining a comprehensive product that is more useful or interesting, and that presents a broader perspective, than the individual entities on their own.

Before music mashups, the concept of merging entities for a specific purpose was used in epidemiology. Such combinations were not referred to as mashups at the time, but they served a similar purpose to modern-day internet mashups. John Snow (1813–1858) was a British physician who is often considered one of the founders of epidemiology. In the first half of the 1800s, most experts in the medical field believed that cholera was airborne. John Snow disputed that belief and published an essay in 1849 called On the Mode of Communication of Cholera expressing his views on the subject. Without a concrete way to prove his assertions, however, he did not make much headway in convincing others.

In August 1854, a tragic outbreak of cholera occurred in Soho. By plotting the cases of cholera on a map, John Snow was able to identify a water pump as the source of the disease. After he had the handle of the pump removed, the cases of cholera immediately began to diminish. This incident helped to prove that cholera was transmitted by the consumption of water from the pump, and got into the body through the mouth. Today, cartographic data is studied by various research institutions, such as the Centers for Disease Control and Prevention (CDC), and in academia as Geographical Information Science, where it is used for the display, storage and analysis of spatial data. The concept of mashups therefore lends itself to various fields.

Following the trend in history, it is no surprise therefore that it took Google Maps, a geographical tool, to popularize web mashups. However, web mashups are not restricted to maps and geographical data. There are mashups that combine travel information, news, shopping information and social networking. Because mashups are created from already existing technology, they are restricted only by the technologies they emulate.

A tally of tags for mashups recorded on http://www.programmableweb.com indicates recent mashup trends.



Mashup popularity
Google Maps is the poster child of web mashups, but Google was not the first company to introduce internet mapping technology. Prior to its existence, MapQuest and Yahoo Maps dominated the scene. These sites were used mostly to get driving directions and for address lookups. In 2005, Google re-introduced internet mapping with a twist. It not only provided a resource for driving directions and address lookups, it also extended the technology by creating an API through which users could create personalized Google Maps widgets. Yahoo and Microsoft followed suit with the Yahoo Maps and Virtual Earth APIs respectively. Since then, mashups have received considerable attention.

An enabling factor of this growth is the fact that Web 2.0 is gaining traction in the enterprise. Web 2.0 embodies the belief that the World Wide Web is breaking away from its origins and evolving into the next stage of human interaction with a computer and the global community. The concept encourages collaboration, reusability, personalization and standardization, which are properties that have fostered the development of mashups – one of the many trends in Web 2.0 (others include blogging, wikis and podcasting). Gradually, the Web is becoming a distribution network of content and services, as evidenced by mashups.

Another factor that has helped make mashups popular is that web browsers now have better Ajax support, which makes web applications more responsive. Desktop applications would be much more attractive to businesses than web-based services if the latter were extremely slow.

Also, open source software has grown more popular. The implication is that many more people are getting involved in developing content that can be used by the general public.

Architecture
Logically, a mashup can be viewed as being composed of three different participants, which are usually physically separated too. They are
 * Content providers
 * Mashup site
 * Client's browser

Content Providers
This refers to the providers of the content being used in a mashup application. The sources of content are disparate and often controlled by different parties. The most popular ways of exposing content for retrieval are
 * APIs e.g. Google Maps, Amazon, eBay
 * Information feeds e.g. Really Simple Syndication (RSS)
 * XML/JSON over HTTP and web pages

API
An Application Programming Interface (API) enables the creation of a web-based mashup by providing a defined set of rules and procedures for gaining access to an application or its content, e.g. Google Maps. Because every client uses the same interface, independently written software can interoperate with the provider. APIs should be made as simple as possible if their use is to be encouraged.

APIs can be
 * Proprietary, in which case their use requires the payment of a fee and the signing of a license agreement.
 * Open, hence available to anyone to use for free. However, there might still be binding terms and conditions, such as a limit on the number of calls that the mashup may make to the provider.
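
A call limit of the kind mentioned above is something a mashup can track on its own side before contacting the provider. The following is a minimal sketch under stated assumptions: the helper `createCallBudget`, the fixed-window policy and the quota of 1000 calls per day are all hypothetical and are not taken from any particular provider's terms.

```javascript
// Hypothetical helper: counts calls against a provider's quota per time window.
// `now` is injectable so the window logic can be exercised without waiting.
function createCallBudget(maxCalls, windowMs, now = Date.now) {
  let windowStart = now();
  let used = 0;

  // Start a fresh window once the current one has elapsed.
  function rollWindow() {
    const t = now();
    if (t - windowStart >= windowMs) {
      windowStart = t;
      used = 0;
    }
  }

  return {
    // Returns true and consumes one call if quota remains, false otherwise.
    tryAcquire() {
      rollWindow();
      if (used >= maxCalls) return false;
      used += 1;
      return true;
    },
    // Number of calls still available in the current window.
    remaining() {
      rollWindow();
      return maxCalls - used;
    },
  };
}

// Assumed quota, for illustration only: 1000 calls per day.
const dailyBudget = createCallBudget(1000, 24 * 60 * 60 * 1000);
if (dailyBudget.tryAcquire()) {
  // ... the call to the content provider would be made here ...
}
```

A real provider may count calls differently (for example per key or per IP address), so a counter like this is only a local safeguard.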

A Web API is usually accessed via HTTP by making a call to some script on a remote server.
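
As a rough illustration of such a call, the sketch below assembles the request URL for a remote script from a base address and a set of query parameters. The endpoint `https://api.example.com/listings` and the parameter names are hypothetical; a real provider documents its own.

```javascript
// Builds the URL for a call to a web API from a base address and
// a plain object of query parameters. Values are URL-encoded.
function buildApiUrl(baseUrl, params) {
  const url = new URL(baseUrl);
  for (const [name, value] of Object.entries(params)) {
    url.searchParams.set(name, String(value));
  }
  return url.toString();
}

// Hypothetical endpoint and parameters, for illustration only.
const requestUrl = buildApiUrl("https://api.example.com/listings", {
  city: "Philadelphia",
  minBedrooms: 3,
});
```

The resulting string could then be handed to whatever HTTP client the mashup uses, in the browser or on the server.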

Popular websites that offer open APIs include Amazon.com, AOL, eBay, Google, MapQuest, MSN, Shopping.com, UPS.com and the US Postal Service.

The contents of a web site that lacks an open API can still be accessed via a process referred to as screen scraping, in which unstructured text is pulled from a website.
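
To make the idea of screen scraping concrete, here is a small sketch that pulls listing titles and prices out of a page's markup with a regular expression. The HTML fragment, its class names and the function `scrapeListings` are invented for illustration; the pattern is tied to exactly this markup, so it would stop matching if the page layout changed.

```javascript
// Hypothetical fragment of a listings page that offers no API.
const html = `
  <ul>
    <li class="listing"><span class="title">3BR near Rittenhouse</span> <span class="price">$1400</span></li>
    <li class="listing"><span class="title">4BR in University City</span> <span class="price">$1500</span></li>
  </ul>`;

// Extracts { title, price } pairs by matching the page's presentation markup.
function scrapeListings(page) {
  const pattern =
    /<span class="title">([^<]*)<\/span>\s*<span class="price">\$(\d+)<\/span>/g;
  const listings = [];
  for (const match of page.matchAll(pattern)) {
    listings.push({ title: match[1].trim(), price: Number(match[2]) });
  }
  return listings;
}

const scraped = scrapeListings(html);
```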

An example of JavaScript code used to display a Google map of the Philadelphia area is given below. The example makes use of the Google Maps API. The map can be used to show the location of apartments in a web site that offers such apartment listing services.

Apartment Listing  

function initMap() {
    // Create the map inside the page element whose id is "phillyMap".
    var phillyMap = new GMap2(document.getElementById("phillyMap"));
    phillyMap.addControl(new GLargeMapControl());
    phillyMap.addControl(new GMapTypeControl());
    // Center on Philadelphia at zoom level 12.
    phillyMap.setCenter(new GLatLng(39.953333, -75.17), 12);
    phillyMap.setMapType(G_NORMAL_MAP);
}



To use Google Maps, you need to request an API key from Google, which is a relatively easy process. You load the Google Maps API in your website using an HTML script tag. The URL specified in the src attribute of the tag points to the location of the JavaScript file that includes all of the symbols and definitions you need for using the Google Maps API. You should replace the key in this attribute with the key that was assigned to you. The key used for this example is ABQIAAAAni0_HyJTfcbhvyNrGunJdhQuvnbIrZPj1yxxzdYDS-DWipzTChQL8GeWLFZ2SA-_q3wsWjD16IYlVg.

The HTML div tag acts as a placeholder for the map on your web page. It also specifies a size for the map and assigns itself an identity, phillyMap.

GMap2 is a class that represents a map. We create an instance of this class using the new operator, passing it the phillyMap element described in the previous paragraph as the container for our map.

Next, we need to initialize our map using the setCenter method, which takes a GLatLng coordinate and a zoom level as parameters. GLatLng is an object that specifies the latitude and longitude to be used as the center point for the map. The example uses the coordinates of Philadelphia.

Information Feeds
Information feeds are a common mashup source because they are in a standardized form and are readily available on the internet. Web feeds like RSS and Atom are in XML format and are therefore easy to parse. The parsed data is then used to create new information feeds or different mashup types. For example, a user can parse an RSS feed for the New York Times, run the extracted data through content analysis and generate a map with Flickr photos relating to the locations referenced in the New York Times articles.
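
The first two steps of such a pipeline, parsing the feed and analyzing its content, might look roughly like the sketch below. It makes several assumptions: the feed text is an invented sample, the tag extraction uses simple regular expressions where a production mashup would use a real XML parser, and the "content analysis" is reduced to a lookup against a hypothetical list of known place names.

```javascript
// Invented sample feed, for illustration only.
const rssXml = `<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example News</title>
    <item><title>Transit plan approved in Philadelphia</title><link>https://news.example.com/a1</link></item>
    <item><title>Harbor festival opens in Boston</title><link>https://news.example.com/a2</link></item>
  </channel>
</rss>`;

// Returns the text of the first <tag>...</tag> element in the XML, or null.
function textOf(xml, tag) {
  const match = xml.match(new RegExp(`<${tag}>([\\s\\S]*?)</${tag}>`));
  return match ? match[1].trim() : null;
}

// Collects the title and link of every <item> in the feed.
function parseRssItems(xml) {
  const items = [];
  for (const match of xml.matchAll(/<item>([\s\S]*?)<\/item>/g)) {
    items.push({ title: textOf(match[1], "title"), link: textOf(match[1], "link") });
  }
  return items;
}

// Stand-in for content analysis: which known places does a title mention?
const knownPlaces = ["Philadelphia", "Boston", "New York"];
function placesMentioned(title) {
  return knownPlaces.filter((place) => title.includes(place));
}

const articles = parseRssItems(rssXml).map((item) => ({
  ...item,
  places: placesMentioned(item.title),
}));
```

Each entry in `articles` now carries the place names that a later step could use to look up photos or plot map markers.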

XML or JSON
XML is a standard for data transfer over the internet. For example, RSS and Atom feeds are inherently XML documents. XML is also the format from which Ajax takes its name, although Ajax applications can exchange other formats as well. All data that is stored on the web in XML form can be retrieved, parsed and analyzed. After analysis, users can create intelligible mashups from the resulting data. XML in its bare form is used when no tool provides the data in a packaged form (e.g. RSS) to the user. In such a case, the user will most likely scrape the web (a process called screen scraping) for information. Unlike with APIs and information feeds, a relatively higher level of programming expertise is required to analyze raw XML. Also, if the source content changes (which happens often), the code written to extract the data breaks, since it depended on the presentation of the data.

JSON is another data interchange format that can be used to create mashups. As with XML, data for the mashup that is in JSON format is retrieved, parsed and analyzed before it is used.
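
A minimal sketch of the parse-and-check step for JSON follows. The response body and its field names (`listings`, `lat`, `lng`, `rent`) are hypothetical; the point is only that the text is parsed and validated before the mashup relies on it.

```javascript
// Hypothetical response body from a content provider.
const responseBody =
  '{"listings":[{"address":"1500 Market St","lat":39.9526,"lng":-75.1665,"rent":1400},' +
  '{"address":"unknown location","rent":1200}]}';

// Parses the body and keeps only entries that carry usable coordinates.
// Returns an empty array if the text is not valid JSON or has no listings.
function parseListings(body) {
  let data;
  try {
    data = JSON.parse(body);
  } catch (err) {
    return [];
  }
  if (!data || !Array.isArray(data.listings)) return [];
  return data.listings.filter(
    (entry) => typeof entry.lat === "number" && typeof entry.lng === "number"
  );
}

const usableListings = parseListings(responseBody);
```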

In general, any structured data interchange format can be used for mashups, as long as both provider and user understand the format (i.e., syntax) and meaning (i.e., semantics) of said data. Two images, for example, can have a compatible syntax, but one could be a summer and one could be a winter photograph of the same area; climate could not be inferred by combining them in a mashup. A street map and a geographic photograph may both be in a compatible graphic format, but they are of different scales (i.e., magnification) and different coordinate systems.

Mashup Site
This refers to where the mashup is hosted. It is the application that is created by drawing on content from content providers. This application can be generated using client-side scripting such as JavaScript within the client's browser. Server-side technologies that generate content dynamically could also be used. Examples of these server-side technologies include Java servlets, CGI, PHP and ASP. Client-side scripting has the advantage of reduced communication overhead with the mashup server: once the page has loaded, subsequent operations are carried out by the browser communicating directly with the content provider. This is the approach Google Maps uses.

Client's Web browser
This is where users access and interact with the mashup.

Languages
Most mashups are developed using one or more of the following languages and techniques
 * 1) JavaScript
 * 2) Ajax – Asynchronous JavaScript and XML. Ajax has the advantage that content in a portion of a webpage can be updated easily without reloading the entire page, which has made the technique increasingly popular.
 * 3) HTML
 * 4) PHP

However, there are many tools available today that require no coding at all. Some examples are given in the mashup tools/editors section below.

Mashup Tools/Editors
A number of organizations have developed or are developing tools to allow users to develop, deploy and share their own mashups. Some of these tools require substantial programming skills, while others require none at all. Some of these editors are listed below; there are many others available.
 * Pipes, a free online service released by Yahoo in February 2007, allows the creation of mashups using a visual editor. It requires no coding: you simply drag and drop data sources and operators onto the workspace and connect them.
 * QEDWiki, which stands for Quick and Easily Done Wiki, is a wiki-based mashup maker developed by the IBM Emerging Internet Technologies Group and aimed at building enterprise mashups.
 * Popfly was developed by Microsoft and launched in May 2007. It is a simple tool built on Silverlight technology and has a mashup creator which allows users to combine pre-built blocks to create web services. The tool can be used by non-programmers.
 * Serena Business Mashup is a mashup tool designed for creating visual models of automated business processes and tying these to existing services or applications. The mashup suite provides a visual development environment for building a model of the mashup, and then connects the output to back-end systems within the firewall, publishing the results to a Mashup Server or to the cloud through a subscription to Serena's hosted software-as-a-service (SaaS) offering.
 * Google Mashup Editor, developed by Google, is still being tested, and access to the software is limited to a small number of developers. It allows the creation of mashups using popular technologies such as HTML, JavaScript, CSS and XML.
 * WebCenter Suite, developed by Oracle, is a tool used by developers to build mashups.

Consumer Mashups
Consumer mashups combine visual elements and data from multiple sources.

An example of a consumer mashup is http://www.housingmaps.com, which gets rental listings from Craigslist and displays these listings on a Google Map by using the Google Maps API. The results displayed below are searches for 3+ bedroom apartments in Philadelphia that cost between $1000 and $1500.
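
The core of a mashup like this is a step that filters listings by the search criteria and turns each match into something a map can plot. The sketch below illustrates that step only, under stated assumptions: the listing records and their field names are invented, the function `toMarkers` is hypothetical, and no call is made to Craigslist or to the Google Maps API.

```javascript
// Invented listing records standing in for data from a listings provider.
const listings = [
  { title: "3BR rowhouse", bedrooms: 3, rent: 1400, lat: 39.9496, lng: -75.1718 },
  { title: "1BR studio", bedrooms: 1, rent: 900, lat: 39.9526, lng: -75.1652 },
  { title: "4BR twin", bedrooms: 4, rent: 1800, lat: 39.9612, lng: -75.21 },
  { title: "3BR apartment", bedrooms: 3, rent: 1000, lat: 39.97, lng: -75.16 },
];

// Keeps listings that satisfy the search and converts each one into a
// plain marker description (position plus label) for the mapping layer.
function toMarkers(items, { minBedrooms, minRent, maxRent }) {
  return items
    .filter(
      (item) =>
        item.bedrooms >= minBedrooms && item.rent >= minRent && item.rent <= maxRent
    )
    .map((item) => ({
      lat: item.lat,
      lng: item.lng,
      label: item.title + " ($" + item.rent + ")",
    }));
}

// The search described above: 3+ bedrooms, $1000 to $1500.
const markers = toMarkers(listings, { minBedrooms: 3, minRent: 1000, maxRent: 1500 });
```

Each marker description could then be passed to the mapping API of choice to place a pin on the map.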



Data Mashups
Data mashups combine multiple data sources (e.g. RSS feeds) into a single data source.
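
One simple way to combine feeds into a single source is to pool their entries, drop duplicates and sort by date. The sketch below assumes each feed has already been parsed into an array of entries with `title`, `link` and `published` fields; those field names, the sample data and the choice of the link as the duplicate key are assumptions made for illustration.

```javascript
// Merges any number of parsed feeds into one list, newest entry first.
// Entries with the same link are treated as duplicates; the first one seen wins.
function mergeFeeds(...feeds) {
  const byLink = new Map();
  for (const feed of feeds) {
    for (const entry of feed) {
      if (!byLink.has(entry.link)) {
        byLink.set(entry.link, entry);
      }
    }
  }
  return [...byLink.values()].sort(
    (a, b) => Date.parse(b.published) - Date.parse(a.published)
  );
}

// Invented sample feeds.
const feedA = [
  { title: "A1", link: "https://a.example.com/1", published: "2010-03-01T10:00:00Z" },
  { title: "Shared", link: "https://shared.example.com/x", published: "2010-03-02T09:00:00Z" },
];
const feedB = [
  { title: "Shared (copy)", link: "https://shared.example.com/x", published: "2010-03-02T09:00:00Z" },
  { title: "B1", link: "https://b.example.com/1", published: "2010-03-03T08:00:00Z" },
];

const combined = mergeFeeds(feedA, feedB);
```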

An example of a data mashup is the travel site http://www.kayak.com. Kayak is a comprehensive travel search engine which gets its data from over 100 other travel sites. Kayak therefore does not sell directly to customers but serves as a portal through which customers can be directed to travel agencies that can serve their needs. The results displayed below are searches for flights from Philadelphia to New York. Kayak displays flights from http://www.cheaptickets.com and http://www.orbitz.com.



Business Mashups
Business mashups are similar to consumer mashups, but they solve business problems. Many enterprises are embracing mashups for various reasons. Some need their software systems to change often to keep up with the rapid rate at which their business needs change. Such businesses find mashups an attractive solution: they make use of available components that have already been developed and tested, and can launch their software in less time than if they had to build it from scratch. Other businesses do not have the resources or competences required to develop some applications themselves and are thus eager to incorporate existing ones.

Designing Mashups for businesses
Special consideration needs to be given to mashups developed for businesses, especially businesses with sensitive data. With the plethora of services available on the World Wide Web that could serve as mashup content, two concerns arise. The first is how to design your enterprise's systems to allow the incorporation of services and applications from outside your enterprise. The second is how to design your firewall so that you can access these services and applications without compromising your security.

Mashup preparation can be divided into six stages
 * Requirements. In the face of an overwhelming number of applications available as content, there is the tendency to incorporate as many as possible.  However, there is the need for proper planning and identification of the applications crucial to the service that your business aims to provide.  How well these applications fit into your existing architecture is also very important, as is how much change is needed for them to fit in.
 * Design. A thorough design is important to decide the standards of the system to be produced, the various interfaces involved and how they should be exposed, the plan for sustainability and scalability and the management of the interfaces and services made available on the Web.
 * Governance. This involves the creation and enforcement of design time and run time policies.  The primary concern is the management of the service being provided and the composite services.
 * Security. Protective security policies and technology must be in place.
 * Deployment. This involves the selection of the appropriate enabling technology and standards, which should enhance security.
 * Testing. As with all software, extensive unit and integration tests must be carried out throughout the development life cycle. Compatibility and portability are the primary concerns here, especially since you do not have control over the quality of the application that you are incorporating.

Advantages of Mashups

 * 1) Mashups allow for the reuse of existing applications.
 * 2) They also allow for rapid application development.
 * 3) Development of a mashup does not necessarily involve extensive IT skills.
 * 4) The associated cost of application development is greatly reduced.
 * 5) Applications are better tailored to users' needs since the users can now incorporate content that they were unable to develop themselves due to time or resource constraints.

Disadvantages of Mashups

 * 1) A user might have no control over the quality and features of the content.  The continued support by the owner of the mashup service or API cannot be guaranteed.
 * 2) Even if reliability of the content source is established, a potential problem is scalability.  For example, can the providers of the map you are incorporating support the traffic that your site would generate in two years?
 * 3) The integrity of the content cannot be guaranteed either.
 * 4) Most data sources are not yet built on a service-oriented architecture (SOA), so drawing in the information is not easy.  Although mashups can be created without SOA, they are greatly facilitated by it.
 * 5) Only software that can be accessed with a web browser can be included in a mashup, which implies that installed desktop applications cannot be easily incorporated in a mashup.
 * 6) Security of this content is another issue, especially for enterprises with very sensitive data. They need to be sure that the content they are incorporating does not pose a security threat in any way.
 * 7) There are no mashup standards, which makes it more difficult to design and implement security mechanisms.