library (computer science)

Illustration of an application which may use libvorbisfile.so to play an Ogg Vorbis file.


In computer science, a library is a collection of subprograms used to develop software. Libraries contain "helper" code and data, which provide services to independent programs. This allows code and data to be shared and changed in a modular fashion. Some executables are both standalone programs and libraries, but most libraries are not executables. Executables and libraries make references known as links to each other through the process known as linking, which is typically done by a linker.

Most modern operating systems (OS) provide libraries that implement the majority of system services. Such libraries have commoditized the services a modern application expects an OS to provide. As such, most code used by modern applications is provided in these libraries.

Types

Static libraries

Historically, libraries could only be static. A static library, also known as an archive, consists of a set of routines that are copied into a target application by the compiler, linker, or binder, producing object files and a stand-alone executable file. Addresses, jump targets, and other routine-call references are stored as relative or symbolic addresses, which cannot be resolved until all code and libraries are assigned final static addresses.

The linker resolves all of the unresolved addresses into fixed or relocatable addresses (from a common base) by loading all code and libraries into actual runtime memory locations. This linking process can take as much time as the compilation process, or more, and must be repeated whenever any one of the modules is recompiled. Most compiled languages have a standard library (for example, the C standard library), but programmers can also create their own custom libraries. Commercial compiler publishers provide both standard and custom libraries with their compiler products.

A linker may work on specific types of object files, and thus require specific (compatible) types of libraries. Collecting object files into a static library may ease their distribution and use. A client, either a program or a library subroutine, accesses a library object by referencing just its name. The linking process resolves references by searching the libraries in the order given. Usually, it is not considered an error if a name can be found multiple times in a given set of libraries.
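The search-in-order rule can be illustrated with a toy model. The C sketch below assumes a hypothetical `struct toy_library` that records only the names a library defines; a real archive also holds the object code, and real linkers have further rules about which archive members they pull in. `toy_resolve` returns the first library in the list that defines a name, so a second definition in a later library is simply never reached.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical toy model: a "library" is a name plus the symbols it
   defines. Real archives (.a files) also contain the object code. */
struct toy_library {
    const char *name;
    const char *const *symbols; /* NULL-terminated list of defined names */
};

/* Resolve one reference by searching the libraries in the order given.
   The first library that defines the name wins; later definitions of the
   same name are never consulted, so they are not reported as errors.
   Returns the index of the defining library, or -1 if unresolved. */
int toy_resolve(const char *symbol, const struct toy_library *libs, size_t nlibs)
{
    for (size_t i = 0; i < nlibs; i++) {
        for (const char *const *s = libs[i].symbols; *s != NULL; s++) {
            if (strcmp(*s, symbol) == 0)
                return (int)i;
        }
    }
    return -1;
}
```

For example, if both `liba` and `libb` define `log_msg`, a reference to `log_msg` resolves to `liba` whenever `liba` is listed first.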

Dynamic linking

Dynamic linking means that the subroutines of a library are loaded into an application program at runtime, rather than being linked in at compile time, and remain as separate files on disk. Only a minimal amount of work is done at compile time by the linker; it only records what library routines the program needs and the index names or numbers of the routines in the library. The majority of the work of linking is done at the time the application is loaded (loadtime) or during execution (runtime). The necessary linking code, called a loader, is actually part of the underlying operating system. At the appropriate time the loader finds the relevant libraries on disk and adds the relevant data from the libraries to the process's memory space.

Some operating systems can only link in a library at loadtime, before the process starts executing; others may be able to wait until after the process has started to execute and link in the library just when it is actually referenced (i.e., during runtime). The latter is often called "delay loading". In either case, such a library is called a dynamically linked library.
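Delay loading can be sketched as a call through a pointer that initially points at a stub. This is a simplified, hypothetical model (`real_square`, `square_stub`, and `call_square` are illustrative names only): on the first call the stub stands in for the loader, patches the pointer to the real routine, and forwards the call; later calls go straight to the routine. Real mechanisms, such as the lazy binding performed by many Unix dynamic loaders, are considerably more involved.

```c
/* Hypothetical library routine that the program binds lazily. */
static int real_square(int x) { return x * x; }

/* Number of times the simulated loader has run, so the example can show
   that resolution happens only once. */
int toy_resolutions = 0;

static int square_stub(int x);

/* Every call goes through this pointer. It starts out aimed at the stub
   and is patched to the real routine on first use. */
static int (*square_ptr)(int) = square_stub;

/* Stand-in for the loader: a real one would locate and map the library
   and look the symbol up; this sketch just takes real_square's address. */
static int square_stub(int x)
{
    toy_resolutions++;
    square_ptr = real_square; /* patch so later calls skip the stub */
    return square_ptr(x);
}

/* The entry point the rest of the program calls. */
int call_square(int x)
{
    return square_ptr(x);
}
```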

Because a dynamically linked library remains a separate file from the program that uses it, dynamic linking is a common boundary in software licenses.

Plugins are one common use of dynamically linked libraries and are especially useful when the libraries can be replaced by other libraries with a similar interface but different functionality. Software may be said to have a "plugin architecture" if it uses libraries for core functionality with the intention that they can be replaced. Note, however, that the use of dynamically linked libraries in an application's architecture does not necessarily mean that they may be replaced.
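Replaceability in a plugin architecture comes from the host depending only on an interface. The sketch below assumes a hypothetical interface, `struct filter_plugin`, holding a name and one function pointer. In a real system each implementation would live in its own dynamically linked library; here both are compiled into one file for brevity.

```c
/* Hypothetical plugin interface: the host knows only this table of
   function pointers, not which implementation fills it in. */
struct filter_plugin {
    const char *name;
    int (*apply)(int sample);
};

/* Two interchangeable implementations. In a real plugin system each
   would be exported by a separate dynamically linked library. */
static int apply_identity(int sample) { return sample; }
static int apply_halve(int sample) { return sample / 2; }

const struct filter_plugin identity_plugin = { "identity", apply_identity };
const struct filter_plugin halve_plugin = { "halve", apply_halve };

/* Host code written against the interface only: passing a different
   plugin table changes the behavior without changing this function. */
int host_process(const struct filter_plugin *plugin, int sample)
{
    return plugin->apply(sample);
}
```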

Dynamic linking was originally developed in the Multics operating system, starting in 1964. It was also a feature of MTS (the Michigan Terminal System), built in the late 1960s.[1] In Microsoft Windows, dynamically-linked libraries are called dynamic-link libraries or "DLLs".

Relocation

One wrinkle that the loader must handle is that the actual location in memory of the library data cannot be known until after the executable and all dynamically linked libraries have been loaded into memory. This is because the memory locations used depend on which specific dynamic libraries have been loaded. It is not possible to depend on the absolute location of the data in the executable, nor even in the library, since conflicts between different libraries would result: if two of them specified the same or overlapping addresses, it would be impossible to use both in the same program.

However, in practice, the shared libraries on most systems do not change often. Therefore, it is possible to compute a likely load address for every shared library on the system before it is needed, and store that information in the libraries and executables. If every shared library that is loaded has undergone this process, then each will load at its predetermined address, which speeds up dynamic linking. This optimization is known as prebinding in Mac OS X and prelinking in Linux. Disadvantages of this technique include the time required to recompute these addresses every time the shared libraries change, the inability to use address space layout randomization, and the need for sufficient virtual address space (a problem largely alleviated by 64-bit architectures, at least for the time being).
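Relocation, and the shortcut that prebinding and prelinking aim for, can be modelled with a toy image whose address-holding slots were computed for a preferred base. In this hypothetical sketch (`struct toy_image` and `toy_relocate` are illustrative, not any real loader's format), the loader adds the difference between the actual and the preferred base to every slot listed in the relocation table. When the library lands at its preferred address, the difference is zero and nothing needs patching.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical image: pointer-sized slots, some of which hold absolute
   addresses computed for preferred_base. */
struct toy_image {
    uintptr_t preferred_base;
    uintptr_t *slots;
    const size_t *reloc_slots; /* indices of slots that hold addresses */
    size_t nrelocs;
};

/* Apply relocations for the address the image was actually loaded at.
   Returns the number of slots patched: zero when the image landed at its
   preferred base, the case prebinding/prelinking tries to make common. */
size_t toy_relocate(struct toy_image *img, uintptr_t actual_base)
{
    /* Unsigned wraparound makes this correct for lower bases too. */
    uintptr_t delta = actual_base - img->preferred_base;
    if (delta == 0)
        return 0;
    for (size_t i = 0; i < img->nrelocs; i++)
        img->slots[img->reloc_slots[i]] += delta;
    img->preferred_base = actual_base; /* slots now refer to this base */
    return img->nrelocs;
}
```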

An old method was to examine the program at load time and replace all references to data in the libraries with pointers to the appropriate memory locations once all libraries have been loaded. On Windows 3.1 (and some embedded systems such as Texas Instruments calculators), the references to patch were arranged as linked lists, allowing easy enumeration and replacement. Nowadays, most dynamic library systems link a symbol table with blank addresses into the program at compile time. All references to code or data in the library pass through this table, the import directory. At load time the table is modified with the location of the library code/data by the loader/linker. This process is still slow enough to significantly affect the speed of programs that call other programs at a very high rate, such as certain shell scripts.
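The import-table approach can be sketched as follows, assuming a hypothetical format in which each import records a name and an address slot that stays blank (NULL) until load time. `toy_bind_imports` plays the role of the loader, filling each slot from the library's exports; the program then calls only through the slots. Real import tables, such as those in PE files, also record library names, ordinals, and hints.

```c
#include <stddef.h>
#include <string.h>

typedef int (*toy_fn)(int);

/* Hypothetical import entry: the name recorded at build time and an
   address slot that is blank until the loader fills it in. */
struct toy_import {
    const char *name;
    toy_fn address;
};

/* Hypothetical export entry of a loaded library. */
struct toy_export {
    const char *name;
    toy_fn address;
};

/* Two hypothetical library routines and the library's export table. */
static int lib_inc(int x) { return x + 1; }
static int lib_neg(int x) { return -x; }
const struct toy_export toy_lib_exports[] = { { "inc", lib_inc }, { "neg", lib_neg } };

/* Load-time step: fill every slot by looking its name up in the
   exports. Returns how many names were left unresolved (still NULL). */
size_t toy_bind_imports(struct toy_import *imports, size_t nimports,
                        const struct toy_export *exports, size_t nexports)
{
    size_t unresolved = 0;
    for (size_t i = 0; i < nimports; i++) {
        imports[i].address = NULL;
        for (size_t j = 0; j < nexports; j++) {
            if (strcmp(imports[i].name, exports[j].name) == 0) {
                imports[i].address = exports[j].address;
                break;
            }
        }
        if (imports[i].address == NULL)
            unresolved++;
    }
    return unresolved;
}
```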

The library itself contains a jump table of all the methods within it, known as entry points. Calls into the library "jump through" this table, looking up the location of the code in memory, then calling it. This introduces overhead in calling into the library, but the delay is usually so small as to be negligible.
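A jump table of entry points can be sketched in C as an array of function pointers indexed by entry number. The library, its two routines, and the `ENTRY_*` numbering below are hypothetical; the point is that a caller needs only the index, and each call costs one extra table lookup.

```c
/* Hypothetical library routines. */
static int lib_add(int a, int b) { return a + b; }
static int lib_sub(int a, int b) { return a - b; }

/* Entry numbers a caller would record instead of addresses. */
enum { ENTRY_ADD = 0, ENTRY_SUB = 1, ENTRY_COUNT = 2 };

/* The library's jump table: one function pointer per entry point. */
static int (*const toy_entry_points[ENTRY_COUNT])(int, int) = { lib_add, lib_sub };

/* Call "through" the table: look up the entry, then call it.
   Returns 0 on success, -1 for an unknown entry number. */
int toy_call_entry(unsigned index, int a, int b, int *result)
{
    if (index >= ENTRY_COUNT)
        return -1;
    *result = toy_entry_points[index](a, b);
    return 0;
}
```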

Locating libraries at runtime

Dynamic linkers/loaders vary widely in functionality. Some depend on explicit paths to the libraries being stored in the executable. Any change to the library naming or layout of the filesystem will cause these systems to fail. More commonly, only the name of the library (and not the path) is stored in the executable, with the operating system supplying a system to find the library on-disk based on some algorithm.
Unix-like systems
Most Unix-like systems have a "search path" specifying file system directories in which to look for dynamic libraries. On some systems, the default path is specified in a configuration file; in others, it is hard-coded into the dynamic loader. Some executable file formats can specify additional directories in which to search for libraries for a particular program. The search path can usually be overridden with an environment variable, although this is disabled for setuid and setgid programs so that a user cannot force such a program to run arbitrary code. Developers of libraries are encouraged to place their dynamic libraries in directories on the default search path. On the downside, this can make installation of new libraries problematic, and these "known" locations quickly become home to an increasing number of library files, making management more complex.

Microsoft Windows
Microsoft Windows will check the registry to determine the proper place to find an ActiveX DLL, but for other DLLs it will check the directory that the program was loaded from; the current working directory (only on older versions of Windows); any directories set by calling the SetDllDirectory() function; the System32, System, and Windows directories; and finally the directories specified by the PATH environment variable.[2]

OpenStep
OpenStep used a more flexible system, collecting a list of libraries from a number of known locations (similar to the PATH concept) when the system first starts. Moving libraries around causes no problems at all, although there is a time cost when first starting the system.

AmigaOS
Under AmigaOS, libraries can be stored in any directory. Application-specific libraries are often stored in the application's directory, while libraries supplied with the OS are stored in the Libs directory.
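The Unix-style search can be sketched as a loop over a colon-separated list of directories that returns the first directory containing the requested file. This is a simplified, hypothetical model: the existence check is passed in as a callback (`toy_fs_exists` checks a made-up list of paths rather than the real filesystem), and empty entries are skipped, whereas some real loaders treat them as the current directory.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Decides whether a candidate path exists. A real loader would ask the
   filesystem; a callback keeps this sketch self-contained. */
typedef bool (*exists_fn)(const char *path);

/* Hypothetical stand-in for the filesystem. */
static const char *const toy_fs[] = {
    "/opt/app/lib/libbar.so",
    "/usr/lib/libfoo.so.2",
    "/usr/local/lib/libbar.so",
    NULL
};

bool toy_fs_exists(const char *path)
{
    for (const char *const *p = toy_fs; *p != NULL; p++)
        if (strcmp(*p, path) == 0)
            return true;
    return false;
}

/* Search the directories in search_path, in order, for libname. On
   success, writes the full path to out and returns true; on failure,
   leaves out empty and returns false. Empty entries are skipped. */
bool toy_find_library(const char *search_path, const char *libname,
                      exists_fn exists, char *out, size_t outsize)
{
    const char *dir = search_path;
    while (dir != NULL) {
        const char *colon = strchr(dir, ':');
        size_t len = colon != NULL ? (size_t)(colon - dir) : strlen(dir);
        if (len > 0) {
            int n = snprintf(out, outsize, "%.*s/%s", (int)len, dir, libname);
            if (n >= 0 && (size_t)n < outsize && exists(out))
                return true; /* first match wins */
        }
        dir = colon != NULL ? colon + 1 : NULL;
    }
    if (outsize > 0)
        out[0] = '\0';
    return false;
}
```

Because the first match wins, searching `/usr/local/lib:/opt/app/lib` for `libbar.so` finds the copy in `/usr/local/lib` and never looks at the one in `/opt/app/lib`.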


One of the biggest disadvantages of dynamic linking is that executables depend on separately stored libraries in order to function properly. If a library is deleted, moved, or renamed, or if an incompatible version of it is copied to a place that is earlier in the search path, the executable could malfunction or even fail to load. Damaging vital library files used by almost every executable on the system (such as the C library libc.so on Unix systems) will usually render the system completely unusable. On Windows, such problems are commonly known as DLL hell.

Dynamic loading

Dynamic loading is a subset of dynamic linking in which a dynamically linked library is loaded and unloaded at run time on request. Such a request may be made implicitly at compile time or explicitly at run time. Implicit requests are made at compile time, when a linker adds library references that include file paths or simply file names. Explicit requests are made when applications make direct calls to an operating system's API at run time.

Most operating systems that support dynamically linked libraries also support dynamically loading such libraries via a run-time linker API. For instance, Microsoft Windows uses the API functions LoadLibrary, LoadLibraryEx, FreeLibrary and GetProcAddress with Microsoft Dynamic Link Libraries; POSIX-based systems, including most UNIX and UNIX-like systems, use dlopen, dlclose and dlsym. Some development systems automate this process.
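A minimal sketch of explicit dynamic loading with the POSIX functions dlopen, dlsym, and dlclose follows. It assumes a POSIX system; with glibc older than 2.34 the program must be linked with `-ldl`. Passing `NULL` as the path asks `dlopen` for a handle to the running program, whose symbol lookup also covers the libraries loaded at startup. A real caller would normally pass a library path or file name instead. The helper `call_int_function` is illustrative, not part of any API.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Load libpath at run time (NULL means the running program and the
   libraries it loaded at startup), look up an int(int) function named
   symbol, call it with arg, and unload again. Returns 0 on success. */
int call_int_function(const char *libpath, const char *symbol, int arg, int *out)
{
    void *handle = dlopen(libpath, RTLD_NOW);
    if (handle == NULL) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return -1;
    }

    dlerror(); /* clear any stale error before calling dlsym */
    int (*fn)(int) = (int (*)(int))dlsym(handle, symbol);
    const char *err = dlerror();
    if (err != NULL || fn == NULL) {
        fprintf(stderr, "dlsym: %s\n", err != NULL ? err : "symbol is NULL");
        dlclose(handle);
        return -1;
    }

    *out = fn(arg);
    dlclose(handle);
    return 0;
}
```

On a typical system where the C library is dynamically linked, `call_int_function(NULL, "abs", -5, &r)` should succeed and set `r` to 5. The Windows equivalents of the three calls are LoadLibrary, GetProcAddress, and FreeLibrary.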

Remote libraries

Another approach to sharing library code is to use completely separate executables (often in some lightweight form) and call them using a remote procedure call (RPC), potentially over a network to another computer. This approach maximizes operating system reuse: the code needed to support the library is the same code being used to provide application support and security for every other program. Additionally, such systems do not require the library to exist on the same machine, but can forward the requests over the network.

The downside to such an approach is that every library call requires a considerable amount of overhead. RPC calls are much more expensive than calling a shared library which has already been loaded on the same machine. This approach is commonly used in a distributed architecture which makes heavy use of such remote calls, notably client-server systems and application servers such as Enterprise JavaBeans.
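Part of that overhead is visible even without a network. The hypothetical sketch below uses a made-up text wire format (`"<procedure> <argument>"`): the client stub `rpc_call` marshals its arguments into a request, `server_dispatch` parses the request, runs the routine, and marshals a reply, and the stub then parses the reply. A real RPC system would also send both messages across a network, which usually dominates the cost; here a direct function call stands in for that round trip.

```c
#include <stdio.h>
#include <string.h>

/* Server side: the "library" routines live here, in what would be a
   separate program. Parses a request, runs it, and formats a reply. */
static int server_dispatch(const char *request, char *reply, size_t replysize)
{
    char proc[32];
    int arg, result;
    if (sscanf(request, "%31s %d", proc, &arg) != 2)
        return -1;
    if (strcmp(proc, "square") == 0)
        result = arg * arg;
    else if (strcmp(proc, "negate") == 0)
        result = -arg;
    else
        return -1; /* unknown procedure */
    int n = snprintf(reply, replysize, "%d", result);
    return (n >= 0 && (size_t)n < replysize) ? 0 : -1;
}

/* Client stub: looks like an ordinary call, but marshals the arguments,
   "transmits" them (a direct call here, standing in for the network),
   and unmarshals the reply. Returns 0 on success. */
int rpc_call(const char *proc, int arg, int *result)
{
    char request[64], reply[64];
    int n = snprintf(request, sizeof request, "%s %d", proc, arg);
    if (n < 0 || (size_t)n >= sizeof request)
        return -1;
    if (server_dispatch(request, reply, sizeof reply) != 0)
        return -1;
    return sscanf(reply, "%d", result) == 1 ? 0 : -1;
}
```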

Shared library

In addition to being loaded statically or dynamically, libraries are also often classified according to how they are shared among programs. Dynamic libraries almost always offer some form of sharing, allowing the same library to be used by multiple programs at the same time. Static libraries, by definition, cannot be shared. The term "linker" comes from the process of copying procedures or subroutines which may come from "relocatable" libraries and adjusting or "linking" the machine address to the final locations of each module.

The term shared library is slightly ambiguous, because it covers at least two different concepts. The first is the sharing of code located on disk by unrelated programs. The second is the sharing of code in memory, when programs execute the same physical page of RAM, mapped into different address spaces. The latter would seem preferable, and indeed it has a number of advantages. For instance, on the OpenStep system, applications were often only a few hundred kilobytes in size and loaded almost instantly; the vast majority of their code was located in libraries that had already been loaded for other purposes by the operating system. There is a cost, however: shared code must be specifically written to run in a multitasking environment. In some older environments, such as 16-bit Windows or MPE for the HP 3000, only stack-based (local) data was allowed, or other significant restrictions were placed on writing a DLL.

RAM sharing can be accomplished by using position-independent code, as in Unix, which leads to a complex but flexible architecture, or by using position-dependent code, as in Windows and OS/2. These systems use various tricks, such as pre-mapping the address space and reserving slots for each DLL, to make it highly probable that code can be shared. Windows DLLs are not shared libraries in the Unix sense. The rest of this article concentrates on aspects common to both variants.

In most modern operating systems, shared libraries can be of the same format as the "regular" executables. This allows two main advantages: first, it requires making only one loader for both of them, rather than two (having the single loader is considered well worth its added complexity). Secondly, it allows the executables also to be used as DLLs, if they have a symbol table. Typical executable/DLL formats are ELF and Mach-O (both in Unix) and PE (Windows). In Windows, the concept was taken one step further, with even system resources such as fonts being bundled in the DLL file format. The same is true under OpenStep, where the universal "bundle" format is used for almost all system resources.

The term DLL is mostly used on Windows and OS/2 products. On Unix and Unix-like platforms, the term shared library or shared object is more commonly used; consequently, the most common filename extension for shared library files is .so, usually followed by another dot and a version number. This is technically justified in view of the different semantics. More explanations are available in the position independent code article.

In some cases, an operating system can become overloaded with different versions of DLLs, which impedes its performance and stability. Such a scenario is known as DLL hell. Since 2001, most modern operating systems have included clean-up methods to eliminate such situations.

Object Libraries

Although dynamic linking was originally developed in the 1960s, it did not reach consumer operating systems until the late 1980s; it was generally available in some form in most operating systems by the early 1990s. It was during this same period that object-oriented programming (OOP) was becoming a significant part of the programming landscape. OOP with runtime binding requires additional information that traditional libraries don't supply; in addition to the names and entry points of the code located within, they also require a list of the objects on which they depend. This is a side-effect of one of OOP's main advantages, inheritance, which means that the complete definition of any method may be defined in a number of places. This is more than simply listing that one library requires the services of another; in a true OOP system, the libraries themselves may not be known at compile time, and vary from system to system.

At the same time another common area for development was the idea of multi-tier programs, in which a "display" running on a desktop computer would use the services of a mainframe or minicomputer for data storage or processing. For instance, a program on a GUI-based computer would send messages to a minicomputer to return small samples of a huge dataset for display. Remote procedure calls already handled these tasks, but there was no standard RPC system.

It was not long before most minicomputer and mainframe vendors were working on projects to combine the two, producing an OOP library format that could be used anywhere. Such systems were known as object libraries, or distributed objects if they supported remote access (not all did). Microsoft's COM is an example of such a system for local use; DCOM is a modified version that supports remote access.

For some time object libraries were the "next big thing" in the programming world. There were a number of efforts to create systems that would run across platforms, and companies competed to try to get developers locked into their own system. Examples include IBM's System Object Model (SOM/DSOM), Sun Microsystems' Distributed Objects Everywhere (DOE), NeXT's Portable Distributed Objects (PDO), Digital's ObjectBroker, Microsoft's Component Object Model (COM/DCOM), and any number of CORBA-based systems.

In the end, OOP libraries turned out not to be the next big thing. With the exception of Microsoft's COM and NeXT's (now Apple's) PDO, all of these efforts have since ended.

The JAR file format is mainly used for object libraries in the Java programming language. A JAR file consists of (sometimes compressed) classes in bytecode format and is loaded by a Java virtual machine or by special class loaders.

Naming

GNU/Linux, Solaris and other System V Release 4 derivatives, and BSD variants: libfoo.a and libfoo.so files are placed in directories like /lib, /usr/lib or /usr/local/lib. The filenames always start with lib, and end with .a (archive, static library) or .so (shared object, dynamically linked library), with an optional interface number. For example libfoo.so.2 is the second major interface revision of the dynamically linked library libfoo. Old Unix versions would use major and minor library revision numbers (libfoo.so.1.2) while contemporary Unixes will only use major revision numbers (libfoo.so.1). Dynamically loaded libraries are placed in /usr/libexec and similar directories. The .la files sometimes found in the library directories are libtool archives, not usable by the system as such.
Mac OS X and later: The system inherits static library conventions from BSD, with the library stored in a .a file, and can use .so-style dynamically linked libraries (with the .dylib suffix instead). Most libraries in Mac OS X, however, are "frameworks", placed inside special directories called "bundles", which wrap the library's required files and metadata. For example, a library called "My Neat Library" would be implemented in a bundle called "My Neat Library.framework".
Microsoft Windows: *.DLL files are dynamically linkable libraries. Other file name patterns may be used for special-purpose DLLs, e.g. *.OCX for OCX control libraries. Interface revisions are either encoded in the files or abstracted away using COM-object interfaces. Depending on how they are compiled, *.LIB files can be either static libraries or representations of dynamically linkable libraries needed only during compilation, known as "import libraries". Unlike in the Unix world, where different file extensions are used, when linking against a .LIB file in Windows one must first know whether it is a regular static library or an import library. In the latter case, a .DLL file must be present at runtime.
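The basic naming patterns can be summarized in a small helper. The enum values and `toy_library_filename` are hypothetical, and the function covers only the plain cases: it ignores Mac OS X frameworks and special-purpose names such as *.OCX, and the Windows branch assumes a plain `foo.dll` / `foo.lib` name. On Windows the same `.lib` name may denote either a static library or an import library.

```c
#include <stdio.h>
#include <string.h>

enum toy_platform { TOY_UNIX, TOY_MACOS, TOY_WINDOWS };
enum toy_kind { TOY_STATIC, TOY_SHARED };

/* Build a conventional file name for library `base`. `major` is the
   interface number for Unix shared objects (0 means none).
   Returns 0 on success, -1 on bad input or if out is too small. */
int toy_library_filename(enum toy_platform platform, enum toy_kind kind,
                         const char *base, unsigned major,
                         char *out, size_t outsize)
{
    int n;

    if (base == NULL || strlen(base) == 0)
        return -1;

    if (platform == TOY_WINDOWS) {
        /* This sketch uses no "lib" prefix on Windows. */
        n = kind == TOY_STATIC ? snprintf(out, outsize, "%s.lib", base)
                               : snprintf(out, outsize, "%s.dll", base);
    } else if (kind == TOY_STATIC) {
        /* Unix-like systems and Mac OS X share the .a archive convention. */
        n = snprintf(out, outsize, "lib%s.a", base);
    } else if (platform == TOY_MACOS) {
        n = snprintf(out, outsize, "lib%s.dylib", base);
    } else if (major > 0) {
        /* Shared object with a major interface number, e.g. libfoo.so.2. */
        n = snprintf(out, outsize, "lib%s.so.%u", base, major);
    } else {
        n = snprintf(out, outsize, "lib%s.so", base);
    }
    return (n >= 0 && (size_t)n < outsize) ? 0 : -1;
}
```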

References

1. "A History of MTS". Information Technology Digest 5 (5).
2. "Dynamic-Link Library Search Order". Microsoft Developer Network Library. Microsoft, 2007-10-04. Retrieved on 2007-10-04.

This article is copied from an article on Wikipedia.org - the free encyclopedia created and edited by online user community. The text was not checked or edited by anyone on our staff. Although the vast majority of the wikipedia encyclopedia articles provide accurate and timely information please do not assume the accuracy of any particular article. This article is distributed under the terms of GNU Free Documentation License.