Essay: Neural Networks: From Biological Systems to Advanced Artificial Intelligence Models

Subject area: Science essays · Published: 27 July 2024 · Last Modified: 24 August 2024

2.1 Neural network and its classification
Conventionally, the term neural network referred to a network or circuit of biological neurons. Today it is frequently used to refer to artificial neural networks (ANNs), which are composed of nodes, or artificial neurons. The term therefore has two distinct usages:
Biological neural networks are made up of real biological neurons that are connected, or functionally related, in the central nervous system or the peripheral nervous system. In neuroscience, they are often identified as groups of neurons that perform a specific physiological function in laboratory analysis.
Artificial neural networks are made up of interconnected artificial neurons (programming constructs that mimic the properties of biological neurons). ANNs may be used either to gain an understanding of biological neural networks or to solve artificial intelligence problems without necessarily creating a model of a real biological system. The real, biological nervous system is highly complex and includes features that may seem superfluous from the standpoint of artificial networks.
Generally, a biological neural network is composed of a group or groups of chemically connected or functionally associated neurons. A single neuron may be connected to many other neurons, and the total number of neurons and connections in a network may be vast. Connections, called synapses, are usually formed from axons to dendrites, though dendritic microcircuits and other connections are possible. Apart from electrical signaling, there are other forms of signaling that arise from neurotransmitter diffusion, which have an effect on electrical signaling. As such, neural networks are extremely complex.
Artificial intelligence and cognitive modeling try to simulate some properties of neural networks. While similar in their techniques, the former has the goal of solving particular tasks, while the latter aims to build mathematical models of biological neural systems.
In the artificial intelligence field, ANNs have been applied successfully to speech recognition, image analysis and adaptive control, and the construction of software agents (in computer and video games) or autonomous robots. Most of the currently employed ANNs for artificial intelligence are based on statistical estimation, optimization and control theory.
The cognitive modeling field involves the physical or mathematical modeling of the behavior of neural systems, ranging from the individual neural level (e.g. modeling the spike response curves of neurons to a stimulus), through the neural cluster level, to the complete organism (e.g. behavioral modeling of the organism's response to stimuli). Artificial intelligence, cognitive modeling, and neural networks are information-processing paradigms inspired by the way biological neural systems process data.
2.2 History of Neural Network
The theory of neural networks began in the late 1800s as an effort to describe how the human mind worked. These ideas started being applied to computational models with Turing's B-type machines and the perceptron.
In the early 1950s Friedrich Hayek was one of the first to posit the idea of spontaneous order in the brain arising out of decentralized networks of simple units (neurons). In the late 1940s, Donald Hebb had made one of the first hypotheses for a mechanism of neural plasticity (i.e. learning), Hebbian learning. Hebbian learning is considered to be a 'classic' unsupervised learning rule, and it (and variants of it) was an early model for long-term potentiation.
The perceptron is essentially a linear classifier for classifying data specified by parameters and an output function f(x) = w'x + b. Its parameters are adapted with an ad-hoc rule similar to stochastic steepest gradient descent. Because the inner product is a linear operator in the input space, the perceptron can only perfectly classify a set of data for which the different classes are linearly separable in the input space, while it often fails completely for non-separable data. While the development of the algorithm initially generated some enthusiasm, partly because of its apparent relation to biological mechanisms, the later discovery of this inadequacy caused such models to be abandoned until the introduction of non-linear models into the field.
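To make the classifier concrete, here is a minimal Python sketch of the perceptron decision rule and its limitation (the function name `predict` and the hand-chosen weights are our own illustration, not from any particular library):

```python
# A minimal sketch of the perceptron as a linear classifier:
# f(x) = 1 if w.x + b > 0, else 0.

def predict(w, b, x):
    """Threshold the linear combination w.x + b."""
    s = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if s > 0 else 0

# Weights chosen by hand to realize logical AND, which is linearly separable:
w, b = [1.0, 1.0], -1.5
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, predict(w, b, x))  # only (1, 1) maps to 1

# XOR, by contrast, is not linearly separable: no choice of w and b
# makes predict() output (0, 1, 1, 0) on the four inputs above.
```

This is exactly the inadequacy discovered later: the decision boundary is a single hyperplane, so non-separable data such as XOR defeats it.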
The cognitron (1975) was an early multilayered neural network with a training algorithm. The actual structure of the network and the methods used to set the interconnection weights vary from one neural strategy to another, each with its advantages and disadvantages. Networks can propagate information in one direction only, or they can bounce back and forth until self-activation at a node occurs and the network settles on a final state. The ability for bi-directional flow of inputs between neurons/nodes was produced with the Hopfield network (1982), and specialization of these node layers for specific purposes was introduced through the first hybrid networks.
The parallel distributed processing of the mid-1980s became popular under the name connectionism.
The rediscovery of the back propagation algorithm was probably the main reason behind the repopularisation of neural networks after the publication of "Learning Internal Representations by Error Propagation" in 1986 (though back propagation itself dates from 1974). The original network utilized multiple layers of weighted-sum units of the type f = g(w'x + b), where g was a sigmoid or logistic function such as that used in logistic regression. Training was done by a form of stochastic steepest gradient descent. Applying the chain rule of differentiation to derive the appropriate parameter updates results in an algorithm that seems to 'back propagate errors', hence the nomenclature; however, it is essentially a form of gradient descent. Determining the optimal parameters in a model of this type is not trivial, and steepest gradient descent methods cannot be relied upon to find the solution without a good starting point. In recent times, networks with the same architecture as the back propagation network are referred to as multi-layer perceptrons. This name does not impose any limitations on the type of algorithm used for learning.
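A pure-Python sketch of this training scheme follows: a small 2-2-1 network of weighted-sum units f = g(w'x + b) with a sigmoid g, trained by gradient descent on the XOR problem. All names are our own, and (as the text notes) convergence to a good solution depends on the starting point; the squared error nonetheless decreases from its initial value.

```python
import math
import random

random.seed(0)

def sig(z):
    return 1.0 / (1.0 + math.exp(-z))

# One weight row per hidden unit; the last entry of each row is the bias b.
w1 = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
w2 = [random.uniform(-1, 1) for _ in range(3)]

data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
eta = 0.5  # learning rate

def forward(x):
    h = [sig(u[0] * x[0] + u[1] * x[1] + u[2]) for u in w1]  # hidden layer g(w'x + b)
    y = sig(w2[0] * h[0] + w2[1] * h[1] + w2[2])             # output unit
    return h, y

def loss():
    return sum((forward(x)[1] - t) ** 2 for x, t in data)

before = loss()
for _ in range(5000):
    for x, t in data:
        h, y = forward(x)
        dy = (y - t) * y * (1 - y)                               # output delta
        dh = [dy * w2[j] * h[j] * (1 - h[j]) for j in range(2)]  # back-propagated hidden deltas
        for j in range(2):                                       # gradient-descent updates
            w2[j] -= eta * dy * h[j]
            w1[j][0] -= eta * dh[j] * x[0]
            w1[j][1] -= eta * dh[j] * x[1]
            w1[j][2] -= eta * dh[j]
        w2[2] -= eta * dy

print(before, "->", loss())  # squared error before and after training
```

The hidden deltas `dh` are exactly where the chain rule makes the algorithm appear to propagate errors backwards through the layers.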
The back propagation network generated much enthusiasm at the time and there was much controversy about whether such learning could be implemented in the brain or not, partly because a mechanism for reverse signaling was not obvious at the time, but most importantly because there was no plausible source for the ‘teaching’ or ‘target’ signal.
2.3 Functioning of Neural Network
Neural networks, as used in artificial intelligence, have traditionally been viewed as simplified models of neural processing in the brain, even though the relation between this model and brain biological architecture is debated.
A subject of current research in theoretical neuroscience is the question surrounding the degree of complexity and the properties that individual neural elements should have to reproduce something resembling animal intelligence.
Historically, computers evolved from the von Neumann architecture, which is based on sequential processing and execution of explicit instructions. On the other hand, the origins of neural networks are based on efforts to model information processing in biological systems, which may rely largely on parallel processing as well as implicit instructions based on recognition of patterns of ‘sensory’ input from external sources. In other words, at its very heart a neural network is a complex statistical processor (as opposed to being tasked to sequentially process and execute).
Figure 5
An artificial neural network (ANN), also called a simulated neural network (SNN) or commonly just neural network (NN) is an interconnected group of artificial neurons that uses a mathematical or computational model for information processing based on a connectionistic approach to computation. In most cases an ANN is an adaptive system that changes its structure based on external or internal information that flows through the network.
In more practical terms neural networks are non-linear statistical data modeling or decision making tools. They can be used to model complex relationships between inputs and outputs or to find patterns in data.
2.4 Types of Neural Networks
Feed Forward Neural Network – A simple neural network type where synapses are made from an input layer to zero or more hidden layers, and finally to an output layer. The feed forward neural network is one of the most common types of neural network in use. It is suitable for many types of problems. Feed forward neural networks are often trained with simulated annealing, genetic algorithms or one of the propagation techniques.
Self Organizing Map (SOM) – A neural network that contains two layers and implements a winner-take-all strategy in the output layer. Rather than taking the output of individual neurons, the neuron with the highest output is considered the winner. SOMs are typically used for classification, where the output neurons represent groups into which the input patterns are to be classified. SOMs are usually trained with a competitive learning strategy.
Hopfield Neural Network – A simple single layer recurrent neural network. The Hopfield neural network is trained with a special algorithm that teaches it to learn to recognize patterns. The Hopfield network will indicate that the pattern is recognized by echoing it back. Hopfield neural networks are typically used for pattern recognition.
Simple Recurrent Network (SRN) Elman Style – A recurrent neural network that has a context layer. The context layer holds the previous output from the hidden layer and then echoes that value back to the hidden layer's input. The hidden layer thus always receives input from its previous iteration's output. Elman neural networks are generally trained using genetic, simulated annealing, or one of the propagation techniques. Elman neural networks are typically used for prediction.
Simple Recurrent Network (SRN) Jordan Style – A recurrent neural network that has a context layer. The context layer holds the previous output from the output layer and then echoes that value back to the hidden layer's input. The hidden layer thus always receives input from the previous iteration's output layer. Jordan neural networks are generally trained using genetic, simulated annealing, or one of the propagation techniques. Jordan neural networks are typically used for prediction.
Simple Recurrent Network (SRN) Self Organizing Map – A recurrent self organizing map (RSOM) that has an input and output layer, just as a regular SOM. However, the RSOM has a context layer as well. This context layer echoes the previous iteration's output back to the input layer of the neural network. RSOMs are trained with a competitive learning algorithm, just as a non-recurrent SOM. RSOMs can be used to classify temporal data, or to predict.
Feed forward Radial Basis Function (RBF) Network – A feed forward network with an input layer, output layer and a hidden layer. The hidden layer is based on a radial basis function. The RBF generally used is the Gaussian function.
Several RBF’s in the hidden layer allow the RBF network to approximate a more complex activation function than a typical feed forward neural network. RBF networks are used for pattern recognition. They can be trained using genetic, annealing or one of the propagation techniques. Other means must be employed to determine the structure of the RBF’s used in the hidden layer.
2.5 Application of Neural Network
Neural Networks in Practice Given this description of neural networks and how they work, what real world applications are they suited for? Neural networks have broad applicability to real world business problems. In fact, they have already been successfully applied in many industries.
Since neural networks are best at identifying patterns or trends in data, they are well suited for prediction or forecasting needs including:
Sales forecasting
Industrial process control
Customer research
Data validation
Risk management
Target marketing
But to give some more specific examples, ANNs are also used in the following paradigms: recognition of speakers in communications; diagnosis of hepatitis; recovery of telecommunications from faulty software; interpretation of multi-meaning Chinese words; undersea mine detection; texture analysis; three-dimensional object recognition; hand-written word recognition; and facial recognition.
Neural networks in medicine Artificial Neural Networks (ANN) are currently a ‘hot’ research area in medicine and it is believed that they will receive extensive application to biomedical systems in the next few years. At the moment, the research is mostly on modeling parts of the human body and recognizing diseases from various scans (e.g. cardiograms, CAT scans, ultrasonic scans, etc.).
Neural networks are ideal for recognizing diseases using scans since there is no need to provide a specific algorithm on how to identify the disease. Neural networks learn by example, so the details of how to recognize the disease are not needed. What is needed is a set of examples that are representative of all the variations of the disease. The quantity of examples is not as important as their quality. The examples need to be selected very carefully if the system is to perform reliably and efficiently.
Modeling and Diagnosing the Cardiovascular System Neural networks are used experimentally to model the human cardiovascular system. Diagnosis can be achieved by building a model of the cardiovascular system of an individual and comparing it with the real-time physiological measurements taken from the patient. If this routine is carried out regularly, potentially harmful medical conditions can be detected at an early stage, making the process of combating the disease much easier.
A model of an individual’s cardiovascular system must mimic the relationship among physiological variables (i.e., heart rate, systolic and diastolic blood pressures, and breathing rate) at different physical activity levels. If a model is adapted to an individual, then it becomes a model of the physical condition of that individual. The simulator will have to be able to adapt to the features of any individual without the supervision of an expert. This calls for a neural network.
Another reason that justifies the use of ANN technology, is the ability of ANNs to provide sensor fusion which is the combining of values from several different sensors. Sensor fusion enables the ANNs to learn complex relationships among the individual sensor values, which would otherwise be lost if the values were individually analyzed. In medical modeling and diagnosis, this implies that even though each sensor in a set may be sensitive only to a specific physiological variable, ANNs are capable of detecting complex medical conditions by fusing the data from the individual biomedical sensors.
Electronic noses ANNs are used experimentally to implement electronic noses. Electronic noses have several potential applications in telemedicine. Telemedicine is the practice of medicine over long distances via a communication link. The electronic nose would identify odors in the remote surgical environment. These identified odors would then be electronically transmitted to another site where an odor generation system would recreate them. Because the sense of smell can be an important sense to the surgeon, telesmell would enhance telepresent surgery.
Instant Physician An application developed in the mid-1980s called the “instant physician” trained an auto associative memory neural network to store a large number of medical records, each of which includes information on symptoms, diagnosis, and treatment for a particular case. After training, the net can be presented with input consisting of a set of symptoms; it will then find the full stored pattern that represents the “best” diagnosis and treatment.
Neural Networks in business Business is a diversified field with several general areas of specialization, such as accounting or financial analysis. Almost any neural network application would fit into one of these business areas or into financial analysis.
There is some potential for using neural networks for business purposes, including resource allocation and scheduling. There is also a strong potential for using neural networks for database mining, that is, searching for patterns implicit within the explicitly stored information in databases. Most of the funded work in this area is classified as proprietary. Thus, it is not possible to report on the full extent of the work going on. Most work is applying neural networks, such as the Hopfield-Tank network for optimization and scheduling.
Marketing There is a marketing application which has been integrated with a neural network system. The Airline Marketing Tactician (a trademark abbreviated as AMT) is a computer system made of various intelligent technologies including expert systems. A feed forward neural network is integrated with the AMT and was trained using back-propagation to assist the marketing control of airline seat allocations. The adaptive neural approach was amenable to rule expression. Additionally, the application’s environment changed rapidly and constantly, which required a continuously adaptive solution. The system is used to monitor and recommend booking advice for each departure. Such information has a direct impact on the profitability of an airline and can provide a technological advantage for users of the system.
While it is significant that neural networks have been applied to this problem, it is also important to see that this intelligent technology can be integrated with expert systems and other approaches to make a functional system. Neural networks were used to discover the influence of undefined interactions by the various variables. While these interactions were not defined, they were used by the neural system to develop useful conclusions. It is also noteworthy to see that neural networks can influence the bottom line.
Chapter 3 Fundamentals of learning algorithms
3.1 Learning process of intelligent system
Intelligent computer systems are designed based on characteristics associated with intelligence in human behavior.
Examples: 1. Neural networks
2. Fuzzy logic
3. Expert systems
4. Probabilistic reasoning
Types: 1. Hard computing
2. Soft computing
Characteristics: 1. Cognition
a) Awareness
b) Perceiving
c) Remembering
2. Logical inference
3. Pattern recognition
The human brain has two key properties. The brain gains experience and adapts itself to its surrounding environment, and as a result its information-processing capability develops; in this sense the brain is plastic.
1. Plastic: the capability to process information and to add to it, while preserving the information it has learned previously.
2. Stable: the ability to remain stable when presented with irrelevant or useless information.
Learning is a fundamental component to an intelligent system, although a precise definition of learning is hard to produce. In terms of an artificial neural network, learning typically happens during a specific training phase. Once the network has been trained, it enters a production phase where it produces results independently. Training can take on many different forms, using a combination of learning paradigms, learning rules, and learning algorithms. A system which has distinct learning and production phases is known as a static network. Networks which are able to continue learning during production use are known as dynamical systems.
Learning implies that a processing unit is capable of changing its input/output behavior as a result of changes in the environment. Since the activation rule is usually fixed when the network is constructed and since the input/output vector cannot be changed, to change the input/output behavior the weights corresponding to that input vector need to be adjusted. A method is thus needed by which, at least during a training stage, weights can be modified in response to the input/output process. A number of such learning rules are available for neural network models.
A learning paradigm is supervised, unsupervised or a hybrid of the two, and reflects the method in which training data is presented to the neural network. A method that combines supervised and unsupervised training is known as a hybrid method. A learning rule is a model for the types of methods to be used to train the system, and also a goal for what types of results are to be produced. The learning algorithm is the specific mathematical method that is used to update the inter-neuronal synaptic weights during each training iteration. Under each learning rule, there are a variety of possible learning algorithms for use. Most algorithms can only be used with a single learning rule. Learning rules and learning algorithms can typically be used with either supervised or unsupervised learning paradigms, however, and each will produce a different effect.
Overtraining is a problem that arises when the network fits its training examples too closely and becomes incapable of useful generalization. This can occur when there are too many neurons in the network and its capacity for computation exceeds the dimensionality of the input space. During training, care must therefore be taken with the number and selection of input examples, since different training sets can produce very different results in the quality and robustness of the network.
3.2 Learning rules of neural network
The learning rules determine how the 'experiences' of a network exert their influence on its future behavior. There are, in essence, three types of learning rules: supervised, reinforcement, and non-supervised (unsupervised).
3.2.1 Supervised learning
The term supervised is used in both a very general and a narrow technical sense. In the narrow technical sense, supervised means the following: if for a certain input the corresponding output is known, the network is to learn the mapping from inputs to outputs. In supervised learning applications, the correct output must be known and provided to the learning algorithm, and the task of the network is to find the mapping. The weights are changed depending on the magnitude of the error that the network produces at the output layer: the larger the error, i.e. the discrepancy between the output the network produces (the actual output) and the correct output value (the desired output), the more the weights change. This is why the term error-correction learning is also used.
3.2.2 Reinforcement learning
Reinforcement learning is a computational approach to learning whereby an agent tries to maximize the total amount of reward it receives when interacting with a complex, uncertain environment (Sutton and Barto, 1998). If a teacher only tells a student whether her answer is correct or not, but leaves the task of determining why the answer is correct or false to the student, we have an instance of reinforcement learning. The problem of attributing the error (or the success) to the right cause is called the credit assignment (or blame assignment) problem.
3.2.3 Unsupervised learning
In unsupervised learning, the agent learns recurring patterns without any tutoring input. Essentially, the neural system detects correlations between neuronal firing patterns, and between those patterns and the structure of the inputs to the network. These correlations are strengthened by changes to the ANN weights such that, in the future, a portion of a pattern suffices to predict or retrieve much of the remainder.
3.3 Learning rate of neural network
Most network structures undergo a learning procedure during which the synaptic weights W and v are adjusted. The learning rate coefficient determines the size of the weight adjustments made at each iteration and hence influences the rate of convergence. A poor choice of coefficient can result in a failure to converge. If the learning rate coefficient is too large, the search path will oscillate and converge more slowly than a direct descent. If the coefficient is too small, the descent will progress in small steps, significantly increasing the time to converge.
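The effect of the coefficient can be illustrated on the simplest possible case, minimizing f(w) = w² (gradient 2w) by gradient descent; this toy example is ours, not part of the original text:

```python
def descend(lr, steps=10, w=1.0):
    """Gradient descent on f(w) = w**2, whose gradient is 2w, from w = 1.0."""
    traj = []
    for _ in range(steps):
        w -= lr * 2 * w  # weight adjustment scaled by the learning rate lr
        traj.append(round(w, 4))
    return traj

print(descend(0.05))  # small coefficient: slow, monotone progress in small steps
print(descend(0.75))  # large coefficient: the sign flips, an oscillating search path
print(descend(1.10))  # far too large: the iterates oscillate and diverge
```

Each update multiplies w by (1 − 2·lr), so the three regimes of the text (slow steps, oscillation, failure to converge) correspond to that factor lying near 1, being negative with magnitude below 1, and having magnitude above 1.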
3.3.1 Perceptron
The perceptron was presented by Rosenblatt in 1958. The essential innovation was the introduction of numerical weights and a special interconnection pattern. In the original Rosenblatt model the computing units are threshold elements and the connectivity is determined stochastically. Learning takes place by adapting the weights of the network with a numerical algorithm. Rosenblatt's model was refined and perfected in the 1960s and its computational properties were carefully analyzed by Minsky and Papert [15]. In the following, Rosenblatt's model will be called the classical perceptron and the model analyzed by Minsky and Papert simply the perceptron.
The perceptron forms a network with a single node and a set of input connections, along with a dummy input that is always set to 1, and a single output lead. The input pattern, which could be a set of numbers, is applied to each of the connections to the node.
The perceptron learning algorithm updates the strength of each connection to the node in such a way that the output from the node falls within some threshold value for each class represented by the input patterns. The perceptron equation for the class label Ck is thus:
Ck = W0 + W1I1 + W2I2 + ... + WnIn
3.4 Introduction to proposed algorithms
If the output is correct then no adjustment of weights is done:
Wij(k+1) = Wij(k)
If the output is 1 but should have been 0, then the weights are decreased on the active input links:
Wij(k+1) = Wij(k) − η·xi
where η is the learning rate.
If the output is 0 but should have been 1, then the weights are increased on the active input links:
Wij(k+1) = Wij(k) + η·xi
Here Wij(k+1) is the new adjusted weight and Wij(k) is the old weight.
Step 1: Create a perceptron with (n+1) input neurons X0, X1, ..., Xn, where X0 = 1 is the bias input. Let O be the output neuron.
Step 2: Initialize W = (W0, W1, ..., Wn) to random weights.
Step 3: Iterate through the input patterns of the training set using the weight set, i.e. compute the weighted sum net j = Σi Wi Xi for each input pattern j.
Step 4: Compute the output Yj using the step function
Yj = f(net j) = 1 if net j > 0
= 0 otherwise
Step 5: Compare the computed output Yj with the target output Dj for each input pattern j. If all the input patterns have been classified correctly, output the weights and exit.
Step 6: Otherwise, update the weights as given below:
If the computed output Yj is 1 but should have been 0,
Wi = Wi − η·xi
If the computed output Yj is 0 but should have been 1,
Wi = Wi + η·xi
where η is the learning rate.
Step 7: Go to step 3.
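The steps above can be sketched directly in Python (a hypothetical implementation with our own names; η is written `eta`, and the logical AND training set is our illustration):

```python
import random

random.seed(1)

def train_perceptron(patterns, targets, eta=0.5, max_epochs=100):
    """Steps 1-7 above: x0 = 1 is the bias input; eta is the learning rate."""
    n = len(patterns[0])
    w = [random.uniform(-0.5, 0.5) for _ in range(n + 1)]  # Step 2: random weights
    for _ in range(max_epochs):                            # Step 7: repeat from Step 3
        errors = 0
        for x, t in zip(patterns, targets):
            xb = (1,) + tuple(x)                                 # prepend the bias input
            net = sum(wi * xi for wi, xi in zip(w, xb))          # Step 3: weighted sum
            y = 1 if net > 0 else 0                              # Step 4: step function
            if y != t:                                           # Steps 5-6: error correction
                errors += 1
                sign = -1 if y == 1 else 1   # y=1, t=0: decrease; y=0, t=1: increase
                w = [wi + sign * eta * xi for wi, xi in zip(w, xb)]
        if errors == 0:                      # Step 5: all patterns correct, exit
            return w
    return w

def classify(w, x):
    net = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
    return 1 if net > 0 else 0

# Learn logical AND, which is linearly separable:
w = train_perceptron([(0, 0), (0, 1), (1, 0), (1, 1)], [0, 0, 0, 1])
print([classify(w, x) for x in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # [0, 0, 0, 1]
```

Because AND is linearly separable, the perceptron convergence theorem guarantees the loop exits with all patterns classified correctly.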
Chapter 4 Analysis and Synthesis of shortest path routing
4.1 Fundamentals of Shortest path optimization
The most important algorithms for solving the shortest path problem are:
a) Dijkstra’s algorithm – Dijkstra’s algorithm conceived by Dutch computer scientist EdsgerDijkstra in 1956 and published in 1959,[1][2] is a graph search algorithmthat solves the single-source shortest path problem for a graph with non-negative edge path costs, producing a shortest path tree. This algorithm is often used in routing as a subroutine in other graph algorithms, or in GPS Technology. Dijkstra’s algorithm solves the single-source shortest path problems.
b) Bellman’ Ford Algorithm-Bellman’Ford algorithm solves the single-source problem if edge weights may be negative. The Bellman’Ford algorithm is an algorithm that computes shortest paths from a single source vertex to all of the other vertices in a weighted digraph. It is slower than Dijkstra’s algorithm for the same problem, but more versatile, as it is capable of handling graphs in which some of the edge weights are negative numbers. The algorithm is usually named after two of its developers, Richard Bellman and Lester Ford, Jr., who published it in 1958 and 1956, respectively; however, Edward F. Moore also published the same algorithm in 1957, and for this reason it is also sometimes called the Bellman’Ford’Moore algorithm. Negative edge weights are found in various applications of graphs, hence the usefulness of this algorithm. However, if a graph contains a “negative cycle”, i.e., a cycle whose edges sum to a negative value, then there is no cheapest path, because any path can be made cheaper by one more walk through the negative cycle. In such a case, the Bellman’Ford algorithm can detect negative cycles and report their existence, but it cannot produce a correct “shortest path” answer if a negative cycle is reachable from the source.
c) A* search algorithm – The A* search algorithm solves the single-pair shortest path problem, using heuristics to try to speed up the search. A* is a computer algorithm that is widely used in pathfinding and graph traversal, the process of plotting an efficiently traversable path between points, called nodes. Noted for its performance and accuracy, it enjoys widespread use (although in practical travel-routing systems it is generally outperformed by algorithms which can pre-process the graph to attain better performance). A* uses a best-first search and finds a least-cost path from a given initial node to one goal node (out of one or more possible goals). As A* traverses the graph, it follows a path of the lowest expected total cost or distance, keeping a sorted priority queue of alternate path segments along the way.
d) Floyd’Warshall algorithm – Floyd’Warshall algorithm solves all pair’s shortest paths.Floyd’Warshall algorithm (also known as Floyd’s algorithm, Roy’Warshall algorithm, Roy’Floyd algorithm, or the WFI algorithm) is a graph analysis algorithm for finding shortest paths in a weighted graph with positive or negative edge weights (but with no negative cycles, see below) and also for finding transitive closure of a relation R. A single execution of the algorithm will find the lengths (summed weights) of the shortest paths between all pairs of vertices, though it does not return details of the paths themselves. The algorithm is an example of dynamic programming. It was published in its currently recognized form by Robert Floyd in 1962. However, it is essentially the same as algorithms previously published by Bernard Roy in 1959 and also by Stephen Warshall in 1962 for finding the transitive closure of a graph. The modern formulation of Warshall’s algorithm as three nested for-loops was first described by Peter Ingerman, also in 1962.
e) Johnson’s algorithm- Johnson’s algorithm solves all pair’s shortest paths, and may be faster than Floyd’Warshall on sparse graphs. Johnson’s algorithm is a way to find the shortest paths between all pairs of vertices in a sparse directed graph. It allows some of the edge weights to be negative numbers, but no negative-weight cycles may exist. It works by using the Bellman’Ford algorithm to compute a transformation of the input graph that removes all negative weights, allowing Dijkstra’s algorithm to be used on the transformed graph. It is named after Donald B. Johnson, who first published the technique in 1977.
A similar reweighting technique is also used in Suurballe’s algorithm (1974) for finding two disjoint paths of minimum total length between the same two vertices in a graph with non-negative edge weights.
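As an illustration of the first and most widely used of these, Dijkstra's algorithm can be sketched with a priority queue; the graph and names below are our own made-up example:

```python
import heapq

def dijkstra(graph, source):
    """Single-source shortest paths. graph maps node -> list of
    (neighbor, weight) pairs; all weights must be non-negative."""
    dist = {source: 0}
    pq = [(0, source)]  # priority queue of (distance, node)
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue  # stale entry: a shorter path to u was already settled
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):  # relax the edge (u, v)
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist

# A small weighted digraph for illustration:
g = {
    "A": [("B", 1), ("C", 4)],
    "B": [("C", 2), ("D", 6)],
    "C": [("D", 3)],
    "D": [],
}
print(dijkstra(g, "A"))  # {'A': 0, 'B': 1, 'C': 3, 'D': 6}
```

The non-negative-weight requirement is what lets the algorithm settle each popped node permanently; with negative edges one would fall back to Bellman-Ford as described above.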
4.2 Different algorithms of shortest path optimization
The following technique is widely used in many forms, because it is simple and easy to understand. The idea is to build a graph of the subnet, with each node of the graph representing a router and each arc representing a communication line (link). To choose a route between a given pair of routers, the algorithm just finds the shortest path between them on the graph. The shortest path concept requires a definition of how path length is measured. Different metrics can be used, such as the number of hops, geographical distance, or the mean queuing and transmission delay of a router. In the most general case, the labels on the arcs could be computed as a function of the distance, bandwidth, average traffic, communication cost, mean queue length, measured delay, and other factors.
There are several algorithms for computing the shortest path between two nodes of a graph. One of them is due to Dijkstra.
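Dijkstra's algorithm can be sketched as below; the adjacency-list representation and the example graph are assumptions made for illustration:

```python
import heapq

def dijkstra(graph, source):
    """graph: {node: [(neighbor, weight), ...]} with non-negative weights.
    Returns the shortest distance from source to every reachable node."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry, a better path was already found
        for v, w in graph.get(u, ()):
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

# Assumed example: routers A-D with link weights (e.g. delays)
g = {"A": [("B", 2), ("C", 5)],
     "B": [("C", 1), ("D", 4)],
     "C": [("D", 1)],
     "D": []}
d = dijkstra(g, "A")
```

The weight on each arc can be any of the metrics mentioned above (hops, distance, delay), as long as it is non-negative.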
4.2.1 Flooding
This is another static algorithm, in which every incoming packet is sent out on every outgoing line except the one it arrived on. Flooding generates an infinite number of duplicate packets unless some measures are taken to damp the process.
One such measure is to have a hop counter in the header of each packet, which is decremented at each hop, with the packet being discarded when the counter reaches zero. Ideally, the hop counter is initialized to the length of the path from source to destination. If the sender does not know the path length, it can initialize the counter to the worst case, the full diameter of the subnet.
An alternative technique is to keep track of which packets have been flooded, to avoid sending them out a second time. To achieve this, the source router puts a sequence number in each packet it receives from its hosts. Each router then needs a list per source router telling which sequence numbers originating at that source have already been seen. Any incoming packet that is on the list is not flooded. To prevent the list from growing without bound, each list should be augmented by a counter, k, meaning that all sequence numbers through k have been seen.
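The two damping measures above can be combined in a toy sketch like the following (the function name and data layout are assumptions; for brevity the per-source state is a single highest-seen counter rather than a full list plus counter):

```python
# Per-source state at one router: highest sequence number flooded so far.
seen = {}  # source -> highest sequence number seen

def should_flood(source, seq, hops_left):
    """Decide whether this router should forward a flooded packet."""
    if hops_left <= 0:
        return False                 # hop counter exhausted: discard
    if seq <= seen.get(source, -1):
        return False                 # duplicate already seen: discard
    seen[source] = seq               # record it and flood on all other lines
    return True
```

A real router would keep this state per source router and decrement the hop counter before re-sending the packet on every line except the one it arrived on.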
A variation of flooding named selective flooding is slightly more practical. In this algorithm the routers do not send every incoming packet out on every line, but only on those going approximately in the right direction (there is usually little point in sending a westbound packet on an eastbound line unless the topology is extremely peculiar).
Flooding algorithms are rarely used as such; they appear mostly in distributed systems or in systems with tremendous robustness requirements.
4.2.2 Flow-Based Routing
The algorithms seen above took only topology into account and did not consider the load. The following algorithm considers both and is called flow-based routing.
In some networks, the mean data flow between each pair of nodes is relatively stable and predictable. Under conditions in which the average traffic from i to j is known in advance and, to a reasonable approximation, constant in time, it is possible to analyze the flows mathematically to optimize the routing.
The idea behind the analysis is that for a given line, if the capacity and average flow are known, it is possible to compute the mean packet delay on that line from queuing theory. From the mean delays on all the lines, it is straightforward to calculate a flow-weighted average to get the mean packet delay for the whole subnet. The routing problem then reduces to finding the routing algorithm that produces the minimum average delay for the subnet.
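As a hedged sketch of this calculation, assume the classic M/M/1 queueing model, where a line of capacity C packets/sec carrying a mean flow of lambda packets/sec has mean delay 1 / (C − lambda); the function name and the example numbers are assumptions:

```python
def mean_subnet_delay(lines):
    """lines: list of (capacity_pps, flow_pps) per line, M/M/1 assumption.
    Returns the flow-weighted average packet delay in seconds."""
    total_flow = sum(flow for _, flow in lines)
    # Per-line mean delay T_i = 1 / (C_i - lambda_i), weighted by its flow.
    weighted = sum(flow * (1.0 / (cap - flow)) for cap, flow in lines)
    return weighted / total_flow

# Assumed example: one line at 50 pps capacity / 30 pps flow,
# another at 25 pps capacity / 5 pps flow.
delay = mean_subnet_delay([(50, 30), (25, 5)])
```

Comparing this figure across candidate routings (which change the per-line flows) is how the minimum-average-delay routing is chosen.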
This technique demands certain information in advance: first the subnet topology, second the traffic matrix, third the capacity matrix, and finally a candidate routing algorithm (for further explanation see the same reference as above).
4.2.3 Distance Vector Routing
Modern computer networks generally use dynamic routing algorithms rather than the static ones described above. Two dynamic algorithms in particular, distance vector routing and link state routing, are the most popular. In this section we will look at the former; in the following one we will study the latter.
Distance vector routing algorithms operate by having each router maintain a table giving the best known distance to each destination and which line to use to get there. These tables are updated by exchanging information with the neighbors.
The distance vector routing algorithm is sometimes called by other names including Bellman-Ford or Ford-Fulkerson. It was the original ARPANET routing algorithm and was also used in the Internet under the name RIP and in early versions of DECnet and Novell’s IPX. AppleTalk & CISCO routers use improved distance vector protocols.
In this algorithm each router maintains a routing table indexed by, and containing one entry for, each router in the subnet. This entry contains two parts: the preferred outgoing line to use for that destination and an estimate of the time or distance to that destination. The metric used might be the number of hops, the time delay in milliseconds, the total number of packets queued along the path, or something similar.
The router is assumed to know the "distance" to each of its neighbors. For the hop metric the distance is one hop; for the queue-length metric the router examines each queue; for the delay metric the router can measure it directly with special ECHO packets that the receiver simply timestamps and sends back as fast as it can.
Distance vector routing works in theory, but has a serious drawback in practice: although it converges to the correct answer, it may do so slowly. Good news propagates through the subnet in linear time, while bad news suffers from the count-to-infinity problem: no router ever has a value more than one higher than the minimum of all its neighbors. Gradually, all the routers work their way up to infinity, but the number of exchanges required depends on the numerical value used for infinity. One solution to this problem is the split horizon algorithm, in which the distance to a router X is reported as infinity on the line that packets for X are sent on. With this behavior, bad news also propagates at linear speed through the subnet.
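One exchange step of the distance vector algorithm can be sketched as below; the function name and the example delays are assumptions chosen for illustration:

```python
def dv_update(my_delays, neighbor_vectors):
    """One distance-vector exchange at a router.
    my_delays: neighbor -> measured link delay to that neighbor.
    neighbor_vectors: neighbor -> its advertised {dest: distance} vector.
    Returns {dest: (estimated distance, preferred outgoing neighbor)}."""
    table = {}
    for nbr, vector in neighbor_vectors.items():
        for dest, d in vector.items():
            cost = my_delays[nbr] + d
            # Keep the cheapest route heard from any neighbor.
            if dest not in table or cost < table[dest][0]:
                table[dest] = (cost, nbr)
    return table

# Assumed example: a router with neighbors A (delay 8) and I (delay 10)
delays = {"A": 8, "I": 10}
vectors = {"A": {"B": 12, "C": 25},
           "I": {"B": 31, "C": 6}}
table = dv_update(delays, vectors)
```

Each entry ends up holding exactly the two parts described above: the estimated distance and the preferred outgoing line (here identified by the neighbor).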
4.2.4 Link State Routing
The idea behind link state routing is simple and can be stated as five parts. Each router must:
1) Discover its neighbors and learn their network addresses.
2) Measure the delay or cost to each of its neighbors.
3) Construct a packet telling all it has just learned.
4) Send the packet to all other routers.
5) Compute the shortest path to every other router.
In effect, the complete topology and all delays are experimentally measured and distributed to every router. Then Dijkstra's algorithm can be used to find the shortest path to every other router.
4.2.5 Hierarchical Routing
As the network grows larger, the amount of resources necessary to maintain the routing tables becomes enormous and makes routing impossible. Here appears the idea of hierarchical routing, which suggests that routers should be divided into regions, with each router knowing all the details about how to route packets within its own region, but knowing nothing about the internal structure of other regions.
Unfortunately, the gains in routing table size and CPU time are not free; the penalty of increased path length has to be paid.
It has been discovered that the optimal number of nested levels for an N-router subnet is ln N, requiring a total of e ln N entries per router.
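A quick numeric illustration of that formula (the subnet size of 720 routers is an assumed example):

```python
import math

def table_sizes(n_routers):
    """Per-router routing table size: flat routing (one entry per router)
    versus hierarchical routing with the optimal ln N levels (~ e * ln N)."""
    flat = n_routers
    hierarchical = math.e * math.log(n_routers)  # math.log is natural log
    return flat, hierarchical

flat, hier = table_sizes(720)
# Flat routing needs 720 entries; hierarchical needs roughly e * ln 720,
# i.e. somewhere around 18 entries per router.
```

The trade-off the text describes is visible here: a table of about 18 entries instead of 720, paid for with occasionally longer paths.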
4.2.6 Routing for Mobile Hosts
Over recent years more and more people have purchased portable computers under the natural assumption that they can be used all over the world. These mobile hosts introduce a new complication: to route a packet to a mobile host, the network first has to find it. Generally this requirement is implemented through two new entities in the LAN: the foreign agent and the home agent.
Each time a mobile host connects to a network, it either picks up a foreign agent packet or generates a request for a foreign agent; as a result they establish a connection between them, and the mobile host supplies the foreign agent with its home address and some security information.
After that the foreign agent contacts the mobile host’s home agent and delivers the information about the mobile host.
Subsequently, the home agent examines the received information and, if it accepts the security information of the mobile host, it allows the foreign agent to proceed. As a result the foreign agent enters the mobile host into its routing table.
When a packet for the mobile host arrives at its home agent, the home agent encapsulates it and redirects it to the foreign agent where the mobile host is currently registered. It then returns the encapsulation data to the router that sent the packet, so that all subsequent packets will be sent directly to the corresponding router (the foreign agent).
4.2.7 Broadcast Routing
For some applications, hosts need to send messages to many or all other hosts. Broadcast routing is used for that purpose. Several different methods were proposed for doing that.
1) The source should send the packet to all the necessary destinations. One of the problems of this method is that the source has to have the complete list of destinations.
2) Flooding routing. As discussed before, the problem with this method is the generation of duplicate packets.
3) Multidestination routing. In this method each packet includes a list or a bitmap indicating the desired destinations. When a packet arrives, the router checks all the destinations to determine the set of output lines that will be needed, generates a new copy of the packet for each output line to be used, and includes in each packet only those destinations that are to use that line. In effect, the destination set is partitioned among the lines. After a sufficient number of hops, each packet will carry only one destination and can be treated as a normal packet.
4) This routing method makes use of a spanning tree of the subnet. If each router knows which of its lines belong to the spanning tree, it can copy an incoming broadcast packet onto all the spanning tree lines except the one it arrived on. Problem: each router has to know the spanning tree.
The reverse path forwarding algorithm checks, on arrival of a packet, whether the line that the packet arrived on is the same one through which packets are sent to the source; if so, it forwards the packet on all other lines, otherwise it discards the packet.
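The partitioning step of multidestination routing (method 3 above) can be sketched as follows; the route table and line names are assumptions for the example:

```python
def partition_destinations(destinations, route):
    """Split a multidestination packet's destination list per output line.
    route: dest -> preferred output line (from the normal routing table).
    Returns {output line: [destinations to list in that copy]}."""
    copies = {}
    for dest in destinations:
        copies.setdefault(route[dest], []).append(dest)
    return copies

# Assumed example: B and C are reached via line1, D via line2.
route = {"B": "line1", "C": "line1", "D": "line2"}
copies = partition_destinations(["B", "C", "D"], route)
```

One copy of the packet is then sent on each output line, carrying only its own share of the destination set, exactly as the text describes.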
Chapter 5 Rosenblatt neural network perception
5.1 What are Neural Network and its Historical Background?
An Artificial Neural Network (ANN) is an information processing paradigm that is inspired by the way biological nervous systems, such as the brain, process information. The key element of this paradigm is the novel structure of the information processing system. It is composed of a large number of highly interconnected processing elements (neurons) working in unison to solve specific problems. Artificial Neural Networks, like people, learn by example. An Artificial Neural Network is configured for a specific application, such as pattern recognition or data classification, through a learning process. Learning in biological systems involves adjustments to the synaptic connections that exist between the neurons. This is true of Artificial Neural Networks as well. Neural network simulations appear to be a recent development. However, this field was established before the advent of computers, and has survived at least one major setback and several eras. Many important advances have been boosted by the use of inexpensive computer emulations. Following an initial period of enthusiasm, the field survived a period of frustration and disrepute. During that period, when funding and professional support were minimal, important advances were made by relatively few researchers. These pioneers were able to develop convincing technology which surpassed the limitations identified by Minsky and Papert. Minsky and Papert published a book (in 1969) in which they summed up a general feeling of frustration (against neural networks) among researchers, and it was thus accepted by most without further analysis. Currently, the neural network field enjoys a resurgence of interest and a corresponding increase in funding.
The first artificial neuron was produced in 1943 by the neurophysiologist Warren McCulloch and the logician Walter Pitts. But the technology available at that time did not allow them to do very much.
Neural networks, with their remarkable ability to derive meaning from complex or imprecise data, can be used to extract patterns and detect trends that are too complex to be noticed by either humans or other computer techniques. A trained neural network can be thought of as an "expert" in the category of information it has been given to analyze. This expert can then be used to provide projections given new situations of interest and to answer "what if" questions.
Other advantages include:
1. Adaptive learning: An ability to learn how to do tasks based on the data given for training or initial experience.
2. Self-Organization: An ANN can create its own organization or representation of the information it receives during learning time.
3. Real Time Operation: ANN computations may be carried out in parallel, and special hardware devices are being designed and manufactured which take advantage of this capability.
4. Fault Tolerance via Redundant Information Coding: Partial destruction of a network leads to the corresponding degradation of performance. However, some network capabilities may be retained even with major network damage.
Neural networks take a different approach to problem solving than that of conventional computers. Conventional computers use an algorithmic approach i.e. the computer follows a set of instructions in order to solve a problem. Unless the specific steps that the computer needs to follow are known the computer cannot solve the problem. That restricts the problem solving capability of conventional computers to problems that we already understand and know how to solve. But computers would be so much more useful if they could do things that we don’t exactly know how to do.
Neural networks process information in a similar way to the human brain. The network is composed of a large number of highly interconnected processing elements (neurones) working in parallel to solve a specific problem. Neural networks learn by example. They cannot be programmed to perform a specific task. The examples must be selected carefully, otherwise useful time is wasted or, even worse, the network might function incorrectly. The disadvantage is that because the network finds out how to solve the problem by itself, its operation can be unpredictable.
On the other hand, conventional computers use a cognitive approach to problem solving; the way the problem is to be solved must be known and stated in small unambiguous instructions. These instructions are then converted into a high-level language program and then into machine code that the computer can understand. These machines are totally predictable; if anything goes wrong, it is due to a software or hardware fault.
Neural networks and conventional algorithmic computers are not in competition but complement each other. There are tasks that are more suited to an algorithmic approach, like arithmetic operations, and tasks that are more suited to neural networks. Moreover, a large number of tasks require systems that use a combination of the two approaches (normally a conventional computer is used to supervise the neural network) in order to perform at maximum efficiency.
5.2 From Human Neurons to Artificial Neurons
Much is still unknown about how the brain trains itself to process information, so theories abound. In the human brain, a typical neuron collects signals from others through a host of fine structures called dendrites. The neuron sends out spikes of electrical activity through a long, thin strand known as an axon, which splits into thousands of branches. At the end of each branch, a structure called a synapse converts the activity from the axon into electrical effects that inhibit or excite activity in the connected neurones. When a neuron receives excitatory input that is sufficiently large compared with its inhibitory input, it sends a spike of electrical activity down its axon. Learning occurs by changing the effectiveness of the synapses so that the influence of one neuron on another changes.
[Figure: components of a neuron and the synapse]
We construct these neural networks by first trying to deduce the essential features of neurones and their interconnections. We then typically program a computer to simulate these features. However, because our knowledge of neurones is incomplete and our computing power is limited, our models are necessarily gross idealizations of real networks of neurones.
Some interesting numbers:

                        Brain                      PC
Propagation speed       ~100 m/s                   ~3 × 10^8 m/s
Number of units         10^10–10^11 neurons        ~10^9

The brain's degree of parallelism is about 10^14, like 10^14 processors running at a 100 Hz frequency, with roughly 10^4 connections active at the same time.
5.3 Rosenblatt perception
In mathematical terms, the neuron fires if and only if the weighted sum of its inputs exceeds the threshold: X1W1 + X2W2 + … + XnWn > T. The combination of input weights and threshold makes this neuron very flexible and powerful. The MCP neuron has the ability to adapt to a particular situation by changing its weights and/or threshold. Various algorithms exist that cause the neuron to "adapt"; the most used ones are the Delta rule and back error propagation. The former is used in feed-forward networks and the latter in feedback networks.
The most influential work on neural nets in the 60's went under the heading of "perceptrons", a term coined by Frank Rosenblatt. The perceptron turns out to be an MCP model (neuron with weighted inputs) with some additional, fixed, pre-processing. Units labeled A1, A2, …, Aj, …, Ap are called association units and their task is to extract specific, localized features from the input images. Perceptrons mimic the basic idea behind the mammalian visual system. They were mainly used in pattern recognition, even though their capabilities extended a lot further.
In 1969 Minsky and Papert wrote a book in which they described the limitations of single layer perceptrons. The impact of the book was tremendous and caused a lot of neural network researchers to lose their interest. The book was very well written and showed mathematically that single layer perceptrons could not do some basic pattern recognition operations like determining the parity of a shape or determining whether a shape is connected or not. What they did not realize, until the 80's, is that given the appropriate training, multilevel perceptrons can do these operations.
5.4 Working of Rosenblatt perception
5.4.1 Elementary Perceptron
1. Has the ability to learn to recognize simple patterns.
2. The elementary perceptron is a two-layer, heteroassociative, nearest-neighbour pattern matcher.
3. Can accept both continuous valued and binary input.
4. The perceptron learns offline, operates in discrete time and stores the pattern pairs
a. (Ak, Bk), k = 1,2,…, m using the perceptron error-correction (or convergence) procedure,
b. where the kth pattern pair is represented by the analog valued vector Ak = (a1k,…,ank) and
c. the bipolar [-1, +1] valued vector Bk = (b1k,…, bpk).
5. A perceptron decides whether an input belongs to one of two classes.
6. The net divides the space spanned by the input into two regions separated by a hyperplane or a line in 2-D (a decision plane).
5.4.2 Perceptron Convergence Procedure
Step 1. Initialize weights and threshold.
Set the connection weights wi and the threshold value θ to small random values.
Step 2. Present new input and desired output.
Present new continuous valued input x0, x1, …, xn−1 along with the desired output d(t).
Step 3. Calculate actual output.
y(t) = fh( Σi=0..n−1 wi(t) xi(t) − θ )
Step 4. Adapt weights.
When an error occurs, the connection weights are adapted by the formula:
wi(t + 1) = wi(t) + η [d(t) − y(t)] xi(t)
where η is a positive gain fraction that ranges from 0.0 to 1.0 and controls the adaptation rate.
Step 5. Repeat by going to Step 2.
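The convergence procedure above can be sketched in a few lines; this is a minimal illustration, where learning the AND function, the learning rate, and the function names are all assumptions (the threshold is adapted alongside the weights, which is equivalent to treating it as a bias weight with constant input):

```python
def step(x):
    """Hard-limiting activation: fire (1) iff the net input is positive."""
    return 1 if x > 0 else 0

def train_perceptron(samples, eta=0.5, epochs=20):
    """samples: list of (input tuple, desired output in {0, 1}).
    Runs the error-correction procedure; returns (weights, threshold)."""
    n = len(samples[0][0])
    w = [0.0] * n        # Step 1: initialize weights...
    theta = 0.0          # ...and threshold (zeros here, for reproducibility)
    for _ in range(epochs):            # Step 5: repeat
        for x, d in samples:           # Step 2: present input and target
            # Step 3: actual output y = step(sum(w_i x_i) - theta)
            y = step(sum(wi * xi for wi, xi in zip(w, x)) - theta)
            err = d - y
            # Step 4: adapt weights (and threshold) only when an error occurs
            w = [wi + eta * err * xi for wi, xi in zip(w, x)]
            theta -= eta * err
    return w, theta

# Assumed example: the logical AND function, which is linearly separable.
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, theta = train_perceptron(data)
predict = lambda x: step(sum(wi * xi for wi, xi in zip(w, x)) - theta)
```

Since AND is linearly separable, the procedure converges to a separating line, illustrating point 6 above: the learned weights and threshold define the hyperplane that divides the input space into two regions.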
