July 8 2015 / Day 13

Today, we had the privilege of listening to Tracy Ballinger’s presentation. She focuses on using graph theory to reconstruct the evolutionary history of cancer, and she specializes in genome sequencing and bioinformatics. Her work is especially pertinent today because 1 in 4 people in the U.S. will die from cancer. Cancer is a genetic disease: each cancer carries its own perturbations and mutations, which makes it very hard to cure in individual patients. Cancers accumulate mutations over time, and many of these are incidental “passenger” mutations; her work responds to the need to distinguish passenger mutations from the mutations that actually initiate the cancer. Using Copy Number Ancestral Variation Graphs (CN-AVG), Ms. Ballinger builds potential evolutionary histories from copy number changes. An evolutionary history is made up of a sequence of rearrangement events, and the timing of those events can be inferred through prevalence and parsimony. Ms. Ballinger also uses Markov Chain Monte Carlo (MCMC) sampling to generate history predictions; as the MCMC sampling iterates, independently sampled histories converge toward a steady state of agreement. Her work uses AI in an innovative way.

Mr. Alston also spoke about his own research in real-time Experience-Driven Procedural Content Generation (ED-PCG). His work is based on the assumption that different players play, experience, and enjoy games differently; the problem is that today’s games do not provide a different experience for individual players. ED-PCG incorporates player modeling, machine learning, and procedural content-generation methods, and will hopefully occur online and in real time in the near future. Player experience modeling can be subjective, objective, or gameplay-based. Subjective player experience modeling captures how players feel and is self-reported. Objective player experience modeling is formed from multiple modalities of player input. Finally, gameplay-based player experience modeling is derived from data that comes out of the game itself. The approach he described is online, generates necessary content, uses a parameter-vector representation, and is stochastic and generate-and-test. Its components include player and game data, content quality, content representation, and content generation. Mr. Alston demonstrated his game using Pro-D Procedural Dungeon Generation, which uses preference learning. The gameplay is assessed based on accuracy, feature subset, individual feature contribution, and training time. Hopefully video games can evolve into adaptive systems based on player models.

These two uses of AI are both impressive and inspiring. AI is being applied in concrete ways to benefit society and make boundary-pushing technological advances.

July 7 2015 / Day 12

Artificial intelligence is crucial to robotics. A robot’s configuration (the arrangement of its manipulators and joints) is specified by n numbers; the robot is said to have n degrees of freedom. Six is the minimum number required to position an end effector arbitrarily. Non-holonomic robots are robots that have more degrees of freedom than controls; these robots cannot generally transition between two infinitesimally close configurations. Robots use various sensors as percepts to determine facts about their environment. For range finding, they use RADAR, SONAR, LIDAR, GPS, and tactile sensors. Imaging sensors include cameras. Proprioceptive sensors include shaft decoders, inertial sensors, force sensors, and torque sensors.

Localization is the act of computing the robot’s current location and orientation (pose) given observations about the environment. Robots can assume Gaussian noise in motion prediction and sensor range measurements. Robots can also use particle filtering to produce an approximate position estimate (see the sketch below), or an extended Kalman filter, which assumes that landmarks are identifiable; otherwise, the posterior is multimodal. Localization can be described as: given the map and observed landmarks, update the pose distribution. Mapping is: given the pose and observed landmarks, update the map distribution. SLAM, or Simultaneous Localization And Mapping, is: given observed landmarks, update both the pose and the map distributions. Motion planning is when a robot plans in the configuration space defined by the robot’s degrees of freedom; the solution is a point trajectory in free configuration space.
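To make the particle-filtering idea concrete, here is a minimal Python sketch of one localization update. The 1-D corridor setup, noise levels, and particle count are my own illustrative assumptions, not something from the lecture.

```python
import random
import math

# Minimal particle-filter (Monte Carlo) localization sketch for a robot moving
# along a 1-D corridor, with a noisy range sensor measuring distance from x = 0.
N_PARTICLES = 500
MOTION_NOISE = 0.1     # std. dev. of Gaussian motion noise (assumed)
SENSOR_NOISE = 0.2     # std. dev. of Gaussian range-measurement noise (assumed)

particles = [random.uniform(0.0, 10.0) for _ in range(N_PARTICLES)]

def gaussian(mu, sigma, x):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def particle_filter_step(particles, control, measurement):
    # 1. Motion update: move each particle by the control, adding Gaussian noise.
    moved = [p + control + random.gauss(0.0, MOTION_NOISE) for p in particles]
    # 2. Measurement update: weight each particle by how well it explains the sensor reading.
    weights = [gaussian(p, SENSOR_NOISE, measurement) for p in moved]
    total = sum(weights) or 1e-12
    weights = [w / total for w in weights]
    # 3. Resample particles in proportion to their weights.
    return random.choices(moved, weights=weights, k=len(moved))

particles = particle_filter_step(particles, control=1.0, measurement=3.2)
estimate = sum(particles) / len(particles)   # approximate pose estimate
print(round(estimate, 2))
```

Each update moves the particles, reweights them by the measurement likelihood, and resamples, so the particle cloud concentrates around the likely pose.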

Planning in configuration space stems from the problem that there are infinitely many states. As a result, the robot must convert the continuous configuration space into a finite state space. There are two ways to do so: cell decomposition and skeletonization. Cell decomposition is when the robot divides free space into simple cells, each of which can be traversed easily. Skeletonization is when the robot reduces free space to a one-dimensional skeleton, such as the locus of points equidistant from obstacles (a Voronoi diagram).

We then discussed Bayesian networks and Bayes’ rule, which is P(a|b) = P(b|a)P(a) / P(b). Bayes’ rule describes the probability of an event based on conditions that might be related to the event. Bayesian networks are simple, graphical notations for conditional independence assertions and hence for compact specification of full joint distributions. The syntax is a set of nodes in a directed acyclic graph, with a conditional distribution for each node given its parents. Bayesian learning views learning as Bayesian updating of a probability distribution over the hypothesis space; predictions use a likelihood-weighted average over the hypotheses. Since the world changes, we need to track and predict it given change in time and uncertainty. Markov models can be viewed as Bayesian networks in which each state variable depends only on its parent variables from the previous time step.
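As a quick worked example of Bayes’ rule, here is a small calculation in Python; the disease-testing numbers are illustrative assumptions.

```python
# A small worked example of Bayes' rule, P(a|b) = P(b|a) * P(a) / P(b),
# using assumed numbers for a disease-testing scenario.
p_disease = 0.01              # prior: P(disease)
p_pos_given_disease = 0.95    # likelihood: P(positive | disease)
p_pos_given_healthy = 0.05    # false-positive rate: P(positive | no disease)

# Total probability of a positive test, P(b), by summing over both cases.
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)

# Posterior: probability of disease given a positive test.
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_disease_given_pos, 3))   # ~0.161
```

Even with a fairly accurate test, the low prior keeps the posterior probability modest, which is exactly the kind of reasoning Bayes’ rule captures.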

July 6 2015 / Day 11

There are many different types of logic, such as propositional logic, first-order logic, temporal logic, probability theory, and fuzzy logic. Logics are formal languages for representing information such that conclusions can be drawn. A logic’s syntax defines the sentences in the language, and its semantics define the meaning of the sentences. A knowledge base is a set of sentences in a formal language; telling an agent what it needs to know is the declarative approach to building a logical agent. A knowledge-based agent must be able to represent states and actions, incorporate new percepts, update its internal representation of the world, deduce hidden properties of the world, and deduce appropriate actions.

The simplest logic is propositional logic. It illustrates basic ideas such as negation, conjunction, disjunction, implication, and the biconditional. There are many advantages to propositional logic: it is declarative, it allows partial/disjunctive/negated information, it is compositional, and its meaning is context-independent. Its disadvantage is that it has limited expressive power.

First-order logic, like natural language, assumes the world contains objects, relations, and functions.

A sentence is valid if it is true in all models, satisfiable if it is true in some model, and unsatisfiable if it is true in no models. A clause is a disjunction of literals; related forms include conjunctive normal form (CNF) sentences, literals, Horn clauses, definite clauses, and goal clauses.

Modus ponens, Latin for “the way that affirms by affirming,” is a simple argument form and rule of inference: given p ⇒ q and p, we may infer q.

Forward chaining and backward chaining are strategies for deriving answers from the knowledge base efficiently. Forward chaining finds any rule whose premises are satisfied in the knowledge base and adds its conclusion to the knowledge base, repeating until the query is found. Backward chaining works backwards from the query, avoiding loops and repeated work. Resolution is a rule of inference leading to a refutation theorem-proving technique for sentences in propositional logic and first-order logic.
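Here is a minimal Python sketch of forward chaining over propositional definite clauses; the particular facts and rules are my own illustrative assumptions, not from the lecture.

```python
# Forward chaining for propositional definite clauses (premises => conclusion).
facts = {"A", "B"}
rules = [
    ({"A", "B"}, "C"),   # A and B => C
    ({"C"}, "D"),        # C => D
    ({"D", "E"}, "F"),   # D and E => F
]

def forward_chain(facts, rules, query):
    """Repeatedly fire rules whose premises are all known until the query is derived."""
    facts = set(facts)
    changed = True
    while changed:
        if query in facts:
            return True
        changed = False
        for premises, conclusion in rules:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return query in facts

print(forward_chain(facts, rules, "D"))  # True: A, B derive C, which derives D
print(forward_chain(facts, rules, "F"))  # False: E is never derived
```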

Quantification can be used to express relationships. Universal quantification is the conjunction of instantiations of P, written ∀<variables> <sentence>; ⇒ is typically the main connective. Existential quantification is the disjunction of instantiations of P, written ∃<variables> <sentence>; ∧ is typically the main connective.

We briefly looked at a tutorial for Prolog, an interpreted language in which the programmer writes statements into the knowledge base one by one. The three basic constructs are facts, rules, and queries. Prolog is declarative (not procedural), recursive (it uses recursion rather than loops), and relational (it works with relations rather than functions).

July 3 2015 / Day 10

Today, we continued our discussion about machine learning algorithms by talking about decision tree learning. Decision trees are classifiers that require supervised learning. A decision tree relies on a data input (X), and its aim is to create a predictor/classifier for it. Inputs are represented by feature vectors; for decision trees, the features are typically discrete. The output (Y) can be continuous (regression) or discrete (classification). Decision trees have two kinds of nodes: leaf nodes (class labels determined by the majority vote of the training examples reaching that leaf) and internal nodes (a question on a feature, with branches according to the answers).
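As a small illustration, here is a sketch of training a decision tree with scikit-learn; the tiny weather-style dataset is my own assumption, not something we used in class.

```python
# Supervised decision-tree learning on a tiny, assumed dataset.
from sklearn.tree import DecisionTreeClassifier

# Each row of X is a discrete feature vector; y holds the class labels.
# Features: [outlook (0=sunny, 1=rain), windy (0=no, 1=yes)]
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = ["play", "play", "play", "stay home"]

clf = DecisionTreeClassifier(criterion="entropy")  # split on information gain
clf.fit(X, y)                                      # supervised training step

print(clf.predict([[1, 1]]))   # -> ['stay home']
```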

Supervised learning requires a training set that contains n pairs of examples. There is also a predictor (f: X → Y). The hypothesis space is the space of predictors we search over, for example the set of dth-order polynomials. The aim is to find the best function in the hypothesis space, one that generalizes well.

Classifiers can be evaluated during training and during testing, where the classifier generates predicted labels for new data and we measure test-set accuracy. The aim is to have pure leaf nodes.

Entropy is disorder in a system; it measures the randomness, or impurity, of a node. For a random variable Y (the “outcome” of a draw) taking values with probabilities (p1, p2, …, pk), the entropy is H(Y) = −Σ pi log2 pi. A related quantity is conditional entropy: the same as entropy, only given that the value of another random variable X is known. Information gain in decision trees tells us how important a given attribute of the feature vectors is; it is used to decide the ordering of attributes in the nodes of a decision tree.
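To make this concrete, here is a minimal sketch of computing entropy and information gain in Python; the label counts are illustrative assumptions.

```python
import math

def entropy(labels):
    """H(Y) = -sum p_i * log2(p_i) over the label distribution."""
    total = len(labels)
    probs = [labels.count(v) / total for v in set(labels)]
    return -sum(p * math.log2(p) for p in probs if p > 0)

def information_gain(labels, groups):
    """Entropy of the parent node minus the weighted entropy of the child nodes."""
    total = len(labels)
    remainder = sum(len(g) / total * entropy(g) for g in groups)
    return entropy(labels) - remainder

parent = ["+", "+", "+", "-", "-", "-"]
# Splitting on some attribute partitions the examples into two child nodes.
children = [["+", "+", "+"], ["-", "-", "-"]]

print(entropy(parent))                     # 1.0 bit: maximally impure
print(information_gain(parent, children))  # 1.0: a perfectly pure split
```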

Overfitting in regression occurs when the curve is fitted too perfectly to the training data, even accommodating its outliers, so prediction outside the training data worsens.

Neural networks play an extremely important role in artificial intelligence. The McCulloch-Pitts “unit”, although a gross oversimplification of real neurons, led to a greater understanding of what networks of simple units can do. Its output is a squashed linear function of the inputs.

The activation function of a node defines the output of that node given an input or set of inputs. A perceptron can implement a Boolean function as long as that function is linearly separable. Perceptrons learn by iteratively updating weights to reduce error on the training set, and the perceptron learning rule converges to a consistent function for any linearly separable data set. In addition, there are multi-layer perceptrons, which are good at complex pattern recognition tasks. Back propagation is used to train multi-layer perceptrons by gradient descent; techniques such as simulated annealing can help with problems like slow convergence and local minima.
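As a small illustration of a “squashed linear function” and of what a multi-layer network can do that a single perceptron cannot, here is a sketch of a tiny hand-weighted network computing XOR. The weights are my own assumptions, fixed by hand rather than learned by backpropagation.

```python
import math

def sigmoid(x):
    """The 'squashing' function applied to the linear combination of inputs."""
    return 1.0 / (1.0 + math.exp(-x))

def unit(inputs, weights, bias):
    """A single unit: a squashed linear function of its inputs."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)

def forward(x):
    # Hidden layer: two units, each a squashed linear function of the inputs.
    h1 = unit(x, [20.0, 20.0], -10.0)   # roughly computes OR
    h2 = unit(x, [-20.0, -20.0], 30.0)  # roughly computes NAND
    # Output layer combines the hidden units, giving XOR-like behavior,
    # which a single perceptron cannot represent (XOR is not linearly separable).
    return unit([h1, h2], [20.0, 20.0], -30.0)

for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, round(forward(x)))   # 0, 1, 1, 0
```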

There are two types of network structures: feed-forward networks and recurrent networks. Feed-forward networks are parameterized families of nonlinear functions; they can be single-layer or multi-layer perceptrons, they implement functions, and they have no internal state. Recurrent networks have directed cycles with delays; they have internal state (like flip-flops) and can oscillate.

Project Write-Up

Title: Generative Art (making art that is mainly driven by a computer program; using programming to generate artwork that is algorithmically defined and created.)

Team members: Alex Tsai and Iris Nayki

General Category: Computational Creativity

Specifics: Each time the program is run, a new piece of artwork is displayed.

Tools/Languages: Processing

Deliverables: Artwork generated from images already loaded into the program.

Schedule:

6/30/2015 Research/Brainstorm/Create a Schedule for our project/Download necessary materials
7/1/2015 Begin Implementation/Code/Work on Presentation
7/2/2015 Computer History Museum/Code/Work on Presentation
7/3/2015 Code/Work on Presentation
7/4/2015 Sat
7/5/2015 Sun
7/6/2015 Code/Work on Presentation
7/7/2015 Finalize Code and Presentation
7/8/2015 Finish Presentation/Practice
7/9/2015 Presentation

Computational Creativity Tutorials
http://videolectures.net/iccc2014_wiggins_creativity_tutorial/
http://videolectures.net/iccc2014_veale_creativity_tutorial/

Realtime programmer-computer collaborative project (16:00 in the video)
http://videolectures.net/iccc2014_davis_artistic_computer/

Inspiration: Artist Joshua Davis
Website: http://www.joshuadavis.com/
One of his talks: https://www.youtube.com/watch?v=LJS4fBjdPM4
Introduction to generative art: http://www.skillshare.com/classes/design/Programming-Graphics-I-Introduction-to-Generative-Art/782118657

Examples of generative art: 
http://zenbullets.com/blog/?p=825

GenArt-Sample-Intro.pdf

Four generative works created by code that produces a different still image every time it runs

Computer History Museum / Day 9

“Computer programming is an art, because it applies accumulated knowledge to the world, because it requires skill and ingenuity, and especially because it produces objects of beauty.” — Donald Knuth 1974

The urge to create is in all of us. Computational creativity plays a very prominent role in computer history. It is the study of building software that exhibits behavior that would be deemed creative in humans. Such creative software can be used for autonomous creative tasks, such as inventing mathematical theories, writing poems, painting pictures, and composing music. Computational creativity studies also enable us to understand human creativity and to produce programs for creative people to use, where the software acts as a collaborator alongside its human users.

Pixar Image Computer, 1986

Researchers have been experimenting with computer graphics ever since the 1960s at Bell Labs (https://www.youtube.com/watch?v=M4nql28E_AE). In 1986, Pixar and Disney jointly developed the Pixar Image Computer, which housed software allowing the computer to color hand-drawn images for traditionally animated Disney cartoons. Another milestone in animation and computational creativity was the development of movies generated entirely by computer: the first full-length computer-animated film, Toy Story, was produced by Pixar and Disney in 1995, and it was a smash hit.

AARON and Cohen

The creative aspect of computer programming was also deeply explored. Artist Harold Cohen wrote AARON, a software program that creates original artistic images, and it has been in continual development since 1973. Initial versions of AARON created abstract drawings that grew more complex through the 1970s. More representational imagery was added in the 1980s: first rocks, then plants, then people. In the 1990s, more representational figures set in interior scenes were added, along with colour. AARON returned to more abstract imagery, this time in colour, in the early 2000s. AARON is the first robot in history to paint original art: it mixes its own paints, creates striking artwork, and even washes its own brushes. AARON was first built in C, but most of its development has been in LISP. According to Cohen, “AARON has to know what it’s doing, and has to spend most of its time building an internal representation of the developing drawing so that it can decide what to do next. I don’t tell it what to do. I tell it what it knows, and IT decides what to do.”

AARON and its artwork on display at the museum

I left the museum feeling awestruck at the progress mankind has made in the development of computer technology within the past 2000 years. Every day, progress is being made to optimize the future of artificial intelligence and computer technology. For instance, Jeff Hawkins, founder of Palm, is conducting studies on the brain, specifically the neocortex, which uses one framework for vision, audition, language, motor control, planning, robotics, and more. Hawkins believes the future of AI lies in memory systems that store and play back experiences to help us predict, intelligently, what will happen next. I can’t wait to see how further research can bring new perspectives to AI and technology.

Article Review / Week 2

http://vfacstaff.ltu.edu/lshamir/publications/wm_pollock.pdf

Lior Shamir, a computer scientist at Lawrence Technological University, has developed an algorithm for authenticating Jackson Pollock “drip” paintings by analyzing their fractal dimensions. The computer is correct 93% of the time. Counterfeit Pollocks are difficult even for experts to distinguish from the genuine article.

Pollock’s paintings obey fractal geometry. Moving around a large canvas laid on the ground, the artist let paint fly from all angles, using his whole body. Human motion is known to display fractal properties when people restore their balance, says physicist Richard Taylor, and films of Pollock seem to show him painting in a state of ‘controlled off-balance’. Second, the dripping and pouring itself could be a chaotic process. The machine extracted 4,024 image content descriptors from each Pollock painting, including fractal features and numerical image content descriptors such as Zernike polynomials, Haralick textures, and Chebyshev statistics. The machine compares images using Weighted Nearest Neighbor classification, with the Fisher discriminant scores of the content descriptors used as weights.

Shamir obtained 26 paintings known to be by Pollock and a second set of counterfeits. The pieces were normalized to contain 640,000 pixels and then divided into 16 equal-sized areas. Twenty works were used to train the software; the remaining six were used for testing. The analysis was then repeated multiple times to provide greater statistical power. Given these paintings as input, the computer was accurate over 90 percent of the time when asked to determine whether a painting was real or fake. But previous computer analyses had suggested Pollock’s style evolved over time, so Shamir went back and found 26 paintings that were done in the first half of the 1950s. With these as the training set, the accuracy reached 100 percent. This shows the true power of machine learning. It is amazing how a computer, once a painting has been digitized, can quantify details at the pixel level and “see” patterns that we do not consciously detect.

July 1 2015 / Day 8

We continued our discussion about optimization. One way this can be achieved is by parameter tuning: begin with an objective function, then apply an optimization algorithm. A loss function measures how unhappy we are if we use the weight vector w to make a prediction on x when the correct output was y. Examples of loss functions for binary classification include the zero-one loss, perceptron loss, hinge loss, and logistic loss. Examples of loss functions for regression include the squared loss and the absolute deviation loss.
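Here is a minimal sketch of these losses in Python, written in terms of the margin (the score times the correct label) for classification and the residual (prediction minus target) for regression; the sample values at the end are just illustrative.

```python
import math

def zero_one_loss(margin):
    return 1.0 if margin <= 0 else 0.0

def perceptron_loss(margin):
    return max(0.0, -margin)

def hinge_loss(margin):
    return max(0.0, 1.0 - margin)

def logistic_loss(margin):
    return math.log(1.0 + math.exp(-margin))

def squared_loss(residual):
    return residual ** 2

def absolute_deviation_loss(residual):
    return abs(residual)

print(hinge_loss(0.3), squared_loss(-2.0))   # 0.7 4.0
```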

A loss minimization framework helps to minimize the training loss (also known as the training error or empirical risk), which is the average loss over all the training examples. Two regression losses lead to notably different minimizers: minimizing the squared loss yields the mean, which tries to accommodate every example in the training data set, while minimizing the absolute deviation loss yields the median, which is more robust to outliers.

A general approach to optimization is iterative optimization, which starts at some initial point w and tweaks w so that the objective function value decreases. To do this, we rely on the gradient of the function, which tells us which direction to move in to decrease the objective the most. This is called gradient descent. Gradient descent has two hyperparameters: the step size η (which specifies how aggressively we want to pursue a direction) and the number of iterations T. However, each iteration requires going over all the training examples, which is expensive when you have a lot of data points.

All that’s left to do before we can use gradient descent is to compute the gradient of the objective function TrainLoss. For the squared loss, the gradient is proportional to the residual (prediction − target) times the feature vector φ(x).
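Putting the pieces together, here is a minimal sketch of gradient descent for linear regression with the squared loss; the tiny dataset, step size, and iteration count are my own assumptions.

```python
# Gradient descent on the training loss for a linear predictor w . phi(x).
train = [([1.0, 1.0], 2.0), ([1.0, 2.0], 3.0), ([1.0, 3.0], 4.0)]  # (phi(x), y)

def predict(w, phi):
    return sum(wi * xi for wi, xi in zip(w, phi))

def train_loss_gradient(w):
    """Average over examples of 2 * (prediction - target) * phi(x)."""
    grad = [0.0] * len(w)
    for phi, y in train:
        residual = predict(w, phi) - y
        for i in range(len(w)):
            grad[i] += 2 * residual * phi[i] / len(train)
    return grad

w = [0.0, 0.0]
eta, T = 0.1, 200          # step size and number of iterations (assumed)
for _ in range(T):
    grad = train_loss_gradient(w)
    w = [wi - eta * gi for wi, gi in zip(w, grad)]   # move against the gradient

print([round(wi, 2) for wi in w])   # approaches [1.0, 1.0], i.e. y ~ 1 + x
```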

We moved our discussion to clustering and segmentation. Image segmentation is the process of identifying groups of pixels that go together and separating images into coherent objects. Clustering is the process of grouping similar points together and representing them with a single token. It is important to cluster because it helps to summarize data, count, segment, and predict. There are two types of clustering, bottom-up and top-down. One aspect I found particularly interesting was the Gestalt theory for perceptual grouping, which is the theory that grouping is key to visual perception and that elements in a collection can have properties that result from relationships. This theory makes intuitive sense but it is actually very difficult to translate into algorithms.

Agglomerative clustering starts with every point as its own cluster, finds the most similar pair of clusters (e.g., by Euclidean distance), and merges them into a parent cluster; repeating this produces a dendrogram. Another clustering method is k-means, an iterative algorithm that places centroids at random initial locations, assigns each point to its nearest centroid, and then, for each cluster, creates a new centroid that is the mean of all its assigned points x. The algorithm stops when none of the cluster assignments change. K-means++ prevents arbitrarily bad local minima by choosing the initial centroids according to a probability distribution that favors points far from the centroids already chosen. To evaluate the clusters, we test their generative quality and discriminative quality.
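Here is a minimal k-means sketch in Python for 1-D points. The data, the choice of k, and the fixed iteration count are illustrative assumptions; a fuller version would stop when assignments no longer change.

```python
import random

def kmeans(points, k, iterations=20):
    centroids = random.sample(points, k)          # random initial locations
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid becomes the mean of its assigned points.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

points = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]
centroids, clusters = kmeans(points, k=2)
print(sorted(round(c, 1) for c in centroids))   # roughly [1.0, 5.1]
```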

June 30 2015 / Day 7

Machine learning is arguably the most important aspect of artificial intelligence. Today we focused on linear classification and minimizing loss. Machine learning is about making predictions; types of prediction tasks include classification (binary, multiclass, and multilabel), regression, ranking, and structured prediction. A predictor is a function f that maps an input x to an output y.

We use binary classification to predict whether an email is spam or not spam. In the context of classification tasks, f is called a classifier, and y is called a label. Binary classification could technically be seen as a regression problem if the labels are −1 and +1.

The starting point of machine learning is the data, which is the main resource we can use to address the information complexity of the prediction task at hand. The training data is a set of examples that forms a partial specification of the desired behavior of the predictor. The framework works as follows: training data –> feature extraction –> parameter tuning –> f. Learning is about taking the training data Dtrain and producing a predictor f, which is a function that takes inputs x and maps them to y = f(x). Given an input, the feature extractor outputs a set of (feature name, feature value) pairs, which are then collected into a feature vector. The general principle is that features should represent properties that might be relevant for predicting y. Each input x is represented by a feature vector φ(x), which is computed by the feature extractor φ. Feature engineering uses domain knowledge, such as natural language features (words, parts of speech, capitalization patterns) and computer vision features (HOG, SIFT, image transformations, smoothing, histograms), to support the task; later stages of learning can also discover features automatically.
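For example, here is a sketch of a feature extractor for spam classification; the particular feature names and the sample email are my own assumptions.

```python
def feature_extractor(email_text):
    """Map an input x to a set of (feature name, feature value) pairs."""
    words = email_text.lower().split()
    return {
        "contains_free": 1.0 if "free" in words else 0.0,
        "contains_money": 1.0 if "money" in words else 0.0,
        "num_words": float(len(words)),
        # First line of the email is treated as the subject (an assumption).
        "all_caps_subject": 1.0 if email_text.split("\n", 1)[0].isupper() else 0.0,
    }

phi = feature_extractor("FREE MONEY\nClaim your free money now")
print(phi)   # the (feature name, feature value) pairs forming phi(x)
```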

A weight vector specifies the contribution of each feature to the prediction.

Linear predictors are based on a feature extractor φ and a weight vector w, which together produce a score, w · φ(x).

The perceptron algorithm was originally developed by Frank Rosenblatt in the late 1950s. Training patterns are presented to the network’s inputs and the output is computed; the weight vector is then modified whenever the prediction is wrong. The key idea is mistake-driven learning.
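Here is a minimal sketch of the perceptron update loop in Python; the tiny linearly separable dataset and the fixed number of passes are illustrative assumptions.

```python
# Mistake-driven perceptron learning on an assumed, linearly separable dataset.
train = [([2.0, 1.0], +1), ([1.0, 3.0], +1), ([-1.0, -2.0], -1), ([-2.0, 1.0], -1)]

def predict(w, phi):
    score = sum(wi * xi for wi, xi in zip(w, phi))   # score = w . phi(x)
    return +1 if score >= 0 else -1

w = [0.0, 0.0]
for _ in range(10):                       # a few passes over the training set
    for phi, y in train:
        if predict(w, phi) != y:          # update weights only on a mistake
            w = [wi + y * xi for wi, xi in zip(w, phi)]

print(w, [predict(w, phi) for phi, _ in train])   # a consistent separator
```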

A score is how confident we are in predicting a label. A margin is how correct we are in our prediction.

The weight vector learned this way is a linear combination of the feature vectors of the examples on which the algorithm made errors; those examples act as the support vectors.

June 29 2015 / Day 6

Today, we focused on a very integral aspect of artificial intelligence: games, or adversarial search. Adversarial search examines problems that arise when we try to plan ahead in a world where other agents are planning against us. Games differ from ordinary search problems in the opponent’s unpredictability and in the time limits involved. Because of these two restrictions, game algorithms must approximate to find a strategy that specifies a move for every possible opponent reply.

Games can be deterministic or stochastic. They can have perfect information or imperfect information. It is often beneficial to draw a 2-player, deterministic game tree to better visualize the “max” and “min”. A game can be defined as a kind of search problem with the following elements: the initial state, players, actions, results, terminal tests, and utility.

The minimax algorithm was designed for perfect play in deterministic, perfect-information games. The algorithm chooses the move leading to the position with the highest minimax value, which achieves the best payoff against an optimal opponent. The search is complete if the game tree is finite, and it is optimal against an optimal opponent. Because the tree must be explored to its full depth m with branching factor b, the time complexity is O(bᵐ). Since minimax uses depth-first exploration, its space complexity is O(bm).
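Here is a minimal minimax sketch over a small hand-built game tree; the tree and its terminal utilities are illustrative assumptions (a real game would generate moves on the fly).

```python
# A node is either a number (terminal utility for MAX) or a list of child nodes.
game_tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]

def minimax_value(node, maximizing):
    if not isinstance(node, list):          # terminal test: return the utility
        return node
    values = [minimax_value(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

def best_move(node):
    """MAX moves to the child with the highest minimax value."""
    values = [minimax_value(child, maximizing=False) for child in node]
    return max(range(len(values)), key=lambda i: values[i]), max(values)

print(best_move(game_tree))   # (0, 3): take the first move, with minimax value 3
```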

α–β pruning (alpha-beta pruning) makes the minimax algorithm more efficient by eliminating subtrees that are provably irrelevant. α is the best value that MAX has found so far along the path to the root; if v is worse than α, MAX will avoid that branch (prune it). β is the analogous best value for MIN. Good move ordering improves the effectiveness of pruning. However, even with α–β pruning, there are resource limits. To work within them, we can use cutoff tests instead of terminal tests (e.g., depth-limited search) or we can use an evaluation function instead of the true utility (the evaluation function estimates the desirability of a position).
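And here is the same kind of hand-built tree searched with alpha-beta pruning; it returns the same value as minimax while skipping provably irrelevant children.

```python
import math

game_tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]

def alphabeta(node, alpha, beta, maximizing):
    if not isinstance(node, list):          # terminal test: return the utility
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)       # best option for MAX so far
            if alpha >= beta:               # MIN will never allow this branch
                break                       # prune the remaining children
        return value
    else:
        value = math.inf
        for child in node:
            value = min(value, alphabeta(child, alpha, beta, True))
            beta = min(beta, value)         # best option for MIN so far
            if alpha >= beta:
                break
        return value

print(alphabeta(game_tree, -math.inf, math.inf, True))   # 3, same as minimax
```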

Expectiminimax is an algorithm for nondeterministic games that gives perfect play but also handles chance nodes. As depth increases, its probability of reaching a given node shrinks.

For games with imperfect information, the algorithm calculates the probability for each possible deal. It computes the minimax value of each action in each deal, then chooses the action with the highest expected value over all deals.

Proper analysis shows that the intuition that the value of an action is the average of its values over all actual states is wrong. With partial observability, the value of an action depends on the information state, or belief state, that the agent is in.

In summary, since perfection in games is unattainable, game algorithms must approximate. Uncertainty therefore constrains the assignment of values to states. Optimal decisions depend on the information state, not the real state.