Friday, December 5, 2008

problem-solving.episode

so i decided to do my problem solving episode on problem set 5.

Prove that if language L has a finite number of strings, then there is some regular expression that denotes L.

step 1: understand the problem
- the problem is proving that if language L is a set consisting of a finite number of strings, then there is some regex that denotes it.
- the unknown here is the regex denoting the language (which will be known at the end of the proof).
- the known data are that the language has a finite number of strings (say n strings).

step 2: devising a plan
1) test some cases of a regex denoting the strings in L.
2) once a pattern is obvious, find a general case (possibly to help carry out the induction step).
3) using the given statement, create my own conjecture and claim to help make carrying out the proof easier using known facts that i derived myself.
4) once enough information about the language and the strings contained in it are available, start carrying out a proof of this statement.

step 3: carry out the plan
1)
testing out some cases:
over the alphabet \sigma = {0, 1}:
L_1 = {0, 1} is denoted by the regex (0 + 1)
L_2 = {0, 1, 01} is denoted by the regex (0 + 1 + 01)
L_3 = {0000, 001, 010, 00} is denoted by the regex (0000 + 001 + 010 + 00)

2)
looking at the test cases above, it is apparent that for every string x_i in language L, where 1 <= i <= n = |L|, the regex that denotes language L is simply the alternation (x_1 + x_2 + ... + x_n).

3)
i create my conjecture based on the statement to be:
P(n): If language L has n strings, namely {s_1, s_2,..., s_n} (where for any 1 <= i <= n, s_i is a string of any length), then there is some regular expression that denotes it.

Claim: \forall n \in N, P(n)

4)
Proof:
Base Case:
P(0): L(\null) = \null => the regular expression \null denotes the empty language.

Induction Step:
Assume n is any natural number. Assume P(n) holds. We then want to show that P(n + 1) also holds. Indeed, if the language L has (n + 1) strings, namely {s_1, s_2,..., s_n, s_(n + 1)}, then there is some regular expression that denotes it, namely:
((((s_1 + s_2) + s_3)...+ s_n) + s_(n + 1))
where ((s_1 + s_2)...+ s_n) denotes {s_1,..., s_n} by the IH.
So, P(n + 1) holds. Therefore, \forall n \in N, P(n) => P(n + 1), and by PSI, \forall n \in N, P(n).
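the construction in the proof can be sketched in code as a sanity check. this is not part of the proof itself - the helper name is made up, and python's re module writes alternation as | where the course notation uses +:

```python
import re

def regex_for_finite_language(strings):
    """Build a regex denoting a finite language by alternating its
    strings, mirroring the (x_1 + x_2 + ... + x_n) construction.
    re.escape guards against symbols that are special in python regexes."""
    if not strings:
        # the empty language: a pattern that matches nothing
        return r"(?!)"
    return "(" + "|".join(re.escape(s) for s in strings) + ")"

L = ["0000", "001", "010", "00"]
pattern = regex_for_finite_language(L)
# the regex accepts exactly the strings of L (fullmatch checks the whole string)
assert all(re.fullmatch(pattern, s) for s in L)
assert not re.fullmatch(pattern, "11")
```

running it on the L_3 test case above confirms that the alternation accepts exactly the n strings and nothing else.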

step 4: looking back
the results obtained from this proof can be checked by working out some more test cases. i believe there is another method of proving this statement, by utilizing the length of the strings to show that the language consists of a finite number of strings. but i do believe my method of proof is good enough to prove this statement.

week.thirteen

on monday we touched on the last topic of the course: pushdown automata. prof heap showed us that every fsa can be translated into a cfg, and vice versa - simply by turning each labelled edge of the fsa into a production whose right-hand side is that label followed by the target state's variable. if every right-hand side of a production is either \epsilon or a single variable preceded by a string of terminals, we say the cfg is a right-linear cfg. if that terminal string has length <= 1, we say the cfg is a strict right-linear cfg. i'm glad that we got to touch on fsas again because i get them better than cfgs - and knowing that every fsa can be translated into a cfg (and the other way around) gives me a big sigh of relief, because in case i don't know how to construct a cfg directly, i can certainly do it indirectly by first constructing a fsa and deriving a cfg from it. well, this is probably the last post of my slog. i hope everyone who visited my blog had (some) fun reading it. special thanks to prof danny heap for another great semester in theory - i really hope you get to teach one of my theory courses again in the future; he was good in 165, and great in 236.
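that fsa-to-cfg translation can be sketched in a few lines - the dict encoding, function name, and state/variable names below are my own, made up for illustration:

```python
def fsa_to_right_linear_cfg(delta, accepting):
    """Translate a dfsa into a strict right-linear cfg:
    one variable per state, a production Q -> a Q' for each
    transition delta[(Q, a)] = Q', and Q -> epsilon for each
    accepting state Q."""
    productions = []
    for (state, symbol), target in sorted(delta.items()):
        productions.append((state, symbol + " " + target))
    for state in sorted(accepting):
        productions.append((state, "epsilon"))
    return productions

# a two-state machine accepting binary strings with an even number of 1s
delta = {("E", "0"): "E", ("E", "1"): "O",
         ("O", "0"): "O", ("O", "1"): "E"}
grammar = fsa_to_right_linear_cfg(delta, {"E"})
assert ("E", "1 O") in grammar      # edge E --1--> O becomes E -> 1 O
assert ("E", "epsilon") in grammar  # E accepts, so E -> epsilon
```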

and in the words of truman burbank:
good morning, and in case i don't see ya, good afternoon, good evening, and good night!

Thursday, November 27, 2008

week.twelve

reminder: assignment 3 due friday!

on monday, we learned a proving technique to show that some languages are impossible to denote with a regex (such as the language that consists of strings with the same number of 0s and 1s). what prof heap did in the lecture looked too overwhelming for me, so i am going to have to refer to the course text to better understand it. but i'm really hoping that it won't be on the test, nor in the exam - especially since it involves proof by contradiction to show that such a language isn't regular.

on wednesday, we learned about a new topic, context-free grammars, which derive a lot of their principles from those of a fsa. a cfg does what a fsa can't do - denote languages like that of matching parentheses. it has properties similar to a dfsa, namely the quadruple G = (V, \sigma, P, S), where V is a set of variables, \sigma is a set of terminals, P is a set of productions, and S is the start symbol. it is good to note that V and \sigma are disjoint, and productions work as "rules" that tell you what you "may" do, rather than "must" do. the diagrammatic representation of a cfg is like that of a dfsa, only there aren't any nodes, and the arrows lead to one or more (combinations of) variables and terminals (due to the non-deterministic character of a cfg). prof heap also showed examples of languages denoted by cfgs that consist of binary strings with an even number of 0s and 1s, as well as one that contains matching parentheses. it is clear how grammars can denote an infinite language, because a variable can produce a variable as many times as we wish (kind of like how recursion works).
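that recursion-like behaviour of variables can be seen by enumerating the strings a grammar generates - a rough sketch (the helper and its string encoding of productions are my own, with uppercase letters standing for variables):

```python
def language_up_to(productions, start, max_len):
    """Enumerate terminal strings of length <= max_len generated by a cfg.
    `productions` maps a variable to a list of right-hand sides, given as
    strings where uppercase letters are variables and the rest terminals.
    Sentential forms longer than max_len + 2 are pruned to keep it finite."""
    results, seen, frontier = set(), set(), [start]
    while frontier:
        form = frontier.pop()
        if form in seen or len(form) > max_len + 2:
            continue
        seen.add(form)
        # find the leftmost variable, if any
        var = next((c for c in form if c.isupper()), None)
        if var is None:
            if len(form) <= max_len:
                results.add(form)
            continue
        i = form.index(var)
        for rhs in productions[var]:
            frontier.append(form[:i] + rhs + form[i + 1:])
    return results

# S -> epsilon | (S) | SS : the matching-parentheses grammar from lecture
parens = {"S": ["", "(S)", "SS"]}
L = language_up_to(parens, "S", 4)
assert "" in L and "()" in L and "(())" in L and "()()" in L
```

applying (S) over and over is exactly the "variable produces a variable" recursion that lets a finite grammar denote an infinite language.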

on friday, we went on to proving the correctness of the cfg that denotes the language of strings with the same number of 0s and 1s. the proof involves sort of the equivalent of state invariants, but for cfgs, denoted S =>* \alpha, which means that the string \alpha is derived from S through a number of productions. i had no problem keeping up with the fsas, and i thought this topic was closely related to them, but it turned out to be a whole other dynamic in denoting a language - and now i'm having trouble keeping up and catching up with the material. i will try to read on it in the text and hopefully be ready for the exam. also, the last assignment was due tonight, which i thought was quite challenging, but easy once i got into the rhythm of it. and question 4 seemed hard on the outside, but it's pretty simple once i figured out the machines for both languages - it's only a matter of combining the two by means of a cartesian product. all in all, it was the first assignment that i actually spent time on doing - so that must count for some effort, right?

Thursday, November 20, 2008

week.eleven

reminder: problem set #6 due friday!

on monday, we continued the dfsa topic by proving the correctness of the machine - i.e.: the correctness of the state invariants. the key is to write a string x as the concatenation of a string y and a symbol a (x = ya). assuming that y takes the machine from the starting state to some state q, we then ask: when the machine reads the symbol a = 0, 1 from state q, what state does it transition to next? this is taken care of in the induction step of the proof. the proof seemed a bit complex for me - it takes into account understanding the machine as a whole and knowing where each state transitions to upon reading a symbol from the alphabet. because of this, it is common for me to get lost in the induction step of the proof.

on wednesday, we started on the topic of non-deterministic finite state automata (nfsa). prof heap showed that a fsa and a regex are equivalent if they denote the same language. furthermore, for every regular expression, there's a nfsa that denotes it. the definition of a nfsa is awfully similar to a dfsa's - the only difference being the transition function, which takes in a state and a symbol from the alphabet unioned with the empty string, \epsilon, and returns a set of states. an important fact is that every regex has a nfsa and every nfsa has a dfsa - thus by transitivity, every regex has a dfsa. what about the converse? does every dfsa have a regex that denotes it? moreover, fsa machines can denote many of the language operations - namely the complement, reversal (where the starting state becomes the accepting state, and vice versa), and union. it is, however, quite hard to construct a machine that denotes the intersection of two languages directly - for this case, the cartesian product is used. the difference in a cartesian product machine is, yet again, in the transition function. instead of taking a state, it takes an ordered pair of states, and also returns an ordered pair of states - each component of the pair belongs to one of the two machines. this technique sure saves a lot of time compared with constructing such a machine from scratch.
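the cartesian product construction can be sketched in a few lines of python - the function and state names are made up, and it assumes both machines share the alphabet:

```python
def product_dfsa(delta1, delta2, start1, start2, acc1, acc2):
    """Cartesian-product construction: run both dfsas in lockstep on
    ordered pairs of states, accepting when both components accept,
    so the product denotes the intersection of the two languages."""
    def delta(pair, symbol):
        q1, q2 = pair
        return (delta1[(q1, symbol)], delta2[(q2, symbol)])
    def accepts(x):
        pair = (start1, start2)
        for c in x:
            pair = delta(pair, c)
        return pair[0] in acc1 and pair[1] in acc2
    return accepts

# machine 1: even number of 0s; machine 2: even number of 1s
d1 = {("e", "0"): "o", ("e", "1"): "e", ("o", "0"): "e", ("o", "1"): "o"}
d2 = {("e", "1"): "o", ("e", "0"): "e", ("o", "1"): "e", ("o", "0"): "o"}
both_even = product_dfsa(d1, d2, "e", "e", {"e"}, {"e"})
assert both_even("0011")
assert not both_even("001")
```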

on friday, as it turns out, every dfsa has an equivalent regex that denotes it. in order to obtain a regex from a dfsa, we add new starting and accepting states, with \epsilon-transitions going out of the new starting state and into the new accepting state, and then remove the original states one by one until only the starting and accepting states are left. this may sound easy, but the process of obtaining the final regex can be confusing enough for anyone to get lost if you miss recovering a pathway in and out of a state while at the same time creating a regex to represent that particular pathway. this technique is particularly challenging, but positively challenging in a way that helps you understand the interrelationship between dfsas and regexes. i wouldn't mind doing a question like this on the final exam.

Thursday, November 13, 2008

week.ten

reminder: problem set 5 is due on friday!

on monday, we learned about a paper-and-pencil machine called the deterministic finite state automaton. what it is, essentially, is a way to represent a language that a regular expression can't easily capture (such as the intersection of two languages, which is not impossible to do with regexes but would take a long time to derive). a dfsa machine consists of the quintuple M = (Q, \sigma, \delta, s, F), where Q is the (finite) set of states in the machine, \sigma is the set that contains the alphabet the machine will go over, \delta is the transition function that takes in an ordered pair of a state from Q and a symbol from \sigma and returns a new state (possibly the same state), s is the starting state, and F is the set of accepting states (a subset of Q). in class, prof heap showed us an example of a language that consists of all strings with an even number of 0s and an even number of 1s, and is denoted by a dfsa. it is interesting to see something new like an automaton because i have never come across one before, and the fun goes to determining where each labelled edge transitions to, given a state and an input symbol. although i'm still blurry with my language operations (let alone regexes), i think i would have more fun constructing automata than analyzing a bunch of 0s and 1s.
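here's a quick sketch of that machine written out as its quintuple in python - the state names (like "ee" for "even 0s, even 1s") are my own encoding:

```python
# the quintuple M = (Q, sigma, delta, s, F) as plain python data
Q = {"ee", "eo", "oe", "oo"}          # parity of (0s, 1s) seen so far
sigma = {"0", "1"}
delta = {("ee", "0"): "oe", ("ee", "1"): "eo",
         ("eo", "0"): "oo", ("eo", "1"): "ee",
         ("oe", "0"): "ee", ("oe", "1"): "oo",
         ("oo", "0"): "eo", ("oo", "1"): "oe"}
s, F = "ee", {"ee"}

def accepts(x):
    """Run M on x: start in s, follow delta one symbol at a time,
    and accept iff the final state is in F."""
    q = s
    for c in x:
        q = delta[(q, c)]
    return q in F

assert accepts("0101")      # two 0s and two 1s
assert not accepts("010")   # odd number of 1s
```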

on wednesday, we continued on with our machine that accepts binary strings with an even number of 0s and 1s. in particular, we elaborated on the transition function and derived the extended transition function from it, which works almost the same way, only instead of taking in an input symbol, it takes in a string, and then it returns a state. in short, \delta*(q, x) = q' means the string x takes the dfsa machine from state q to q'. in a sense it works like a path where the characters of the string (from left to right) are labelled edges, forming a path from state q to state q'. the extended transition function is also used in proving that a dfsa works by means of state invariants, where the function starts from the starting state and reads a string to one of the states in the machine - the state invariant indicates what is true about the string when it enters that particular state. we then moved on to designing a dfsa - one that accepts binary strings that are multiples of 3 in base 2. it turns out that the key idea in designing this dfsa is the left-shifting of a binary number. when x is a binary string, the concatenation x0 simply doubles the binary number's value in decimal form, and similarly x1 doubles it and adds 1. i thought this experiment in designing a dfsa was particularly interesting because it utilizes the 0 and 1 characters of binary numbers to cater to the course's convention of using the alphabet {0, 1}. i don't find this topic more challenging than, say, proving iterative correctness, because the fun goes into thinking about the state invariants and performing trial and error to determine where each symbol leads from one state to another.
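the left-shift idea translates almost directly into code - a sketch, with states 0, 1, 2 standing for the value of the string read so far, mod 3:

```python
# reading a symbol b left-shifts the number: new value = (2*old + b) mod 3
delta = {(r, b): (2 * r + b) % 3 for r in range(3) for b in (0, 1)}

def multiple_of_three(x):
    """Extended transition function delta*: fold the string through
    delta starting at state 0, and accept iff we end back at state 0."""
    q = 0
    for c in x:
        q = delta[(q, int(c))]
    return q == 0

assert multiple_of_three("110")      # 6 in binary
assert multiple_of_three("1001")     # 9
assert not multiple_of_three("101")  # 5
```

the state invariant here is exactly "the value read so far is congruent to the state number, mod 3", which is what makes the correctness proof go through.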

on friday, the problem set was due. i didn't find it particularly difficult, mainly because it deals with a finite number of strings. here's to hoping for good marks. we designed yet another dfsa, one that accepts any binary string that includes the (sub)string 110. it's important to note that because this machine accepts a substring, any input symbols read after that substring has been seen are accepted regardless, until the whole string is finished being read. furthermore, after the machine lands on the accepting state, that state also works as a trap state - anything after 110 transitions back into the same state. this was another good case showing the ability of a dfsa to accept strings containing a substring, rather than matching a string as a whole. i don't think it was as challenging as designing wednesday's machine.
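a sketch of the 110-machine, with state 3 as the combined accepting/trap state (the numeric state names are my own):

```python
# states record how much of "110" has been matched by the suffix read so
# far; state 3 means 110 has been seen, and it traps every further symbol
delta = {(0, "0"): 0, (0, "1"): 1,
         (1, "0"): 0, (1, "1"): 2,
         (2, "0"): 3, (2, "1"): 2,
         (3, "0"): 3, (3, "1"): 3}

def contains_110(x):
    """Accept iff the binary string x contains 110 as a substring."""
    q = 0
    for c in x:
        q = delta[(q, c)]
    return q == 3

assert contains_110("0110101")   # stays accepted after the 110
assert not contains_110("10101")
```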


Thursday, November 6, 2008

week.nine

reminder: test #2 on friday!

on monday, we started on yet another brand new topic. unfortunately, i was late to class, so i got in just as they had moved on to the operations on languages. i didn't know what a language was (formally), so i had a hard time keeping up with the rest of the lecture. but here's what i know about languages so far: a language is actually a set; of strings; over an alphabet (which is also a set, but one consisting only of single symbols like {"a", "b"} or {"0", "1"}). the examples are done heavily on binary strings, which only have two symbols {0, 1}, so they are easier to follow in class than having the entire english alphabet. we then talked about regular expressions, which are closely related to the notion of a language - a regular expression can denote almost any language (some languages are impossible to denote with regular expressions). regular expressions themselves have a set of operations that can be applied to represent a complex regular expression: alternation (union), concatenation, and the kleene star. alternation and concatenation do what you expect them to do - either/or, and combining two regexes (order matters), respectively. the kleene star repeats its preceding subexpression zero or more times (so the empty string also belongs to the language denoted by a regex involving a kleene star). i'm really glad to have a whole topic dedicated to regexes because i came across them in my 207's a1 starter code and didn't have a clue what they were. and we are also touching on this topic in 207 as well - so it helps either way.
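the three operations can be tried out in python's re module - note that python writes alternation as | instead of the course's +, so this is only an analogue of the course notation:

```python
import re

# alternation: (0 + 1) in course notation is (0|1) here
assert re.fullmatch(r"(0|1)", "0")

# concatenation: regexes written side by side; order matters
assert re.fullmatch(r"01", "01")
assert not re.fullmatch(r"01", "10")

# kleene star: zero or more repetitions, so the empty string matches too
assert re.fullmatch(r"(01)*", "")
assert re.fullmatch(r"(01)*", "010101")
assert not re.fullmatch(r"(01)*", "011")
```

fullmatch is used throughout so the whole string has to be denoted by the regex, not just some prefix of it.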

on wednesday, we continued on with regular expressions. prof. heap started off with some examples and asked us what language each regex on the slide denotes. i, ahem, got to class late again. consequently, i had a hard time distinguishing them from each other because they all looked the same to me haha. but as the class progressed, it became clear to me how important brackets are as logical grouping, so that an operation like the kleene star can be applied to a whole subexpression. prof heap also showed us how to prove the equivalence of a language and a language denoted by a regex. this is done via mutual inclusion: languages are sets, so proving two languages are equivalent is the same as proving that they are subsets of each other. as it turns out, this principle is crucial in proofs about languages. even proving regex equivalence requires this property of sets, because proving that two regexes are equivalent is the same as proving that the languages they each denote are equivalent.

on friday, we had our second term test. i thought it wasn't all that bad. i did choke on proving the last question tho - the downward() method. i guess i was still unclear about proving partial correctness and termination, and on top of that finding a suitable loop invariant. i wish someone would clarify that for me... i regret that i was late for most of this week because i think the times i was late were the most crucial times for understanding complex topics like regular expressions in depth.

Thursday, October 30, 2008

week.eight

on monday, we discussed the topic of 'named repetition'. essentially, if a function has no loop/recursion, we can trace the sequence of statements one by one and prove that it is correct by stating the return value. otherwise, if the function contains a loop, then there exists a loop invariant that works as an indicator of the loop's progress and is true at every iteration of the loop. there are two things to prove when proving a program with loop(s): partial correctness and termination. we applied this to proving a program on powers/exponents as an example. as it turns out, a loop invariant helps to prove partial correctness, but the loop invariant itself is usually suggested by the precondition rather than the postcondition. in the end, a correct iterative/recursive program means that the precondition implies the postcondition.

on wednesday, we continued on with our pow program. partial correctness pretty much works like a proof by simple induction (using the loop invariant), and to prove that the loop (and consequently the program) terminates, we find a strictly decreasing sequence E_i of natural numbers, where i is the loop index, and prove that E_(i+1) < E_i. what i thought was the hardest part in proving an iterative program is finding a suitable loop invariant, because invariants could be anything and one may be better than another - the challenge is finding the one that works best. one thing i find for sure is that the expression(s) in the loop invariant will involve some (if not all) of the program's variables - but it is also important that we don't use these variables explicitly in the induction; rather, we write a variable, say v, as v_i, where v_i is the value of v in the ith iteration.
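here's a pow program annotated with its invariant and decreasing sequence as comments - a sketch of the idea, not necessarily the exact program from lecture:

```python
def power(b, n):
    """Iteratively compute b**n for a natural number n.
    loop invariant: result == b**i and 0 <= i <= n
    termination: E_i = n - i is a strictly decreasing sequence of
    naturals, so the loop runs exactly n times."""
    result, i = 1, 0
    while i < n:
        # invariant holds here: result == b**i
        result *= b
        i += 1
        # invariant restored: result == b**i for the new i
    # at exit i == n, so the invariant gives result == b**n
    return result

assert power(2, 10) == 1024
assert power(3, 0) == 1
```

note how the comments use i, the iteration count, rather than naming the running variables directly - that is the v_i convention from the proof.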

on friday, i was late =D. i shall refer to the course slide to catch up (hopefully).