LearningC/LearningC_Notes at master · kjgilbert/LearningC · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
// this is a quite messy notes page, contains bits of code here and there but mostly descriptions


C is a compiled language  (uses a 'recipe' to turn commands into machine code)

R is an interpreted language  (goes more step by step, Jeremy's cooking analogy)

compile time error - compiler can't do something
run time error, after compiling, OS says program has done something wrong, usu outside the allowed memory

linker, another step, can put sets of machine code together, e.g. used to link the gsl
compiler writes machine code


commenting:

// one line of code comment
/*
       */


every C program has a "main" function
	int main
		{......
			...}

	the main function is the defining thing of what happens in the code

	so how should you plan your code?
		there will be decision points, and loops
		(or recursion, when a function calls itself)

1. Plan (pseudocode, numbered steps)
2. Compiler, linker, Unix, debugger knowledge
3. Shell script knowledge
4. Tool to analyze resultant data: R!
5. How to test and validate code? ABSOLUTELY must have
6. Source of outside help when needed
7. How to archive, make your code reproducible
8. The raw programming itself


*** COMPILING CODE ***

in the terminal:
> gcc name.c

then it makes an executable (default will be "a.out")
or can get warnings where it will have compiled anyway
or can get errors, and often gives a line number


to change that, do:
> gcc name.c -o name

run:
> ./name

chmod +x name   // to give yourself permission to run

run time errors, usually segmentation fault or bus error


*** VARIABLES ***

every variable must be declared once and only once in the code
	this means you tell C what type of data goes in that variable, and it can never be changed within the code
		type of data, size (i.e. memory amount per variable)
	possible data types: integers, floating point numbers (continuous decimals), text
		and versions within
			e.g. integers with size and whether it can be negative or not
				floating point numbers differ in size and precision
				int = integer
				double = floating point number

			declare by saying:  type  name  and potentially more ;
			end line with ";"
			e.g.  int x;
			  double y;
			  int x,y;

			every function will have a specified inpt and output, so output from another function should match input to another

			"scope" is also defined, to say when the variable is what it is
				e.g. within a block (defined by {}) then the variable isn't defined outside of that block

		a declaration doesn't make the variable anything, just defines it
		so can do:   int x = 5;
			or
				int x;
				x=5;

		loops:  for, while, if

		while(     )
		{
		....
		}


	example, don't declare the variable you want to fill your answer into within the block
	int answer;
	while(...)
	{
	answer=calc;
	}


	OR can even initially define it to see if it changes
	int answer = -1;
	while(...)
	{
	...
	}


*** WRITING CODE ***

in a plain text editor, write whatever, and save as .C at the end (a C source file)

white space is usually fine

semicolons at end of line, but not for things like loops


int answer = -1;
int x,y;
while(x < y) //or, DNE !=    , equals ==
{
...
}


C does a weird thing that usually if you have a bug that a number is showing up somewhere instead of a T/F, then if = 0 that is FALSE, >0 is TRUE

printf("hello world/n"); //this is a function


**** 9/18/13

a program to add numbers

get numbers
store numbers
run the function
get the output


functions need to have inputs, what they do, and outputs
they can only take the input you declare

the output of the function is the 'return value'


** arrays

an array is "a row of boxes", each box has a type telling it how big it should be and how it should be interpreted (integer, character...)
one array has boxes of all the same types!!!!
can be accessed by an index

declare an array by using square brackets after the name of the variable:   int a[3];  //gives 3 boxes to this array
find one value of the array by calling it with the number if the box, starting with 0!!!

2nd box would be number 1 :   a[1]

so a[0], a[1], a[2]

a by itself starts at the beginning
a[0] means go to a and just start at the first box

no error will show up if you assign something to a[3] or even a[12]
i.e. it lets you write past the end, so have to remember yourself the length

arrays are bad for ordered things; they can't be expanded, and it's hard to insert values


** define a function

int sq(int x, double y)  // to square two numbers, accepts only ints or doubles


10/9/13

Pointers


 a variable is 2 things: the location of something and the information at that location

 int i=1;
 int j=2;
 int k=j;   // this means take the value in box j and put it into box k
 if I did j=5; k would still be 2
alternatively, you might want two different names that refer to the same box value, so the same box has 2 names
this way, the values change together

we would make a pointer in this case (point to another address in memory)

int j=2;
int *k; //the asterisk makes it a pointer; always want to make it the right variable type, here int
 // it points to nothing if you do the above notation
 //if we want it to point to j, i.e. make k the location of j, not the value, the ampersand around j means take j's location
 int *k= &j;  // take the address of the box labeled j, not the contents

 //this would print the address of k:
 printf("%p\n",k)  // %p means pointer
 printf("%p %p \n", k, &j);  // should give the same thing twice, both of these are addresses
 printf("%d %d \n", *k, j) //should both be the same integer, both of these are the values of the addresses

anytime the '*' is not in the declaration statement, it will follow the address and return the value at that location   %d is for an integer


pointers are useful: 	one variable with 2 diff names
						can be used with arrays
						dynamic memory allocation ( so we don't have to recompile and change the value of our pre-processor macros)
						using arrays and other structure with functions (data structures)


e.g.  make an array:  int x[4];  //makes an array of length 4
int *y =x;   //because the name of an array is already an address, doesn't need &.  y is the location of the start of the array
y = y+2;
printf("%d", *y);   //this would print the value of the third box in the array (location 2) because of the line above it
//here it's really important to have the variable type correct, so C know to go to the second 'int' not the second 'byte' because the second byte is still in the first int b/c of how much memory it takes up
x[2]; is the same as  *(y+2);


Dynamic memory allocation:
	int *x = malloc(sizeof(int)*4);  // this function takes one argument, the number of bytes you want to allocate
			// but never want to state an explicit number b/c can vary by computer type, so instead say we want the size of something
			// the above gives an array of length 4, and it is dynamically allocated
			// same as int y[4];   <- statically allocated
			// can still do the following:
		x[0] = 1;
		y[0] = 1;
	// BUT it's nice to be able to do the following:
	int N = 100;
	int *x = malloc(sizeof(int)*N);

// with malloc (dynamic memory) it has to be given back when you're done with it, by:
	free(x);  // failing to do this makes a memory leak
	// calling something after it's been free will also give an error; it's like going off the end of an array
	// don't free something until you're done using it b/c you cannot access it later
	//in general everything is freed when the program finished running, btu sometimes it's important to free memory space in the meantime
******* so when in a loop, you want to free dynamic memory at the end of the loop each time, otherwise C will use new memory each time
* dynamic allocations also let you "use" more memory because you can ask for some amount, but if you forget to free something and have a memory leak, a program can suck up all the RAM and slow a computer to a crawl


Pointers with functions


	// square all numbers in this array


void squareArray(int *z, int size){  //make a function that has no return type (returns void)
	// return type, name, argument list
	int
	for(i=0; <size; i++) z[i] = z[i]*z[i];
	}

	main{
		int s=10;
		int *x = malloc(sizeof(int)*s);
		***fill the array with things***
		sqArray(x,s); // in spot x we MUST give it a pointer
		// so now x has been squared,

	}


// speed
 how does the number of things that you have to do scale with the problem

 Big-O notation
 describes performance

 e.g. how slow will something become when you make N really big

 O(1)   means program doesn't slow down as N increases - best case scenario
 O(n) is the worst case scenario in the week 6 code, slows down a lot as n increases

 in general:
 O(1) is best
 O(n) is the average case, on average we search through boxes and hit the one we want N/2, worst case is to search to the last box, best to search to the first box
O(n^2) is worse
O(x^n) is the worst


TO CREATE / SEED RANDOM NUMBERS

Don't usually print the seed, but I did here:

	// do this once per running the program - seed the random number generator
	// could do it like this if didn't want to print:  srand((unsigned int)time(NULL));  // cast time as an unsigned int
	unsigned int seed = (unsigned int)time(NULL);
	srand(seed);
	printf("%d\n",seed);


// EVALUATING PERFORMANCE OF A PROGRAM

see big-O notation above also
O(1) is best
O(log(N))
O(n) is the average case, on average we search through boxes and hit the one we want N/2, worst case is to search to the last box, best to search to the first box
O(N*log(N))
O(n^2) is worse
O(N^3)
O(x^n) is the worst, any way to get better than this is worth it

O could stand for 'order', e.g. order 3, to the third power, meaning the cubed term is much much bigger than any ther term and the smaller terms drop out -- this is the same as order N: the number of steps is proportional to N
our algorithm is O(N^2) because N times we draw a number ( the for loop with N in it ) and make a comparison on average 1/2 N times (to find the right parent) then we have other stuff we do, so kind of like 1 + N/2 + 1 steps total, all N times, so 2N + N^2/2 and the 2N drops out


TIMING PROGRAMS:

to time something in the command line, instead of ">  ./program" do ">  time ./program"
that will return something like: real 1:51, user 1:27, system 0:05   meaning: time elapsed in real time, time elapsed for running only the code if my computer was not otherwise occupied, and time spent on things like reading/writing files


speeding up a program, kind of like the I'm thinking of a number problem.  Much faster to guess the middle number first and narrow it down to half the possibility --> this is called BINARY SEARCH

divide set of options in half - pick the middle bin or if even, one or the other
because our 'box edges' array has all the right edges, it essentially has all the left edges too ( except the zero edge )
when we divide things in half, we have to know both edges of the boxes

double r = runif();
int guess = N/2;
while(){
	// is the guess in the right box?
	if(boxedges[guess] > r && boxedges[guess-1] <= r){ we've found the right box, i.e. the correct parent! } // BUT runif never produces 1, but it does produce zero, which is why we use <= in one side but not the other
	// if not, is our guess too high or too low
		// use an else if because the first and second might both be true and we don't want everything to happen
	else if(boxedges[guess > r]){ guess = guess/2} // this is a thing "else" space "if" and it only looks there if the first thing doesn't happen and the second thing is true
	else{guess = bigger half guess}
	}  // repeat and return to beginning of while loop until get correct guess

	// what we really want to keep track of is the possible range of numbers that could be the right answer


**** BUT instead of a while loop we're going to use a recursive equation
	this is a function that can call itself if some thing is met


	int binary( ){


		if(){
			return( binary()); // call the function again with these arguments until we get the condition we want and return answer
			}else{
				return(correctbox);  // return control of the program to whatever initally called this function
			} // VERY IMPORTANT in a call stack to make sure we do have a final return value and fdon't infinitely call the recursion statement
	}


int binary(double randnum, double *boxedges, int LowestPossibleValueOfTrueValue, int HighestPossibleValueOfTrueValue){
	int middle = (low+high)/2;
	if(randnum < boxedges[middle] && (middle==0 || randnum >= boxedges[middle-1]) ){ // BUT if runif draws a zero?, we don't want to ask for boxedge-1, so just make sure we don't do that middle-1 if middle=0, but we know if middle=0 then we know we have the leftmost box as our answer
			// so if true AND (True OR don't evaluate past there if both were true)
			// if FALSE, stops
			// if true AND ((false or true)=TRUE)
		return middle;
		}else if(randnum >= boxedges[middle]){
			return binary(randnum, boxedges, middle+1, high); // we were guessing too low at first, now just look between our highest number and our previous middle+1 as the new lowest number
			}else{
				return binary(randnum, boxedges, low, middle-1); // we were guessing to high at first, now just look between our lowest number and our previous middle-1 as the new highest number
				}

	}


in the main function:


blah blah blah

binary(randnum, boxedges(an array), 0, N-1 (size of N minus 1, the length of that array) )


the above takes about log_2_N steps  log_2(N), so 2^N =x is the answer --> so it only takes one extra guess to find the correct box when we double N because if you double N log base 2 goes up only by 1


Week 9 - Nov 6 ********  other algorithms for speed  *****

box edge method
	simple search  O(N^2)
	binary search  O(NlogN)
	educated guess search
accept/reject method


if we knew the distribution of our fitness values (the distribution of our probabilities), we could do something more educated than binary search which goes to the middle values each time
we could instead start with the largest values and go from there - the boxes get arranged differently so that they decrease in size

because we know most individuals will have high fitness since they're more likely to survive and reproduce, we should search from there - so we'll take a guess and search from there

** educated guess
randnum - draw a random number from a uniform distribution
figure out the average width of a box to then make a guess
	each box on average should have width 1/N

double meanbox = 1.0/N;  // average size of all our boxes, just do this one
int guess = randnum/meanbox;   // should put us in or close to the right box
if(boxedges[guess] > randnum){    // if the right edge of the box we guess is greater than randnum, then we're either in the right box or the right box is anywhere to the let of us
	while(guess > 0 && boxedges[guess-1 => randnum) guess-- ; // iterate down	we were too high so go through the boxes to the left until we find it
	}else{
	guess++; // else we were too low and go up to find the right box
	while(boxedges[guess] < randnum) guess++;
	}
	// guess's scope is outside the loop, so it is output at the end as the correct box (corrent parent)


** Accept Reject
	constant width bins where height of the bin is proportional to fitness
		throw a dart at the histogram, if it lands in a bin, choose that one, if it lands above a bin, throw again, repeat until hit a bin
	want to make the tallest box be at the top of the histogram to eliminate as much empty space as possible so we don't waste "darts"


//pick one of the boxes -> x and y coordinates
int guess = floor(randnum*N); // floor rounds down from a floating point number
double height = runif()*maxFitness;


>>> do while loops, do first step of the loop without checking condition, condition is checked at the end

do{}while();


	e.g.

	do
	{
	....
	}while();  // NOTE THE REQUIRED SEMICOLON HERE

// have our array of fitnesses  popfitness
int guess;
double randnum;

do{  // this algorithm always ends up in the "right box" because we use size of fitnesses, hence making each proportionally selected correctly to make offspring
	int guess = floor(randnum*N); // floor rounds down from a floating point number
	double height = runif()*maxFitness;
}while(popfitness[guess] < height); // we want to keep guess at the end again


///// *******  week 10  *****

text!
	ASCII takes letters, symbols etc and codes them as a number -- 128 total
	now Unicode is more common, a much larger table of graphic symbols/letters etc


char x= 17;  // this would require you to know the ascii table for what letter corresponds to this number
char x='a';  // so now the C compiler will use the table and assign a number to this lowercase a
	// otherwise it might think a is a variable and say it's not found

to print a character
	printf("%c \n", x);  // prints out the letter a  ALWAYS JUST ONE CHARACTER, OTHERWISE NEED A STRING
	printf("%d \n", x);  // that is the format specifier for an integer, it will print out the number for a

it's useful to know that lowercase a-z are continuous sequential numbers, but not necessarily starting at 1
same for uppercase A-Z but not necessarily next to the lowercase ones

	printf("%c \n", x+2);  // prints out the letter c because c is 2 spaces over from a

char *x=malloc(sizeof(char)*10); // make an array x, 10 characters long, we call this a STRING

a string in C is a null-terminated array of characters, it has a special ending character in the last 'box' to say this is the end of the string
	that character is \0 -- the null character for a string
	char x= '\0';

	the null character always must be the end thing

if I want to print "cat"


%s  is a string print designator


// the infinite monkey theorem

want a phrase
	a million monkeys typing, let them do it a bunch of times, eventually one will be right
	versus locking in "good" choices, i.e. mutating them randomly


recursive functions are generally good for things that need to be broken up into smaller parts, example searching, or breaking a number into factors


// there are other functions for strings in C that manipulate them: copying, concatenating, finding length of, etc.  to be learned in the future...