Physical page allocation

As has been covered before in the explanation of virtual memory, memory is in our model divided up into pages. On Intel-style chips these pages are 4 KB in size, and the physical pages don't overlap.

Now, to keep things clear: physical pages are pieces of memory that exist in the memory banks of the PC. Virtual pages are address ranges mapped onto these physical pages, so we can decide which virtual address a physical page gets (which is a very powerful mechanism, mind you).

So in order to keep track of which physical pages are still free to map these virtual pages onto, there is a need for something called a page allocator. Knowing all this, let's dig into how it works (once it's clear, it is surprisingly simple).

First off we have an array of integers, each entry representing a set of 16 physical pages. If a page set is free, its entry holds the index of the next free page set. If it is not free, the entry holds a negative value to indicate that the page set is not allocatable.

On allocation a global integer, called first_free, is read, which indexes the first free page set entry. Then the following steps occur.

  1. first_free is stored in a temporary variable.
  2. first_free is set to whatever the entry at index first_free contains.
  3. The entry at the index held by the temporary variable is marked as allocated.
  4. The value of the temporary variable is multiplied by the size of a page set to turn the index into a physical address, which is returned.

In the case of deallocation the reverse happens, as follows (a code sketch of both operations is given below):

  1. The pointer is divided by the size of a page set to turn it back into an index.
  2. The entry at that index is set to first_free, which marks it as allocatable again.
  3. first_free is then pointed at that index.
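
To make this concrete, here is a minimal sketch in C of how such an allocator could look. The names (page_map, page_set_alloc, PAGES_PER_SET and so on) and the amount of memory are illustrative assumptions, not the actual Andromeda code, and the locking discussed below is left out.

#include <stdint.h>
#include <stddef.h>

#define PAGE_SIZE     0x1000                    /* 4 KB pages */
#define PAGES_PER_SET 16                        /* one entry covers 16 pages */
#define SET_SIZE      (PAGE_SIZE * PAGES_PER_SET)
#define NUM_SETS      1024                      /* example: 64 MB of physical memory */
#define SET_ALLOCATED (-1)                      /* negative value = not allocatable */

static int32_t page_map[NUM_SETS];              /* the array of integers */
static int32_t first_free = 0;                  /* index of the first free page set */

void page_map_init(void)
{
        int32_t i;
        for (i = 0; i < NUM_SETS - 1; i++)
                page_map[i] = i + 1;            /* each free entry points to the next */
        page_map[NUM_SETS - 1] = SET_ALLOCATED; /* end of the free list */
        first_free = 0;
}

void *page_set_alloc(void)
{
        if (first_free < 0)
                return NULL;                    /* out of physical memory */

        int32_t idx = first_free;               /* 1. store first_free in a temporary */
        first_free = page_map[idx];             /* 2. first_free = entry at that index */
        page_map[idx] = SET_ALLOCATED;          /* 3. mark the set as allocated */

        return (void *)((uintptr_t)idx * SET_SIZE); /* 4. index -> physical address */
}

void page_set_free(void *ptr)
{
        int32_t idx = (int32_t)((uintptr_t)ptr / SET_SIZE); /* 1. pointer -> index */

        page_map[idx] = first_free;             /* 2. entry points at the old first_free */
        first_free = idx;                       /* 3. first_free points at this index */
}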

These actions are done atomically. That means that even if the task is scheduled out, no other task is allowed to enter the critical parts of these functions.

For future development it is even possible to keep a list per CPU. This, in combination with disabling interrupts on the CPU running these functions, should allow for the removal of the locks, in turn allowing for faster execution.

The only reason for there to be locks in the code with this adaptation would be for one core to nick a couple of pages from another one because it ran out of pages itself. Since each core has its own lock, the chances of a lock actually being contended are greatly reduced, until memory starts running out.

Bash 101 – files

Now that we have a firm grasp of what directories do, we can start looking at the interesting bits. These bits are where the actual data is stored, namely files. The files we will be using mostly are text files.

A file most systems have (and if they don't, they should) is the ".bashrc" file. Note that this file name starts with a period. Files that follow this pattern are called hidden, because they aren't shown in a regular listing. To show those files, one can use the "-a" flag of the ls command.

In any event let us study this “.bashrc” file. To do this we’ll start out with the following command: “file ~/.bashrc”

This shows us the path to the file, and then something more interesting: "Bourne-Again shell script, ASCII text executable". (Or so it says on my system; output may vary from system to system, especially when using bash on a Windows platform.)

Basically, if the file command says a file is text, it means it can be read by humans. Depending on the structure of the data in the file, the exact output may vary though. In order to read our .bashrc file, we need to dump it to the terminal, for which we use the "cat" command, so let's do: "cat ~/.bashrc".

Now this action will show us a lot of weird stuff we don't understand yet, or be essentially empty, depending on the installation and on how your system admin has configured it. One of the first things Bash does once it is started is look into this file for commands to run. This is very powerful, as your entire bash environment can be configured by means of this single file.

cat tells us the content, but we still don't have an idea what to do with it. Chances are this file is a bit long, so we would like to scroll through it, but cat doesn't present that option, it just dumps everything to the screen. So in order to actually see what's at the start of the file, we can either scroll back through the terminal (if the emulator supports that) or use the commands "more" or "less". I personally prefer the "less" command, as it is more flexible, but for the sake of simplicity (and the fact that "more" is supported on more systems) we're starting out with more.

So in order to look through this file, we simply replace "cat" with "more" in the previous command, and we should get a dump that only fills one screen and nothing more.
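
In other words, the command becomes:

more ~/.bashrc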

In order to see more of the file (you see what they did there?), either enter or space is pressed. There are more options, but those are beyond the scope of this document. Enter will only scroll one line, while space moves down a complete screen. The choice is up to you. In order to quit "more", just press 'q'.

To use "less", the same rule applies as with "more". less is a bit more complex, as it also supports scrolling upward. It uses so-called "vim" keys to do this. This is because the text editor vim is well known for the use of these keys for scrolling (vim is beyond the scope of this document and I recommend using nano or mcedit to edit documents).

These vim keys are 'j' to scroll down, and 'k' to scroll up. In order to quit, again, use the 'q' key.

Now there is also a quick way to create files from bash, but in order to understand this we first need to understand the way the computer interprets files. This isn't standard across all operating systems, but it is true for all Unix-like systems (and by extension all Linux systems).

In the early days of computing, every form of communication with other devices had its own descriptors. The hard disk, network, printer, screen, keyboard, tape drive and all other devices imaginable had their own, incompatible interfaces. In the early 70's Ken Thompson and Dennis Ritchie were working on the Multics system on behalf of Bell Labs, until they were pulled out of that endeavour and had to go back to Bell Labs, where everything was pretty primitive compared to Multics.

As the story goes, they had an old device lying around. For this device they wrote a simple version of Multics, named Unics (later known as Unix). In order to keep things simple, they created one interface that could be used for everything. This interface became the file descriptor. This means that today, the input you type is first written to an internal file, which is then handed to Bash, which writes it to another file, which in turn places it on the screen.

Since applications can also read from and write to files, and every form of communication goes through this, the output of one command can be turned into the input of another command. This is done through a process known as piping, and is commonly done through the pipe symbol '|'.
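
For example, the output of "ls /" can be piped into more, so that a long directory listing becomes scrollable one screen at a time:

ls / | more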

Since everything is written to a file, we can also have a command create output, and make it write that output to a file using the greater-than symbol. The command that I'm referring to is the echo command. Basically all echo does is repeat the arguments you put in. So "echo hello world" prints "hello world" to the terminal. If we take this output and redirect it to an actual file using the '>' symbol, we can now simply write a file.

Let’s try: “echo hello world > hello.txt” and then try to read it with “cat hello.txt”.

We will now see that there is a new file with the name hello.txt, which has the contents "hello world". This file can be deleted using the "rm" command described earlier.

This is a very powerful concept. The ability to pipe the output of one task to the input of another, for example, is heavily used in compilers and scripts. We will not go into this matter further though, as it is beyond the scope of this document.

Please join us in the next chapter on searching or return to the table of contents.

Bash 101 – directories

Ok, so by now we have a clear idea of what we're staring at, and you probably also have an idea of what directories and files are, but the way they are handled in bash remains a mystery. If you already know how directories are laid out, you might want to skip this item, or just skim over it to look for details you didn't already know. This part assumes a Unix file system layout, and even though most of it also works under Windows, I cannot guarantee that it will, because I am less familiar with that environment.

By the way, directories are also commonly known as folders. If we look into how they are actually implemented though, I personally think the word directory is far better.

So first off, unlike Windows, in Unix there is no such concept as a C or D or A drive (the A and B drives were floppy drives, historically speaking). Instead, in Unix the whole file system is based on the root directory "/" and all the partitions are mounted onto this file system.

The root directory itself has several items in it. Currently all we care about is the following list of directories, which can be seen by typing "ls /".

Name   Description
.      The current directory
..     The parent directory
bin    The binaries directory
home   The directory containing the home directories

Now, ls doesn't show the "." and ".." directories (or at least not on my system), but they can be used. There's one small thing here though: because "/" is the absolute beginning of the file system, there's no way to change directories into its "..", or is there?

Bash will capture this for you and will navigate you automatically to the root of the file system, which basically means nothing happens, and no error will be thrown.

On a quick note, these “.” and “..” directories are present in every single directory. They aren’t stored as such on the file system, but they can certainly be used.

The two other directories listed above are "bin" and "home". Bin is the directory where a lot of the commands are stored. By changing the code in one of the files in that directory, the behaviour of the system is changed. Bash itself is stored in that directory.

Home is the directory where your default directory lives. This directory, also known as your home directory, is where you can write all your files. If we were to look into /home, we would find a list of directories, each named after the user that owns it.

That means that the /home/{username} directory is where you can place your stuff. This directory can be quickly referenced, because bash has linked the ~ symbol to that path. In other words, if you were to do ls ~, you’d see the content of your own home directory.

It is tedious work to type in the path to a directory every time you want to do something over there, and that's why the cd command was invented. By typing cd followed by the directory you wish to navigate to, your shell changes its current directory to what you pointed it to. That means that if I were to do "cd ..", my shell would be transported to the parent directory.

Now sometimes we navigate away, by accident, to the wrong directory, and we want to go back. This can be achieved by doing "cd -", which will navigate your shell to the previous directory. It only stores one directory back though, so be careful.

We can get to our home directory in 3 ways: first off by typing "cd /home/{username}", secondly by typing "cd ~" and thirdly by just typing "cd".

Now directories don’t just come out of the blue, they have to be created. This is done through the mkdir command.

In order to make a directory, navigate to the directory where you want to create the new directory, and then run “mkdir {new directory name}”. Don’t worry if you get the directory name wrong, we can fix it with the mv command later.

Another way to create the same directory is to write the absolute path of the new directory. So to wrap up that command, if we want to create a directory in our home directory, there are 3 ways to do that: "cd" followed by "mkdir {dirname}", just going for "mkdir ~/{dirname}", or doing "mkdir /home/{username}/{dirname}".

The latter 2 references to the new directory are what we call absolute paths. Technically they both start at "/" and then reference the directory from there. The former one is what we call a relative path. It is based on the current directory, and is the quicker way to go about it.

One thing to remember though is that existing directory names can often be tab completed as explained earlier.

The last thing we're covering about directories is what to do when they have become useless. If this is the case, we can remove them. Generally this is done through rm or rmdir. Now rmdir is only used for removing empty directories, and first cleaning out a complete directory structure and then removing the directory itself can be quite tedious, especially when the directory tree to be removed is quite large.

The rm command however comes to the rescue, since it has the option to recursively remove files. This is done through the rm -r command. Use this with care however, as it is very easy to remove files you didn't intend to. Once these files have been removed they can only be recovered with special tools, and only if the disk is left alone between deletion and recovery. If the files are overwritten, there truly is no way to retrieve them.

The syntax for rm is “rm {file}” or “rm -r {directory}”. Again, be very careful when using the recursive flag.

The final thing to cover in this chapter is links. Unix-like file systems support 2 kinds of links, namely hard links and symbolic links. In order to understand these types of links, we will be digging into the file system layer a bit, to take a look at how files are stored.

Files are referenced from a directory. The reason this thing is called a directory is because it is a listing of the files in it. Files themselves are chunks of data on the disk, described by structures called inodes. These inodes in turn have listings of the blocks in which the actual data resides. An inode describes everything about a file besides its name.

Now, in order to create a link, there are 2 things we can do. Firstly we can create another directory entry which references the same inode. This creates a so-called hard link. To the "ls" command it will look like 2 separate files, unless the -i flag is used with ls (to show the inode numbers). When either entry is deleted, the other entry will remain valid.

The other option is to create a reference to the path of the file to be linked to. This is called a symbolic link, and has as drawback that if the original path is changed, the symbolic link doesn't change with it, and thus becomes invalid. In "ls", if the -l flag is used, the symbolic link stands out by having an arrow beside its name pointing to the actual path.

An important note to make here is that this won't work on Windows file systems. This is because FAT and NTFS seem to store the inode data in the directory entry itself. Windows has something analogous to symbolic links, which it calls shortcuts, but these seem to be incompatible with Linux. Hard links, for all practical purposes, are not used on Windows.

In order to create a link, the "ln" command is used. "ln" by default creates hard links, and takes first the source path and then the destination path. So let's say we have a file called apple. If we then do "ln apple pear", we have a file called apple and one called pear. However, if we change something in one file, it also changes in the other.

If we want to make symbolic links, we have to tell this to "ln", and this is done using the "--symbolic" or "-s" flag. So if we now remove the pear file and use the "ln -s apple pear" command, the system will build a symbolic link to apple, called pear. Again the behaviour is the same.
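
Putting it all together, a small session could look like this (the file names are just an example):

touch apple               # create an empty file to link to
ln apple pear             # hard link: two directory entries, one inode
ls -i apple pear          # both names show the same inode number
rm pear
ln -s apple pear          # symbolic link: pear now stores the path "apple"
ls -l pear                # the listing shows: pear -> apple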

Please join us in the next chapter on files, or go back to the table of contents.

Bash 101 – baby steps

The prompt

[Screenshot bash:1]

In the image above we see a common terminal emulator running bash. The black square with text in it is the bit we’re interested in, as that’s the actual terminal. The text we see there is what we call the prompt.

In my case it reads “bemk@bemkdeb:~$”. This might not mean much to you, so we’ll investigate this a little further. “bemk” is the username I use on my computer, so basically the prompt is showing me my username. “bemkdeb” is the name I gave to my machine when installing it. What it’s saying here is that I’m user bemk on machine bemkdeb.

There's another strange bit here reading ":~$". Users on Linux have a so-called home directory. This is a directory that they have file rights to. This isn't actually true for all user accounts on Linux systems, but for now this definition will do. This home directory has the absolute path "/home/{username}", so in my case that would be "/home/bemk".

"So what's this strange wave doing there?" you might ask. Simply said, the tilde is a substitute for the complete path to the home directory. If you change the current directory (I will explain this shortly), this part of the prompt will change, to remind you which directory you're working in.

Then there’s the last bit, namely the dollar sign. What the heck is that dollar sign doing there? This dollar sign is there to denote the end of the prompt. If however I become the super user (the user that can do everything on the system) this dollar sign ($) turns into a hash (#).

The most important thing to note is that these prompts may be configured differently, and thus might not look the same on the system you’re practising with. If that is the case, you can ask the administrator to change it for you as changing the prompt is beyond the scope of this document.

Commands

If we look further, to the right of the prompt we see a white rectangle. This is the cursor. Depending on your installation and terminal emulator this might show itself as a blinking underscore as well. On my system it's a blinking white rectangle.

When we type stuff this is where our text goes. There are a couple of basic commands everybody should know about. Below is a short listing.

Command   Explanation
ls        Show a listing of the directory contents (on some systems dir also works)
cd        Change directory
touch     Create a new file
mkdir     Create a new directory
mv        Change the location or name of a file or directory
cp        Copy a file or directory
rm        Remove files or directories
rmdir     Remove empty directories
cat       Dump a file to the screen
more      Also dump a file to the screen, but this time allow for scrolling
less      Again dump a file to the screen, but this time with more advanced scrolling
grep      Find a pattern in a file or multiple files
wget      Download a file from the interwebz
curl      The same as wget, but more advanced
man       Show a manual about a command
file      Tell the type of a file

These commands are by no means the only commands available, but they are the basic few that everybody needs to get actual work done. All these commands are actually binary files that can be found in /bin or /usr/bin. These 2 directories are special: normally, to run a binary, one has to specify the complete path to it, but the binaries found in these directories don't have that requirement, allowing you to just type ls and have the contents of your current directory dumped to the screen.
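
For example, on most systems these two commands do exactly the same thing (ls normally lives in /bin):

/bin/ls /home
ls /home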

Now Bash has a special feature up its sleeve. It's often tedious work to type the entire command, and therefore there is what we call tab completion. By typing the first few characters and then pressing tab, the system often already knows which command you want to execute, and therefore tries to complete it to the best of its abilities.

On some systems this works less well than on others. There are some systems available that take this quality very far, making the use of the terminal on those systems quite quick and effective.

One rule to remember is that for each command to take effect, one has to press the enter key. If not, the shell will not know that the command is complete and that you want to have it executed.

Please join us in the next chapter on directories, or go back to the table of contents.

Bash 101

As I look around me I see the command line interface, or CLI for short, being used less than it used to be. Few of my classmates (in a computer science course, mind you) know how to use a command line, and even fewer use it on a regular basis.

The git course I wrote earlier actually assumed the reader had some familiarity with the command line, but from the feedback I have been receiving, this very often isn't the case.

The command line was invented to use a computer that only has a terminal. This terminal is a device which only has a keyboard and a device to "print" output to. Back in the good old days, printers and TV screens were often used as output devices. Back then a mouse was still a rodent that should be exterminated if found indoors.

Today this terminal is emulated in the computer. Connected to these emulated terminals are shells. It is the shell that we will be investigating today. Don’t expect this tutorial to go deep into the matter. It should just get you up to speed on the workings of the CLI.

There are a number of shells out there, but for ease of investigation we will focus on only one, named the Bourne Again Shell (bash). Bash is the default on many Linux distributions, and it is also distributed with the Github app for Windows.

Table of contents:

  1. Baby steps
  2. Directories
  3. Files
  4. Searching
  5. File exchange
  6. File compression

Memory management

As the memory management in the Andromeda kernel is being redesigned, this could be considered as good a time as any to explain the new memory management scheme in Andromeda. Below is a list of articles that go into more detail, but first a quick summary of what will be covered here.

Appendix A; Conventions and best practices in Git

There aren't many rules to adhere to when using Git, but the few that do exist are listed below. It's not mandatory to adhere to them, but when working in larger project groups, the value of some of these best practices quickly becomes obvious.

  1. Once something is published upstream, do not do an interactive rebase on it to remove it.
    Published is published, and removing part of the history while others are depending on it is probably useless, as they might just as well continue on their own branch anyway. If however they do take your request into consideration, it will take effort from all the project members to take care of it.
    Make sure that what you are about to publish upstream is good enough for upstream, and then leave it as it is.
  2. Because many git users are working in a terminal with a limited width (some people may want to add "still" to this phrase), it is best to keep your messages within the boundaries of this terminal width, generally 80 characters.
    Because of some indentation in git log messages, the convention is 52 characters for the first line of the commit message and 72 characters of width for the rest of the message.
  3. In order to keep yourself, or somebody else, sane when going back through the git log, or when you have found the commit in question using git bisect and want to know what it did, make sure you've got a clear commit message format.
    Most projects outline the subsystems in the first commit line, along with a brief description. Paragraphs are then separated by blank lines, and the commit message below the first line goes into more detail on what the commit actually does. For example:

    mm: vm: range_alloc: Add alloc buffer
    
    In the old situation, allocation of range descriptors was a potential
    problem when the allocator ran out of memory because:
    
    1) The VM descriptor requires the object allocator for range descriptors
    2) When out of memory, the allocator requires a new VM range
    3) The allocator could provide a descriptor for the required range.
    
    In order to solve this, a buffer has now been created to stash some
    range descriptors.
    
    Each allocator is supposed to call the update function, in order to make
    sure the range descriptors are topped up once the descriptors in buffer
    dip below the maximum allowed.
    
    This should make it possible to always have a range descriptor
    available, up until the point where memory truly runs out.
    
    Signed-off-by: Bart Kuivenhoven <bemkuivenhoven@gmail.com>
  4. Depending on the project rules it might be important to use the Signed-off-by: line, which denotes a developer certificate of origin. Basically it is a legal way of saying: "Either I wrote this, aware that it will fall under the terms of the project, or somebody else did, and gave me permission to publish it." Here is a link to a more frequently used certificate of origin.
  5. Git is good with branches, but don't push it. People are often seen making a branch for just about anything, making tracking of causality, and bug finding, a pain in the neck at times.
    Branches are meant to have clear topics, and clear topics only. When working on two branches and continuously merging them together to make sure that the other branch can be developed further, it is probably better to just work on a single branch, unless there is a very good reason not to.
  6. Don't work on master, but use topic branches. Each topic should have its own clear purpose. Only when two branches start forming a dependency should they be merged together, and probably go further as one.
    A topic branch may only be merged with upstream to receive critical updates to dependencies, and may only be merged back into upstream when it is stable and certain not to break anything.
  7. Keep the build alive. In the entire history of your git repository, there should be no commit in which compilation fails. If there is, finding bugs through bisecting can be hindered, which is sure to cause pains in the neck at some point.

Take a look at the table of contents page for more interesting links.

Version control

The extent of this article

Version management is an important aspect of product development. This article was written to educate the audience about the different forms of version management, as well as the different storage models in use by the relevant tools.

This article will also cover some of the common mistakes made, and give a quick note on how to prevent them from happening. Now let's get cracking at the main article.

Definition

You have probably seen this many times over: software with a version number. For example WordPress 3.4, Linux 3.2.0 or MySQL 5.1. Now you might have wondered what that means, and basically it all comes down to this:

A version is the final result of a development iteration or a snapshot from a given moment.

This definition does have an implication. Once a version has been labelled, nothing can be changed about that version. Sometimes that’s a burden, but there are workarounds for that.

Importance

Imagine the following. You're in a product development team, and you have to use components from a piece of software named foo. Everybody gets a copy of foo, and when you're well into the development phase, all of a sudden you notice that some things exist in your copy that do not exist in the copies of others, or vice versa. What's the first thing you do?

If you answered that you'd check the version, you gave the correct answer. The most likely cause of the issues is that not everybody is using the same publication of the same product. But what if there is no way to be sure which release you're using, and millions of euros have been invested?

Well, basically you’re screwed and if the scary thought about millions of euros doesn’t bring the message home, I don’t know what will.

How do you keep this from happening? Start keeping track of versions so they comply with the definition above.

Numbering

An interesting question to ask is how to keep track of versions. There are a number of ways this is done. For example, some projects make a snapshot and call that version X once it is stable. Minor revisions or release candidates are used to stabilize the snapshot, before it is released as version X.

Another way to go is to keep count of the iterations. The result of sprint 1 is numbered 1, and the same is true for every other iteration. No thought is put into stabilizing each version, as that should have been done in the iteration itself, and if the time provided by the iteration isn't sufficient, the iteration is simply extended to actually stabilize the product before releasing it.

Yet another way is roughly the same as the snapshot approach, but instead of a simple number, the release date is used for version numbering. Ubuntu is one of the projects doing this. For example, 12.04 is the latest release, published in April 2012. Release 12.04.1 will be the first patch, to continue into 12.04.2, all the way until support for this release is dropped.

Something that is generally done besides version numbers is to actually name the release. For example Ubuntu 8.04 was called "Hardy Heron", Mac OS X uses the names of cat species, and Debian uses the names of figures from Toy Story. These names usually refer to the latest release within the same major release.

Maintaining

This is maybe the most difficult part. Not so much to understand, as the whole concept of version keeping is pretty simple; the difficulty is more in actually doing it, since there are several ways to get from version to version, and to maintain multiple versions at once.

Firstly this chapter will cover how to get from version X to version Y, and secondly we’ll go into maintaining version X while developing version Y (note that this second phase requires some parallel thinking on top of the normal development process).

Active development

This part will probably get the most attention, so we'll kick off with that. A common way of developing a product is in a project group. This project group will have documents to share, such as blueprints, source code and project management documents.

Most projects organise themselves in such a fashion that these documents are available in a central place. In the case of software development this is called the source repository. Each time a change set is made, either the file is copied and then replaced, or, if a tool such as SVN or Git is used, the file is simply amended and the source control system takes care of preserving the older states. A third way of doing this is by keeping track of the change sets themselves, with so-called patches, an example of which is shown in figure 1.

Once the product is ready to be assigned a version, the time has come to make a copy of the latest version of the files and put them into a directory, or, as is done with source control systems, to create a tag (which does basically the same).

Maintaining multiple releases

In most development strategies, the new versions include new features and improvements on the existing features, while older releases only get bug fixes and security patches. This sometimes presents a bit of a challenge. For example, a bug fix should be made in the oldest affected version first, after which the change is applied to the newer versions.

Doing it the other way around will often work, but might in some cases lead to conflicts, and with tools like Git it is often easier to fix it in the older versions before bringing it to the newer versions.

As the versions have been tagged, the tags can be copied into a working directory or cache, where they can be used to implement those bug fixes. This is necessary in order to preserve the consistency of each release. Remember from the definition: releases should not be changed, instead updates should be published.

This will create a couple of development branches. For example, if a bug has been found in version 7 that persists into versions 8 and 9, the bug will be fixed in the branch for version 7, and it will then be merged into the branches for 8 and 9 (here is where tools like SVN and Git start becoming real assets to the development process).
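
With Git, that workflow could look something like the sketch below (the release-7, release-8 and release-9 branch names are just an illustration, not a convention this article prescribes):

git checkout release-7      # the maintenance branch for version 7
# ... fix the bug here and commit it ...
git checkout release-8
git merge release-7         # bring the fix into version 8
git checkout release-9
git merge release-7         # and into version 9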

Centralised vs Distributed development

To figure out what this means, we should probably introduce a new definition. Let's get cracking at the centralised approach.

In centralised systems the workflow revolves around the latest revision available on the central server.

You shouldn't have to be a rocket scientist to figure out what that means. Basically it's the way SVN does its work. People make a copy of the latest version, do their work on top of that, and put it back on the server, applying updates before doing so, if available.

Let's see what the definition of distributed is.

In distributed systems the workflow revolves around the selected version available on the local system.

So basically this is the exact opposite of SVN. There is a local repository available, which can be used to do development independently from what others are doing, and when the need arises, the work of others is simply merged into the local branches. This can be because others have published an update you need to continue working, or because the time has come to publish a new version.

Centralised strategies are most commonly used in projects where a clear top-down approach is used. This means the project leader decides who gets access, and what gets done by whom. This can be arranged by having less privileged members send in patches, which then have to be applied by people with commit access, so that the code can be managed on a line-by-line basis.

Distributed approaches are generally seen in environments where the project members get much more freedom to do whatever is necessary. The project members set up their own private and public repositories, and use a central repository managed by the project leader in order to get changes downstream. The project leader is in charge of merging in the code from the project members after they've sent a pull request.

Figure 1 – Patch

From b785dcc1644e30d60507ff686202a29a766a0a28 Mon Sep 17 00:00:00 2001
From: Bart Kuivenhoven <bemkuivenhoven@gmail.com>
Date: Mon, 18 Jun 2012 19:13:14 +0200
Subject: [PATCH] mm: pte: test: Removed heap dependencies

The new code uses the stack instead of the heap for string formatting.

Signed-off-by: Bart Kuivenhoven <bemkuivenhoven@gmail.com>
---
 src/mm/paging/pte/pte_init.c | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/src/mm/paging/pte/pte_init.c b/src/mm/paging/pte/pte_init.c
index 48cade8..78b9451 100644
--- a/src/mm/paging/pte/pte_init.c
+++ b/src/mm/paging/pte/pte_init.c
@@ -130,11 +130,11 @@ x86_pte_set_entry(struct pte_shadow* pte, idx_t idx, void* phys)
                 return -E_NULL_PTR;
         if (idx > PTE_SIZE)
                 return -E_INVALID_ARG;
-
         if (pte->pte == NULL)
                 return -E_NULL_PTR;

         pte->pte->entry[idx].pageIdx = (addr_t)phys >> PTE_OFFSET;
+        pte->pte->entry[idx].present = TRUE;

         return -E_SUCCESS;
 }
@@ -229,6 +229,7 @@ void pte_dump_tree(struct pte_shadow* pte, char* prefix, int depth)
         if (pte == NULL)
                 return;

+        demand_key();
         int i = 0;
         for (; i < PTE_SIZE; i++)
         {
@@ -239,17 +240,12 @@ void pte_dump_tree(struct pte_shadow* pte, char* prefix, int depth)
                 if (depth == PTE_DEEP-1)
                         continue;

-                char* pref = kalloc(255);
-                if (pref == NULL)
-                        return;
+                char pref[255];
                 memset(pref, 0, 255);

                 sprintf(pref, "%s%X-", prefix, i);
                 pte_dump_tree(pte->children[i], pref, depth+1);
-
-                kfree(pref);
         }
-        demand_key();
 }

 int pte_test()

Tools

As mentioned before, there are a couple of tools available to do the work described here. To name a few of the centralised tools:

  • CVS and
  • SVN

and to name a couple of the distributed tools:

  • Mercurial and
  • Git.

Only the two most popular have been described on this site, namely SVN and Git.

Common mistakes

Here are a couple of common mistakes you should avoid while working with these version management tools.

  1. Putting confidential data into a public repository. It might seem easy to get this data out again, but the entire purpose of source control systems is to preserve content, which can make it surprisingly difficult to remove any kind of data.
  2. Making the change sets between commits too large. In centralised systems, whenever there's an update from upstream, it has to be merged into the local source tree before the local changes can be published to the upstream location. This kind of discourages people from making commits, but it is better to keep the commits as small as possible, as that keeps the chances of merge conflicts as small as possible. Keep the commits small, and commit often, and the problem of merge conflicts will be as easy to deal with as it can get. Another reason to keep the change sets as small as possible is that it makes bug finding easier, by bisecting the history.
  3. Copying the source tree to another location, only to later copy the files themselves back into the repository. In theory there is nothing wrong with this approach, but in practice people tend to first update the local repository and then copy their old files over the freshly updated ones before committing. This results in an absence of merge conflicts, but it silently reverts the changes that the updates made to those files. Always make sure the local changes are present in the local repository before updating, to prevent this undesired roll-back of the changes in the updates. An easier way is to just work on the files in the local repository directly (this prevents a lot of frustration and headaches).

Wrapping it up

So a version is the final result of an iteration or a snapshot from a given moment. Managing those versions is desirable so there is consistency in the distribution of the product downstream.

Numbers can be assigned in a couple of ways, the most popular being the iteration count and the snapshot count; code names are a nice asset for making it a bit easier to talk about a specific version.

Development requires all change sets to be preserved, which can be done using source control systems, and maintaining multiple versions should be done by using maintenance branches.

Centralised systems are used in top-down environments, whereas distributed is used in peer-to-peer environments.

For further reading, I’d suggest the following wikipedia pages:

http://en.wikipedia.org/wiki/Versioning

http://en.wikipedia.org/wiki/Version_control

http://en.wikipedia.org/wiki/Release_management

So that’s it, and don’t forget to be awesome!

Synchronising

Now that we have established our concept of references, tags, branches and commits, we can continue with keeping up to date with upstream and publishing our work.

There are several ways of synchronising, the first being simply letting git fetch the changes for you and, while it is at it, merge them in (the git pull command). Another option is to apply patches, obtained through e-mail or otherwise (although e-mail is the most likely candidate for this).

Upstream

We will kick off with staying up to date with upstream. Here we assume that upstream has a public repository, and that the reference to it is called origin. We also assume upstream can be trusted not to introduce any strange things into the repository. So here we go:

git fetch origin

Congratulations, you now have the updates in your tree. The updates are on your system and in your repository, but they are still useless to you, as your working branch hasn't changed. If all that's needed is the latest version, to compile from source, it is probably easier to use git pull, which essentially is nothing more than a "git fetch" and a "git merge" combined, so it also updates your working branch. The syntax is simple:

git fetch [remote]
git pull [remote] [branch]

Noteworthy is the fact that git pull requires a second operand, namely the branch to pull. As git doesn't discriminate between branches, the branch to pull has to be selected, in the same way it would be if the branch were merged into the local branch manually.

There are some things to consider before merging in updates from upstream while on a development branch. Development branches should only be updated when relevant changes have been made in the upstream branches. This means that development branches rarely get updates.

Publishing through a public repository

In the first article, on how to get started, setting up a server was discussed. Now it is time to start using it. The workflow in git based projects generally is as follows.

There is a main repository, which is maintained by the project leader, who has lieutenants to keep track of the different subsystems of the project. Each of those lieutenants has their own public server, from which the project leader pulls the updates. The lieutenants in turn either pull their updates from the public repositories owned by the developers, or take them in through patches.

The easiest way is to publish your work on a public server and send the lieutenant or project leader a pull request. Although github has its own style of doing this, most projects appreciate the nicely formatted e-mail composed by the git request-pull command.

First things first though: publishing to the public repository. If the refs have been set up, this should be a fairly straightforward procedure using the "git push" command. If not, set up your refs first, and then come back.

Are you ready? Ok, here we go.

 git push [remote]

As I said, pretty straightforward. This command pushes all branches to the public repository. It is understandable that not all branches may be desired to be pushed out, so there is the option to publish only a few branches. This is done by appending the branch names to the command above.
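
For example, to publish only the branches master and pte (assuming those branch names exist in your repository) to the origin remote:

git push origin master pte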

Depending on the way things are set up, the public repository may request a password, or use an ssh key file on your system, when updating. Once this has been done, the public repository contains your updates and you will be able to let other people get those changes from your public repository.

As mentioned, asking people to get your changes usually goes by means of a pull request. Github has a really simple interface for this, and for most projects this is good enough, but projects that require some sort of consistency in the pull requests generally format an e-mail using the git request-pull command, using the following syntax:

git request-pull [start commit] [URL]

This command requires the start commit. This can be a branch, a tag or even a loose commit. The URL should point to the location of your public repository; if a reference is given instead, it will be replaced by its URL in the output, so the other side can use it. An example of how this looks is as follows:

bemk@bemkdeb:~/devel/OSdev/andromeda/$ git request-pull testing home pte
The following changes since commit 3cbf96baa0f4a1113308db4492d848ee3dd16bb0:
        mm: pte: Fix bug in pte_map (2012-06-13 00:00:21 +0200)
are available in the git repository at:
        git://bemk.dyndns.info/pub/andromeda.git pte
for you to fetch changes up to 02cf7035c27c47c9240107be9a3a253096a2f9f7:
        mm: pte: Purge debug symbol from production (2012-06-13 00:07:57 +0200)
----------------------------------------------------------------
Bart Kuivenhoven (1):
        mm: pte: Purge debug symbol from production
 src/mm/paging/pte/pte_init.c | 2 ++
 1 file changed, 2 insertions(+)

For most projects, if you start the e-mail with who you are and a quick human-readable explanation of what you've done, and then put in the output of this command, the pull request will be taken more seriously.

Publishing through patches

Another common form of publishing your work is through patches. Git provides a nice framework for this using the following commands.

git format-patch
git send-email
git am
git apply

How to configure git send-email is slightly beyond the scope of this article, as it involves setting up e-mail server configurations. Fear not though: if you so desire, there are plenty of resources out there on the internet. All it takes is a quick google search and a bit of cursing at your mail configuration.

The "git format-patch [starting commit]" command is used to generate patch files. A patch file is a file that outlines the changes made in a specific commit. These patch files can be used as e-mail bodies, as is usually expected by communities communicating on mailing lists.

In order to use the format-patch command, a starting commit has to be indicated. Format-patch will then go through all subsequent commits, until it reaches the current commit, and format a patch for each and every one of them.
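
For example, to generate one patch file per commit for everything on the current branch that is not yet in origin/master (the remote and branch names are just an example), one could run:

git format-patch origin/master

This writes numbered files such as 0001-<subject>.patch into the current directory.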

When inserting the patch into an e-mail manually, take great care not to alter the file, as that may render it useless. The Linux kernel documentation directory has some notes on how to do this.

Alternatively the git send-email command can be used to send the patch out to the mailing list. This is significantly easier to do once send-email has been set up.
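
As an illustration (the list address is of course made up), sending the patches generated above could look like this:

git send-email --to=project-devel@example.com 0001-*.patch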

Sending the patch as an attachment to the e-mail is also possible. Several projects will reject your patch because attached patches are harder to read and review, but when setting up a private project with friends, this will be good enough.

“git apply” takes at least one argument in the following form.

git apply [patch]

This will apply the patch but not yet commit it, and it is probably the simplest command we've covered so far. It takes some other arguments, but yet again, if you want to figure out what those are, that's what the internet is for.

Because committing the changes is not part of the git apply command, the authoring information is lost with this command. Preferably you should attempt to maintain this authoring information, so when committing, use --author="insert name here <insert e-mail here>". Give credit where credit is due, if only to maintain legal compliance.
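
A small sketch of what that could look like (the patch file name, author name and e-mail address are placeholders):

git apply fix-null-check.patch
git add -A
git commit --author="Jane Doe <jane@example.com>" -m "Fix NULL pointer check"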

If you have a patch that you want to simply apply and commit, then the "git am" command is your friend. Am stands for "apply mailbox", and it takes the patch file or files as its argument. It will apply the patches and commit them immediately. This sets the person who applied the patches as the committer, but in another field git remembers the author of the patch.

This is as far as the tutorial will take us. Look at the table of contents page for links to other interesting resources. Also take a look at the appendix to get to know some best practices.

Keeping track of servers

Introduction

When synchronising with the servers which host the git repository, it is quite a nuisance to have to remember and type the URIs every single time. To solve that problem git has a feature called references, or refs for short.

References can be managed using the git remote command. In this article we'll only be covering the adding and removing of refs. There are many more possibilities, but since we're trying to keep this simple, we won't go into those (Google is your friend here).

The most interesting options for git remote are:

git remote add [remote name] [URI]
git remote rm [remote name]
git remote rename [old name] [new name]

In the first article we covered adding the origin reference, but didn't really go into it. Now we'll be touching on some of the abilities of the refs.

Origin

The origin reference is basically a reference like any other. Git doesn't discriminate between refs, in the same way it doesn't discriminate between branches. The only reason the origin reference is mentioned is because it is convention to name the reference to your main repository this way. As a matter of fact, when git clone is used, the cloned URI is put into the origin reference by default.

git remote add

The git remote add command is used to add references, as we’ve seen in the first article on git. Git remote add has the following syntax:

git remote add [reference] [URI]

In this instance, URI stands for Uniform Resource Identifier, the address of the remote repository that is used. A couple of examples of these URIs are:

git://orion-os.eu/pub/andromeda.git
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
https://github.com/git/git.git

What the URI looks like is determined by the layout of the server. The server administrator should be able to give you the URI, for example, github has it on its site, and kernel.org has it in the repository browser.

The other options, such as rm and rename, work in a very similar fashion. Because of this we won't go any deeper into them. For those of you who don't have a lot of experience with the command line: rm means remove. Knowing that, git remote rm [reference] starts speaking for itself.
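
To illustrate the whole set at once (the "home" name and the URI are borrowed from the earlier examples and are not special in any way):

git remote add home git://bemk.dyndns.info/pub/andromeda.git
git remote rename home mirror
git remote rm mirror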

The next chapter tells us how to actually publish the work done. The table of contents can be found here.