Spitter registration

We need to provide a way to let people register to our spittr web application, this is done here relying on the support offered by the Spring framework, that easily support the forms, giving a natural way for processing them.

The Model

A registered user is defined as a Spitter:
public class Spitter {

    private Long id;

    private String username;
    private String password;
    private String firstName;
    private String lastName;
    private String email;
    // ...
}
We'll need to check for equality on different Spitters, so we need to override both equals() and hashCode(). I did it using Objects.equals() and Objects.hash() in their implementions. I feel they helps to keep the code compact and readable without using external libraries.

We'll have a repository where storing our spitters, but we don't need to fix its type here, just the interface that each spitter repository has to implements:
public interface SpitterRepository {
    Spitter save(Spitter spitter);
    Spitter findByUsername(String username);
}
Actually, just to do some smoke testing, it is nice to have a concrete repository to play with:
@Repository
public class JdbcSpitterRepository implements SpitterRepository {
    @Override
    public Spitter save(Spitter spitter) {
        // TODO: actual implementation
        return spitter;
    }

    @Override
    public Spitter findByUsername(String username) {
        // TODO: actual implementation
        return new Spitter(username, "password", "John", "Doe", "jdoe@somesite.dd");
    }
}
I assumed it will be refactored in future to become a real JDBC repository, hence its name. Notice that the class is annotated as Spring repository.

Data Configuration

Having introduced a new repository in the web app, we should say to Spring how to get it. To do that I added a bean (in the sense of Spring Bean) in the already existing DataConfig java class, annotated as Spring configuration.
@Configuration
public class DataConfig {
    // ...

    @Bean
    public SpitterRepository spitterRepository() {
        return new JdbcSpitterRepository();
    }
}
The View

The user enters data filling a simple HTML form, something like that:
<form method="POST">
    First Name: <input type="text" name="firstName" /><br />
    Last Name: <input type="text" name="lastName" /><br />
    Email: <input type="email" name="email" /><br />
    Username: <input type="text" name="username" /><br />
    Password: <input type="password" name="password" /><br />
    <input type="submit" value="Register" />
</form>
Being no action attribute specified in the form tag, the post is done referring to the same URL accessed originally. Not much more to say on this part, you could see the full registerForm.jsp file on GitHub.

We'll also have a way to show a registration, accessing profile.jsp. Our Spring web app should take care of putting in a variable named spitter, an instance of the above discussed Spitter class, that represents the currently registered user. So, to display it, using JSP EL, we'll write something like:
<c:out value="${spitter.username}" /><br />
<c:out value="${spitter.firstName}" />
<c:out value="${spitter.lastName}" /><br />
<c:out value="${spitter.email}" />
The Controller

As usual, the controller keeps model and view together. The Spitter controller reacts to calls based in "/spitter". We want it to convert a call to "/spitter/register" showing the register form view or redirecting it to /spitter/xxx, where xxx is the username, accordingly to the HTTP command used. GET is for the first one, POST for the second. Any "/spitter/xxx" GET call will drive to the profile view.
@Controller
@RequestMapping("/spitter")
public class SpitterController {  // 1
    private SpitterRepository spitterRepository;  // 2

    public SpitterController(SpitterRepository spitterRepository) {
        this.spitterRepository = spitterRepository;
    }

    @RequestMapping(value = "/register", method = GET)  // 3
    public String showRegistrationForm() {
        return "registerForm";
    }

    @RequestMapping(value = "/register", method = POST)  // 4
    public String processRegistration(Spitter spitter) {
        spitterRepository.save(spitter);
        return "redirect:/spitter/" + spitter.getUsername();
    }

    @RequestMapping(value = "/{username}", method = GET)  // 5
    public String showSpitterProfile(@PathVariable String username, Model model) {  // 6
        Spitter spitter = spitterRepository.findByUsername(username);  // 7
        model.addAttribute(spitter);
        return "profile";
    }
}
1. This class is annotated as Spring controller, and it maps calls rooted in "/spitter".
2. We are going to use a repository that has to implement the SpitterRepository.
3. Maps a GET to "/spitter/register" to the registerForm view.
4. Redirects a POST to "/spitter/register" to the "/spitter/xxx" view. A spitter is expected as parameter, and it is going to be saved to the repository before the redirect is performed.
5. Maps a GET to "/spitter/xxx" to the profile view.
6. The username is extracted from the path by the PathVariable Spring binding annotation.
7. The repository is asked to give the spitter associated with the username, then we push it to the model.

Testing

The controller design has been driven by a couple of mock JUnit tests, using Mockito, defined in the SpitterControllerTest class. Besides, having provided a (fake) concrete implementation for the spitter repository, it is possible to run the web app and see its general behavior.

So, GETting "/spitter/register", we'll see the input form:
As soon as we push the Register button, a POST will be generated for the same URL. As seen above, the controller will redirect to "/spitter/jdoe"

So. Our app works. At least in a sort of demo mode. It is easy to spot how weak is, however. We are going to improve its robustness in the next post.

Reference: Processing forms, from Spring in Action, Fourth Edition by Craig Walls. Chapter five, section four.

Full Java Spring code available on GitHub.

Go to the full post

Showing a single Spittle

In our Spring Spitter Web App we want to display any single Spittle, just passing its id. This could be easily done accepting a request in the form of "spittles/show?id=42", following the style we have just used for getting a list of spittles, but we don't like it that much here, since we feel that "spittles/42" will be more expressive of the access to a resource.

Good for us, the Spring Framework has an handy approach to this problem.

Stripping down the view to the limit, what we want to get is something like this:
I have get it through a JSP page with JSTL tags, spittle.jsp, accessing an attribute named spittle that contains a Spittle object. Besides that minimal JSP, I have added a mock test to ensure the new functionality works fine. I have not much to say about them, and anyway you can see them on GitHub.

Let's see instead the new method added to SpittleController.
@RequestMapping(value = "/{spittleId}", method = RequestMethod.GET)
public String spittle(@PathVariable long spittleId, Model model) {
    model.addAttribute(spittleRepository.findOne(spittleId));
    return "spittle";
}
The spittle() method returns a string, the name of the view that should be generated for the caller, and expects two parameters, the spittle id and the model where we are going to push the Spittle object.
Notice that its first parameter is annotated as PathVariable, and notice also as the method is annotated as RequestMapping, with value set to a path. the path is built using curly brackets, signalling to Spring what is a variable that has to be extracted and used as parameter.
Being the class itself annotated as RequestMapping on "/spittles", the Spring framework is smart enough to understand what we want to do here.

Reference: Taking input via path parameters, from Spring in Action, Fourth Edition by Craig Walls. Chapter five, section 3.2.

You can find the code I have written on GitHub.

Go to the full post

Showing a paged list of spittles

As next step in our little Spring Web App, we are going to add in the view a way to display spittles specifying the "end" id, using the standard convention that it is excluded from the interval, and how many of them we want to show.

To do that, I firstly change the SpittleController so that its method mapped to the GET request method would accept the two parameters, giving them default values so that the previous functionality still works.
@RequestMapping(method = RequestMethod.GET)
public List<Spittle> spittles(
        @RequestParam(value = "max", defaultValue = MAX_ID) long max,
        @RequestParam(value = "count", defaultValue = DEFAULT_COUNT) int count) {
    return spittleRepository.findSpittles(max, count);
}
Actually, this version looks quite different from the previous one. The return value here is a List of Spittles, the one that was put in the spittleList model, and so we should wonder how Spring could guess which JSP call to generate a view.

However, internally, they are equivalent - if we don't consider how findSpittles() is called. The job of generating the JSP page and the attribute name is done by the mighty Spring framework, that understands that the first should be extracted from the URI passed as string to the class RequestMapping annotation, "/spittles", stripping the leading slash, and builds the latter from the method return type.

Just a minor nuisance. As the Java compiler states, "The value for annotation attribute RequestParam.defaultValue must be a constant expression". So I can't define MAX_ID converting on the fly the Long.MAX_VALUE to a String:
private static final String MAX_ID = Long.toString(Long.MAX_VALUE); // WON'T WORK!
I have to operate the conversion "by hand", instead.
private static final String MAX_ID = "9223372036854775807"; // Boring!
Besides the already existing shouldShowRecentSpittles() test, that should still work, I added a new shouldShowPagedSpittles() test that is very close to the other one but passes parameters to the (mock) call to the controller. See the tester on GitHub for more details.

Given the fake implementation I gave to the JdbcSpittleRepository, I can also see a result from the browser.
Reference: Accepting request input, from Spring in Action, Fourth Edition by Craig Walls. Chapter five, section three.

Full code for this Spring project is on GitHub.

Go to the full post

From Model to View through a Spring Controller

Another improvement to our little Spittr Spring Web App. We'll fake a JDBC model, create a controller that would get some data from it and pass it to the view, a JSP page that uses EL to access the attribute.

In the end we want to display to the user a very simple web page like this one:
enter in the details of the JSP page, you can find it in my GitHub repository, I just mention the fact that the controller puts a list of Spittles in an attribute named spittleList, and the page reads it by EL looping through a core forEach JSTL tag.

The data is moved around in a POJO, spittr.Spittle, that contains the data identifying a spittle. There is not much to say about it, besides it has its own equals() and hashCode() implementation, that are respectively based on the equals() and hash() methods from the utility class Objects:
// ...

public class Spittle {
    private final Long id;
    private final String message;
    private final LocalDateTime time;
    private Double latitude;
    private Double longitude;

 // ...

    @Override
    final public boolean equals(Object that) {  // 1
        if(this == that) {
            return true;
        }
        if (!(that instanceof Spittle))
            return false;

        Spittle other = (Spittle) that;
        return this.id == other.id && Objects.equals(this.time, other.time);  // 2
    }

    @Override
    public int hashCode() {
        return Objects.hash(id, time);  // 3
    }
}
1. I do not expect the class Spittle to be extended. In case a subclass would be created, it's implementation won't modify the assumption that two spittles are considered equals if they have same id and creation timestamp, as stated in this equals() method. To enforce it, I mark it as final.
2. Using Objects.equals() saves me from the nuisance of checking the time component against null, keeping the code more readable.
3. Since id and time define Spittle equality, it makes sense using them to determine the hash code.

I have written a test case to ensure the Spittle class works as I expected. There is nothing very interesting in it and, as usual, you can look it on GitHub.

The model will be based on the interface spittr.data.SpittleRepository:
public interface SpittleRepository {
    List<Spittle> findRecentSpittles();

    List<Spittle> findSpittles(long max, int count);

    Spittle findOne(long id);

    void save(Spittle spittle);
}
The idea is creating a JDBC repository, for the moment I have faked it. Here is its findSpittles() method:
@Repository
public class JdbcSpittleRepository implements SpittleRepository {
    // ...
    @Override
    public List<Spittle> findSpittles(long max, int count) {
        List<Spittle> result = new ArrayList<>();
        for (long i = 0; i < count; i++) {
            result.add(new Spittle(i, "Message " + i, LocalDateTime.now(), 1.0 * i, 2.0 * i));
        }

        return result;
    }
    // ...
}
The repository interface is used in the SpittleController to find spittles an put them as attribute to be consumed by the view.
@Controller
@RequestMapping("/spittles")
public class SpittleController {
    private SpittleRepository spittleRepository;

    @Autowired
    public SpittleController(SpittleRepository spittleRepository) {
        this.spittleRepository = spittleRepository;
    }

    @RequestMapping(method = RequestMethod.GET)
    public String spittles(Model model) {
        model.addAttribute("spittleList", spittleRepository.findSpittles(Long.MAX_VALUE, 20));
        return "spittles";
    }
}
I use the concrete repository to see what happens when the app actually runs on my tomcat server, however, the real testing is performed on the interface, using the mocking features included in Spring integrated to the mockito framework. For this reason I added this dependence to the application POM.
<dependency>
    <groupId>org.mockito</groupId>
    <artifactId>mockito-core</artifactId>
    <version>${mokito.version}</version>
    <scope>test</scope>
</dependency>
I also have to tell Spring where the repository is. For this reason I created a new configuration class, DataConfig, that simply returns the JdbcSpittleRepository as a bean. This configuration class is called by the already existing RootConfig, by importing it.
@Configuration
@Import(DataConfig.class)
public class RootConfig {
}
The web app works fine both running it "by hand" and through the mock test. So I pushed the code on GitHub.

Reference: Passing model data to the view, from Spring in Action, Fourth Edition by Craig Walls. Chapter five, section 2.3.

Go to the full post

Mock test for a Spring controller

Having built a simple Spring Web App, now I want to test its only controller. In the previous post I have done it "by hand", running a browser on its url and verifying that I actually get the expected page, now I use the mock testing support provided by the Spring framework.

Firstly I add to the pom file the Spring test dependency:
<properties>
 <!-- ... -->
 <spring-test.version>4.3.0.RELEASE</spring-test.version>
 <!-- ... -->
</properties>
<dependencies>
 <!-- ... -->
 <dependency>
  <groupId>org.springframework</groupId>
  <artifactId>spring-test</artifactId>
  <version>${spring-test.version}</version>
  <scope>test</scope>
 </dependency>
 <!-- ... -->
</dependencies>
Then I write a JUnit tester for my HomeController, following a simple pattern:
@Test
public void testHome() {
    HomeController controller = new HomeController();  // 1
    try {
        MockMvc mockMvc = standaloneSetup(controller).build();  // 2
        mockMvc.perform(get("/")).andExpect(view().name("home"));  // 3
    } catch(Throwable t) {  // 4
        // ...
    }
}
1. I create an object from my controller.
2. I build a mock MVC test object from my controller.
3. I perform a mock HTTP GET command for "/" and verify I get back a view named "home".
4. There is not much control on what a MockMvc would throw in case something goes wrong, we have to be ready to catch any possible throwable.

It looks very clean and simple. However I had a problem in the form of this exception:
java.lang.NoClassDefFoundError: javax/servlet/SessionCookieConfig at
 org.springframework.test.web.servlet.setup.StandaloneMockMvcBuilder.
  initWebAppContext(StandaloneMockMvcBuilder.java:329)
...
The fact is that I specified in the POM a dependency for the ancient javax.servlet servlet-api 2.5, while I should use at least a version 3 to get the job done. For my purposes, 3.1.0 is more than enough.
Pay attention to the artifact name change that happened right at the time of moving from 2 to 3. Now the servlet api is identified by the id javax.servlet-api.

Let's slightly improve my web app. We want now that the controller returns the home view while getting both root and "/homepage".

I add another test case, testHomeExtra(), that is just like testHome() but it performs a GET on the other URI.
As expected, it fails, giving as reason, "No ModelAndView found".

I change my controller, specifying at class level which URI's it has to manage:
@Controller
@RequestMapping({"/", "/homepage"})
public class HomeController {
    @RequestMapping(method = GET)
    public String home(Model model) {
        return "home";
    }
}
And now I have green light while testing.
Reference: Building Spring web applications, from Spring in Action, Fourth Edition by Craig Walls. Chapter five, section two, Testing the controller.

I have committed the project changes to GitHub.

Go to the full post

Spittr - a Spring Web Application

I am going to create a Spring Web Application using as IDE the Spring Tool Suite, friendly known as STS, version 3.8.4, without the Spring Boot support. To do that, we can theoretically use the wizard named Spring Legacy Project, but the support from Pivotal is not so strong as I expected and the result needs so many adjustments that in the end I opted to use the basic Maven Project wizard instead.

I selected the archetype with artifact id maven-archetype-webapp from the group id org.apache.maven.archetypes, gave a group and artifact id to my app, and let the wizard go.

Annoyingly, I got an error and a warning:
The superclass "javax.servlet.http.HttpServlet" was not found on the Java Build Path means that the generated POM lacks that dependency. Actually, opening the POM, I see that it lacks almost any dependency, and the only one present, JUnit, refers to the old 3.8.1 version.
The warning points out to the fact that the wizard was designed to support the ancient Java 5, and we'd probably better adjust the project also in this area.

I changed the POM adding these properties and dependencies:
<properties>
 <java.version>1.8</java.version>
 <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
 <jsp.version>2.2</jsp.version>
 <jstl.version>1.2</jstl.version>
 <servlet.version>2.5</servlet.version>
 <spring-framework.version>4.3.7.RELEASE</spring-framework.version>
 <junit.version>4.12</junit.version>
</properties>

<dependencies>
 <dependency>
  <groupId>org.springframework</groupId>
  <artifactId>spring-webmvc</artifactId>
  <version>${spring-framework.version}</version>
 </dependency>
 <dependency>
  <groupId>javax.servlet</groupId>
  <artifactId>jstl</artifactId>
  <version>${jstl.version}</version>
 </dependency>
 <dependency>
  <groupId>javax.servlet</groupId>
  <artifactId>servlet-api</artifactId>
  <version>${servlet.version}</version>
  <scope>provided</scope>
 </dependency>
 <dependency>
  <groupId>javax.servlet.jsp</groupId>
  <artifactId>jsp-api</artifactId>
  <version>${jsp.version}</version>
  <scope>provided</scope>
 </dependency>
 <dependency>
  <groupId>junit</groupId>
  <artifactId>junit</artifactId>
  <version>${junit.version}</version>
  <scope>test</scope>
 </dependency>
</dependencies>
To get rid of the warning, I opened the Project Properties and in the page Project Facets I set Java to 1.8.
The wizard created an index.jsp page in the webapp folder. I removed it and I created instead a home.jsp in the WEB-INF views subfolder. This JSP page uses the JSTL core library and a style sheet named style.css from the resources folder. All of these details are not much relevant with the job of creating the web app, so I won't say more on them. For more reference see the full source code on GitHub.

More interestingly, let see the dispatching stream of our application. We are going to use the MVC Spring structure, at the center of it there is the DispatcherServlet that decides how to manage the user requests to generate appropriate responses. From our side, we provide a Java class to configure it. The job is simplified by extending AbstractAnnotationConfigDispatcherServletInitializer and a couple of configuration classes. Our class would be automatically seen by Spring and used to configure the DispatcherServlet.

At the end of all this dispatching and configuring, we'll have a tiny web app that would have just the home.jsp as available view.
Here is the project structure:

Our initializer should override three methods from its super class:
public class SpitterWebInitializer extends AbstractAnnotationConfigDispatcherServletInitializer {
    @Override
    protected String[] getServletMappings() {  // 1
        return new String[] { "/" };
    }

    @Override
    protected Class[] getRootConfigClasses() {  // 2
        return new Class[] { RootConfig.class };
    }

    @Override
    protected Class[] getServletConfigClasses() {  // 3
        return new Class[] { WebConfig.class };
    }
}
1. It returns the paths that the DispatcherServlet will be mapped to. We are mapping to the root directory, handling all the requests to our app.
2. It returns the configuration classes for the application context created by the ContextLoaderListener. You would usually find here middle and data tier components. In this trivial version of our app, there will be nothing in here. I still delegate to RootConfig, even if it won't do anything, to keep the code clean. More brutally, I could have let this method return null.
3. This is about the standard DispatcherServlet configuration. Controllers, view resolvers and handler mappings are returned by this method.

Let's see our WebConfig:
@Configuration  // 1
@EnableWebMvc  // 2
@ComponentScan("spittr.web")  // 3
public class WebConfig extends WebMvcConfigurerAdapter {  // 4
    @Bean
    public ViewResolver viewResolver() {  // 5
        InternalResourceViewResolver resolver = new InternalResourceViewResolver();
        resolver.setPrefix("/WEB-INF/views/");
        resolver.setSuffix(".jsp");
        return resolver;
    }

    @Override
    public void configureDefaultServletHandling(DefaultServletHandlerConfigurer configurer) {
        configurer.enable();  // 6
    }
}
1. It is a Spring configuration class.
2. This is a typical way to say that our app should enable the Spring MVC.
3. Ask Spring to enable component scanning from the passed package.
4. We extends our configuring class from this configurer adapter provided by Spring
5. We create a very simple view resolver that would set a prefix, the /WEB-INF/views/ folder, and a suffix, the jsp extension. In this way, a view named "home" would be resolved to "/WEB-INF/views/home.jsp".
6. We say to the DispatcherServlet that it should delegate to static resources the requests it gets.

Then I created a controller in the spittr.web package, so to actually provide something to do to WebConfig:
@Controller
public class HomeController {
    @RequestMapping(value = "/", method = GET)
    public String home(Model model) {
        return "home";
    }
}
Very simple, it just maps the GET requests to root returning the string "home".

Now we can Run on Server our app for the first time.
Reference: Building Spring web applications, from Spring in Action, Fourth Edition by Craig Walls. Chapter five, section one, Getting started with Spring MVC.
Full code is available on GitHub.

Go to the full post

CodeEval Following Integer

Given a number, we should return its follower, accordingly to a weird rule explained on the CodeEval page of this problem. In a few words, we can only use the digits in the number itself, but we can add a zero if we need to.

The provided samples, that I converted to Python tests, give a better idea of our task:
def test_provided_1(self):
    self.assertEqual(151, solution('115'))

def test_provided_2(self):
    self.assertEqual(2048, solution('842'))

def test_provided_3(self):
    self.assertEqual(80000, solution('8000'))
115 is followed by 151. All the other permutations of the three digits are higher than that.
There is no permutation of 842 that is bigger than it. So we need to add a zero, getting 2048.
Same goes for 8000, we have to add a zero getting 80000.

I felt it was too complicated to check the number and trying to build its follower considering its digits, and I went instead for an almost brute force approach that comes naturally from watching the test case. Check the permutations, and see which one is the next one. If there is no next one, add a zero to the number in second position and return it.

I implemented this algorithm in Python3 adding just a tiny variation. First I generate the number with an added zero and then I check all the permutations. The reason for this inversion should be clear following the code.
digits = []
for c in line:  # 1
    if c != '0':
        insort(digits, c)  # 2
while len(digits) <= len(line):  # 3
    digits.insert(1, '0')
result = int(''.join(digits))
1. Loop on the digits of the passed number, stored in the string named line.
2. I store all the digits but the zeros, sorted in natural order, in the buffer list named digits. The insort function comes from the bisect package.
3. Then I push in the buffer all the required zeros after the first, lowest, significative digit. In this way I get the smaller possible number having one cipher more than the passed one.

Now I am ready to loop on all the permutations for the passed number:
current = int(line)
for item in permutations(line):
    candidate = int(''.join(item))  # 1
    if result > candidate > current:  # 2
        result = candidate
1. I get a permutation, I convert it to an integer. See the itertools library for details.
2. If the current candidate is less than the tentative solution and it is bigger that the number passed in input, I discard the previous calculate result and keep the current one.

At the end of the loop I have in result my solution.

I feared that this intuitive algorithm was too expensive. Fortunately, I was wrong. It has been accepted with good points. Test case and python3 script are on GitHub.

Go to the full post

CodeEval Self Describing Numbers

Tell if the number passed in is self-describing, returning 1 in case of success. That number, also known as self-descriptive number, is such that it has in each digit, starting from the leftmost, the number of that digit in the number itself. If this definition looks confusing to you, you are not alone. Have a look to the CodeEval page for this problem, or to the tests here below.

Based on the provided samples, I wrote these tests:
def test_provided_1(self):
    self.assertEqual(1, solution('2020'))  # 1

def test_provided_2(self):
    self.assertEqual(0, solution('22'))  # 2

def test_provided_3(self):
    self.assertEqual(1, solution('1210'))  # 3
1. '2020' has two 0's, no 1's, two 2's, zero 3's. So it is self describing.
2. '22' has two 2's. To be self-describing it should have two 0's and two 1's. So it is not.
3. One 0's, two 1's, one 2's, zero 3's. Yes, it is.

Got it? we say it is self-describing if all its digits have the value of the respective digits counted in the number.

We could write very compact Python solutions to this problem, almost one-liners, still easy to undestand.

First part, we need to count the digits in the input string. The Counter class in collections is designed right for this task:
from collections import Counter

counter = Counter(map(int, line))
I take the input line (that I name "line"), I map its elements to int, and I use them to initialize an object of the class Counter.

Then I use the built-in all to check all the digit in the range from zero to the string size (right open interval, as python standard). If they are all the same value of the counter for the respective digit, I return 1:
return int(all(counter[i] == int(line[i]) for i in range(len(line))))
Simple and convincing, isn't it?

The complete python3 script and unit test is on GitHub.

Go to the full post

CodeEval Happy Numbers

Check if a number is happy. Meaning that, adding up the squares of its digits we get one, or a number that, following the same recursice procedure, leads to one. A better description is on CodeEval. Now I see that I have already solved this problem a few years ago using C++ as implementation language. This time I used Python, and I have to say that the code looks more readable to me.

The core of the procedure to check if a number is happy requires to add on all its squared up digits. Thinking in a pythonic way, I have seen it as summing all the elements resulting from a custom iteration. I focused on the iterative part of this sub-problem, creating a generator that does the job:
def squared_ciphers(number):
    while number:
        number, cipher = divmod(number, 10)  # 1
        yield cipher ** 2  # 2
1. The handy built-in function divmod() returns the quotient and the remainder of the integer division by its passed parameters. I use it to get the rightmost digit in cipher removing it from number.
2. Each digit is squared and returned.

Now I am ready for the main part of the script. I am about to loop indefinitely until I see the number is happy or sad. Luckly we know that this loop is not infinite. See wikipedia for details.
def solution(line):
    candidate = int(line)
    explored = set()
    while candidate != 1:  # 1
        if candidate in explored:  # 2
            return 0
        explored.add(candidate)
        candidate = sum(squared_ciphers(candidate))  # 3
    return 1
1. If I get a 1, the originating number is happy. I can stop looping.
2. I store all the generate numbers in a set, so that I can easily check if I have already seen the current element. If so, I know I am entering a deadly loop, and so I can safely state the checked number is not happy.
3. I need to generate a new number from the current one. I call the generator to get all its squared ciphers, and I send the results to the sum() built-in function, so to get it.

Happy or not, this script nails it. You can get the full Python3 source from GitHub.

Go to the full post

CodeEval String List

In this CodeEval problem we are given a number n and a string. We have to return, as a comma separated list, the sorted collection of all the possible words sized n from the letters in the string.

I have converted the given samples in python tests:
def test_provided_1(self):
    self.assertEqual('a', solution('1,aa'))

def test_provided_2(self):
    self.assertEqual('aa,ab,ba,bb', solution('2,ab'))

def test_provided_3(self):
    self.assertEqual('ooo,oop,opo,opp,poo,pop,ppo,ppp', solution('3,pop'))
And then I spent some time thinking on them.

The first one remembers us that we need to keep only the unique letters in the string. So I would probably use a set to clean it up from repetitions.
The second one gives away a very important clue. The result is obviously the cartesian product between the first and the second letter. Since I am using Python as implementation language, I thought immediately to itertools product().
The third one tells me not to forget to sort the results. So set is useful as intermediate passage, but then I should put the items in a list, so that I can sort it. Besides, it shows me how I have to proceed, iterating on each partial solution applying each time the cartesian product between the previous result and the initial collection of letters.

Well. It looks quite easy now. It just a matter of writing the code.
def solution(line):
    n, data = line.split(',')
    letters = sorted(list(set(data)))  # 1
    result = letters  # 2
    for i in range(int(n)-1):  # 3
        result = ['{}{}'.format(x[0], x[1]) for x in product(result, letters)]  # 4
    return ','.join(result)
1. I use the set collection to remove duplicates, then I convert it to a list, and I sort it.
2. I need to keep the original letter collection aside, so that I can use it as a factor in the cartesian product, so I copy it in another list.
3. If n is 1, I don't have to do anything more, the plain list of single letters has been already generated. Otherwise have have to apply n-1 times the cartesian product.
4. The itertools product() function returns a tuple, I convert it to a plain string, and I push each result in a new list that overwrite the previous result.

There is not much left out to see, however the full code for the test case and python3 script is on GitHub.

Go to the full post

CodeEval Message Decoding

We are given in input an encoded message, and we have to return it decoded. See the CodeEval page for this problem for more details.

Only one sample is presented, and I didn't bother to convert it to a test case, I just ran my Python script passing the provided input and I had visual check on the result. The point is the problem is not difficult, it just takes some time to get understood properly. At least in my case.

However, if this is the input:
$#**\0100000101101100011100101000
We are expecting this output:
##*\$
Here is the logic behind the transformation.

The first characters represent the dictionary used in this message. We know that they are over when we read a zero or a one. In our case they are five:
$#**\
We convert them in a variable size string of zeros and ones, listing all the numbers from zero on, but skipping the ones represented by 1-only binary numbers:
$ -> 0
# -> 00
* -> 01
* -> 10
\ -> 000
If we had a sixth element, it would have been coded as "001".

Then we have a three-character block, that we have to read as a binary number, and give as the size of the following chunks in the message. Here is "010", meaning 2.
We go on reading elements of this size until when we get a terminator, represented by an all-one sequence:
00 00 10 11
We have three valid elements, and a terminator. We get the first part of our message:
# # *
Back to the size, this time is "011", meaning 3. So we get on reading three by three, until the terminator:
000 111
We already knew the only valid character sized 3 was '\', here we have one on it.

Read again the size, is "001". We read one character at the time.
0 1
There is just a single zero before we get 1, that is the sequence terminator. We add '$' to the decoded message and we read again the size, that is zero:
000
End of transmission. The message was "##*\$". Job done.

Writing a solution in Python, I used a generator to create the keys for the letters in the dictionary:
def key_generator():
    size = 1
    cur = 0
    while True:
        yield format(cur, '0{}b'.format(size))
        if cur == 2 ** size - 2:
            size += 1
            cur = 0
        else:
            cur += 1
I keep the generator status in two variables, size, that knows how long should be the generated key, and cur, that knows the binary value that should be pushed in that space.
The yield statement converts the cur value as expected, using both the built-in format() function and the format() string method. The first gets in the cur value and converts it to a string containing its binary representation. The second provides to the numbers of characters that the binary string should take.
Than we prepare to the next iteration. If increasing the cur we'd get a 1-only binary number, we skip it, resetting cur to zero, and increasing the size. Otherwise it is just a matter of increasing cur.

Now I just have to use the generator to map the provided characters to their coded representation:
mapping = {}
i = 0
for key in key_generator():  # 1
    if line[i] not in ['0', '1']:  # 2
        mapping[key] = line[i]
        i += 1
    else:
        break  # 3
1. I instantiate the generator and loop indefinitely on it.
2. If the current character is valid, I couple the key to it in the dictionary that I named mapping.
3. When I detect a '0' or a '1' I stop filling the dictionary.

I implemented the result generation with a boring double loop:
result = []
while True:
    size = int(line[i:i + 3], 2)  # 1
    if size == 0:
        break
    i += 3
    while True:
        chunk = line[i:i+size]  # 2
        i += size
        if chunk not in mapping:  # 3
            break
        result.append(mapping[chunk])  # 4
1. Convert a three character substring in an integer, considering it as binary. If it is zero, job done. Otherwise move the index in the line and go on reading.
2. Get a chunk of the specified size.
3. There is not such a key in the dictionary, meaning, we got a terminator. Interrupt the inner loop.
4. Otherwise convert the chunk in the actual character and put it in the result list.

We just have to collate the result in a string an return it to the caller.

I pushed the Python3 script to GitHub.

Go to the full post

HackerRank Binary Search: Ice Cream Parlor

Given an integer, the budget for two ice creams, and a list of integer, the price of the available flavors, return the 1-based indeces for the two chosen flavors. As stated in the page problem, we should implement a binary search in our solution.

I converted the provided samples in python tests, and I added to them a couple more when I was thinking on possible algorithms:
def test_provided_1(self):
    self.assertEqual('1 4', solution(4, [1, 4, 5, 3, 2]))

def test_provided_2(self):
    self.assertEqual('1 2', solution(4, [2, 2, 4, 3]))

def test_expensive(self):
    self.assertEqual('3 4', solution(10, [2, 2, 4, 6]))

def test_neighbor(self):
    self.assertEqual('3 4', solution(8, [1, 2, 4, 4, 8]))
Instead of using the suggestion given away in the title, I have been magnetically attracted to use a dictionary. The idea is quite simple, I use as key the flavor price, on which I have to perform the search, and as values a keep a list, so that I can push in it the position of all the flavor having the same price.
def solution(money, prices):
    counter = defaultdict(list)  # 1
    for i in range(len(prices)):  # 2
        counter[prices[i]].append(i+1)

    for price, flavors in counter.items():  # 3
        target = money - price
        if target == price:  # 4
            if len(flavors) > 1:
                return '{0} {1}'.format(flavors[0], flavors[1])
        else:
            others = counter.get(target)  # 5
            if others:
                result = sorted([others[0], flavors[0]])
                return '{} {}'.format(*result)
1. When I try to access a value in this dictionary, if the key was not already in, a new empty list is created. Using defaultdict saves my to call setdefault() on a plain dict.
2. I loop on the input array of prices, appending each flavor position (adjusted to be 1-based) to the associated key reporting its price.
3. Loop on all the price/flavors in the dictionary.
4. If the price I am ready to pay for the second flavor is the same of the amount I would pay for the first, I have only to check if the current bucket is used by more than one flavor. If so, I return the first two of them I found.
5. Otherwise, I check if there is in the dictionary a flavor for the price I am willing to pay. If I see it, I sort the two flavors position (as required by the problem) and return them.

This solution works fine, it is clear and fast, and it is accepted by HackerRank. However it is not what we are expected to produce. So I refactored it to use binary search. The rationale behind it could be thought as if we should run this algorithm in an embedded environment, and we found out that using hash tables eats too much memory, so we need to fall back to a slightly slower algorithm that uses a leaner data structure.

What I do now is putting price/flavors couples in a plain list, sort it and then checking for each element if there is a match to it among the other ones.
def solution(money, prices):
def solution(money, prices):
    counter = sorted([(price, flavor+1) for flavor, price in enumerate(prices)])  # 1
    for i in range(len(counter)):
        other = find_other(counter, i, money - counter[i][0])  # 2
        if other:
            result = sorted([counter[i][1], other])
            return '{} {}'.format(*result)
1. In this list comprehension I enumerate the prices. This generates a number of tuples in the format (index, price), I reverse them, increase the index, since we need it as 1-based, and push them in the list. Then I sort it.
2. For each element in the list, I check if there is a match. I pass to the find_other() function, see it below, the list, the current position, the amount I want spend for the ice cream. I expect it to return the other flavor or nothing, in the sense of python None, if it couldn't find it.

Here I finally perform the binary search:
def find_other(counter, pos, price):
    left = pos+1  # 1
    right = len(counter)-1
    while left <= right:  # 2
        mid = (left + right) // 2
        if counter[mid][0] == price:
            return counter[mid][1]

        if counter[mid][0] > price:
            right = mid-1
        else:
            left = mid+1
    return None
1. I don't care about the left-side elements in the counter list, they have been already checked. This saves some minimal searching time and, more interestingly, avoid the risk of a false positive. Think the case that the total budget is 2x and I am looking for a flavor costing x. I could end up finding the same x I have already found. It won't be difficult to skim this case off, but in this way we have it for free.
2. If there is anything in the current interval, I get its central point, I check it against the expected value, if it is not a match move to the side where I could find it.

Also this solution works fine and it is accepted by HackerRank.

On GitHub: The unit test, the binary search version, and the unofficial hash map version.

Go to the full post

The Spring AOP around advice

I am going to add a plain Spring component to my Spring Boot practice web application. The interesting part of it is that I am developing an AOP aspect that is designed to give a few advices in conjunction with a stated join point.

Here is the component, nothing fancy.
@Component
public class PopConcert implements Performance {
    private static final Log log = LogFactory.getLog(PopConcert.class);

    @Override
    public boolean perform(int level) throws PerformanceException {
        if (level > 8) {
            log.info("A nice gig");
            return true;
        } else if (level > 6) {
            log.info("Not in the mood tonight");
            return false;
        } else {
            throw new PerformanceException("Drunk drummer");
        }
    }
}
Its only method returns a boolean or throws and exception, when something really unusual happens.

I have already talked about Spring AOP with aspects in a previous post, let's see now a sligher more complex aspect, that define more advices:
@Aspect
public class Audience {
    private final static Log log = LogFactory.getLog(Audience.class);

    @Pointcut("execution(boolean Performance.perform(int))")  // 1
    public void performance() {}
    
    @Before("performance()")  // 2
    public void silenceMobile() {
        log.info("Silencing mobile phones");
    }

    @Before("performance()")
    public void takingSeat() {
        log.info("Taking seat");
    }

    @AfterReturning("performance()")  // 3
    public void applaude() {
        log.info("Applauding");
    }

    @AfterThrowing("performance()")  // 4
    public void complain() {
        log.info("Demanding a refund");
    }
}
1. Here I define a pointcut. Not a strict necessity, but it saves some typing, and should improve the code readability. I named it "performance()" and I state it refers to the method perfmorm() defined in the Performance interface. All the advices below refer to it.
2. I have a couple of before advices, as the AspectJ annotation says. Here the point is that I can have more methods associated to the same advice.
3. After the execution of the join point has been completed, Spring should follow this advice.
4. If an exception break the join point run, this advice should be followed by Spring.

As usual I let the magic of putting together everything to be performed by a Spring configuration class:
@Configuration
@EnableAspectJAutoProxy
@ComponentScan(basePackageClasses = Performance.class)  // 1
public class ConcertConfig {
    @Bean
    public Audience audience() {  // 2
        return new Audience();
    }
}
1. In this case the only component in the hierarchy based on Performance is the PopConcert class we have seen above.
2. This method let the aspect spring to life.

I have written a JUnit testcase to see it at work:
@RunWith(SpringJUnit4ClassRunner.class)
@ContextConfiguration(classes = ConcertConfig.class)
public class PopConcertTest {
    @Autowired
    private Performance performance;

    @Test
    public void testPerform() {
        try {
            assertTrue(performance.perform(11));
        } catch (PerformanceException pe) {
            fail("Unexpected PerformanceException: " + pe.getMessage());
        }
    }

    @Test(expected=PerformanceException.class)
    public void testPerformBadly() throws PerformanceException {
        performance.perform(2);
    }
}
Running it I have the greenlight and I can see in the log that the aspect has done its job

When an exception occurs in the join point, the AfterThrowing advice is executed, and when all works fine, the AfterReturning is called:
That's nice. But let's see how put all this stuff in a single advice.

AspectJ Around

The around advice gets its join point as an annotation parameter, as usual, and it gets in input, as a parameter a reference to a ProceedingJoinPoint. What we have to do is just specifying in its body what we want to be executed before, after, and even in case of exception. In a natural way, but with a few caveats.
@Aspect
public class AudienceAround {
    private final static Log log = LogFactory.getLog(AudienceAround.class);

    @Around("execution(boolean Performance.perform(int))")
    public Object watchPerformance(ProceedingJoinPoint pjp) throws PerformanceException {  // 1
        Object result = null;  // 2
        try {
            log.info("Ensure mobile is off");
            log.info("Take place");
            result = pjp.proceed();  // 3
            log.info("Cheer");
        } catch (Throwable t) {  // 4
            log.info("Demand a refund: " + t.getMessage());
            if (t instanceof PerformanceException) {
                throw ((PerformanceException) t);  // 5
            } else {
                throw new PerformanceException("Unexpected:" + t.getMessage());  // 6
            }
        }
        return result;
    }
}
1. Our join point throws an exception, so also its Around advice has to throw it too.
2. When it does not throw an exception, it returns a value, so we have to be prepared to to the same. Notice that it returns a primitive type, but the proceed() method in ProceedingJoinPoint returns an object. Some boxing and unboxing is expected.
3. We are passing the control to Spring, that is going to give it to the specified join point.
4. If the join point throws an exception, we'll get a Throwable. It is more or less the same issue we have in (2). Since Spring doesn't know beforehand what throws it, it falls back to the mother of all exceptions.
5. It won't be a good idea to swallow the exception, the caller wants to get it. So here I rethrow it.
6. If I have done everything right, this should never happen.

I have swapped the beans in ConcertConfig. Now a AudienceAround is seen by Spring as the aspect it should care about. I run again the tester and nothing changes, just the log, and in the right way:
Reference: Aspect-oriented Spring, from Spring in Action, Fourth Edition by Craig Walls. Chapter four.
Repository on GitHub: See the JUnit test case, and browse the concert package.

Go to the full post

HackerRank Merge Sort: Counting Inversions

I found the description of this problem in its HackerRank page a bit misleading. Better trust its name instead. We have to count the inversions that occur when we run a merge-sort algorithm on list of positive integers.

So, basically, we have to increase a variable to keep track of the inversions and return it to the caller. If you know how to implement merge-sort, large part of the job is done.

There main problem is that they set the timeout very tightly, and you should get a few problem respecting them, especially if you are using an implementation language that is not among the fastest ones.

I use Python here, so I had to be careful go get an accepted solution. The have a nice, well readable python 3 code accepted I had to submit is a PyPy. Getting something accepted as Python3 required some more job.

In the end I came out with this stuff:
def count_inversions(a):
    return merge_sort(a, 0, len(a) - 1)
The required count_inversions() simply calls my adapted merge-sort() function.

Beside sorting, it also stores in the variable result the count of all the inversions, and then will return it.
def merge_sort(a, left, right):
    result = 0
    if left < right:
        center = left + (right - left) // 2
        result += merge_sort(a, left, center)
        result += merge_sort(a, center + 1, right)
        result += merge(a, left, center, right)
    return result
As you see, no surprise, it's a standard merge-sort implementation.

I have spent some sweating on merge, to save to most time I could without getting something still readable:
def merge(data, left, center, right):
def merge(data, left, center, right):
    merged = []  # 1
    result = 0  # 2

    i = left  # 3
    j = center + 1
    while i <= center and j <= right:  # 4
        if data[i] > data[j]:  # 5
            result += (center - i + 1)
            merged.append(data[j])
            j += 1
        else:
            merged.append(data[i])
            i += 1

    for i in range(i, center+1):  # 6
        merged.append(data[i])
    for j in range(j, right+1):
        merged.append(data[j])

    data[left:right+1] = merged  # 7
    return result
1. We need some temporary area where storing the merged data. I use this list.
2. Keep track of the swaps happening here.
3. The main while below runs on two loop variables, i and j, representing the indexes to the left and right elements.
4. Loop until there is at least one element on the left and one on the right to be merged.
5. If an element on the left is bigger than the ones on the right, we have a swap. Or better, as many swaps as the number of elements still on the right.
6. Take care of the leftover.
7. Replace the part of the list on which we worked with the merged elements.

Fast enough to get accepted. Test case and python script on GitHub.

Go to the full post

CodeEval Sum of integers

Given a list of integer numbers, return the largest sum in one of its subsequences of contiguous elements. CodeEval Sum of integers.

I have converted the given samples in python tests, then I added a few more cases to help me thinking about an algorithm.
def test_provided_1(self):
    self.assertEqual(8, solution('-10,2,3,-2,0,5,-15'))  # 1

def test_provided_2(self):
    self.assertEqual(12, solution('2,3,-2,-1,10'))  # 2

def test_plain(self):
    self.assertEqual(5, solution('2,3'))

def test_a_negative_first(self):
    self.assertEqual(3, solution('-2,3'))

def test_a_negative_second(self):
    self.assertEqual(2, solution('2,-3'))

def test_both_negative(self):
    self.assertEqual(-2, solution('-2,-3'))

def test_a_negative_in_between(self):
    self.assertEqual(2, solution('1,-1,2'))
The idea is, I am going to loop on the sequence keeping track of the candidate solution. I initialize it with the first value in the list, then I pass to consider the next element.
It is possible that the sum of the proposed solution with the current value is higher, so that would get the new tentative solution. Otherwise I would keep the original one.
But, consider for instance (1). If I simply add up the two values I get -8, that is better of -10. However I should consider just 2 as a better solution. To do that, I refine the check stated above. If the proposed solution is negative, there is no use in adding it up to the current value, I can discard it and keeping the new value as new candidate solution.
There is another problem, as seen in (2). What I should do when I see a negative number in the series after some positive numbers? The first idea could be consider close a subsequence and go looking for a new one. However this way of thinking is not rewarding here, where the full sequence, leads to a higher sum than its left and right parts.
To overcome this issue I introduce a second variable, acting as a tentative solution that keeps track of all the elements, negative included, until it gets negative.

After this thoughts, I written down this Python function:
def solution(line):
    head, *values = [int(x) for x in line.split(',')]  # 1

    result = head  # 2
    maybe = head
    for value in values:
        maybe = value if maybe < 0 else maybe + value  # 3
        if maybe > result:  # 4
            result = maybe
    return result
1. The input line is split on commas, its values are converted in integers and assigned to two variables, a single number, the head of the list that is managed differently, and a list of numbers, containing all the other elements.
2. I assign the first value to both solution. The currently chosen, result, and the tentative one, maybe.
3. For each following element, I check if it is worthy to keep the previous values, stored in maybe, and add the current value, or throw away all the old stuff and just keep the new value, trying to create a new subsequence.
4. Only if the tentative solution is actually better than the currently chosen one, I change it accordingly.

Well, it works. Python script and unit test are on GitHub.

Go to the full post

Plural Names Iterator

Given a file containing rules to convert a singular word in its plural, write a function that applies the best available rule for any passed name and return it in plural form.

The aim of this problem is the same of the one seen in the previous post, I am going to implement a similar plural() Python function that would get in input a word, as 'boy', and would return its plural form, like 'boys'.

The test case is just the same, plural() would be almost the same, the match_apply() closure I used to return two functions, one for checking the word against the current pattern, the other to apply the relative change to the word, is the same.
The big change is that I won't use a generator but a iterator under the bonnet, with the idea of reading the patterns from file just once, and not any time the user asks for a name pluralization.

So I create a class Rules, implementing the special methods __iter__() and __next__(), that make it both an iterable (for the first) and an iterator (for the latter), and then I would instantiate it in an object that is going to be used by plural(), instead of the generator.
class Rules:
    filename = 'plural_names_rules.txt'  # 1

    def __init__(self, filename=None):  # 2
        if filename:
            self.filename = filename

        self.rules = []
        with open(self.filename) as patterns:  # 3
            for line in patterns:
                pattern, search, replace = line.split()
                self.rules.append(match_apply(pattern, search, replace))  # 4

    def __iter__(self):  # 5
        self.index = -1
        return self

    def __next__(self):  # 6
        self.index += 1
        if len(self.rules) > self.index:
            return self.rules[self.index]
        raise StopIteration
1. By default, the rules are read from this file.
2. When initializing a Rules object, the user could change the rules file.
3. I have moved here the reading from file, so that it is done only once for each Rules object. Notice the with-as statement that delegates to Python the resource cleanup when exiting the block.
4. For each line in the pattern file, I extract the patter, search, and replace token, I pass them to the match_apply() closure, and I get back the relative match() apply() functions, that I push to the rules list.
5. I am stating that Rules is a Python iterable, providing a way to extract from it an iterator, that is the Rules object itself. Here I initialize the index to the internal list (namely rules) to the first item before the actual beginning (that means, minus one).
6. Rules is also a Python3 iterator. Each time it is required a new value from it, this method is called. If the iterator index is still in the range, the relative item is returned, otherwise, as required by the python iterator desing.

The change in the plural() function is minimal, but dense of meaning.
rules = Rules()  # 1


def plural(noun):
    for match, apply in rules:  # 2
        if match(noun):
            return apply(noun)
    return '???'
1. I need a Rules() object to work with. Here I instantiate it.
2. The for-each statement requires an iterable. The object rules has a __iter__() method, so it works fine in this role. Before executing a loop, for-each asks to the iterator provided by rules (that incidentally is rules itself) its next element. Since this is a Python3 script, the __next__() method of the iterator is called, getting the next element in the list of rules, a couple of functions retrieved from the match_apply() closure.

Reference: Dive into Python 3 Chapter 7 Classes & Iterators, sections 6.

Test case and python script on GitHub.

Go to the full post

Plural Names Generator

Given a file containing rules to convert a singular word in its plural, write a function that applies the best available rule for any passed name and return it in plural form.

My target is giving the user a function, named plural(), that gets in input a word and an optional file name for the rules to be applied (defaulted by the filename provided), and returns it as a plural word.

Here is a few test for this functionality:
def test_box(self):
    self.assertEqual('boxes', plural('box'))

def test_bush(self):
    self.assertEqual('bushes', plural('bush'))

def test_soliloquy(self):
    self.assertEqual('soliloquies', plural('soliloquy'))

def test_boy(self):
    self.assertEqual('boys', plural('boy'))

def test_vacancy(self):
    self.assertEqual('vacancies', plural('vacancy'))
The rules are in this format:
[sxz]$ $ es
[^aeioudgkprt]h$ $ es
(qu|[^aeiou])y$ y$ ies
$ $ s
I have four rules, each rule has three tokens.
First token is the tail of the word, that I am going to check to decide how to change it. First rule applies to words ending by 's', or 'x', or 'z'. The second one to words ending by 'h', and having in the previous position a letter that is not 'a', or 'e', or ..., or 't'. The third one to words ending by 'y', preceded by 'qu' or a single letter that is not a vowel. As last resort, the fourth rule is applied to any word.
The second token states what I have to change. A plain dollar sign '$' says that I have to add something at the end of the word, withour removing anything. The couple 'y$' means I have to remove the last 'y' in the word that is going to be replaced with something else.
The third token is what I have to add to the original word to make it plural. It ranges from 's', default case, to 'es', to 'ies'.

The plural() function makes use of a generator, named rules(), that returns a couple of function, one, match() to verify if the current word matches a specific rule in the list, and another, apply() to convert a word in its plural form, following the current rule.
def plural(noun, file='plural_names_rules.txt'):
    for match, apply in rules(file):
        if match(noun):
            return apply(noun)
    return '???'  # 1
1. If we have a list of rules carefully written, we should never get here. We should always get a matching rule for each word.

Let's see the rules() generator:
def rules(file):
    with open(file) as patterns:  # 1
        for line in patterns:
            pattern, search, replace = line.split()
            yield match_apply(pattern, search, replace)  # 2
1. Using the with-as compound statement we delegate to python the nuisance of cleaning up the involved resources as we exit the block - no matter how brutally that could happen. So, here that we are opening a file, we can be sure it will be closed when leaving the scope.

The generator yield the result of calling the match_apply() function, that is going to return a couple of functions. These functions are going to use the three parameters we are passing to match_apply(), and use them in conjunction with a new parameter that they are going to receive from their caller. So we are talking about a closure.
def match_apply(pattern, search, replace):
    def match(word):
        return re.search(pattern, word)  # 1

    def apply(word):
        return re.sub(search, replace, word)  # 2

    return match, apply  # 3
1. The match() function is going to be called on a passed word, and it would apply a regular expression search on the pattern passed to the closure.
2. The apply() function would call the regular expression sub() function using search and replace parameters from the closure, combining it with its word parameter.
3. The two functions are returned to the caller.

If you follow the test run in debugger mode, you will see what actually happens.
The test calls plural(), it loops on the generator rules(), getting from the closure match_apply() the couple of functions that check if the word matches the current rule and in that case apply the change to make the word plural.

Reference: Dive into Python 3, section 6.6.

Unit test and Python script are on GitHub.

Go to the full post

Fibonacci Iterator

Write an iterator that gives back all the Fibonacci numbers up to a given value.
Use it to get the highest Fibonacci number less than 1000, and to check which if there is in that range a multiple of 12.

This problem is thought as a follow-up to the previous post, where we implemented the same functionality as a Python generator. I have extracted it from Dive into Python 3, section 7.5. The idea is showing how python generators are shortcut implementation of iterators.
class Fibonacci:
    def __init__(self, top):  # 1
        self.top = top

    def __iter__(self):  # 2
        self.a = 0
        self.b = 1
        return self

    def __next__(self):  # 3
        if self.a > self.top:
            raise StopIteration  # 4

        result = self.a
        self.a, self.b = self.b, self.a + self.b
        return result
1. Initialize a Fibonacci object setting its top value.
2. The special __iter__() method prepares a Fibonacci object to be used as an iterator.
3. Each call to a Fibonacci object as iterator resolves to a call to its special method __next__()
4. The end of iterations is signaled raising a StopIteration exception. So, in this context an exception is not exceptional at all.

Even though the implementation is much verbose, in this case the functionality is just the same. So, beside studying purpose, the generator version easily wins over this iterator one.

The user is minimally affected by the implementation change, as you can see in these tests:
def test_list_1000(self):
    fib1000 = list(Fibonacci(1000))
    self.assertEqual(17, len(fib1000))
    self.assertEqual(987, fib1000[-1])

def test_multiple_12(self):
    for candidate in Fibonacci(1000):
        if candidate and candidate % 12 == 0:
            break
    else:
        self.fail('No fibonacci multiple of 12 found!')
    self.assertEqual(144, candidate)
If you want, you could check the full code on GitHub.

Go to the full post

Fibonacci Generator

Write a function that generates Fibonacci numbers up to a given value.
Use it to get the highest Fibonacci number less than 1000, and to check if there is in that range a Fibonacci number that is multiple of 12.

You can solve this problem in a myriad of ways. However, the original idea was to push towards writing a Python generator. I extracted this example from Dive into Python 3, paragraph 6.6.1.

Here is a possible implementation:
def fibonacci(top):
    a, b = 0, 1
    while a < top:
        yield a
        a, b = b, a + b
Having a 'yield' statement in its body, we see that this fibonacci function is a generator. That is, a sort of function that behaves like an iterator. Each time we call it, it should yield a value. If not, as in this case when a is not less than top, the generator is consumed, and the iteration stops.

Let's see it in action:
def test_list_1000(self):
    fib1000 = list(fibonacci(1000))  # 1
    self.assertEqual(17, len(fib1000))
    self.assertEqual(987, fib1000[-1])

def test_multiple_12(self):
    for candidate in fibonacci(1000):  # 2
        if candidate and candidate % 12 == 0:  # 3
            break
    else:
        self.fail('No fibonacci multiple of 12 found!')
    self.assertEqual(144, candidate)
1. Here fibonacci(1000) is used to populate a list of seventeen elements, when the generated value is not less than 1000, the generation of elements ends.
2. Each time we call fibonacci(1000) a new Fibonacci number is generated, starting from 0 on.
3. Since 144 is both a Fibonacci number and a multiple of 12, we won't consume completely the generator.

Full code pushed to GitHub.

Go to the full post

HackerRank Sorting: Comparator

Write a comparator for a class Player that has, as data member, name (a string) and score (integer) so that sorting a list of players would put higher score on top. In case of a tie, names should be ordered in natural order.
I don't think this HackerRank problem was conceived with Python in mind, however there is some interest in implementing a solution in this language too.

As usual, I converted the proposed sample in a python test.
def test_provided_1(self):
    data = {'amy': 100, 'david': 100, 'heraldo': 50, 'aakansha': 75, 'aleksa': 150}
    players = []
    for k, v in data.items():
        players.append(Player(k, v))

    players.sort(key=cmp_to_key(Player.comparator))

    self.assertEqual(5, len(players))
    self.assertEqual('aleksa', players[0].name)
    self.assertEqual('amy', players[1].name)
    self.assertEqual('david', players[2].name)
    self.assertEqual('aakansha', players[3].name)
    self.assertEqual('heraldo', players[4].name)
The tie case, where amy and david have the same score, is already sorted. So I added a second test.
def test_ex_aequo(self):
    players = []
    for i in range(3):
        players.append(Player('Tom' + chr(ord('k') - i), 100))

    players.sort(key=cmp_to_key(Player.comparator))

    self.assertEqual(3, len(players))
    self.assertEqual('Tomi', players[0].name)
    self.assertEqual('Tomj', players[1].name)
    self.assertEqual('Tomk', players[2].name)
Here I have Tomk, Tomj, Tomi, same score but in reversed order.
Writing a comparator for the class Player shouldn't be difficult.
class Player:
    def __init__(self, name, score):
        self.name = name
        self.score = score

# ...

    def comparator(self, rhs):
        if self.score == rhs.score:
            return 1 if self.name > rhs.name else -1
        return 1 if self.score < rhs.score else -1
The interesting point is, do we really need it? If the comparator is commonly used by the client code for the class Player, yes, definetely. However, if this is not a standard requirement we should leave the user free to define his own comparator on the fly. Instead of using the functools cmp_to_key() function to convert a comparator in a suitable key function for sorting, we could use the operator attrgetter() function to specify on which attribute of the class the sort should act. Something like:
players.sort(key=attrgetter('name'))
players.sort(key=attrgetter('score'), reverse=True)
I am using the fact that python sort() is stable, so I firstly sort the collection by the second key of interest, in this case 'name', in natural order, and then I sort again the players by their score, this time in reversed order. The advantage of this approach is that it is much more flexible. Any user could decide how to sort its players, and the Player does not have to create a comparator each time a new way of sorting is required. I pushed to GitHub both the python script that I submitted to HackerRank and the unit test that includes also the attrgetter() variation.

Go to the full post

Phone Numbers

Write a function that checks if a given string represents a phone number.

This problem should look to you suspiciously similar to the previous one, that asked to check about Roman numbers. And actually I have extracted it from the same source, Dive into Python 3, chapter 5, that is about Regular Expressions.

So, the problem is defining a good pattern that matches the expected input. Here is a first try:
pattern = re.compile(r'^(\d{3})-(\d{3})-(\d{4})$')
I have asked to the compile() function in the python re library to compile it to a pattern object, so that I can use it to call on it its method search(), as you can see below.
Notice that I passed to compile() a "raw" string, signaled by an 'r' before its begin. It is a useful trick, so that we can avoid backslashing the backslashes to specify them as actual characters and not escape ones.
Then I say that my number should use all the characters in the string, starting from the beginning, as the caret '^' anchor specifies, to the end, given the dollar '$' sign.
In the string I define three groups, round brackets, of digits. I'm using the '\d' shortcut to mean each possible digit. The curly brackets after it give the numerosity of that element. In the first and second case we have exactly 3 digits, in the third case are four.

This pattern works alright for simple numbers, as I proved in a test case:
result = pattern.search('800-555-1212')
self.assertIsNotNone(result)  # 1

groups = result.groups()
self.assertEqual(3, len(groups))  # 2
self.assertEqual('800', groups[0])
self.assertEqual('555', groups[1])
self.assertEqual('1212', groups[2])
1. Seach succeeds.
2. In the result there are three groups, as expected, and each group contains the expected block of the phone number.

However, this pattern is too limited. We want a fourth optional group, representing the number extension; we need to accept any possible kind of separators, and even the total lack of them; and we should expect some extra leading characters that should be skipped.

Last issue is easily solved, it's enough to get rid of the caret at the beginning of the pattern. In this way the number is not forced to start at the beginning of the string.
Adding the extension group is not a problem at all. We should just be prepared to check it, expecting an empty string if it is not present.
Separators require deciding what could actually be accepted in that position. Let's be permissive and accept anything that is not a number.

These considerations lead to a new pattern:
pattern = re.compile(r'(\d{3})\D*(\d{3})\D*(\d{4})\D*(\d*)$')
Notice the disappearing of the '^' and the mutation of the literal dash to '\D*', meaning any character that is not a digit - uppercase D, where lowercase d represents any possible digit - repeated zero or more times, that's the star '*'.

I have written a few tests that I pushed to GitHub, and I am quite satisfied with the result.

Go to the full post

Roman Numbers

Write a function that checks if a given string represents a Roman number.

We can keep simple the solution to this problem using Regular Expressions. Using the python re library we just have to wrap a call to its search function, passing to it the input string and an appropriate pattern. If it does not find it, it returns None.
re.search(pattern, data)
The problem is identify the right pattern. An hint, some unit test would help. Another hint, you could have a look at chapter 5 section 3 of Dive into Python 3 that uses this example as introduction to pattern matching in Python.

Assuming that the candidate Roman number should start at the beginning of the passed string, and end with it, here is a solution:
pattern = '^M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$'
First character, a caret ('^'), acts as an anchor. The matching starts at the beginning of the passed string.
Then we have a M, saying that we expect this character in the passed string, just at its beginning (as stated by the caret before). How many of them? It is specified by the following curly brackets. Zero or more, up to three.
The following stuff is in round brackets. We have three of them. All of them are optional. And then we have a dollar sign, saying that the input string should end there. Nothing not specified in the pattern should follow, otherwise there would be no recognition.

The three optional blocks are very similar. Let's see the first one.
(CM|CD|D?C{0,3})
It says that we are expecting one of three possible alternatives. The first two are just two-character string, 'CM' or 'CD'. The third one is more interesting.

It says we are expecting a 'D'. But, wait, it is followed by a question mark, meaning that it is optional. We can have 0 or 1 of them. We could get the same result writing
D{0,1}  # same as D?
Usually the more compact 'D?' is preferred.

And finally we have zero or more, up to three, 'C'.

I pushed the python unit test I have written when playing with this regular expression on GitHub.

Go to the full post

Hackerrank Tries: Contacts

We are asked to provide a way to store contacts and check how many of them starts with a given prefix. As the name of this HackerRank problem suggests, we are gently pushed in the direction of implementing a trie. The issue is that one of the test cases that the solution has to pass to be accepted - #12 - is a bit too heavy for a plain python implementation.

My initial idea was to create a trie based on this node structure.
class Node:
    def __init__(self):
        self.count = 1
        self.children = {}

trie = Node()
An empty trie is just a node containing an empty dictionary of children.
If I add a new item on it, say a contact named 'x', I create a children with key 'x' and as value a Node created by the above constructor.
If I add a second item with the same starting letter, say 'xy, I increase the counter for the 'x' element and then create a 'y' element as its child.

I need two functions to implement the required interface, add() and find():
def add(node, data):
    pass


def find(node, data):
    pass
Given this skeleton, I written a python test unit based on the sample given by HackerRank:
def test_provided_1(self):
    add(trie, 'hack')
    add(trie, 'hackerrank')
    self.assertEqual(2, find(trie, 'hac'))

def test_provided_2(self):
    add(trie, 'hack')
    add(trie, 'hackerrank')
    self.assertEqual(0, find(trie, 'hak'))
I played a bit around, in the end I came out with this implementation:
def add(node, name):
    for letter in name:
        sub = node.children.get(letter)  # 1
        if sub:
            sub.count += 1
        else:
            sub = node.children[letter] = Node()  # 2
        node = sub  # 3
1. I am looping on each letter of the passed contact name, if the children dictionary in this level contains the relative letter, I access that node and increase its counter.
2. Otherwise I create a new node in the dictionary using as key the current letter.
3. Then I follow the recursive trie structure passing to the node in the level below.

Finding the counter associated to a prefix in a trie is just a matter of following its path in the tree.
def find(node, data):
    for letter in data:
        sub = node.children.get(letter)  # 1
        if not sub:
            return 0
        node = sub
    return node.count  # 2
1. If we don't find a letter in the path we are following in a matching level of the tree, there is no contact with such prefix. Otherwise we get the node among the children and pass to the level below.
2. If we have not intercepted a missing letter, the counter associated to the current node is the answer we want pass back to the caller.

This works fine, it is just a bit slow, with all the nodes that we create for each name we push in the trie.

We need a faster solution. A way would be changing programming language, and it looks very easy to convert this code in plain C. However, I had an idea that could make the algorithm faster.

Basically, I want to reduce the number of nodes created. When I insert the first name, for instance 'hack', I don't know if I really need to split it letter by letter. Maybe it will be the only contact starting by 'h'. So, I could keep it in a single chunk. Just one node instead of four. Good deal.
I pay it when I insert 'hackerrank', here I am forced to split 'hack' letter by letter (actually, I could have been more conservative, and keep the common prefix 'hack' in a single node. I kept it as a note. If the simpler split-all technique I went for was not enough, I would have tried this other approach). Still I let the tail of the new name, 'errank', in a single node.

Let's try it. No pun intended.

I have to store the actual piece of data in any Node:
class Node:
    def __init__(self, data=None):
        self.count = 1
        self.data = data
        self.children = {}

trie = Node()
Adding gets more complicated, since I have to think about splitting:
def add(node, name):
    for i in range(len(name)):  # 1
        sub = node.children.get(name[i])
        if not sub:
            node.children[name[i]] = Node(name[i:])  # 2
            return

        sub.count += 1
        if sub.data == name[i:]:  # 3
            return
        elif len(sub.data) > 1:  # 4
            split = sub.data[1:]
            sub.data = sub.data[0]
            sub.children[split[0]] = Node(split)
        node = sub  # 5
1. I can't use anymore the handy for-each loop, I must fallback to a plain for in range one since I need the current position in the word to split it when required.
2. When I detect a missing node I created it. The node is not associated anymore to a single letter in the name, but to the full chunk starting from the current position to the end of the word.
3. When I get a match, things are getting a bit more complicated. But not in this specific case, when the tail of the current word is a match with the content of the current node. Job done!
4. Here is the real extra work. The current node contains a chunk of two or more letters, and they don't match with the chunk from the new name. I need to split, I don't do any check on what I am splitting, I simply slice off one letter from the current chunk, let the first one in this node and put the rest in the child relative to the first letter of the split.
5. Back on the original track. I move to the node in the level below, and I am ready to the next step.

Finding stays simple, just a minor change:
def find(node, data):
    for i in range(len(data)):  # 1
        node = node.children.get(data[i])
        if not node:
            return 0
        if node.data.startswith(data[i:]):  # 2
            break
    return node.count
1. We need to know the position in the current word, here too.
2. We find the node when the data in it contains the current chunk of the new name, starting from its current position to its end.

Now the algorithm is fast enough so that also this python implementation gets fully accepted. I pushed on GitHub both versions, the "plain" one and the "chunk" one, and the relative unit tests.

Go to the full post

Tree traversal

Given a Binary Search Tree, give its node values following the in-order, pre-order, post-order, and level-order traversals, using both recursive and iterative algorithms. Actually, the HackerRank problem that gave me the idea for this post asks only to implement the level-order traversal algorithm, and you can write the code as you like, so probably in the iterative way, that comes more natural in this case.

Firstly, I sketched a BST on paper, to better visualize the problem. Nothing fancy, but not too basic too. Something like this.
Then I wrote the unit test for the eight functions I am about to write, using Python as implementation language. That lead me naturally to describe the data type Node that they all use, and setting up each test translating the picture I had of my BST in a data structure in my code.
class Node:
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right

# ...
  
class TestCodeEval(unittest.TestCase):

    def setUp(self):
        node1 = Node(1)
        node2 = Node(2, node1)
        node4 = Node(4)
        node7 = Node(7)
        node5 = Node(5, node4, node7)

        self.root = Node(3, node2, node5)
As you can see, I gave to Node the bare minimal stuff to make it work.
def test_in_order_r(self):
    self.assertEqual([1, 2, 3, 4, 5, 7], in_order_r(self.root))

def test_in_order_i(self):
    self.assertEqual([1, 2, 3, 4, 5, 7], in_order_i(self.root))

def test_pre_order_r(self):
    self.assertEqual([3, 2, 1, 5, 4, 7], pre_order_r(self.root))

def test_pre_order_i(self):
    self.assertEqual([3, 2, 1, 5, 4, 7], pre_order_i(self.root))

def test_post_order_r(self):
    self.assertEqual([1, 2, 4, 7, 5, 3], post_order_r(self.root))

def test_post_order_i(self):
    expected_values = [1, 2, 4, 7, 5, 3]
    result = post_order_i(self.root)
    self.assertEqual(len(expected_values), len(result))
    for expected, actual in zip(expected_values, result):
        self.assertEqual(expected, actual)

def test_level_order_r(self):
    self.assertEqual([3, 2, 5, 1, 4, 7], level_order_r(self.root))

def test_level_order_i(self):
    self.assertEqual([3, 2, 5, 1, 4, 7], level_order_i(self.root))
The 'r' or 'i' at the end of the function names hint which version of the algorithm I plan to implement in there, recursive or iterative.
I have written the post-order iterative test in a slightly different way to hide the fact that I let this function return a deque instead of a plain list. Just to stress that is a bit different from all the other also when it comes to write it. In production code you don't usually want to leave such a thing. Better to convert the return value inside the function.

In order

The in/pre/post order traversals are so intrinsically recursive that could be easily written as a one-liner in Python. I have force myself to be more readable, still the code is very compact and nice to see.
def in_order_r(node):
    if not node:
        return []
    return in_order_r(node.left) + [node.data] + in_order_r(node.right)
I go down to fetch the most to the left node, then up, completing the left branch before looking in the right one.

If we don't want to use recursion, we need to explicitly use a stack.
def in_order_i(node):
    if not node:
        return []

    cur = node
    result = []
    stack = []
    while True:
        if cur:
            stack.append(cur)
            cur = cur.left
        elif stack:
            cur = stack.pop()
            result.append(cur.data)
            cur = cur.right
        else:
            return result
We push the nodes in the stack, until we find a leftmost node, than we pop the last node we pushed to the stack and put its value in the result. Look for what is now the leftmost node, and so on, until we consume all the nodes.

Pre order

The recursive code is almost the same.
def pre_order_r(node):
    if not node:
        return []
    return [node.data] + pre_order_r(node.left) + pre_order_r(node.right)
In this case, first we check the left subtree, than the right one, and finally we get the one on top.

The iterative version asks again for an explicit stack where to store the nodes temporary set apart. The loop is even more natural than the in-order one.
def pre_order_i(node):
    if not node:
        return []

    result = []
    stack = [node]
    while stack:
        cur = stack.pop()
        result.append(cur.data)
        if cur.right:
            stack.append(cur.right)
        if cur.left:
            stack.append(cur.left)

    return result
Push the root on the stack before starting the loop, then consume the top of the stack and add its children.

Post order

Again, a very clean recursive implementation.
def post_order_r(node):
    if not node:
        return []
    return post_order_r(node.left) + post_order_r(node.right) + [node.data]
The iterative version is not so immediate. At least, I spent some time on it thinking how was better to proceed. The trick is looking to the pre-order implementation and change, seeing the structure similarity, and acknowledging the differences. At first sight, they look completely different. However, if you reverse the result of the post-order traversal, you could see that something is there. Beside reversing, we need also to consider the branch in different order. This means, I am pushing the result in a queue (actually a deque, the python Queue is not a good choice in this context) and not in a stack, and I reverse the order of pushing nodes in the stack.
def post_order_i(node):
    if not node:
        return []

    stack = [node]
    result = deque()
    while stack:
        cur = stack.pop()
        result.appendleft(cur.data)  # 1
        if cur.left:
            stack.append(cur.left)  # 2
        if cur.right:
            stack.append(cur.right)
    return result
1. Push to the left in the result, and not to the right as in pre-order.
2. First the left node and then the right one.

Level order

This is kind of different from the other traversals. The recursive implementation is less immediate, and it usually comes more natural to implement it in an iterative way.

Here too it comes handy using a queue (again, here a deque because of Python) but instead of the stack, not for the result. This is because we want to get the value list level by level, so the first node we push to the buffer should be the first one to be consumed.
def level_order_i(node):
    result = []
    buffer = deque([node])
    while buffer:
        cur = buffer.popleft()
        result.append(cur.data)
        if cur.left:
            buffer.append(cur.left)
        if cur.right:
            buffer.append(cur.right)
    return result
If we really want to implement it in a recursive way, it is probably a good idea to have an helper function, since we have to carry on all the nodes at the same level, to have a way to check the level below.

Let the user call it passing the root, as usual, then convert it to a list an call the helper function:
def level_order_r(node):
    return level_order_rs([node])
In the result list we store, as usual, the values of the nodes we have already found, then we populate a list with the children of the nodes on this level, and we pass them to a recursive call.
def level_order_rs(nodes):
    result = []
    below = []
    for node in nodes:
        result.append(node.data)
        if node.left:
            below.append(node.left)
        if node.right:
            below.append(node.right)
    if below:
        result += level_order_rs(below)
    return result
I pushed both the unit test and the python script to GitHub.

Go to the full post